* OSD failure on start
From: Mandell Degerness @ 2013-02-13 19:57 UTC
To: ceph-devel@vger.kernel.org

I'm getting this error on one of my OSDs when I try to start it.

I can gather more complete log data if no one recognizes the error from this:

Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.612847 7f4f607e7780 0 filestore(/mnt/osd96) mount found snaps <>
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.615147 7f4f607e7780 0 filestore(/mnt/osd96) mount: enabling WRITEAHEAD journal mode: btrfs not detected
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.658965 7f4f607e7780 1 journal _open /mnt/osd96/journal fd 30: 8589934592 bytes, block size 4096 bytes, directio = 1, aio = 0
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.720091 7f4f607e7780 1 journal _open /mnt/osd96/journal fd 30: 8589934592 bytes, block size 4096 bytes, directio = 1, aio = 0
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.721871 7f4f607e7780 -1 osd/OSD.cc: In function 'OSDMapRef OSD::get_map(epoch_t)' thread 7f4f607e7780 time 2013-02-13 19:30:04.721278
osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: (OSD::get_map(unsigned int)+0x560) [0x7f4f60a411e0]
 2: (OSD::init()+0x5a3) [0x7f4f60a53ce3]
 3: (main()+0x4462) [0x7f4f6096d182]
 4: (__libc_start_main()+0xfd) [0x7f4f5e64b26d]
 5: (()+0x16e829) [0x7f4f60968829]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: --- begin dump of recent events ---
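When triaging a crash like the one in this first message, it can help to pull the failed assertion and the Ceph version out of the syslog text before searching the tracker. The following is a minimal sketch, not part of the original thread: `summarize_crash` and both regular expressions are hypothetical and are modeled only on the log lines quoted in that message, so other Ceph versions may format these lines differently.

```python
import re

# Hypothetical patterns, derived only from the log excerpt in this thread.
ASSERT_RE = re.compile(
    r"(?P<file>\S+):\s*(?P<line>\d+):\s*FAILED assert\((?P<expr>.*)\)"
)
VERSION_RE = re.compile(
    r"ceph version (?P<version>\S+) \(commit:(?P<commit>[0-9a-f]+)\)"
)


def summarize_crash(log_text):
    """Return the failed assertion and Ceph version found in log_text.

    Keys are only present when the corresponding line was found, so an
    unrelated log yields an empty dict.
    """
    summary = {}
    failed = ASSERT_RE.search(log_text)
    if failed:
        summary["file"] = failed.group("file")
        summary["line"] = int(failed.group("line"))
        summary["assert"] = failed.group("expr")
    version = VERSION_RE.search(log_text)
    if version:
        summary["version"] = version.group("version")
        summary["commit"] = version.group("commit")
    return summary


if __name__ == "__main__":
    sample = (
        "osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))\n"
        " ceph version 0.48.1argonaut "
        "(commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)\n"
    )
    print(summarize_crash(sample))
```

Having the version next to the assertion makes it easier to check whether a tracker issue reported against a different release can actually apply.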
* Re: OSD failure on start
From: Mike Dawson @ 2013-02-13 22:08 UTC
To: Mandell Degerness; +Cc: ceph-devel@vger.kernel.org

Mandell,

A few of us saw a similar failure on 0.56.1.

http://tracker.ceph.com/issues/3770

Sam Just patched the issue for 0.56.2. My understanding is Sam's patch prevents the issue in the future, but doesn't repair a previously damaged OSD.

If you have good replication (or a good backup), I have had luck removing the affected OSD, formatting, and re-adding it. I believe Sam may have a manual process to fix it if you can't wipe this OSD.

Good Luck,
Mike
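For readers taking the remove, format, and re-add route described in the reply above, the cluster-side removal can be sketched as a dry run that only builds the commands. This is an illustration under assumptions, not the procedure anyone in the thread posted: the `ceph` subcommands are the ones given in the general Ceph documentation for removing an OSD and should be checked against the documentation for the installed release, `osd_removal_commands` is a hypothetical helper, and the id 96 is only inferred from the `/mnt/osd96` path in the log. Reformatting the disk and re-adding the OSD are not shown.

```python
def osd_removal_commands(osd_id):
    """Return, in order, the commands that remove one OSD from the cluster.

    Nothing is executed here; the caller decides whether to run them.
    The daemon for this OSD is assumed to be stopped already (in this
    thread it was failing to start).
    """
    name = "osd.%d" % osd_id
    return [
        ["ceph", "osd", "out", str(osd_id)],       # mark the OSD out of the cluster
        ["ceph", "osd", "crush", "remove", name],  # remove it from the CRUSH map
        ["ceph", "auth", "del", name],             # delete its authentication key
        ["ceph", "osd", "rm", str(osd_id)],        # remove the OSD entry itself
    ]


if __name__ == "__main__":
    # 96 is an assumption based on the mount point /mnt/osd96.
    for command in osd_removal_commands(96):
        print(" ".join(command))
```

Printing the commands rather than running them leaves room to confirm that the remaining replicas are healthy before anything destructive happens.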
* Re: OSD failure on start
From: Mandell Degerness @ 2013-02-13 22:47 UTC
To: Mike Dawson; +Cc: ceph-devel@vger.kernel.org

Thanks. I'm glad to hear it is fixed in the new version. Wiping the OSD worked.
* Re: OSD failure on start
From: Samuel Just @ 2013-02-14 2:52 UTC
To: Mandell Degerness; +Cc: Mike Dawson, ceph-devel@vger.kernel.org

Actually, that bug did not exist in 0.48.1, so this must have been something different. Was this the node where you had the trouble with the pg logs?
-Sam