* ceph-objectstore-tool import failures
@ 2015-06-20  2:25 Sage Weil
  2015-06-20  2:38 ` David Zafman
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2015-06-20  2:25 UTC (permalink / raw)
  To: dzafman, ceph-devel

Hey David,

On this run

	/a/sage-2015-06-18_15:51:18-rados-wip-temp---basic-multi/939648

ceph-objectstore-tool is failing to import a pg because the pool doesn't 
exist.  It looks like the thrasher is doing an export+import and racing 
with a test that is tearing down a pool.  The crash is

 ceph version 9.0.1-955-ge274efa (e274efa450e99a68c02bcb713c8837d7809f1ec3)
 1: ceph-objectstore-tool() [0xa26335]
 2: (()+0xfcb0) [0x7f10cef18cb0]
 3: (gsignal()+0x35) [0x7f10cd5af425]
 4: (abort()+0x17b) [0x7f10cd5b2b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f10cdf0269d]
 6: (()+0xb5846) [0x7f10cdf00846]
 7: (()+0xb5873) [0x7f10cdf00873]
 8: (()+0xb596e) [0x7f10cdf0096e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0xb0ce09]
 10: (ObjectStoreTool::get_object(ObjectStore*, coll_t, ceph::buffer::list&, OSDMap&, bool*)+0x143f) [0x64829f]
 11: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool, std::string)+0x13dd) [0x64a62d]
 12: (main()+0x3017) [0x632037]
 13: (__libc_start_main()+0xed) [0x7f10cd59a76d]
 14: ceph-objectstore-tool() [0x639119]

I don't think this is related to my branch.. but maybe?  Have you seen 
this?  I rebased onto latest master yesterday.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in


* Re: ceph-objectstore-tool import failures
  2015-06-20  2:25 ceph-objectstore-tool import failures Sage Weil
@ 2015-06-20  2:38 ` David Zafman
  2015-06-20  3:16   ` David Zafman
  0 siblings, 1 reply; 11+ messages in thread
From: David Zafman @ 2015-06-20  2:38 UTC (permalink / raw)
  To: Sage Weil, ceph-devel


I have not seen this as an assert before.  Given the code below in
do_import() on the master branch, the assert should be impossible (?).

   if (!curmap.have_pg_pool(pgid.pgid.m_pool)) {
     cerr << "Pool " << pgid.pgid.m_pool << " no longer exists" << std::endl;
     // Special exit code for this error, used by test code
     return 10;  // Positive return means exit status
   }
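
For illustration only, here is a rough sketch of how calling test code
might tell this case apart from a real failure.  It relies only on the
"return 10" convention in the snippet above; the enum and the function
name classify_import_exit() are hypothetical and not taken from the
Ceph tree.

```cpp
// Hypothetical constant mirroring the "return 10" in do_import() above.
constexpr int kExitPoolGone = 10;

enum class ImportResult { Ok, PoolGone, Failed };

// Map the tool's exit status to an outcome the test harness can act on.
// A pool that was deleted mid-test is expected and should not fail the run.
ImportResult classify_import_exit(int status) {
  if (status == 0) return ImportResult::Ok;
  if (status == kExitPoolGone) return ImportResult::PoolGone;
  return ImportResult::Failed;
}
```

A crash via assert, as in the trace, would show up as a signal rather
than exit status 10, so a harness along these lines would still flag it.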


David



* Re: ceph-objectstore-tool import failures
  2015-06-20  2:38 ` David Zafman
@ 2015-06-20  3:16   ` David Zafman
  2015-07-06 20:28     ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: David Zafman @ 2015-06-20  3:16 UTC (permalink / raw)
  To: Sage Weil, ceph-devel


This ghobject_t, which has a pool of -3, is part of the export.  It
caused the assert:

Read -3/1c/temp_recovering_1.1c_33'50_39_head/head

This was added by "osd: use per-pool temp poolid for temp objects" 
18eb2a5fea9b0af74a171c3717d1c91766b15f0c in your branch.

You should skip it on export or recreate it on import with special handling.
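
A minimal sketch of the "skip it on export" option, under stated
assumptions: ObjId is a hypothetical stand-in for the pool/name part of
a ghobject_t, and the numbering (meta pool at -1, temp pool for pool p
at -2 - p) is my reading of the per-pool temp poolid change.  It is at
least consistent with the -3 seen here for pool 1.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the pool/name part of a ghobject_t.
struct ObjId {
  int64_t pool;
  std::string name;
};

// Assumption: temp pools start at -2 and count downward, one per pool.
constexpr int64_t kPoolTempStart = -2;

inline int64_t temp_pool_for(int64_t pool) { return kPoolTempStart - pool; }
inline bool is_temp_pool(int64_t pool) { return pool <= kPoolTempStart; }

// Return only the objects an export would write when temp objects are skipped.
std::vector<ObjId> filter_for_export(const std::vector<ObjId>& objs) {
  std::vector<ObjId> out;
  for (const auto& o : objs) {
    if (is_temp_pool(o.pool)) continue;  // drop temp_recovering_* and similar
    out.push_back(o);
  }
  return out;
}
```

The other option, recreating them on import, would need the import side
to map a temp pool id back to its real pool before checking the OSDMap.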

David



* Re: ceph-objectstore-tool import failures
  2015-06-20  3:16   ` David Zafman
@ 2015-07-06 20:28     ` Sage Weil
  2015-07-06 23:32       ` David Zafman
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2015-07-06 20:28 UTC (permalink / raw)
  To: David Zafman, sjust; +Cc: ceph-devel

On Fri, 19 Jun 2015, David Zafman wrote:
> This ghobject_t which has a pool of -3 is part of the export.   This caused
> the assert:
> 
> Read -3/1c/temp_recovering_1.1c_33'50_39_head/head
> 
> This was added by "osd: use per-pool temp poolid for temp objects"
> 18eb2a5fea9b0af74a171c3717d1c91766b15f0c in your branch.
> 
> You should skip it on export or recreate it on import with special handling.

Ah, that makes sense.  I think we should include these temp objects in the 
export, though, and make cot understand that they are part of the pool.  
We moved the "clear temp objects on startup" logic into the OSD, which I 
think will be useful for e.g. multiobject transactions (where we'll want 
some objects that are internal/hidden to persist across peering intervals 
and restarts).

Looking at your wip-temp-zafman, I think the first patch needs to be 
dropped: we should include the temp objects, and I assume the meta one 
too (which has the pg log and other critical pg metadata).

Not sure where to change cot to handle the temp objects though?

Thanks!
sage






* Re: ceph-objectstore-tool import failures
  2015-07-06 20:28     ` Sage Weil
@ 2015-07-06 23:32       ` David Zafman
  2015-07-07 17:00         ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: David Zafman @ 2015-07-06 23:32 UTC (permalink / raw)
  To: Sage Weil, sjust; +Cc: ceph-devel


Why import temp objects when clear_temp_objects() will just remove them on 
osd start-up?

If we need the temp objects for replay purposes, does it matter if a 
split has occurred after the original export happened?

Or can we just import all temporary objects without regard to splits 
and assume that after replay clear_temp_objects() will
clean them up?

David





* Re: ceph-objectstore-tool import failures
  2015-07-06 23:32       ` David Zafman
@ 2015-07-07 17:00         ` Sage Weil
  2015-07-07 17:12           ` Samuel Just
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2015-07-07 17:00 UTC (permalink / raw)
  To: David Zafman; +Cc: sjust, ceph-devel

On Mon, 6 Jul 2015, David Zafman wrote:
> Why import temp objects when clear_temp_objects() will just remove them on osd
> start-up?

For now we could get away with skipping them, but I suspect in the future 
there will be cases where we want to preserve them across restarts (for 
example, when recording multi-object transactions that are not yet 
committed).

> If we need the temp objects for replay purposes, does it matter if a split has
> occurred after the original export happened?

The replay should happen before the export... it's below the ObjectStore 
interface, so I don't think it matters here.  I'm not sure about the split 
implications, though.  Does the export/import have to do a split, or does 
it let the OSD do that after it's imported?

sage



* Re: ceph-objectstore-tool import failures
  2015-07-07 17:00         ` Sage Weil
@ 2015-07-07 17:12           ` Samuel Just
  2015-07-07 17:22             ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Samuel Just @ 2015-07-07 17:12 UTC (permalink / raw)
  To: Sage Weil; +Cc: David Zafman, ceph-devel

If we think we'll want to persist some temp objects later on, it's
probably better to go ahead and export/import them now.

Replay isn't relevant here since it happens at a lower level.  The
ceph-objectstore-tool does do a kind of split during import, since it
needs to be able to handle the case where the pg was split between the
export and the import.  In the event that temp objects need to persist
across intervals, we'll have to solve the problem of splitting the temp
objects in the osd as well as in the objectstore tool -- probably by
creating a class of persistent temp objects with non-fake hashes taken
from the corresponding non-temp object.
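
To make the "kind of split during import" concrete, here is a rough
sketch of deciding whether an object still belongs to the pg being
imported once pg_num has grown.  The helper names are hypothetical, and
the stable-mod mapping is written from memory of how pg seeds are
derived from object hashes, so treat it as an assumption rather than
the tool's actual code.

```cpp
#include <cstdint>

// Smallest (2^n - 1) mask covering pg_num - 1.  Assumes pg_num >= 1.
uint32_t pg_mask(uint32_t pg_num) {
  uint32_t mask = 0;
  while (mask < pg_num - 1) mask = (mask << 1) | 1;
  return mask;
}

// Stable modulo: results stay put as b grows toward the next power of two.
uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  return ((x & bmask) < b) ? (x & bmask) : (x & (bmask >> 1));
}

// True if an object with this hash maps to pg_seed under the current pg_num.
// An import would keep only objects for which this holds.
bool belongs_to_pg(uint32_t hash, uint32_t pg_seed, uint32_t pg_num) {
  return stable_mod(hash, pg_num, pg_mask(pg_num)) == pg_seed;
}
```

For example, an object with hash 0x3c lands in pg seed 0x1c when
pg_num is 32, but after a split to 64 it maps to seed 0x3c, so an
import into 1.1c would have to leave it out.  A temp object with a fake
hash gives this check nothing meaningful to work with.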
-Sam



* Re: ceph-objectstore-tool import failures
  2015-07-07 17:12           ` Samuel Just
@ 2015-07-07 17:22             ` Sage Weil
  2015-07-07 17:34               ` Samuel Just
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2015-07-07 17:22 UTC (permalink / raw)
  To: Samuel Just; +Cc: David Zafman, ceph-devel

On Tue, 7 Jul 2015, Samuel Just wrote:
> If we think we'll want to persist some temp objects later on, probably 
> better to go ahead and export/import them now.
> 
> Replay isn't relevant here since it happens at a lower level.  The 
> ceph_objectstore_tool does do a kind of split during import since it 
> needs to be able to handle the case where the pg was split between the 
> import and the export.  In the event that temp objects need to persist 
> across intervals, we'll have to solve the problem of splitting the temp 
> objects in the osd as well as in the objectstore tool -- probably by 
> creating a class of persistent temp objects with non-fake hashes taken 
> from the corresponding non-temp object.

Yeah.. I suspect the right thing to do is make the temp object hash match 
the eventual target hash.  We can do this now for the temp recovery 
objects (even though they'll be deleted by the OSD).  Presumably the same 
trick will work for recorded transaction objects too, or whatever 
else...

In any case, for now the cot split can just look at hash like it does with 
the non-temp objects and we're good, right?
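
Roughly what I have in mind, as a sketch only: Obj, make_temp_for() and
same_pg() are made-up names, the -2 - pool temp pool numbering is an
assumption, and same_pg() only handles power-of-two pg_num to keep it
short.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified object id: pool, placement hash, name.
struct Obj {
  int64_t pool;
  uint32_t hash;
  std::string name;
};

// Assumption: the temp pool for pool p is numbered -2 - p.
constexpr int64_t kPoolTempStart = -2;

// Build a temp object that carries the hash of its eventual target, so
// any hash-based split logic treats the two identically.
Obj make_temp_for(const Obj& target, const std::string& temp_name) {
  return Obj{kPoolTempStart - target.pool, target.hash, temp_name};
}

// Simplification: valid only when pg_num is a power of two.
bool same_pg(const Obj& a, const Obj& b, uint32_t pg_num) {
  return (a.hash & (pg_num - 1)) == (b.hash & (pg_num - 1));
}
```

With that, the temp object follows its target through any number of
splits without the split code needing a special case for temp pools.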

sage




* Re: ceph-objectstore-tool import failures
  2015-07-07 17:22             ` Sage Weil
@ 2015-07-07 17:34               ` Samuel Just
  2015-07-07 20:56                 ` David Zafman
  0 siblings, 1 reply; 11+ messages in thread
From: Samuel Just @ 2015-07-07 17:34 UTC (permalink / raw)
  To: Sage Weil; +Cc: David Zafman, ceph-devel

In the sense that the osd will still clear them, sure.  I've changed my mind, though: it's probably best not to import or export them for now, and to update the code to handle persistent-temp objects (by looking at the hash) once they exist.  We don't record anything about the in-progress push, so the recovery temp objects at least aren't worth keeping around.
-Sam
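
A rough illustration of handling persistent-temp objects "by looking at the hash". This is a standalone sketch, not code from ceph-objectstore-tool. It assumes PG placement follows the stable-mod rule (modelled here on ceph_stable_mod()); the helper names pg_mask and pg_seed and the hash value used below are made up for the example. If a temp object carries the same hash as its eventual target, both map to the same PG before and after a split.

```cpp
#include <cstdint>

// Stable modulo in the style of ceph_stable_mod(): maps hash x onto [0, b)
// so that raising b splits existing buckets rather than reshuffling everything.
// bmask must be (smallest power of two >= b) - 1.
inline uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  return ((x & bmask) < b) ? (x & bmask) : (x & (bmask >> 1));
}

// Hypothetical helper: smallest 2^n - 1 mask that covers pg_num PGs.
inline uint32_t pg_mask(uint32_t pg_num) {
  uint32_t m = 1;
  while (m < pg_num) m <<= 1;
  return m - 1;
}

// Hypothetical helper: PG seed an object hash falls into for a given pg_num.
inline uint32_t pg_seed(uint32_t hash, uint32_t pg_num) {
  return stable_mod(hash, pg_num, pg_mask(pg_num));
}

// A persistent-temp object that copies its target's hash is placed exactly
// like the target, so a split-aware import needs no special case for it.
inline bool same_pg(uint32_t temp_hash, uint32_t target_hash, uint32_t pg_num) {
  return pg_seed(temp_hash, pg_num) == pg_seed(target_hash, pg_num);
}
```

With a hash of 0x5c, the object sits in PG seed 0x1c at pg_num 32 and 64 and moves to 0x5c at pg_num 128. A temp object sharing that hash moves with it, so an import after a split can place both the same way.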



* Re: ceph-objectstore-tool import failures
  2015-07-07 17:34               ` Samuel Just
@ 2015-07-07 20:56                 ` David Zafman
  2015-07-07 21:10                   ` Samuel Just
  0 siblings, 1 reply; 11+ messages in thread
From: David Zafman @ 2015-07-07 20:56 UTC (permalink / raw)
  To: Samuel Just, Sage Weil; +Cc: ceph-devel


I'm going to skip exporting temp objects in a new wip-temp-zafman 
branch.  Also, once we have persistent-temp objects, we'll probably 
need to enhance object_locator_to_pg() to adjust for negative pool numbers.

David
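
As a sketch of what skipping temp objects on export, and later adjusting for negative pool numbers, might look like: the snippet below is standalone and does not use Ceph's headers. The numbering rule (temp pool = -2 - pool) is an assumption inferred from the -3/1c/... object seen for pg 1.1c earlier in the thread; Obj, exportable() and the other helper names are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed numbering: temp objects of pool P live in pool (-2 - P),
// which matches the pool -3 object reported for pg 1.1c in this thread.
constexpr int64_t kTempStart = -2;

inline bool is_temp_pool(int64_t pool) { return pool <= kTempStart; }
inline int64_t temp_pool_of(int64_t pool) { return kTempStart - pool; }

// Map a temp pool id back to the pool it belongs to; other ids pass through.
inline int64_t real_pool_of(int64_t pool) {
  return is_temp_pool(pool) ? kTempStart - pool : pool;
}

// Hypothetical stand-in for the pool and name parts of a ghobject_t.
struct Obj {
  int64_t pool;
  std::string name;
};

// Export filter: keep only objects that are not in a temp pool.
inline std::vector<Obj> exportable(const std::vector<Obj>& objs) {
  std::vector<Obj> out;
  for (const auto& o : objs) {
    if (!is_temp_pool(o.pool)) out.push_back(o);
  }
  return out;
}
```

A pool lookup such as the have_pg_pool() check in do_import() would then be given real_pool_of(pool) rather than the raw negative id.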




* Re: ceph-objectstore-tool import failures
  2015-07-07 20:56                 ` David Zafman
@ 2015-07-07 21:10                   ` Samuel Just
  0 siblings, 0 replies; 11+ messages in thread
From: Samuel Just @ 2015-07-07 21:10 UTC (permalink / raw)
  To: David Zafman; +Cc: Sage Weil, ceph-devel

Sounds reasonable to me.
-Sam





Thread overview: 11+ messages
2015-06-20  2:25 ceph-objectstore-tool import failures Sage Weil
2015-06-20  2:38 ` David Zafman
2015-06-20  3:16   ` David Zafman
2015-07-06 20:28     ` Sage Weil
2015-07-06 23:32       ` David Zafman
2015-07-07 17:00         ` Sage Weil
2015-07-07 17:12           ` Samuel Just
2015-07-07 17:22             ` Sage Weil
2015-07-07 17:34               ` Samuel Just
2015-07-07 20:56                 ` David Zafman
2015-07-07 21:10                   ` Samuel Just
