* better temp objects
@ 2013-08-29 20:40 Sage Weil
  2013-08-29 22:29 ` Loic Dachary
From: Sage Weil @ 2013-08-29 20:40 UTC
  To: sam.just; +Cc: greg.farnum, ceph-devel

Here's what I'm thinking:

Above the ObjectStore, we clear everything from temp on restart anyway.

We always write bits of a temp object in pieces, and then at the end 
copy/move it into the main collection.  

On replay, we should *only* do that final move/rename if the temp object 
was replayed in its entirety.

So:

- clear out temp collections in the filestore on startup.

- give temp objects unique names so that they don't collide with non-temp 
object fd caching (or whatever else).  For the DBObjectMap part there is 
probably some futzing needed to make this work right.

- add a new 'move_from_temp' type operation that renames an object from a 
temp (coll_t::is_temp()) collection to a non-temp one.  It will succeed iff 
the temp source exists.

- all operations that write to temp objects fail if the object doesn't 
already exist, except an explicit 'create' op

- all transactions the OSDs generate that write to a temp object start with 
that explicit create.

The combination of these things means that we will only have a temp source 
for the move_from_temp op if it is complete.  Which I think means we can 
avoid any of the fsync guard stuff entirely.
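
To make the argument concrete, here is a minimal sketch of the rules above 
as a toy in-memory model in Python.  Everything in it is made up for 
illustration (ToyStore, the op tuples, the object names); it is not the 
ObjectStore interface, and it assumes 'create' simply makes an empty temp 
object.

```python
class ToyStore:
    """Toy in-memory model of the proposed rules; not the real FileStore."""

    def __init__(self):
        self.temp = {}  # temp collection: name -> bytearray
        self.main = {}  # non-temp collection: name -> bytes

    def clear_temp(self):
        # Temp collections are wiped on startup.
        self.temp.clear()

    def apply(self, op):
        """Apply one op tuple; return True on success, False on failure."""
        kind = op[0]
        if kind == "create":
            # Assumption: create (re)initializes an empty temp object.
            self.temp[op[1]] = bytearray()
            return True
        if kind == "write":
            _, name, data = op
            if name not in self.temp:
                return False  # writes fail unless the object already exists
            self.temp[name] += data
            return True
        if kind == "move_from_temp":
            _, src, dst = op
            if src not in self.temp:
                return False  # succeeds iff the temp source exists
            self.main[dst] = bytes(self.temp.pop(src))
            return True
        raise ValueError(f"unknown op: {kind!r}")

    def replay(self, journal):
        """Simulate a restart: clear temp, then replay the given ops."""
        self.clear_temp()
        return [self.apply(op) for op in journal]


journal = [
    ("create", "tmp.1"),
    ("write", "tmp.1", b"aa"),
    ("write", "tmp.1", b"bb"),
    ("move_from_temp", "tmp.1", "obj"),
]

# Replay covers the temp object in its entirety: the move goes through.
full = ToyStore()
full_results = full.replay(journal)

# Replay starts partway through (the create is no longer in the journal):
# temp was cleared, so the remaining write and the move both fail.
partial = ToyStore()
partial_results = partial.replay(journal[2:])
```

In the full case the object lands in the main collection with contents 
b"aabb"; in the partial case nothing is moved, and no fsync guard was 
needed to get there.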

The DBObjectMap I'm very fuzzy on, so I suspect that's where the tricky 
part will be.  Maybe the temp object name includes the intended hobject_t 
in it somewhere, or something, so that the rename can be reflected 
in leveldb at the end.
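
One way that naming idea could look, as a rough Python sketch.  The name 
format, the "__for__" separator, the plain string standing in for 
hobject_t, and the dict standing in for leveldb are all assumptions for 
illustration, not how DBObjectMap actually stores things.

```python
import itertools

_counter = itertools.count()  # source of unique suffixes (hypothetical)
SEP = "__for__"               # hypothetical separator, not a Ceph convention


def make_temp_name(target):
    """Build a unique temp name that carries the intended target name."""
    return f"temp_{next(_counter)}{SEP}{target}"


def target_of(temp_name):
    """Recover the intended target name from a temp name."""
    _, _, target = temp_name.partition(SEP)
    return target


def rename_keys(kv, temp_name):
    """Re-key every (object, key) entry of temp_name to its target."""
    target = target_of(temp_name)
    for obj, key in [k for k in kv if k[0] == temp_name]:
        kv[(target, key)] = kv.pop((obj, key))
    return target


kv = {}                           # stand-in for the leveldb key space
t = make_temp_name("obj_a")
kv[(t, "attr1")] = b"x"           # written while the object is still temp
kv[("other", "k")] = b"y"         # unrelated object, must stay untouched
moved_to = rename_keys(kv, t)     # done as part of move_from_temp
```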

Thoughts?  Maybe we can do a quick hangout this afternoon to make sure 
this will work before I start putting it together...

sage
