Re: [PATCH] libceph: change how "safe" callback is used

From: Alex Elder <elder@inktank.com>
To: Sage Weil <sage@inktank.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>,
	"Yan, Zheng" <zheng.z.yan@intel.com>
Subject: Re: [PATCH] libceph: change how "safe" callback is used
Date: Wed, 17 Apr 2013 12:28:39 -0500
Message-ID: <516EDBC7.6080008@inktank.com>
In-Reply-To: <alpine.DEB.2.00.1304170830480.28750@cobra.newdream.net>

On 04/17/2013 10:35 AM, Sage Weil wrote:
>>> This might not get called if we initially map to no osd... and later, if
>>> the request gets kicked, it isn't called then.
>>
>> You're right, we need to hook into the place a request gets
>> actually sent for the first time.  That may require a small
>> rearrangement of code to make that point accessible.
> I think this is slightly misinterpreting the purpose of the 'unsafe' list.  
> That list is used for fsync(fd) so that we wait for any pending requests.  
> That is really writes from the perspective of the syscall/user/vfs, not 
> the backend writes to the osd.  If we have a write that is not sent yet 
> because the osd is down and fsync that fd, we still want to wait until 
> some osd comes online and the write actually happens.  The 'unsafe' start 
> is when ceph_sync_write() is called... not when the request is sent 
> to the backend.

I'd argue that the notion of a request being unsafe is an osd
request concept--more general than just this one file system
case.  It is useful to know that we've issued something that
might change osd state, and then to know the state change is
durable.  That aligns well with what is needed here:  you want
to wait until you know the write request is durable (even if
you don't care whether the request has been sent or not).

From the perspective of the caller of ceph_sync_write(), it
doesn't matter that it's the osd deciding when it's safe or
not.  ceph_sync_write() waits for the osd request to complete
before returning.  Once it returns, the unsafe period is
over (and may have never even existed if there was an error).
If sync_write_wait() should wait for something beyond just
the durability of writes then it should be waiting for something
different than osd request completions.

I hope I'm not off base here.

My second version does the "request is unsafe" callback
exactly once, when it actually gets sent the first time
rather than when the request gets started.
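
As a rough illustration of that behavior, here is a user-space
sketch.  It is not the code from the patch, and every name in it
(sketch_request, sketch_send(), sketch_complete(), and so on) is
made up for the example.  It assumes a request may be submitted
before any osd is mapped, may be re-sent ("kicked") later, and
should report "unsafe" once on the first real send and "safe"
once on completion or abort.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an osd request; NOT struct ceph_osd_request. */
struct sketch_request {
	int  osd;            /* -1 means "no osd mapped yet" (assumption) */
	bool sent_once;      /* set when the request is first really sent */
	bool completed;      /* set when the request finishes or is aborted */
	void (*unsafe_cb)(struct sketch_request *req, bool unsafe);
	int  unsafe_calls;   /* bookkeeping for the example only */
	int  safe_calls;
};

/* Example callback: just count the unsafe/safe transitions. */
static void sketch_count_cb(struct sketch_request *req, bool unsafe)
{
	if (unsafe)
		req->unsafe_calls++;
	else
		req->safe_calls++;
}

/*
 * Called on initial submission and again whenever the request is
 * kicked.  Returns true if the request was handed to an osd.
 */
static bool sketch_send(struct sketch_request *req)
{
	if (req->osd < 0)
		return false;    /* nothing to send to yet; stay queued */

	if (!req->sent_once) {
		req->sent_once = true;
		if (req->unsafe_cb)
			req->unsafe_cb(req, true);   /* first send only */
	}

	/* ... the actual transmission would happen here ... */
	return true;
}

/* Called when the osd reports the write durable, or on abort. */
static void sketch_complete(struct sketch_request *req)
{
	if (req->completed)
		return;
	req->completed = true;

	/* Only report "safe" if "unsafe" was ever reported. */
	if (req->sent_once && req->unsafe_cb)
		req->unsafe_cb(req, false);
}
```

With sketch_count_cb installed, a request that initially maps to
no osd triggers no callback; once an osd is assigned, any number
of sketch_send() calls yields a single "unsafe" call, and
sketch_complete() yields a single "safe" call.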

					-Alex

> That being the case, it will become unsafe exactly once, and safe again 
> exactly once (on success, or abort).

Thread overview: 7+ messages
2013-04-15 16:20 [PATCH] libceph: change how "safe" callback is used Alex Elder
2013-04-17  3:26 ` Sage Weil
2013-04-17 13:32   ` Alex Elder
2013-04-17 15:35     ` Sage Weil
2013-04-17 17:28       ` Alex Elder [this message]
2013-04-17 17:41         ` Sage Weil
2013-04-17 13:32 ` [PATCH, v2] " Alex Elder