From: "Matt W. Benjamin" <matt@linuxbox.com>
To: Sage Weil <sage@inktank.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>,
	Gregory Farnum <greg@inktank.com>
Subject: Re: ceph caps (Ganesha + Ceph pnfs)
Date: Tue, 8 Jan 2013 12:11:48 -0500 (EST)
Message-ID: <546032248.22.1357665108393.JavaMail.root@thunderbeast.private.linuxbox.com>
In-Reply-To: <1538446321.14.1357663643915.JavaMail.root@thunderbeast.private.linuxbox.com>

Hi Sage,

----- "Sage Weil" <sage@inktank.com> wrote:

> Your previous question made it sound like the DS was interacting with
> libcephfs and dealing with (some) MDS capabilities.  Is that right?
>
> I wonder if a much simpler approach would be to make a different fh
> format or type, and just cram the inode and ceph object/block number in
> there.  Then the DS can just go direct to rados and avoid interacting
> with the fs at all.  There are some additional semantics surrounding the
> truncate metadata, but if we're lucky that can fit inside the fh, and the
> DS servers could really just act like object targets--no libcephfs or
> MDS interaction at all.

The current architecture gets the inode and block information to the DS 
reliably already without change to the Ceph fh--decoding steering information
happens at the MDS, rather than the DS.  It is important to us to ensure that
the total steering information be "finite and manageable," though, since
we need it to travel with the pNFS layout to the NFS client.
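
The sketch below shows what "finite and manageable" steering information could look like as a fixed-size DS handle. It follows Sage's suggestion and carries the inode number, the object number and a truncate sequence number. Everything about it is hypothetical: the struct, the field set and the 20-byte little-endian wire format are illustrative assumptions, not Ceph's or Ganesha's actual handle format. The one fixed point is the NFSv4 limit of 128 bytes (NFS4_FHSIZE) per file handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DS file handle; not the real Ceph or Ganesha format. */
#define DS_FH_LEN 20 /* 8 + 8 + 4 bytes, well under NFS4_FHSIZE (128) */

struct ds_fh {
    uint64_t ino;       /* Ceph inode number */
    uint64_t objectno;  /* RADOS object index within the file */
    uint32_t trunc_seq; /* truncate sequence number (assumed to fit here) */
};

/* Store the low n bytes of v at p in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Read n little-endian bytes from p. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Encode fh into buf, which must hold DS_FH_LEN bytes. */
void ds_fh_encode(const struct ds_fh *fh, uint8_t *buf)
{
    put_le(buf, fh->ino, 8);
    put_le(buf + 8, fh->objectno, 8);
    put_le(buf + 16, fh->trunc_seq, 4);
}

/* Decode a handle; returns 0 on success, -1 if the length is wrong. */
int ds_fh_decode(const uint8_t *buf, size_t len, struct ds_fh *fh)
{
    if (len != DS_FH_LEN)
        return -1;
    fh->ino = get_le(buf, 8);
    fh->objectno = get_le(buf + 8, 8);
    fh->trunc_seq = (uint32_t)get_le(buf + 16, 4);
    return 0;
}
```

A fixed, explicit byte order keeps the handle opaque to the NFS client while letting any DS decode it without consulting the MDS.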

It is definitely the goal for the DS to go direct to rados.  I think the
outstanding issue may be limited to getting the MDS view of metadata up-to-date
after an extending or truncating i/o completes (at least in the immediate
term).
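
To go direct to RADOS, a DS has to turn a file offset into a RADOS object and an offset within that object. The sketch below applies Ceph's file striping calculation for that step, under two assumptions:

- The layout's stripe_unit, stripe_count and object_size are available to the DS, for example by travelling with the pNFS layout.
- The usual "<ino in hex>.<objectno as 8 hex digits>" object naming applies.

The struct and function names are illustrative only; they are not libcephfs or librados calls.

```c
#include <stdint.h>
#include <stdio.h>

/* Striping parameters of a Ceph file layout (illustrative struct). */
struct file_layout {
    uint64_t stripe_unit;  /* bytes per stripe unit */
    uint64_t stripe_count; /* number of objects striped across */
    uint64_t object_size;  /* bytes per object; a multiple of stripe_unit */
};

/* Map file offset off to an object number, the offset inside that object,
 * and the bytes remaining in the current stripe unit. */
void file_to_object(const struct file_layout *l, uint64_t off,
                    uint64_t *objectno, uint64_t *obj_off, uint64_t *su_left)
{
    uint64_t su = l->stripe_unit, sc = l->stripe_count;
    uint64_t stripes_per_object = l->object_size / su;

    uint64_t blockno = off / su;       /* stripe unit index within the file */
    uint64_t blockoff = off % su;      /* offset inside that stripe unit */
    uint64_t stripeno = blockno / sc;  /* stripe (row) index */
    uint64_t stripepos = blockno % sc; /* object within the object set */
    uint64_t objsetno = stripeno / stripes_per_object;

    *objectno = objsetno * sc + stripepos;
    *obj_off = (stripeno % stripes_per_object) * su + blockoff;
    *su_left = su - blockoff;
}

/* Format the RADOS object name, e.g. "10000000000.00000002". */
int object_name(char *buf, size_t len, uint64_t ino, uint64_t objectno)
{
    return snprintf(buf, len, "%llx.%08llx",
                    (unsigned long long)ino, (unsigned long long)objectno);
}
```

With the default layout (4 MiB stripe unit and object size, stripe count 1), offset 10 MiB lands 2 MiB into object 2. An I/O crossing a stripe unit boundary would be split at su_left bytes.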

You may well be thinking, "sheesh, the client is doing out-of-band i/o, why
doesn't it send the LAYOUTCOMMIT operation to the MDS to update the
metadata?"  The unsatisfactory answer is that currently (due to our use of
the "files" layout type) clients can insist that the DS do the commit.  The
Linux kernel client does so for writes below a size threshold.
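
For an extending write, the metadata update that a commit carries is essentially a size high-water mark. In NFSv4.1 LAYOUTCOMMIT, the client reports the offset of the last byte written, and the MDS may grow the file size to that offset plus one. The sketch below applies that rule regardless of whether the client or the DS issues the commit. The types and function are hypothetical and only loosely modelled on the newoffset4/newsize4 fields. Truncation is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Loosely modelled on NFSv4.1 newoffset4 (a LAYOUTCOMMIT argument). */
struct last_write {
    bool has_offset; /* no_newoffset */
    uint64_t offset; /* no_offset: offset of the last byte written */
};

/* Loosely modelled on newsize4 (the LAYOUTCOMMIT result). */
struct size_result {
    bool size_changed;
    uint64_t size;
};

/* Apply a commit to the MDS view of the file size; never shrinks it. */
struct size_result commit_size(uint64_t cur_size, struct last_write lw)
{
    struct size_result r = { false, cur_size };
    if (lw.has_offset && lw.offset >= cur_size) {
        r.size = lw.offset + 1;
        r.size_changed = true;
    }
    return r;
}
```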

For the longer term, an option is shaping up that would allow us to use the
objects layout (RFC 5664), which always commits layouts.  Frankly, this
discussion adds to the argument for switching.  My intuition, though, is
that it's preferable to let the DS jump layers to commit, even if we want to
elide such commits in the future (not just for expediency, but because the
flexibility to do so seems like a win for the Ceph architecture).

>
> Either way, to your first (original) question: yes, we should expose a
> way via libcephfs to take a reference on the capability that isn't
> released until the layout is committed.  That should be pretty
> straightforward to do, I think.

Excellent.
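
One way to picture the capability reference described above works in three steps:

1. Granting a layout pins the capability.
2. An MDS revoke that arrives while layouts are outstanding is deferred.
3. The release completes when the last layout is committed.

The struct and functions below are hypothetical, not libcephfs calls. They are single-threaded for clarity; a real implementation would need locking around the counters.

```c
#include <stdbool.h>

/* Hypothetical pin on an MDS capability; not libcephfs API.
 * Not thread-safe: real code would protect these fields with a lock. */
struct cap_pin {
    int layout_refs;        /* outstanding pNFS layouts relying on the cap */
    bool release_requested; /* the MDS has asked for the cap back */
    bool released;          /* the cap has actually been returned */
};

/* Granting a layout takes a reference on the cap. */
void cap_get_for_layout(struct cap_pin *c)
{
    c->layout_refs++;
}

/* MDS revoke: release immediately if nothing is pinned, else defer. */
void cap_revoke(struct cap_pin *c)
{
    c->release_requested = true;
    if (c->layout_refs == 0)
        c->released = true;
}

/* Layout committed: drop the reference; finish a deferred release on the last put. */
void cap_put_after_commit(struct cap_pin *c)
{
    if (--c->layout_refs == 0 && c->release_requested)
        c->released = true;
}
```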

> 
> Hopefully my understanding is getting closer!
> 
> :) sage
> 

Indeed, thanks

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
