From: Casey Bodley <casey@linuxbox.com>
To: Gregory Farnum <greg@inktank.com>
Cc: "Matt W. Benjamin" <matt@linuxbox.com>,
ceph-devel@vger.kernel.org, aemerson <aemerson@linuxbox.com>,
peter honeyman <peter.honeyman@gmail.com>,
Sage Weil <sage@inktank.com>
Subject: Re: parent xattrs on file objects
Date: Wed, 17 Oct 2012 17:51:24 -0400 (EDT)
Message-ID: <1471957339.149.1350510684603.JavaMail.root@thunderbeast.private.linuxbox.com>
In-Reply-To: <937776470.145.1350510476081.JavaMail.root@thunderbeast.private.linuxbox.com>
Hi Greg,
In the case where an inode is created on mds.a and exported to mds.b, there is a potential race on mds.b between a subsequent lookup-by-ino and the primary link actually making it into the inode container.
Our tentative solution is to rely on the way InoTable partitions the range of inode numbers by mds nodeid. When a lookup on the inode container fails, we can determine which mds would have allocated that inode number and attempt to find the inode there. The originating mds.a should always find the inode in its cache while it's pinned for export. If mds.a finds the inode, the lookup-by-ino on mds.b waits for the import to finish; otherwise it returns failure.
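As a rough illustration of the fallback described above (a sketch only, not the actual MDS code): the Python below assumes each mds rank owns one contiguous, fixed-width range of inode numbers. The 40-bit range width, the names allocating_rank, lookup_by_ino and LookupResult, and the use of plain sets to stand in for the inode container and the per-mds caches are all hypothetical choices made for the example.

```python
from enum import Enum
from typing import Callable, Optional, Set

# Assumption for illustration: each mds rank owns one contiguous range of
# 2**RANGE_BITS inode numbers, and the lowest range is reserved.
RANGE_BITS = 40


def allocating_rank(ino: int) -> Optional[int]:
    """Return the mds rank whose range contains ino, or None if reserved."""
    slot = ino >> RANGE_BITS
    return slot - 1 if slot >= 1 else None


class LookupResult(Enum):
    FOUND = "found"                      # primary link is in the inode container
    WAIT_FOR_IMPORT = "wait_for_import"  # origin still has it pinned for export
    NOT_FOUND = "not_found"              # no such inode


def lookup_by_ino(
    ino: int,
    container: Set[int],
    origin_cache_of: Callable[[int], Set[int]],
) -> LookupResult:
    """Look up ino in the container, falling back to the allocating mds."""
    if ino in container:
        return LookupResult.FOUND
    rank = allocating_rank(ino)
    if rank is None:
        return LookupResult.NOT_FOUND
    # Ask the mds that would have allocated this number (hypothetical hook).
    if ino in origin_cache_of(rank):
        return LookupResult.WAIT_FOR_IMPORT
    return LookupResult.NOT_FOUND
```

Here a miss on the container followed by a hit in the originating cache corresponds to the "wait for the import to finish" case, and a miss in both places corresponds to failure.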
Casey
----- Original Message -----
From: "Gregory Farnum" <greg@inktank.com>
To: "Casey Bodley" <casey@linuxbox.com>
Cc: "Matt W. Benjamin" <matt@linuxbox.com>, ceph-devel@vger.kernel.org, "aemerson" <aemerson@linuxbox.com>, "peter honeyman" <peter.honeyman@gmail.com>, "Sage Weil" <sage@inktank.com>
Sent: Wednesday, October 17, 2012 4:18:04 PM
Subject: Re: parent xattrs on file objects
On Wed, Oct 17, 2012 at 12:40 PM, Casey Bodley <casey@linuxbox.com> wrote:
> To expand on what Matt said, we're also trying to address this issue of lookups by inode number for use with NFS.
>
> The design we've been exploring is to create a single system inode, designated the 'inode container' directory, which stores the primary links to all inodes in the filesystem. These links are named by their inode number to satisfy lookups and obviate the need for an anchor table. This design allows the inode container to make use of existing directory fragmentation and load balancing to distribute the inodes over the MDS cluster.
>
> When a new file is created, it then adds two links: a primary link into the inode container, and a remote link into the filesystem namespace. In the case where the parent directory fragment's authority is different than the corresponding inode container fragment's, it is created in the parent directory then exported to the inode container via an asynchronous slave request.
>
> We welcome additional discussion, both on this design specifically and on the general topic of scalable ino lookups.
So if the primary link isn't always in the "inode container", you must
be preserving the anchor table for this setup. Am I understanding that
correctly? Or is there some other mechanism for linking them that's
less expensive?
-Greg