cephfs quotas

* cephfs quotas
@ 2017-10-18 10:11 Jan Fajerski
  2017-10-18 11:27 ` John Spray
  2017-12-11 16:52 ` Luis Henriques
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Fajerski @ 2017-10-18 10:11 UTC (permalink / raw)
  To: ceph-devel; +Cc: Luis Henriques, Sage Weil, John Spray, Patrick Donnelly

Hi list,
A while ago this list saw a little discussion about quota support for the cephfs 
kernel client. The result was that instead of adding kernel support for the 
current implementation, a new quota implementation would be the preferred 
solution. Here we would like to propose such an implementation.

The objective is to implement quotas in a way that scales well, can be 
implemented in ceph-fuse, the kernel client and libcephfs-based clients, and is 
enforceable without relying on client cooperation. The last requirement 
suggests that ceph daemon(s) must be involved in checking quota limits. We think 
that an approach as described in "Quota Enforcement for High-Performance 
Distributed Storage Systems" by Pollack et al. 
(https://www.ssrc.ucsc.edu/pub/pollack07-msst.html) can provide a good blueprint 
for such an implementation. This approach enforces quota limits with the help of 
vouchers. At a very high level this system works by one or more quota servers 
(in our case MDSs) issuing vouchers carrying (among other things) an expiration 
timestamp, an amount, a uid and a (cryptographic) signature to clients. An MDS 
can track how much space it has given out by tracking the vouchers it issues. A 
client can spend these vouchers on OSDs by sending them along with a write 
request. The OSD can verify that a voucher is valid by checking its signature. It 
deducts the amount of written data from the voucher and may return the voucher 
if it was not used up in full. The client can then spend the remaining amount 
or give it back to the MDS. Client failures and misbehaving clients are 
handled through a periodic reconciliation phase in which the MDSs and OSDs 
reconcile issued and used vouchers. Vouchers held by a failed client can be 
detected by the expiration timestamp attached to them. Any unused and 
invalid vouchers can be reclaimed by an MDS. Clients that try to cheat by 
spending the same voucher on multiple OSDs are detected by the uid of the 
voucher. This means that adversarial clients can exceed the quota, but will be 
caught within a limited time period. The signature ensures that clients cannot 
fabricate valid vouchers. For a much better and much more detailed description 
please refer to the paper.
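
To make the moving parts concrete, here is a minimal Python sketch of the voucher lifecycle described above (issue, verify, spend). It is an illustration only, not code from the paper or from Ceph: the HMAC-based signature, the shared key `SECRET`, the field name `voucher_id` (one possible reading of the "uid" above) and the helpers `issue`, `verify` and `spend` are all assumptions.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass

# Hypothetical key shared by MDSs and OSDs; key distribution is out of scope.
SECRET = b"mds-osd-shared-key"


@dataclass(frozen=True)
class Voucher:
    voucher_id: int   # assumed unique per voucher, used to spot double spending
    amount: int       # bytes the holder may still write
    expires: float    # expiration timestamp, seconds since the epoch
    signature: str    # hex HMAC over the three fields above


def _sign(voucher_id, amount, expires):
    msg = f"{voucher_id}:{amount}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def issue(voucher_id, amount, ttl, now=None):
    """MDS side: hand out a signed voucher valid for `ttl` seconds."""
    now = time.time() if now is None else now
    expires = now + ttl
    return Voucher(voucher_id, amount, expires, _sign(voucher_id, amount, expires))


def verify(v, now=None):
    """OSD side: the signature must match and the voucher must not have expired."""
    now = time.time() if now is None else now
    expected = _sign(v.voucher_id, v.amount, v.expires)
    return hmac.compare_digest(expected, v.signature) and now < v.expires


def spend(v, nbytes, now=None):
    """OSD side: deduct a write; return a re-signed leftover voucher or None."""
    if not verify(v, now) or nbytes > v.amount:
        raise PermissionError("voucher rejected")
    remaining = v.amount - nbytes
    if remaining == 0:
        return None
    return Voucher(v.voucher_id, remaining, v.expires,
                   _sign(v.voucher_id, remaining, v.expires))
```

The sketch leaves out the reconciliation phase: detecting a voucher spent on several OSDs would require the OSDs to report the `voucher_id` values they have seen back to the MDS.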

This approach has been implemented in Ceph before, as described here: 
http://drona.csa.iisc.ernet.in/~gopi/docs/amarnath-MSc.pdf. However, we could not 
find the source code for this, and it seemingly didn't find its way into the 
current code base.
The virtues of a protocol like this are that it can scale well, since there is 
no central entity that keeps a global state of the quotas, while still being 
able to enforce (somewhat) hard quotas.
On the downside there is a protocol overhead that impacts performance. Research 
and reports on implementations suggest that this overhead can be kept fairly 
small though (2% performance penalty or less). Furthermore additional state must 
be kept on MDSs, OSDs and clients. Such a solution also adds considerable 
complexity to all involved components.

We'd like to hear criticism and comments from the community, before a more 
in-depth CDM discussion.

Best,
Luis and Jan

* Re: cephfs quotas
  2017-10-18 10:11 cephfs quotas Jan Fajerski
@ 2017-10-18 11:27 ` John Spray
  2017-10-18 12:32   ` Jan Fajerski
                     ` (2 more replies)
  2017-12-11 16:52 ` Luis Henriques
  1 sibling, 3 replies; 14+ messages in thread
From: John Spray @ 2017-10-18 11:27 UTC (permalink / raw)
  To: ceph-devel, Luis Henriques, Sage Weil, John Spray,
	Patrick Donnelly

On Wed, Oct 18, 2017 at 11:11 AM, Jan Fajerski <jfajerski@suse.com> wrote:
<snip>
> We'd like to hear criticism and comments from the community, before a more
> in-depth CDM discussion.

Interesting!

My immediate thoughts:
 - The key element for implementing kclient support is a
mechanism whereby the clients do not have to backwards-traverse from a
file to find the nearest ancestor with a quota set.  I think that if
implementing a voucher-based approach, you'd still have to do this
work in addition to implementing the voucher system (the vouchers
would basically be the security layer on top of the refactor of
quotas)
 - The simple voucher approach is not sufficient for doing efficient
quotas on arbitrary ancestor directories: the OSD doesn't know what
directory a file is in, so how can it know whether a particular
voucher is valid for writes to a particular file?  The hack to make it
work would be to issue vouchers individually for each inode, but then
clients can overshoot their quota very far by opening many files at
once.
- In the reconciliation phase, the awkward part would be calculating
the actual size of the data in the quota-enforced directory, as the
vouchers could have been used for either overwrites or appends.  The
OSD voucher refunds would have to do something like tracking the
highest offset written in the file, and they would need passing back
up to the MDS so that it could accurately update its statistics about
the directory, perhaps.
- From reading the PDF link, it seems like they are not implementing
directory quotas, but per-client (or group of client) quotas.
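
For readers unfamiliar with the backwards traversal mentioned in the first point, a rough Python sketch follows. The `Inode` class, its `max_bytes` field and the `quota_realm` helper are hypothetical names for illustration, not the Ceph client's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inode:
    name: str
    parent: Optional["Inode"] = None
    max_bytes: int = 0  # assumption: 0 means no quota is set on this inode


def quota_realm(inode):
    """Walk up from `inode` to the nearest ancestor that has a quota set."""
    node = inode
    while node is not None:
        if node.max_bytes > 0:
            return node
        node = node.parent
    return None  # no quota applies anywhere on the path to the root
```

Each lookup costs one step per directory level, and the client needs every ancestor in its cache to do it, which is the work a "quota realm" mechanism would avoid.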

I imagine that implementing directory quotas in a secure way would
require a more complex scheme, where the client would have to be able
to prove to the OSD which "quota realm" (i.e. ancestor dir with a
quota set) a particular inode belonged to.  You could potentially
issue such a token when granting write caps on a file: for files that
the client is allowed to write, it would get a signed token from the
MDS saying that the client may write, and also saying which quota
realm the file is in.  Then, the client would send that in addition to
a quota voucher for that particular realm, and the OSD would look at
both the token and the voucher.
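
A sketch of what that two-part check might look like on the OSD, in Python. Everything here is assumed for illustration: the dict layouts, the shared HMAC key `KEY`, and the helpers `make_write_token`, `make_voucher` and `osd_accepts` are not existing Ceph interfaces.

```python
import hashlib
import hmac

KEY = b"mds-osd-shared-key"  # hypothetical key shared by MDS and OSDs


def sign(*fields):
    msg = ":".join(str(f) for f in fields).encode()
    return hmac.new(KEY, msg, hashlib.sha256).hexdigest()


def make_write_token(client, ino, realm):
    """MDS side: issued together with write caps; binds an inode to a realm."""
    return {"client": client, "ino": ino, "realm": realm,
            "sig": sign("token", client, ino, realm)}


def make_voucher(realm, amount):
    """MDS side: quota voucher for one quota realm."""
    return {"realm": realm, "amount": amount,
            "sig": sign("voucher", realm, amount)}


def osd_accepts(token, voucher, ino, nbytes):
    """OSD side: both signatures must verify and both must name the same realm."""
    token_ok = hmac.compare_digest(
        token["sig"],
        sign("token", token["client"], token["ino"], token["realm"]))
    voucher_ok = hmac.compare_digest(
        voucher["sig"], sign("voucher", voucher["realm"], voucher["amount"]))
    return (token_ok and voucher_ok
            and token["ino"] == ino
            and token["realm"] == voucher["realm"]
            and nbytes <= voucher["amount"])
```

The OSD never needs to know the directory hierarchy in this scheme; it only compares the realm named in the token with the realm named in the voucher.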

This is related to ideas about doing broader OSD-side enforcement of
e.g. permissions: the MDS could issue tokens that said exactly what
the client is allowed to do with specific inodes, rather than clients
having free rein over everything in the data pool.

It would be ideal to find a design that decouples the security
enforcement aspect from the overall protocol aspect as much as
possible.  That way we could have an initial implementation that adds
quota support to the kernel client (introducing quota realm concept
but not actually passing tokens around), then work on the optional
crypto enforcement piece separately.

John





* Re: cephfs quotas
  2017-10-18 11:27 ` John Spray
@ 2017-10-18 12:32   ` Jan Fajerski
  2017-10-19 11:08     ` Luis Henriques
  2017-10-18 21:44   ` Gregory Farnum
  2017-10-19 14:28   ` Jan Fajerski
  2 siblings, 1 reply; 14+ messages in thread
From: Jan Fajerski @ 2017-10-18 12:32 UTC (permalink / raw)
  To: John Spray; +Cc: ceph-devel, Luis Henriques, Sage Weil, Patrick Donnelly

On Wed, Oct 18, 2017 at 12:27:18PM +0100, John Spray wrote:
>On Wed, Oct 18, 2017 at 11:11 AM, Jan Fajerski <jfajerski@suse.com> wrote:
<snip>
>> We'd like to hear criticism and comments from the community, before a more
>> in-depth CDM discussion.
>
>Interesting!
>
>My immediate thoughts:
> - The key element for implement kclient support is to implement a
>mechanism whereby the clients do not have to backwards-traverse from a
>file to find the nearest ancestor with a quota set.  I think that if
>implementing a voucher-based approach, you'd still have to do this
>work in addition to implementing the voucher system (the vouchers
>would basically be the security layer on top of the refactor of
>quotas)
> - The simple voucher approach is not sufficient for doing efficient
>quotas on arbitrary ancestor directories: the OSD doesn't know what
>directory a file is in, so how can it know whether a particular
>voucher is valid for writes to a particular file?  The hack to make it
>work would be to issue vouchers individually for each inode, but then
>clients can overshoot their quota very far by opening many files at
>once.
The idea is that the MDS does the traversing before issuing a voucher. I 
certainly oversimplified the voucher description. In the paper a voucher 
carries a user id to tie the voucher to a set quota. In Ceph's current quota 
scheme this would have to be a "quota realm" (as named below). I hadn't yet 
thought about how an OSD can verify that the voucher can be spent on this 
particular piece of data.
>- In the reconciliation phase, the awkward part would be calculating
>the actual size of the data in the quota-enforced directory, as the
>vouchers could have been used for either overwrites or appends.  The
>OSD voucher refunds would have to do something like tracking the
>highest offset written in the file, and they would need passing back
>up to the MDS so that it could accurately update its statistics about
>the directory, perhaps.
Can the OSD not determine how much of the voucher was used, i.e. 
overwrite vs. append? And yes, ideally a client hands back unused vouchers. 
Otherwise the MDS can reclaim them after they have timed out (say, in case of a 
crashed client).
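
As a rough illustration of the overwrite vs. append distinction, the Python sketch below charges a voucher only for the bytes by which a single object grows. The helpers `net_growth` and `charge_write` are hypothetical, and the model is simplified: it ignores holes and truncation, and a file striped over many objects would need the per-object results combined, which is the part that has to travel back to the MDS.

```python
def net_growth(size, offset, length):
    """Bytes by which an object of `size` bytes grows for a write of
    `length` bytes (assumed > 0) at `offset`; a pure overwrite costs 0."""
    return max(0, offset + length - size)


def charge_write(voucher_amount, size, offset, length):
    """Return (remaining voucher amount, new object size) after the write."""
    growth = net_growth(size, offset, length)
    if growth > voucher_amount:
        raise ValueError("voucher too small for this write")
    return voucher_amount - growth, max(size, offset + length)
```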
>- From reading the PDF link, it seems like they are not implementing
>directory quotas, but per-client (or group of client) quotas.
>
>I imagine that implementing directory quotas in a secure way would
>require a more complex scheme, where the client would have to be able
>to prove to the OSD which "quota realm" (i.e. ancestor dir with a
>quota set) a particular inode belonged to.  You could potentially
>issue such a token when granting write caps on a file: for files that
>the client is allowed to write, it would get a signed token from the
>MDS saying that the client may write, and also saying which quota
>realm the file is in.  Then, the client would send that in addition to
>a quota voucher for that particular realm, and the OSD would look at
>both the token and the voucher.
I had just assumed such a token would be part of the voucher. But essentially 
what you describe here is what we had in mind. My lack of Ceph knowledge 
probably hindered a more sensible description.
>
>This is related to ideas about doing broader OSD-side enforcement of
>e.g. permissions: the MDS could issue tokens that said exactly what
>the client is allowed to do with specific inodes, rather than clients
>having free reign over everything in the data pool.
>
>It would be ideal to find a design that decouples the security
>enforcement aspect from the overall protocol aspect as much as
>possible.  That way we could have an initial implementation that adds
>quota support to the kernel client (introducing quota realm concept
>but not actually passing tokens around), then work on the optional
>crypto enforcement piece separately.
>
>John
>
>
>
>
>>
>> Best,
>> Luis and Jan

* Re: cephfs quotas
  2017-10-18 12:32   ` Jan Fajerski
@ 2017-10-19 11:08     ` Luis Henriques
  0 siblings, 0 replies; 14+ messages in thread
From: Luis Henriques @ 2017-10-19 11:08 UTC (permalink / raw)
  To: John Spray, ceph-devel, Sage Weil, Patrick Donnelly

On Wed, Oct 18, 2017 at 02:32:44PM +0200, Jan Fajerski wrote:
> On Wed, Oct 18, 2017 at 12:27:18PM +0100, John Spray wrote:
> > On Wed, Oct 18, 2017 at 11:11 AM, Jan Fajerski <jfajerski@suse.com> wrote:
<snip>
> > > We'd like to hear criticism and comments from the community, before a more
> > > in-depth CDM discussion.
> > 
> > Interesting!
> > 
> > My immediate thoughts:
> > - The key element for implement kclient support is to implement a
> > mechanism whereby the clients do not have to backwards-traverse from a
> > file to find the nearest ancestor with a quota set.  I think that if
> > implementing a voucher-based approach, you'd still have to do this
> > work in addition to implementing the voucher system (the vouchers
> > would basically be the security layer on top of the refactor of
> > quotas)
> > - The simple voucher approach is not sufficient for doing efficient
> > quotas on arbitrary ancestor directories: the OSD doesn't know what
> > directory a file is in, so how can it know whether a particular
> > voucher is valid for writes to a particular file?  The hack to make it
> > work would be to issue vouchers individually for each inode, but then
> > clients can overshoot their quota very far by opening many files at
> > once.
> The idea is that the MDS is doing the traversing before issuing a voucher. I
> certainly oversimplified on the voucher description. In the paper a voucher
> carries a user id to tie the voucher to a set quota. In Ceph's current quota
> scheme this would have to be a "quota realm" (as named below). I hadn't yet
> thought about how an OSD can verify that the voucher can be spend on this
> particular piece of data.

Ok, I must admit that my initial (naïve) idea was to actually have a
voucher per inode (including directories, as you would need these to
prevent users from exceeding the max_files limit).  This would allow the
clients to simply ignore quota realms altogether, as the MDS would be
taking care of all the details -- the MDS would be responsible for figuring
out the inode's quota realm and deciding whether to grant a voucher to the
client or not.

However, as you pointed out, a voucher per inode would be a bad idea --
not only could clients overshoot their quota very quickly, but
the overall performance would likely suffer a lot with a much more verbose
protocol.

So, I agree that even in a voucher-based approach the client will still
need to figure out which quota realm it belongs to.  And this is where
the MDS needs to provide support for this new 'quota realm' concept.

My initial thought on this would be that each inode would need to start
including info about its quota realm.  This could also be a bit expensive,
though: simply setting quotas on a directory would require touching
*every* inode recursively!  The same would be needed when moving
directories/files between different quota realms.
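
To illustrate the cost concern, here is a small Python sketch that stamps a realm onto every inode below a directory and counts how many inodes it had to touch. The function `set_quota`, the `children` map and the `realm_of` table are hypothetical and only model the idea; they do not reflect how the MDS stores inodes.

```python
def set_quota(children, realm_of, directory, realm_roots=frozenset()):
    """Record `directory` as the quota realm of every inode beneath it.

    `children` maps a directory path to the paths directly inside it.
    Subtrees rooted at an entry of `realm_roots` already have their own
    quota and are left alone. Returns the number of inodes touched.
    """
    touched = 0
    stack = [directory]
    while stack:
        node = stack.pop()
        realm_of[node] = directory
        touched += 1
        for child in children.get(node, []):
            if child not in realm_roots:
                stack.append(child)
    return touched
```

The work grows with the size of the subtree, so setting a quota near the root of a large file system would touch a correspondingly large number of inodes.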

Unfortunately, I don't really know how an OSD would figure out if a
voucher could be used in a specific write operation :-(  I assumed,
probably incorrectly, that this would be possible using the quota realm
info that could be included in a voucher.

> > - In the reconciliation phase, the awkward part would be calculating
> > the actual size of the data in the quota-enforced directory, as the
> > vouchers could have been used for either overwrites or appends.  The
> > OSD voucher refunds would have to do something like tracking the
> > highest offset written in the file, and they would need passing back
> > up to the MDS so that it could accurately update its statistics about
> > the directory, perhaps.
> Can the OSD not determine the amount that was used of the voucher, i.e.
> overwrite vs. append? And yes ideally a client hand back unused vouchers.
> Otherwise the MDS can reclaim them after they timed out (say in case of a
> crashed client)

The client can also truncate files.  And if we keep the same quota model
(max_files and max_bytes), there are other operations: delete files,
create new files, and links.  Some of these operations that require quota
checks can probably be handled by the MDS only, though.

> > - From reading the PDF link, it seems like they are not implementing
> > directory quotas, but per-client (or group of client) quotas.
> > 
> > I imagine that implementing directory quotas in a secure way would
> > require a more complex scheme, where the client would have to be able
> > to prove to the OSD which "quota realm" (i.e. ancestor dir with a
> > quota set) a particular inode belonged to.  You could potentially
> > issue such a token when granting write caps on a file: for files that
> > the client is allowed to write, it would get a signed token from the
> > MDS saying that the client may write, and also saying which quota
> > realm the file is in.  Then, the client would send that in addition to
> > a quota voucher for that particular realm, and the OSD would look at
> > both the token and the voucher.
> I had just assumed such a token would be part of the voucher. But
> essentially what you describe here is what we had in mind. My lack of Ceph
> knowledge probably hindered a more sensible description.
> > 
> > This is related to ideas about doing broader OSD-side enforcement of
> > e.g. permissions: the MDS could issue tokens that said exactly what
> > the client is allowed to do with specific inodes, rather than clients
> > having free reign over everything in the data pool.
> > 
> > It would be ideal to find a design that decouples the security
> > enforcement aspect from the overall protocol aspect as much as
> > possible.  That way we could have an initial implementation that adds
> > quota support to the kernel client (introducing quota realm concept
> > but not actually passing tokens around), then work on the optional
> > crypto enforcement piece separately.

Basically you're suggesting that an initial implementation should be
identical to the one currently available on the fuse-client, except that
it would be using quota realms instead of the backward-traverse.  Do you
think this will allow us to easily extend it in the future for a
voucher-based approach?  Although I'm inclined to agree with that, my
major concern is that it could introduce constraints I'm not considering
at the moment, and that these constraints could make it difficult to
evolve from there (breaking backward compatibility is a major regression,
especially in the kernel ;-)

Cheers,
--
Luís

* Re: cephfs quotas
  2017-10-18 11:27 ` John Spray
  2017-10-18 12:32   ` Jan Fajerski
@ 2017-10-18 21:44   ` Gregory Farnum
  2017-10-19  9:29     ` Jan Fajerski
  2017-10-19 11:23     ` Luis Henriques
  2017-10-19 14:28   ` Jan Fajerski
  2 siblings, 2 replies; 14+ messages in thread
From: Gregory Farnum @ 2017-10-18 21:44 UTC (permalink / raw)
  To: John Spray; +Cc: ceph-devel, Luis Henriques, Sage Weil, Patrick Donnelly

On Wed, Oct 18, 2017 at 4:27 AM, John Spray <jspray@redhat.com> wrote:
> On Wed, Oct 18, 2017 at 11:11 AM, Jan Fajerski <jfajerski@suse.com> wrote:
<snip>
>> We'd like to hear criticism and comments from the community, before a more
>> in-depth CDM discussion.
>
> Interesting!
>
> My immediate thoughts:
>  - The key element for implement kclient support is to implement a
> mechanism whereby the clients do not have to backwards-traverse from a
> file to find the nearest ancestor with a quota set.  I think that if
> implementing a voucher-based approach, you'd still have to do this
> work in addition to implementing the voucher system (the vouchers
> would basically be the security layer on top of the refactor of
> quotas)
>  - The simple voucher approach is not sufficient for doing efficient
> quotas on arbitrary ancestor directories: the OSD doesn't know what
> directory a file is in, so how can it know whether a particular
> voucher is valid for writes to a particular file?  The hack to make it
> work would be to issue vouchers individually for each inode, but then
> clients can overshoot their quota very far by opening many files at
> once.

I'm not sure we need to focus on the existing directory-based quotas:
the reason we chose that model is that uid-based quotas did not
seem feasible. If this work does make them feasible, why not use the
model people are familiar with? (Bonus: if different UIDs map to
different namespaces, it's very easy for the OSDs to check they are
valid for a given object.)
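
A tiny Python sketch of that bonus, assuming a one-to-one mapping from UID to object namespace. The `uid-<n>` naming scheme and both helper names are made up for illustration.

```python
def namespace_for_uid(uid):
    """Hypothetical mapping: every UID writes into its own namespace."""
    return f"uid-{uid}"


def voucher_valid_for_object(voucher_uid, object_namespace):
    """OSD side: a voucher may only pay for objects in its owner's namespace."""
    return namespace_for_uid(voucher_uid) == object_namespace
```

The check needs no knowledge of the directory tree, since the namespace is a property of the object the OSD is already handling.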

That said, (without having read the papers) I'm a little skeptical it
will work. I've seen several "low-cost" abstractions that have hidden
global state computations which turn out to be very costly once you
exceed a threshold number of nodes.


> - In the reconciliation phase, the awkward part would be calculating
> the actual size of the data in the quota-enforced directory, as the
> vouchers could have been used for either overwrites or appends.  The
> OSD voucher refunds would have to do something like tracking the
> highest offset written in the file, and they would need passing back
> up to the MDS so that it could accurately update its statistics about
> the directory, perhaps.
> - From reading the PDF link, it seems like they are not implementing
> directory quotas, but per-client (or group of client) quotas.
>
> I imagine that implementing directory quotas in a secure way would
> require a more complex scheme, where the client would have to be able
> to prove to the OSD which "quota realm" (i.e. ancestor dir with a
> quota set) a particular inode belonged to.  You could potentially
> issue such a token when granting write caps on a file: for files that
> the client is allowed to write, it would get a signed token from the
> MDS saying that the client may write, and also saying which quota
> realm the file is in.  Then, the client would send that in addition to
> a quota voucher for that particular realm, and the OSD would look at
> both the token and the voucher.
>
> This is related to ideas about doing broader OSD-side enforcement of
> e.g. permissions: the MDS could issue tokens that said exactly what
> the client is allowed to do with specific inodes, rather than clients
> having free reign over everything in the data pool.

Yeah, we've read a number of papers relevant to this topic. They were
generally focused on access permissions rather than quotas, though,
and generally had higher costs than are claimed here. I'm not sure if
any of them are extensible to quota enforcement; I tend to think not.
(They mostly involved the MDS signing statements with a timeout
granting access to the client holding them, but not feeding from the
OSD back to the MDS.)

See especially "Macaroons: Cookies with Contextual Caveats for
Decentralized Authorization in the Cloud". "Scalable Security for
Petascale Parallel File Systems" was interesting but I think pretty
much superseded by macaroons. "Horus: Fine-Grained Encryption-Based
Security for Large-Scale Storage" was very different, but has the
"security" tag in my database program and might be more useful for
quotas, as it is about accessing file ranges rather than inodes.
-Greg


>
> It would be ideal to find a design that decouples the security
> enforcement aspect from the overall protocol aspect as much as
> possible.  That way we could have an initial implementation that adds
> quota support to the kernel client (introducing quota realm concept
> but not actually passing tokens around), then work on the optional
> crypto enforcement piece separately.
>
> John
>
>
>
>
>>
>> Best,
>> Luis and Jan
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: cephfs quotas
  2017-10-18 21:44   ` Gregory Farnum
@ 2017-10-19  9:29     ` Jan Fajerski
  2017-10-19 11:23     ` Luis Henriques
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Fajerski @ 2017-10-19  9:29 UTC (permalink / raw)
  To: Gregory Farnum
  Cc: John Spray, ceph-devel, Luis Henriques, Sage Weil,
	Patrick Donnelly

On Wed, Oct 18, 2017 at 02:44:13PM -0700, Gregory Farnum wrote:
>On Wed, Oct 18, 2017 at 4:27 AM, John Spray <jspray@redhat.com> wrote:
>> On Wed, Oct 18, 2017 at 11:11 AM, Jan Fajerski <jfajerski@suse.com> wrote:
>>> Hi list,
>>> A while ago this list saw a little discussion about quota support for the
>>> cephfs kernel client. The result was that instead of adding kernel support
>>> for the current implementation, a new quota implementation would be the
>>> preferred solution. Here we would like to propose such an implementation.
>>>
>>> The objective is to implement quotas such that the implementation scales
>>> well, it can be implemented in ceph-fuse, the kernel client and libcephfs
>>> based clients and are enforceable without relying on client cooperation. The
>>> latter suggests that ceph daemon(s) must be involved in checking quota
>>> limits. We think that an approach as described in "Quota Enforcement for
>>> High-Performance Distributed Storage Systems" by Pollack et al.
>>> (https://www.ssrc.ucsc.edu/pub/pollack07-msst.html) can provide a good
>>> blueprint for such an implementation. This approach enforces quota limits
>>> with the help of vouchers. At a very high level this system works by one or
>>> more quota servers (in our case MDSs) issuing vouchers carrying (among other
>>> things) an expiration timestamp, an amount, a uid and a (cryptographic)
>>> signature to clients. An MDS can track how much space it has given out by
>>> tracking the vouchers it issues. A client can spend these vouchers on OSDs
>>> by sending them along with a write request. The OSD can verify a valid
>>> voucher by the signature. It will deduct the amount of written data from the
>>> voucher and might return the voucher if the voucher was not used up in full.
>>> The client can return the remaining amount or it can give it back to the
>>> MDS.  Client failures and misbehaving clients are handled through a
>>> periodical reconciliation phase where the MDSs and OSDs reconciles issued
>>> and used vouchers. Vouchers held by a failed client can be detected by the
>>> expiration timestamp attached to the vouchers. Any unused and invalid
>>> vouchers can be reclaimed by an MDS. Clients that try to cheat by spending
>>> the same voucher on multiple OSDs are detected by the uid of the voucher.
>>> This means that adversarial clients can exceed the quota, but will be caught
>>> within a limited time period. The signature ensure that clients can not
>>> fabricate valid vouchers.  For a much better and much more detailed
>>> description please refer to the paper.
>>>
>>> This approach has been implemented in Ceph before as described here
>>> http://drona.csa.iisc.ernet.in/~gopi/docs/amarnath-MSc.pdf. We could however
>>> not find the source code for this and it seemingly didn't find its way in to
>>> the current code base.
>>> The virtues of a protocol like this are that it can scale well, since there
>>> is no central entity that keeps a global state of the quotas, while still
>>> being able to enforce (somewhat) hard quotas.
>>> On the downside there is a protocol overhead that impacts performance.
>>> Research and reports on implementations suggest that this overhead can be
>>> kept fairly small though (2% performance penalty or less). Furthermore
>>> additional state must be kept on MDSs, OSDs and clients. Such a solution
>>> also adds considerable complexity to all involved components.
>>>
>>> We'd like to hear criticism and comments from the community, before a more
>>> in-depth CDM discussion.
>>
>> Interesting!
>>
>> My immediate thoughts:
>>  - The key element for implement kclient support is to implement a
>> mechanism whereby the clients do not have to backwards-traverse from a
>> file to find the nearest ancestor with a quota set.  I think that if
>> implementing a voucher-based approach, you'd still have to do this
>> work in addition to implementing the voucher system (the vouchers
>> would basically be the security layer on top of the refactor of
>> quotas)
>>  - The simple voucher approach is not sufficient for doing efficient
>> quotas on arbitrary ancestor directories: the OSD doesn't know what
>> directory a file is in, so how can it know whether a particular
>> voucher is valid for writes to a particular file?  The hack to make it
>> work would be to issue vouchers individually for each inode, but then
>> clients can overshoot their quota very far by opening many files at
>> once.
>
>I'm not sure we need to focus on the existing directory-based quotas:
>the reason we chose that model is because uid-based quotas did not
>seem feasible. If this work does make them feasible, why not use the
>model people are familiar with? (Bonus: if different UIDs map to
>different namespaces, it's very easy for the OSDs to check they are
>valid for a given object.)

UID-based quotas would, I think, require MDS code to determine which MDS is
responsible for a given UID (for quota accounting). With the directory/file
based approach this code exists already. Not that this is an argument for either
model; I think both could work with the voucher protocol.
>
>That said, (without having read the papers) I'm a little skeptical it
>will work. I've seen several "low-cost" abstractions that have hidden
>global state computations which turn out to be very costly once you
>exceed a threshold number of nodes.

Yes, scalability issues are certainly a concern. One sensitive point of this
protocol is issuing the vouchers, particularly the voucher size. Generally,
larger vouchers reduce the protocol overhead, since clients can operate without
constantly requesting new vouchers. An initial voucher pool would be another
approach.
When quotas are filling up, however, large voucher sizes (or clients
maintaining a pool of vouchers) can lead to starving clients or thrashing of
voucher requests.
 Jan
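
To make the voucher-size trade-off above concrete, here is a rough
back-of-the-envelope sketch in Python. The function names, the client count
and the sizes are illustrative assumptions, not taken from the paper or from
Ceph. It counts how many voucher requests a client needs for a given amount of
data, and how much quota can sit unspent in client hands in the worst case.

```python
import math

MiB = 1024 * 1024
GiB = 1024 * MiB


def voucher_requests(bytes_to_write: int, voucher_size: int) -> int:
    """Voucher requests a client must make to write bytes_to_write."""
    return math.ceil(bytes_to_write / voucher_size)


def worst_case_stranded(num_clients: int, voucher_size: int) -> int:
    """Upper bound on quota tied up in unspent vouchers (one per client)."""
    return num_clients * voucher_size


if __name__ == "__main__":
    # Hypothetical workload: each client writes 10 GiB, 100 clients in total.
    for size in (4 * MiB, 64 * MiB, 1 * GiB):
        reqs = voucher_requests(10 * GiB, size)
        stranded = worst_case_stranded(100, size)
        print(f"voucher={size // MiB} MiB  requests={reqs}  "
              f"stranded<={stranded // MiB} MiB")
```

Larger vouchers cut the request count linearly, but the amount of quota that
can be held back from other clients grows by the same factor, which is the
starvation risk described above when a quota is nearly full.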
>
>
>> - In the reconciliation phase, the awkward part would be calculating
>> the actual size of the data in the quota-enforced directory, as the
>> vouchers could have been used for either overwrites or appends.  The
>> OSD voucher refunds would have to do something like tracking the
>> highest offset written in the file, and they would need passing back
>> up to the MDS so that it could accurately update its statistics about
>> the directory, perhaps.
>> - From reading the PDF link, it seems like they are not implementing
>> directory quotas, but per-client (or group of client) quotas.
>>
>> I imagine that implementing directory quotas in a secure way would
>> require a more complex scheme, where the client would have to be able
>> to prove to the OSD which "quota realm" (i.e. ancestor dir with a
>> quota set) a particular inode belonged to.  You could potentially
>> issue such a token when granting write caps on a file: for files that
>> the client is allowed to write, it would get a signed token from the
>> MDS saying that the client may write, and also saying which quota
>> realm the file is in.  Then, the client would send that in addition to
>> a quota voucher for that particular realm, and the OSD would look at
>> both the token and the voucher.
>>
>> This is related to ideas about doing broader OSD-side enforcement of
>> e.g. permissions: the MDS could issue tokens that said exactly what
>> the client is allowed to do with specific inodes, rather than clients
>> having free reign over everything in the data pool.
>
>Yeah, we've read a number of papers relevant to this topic. They were
>generally focused on access permissions rather than quotas, though,
>and generally had higher costs than are claimed here. I'm not sure if
>any of them are extensible to quota enforcement; I tend to think not.
>(They mostly involved the MDS signing statements with a timeout
>granting access to the client holding them, but not feeding from the
>OSD back to the MDS.)
>
>See especially "Macaroons: Cookies with Contextual Caveats for
>Decentralized Authorization in the Cloud". "Scalable Security for
>Petascale Parallel File Systems" was interesting but I think pretty
>much superseded by macaroons. "Horus: Fine-Grained Encryption-Based
>Security for Large-Scale Storage" was very different, but has the
>"security" tag in my database program and might be more useful for
>quotas, as it is about accessing file ranges rather than inodes.
>-Greg
>
>
>>
>> It would be ideal to find a design that decouples the security
>> enforcement aspect from the overall protocol aspect as much as
>> possible.  That way we could have an initial implementation that adds
>> quota support to the kernel client (introducing quota realm concept
>> but not actually passing tokens around), then work on the optional
>> crypto enforcement piece separately.
>>
>> John
>>
>>
>>
>>
>>>
>>> Best,
>>> Luis and Jan
>


* Re: cephfs quotas
  2017-10-18 21:44   ` Gregory Farnum
  2017-10-19  9:29     ` Jan Fajerski
@ 2017-10-19 11:23     ` Luis Henriques
  2017-10-19 23:52       ` Gregory Farnum
  1 sibling, 1 reply; 14+ messages in thread
From: Luis Henriques @ 2017-10-19 11:23 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: John Spray, ceph-devel, Sage Weil, Patrick Donnelly

On Wed, Oct 18, 2017 at 02:44:13PM -0700, Gregory Farnum wrote:
> On Wed, Oct 18, 2017 at 4:27 AM, John Spray <jspray@redhat.com> wrote:
> > On Wed, Oct 18, 2017 at 11:11 AM, Jan Fajerski <jfajerski@suse.com> wrote:
<snip>
> > My immediate thoughts:
> >  - The key element for implement kclient support is to implement a
> > mechanism whereby the clients do not have to backwards-traverse from a
> > file to find the nearest ancestor with a quota set.  I think that if
> > implementing a voucher-based approach, you'd still have to do this
> > work in addition to implementing the voucher system (the vouchers
> > would basically be the security layer on top of the refactor of
> > quotas)
> >  - The simple voucher approach is not sufficient for doing efficient
> > quotas on arbitrary ancestor directories: the OSD doesn't know what
> > directory a file is in, so how can it know whether a particular
> > voucher is valid for writes to a particular file?  The hack to make it
> > work would be to issue vouchers individually for each inode, but then
> > clients can overshoot their quota very far by opening many files at
> > once.
> 
> I'm not sure we need to focus on the existing directory-based quotas:
> the reason we chose that model is because uid-based quotas did not
> seem feasible. If this work does make them feasible, why not use the
> model people are familiar with? (Bonus: if different UIDs map to
> different namespaces, it's very easy for the OSDs to check they are
> valid for a given object.)

Correct, this could be used to move to a BSD-like quota implementation,
where we could have 'user', 'group' and the more recent 'project' quotas
(which pretty much correspond to the cephfs directory-based quotas).

Obviously, a challenge would be to ensure consistent user/group IDs across
the different clients.

> That said, (without having read the papers) I'm a little skeptical it
> will work. I've seen several "low-cost" abstractions that have hidden
> global state computations which turn out to be very costly once you
> exceed a threshold number of nodes.
> 
> 
> > - In the reconciliation phase, the awkward part would be calculating
> > the actual size of the data in the quota-enforced directory, as the
> > vouchers could have been used for either overwrites or appends.  The
> > OSD voucher refunds would have to do something like tracking the
> > highest offset written in the file, and they would need passing back
> > up to the MDS so that it could accurately update its statistics about
> > the directory, perhaps.
> > - From reading the PDF link, it seems like they are not implementing
> > directory quotas, but per-client (or group of client) quotas.
> >
> > I imagine that implementing directory quotas in a secure way would
> > require a more complex scheme, where the client would have to be able
> > to prove to the OSD which "quota realm" (i.e. ancestor dir with a
> > quota set) a particular inode belonged to.  You could potentially
> > issue such a token when granting write caps on a file: for files that
> > the client is allowed to write, it would get a signed token from the
> > MDS saying that the client may write, and also saying which quota
> > realm the file is in.  Then, the client would send that in addition to
> > a quota voucher for that particular realm, and the OSD would look at
> > both the token and the voucher.
> >
> > This is related to ideas about doing broader OSD-side enforcement of
> > e.g. permissions: the MDS could issue tokens that said exactly what
> > the client is allowed to do with specific inodes, rather than clients
> > having free reign over everything in the data pool.
> 
> Yeah, we've read a number of papers relevant to this topic. They were
> generally focused on access permissions rather than quotas, though,
> and generally had higher costs than are claimed here. I'm not sure if
> any of them are extensible to quota enforcement; I tend to think not.
> (They mostly involved the MDS signing statements with a timeout
> granting access to the client holding them, but not feeding from the
> OSD back to the MDS.)

Just out of curiosity, is there any work being done on ceph to implement
this OSD permissions enforcement?

> See especially "Macaroons: Cookies with Contextual Caveats for
> Decentralized Authorization in the Cloud". "Scalable Security for
> Petascale Parallel File Systems" was interesting but I think pretty
> much superseded by macaroons. "Horus: Fine-Grained Encryption-Based
> Security for Large-Scale Storage" was very different, but has the
> "security" tag in my database program and might be more useful for
> quotas, as it is about accessing file ranges rather than inodes.

Interesting weekend literature, thanks!

Cheers,
--
Luís


* Re: cephfs quotas
  2017-10-19 11:23     ` Luis Henriques
@ 2017-10-19 23:52       ` Gregory Farnum
  0 siblings, 0 replies; 14+ messages in thread
From: Gregory Farnum @ 2017-10-19 23:52 UTC (permalink / raw)
  To: Luis Henriques; +Cc: John Spray, ceph-devel, Sage Weil, Patrick Donnelly

On Thu, Oct 19, 2017 at 2:29 AM, Jan Fajerski <jfajerski@suse.com> wrote:
> UID based quotas would, I think, require MDS code to determine which MDS is
> responsible for a given UID (for quota accounting). With the directory/file
> based approach this code exists already. Not that this is an argument for
> either approach, I think both could work with this approach.

I'm not sure that's going to be any different, actually. Quotas can
already cross MDS subtree boundaries, and you need any MDS issuing
caps on an inode to be able to issue vouchers. This would look pretty
similar, though it might exercise the mds-side sharing/reconciliation
code more. *shrug*

On Thu, Oct 19, 2017 at 4:23 AM, Luis Henriques <lhenriques@suse.com> wrote:
> On Wed, Oct 18, 2017 at 02:44:13PM -0700, Gregory Farnum wrote:
>> On Wed, Oct 18, 2017 at 4:27 AM, John Spray <jspray@redhat.com> wrote:
>> > This is related to ideas about doing broader OSD-side enforcement of
>> > e.g. permissions: the MDS could issue tokens that said exactly what
>> > the client is allowed to do with specific inodes, rather than clients
>> > having free reign over everything in the data pool.
>>
>> Yeah, we've read a number of papers relevant to this topic. They were
>> generally focused on access permissions rather than quotas, though,
>> and generally had higher costs than are claimed here. I'm not sure if
>> any of them are extensible to quota enforcement; I tend to think not.
>> (They mostly involved the MDS signing statements with a timeout
>> granting access to the client holding them, but not feeding from the
>> OSD back to the MDS.)
>
> Just out of curiosity, is there any work being done on ceph to implement
> this OSD permissions enforcement?

No, we've never looked at it seriously. Just some occasional thoughts
bumping around in my head after reading those papers. ;)
-Greg

>> See especially "Macaroons: Cookies with Contextual Caveats for
>> Decentralized Authorization in the Cloud". "Scalable Security for
>> Petascale Parallel File Systems" was interesting but I think pretty
>> much superseded by macaroons. "Horus: Fine-Grained Encryption-Based
>> Security for Large-Scale Storage" was very different, but has the
>> "security" tag in my database program and might be more useful for
>> quotas, as it is about accessing file ranges rather than inodes.
>
> Interesting weekend literature, thanks!
>
> Cheers,
> --
> Luís


* Re: cephfs quotas
  2017-10-18 11:27 ` John Spray
  2017-10-18 12:32   ` Jan Fajerski
  2017-10-18 21:44   ` Gregory Farnum
@ 2017-10-19 14:28   ` Jan Fajerski
  2 siblings, 0 replies; 14+ messages in thread
From: Jan Fajerski @ 2017-10-19 14:28 UTC (permalink / raw)
  To: John Spray; +Cc: ceph-devel, Luis Henriques, Sage Weil, Patrick Donnelly

On Wed, Oct 18, 2017 at 12:27:18PM +0100, John Spray wrote:
>On Wed, Oct 18, 2017 at 11:11 AM, Jan Fajerski <jfajerski@suse.com> wrote:
>> Hi list,
>> A while ago this list saw a little discussion about quota support for the
>> cephfs kernel client. The result was that instead of adding kernel support
>> for the current implementation, a new quota implementation would be the
>> preferred solution. Here we would like to propose such an implementation.
>>
>> The objective is to implement quotas such that the implementation scales
>> well, it can be implemented in ceph-fuse, the kernel client and libcephfs
>> based clients and are enforceable without relying on client cooperation. The
>> latter suggests that ceph daemon(s) must be involved in checking quota
>> limits. We think that an approach as described in "Quota Enforcement for
>> High-Performance Distributed Storage Systems" by Pollack et al.
>> (https://www.ssrc.ucsc.edu/pub/pollack07-msst.html) can provide a good
>> blueprint for such an implementation. This approach enforces quota limits
>> with the help of vouchers. At a very high level this system works by one or
>> more quota servers (in our case MDSs) issuing vouchers carrying (among other
>> things) an expiration timestamp, an amount, a uid and a (cryptographic)
>> signature to clients. An MDS can track how much space it has given out by
>> tracking the vouchers it issues. A client can spend these vouchers on OSDs
>> by sending them along with a write request. The OSD can verify a valid
>> voucher by the signature. It will deduct the amount of written data from the
>> voucher and might return the voucher if the voucher was not used up in full.
>> The client can return the remaining amount or it can give it back to the
>> MDS.  Client failures and misbehaving clients are handled through a
>> periodical reconciliation phase where the MDSs and OSDs reconciles issued
>> and used vouchers. Vouchers held by a failed client can be detected by the
>> expiration timestamp attached to the vouchers. Any unused and invalid
>> vouchers can be reclaimed by an MDS. Clients that try to cheat by spending
>> the same voucher on multiple OSDs are detected by the uid of the voucher.
>> This means that adversarial clients can exceed the quota, but will be caught
>> within a limited time period. The signature ensure that clients can not
>> fabricate valid vouchers.  For a much better and much more detailed
>> description please refer to the paper.
>>
>> This approach has been implemented in Ceph before as described here
>> http://drona.csa.iisc.ernet.in/~gopi/docs/amarnath-MSc.pdf. We could however
>> not find the source code for this and it seemingly didn't find its way in to
>> the current code base.
>> The virtues of a protocol like this are that it can scale well, since there
>> is no central entity that keeps a global state of the quotas, while still
>> being able to enforce (somewhat) hard quotas.
>> On the downside there is a protocol overhead that impacts performance.
>> Research and reports on implementations suggest that this overhead can be
>> kept fairly small though (2% performance penalty or less). Furthermore
>> additional state must be kept on MDSs, OSDs and clients. Such a solution
>> also adds considerable complexity to all involved components.
>>
>> We'd like to hear criticism and comments from the community, before a more
>> in-depth CDM discussion.
>
>Interesting!
>
>My immediate thoughts:
> - The key element for implement kclient support is to implement a
>mechanism whereby the clients do not have to backwards-traverse from a
>file to find the nearest ancestor with a quota set.  I think that if
>implementing a voucher-based approach, you'd still have to do this
>work in addition to implementing the voucher system (the vouchers
>would basically be the security layer on top of the refactor of
>quotas)
> - The simple voucher approach is not sufficient for doing efficient
>quotas on arbitrary ancestor directories: the OSD doesn't know what
>directory a file is in, so how can it know whether a particular
>voucher is valid for writes to a particular file?  The hack to make it
>work would be to issue vouchers individually for each inode, but then
>clients can overshoot their quota very far by opening many files at
>once.

One more point here: the OSD doesn't have to know whether a client is allowed to
spend a voucher on a particular inode. An OSD only checks that the voucher is valid
(as in issued by an MDS) and that it is sufficient for the write request it
accompanies. It also records the voucher and tracks the total allocation for the
voucher's id (in the paper this is a user, but it could also be a snap_realm or
something). This should be sufficient for clients that play fairly, since an MDS
only issues vouchers if the quota hasn't been exceeded.
A byzantine client can of course spend a voucher on a different file (in another
snap_realm or a different user_quota) or spend the same voucher multiple times.
This will succeed initially but is bounded, i.e. quotas can be exceeded only by a
limited amount, and it will be caught during the reconciliation phase.
At least that is my understanding of the paper... I might of course be missing
something.
 Jan
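
The OSD-side check and the later reconciliation described above can be
sketched roughly as follows. This is a minimal illustration under stated
assumptions, not the paper's scheme or any Ceph code: it uses an HMAC with a
key shared between MDS and OSDs in place of whatever signature the paper
specifies, and the names `issue`, `Osd` and `reconcile` are hypothetical.

```python
import hashlib
import hmac
import time
from collections import defaultdict

# Assumption: MDSs and OSDs share this key; clients never see it.
SECRET = b"shared-mds-osd-key"


def _sign(vid: str, realm: str, amount: int, expires: float) -> str:
    msg = f"{vid}|{realm}|{amount}|{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def issue(vid: str, realm: str, amount: int, expires: float) -> dict:
    """MDS side: hand out a signed voucher for `amount` bytes."""
    return {"id": vid, "realm": realm, "amount": amount,
            "expires": expires, "sig": _sign(vid, realm, amount, expires)}


class Osd:
    def __init__(self):
        self.spent = defaultdict(int)      # voucher id -> bytes spent here
        self.allocated = defaultdict(int)  # realm/user id -> bytes written

    def write(self, voucher: dict, nbytes: int, now=None) -> bool:
        """Accept the write only if the voucher is valid and sufficient."""
        now = time.time() if now is None else now
        expected = _sign(voucher["id"], voucher["realm"],
                         voucher["amount"], voucher["expires"])
        if not hmac.compare_digest(expected, voucher["sig"]):
            return False  # not issued by an MDS (or tampered with)
        if now > voucher["expires"]:
            return False  # expired voucher
        if self.spent[voucher["id"]] + nbytes > voucher["amount"]:
            return False  # voucher does not cover this write
        self.spent[voucher["id"]] += nbytes
        self.allocated[voucher["realm"]] += nbytes
        return True


def reconcile(issued: dict, osds: list) -> list:
    """Return ids of vouchers spent for more than their issued amount."""
    totals = defaultdict(int)
    for osd in osds:
        for vid, nbytes in osd.spent.items():
            totals[vid] += nbytes
    return sorted(v for v, n in totals.items() if n > issued.get(v, 0))
```

Note that each OSD only sees its own spending, so a voucher replayed on a
second OSD is accepted there; only `reconcile`, which sums across OSDs,
notices the overspend. That matches the bounded-overshoot behaviour described
above.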
>- In the reconciliation phase, the awkward part would be calculating
>the actual size of the data in the quota-enforced directory, as the
>vouchers could have been used for either overwrites or appends.  The
>OSD voucher refunds would have to do something like tracking the
>highest offset written in the file, and they would need passing back
>up to the MDS so that it could accurately update its statistics about
>the directory, perhaps.
>- From reading the PDF link, it seems like they are not implementing
>directory quotas, but per-client (or group of client) quotas.
>
>I imagine that implementing directory quotas in a secure way would
>require a more complex scheme, where the client would have to be able
>to prove to the OSD which "quota realm" (i.e. ancestor dir with a
>quota set) a particular inode belonged to.  You could potentially
>issue such a token when granting write caps on a file: for files that
>the client is allowed to write, it would get a signed token from the
>MDS saying that the client may write, and also saying which quota
>realm the file is in.  Then, the client would send that in addition to
>a quota voucher for that particular realm, and the OSD would look at
>both the token and the voucher.
>
>This is related to ideas about doing broader OSD-side enforcement of
>e.g. permissions: the MDS could issue tokens that said exactly what
>the client is allowed to do with specific inodes, rather than clients
>having free reign over everything in the data pool.
>
>It would be ideal to find a design that decouples the security
>enforcement aspect from the overall protocol aspect as much as
>possible.  That way we could have an initial implementation that adds
>quota support to the kernel client (introducing quota realm concept
>but not actually passing tokens around), then work on the optional
>crypto enforcement piece separately.
>
>John
>
>
>
>
>>
>> Best,
>> Luis and Jan
>


* Re: cephfs quotas
  2017-10-18 10:11 cephfs quotas Jan Fajerski
  2017-10-18 11:27 ` John Spray
@ 2017-12-11 16:52 ` Luis Henriques
  2017-12-11 18:36   ` Gregory Farnum
  2017-12-12  2:27   ` Yan, Zheng
  1 sibling, 2 replies; 14+ messages in thread
From: Luis Henriques @ 2017-12-11 16:52 UTC (permalink / raw)
  To: Jan Fajerski
  Cc: Sage Weil, John Spray, Patrick Donnelly, Yan, Zheng, ceph-devel

Hi,

[ and sorry for hijacking this old thread! ]

Here's a write-up of what I was saying earlier on the cephfs standup:

Basically, by using the ceph wip-cephfs-quota-realm branch[1] the
kernel client should have everything needed to implement client-side
enforced quotas (just like the current fuse client).  That branch
contains code that will create a new realm whenever a client sets a
quota xattr, and the clients will be updated with this new realm.

My first question would be: is there something on the kernel client to
handle these realms (snaprealms) that is still missing?  As far as I
could understand from reading the code there's nothing missing -- it
should be possible to walk through the realms hierarchy as the kernel
client will always get the updated realms hierarchy from the MDS -- both
for snapshots and for these new 'quota realms'.  Implementing a 'quota
realms' PoC based on the RFC I sent out a few weeks ago shouldn't take
too long.  Or is there something obvious that I'm missing?
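
As a rough illustration of what "walking the realms hierarchy" could look like
for client-side enforcement, here is a small Python sketch. It is not kernel
or Ceph code; the `Realm` class, its fields and both helper functions are
hypothetical, and it assumes a write must fit within every ancestor realm that
has a byte quota set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Realm:
    name: str
    parent: Optional["Realm"] = None
    max_bytes: int = 0   # 0 means no byte quota is set on this realm
    used_bytes: int = 0


def nearest_quota_realm(realm: Optional[Realm]) -> Optional[Realm]:
    """Walk up the hierarchy to the closest realm with a quota set."""
    while realm is not None and realm.max_bytes == 0:
        realm = realm.parent
    return realm


def write_allowed(realm: Optional[Realm], nbytes: int) -> bool:
    """Check the write against every quota-carrying ancestor realm."""
    while realm is not None:
        if realm.max_bytes and realm.used_bytes + nbytes > realm.max_bytes:
            return False
        realm = realm.parent
    return True


if __name__ == "__main__":
    root = Realm("/", max_bytes=1000, used_bytes=900)
    proj = Realm("/proj", parent=root)
    sub = Realm("/proj/a", parent=proj, max_bytes=500, used_bytes=100)
    print(nearest_quota_realm(proj).name)
    print(write_allowed(sub, 50), write_allowed(sub, 200))
```

The point of keeping realms as a parent-linked structure is that the client
never has to traverse from a file back through its directory ancestors; it
only follows realm links it already received from the MDS.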

Now, the 2nd (big!) question is how to proceed.  Or, to be more clear,
what are the expectations :-) My understanding was that John Spray would
like to see client-side quota enforcement as an initial step, and then
have everything else added on top of it.  But I'm afraid that this would
introduce complexity for future releases -- for example, if in the
future we have cluster-side enforced quotas (voucher-based or other),
I guess that the kernel clients would be required to support both
scenarios => maintenance burden.  Not to mention migrating clusters
between different quota implementations.

My personal preference would be to stay away from client quotas.  They are
obviously the best short-term solution but not necessarily the best in
the long run.

Thoughts?

[1] https://github.com/ukernel/ceph/tree/wip-cephfs-quota-realm

Cheers,
-- 
Luis

Jan Fajerski <jfajerski@suse.com> writes:

> Hi list,
> A while ago this list saw a little discussion about quota support for the cephfs
> kernel client. The result was that instead of adding kernel support for the
> current implementation, a new quota implementation would be the preferred
> solution. Here we would like to propose such an implementation.
>
> The objective is to implement quotas such that the implementation scales well,
> it can be implemented in ceph-fuse, the kernel client and libcephfs based
> clients and are enforceable without relying on client cooperation. The latter
> suggests that ceph daemon(s) must be involved in checking quota limits. We think
> that an approach as described in "Quota Enforcement for High-Performance
> Distributed Storage Systems" by Pollack et
> al. (https://www.ssrc.ucsc.edu/pub/pollack07-msst.html) can provide a good
> blueprint for such an implementation. This approach enforces quota limits with
> the help of vouchers. At a very high level this system works by one or more
> quota servers (in our case MDSs) issuing vouchers carrying (among other things)
> an expiration timestamp, an amount, a uid and a (cryptographic) signature to
> clients. An MDS can track how much space it has given out by tracking the
> vouchers it issues. A client can spend these vouchers on OSDs by sending them
> along with a write request. The OSD can verify a valid voucher by the
> signature. It will deduct the amount of written data from the voucher and might
> return the voucher if the voucher was not used up in full.  The client can
> return the remaining amount or it can give it back to the MDS.  Client failures
> and misbehaving clients are handled through a periodical reconciliation phase
> where the MDSs and OSDs reconciles issued and used vouchers. Vouchers held by a
> failed client can be detected by the expiration timestamp attached to the
> vouchers. Any unused and invalid vouchers can be reclaimed by an MDS. Clients
> that try to cheat by spending the same voucher on multiple OSDs are detected by
> the uid of the voucher. This means that adversarial clients can exceed the
> quota, but will be caught within a limited time period. The signature ensure
> that clients can not fabricate valid vouchers.  For a much better and much more
> detailed description please refer to the paper.
>
> This approach has been implemented in Ceph before as described here
> http://drona.csa.iisc.ernet.in/~gopi/docs/amarnath-MSc.pdf. We could however not
> find the source code for this and it seemingly didn't find its way in to the
> current code base.
> The virtues of a protocol like this are that it can scale well, since there is
> no central entity that keeps a global state of the quotas, while still being
> able to enforce (somewhat) hard quotas.
> On the downside there is a protocol overhead that impacts performance. Research
> and reports on implementations suggest that this overhead can be kept fairly
> small though (2% performance penalty or less). Furthermore additional state must
> be kept on MDSs, OSDs and clients. Such a solution also adds considerable
> complexity to all involved components.
>
> We'd like to hear criticism and comments from the community, before a more
> in-depth CDM discussion.
>
> Best,
> Luis and Jan


* Re: cephfs quotas
  2017-12-11 16:52 ` Luis Henriques
@ 2017-12-11 18:36   ` Gregory Farnum
  2017-12-12  9:12     ` Luis Henriques
  2017-12-12  2:27   ` Yan, Zheng
  1 sibling, 1 reply; 14+ messages in thread
From: Gregory Farnum @ 2017-12-11 18:36 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Jan Fajerski, Sage Weil, John Spray, Patrick Donnelly, Yan, Zheng,
	ceph-devel

On Mon, Dec 11, 2017 at 8:52 AM, Luis Henriques <lhenriques@suse.com> wrote:
> Hi,
>
> [ and sorry for hijacking this old thread! ]
>
> Here's a write-up of what I was saying earlier on the cephfs standup:
>
> Basically, by using the ceph wip-cephfs-quota-realm branch[1], the
> kernel client should have everything needed to implement client-side
> enforced quotas (just like the current fuse client).  That branch
> contains code that will create a new realm whenever a client sets a
> quota xattr, and the clients will be updated with this new realm.
>
> My first question would be: is there something on the kernel client to
> handle these realms (snaprealms) that is still missing?  As far as I
> could understand from reading the code there's nothing missing -- it
> should be possible to walk through the realm hierarchy as the kernel
> client will always get the updated realm hierarchy from the MDS -- both
> for snapshots and for these new 'quota realms'.  Implementing a 'quota
> realms' PoC based on the RFC I sent out a few weeks ago shouldn't take
> too long.  Or is there something obvious that I'm missing?

So with that branch, the MDS is maintaining quota realms and sending
out the realm info to the clients. But unless there's a kernel branch
somewhere else, the kernel client doesn't know how to do anything with
those for quotas. So all of that code needs to be written.
But reading your second question, you may be asking something else
here that I don't understand...?

> Now, the 2nd (big!) question is how to proceed.  Or, to be more clear,
> what are the expectations :-) My understanding was that John Spray would
> like to see client-side quota enforcement as an initial step, and then
> have everything else added on top of it.  But I'm afraid that this would
> introduce complexity for future releases -- for example, if in the
> future we have cluster-side enforced quotas (voucher-based or other),
> I guess that the kernel clients would be required to support both
> scenarios => maintenance burden.  Not to mention migrating clusters
> between different quota implementations.

Any quota system we might implement server-side will be well-served by
having the clients do checks voluntarily as well. I don't think a
voluntary client-side system is going to look much different than just
doing the checks to avoid sending off writes we know the servers will
reject.
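
A minimal sketch of the kind of voluntary client-side check described above, assuming each quota realm carries a byte limit and a usage figure supplied by the MDS. The names QuotaRealm, max_bytes, used_bytes and write_allowed are hypothetical and are not taken from the kernel client or from the wip-cephfs-quota-realm branch; the sketch only shows the walk up the realm hierarchy before a write is sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaRealm:
    """Hypothetical quota realm: one node in the realm hierarchy."""
    name: str
    max_bytes: int = 0      # 0 means no byte limit is set on this realm
    used_bytes: int = 0     # usage as last reported by the MDS (may be stale)
    parent: Optional["QuotaRealm"] = None


def write_allowed(realm: Optional[QuotaRealm], nbytes: int) -> bool:
    """Walk from the file's realm up to the root; refuse the write if any
    realm on the way would end up over its limit."""
    while realm is not None:
        if realm.max_bytes and realm.used_bytes + nbytes > realm.max_bytes:
            return False
        realm = realm.parent
    return True


# Example hierarchy: only /project has a quota set.
root = QuotaRealm("/")
project = QuotaRealm("/project", max_bytes=1000, used_bytes=900, parent=root)
scratch = QuotaRealm("/project/scratch", parent=project)

print(write_allowed(scratch, 50))    # fits under the /project limit
print(write_allowed(scratch, 200))   # would push /project over its limit
```

Because the usage figures can lag behind what the MDS knows, a check like this only avoids sending writes that are already known to be over quota; it does not replace enforcement elsewhere.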

More to the point, we have a working model for client-side enforcement
of quotas, and we *don't* have one for server-side enforcement yet.
Don't make the perfect the enemy of the good. :)
-Greg

>
> My personal preference would be to stay away from client quotas.  That's
> obviously the best short-term solution but not necessarily the best in
> the long run.
>
> Thoughts?
>
> [1] https://github.com/ukernel/ceph/tree/wip-cephfs-quota-realm
>
> Cheers,
> --
> Luis


* Re: cephfs quotas
  2017-12-11 18:36   ` Gregory Farnum
@ 2017-12-12  9:12     ` Luis Henriques
  0 siblings, 0 replies; 14+ messages in thread
From: Luis Henriques @ 2017-12-12  9:12 UTC (permalink / raw)
  To: Gregory Farnum
  Cc: Jan Fajerski, Sage Weil, John Spray, Patrick Donnelly, Yan, Zheng,
	ceph-devel

Gregory Farnum <gfarnum@redhat.com> writes:

> On Mon, Dec 11, 2017 at 8:52 AM, Luis Henriques <lhenriques@suse.com> wrote:
>> Hi,
>>
>> [ and sorry for hijacking this old thread! ]
>>
>> Here's a write-up of what I was saying earlier on the cephfs standup:
>>
>> Basically, by using the ceph wip-cephfs-quota-realm branch[1], the
>> kernel client should have everything needed to implement client-side
>> enforced quotas (just like the current fuse client).  That branch
>> contains code that will create a new realm whenever a client sets a
>> quota xattr, and the clients will be updated with this new realm.
>>
>> My first question would be: is there something on the kernel client to
>> handle these realms (snaprealms) that is still missing?  As far as I
>> could understand from reading the code there's nothing missing -- it
>> should be possible to walk through the realm hierarchy as the kernel
>> client will always get the updated realm hierarchy from the MDS -- both
>> for snapshots and for these new 'quota realms'.  Implementing a 'quota
>> realms' PoC based on the RFC I sent out a few weeks ago shouldn't take
>> too long.  Or is there something obvious that I'm missing?
>
> So with that branch, the MDS is maintaining quota realms and sending
> out the realm info to the clients. But unless there's a kernel branch
> somewhere else, the kernel client doesn't know how to do anything with
> those for quotas. So all of that code needs to be written.

Oh, that's absolutely clear!  I know the kernel doesn't know anything
about quotas or quota realms.  And one of my priorities at this point is
to change that ;-)

I just asked this question because I was under the impression that
during the standup someone hinted that there was something else missing
in the kernel code regarding snaprealms (not quota-specific).  Anyway,
sorry if I managed to confuse everyone.

> But reading your second question, you may be asking something else
> here that I don't understand...?
>
>> Now, the 2nd (big!) question is how to proceed.  Or, to be more clear,
>> what are the expectations :-) My understanding was that John Spray would
>> like to see client-side quota enforcement as an initial step, and then
>> have everything else added on top of it.  But I'm afraid that this would
>> introduce complexity for future releases -- for example, if in the
>> future we have cluster-side enforced quotas (voucher-based or other),
>> I guess that the kernel clients would be required to support both
>> scenarios => maintenance burden.  Not to mention migrating clusters
>> between different quota implementations.
>
> Any quota system we might implement server-side will be well-served by
> having the clients do checks voluntarily as well. I don't think a
> voluntary client-side system is going to look much different than just
> doing the checks to avoid sending off writes we know the servers will
> reject.
>
> More to the point, we have a working model for client-side enforcement
> of quotas, and we *don't* have one for server-side enforcement yet.
> Don't make the perfect the enemy of the good. :)

Ok, gotcha.  I wanted to raise my concerns one last time and make sure
we're all on the same page.  I guess it's now time to go re-write those
old patches.  Thanks!

Cheers,
-- 
Luis


* Re: cephfs quotas
  2017-12-11 16:52 ` Luis Henriques
  2017-12-11 18:36   ` Gregory Farnum
@ 2017-12-12  2:27   ` Yan, Zheng
  2017-12-12  9:13     ` Luis Henriques
  1 sibling, 1 reply; 14+ messages in thread
From: Yan, Zheng @ 2017-12-12  2:27 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Jan Fajerski, Sage Weil, John Spray, Patrick Donnelly, ceph-devel



> On 12 Dec 2017, at 00:52, Luis Henriques <lhenriques@suse.com> wrote:
> 
> Hi,
> 
> [ and sorry for hijacking this old thread! ]
> 
> Here's a write-up of what I was saying earlier on the cephfs standup:
> 
> Basically, by using the ceph wip-cephfs-quota-realm branch[1], the
> kernel client should have everything needed to implement client-side
> enforced quotas (just like the current fuse client).  That branch
> contains code that will create a new realm whenever a client sets a
> quota xattr, and the clients will be updated with this new realm.
> 
> My first question would be: is there something on the kernel client to
> handle these realms (snaprealms) that is still missing?  As far as I
> could understand from reading the code there's nothing missing -- it
> should be possible to walk through the realm hierarchy as the kernel
> client will always get the updated realm hierarchy from the MDS -- both
> for snapshots and for these new 'quota realms'.  Implementing a 'quota
> realms' PoC based on the RFC I sent out a few weeks ago shouldn't take
> too long.  Or is there something obvious that I'm missing?
> 

For maintaining the realm hierarchy on the kclient, nothing is missing.

Regards
Yan, Zheng

> Now, the 2nd (big!) question is how to proceed.  Or, to be more clear,
> what are the expectations :-) My understanding was that John Spray would
> like to see client-side quota enforcement as an initial step, and then
> have everything else added on top of it.  But I'm afraid that this would
> introduce complexity for future releases -- for example, if in the
> future we have cluster-side enforced quotas (voucher-based or other),
> I guess that the kernel clients would be required to support both
> scenarios => maintenance burden.  Not to mention migrating clusters
> between different quota implementations.
> 
> My personal preference would be to stay away from client quotas.  That's
> obviously the best short-term solution but not necessarily the best in
> the long run.
> 
> Thoughts?
> 
> [1] https://github.com/ukernel/ceph/tree/wip-cephfs-quota-realm
> 
> Cheers,
> -- 
> Luis



* Re: cephfs quotas
  2017-12-12  2:27   ` Yan, Zheng
@ 2017-12-12  9:13     ` Luis Henriques
  0 siblings, 0 replies; 14+ messages in thread
From: Luis Henriques @ 2017-12-12  9:13 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: Jan Fajerski, Sage Weil, John Spray, Patrick Donnelly, ceph-devel

"Yan, Zheng" <zyan@redhat.com> writes:

>> On 12 Dec 2017, at 00:52, Luis Henriques <lhenriques@suse.com> wrote:
>> 
>> Hi,
>> 
>> [ and sorry for hijacking this old thread! ]
>> 
>> Here's a write-up of what I was saying earlier on the cephfs standup:
>> 
>> Basically, by using the ceph wip-cephfs-quota-realm branch[1], the
>> kernel client should have everything needed to implement client-side
>> enforced quotas (just like the current fuse client).  That branch
>> contains code that will create a new realm whenever a client sets a
>> quota xattr, and the clients will be updated with this new realm.
>> 
>> My first question would be: is there something on the kernel client to
>> handle these realms (snaprealms) that is still missing?  As far as I
>> could understand from reading the code there's nothing missing -- it
>> should be possible to walk through the realm hierarchy as the kernel
>> client will always get the updated realm hierarchy from the MDS -- both
>> for snapshots and for these new 'quota realms'.  Implementing a 'quota
>> realms' PoC based on the RFC I sent out a few weeks ago shouldn't take
>> too long.  Or is there something obvious that I'm missing?
>> 
>
> For maintaining the realm hierarchy on the kclient, nothing is missing.

Awesome, that was my understanding as well.  Thanks a lot for
confirming!

Cheers,
-- 
Luis

