From: Theodore Ts'o <tytso@mit.edu>
To: Li Xi <pkuelelixi@gmail.com>
Cc: Shuichi Ihara <sihara@ddn.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"hch@infradead.org" <hch@infradead.org>, Jan Kara <jack@suse.cz>,
Andreas Dilger <adilger@dilger.ca>,
"Niu, Yawei" <yawei.niu@intel.com>
Subject: Re: [PATCH v2 0/4] quota: add project quota support
Date: Mon, 11 Aug 2014 09:48:36 -0400
Message-ID: <20140811134836.GA3506@thunk.org>
In-Reply-To: <CAPTn0cBGU+bsBc6DumassOyw0HuDP2tAetFaoAOV4TFw21Ud4A@mail.gmail.com>

On Mon, Aug 11, 2014 at 06:23:53PM +0800, Li Xi wrote:
> As a distributed file system, Lustre is able to use hundreds of separate
> ext4 file systems to store its data as well as its metadata, yet it
> provides a unified global name space. Some users have started using SSD
> devices for better performance on Lustre. However, as you would expect,
> they might want to replace only some of the drives with SSDs, since SSDs
> are expensive. That means part of the ext4 file systems use SSDs while
> the rest use hard disks. From Lustre's point of view, users can choose to
> locate files on SSDs or hard disks using Lustre's features, namely
> 'stripe' and 'OST pool'. Here comes the problem: how do we limit the
> usage of SSD space, given that every end user badly wants good
> performance?

Ext4 quotas are per-disk, and storage technologies are per-disk. So if
*I* were designing a clustered file system, and we had different cost
centers, say "mail", "maps", "social", and "search", each of which might
have different amounts of disk drive and SSD space, based on how much SSD
each product area's budget is willing to pay for and what the
requirements of each product might be, I'd simply assign a different
group to each of these cost centers.

For most uses of clustered file systems, you don't want to do quota
enforcement. If you've spent tens or hundreds of CPU years working on
some distributed computation, you don't want to throw it all away due to
a quota failure. And if you are running an international web-based
service, causing even a partial outage of everyone's maps or e-mail
because of a quota failure is also considered, well, not cool.

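Concretely, each ext4 back-end file system only needs ordinary group
quota *accounting*, with no limits ever set. Using the classic quota
files, that is roughly the following (the device and mount point are only
examples):

    mount -o grpquota /dev/sdb1 /data
    quotacheck -cg /data
    quotaon -g /data
    repquota -g /data      # per-group block and inode usage

Since no limits are ever set with setquota or edquota, nothing is
enforced; repquota just gives you the per-group numbers to feed into the
aggregation described below.
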
So let's assume that you're only doing usage tracking. But even if you
wanted to do usage control, the files will be scattered across many
different servers and file systems, so it doesn't make sense to do quota
control, or even usage tracking, on a disk-by-disk basis. Hence, the
clustered file system will have to sum up the quota usage of each
underlying file system, with separate sums for the HDDs and the SSDs, by
group. Fortunately, Map Reduce is your friend.

Then for each group the cluster file system can report usage of HDD and
SSD space and inodes separately. When a project gets within a few
terabytes of being full, or the overall free space in the cluster drops
below a few petabytes, you page your SRE or devops team so they can take
care of things, perhaps by negotiating an emergency quota increase,
moving files around, deleting old files, etc.

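To make that reduce step concrete, here is a minimal sketch -- this is
not Lustre code, and the record format, field names, and the per-group
budgets table are made up for illustration:

    # Each server emits one record per group per local ext4 file system,
    # tagged with its storage class ("hdd" or "ssd"); the cluster sums
    # them and flags groups that are getting close to their budget.
    from collections import defaultdict

    def aggregate(reports):
        # reports: iterable of dicts such as
        #   {"group": "mail", "storage": "ssd", "blocks": 1234, "inodes": 56}
        totals = defaultdict(lambda: {"blocks": 0, "inodes": 0})
        for r in reports:
            key = (r["group"], r["storage"])   # separate sums for HDD and SSD
            totals[key]["blocks"] += r["blocks"]
            totals[key]["inodes"] += r["inodes"]
        return totals

    def needs_attention(totals, budgets, slack_blocks):
        # Page the SRE/devops team when a group gets within slack_blocks
        # of its budget for that storage class.
        return [key for key, used in totals.items()
                if key in budgets and budgets[key] - used["blocks"] < slack_blocks]
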
The bottom line is that you *can* run an exabyte+ cluster file system
supporting many different budget/cost centers with only group-level
quotas and nothing else. And you can do this even while supporting both
HDDs and SSDs, with separate quota tracking for the two storage
technologies.

Can you go into more detail about how Lustre would use project quotas
from a cluster-file-system-centric perspective, such as the one I've
sketched out above?

> Of course, we might be able to find some workarounds using group quotas.
> However, because the owners of the files can change the group attribute
> freely, it is easy for users to evade the group quota and steal the
> scarce resources.

But all of the users will be sending their chgrp requests through Lustre,
or whatever the cluster file system is. So Lustre can enforce whatever
permissions policy it would like.

> For example, in order to steal SSD space, a user can just create the
> files using the specific group ID and then change it back.

But since you've been arguing that the project id should be preserved
across renames, they can evade the quota by doing:

    touch /product/mail/huge_file
    mv /product/mail/huge_file /product/maps

And if you allow the rename, and allow the project id to be preserved
across the rename, then the quota evasion is just as easy. And yes, you
could prevent such renames at the cluster file system level. But the
question remains what makes sense on a single-disk system, and if users
can trivially subvert the project quota by creating a file in one
directory, where it inherits the quota of project A, and then move the
file to another directory, they have evaded quota enforcement just as
surely as if they had used chgrp.

Hence, to prevent this, you need to restrict project id changes to the
administrator/superuser, *and* disallow renames across project
hierarchies. And surprise! That looks exactly like what XFS has built.

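For reference, the XFS setup for that model looks roughly like the
following (the project name, id, and paths are only examples): the file
system is mounted with -o prjquota, the project tree is declared in
/etc/projects and /etc/projid, and xfs_quota does the rest:

    # /etc/projects:  42:/srv/ssd/maps
    # /etc/projid:    maps:42
    xfs_quota -x -c 'project -s maps' /srv/ssd
    xfs_quota -x -c 'limit -p bhard=10t maps' /srv/ssd
    xfs_quota -x -c 'report -p' /srv/ssd

With the project id and inherit flag set on the tree, reassigning files
to another project is an administrative operation, and a rename that
would cross a project boundary fails with EXDEV, so the evasion above
doesn't work.
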
Cheers,
- Ted