From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Monakhov
Subject: Re: [PATCH 2/3] quota: introduce get_id callback
Date: Wed, 3 Feb 2010 22:54:15 +0300
References: <1265122826-5370-1-git-send-email-dmonakhov@openvz.org>
 <1265122826-5370-2-git-send-email-dmonakhov@openvz.org>
 <1265122826-5370-3-git-send-email-dmonakhov@openvz.org>
 <20100202160509.GI7056@quack.suse.cz>
 <874olyg9q0.fsf@openvz.org>
 <20100203104754.GB3216@quack.suse.cz>
In-Reply-To: <20100203104754.GB3216@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org
Sender: linux-fsdevel-owner@vger.kernel.org

On Wed, Feb 3, 2010 at 1:47 PM, Jan Kara wrote:
> On Wed 03-02-10 08:38:15, Dmitry Monakhov wrote:
>> Jan Kara writes:
>>
>> > On Tue 02-02-10 18:00:24, Dmitry Monakhov wrote:
>> >> During some quota operations we have to determine quota_id for given inode
>> >> according to quota_type. But only USRQUOTA/GRPQUOTA id are immediately
>> >> accessible from generic vfs-inode. This patch introduces new per_sb quota
>> >> operation for this purpose.
>> >   Hmm, but you do not intend to ever change what is returned for USRQUOTA
>> > and GRPQUOTA, do you? So we could just have something like
>> Hmm... In fact I've considered this option. For example:
>> In case of containers(trees), each container administrator wants
>> user/group quota to work inside its container.
>> I've considered the following approach:
>> 1) enlarge qid_t to u64
>> 2) encode quota_uid and group_uid like follows:
>>    quid = treeid << 32 + uid
>>    qgid = treeid << 32 + gid
>> 3) Introduce new 64-bit quota format file to support wide qid_t.
>>
>> Currently I don't know a better way to support user/group quota
>> inside tree. It does not affect old fs-internal code, just replace
>> all hard-coded (int => u64) in fs/quota-XXX. Old 32-bit quota users
>> not affected because qid_t will be shrunk on quota-save for
>> old(most of) users.
>  I see. But from what you write it seems to me that actually you'd like
> a separate filesystem for each container - you'll get a separate quota
> files for each container (so no need for id mapping) and a natural total
> limitation of how much the container can use (the filesystem size).
>  Now I understand that having really a separate filesystem for each
> container is impractical when you want to change sizes of each container
> and also the overhead of separate filesystem might be too big. But I'd like
> to understand your needs... Because it might be feasible to introduce
> a support for lightweight "subfilesystems" of a filesystem if that would
> solve your case...

Sorry for the long response.
Some weeks ago I prepared a paper about the quota-tree feature, together
with a patch queue:
http://2ka.mipt.ru/~mov/quota.html
That patch queue is now mostly obsolete and is probably of historical
interest only.

*Container*
A container is a set of resources. Each container is isolated from the
others as much as possible. A container has its own root tree.
The container's tree is exported inside the CT in one of several possible
ways (bind-mount, virtual-stack-fs, chroot).
Container roots are independent trees (subtrees of the bare-metal host
filesystem's tree). Usually they are organized as follows:
  /ct_roots/CT_${ID}/TREE_CONTENT

For simplicity you may think of a container as a secure CHROOT.
Bare-metal host file hierarchy:
  find /ct-roots
  /ct-roots/
    /ct-1/bin, etc, .....
    /ct-2/bin, etc, ....
    /ct-400/bin, etc .....
Entering the container:
  chroot /ct-roots/ct-1 /bin/bash

There are many reasons to keep these trees separate from one another
(no hardlinks):
- inode attr: If an inode has links in both trees A and B, and A's user
  calls chown() on this inode, then B's owner will be surprised. The only
  way to overcome this is to virtualize inode attributes (for each tree),
  which is madness IMHO.
- checkpoint/restore/online-backup: This is like suspend/resume for a VM,
  but in this case only the container's processes are stopped (frozen) for
  some time. After the CT's processes are stopped we may create a backup
  of the CT's tree without freezing the FS as a whole.

The only way to implement journalled quota for containers is to implement
it at the native fs level.

"Container directory tree-id" assumptions:
(1) The tree id is embedded inside the inode (inside an xattr)
(2) The tree id is inherited from the parent dir
(3) An inode cannot belong to different directory trees

In your terms, a "subfilesystem" of a filesystem:
1) is a subtree
2) includes all content starting from the subtree root
3) does not intersect with any other subfilesystem

About quota files:
It is totally impractical to use separate quota files for each container,
because each container requires 2 quota files. Recent servers allow
running about 1000 containers, so it is madness to have 2*1000 quota
files; just think about orphan-list cleanup after an unclean umount :).
That's why I want to encode tree_uid as (treeid << 32 + uid).
This allows us to use just 3 quota files (wide_user, wide_group, treeid).
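
To make the encoding concrete, here is a minimal C sketch of packing a
tree id and a uid/gid into one 64-bit quota id. It is only an
illustration under stated assumptions: wide_qid_t and the wide_qid_*
helpers are hypothetical names, not existing kernel interfaces, and tree
id 0 is assumed to mean "no tree", so that such ids still fit the old
32-bit format. Note that in C the shift has to be parenthesized, because
+ binds tighter than <<; the sketch uses | instead of +, which gives the
same result since the low 32 bits of the shifted value are zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit quota id, as in "enlarge qid_t to u64". */
typedef uint64_t wide_qid_t;

/* Pack the tree id into the upper 32 bits and the uid/gid into the lower 32. */
static inline wide_qid_t wide_qid_pack(uint32_t treeid, uint32_t id)
{
	return ((wide_qid_t)treeid << 32) | id;
}

/* Recover the tree id (upper 32 bits). */
static inline uint32_t wide_qid_tree(wide_qid_t qid)
{
	return (uint32_t)(qid >> 32);
}

/* Recover the plain uid/gid (lower 32 bits). */
static inline uint32_t wide_qid_id(wide_qid_t qid)
{
	return (uint32_t)(qid & 0xffffffffu);
}

/*
 * Assumption: tree id 0 means "not inside any tree", so the id can be
 * stored in the old 32-bit quota format without losing information.
 */
static inline bool wide_qid_fits_legacy(wide_qid_t qid)
{
	return wide_qid_tree(qid) == 0;
}
```

With this layout, wide_qid_pack(1, 1000) gives 0x1000003e8, and uid 1000
in tree 1 and uid 1000 in tree 2 get distinct entries in the same
wide_user file.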