From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kara Date: Sat, 25 Oct 2008 00:28:24 +0200 Subject: [Ocfs2-devel] [PATCH] ocfs2: Add quota calls for allocation and freeing of inodes and space In-Reply-To: <20081022173149.GD27819@mail.oracle.com> References: <1224523443806-git-send-email-jack@suse.cz> <20081021233446.GI2871@mail.oracle.com> <20081022144900.GK13868@duck.suse.cz> <20081022173149.GD27819@mail.oracle.com> Message-ID: <20081024222824.GD10038@duck.suse.cz> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: ocfs2-devel@oss.oracle.com On Wed 22-10-08 10:31:49, Joel Becker wrote: > On Wed, Oct 22, 2008 at 04:49:00PM +0200, Jan Kara wrote: > > On Tue 21-10-08 16:34:46, Joel Becker wrote: > > > On Mon, Oct 20, 2008 at 07:23:58PM +0200, Jan Kara wrote: > > > > Add quota calls for allocation and freeing of inodes and space, also update > > > > estimates on number of needed credits for a transaction. Move out inode > > > > allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called > > > > outside of a transaction. > > > > > > > > Signed-off-by: Jan Kara > > > > @@ -5954,6 +5955,8 @@ static int ocfs2_do_truncate(struct ocfs2_super *osb, > > > > goto bail; > > > > } > > > > > > > > + vfs_dq_free_space_nodirty(inode, > > > > + ocfs2_clusters_to_bytes(osb->sb, clusters_to_del)); > > > > spin_lock(&OCFS2_I(inode)->ip_lock); > > > > OCFS2_I(inode)->ip_clusters = le32_to_cpu(fe->i_clusters) - > > > > clusters_to_del; > > > > > > In this one instance, you don't clean up the quota change on > > > error, even though clusters_to_del might remain allocated. How come? > > Good point. Is there a way to find out how many clusters were actually > > truncated before we failed? > > Not quite sure, really. And this brings up the second question. > > > > On that note, I don't see us a) fixing the inode up on failure > > > or b) writing out the new values. Now, I remember from the past that we > > > allow a smaller i_clusters than actually in the tree, because our alloc > > > code can handle that. So I'm sure that the fallout from us not fixing > > > up i_clusters and/or writing it is safe. > > > But I wonder what the right behavior is with regards to quota. > > Well, quota usage must be always in sync with i_clusters - or better > > i_blocks - because on chown or chgrp, that is what you're going to transfer > > from one user to another one. > > As a side note, I've actually noticed a few paths (even for cases of > > ENOSPC) in OCFS2 where if we fail during allocation, we actually leave > > behind allocated blocks not attached anywhere (i.e., only fsck can reclaim > > them). I discussed it with Mark and it's not completely trivial to fix. So > > I've worked it around so that quota allocation is done before allocating > > any blocks and I also tried to take care that at least quota stays correct > > after a failure. > > Right, there are a couple places like this, and the second > question is what matters for quota - what is supposed to be allocated > (i_clusters) or what is really allocated (only different from i_clusters > in these corner cases). > In that truncate case above, i_clusters on the ocfs2_dinode will > stay at the "supposed to be truncated" number, but journal_dirty() isn't > called. If nothing else happens to that buffer, it won't be written. > However, if it is written out due to another operation, the smaller > i_clusters will appear on disk. This is a safe behavior from the view > of the file (truncate happened or did not). The fact that the > allocation remains in both cases is something we silently handle in the > allocation paths and clean up in fsck when we spot it. Well, this is a bit difficult, but let's say that while the filesystem is consistent with what is says to quota, quota is happy. Probably i_clusters are more important to quota because they're used on chown etc. So fs should make sure it tells quota about space changes exactly as it changes i_clusters. Anyway, the situation when inode buffer with i_clusters change may or may not be written to disk seems icky. I think that should be somehow fixed to deterministic behavior, preferably so that i_clusters always match real usage ;) I agree that error handling of these cases is nasty though and in cases of IO errors or so we cannot guarantee anything. Honza -- Jan Kara SUSE Labs, CR