linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Fasheh <mfasheh@suse.de>
To: Jan Schmidt <list.btrfs@jan-o-sch.net>
Cc: linux-btrfs@vger.kernel.org, Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH 2/3] btrfs: extended inode refs
Date: Mon, 6 Aug 2012 16:31:41 -0700	[thread overview]
Message-ID: <20120806233141.GB3141@wotan.suse.de> (raw)
In-Reply-To: <4FF6FCCB.4050505@jan-o-sch.net>

On Fri, Jul 06, 2012 at 04:57:15PM +0200, Jan Schmidt wrote:
> Thought about this search_done once again, I'd like to repeat our May's
> conversation:
> 
> On Fri, May 04, 2012 at 01:12 (+0200), Mark Fasheh wrote:
> > > You moved this comment and assignment out of the "if (ret == 0)" case.
> > > I'm not sure if this is still doing exactly the same now (?).
> > > Previously, we were executing another btrfs_search_slot,
> > > btrfs_lookup_dir_index_item, ... after the "goto again" case, which
> > > would be skipped with this patch.
> > Hmm, ok you're definitely right that the search_done line there is broken.
> > Come to think of it, I'm not quite sure what the meaning of that tiny bit of
> > code was. I'll come back to this one once I've looked closer.
> 
> What's the result of looking closer?

Ok so to describe what, to the best of my understanding,
add_inode_ref() does before my patch (corrections are appreciated):

add_inode_ref() is iterating through a backref found in the tree log. As it
iterates though each log tree backref item it checks the following against
the subvolume tree:

1) Does a backref item with the logged key exist in the subvolume tree? If so
   each individual backref must be checked to make sure it exists in the log.
   If not, it is removed.

2) Does a directory index item for the log refs name exist in the parent? Remove
   it.

3) Does a directory item exist for the log refs name in the parent? Remove
   it.

The 'search_done' variable is set the first time step 1 is executed. We
never need to execute that step more than once since the key we're
processing never changes (so we'd always pull up the same ref item) and we
fully process the item we get from it the 1st time.

I'm not sure why Steps 2 and 3 are also skipped though if we found subvolume
refs. It seems to me that we would want to do those checks on every log
ref. In the case that the subvolume has no existing refs we'd certainly
exectute that once for every log ref.

Also to note is that the condition for step 1 either hits the first time we
execute or we'll never hit it. There's nothing in the code there that
*adds* a ref item to the subtree so if it exists we'll get it otherwise
we'll never see it.


So in order to preserve this behavior, I'll update the patch so that
search_done is set within the two blocks which look over (extended and
traditional) refs found in the subvolume. The goto can stay in the same
place, as can all the labels.


How does that sound?


> >  	/* look for a conflicting sequence number */
> >  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> > -					 btrfs_inode_ref_index(eb, ref),
> > -					 name, namelen, 0);
> > +					 ref_index, name, namelen, 0);
> >  	if (di && !IS_ERR(di)) {
> >  		ret = drop_one_dir_item(trans, root, path, dir, di);
> >  		BUG_ON(ret);
> > @@ -932,17 +1053,25 @@ again:
> >  
> >  insert:
> >  	/* insert our name */
> > -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> > -			     btrfs_inode_ref_index(eb, ref));
> > +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
> >  	BUG_ON(ret);
> >  
> >  	btrfs_update_inode(trans, root, inode);
> >  
> >  out:
> > -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> > +	ref_ptr = (unsigned long)(ref_ptr + ref_struct_size) + namelen;
> >  	kfree(name);
> > -	if (ref_ptr < ref_end)
> > +	if (ref_ptr < ref_end) {
> > +		if (log_ref_ver) {
> > +			ret = extref_get_fields(eb, slot, &namelen, &name,
> > +						&ref_index, NULL);
> > +		} else {
> > +			ret = ref_get_fields(key, eb, slot, &namelen, &name,
> > +					     &ref_index, NULL);
> > +		}
> > +		BUG_ON(ret);
> 
> We return ret above and BUG_ON ret, here. Is that on purpose? May make sense, I
> just don't see the difference immediately.

Ahh, the first block we can return safely from since the function has not
done work yet. At this point though we've started the operation and it can
not be unrolled. The reason I say this "can not be unrolled" is because
almost every other call in between this and the start of the function has a
"BUG_ON(ret);" after it. As that was the case it seemed to make sense to put
this there.


> 
> >  		goto again;
> > +	}
> >  
> >  	/* finally write the back reference in the inode */
> >  	ret = overwrite_item(trans, root, path, eb, slot, key);
> > @@ -965,25 +1094,52 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
> >  	return ret;
> >  }
> >  
> > +static int count_inode_extrefs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> > +{
> > +	int ret;
> > +	int name_len;
> > +	unsigned int nlink = 0;
> > +	u32 item_size;
> > +	u32 cur_offset = 0;
> > +	u64 inode_objectid = btrfs_ino(inode);
> > +	u64 offset = 0;
> > +	unsigned long ptr;
> > +	struct btrfs_inode_extref *extref;
> > +	struct extent_buffer *leaf;
> >  
> > -/*
> > - * There are a few corners where the link count of the file can't
> > - * be properly maintained during replay.  So, instead of adding
> > - * lots of complexity to the log code, we just scan the backrefs
> > - * for any file that has been through replay.
> > - *
> > - * The scan will update the link count on the inode to reflect the
> > - * number of back refs found.  If it goes down to zero, the iput
> > - * will free the inode.
> > - */
> > -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> > -					   struct btrfs_root *root,
> > -					   struct inode *inode)
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> > +					    &extref, &offset);
> > +		if (ret)
> > +			break;
> 
> Still looking strange. We should ask harder for an answer here.
> 
> On Fri, May 04, 2012 at 01:12 (+0200), Mark Fasheh wrote:
> > > Assume the first call to btrfs_find_ione_extref returns -EIO. Do we
> > > really want count_inode_extrefs return 0 here? I agree that the previous
> > > code suffers from the same problem, but still: it's a problem.
> > Yeah as you note, I'm just keeping the same behavior as before. This I think
> > is probably a question for Chris...
> 
> To me it seems the best choice would be to return a negative value on error and
> check for that in the caller.

There's no good back out as far as I can tell. You really want this changed
so I went ahead and did it your way. What will happen now is that the
BUG_ON(ret) in fixup_inode_link_counts() will get triggered. This diverges
from how count_inode_refs() handles the error (it doesn't). I don't think changing
count_inode_refs() is in the scope of a patch to introduce extended refs so
I did not touch it.

IMHO, the problem of handling errors where it's very difficult to back out
is bigger than this one tiny function, and I don't want to try to solve it
any more than what we've done already.


Btw, the rest of your review comments have been addressed AFAICT. I should be
sending patches to the list very soon for more review :)

Thanks,
	--Mark

--
Mark Fasheh

  reply	other threads:[~2012-08-06 23:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-21 21:46 [PATCH 0/3] btrfs: extended inode refs Mark Fasheh
2012-05-21 21:46 ` [PATCH 1/3] " Mark Fasheh
2012-07-06 14:56   ` Jan Schmidt
2012-07-06 15:14     ` Stefan Behrens
2012-07-09 19:05     ` Mark Fasheh
2012-07-09 20:33     ` Mark Fasheh
2012-05-21 21:46 ` [PATCH 2/3] " Mark Fasheh
2012-07-06 14:57   ` Jan Schmidt
2012-08-06 23:31     ` Mark Fasheh [this message]
2012-05-21 21:46 ` [PATCH 3/3] " Mark Fasheh
2012-07-06 14:57   ` Jan Schmidt
2012-07-09 20:24     ` Mark Fasheh
  -- strict thread matches above, loose matches on Subject: below --
2012-08-08 18:55 [PATCH 0/3] " Mark Fasheh
2012-08-08 18:55 ` [PATCH 2/3] " Mark Fasheh
2012-08-15 15:04   ` Jan Schmidt
2012-08-15 17:59     ` Mark Fasheh
2012-04-05 20:09 [PATCH 0/3] " Mark Fasheh
2012-04-05 20:09 ` [PATCH 2/3] " Mark Fasheh
2012-04-12 13:08   ` Jan Schmidt
2012-05-03 23:12     ` Mark Fasheh
2012-05-04 11:39       ` David Sterba
2012-04-12 15:53   ` Jan Schmidt
2012-05-01 18:39     ` Mark Fasheh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120806233141.GB3141@wotan.suse.de \
    --to=mfasheh@suse.de \
    --cc=chris.mason@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=list.btrfs@jan-o-sch.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).