public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Gabriel Niebler <gniebler@suse.com>
To: Nikolay Borisov <nborisov@suse.com>, linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com
Subject: Re: [PATCH v2] btrfs: Turn delayed_nodes_tree into an XArray
Date: Thu, 14 Apr 2022 11:26:29 +0200	[thread overview]
Message-ID: <3a4a506f-e604-9847-2844-810b53223ef1@suse.com> (raw)
In-Reply-To: <c3347f72-0737-38bc-0fd7-70fa7bb272b2@suse.com>

Am 13.04.22 um 15:34 schrieb Nikolay Borisov:
> On 12.04.22 г. 15:35 ч., Gabriel Niebler wrote:
>> […]
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index 748bf6b0d860..89e1c39c2aef 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -78,7 +78,7 @@ static struct btrfs_delayed_node 
>> *btrfs_get_delayed_node(
>>       }
>>       spin_lock(&root->inode_lock);
>> -    node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
>> +    node = xa_load(&root->delayed_nodes, (unsigned long)ino);
> 
> Isn't this problematic, because  you'd be truncating the ino in case we 
> have a number which is larger than unsigned long. For architectures 
> which use long as a 64 bit type this cast is a noop since u64 is defined 
> as unsigned long (via typedef unsigned long __u64; typedef __u64 u64). 
> But on arches which use long long for their 64bit type then this would 
> truncate the ino number, so I think this cast is redundant. In any case 
> the code is broken for u64 for arches which use long long and as the 
> author of xarray put it just now:
> 
> <willy> yup.  It's not a great data structure for storing u64s in
> 
> However, looking at the existing radix_tree_insert the index is also an 
> unsigned long. TLDR is remove the cast.

Right. This is something I wasn't too sure of myself and I lacked the 
deeper understanding to make a better informed decision here, so thank 
you for your insight. I will remove the cast.

In particular, the fact that radix trees already have this problem was 
something I missed (even though it's kind of clear that they must, as 
it's the same data structure under the hood, but I didn't look too 
closely at the functions and the fact they take an unsigned long in the 
same places as the XArray equivalents).

This will also play a role in upcoming patches where I often did the 
same cast, but now I know that I don't need it. If it worked before, it 
will work after.

>> @@ -1870,29 +1863,32 @@ void btrfs_kill_delayed_inode_items(struct 
>> btrfs_inode *inode)
>>   void btrfs_kill_all_delayed_nodes(struct btrfs_root *root)
>>   {
>> -    u64 inode_id = 0;
>> +    unsigned long index;
>> +    struct btrfs_delayed_node *delayed_node;
>>       struct btrfs_delayed_node *delayed_nodes[8];
>>       int i, n;
>>       while (1) {
>>           spin_lock(&root->inode_lock);
>> -        n = radix_tree_gang_lookup(&root->delayed_nodes_tree,
>> -                       (void **)delayed_nodes, inode_id,
>> -                       ARRAY_SIZE(delayed_nodes));
>> -        if (!n) {
>> +        if (xa_empty(&root->delayed_nodes)) {
>>               spin_unlock(&root->inode_lock);
>>               break;
>>           }
>> -        inode_id = delayed_nodes[n - 1]->inode_id + 1;
>> -        for (i = 0; i < n; i++) {
>> +        n = 0;
>> +        xa_for_each_start(&root->delayed_nodes, index,
>> +                  delayed_node, index) {
> 
> Here index is used not only as the index of the current entry but also 
> as the "start from this index". But you are not actually initializing it 
> so index has garbage on the first iteration. If you want to start from 
> the 0th index then you should initialize it when defining it.

Whoops, what an oversight. Good catch! Thanks for pointing this out! 
Will fix.

>>               /*
>>                * Don't increase refs in case the node is dead and
>>                * about to be removed from the tree in the loop below
>>                */
>> -            if (!refcount_inc_not_zero(&delayed_nodes[i]->refs))
>> -                delayed_nodes[i] = NULL;
>> +            if (!refcount_inc_not_zero(&delayed_node->refs))
>> +                delayed_nodes[n] = NULL;
> 
> This is broken. What the original code did was query up to 8 delayed 
> nodes at a time and and then for every node which got written into the 
> delayed_nodes array it incremented the refcount so that later the nodes 
> can be destroyed outside of spin_unlock. In your code you don't write 
> the node into the local delayed_nodes array, meaning the for() loop 
> never frees anything.

You're right, of course. All that's missing is a
             delayed_nodes[n] = delayed_node;
before the `if`, but without that it doesn't work. I don't know how I 
missed that one. Again, thanks for noticing! Will fix.


  reply	other threads:[~2022-04-14  9:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12 12:35 [PATCH v2] btrfs: Turn delayed_nodes_tree into an XArray Gabriel Niebler
2022-04-13 13:34 ` Nikolay Borisov
2022-04-14  9:26   ` Gabriel Niebler [this message]
2022-04-13 14:14 ` David Sterba
2022-04-14  9:39   ` Gabriel Niebler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3a4a506f-e604-9847-2844-810b53223ef1@suse.com \
    --to=gniebler@suse.com \
    --cc=dsterba@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox