From: Li Zefan <lizf@cn.fujitsu.com>
To: kreijack@libero.it
Cc: Chris Mason <chris.mason@oracle.com>, linux-btrfs@vger.kernel.org
Subject: Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Date: Wed, 15 Dec 2010 11:33:33 +0800 [thread overview]
Message-ID: <4D08370D.6020705@cn.fujitsu.com> (raw)
In-Reply-To: <201012141916.41682.kreijack@libero.it>
Goffredo Baroncelli wrote:
> On Tuesday, 14 December, 2010, Li Zefan wrote:
>> Goffredo Baroncelli wrote:
>>> Hi Li,
>>>
>>> On Monday, 13 December, 2010, Li Zefan wrote:
>>>> The keys returned by tree search ioctl should be restricted to:
>>>>
>>>> key.objectid = [min_objectid, max_objectid] &&
>>>> key.offset = [min_offset, max_offset] &&
>>>> key.type = [min_type, max_type]
>>>>
>>>> But actually it returns those keys:
>>>>
>>>> [(min_objectid, min_type, min_offset),
>>>> (max_objectid, max_type, max_offset)].
>>>>
>>> I have to admit that I had need several minutes to understand what you
> wrote
>>> :). Then I came to conclusion that the tree search ioctl is basically
> wrong.
>>> IMHO, the error in this API is to use the lower bound of the acceptance
>>> criteria (the min_objectid, min_offset, min_type fields) also as starting
>>> point for the search.
>>>
>>> Let me explain with an example.
>>>
>>> Suppose to want to search all the keys in the range
>>>
>>> key.objectid = 10..20
>>> key.offset = 100..200
>>> key.type = 2..5
>>>
>>>
>>> Suppose to set sk->nr_items to 1 for simplicity, and the keys available
> which
>>> fit in the range are
>>>
>>> 1) [15,150,3]
>>> 2) [16,160,4]
>>> 3) [17,180,3]
>>>
>>> All these key satisfy the "acceptance criteria", but because we have to
>>> restart the search from the last key found, the code should resemble
>>>
>>> sk = &args.key
>>>
>>> sk->min_objectid=10; sk->max_objectid=20
>>> sk->min_offset=100; sk->max_offset=200
>>> sk->min_type=2; sk->max_type=5
>>> sk->nr_items = 1;
>>>
>>> while(1){
>>> ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
>>> if( !sk->nr_items )
>>> break
>>>
>>> for(off = 0, i=0 ; i < sk->nr_items ; i ){
>>> sh = (struct btrfs_ioctl_search_header *)(args.buf
>>> off);
>>>
>>> [...]
>>> sk->min_objectid = sh->objectid;
>>> sk->min_offset = sh->offset;
>>> sk->min_type = sh->type;
>>> }
>>>
>>> <increase the sk->min_* key of 1>
>>>
>>> }
>>>
>>> But in this case, the code after found the key #2, sets the minimum
> acceptance
>>> criteria to [16,160,4], which exclude the key #3 because min_type is too
> high.
>>> Ideally, we should add three new field to the search key structure:
>>>
>>> sk->start_objectid
>>> sk->start_offset
>>> sk->start_type
>>>
>>> And after every iteration the code (even the kernel code) should set these
>>> fields as "last key found 1", leaving the min_* fields as they are.
>>>
>>> My analysis is correct or I miss something ?
>>>
>> After looking more deeply, I found the ioctl was changed in this way
>> on purpose, to support "btrfs subvolume find-new" specifically.
>>
>> See this commit:
>>
>> commit abc6e1341bda974e2d0eddb75f57a20ac18e9b33
>> Author: Chris Mason <chris.mason@oracle.com>
>> Date: Thu Mar 18 12:10:08 2010 -0400
>>
>> Btrfs: fix key checks and advance in the search ioctl
>>
>> The search ioctl was working well for finding tree roots, but using it
> for
>> generic searches requires a few changes to how the keys are advanced.
>> This treats the search control min fields for objectid, type and offset
>> more like a key, where we drop the offset to zero once we bump the type,
>> etc.
>>
>> The downside of this is that we are changing the min_type and min_offset
>> fields during the search, and so the ioctl caller needs extra checks to
> make
>> the keys in the result are the ones it wanted.
>>
>> This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
>> things more readable.
>>
>> So I think we can just fix the btrfs tool. Though adding sk->start_xxx
> should
>> also be able to meet the needs for "btrfs subvolume find-new".
>
> Sorry, but I have to disagree. This API seems to me simply bugged. The example
> above (which is quite generic) highlights this fact. But I can provide a more
> real case: suppose to use the BTRFS_IOC_TREE_SEARCH ioctl to find the new
> files. We are interested to the following items:
>
> - BTRFS_EXTENT_DATA_KEY (type = 1)
> - BTRFS_INODE_ITEM_KEY (type = 24)
> - BTRFS_XATTR_ITEM_KEY (type = 108)
>
> Acceptance criteria:
>
> min_type = 1
> max_type = 108
> min_offset = 0
> max_offset = ~0
> min_objectid = 0
> max_objectid = ~0
> min_transid = <the base generation number>
>
> Pay attention that we aren't interested in the offset.
>
> Suppose to have the following sequence keys [objectid, type, offset]:
>
> [...]
> 1) [300, BTRFS_EXTENT_DATA_KEY, xx]
> 2) [300, BTRFS_INODE_ITEM_KEY, xx]
> 3) [300, BTRFS_XATTR_ITEM_KEY, xx]
> 4) [301, BTRFS_EXTENT_DATA_KEY, xx]
> 5) [301, BTRFS_INODE_ITEM_KEY, xx]
> 7) [30200, BTRFS_EXTENT_DATA_KEY, xx]
> 8) [30200, BTRFS_INODE_ITEM_KEY, xx]
> 9) [30200, BTRFS_XATTR_ITEM_KEY, xx]
> [...]
>
>
> Suppose that the buffer is filled between the item 2 and 3. We should restart
> the search, but how set the min_* key ? Try the following hypothesis
>
> h1) objectid++, type = 0 -> In the next search the key 3 would be skipped
> h2) objectid asis, type ++, -> in the next search the key 4 would be skipped
> h3) objectid asis, type = 0 -> in the next search the key 1,2,3 would be
h4) objectid asis, type asis, offset++ -> we should get the correct result.
because the current ioctl uses min_{x,y,z} and max_{x,y,z} as start_key and
end_key, and it returns all keys that falls in [start_key, end_key].
So this btrfs-progs patch should fix missing subvolumes in the output of
"subvolume list":
diff --git a/btrfs-list.c b/btrfs-list.c
index 93766a8..1b9ea45 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -620,7 +620,10 @@ int list_subvols(int fd)
/* this iteration is done, step forward one root for the next
* ioctl
*/
- if (sk->min_objectid < (u64)-1) {
+ if (sk->min_type < BTRFS_ROOT_BACKREF_KEY) {
+ sk->min_type = BTRFS_ROOT_BACKREF_KEY;
+ sk->min_offset = 0;
+ } else if (sk->min_objectid < (u64)-1) {
sk->min_objectid++;
sk->min_type = BTRFS_ROOT_BACKREF_KEY;
sk->min_offset = 0;
> returned a second time...
>
> Pay attention that every inode may have more key type BTRFS_XATTR_ITEM_KEY or
> type BTRFS_EXTENT_DATA_KEY, so it is not possible to know in advance when the
> buffer is filled.
>
> Only as theoretical exercise, we can improve the search logic in userspace so
> when an item is returned, in the next search we set the minimum type as
> previous type+1, and the *maximum* objectid as the latest ofound bject id.
> When we are sure that there are not more key with this objectid we can reuse
> the old max_objectid and min_type... But to me it seems very fragile.
>
> Chris what do you think ? Otherwise I missed something this seems a severe bug
> in the api ?
>
> In another email I will propose a patch which may address this problem.
>
next prev parent reply other threads:[~2010-12-15 3:33 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-13 9:47 [PATCH 1/3] Btrfs: Really return keys within specified range Li Zefan
2010-12-13 9:50 ` [PATCH 2/3] Btrfs: Don't return items more than user specified Li Zefan
2010-12-13 9:50 ` [PATCH 3/3] Btrfs: Clean up tree search ioctl code Li Zefan
2010-12-13 18:13 ` Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range] Goffredo Baroncelli
2010-12-14 5:37 ` Li Zefan
2010-12-14 18:16 ` Goffredo Baroncelli
2010-12-15 3:33 ` Li Zefan [this message]
2010-12-15 6:53 ` Goffredo Baroncelli
2010-12-15 7:13 ` Li Zefan
2010-12-15 18:48 ` Goffredo Baroncelli
2010-12-16 1:03 ` Li Zefan
2010-12-15 16:14 ` Chris Mason
2010-12-15 18:42 ` Goffredo Baroncelli
2010-12-15 18:51 ` Chris Mason
2010-12-15 19:13 ` Goffredo Baroncelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D08370D.6020705@cn.fujitsu.com \
--to=lizf@cn.fujitsu.com \
--cc=chris.mason@oracle.com \
--cc=kreijack@libero.it \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.