* BTRFS_IOC_TREE_SEARCH ioctl
From: Lennart Poettering @ 2015-01-05 17:15 UTC
To: linux-btrfs

Heya,

I recently added some btrfs magic to systemd's machinectl/nspawn
tool. More specifically, it can now show the disk usage of a container
that is stored in a btrfs subvolume. For that I made use of the btrfs
quota logic. To read the current disk usage of a subvolume I took
inspiration from btrfs-progs, most specifically the
BTRFS_IOC_TREE_SEARCH ioctl(). Unfortunately, documentation for the
ioctl seems to be lacking, and there are some things about it I
fail to grok:

What precisely are the semantics of the ioctl regarding the search
key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
kinda assumed that setting them would result in only objects being
returned that are within the min/max ranges. However, that appears not
to be the case. At least the min_offset/max_offset setting appears to
be ignored?

The code I hacked up is this one:

http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/btrfs-util.c#n427

I try to read the BTRFS_QGROUP_STATUS_KEY and BTRFS_QGROUP_LIMIT_KEY
objects for the subvolume I care about. Hence I initialize .min_type
and .max_type to the two types (in the right order), and then
.min_offset and .max_offset to the subvolume id. However, the search
ioctl will still give me entries back with offsets != the subvolume id...

Is this intended behaviour of the search ioctl? If so, what's the
rationale?

My code currently invokes the search ioctl in a loop to work around
the fact that .min_offset/.max_offset don't work as I wish they
did... I wish I could get rid of this loop and the filtering out of
entries I get back that aren't in the range I specified...

Lennart
* Re: BTRFS_IOC_TREE_SEARCH ioctl
From: Hugo Mills @ 2015-01-05 18:22 UTC
To: Lennart Poettering; +Cc: linux-btrfs

On Mon, Jan 05, 2015 at 06:15:12PM +0100, Lennart Poettering wrote:
> [...]
> What precisely are the semantics of the ioctl regarding the search
> key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> kinda assumed that setting them would result in only objects being
> returned that are within the min/max ranges. However, that appears not
> to be the case. At least the min_offset/max_offset setting appears to
> be ignored?

This is an old argument. :)

Keys have three parts, so it's plausible (but, in this case, wrong)
to consider the space you're searching to be a 3-dimensional space of
(object, type, offset), which seems to be what you're expecting. A
min, max pair would then define an oblong subset of the keyspace from
which to retrieve keys.

However, that's not actually what's happening. Keys are indexed
within their tree(s) by a concatenation of the items in the key. A
key, therefore, should be thought of as a single 136-bit integer, and
the keys are lexically ordered, (object||type||offset), where "||" is
the concatenation operator. You get every key _lexically ordered_
between the min and max values. This is a superset of the
3-dimensional results above.

About 3-4 years ago, we see-sawed through several messy patches in
userspace (and at least one in the kernel) before this distinction and
difference in semantics was understood.

> The code I hacked up is this one:
>
> http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/btrfs-util.c#n427
>
> I try to read the BTRFS_QGROUP_STATUS_KEY and BTRFS_QGROUP_LIMIT_KEY
> objects for the subvolume I care about. Hence I initialize .min_type
> and .max_type to the two types (in the right order), and then
> .min_offset and .max_offset to the subvolume id. However, the search
> ioctl will still give me entries back with offsets != the subvolume id...
>
> Is this intended behaviour of the search ioctl? If so, what's the
> rationale?

Yes, it is. The rationale is that it's simply walking through the key
values in the tree linearly until the max value is found.

> My code currently invokes the search ioctl in a loop to work around
> the fact that .min_offset/.max_offset don't work as I wish they
> did... I wish I could get rid of this loop and the filtering out of
> entries I get back that aren't in the range I specified...

You'd have to do this in kernel space if you wanted the 3D semantics
instead of the concatenated semantics. There's no free lunch here. It
might be a good idea for "libbtrfs" (such as it is) to implement this,
as it's a (moderately rare) repeat request.

Hugo.

-- 
Hugo Mills             | Klytus, I'm bored. What plaything can you offer me
hugo@... carfax.org.uk | today?
http://carfax.org.uk/  |
PGP: 65E74AC0          |                   Ming the Merciless, Flash Gordon
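To make Hugo's description concrete, here is a minimal, self-contained C sketch (not the kernel's code) that compares keys as (objectid, type, offset) tuples and contrasts the lexicographic range test with the "3-dimensional box" test the original question assumed. The struct, the helper names and the type codes TYPE_A/TYPE_B are hypothetical, chosen only for illustration; the 17-byte packing merely shows why a key can be treated as one 136-bit integer (64 + 8 + 64 bits).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative item-type codes only; not taken from the kernel headers. */
#define TYPE_A 242
#define TYPE_B 244

/* A full key, components listed from most to least significant. */
struct disk_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Lexicographic comparison: objectid first, then type, then offset.
   Returns -1, 0 or 1. */
static int key_cmp(const struct disk_key *a, const struct disk_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Pack a key into 17 big-endian bytes (8 + 1 + 8 = 136 bits) so that a
   plain memcmp() yields the same order as key_cmp(). Purely illustrative:
   this is not how btrfs lays keys out on disk. */
static void key_pack(const struct disk_key *k, unsigned char out[17])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(k->objectid >> (56 - 8 * i));
    out[8] = k->type;
    for (int i = 0; i < 8; i++)
        out[9 + i] = (unsigned char)(k->offset >> (56 - 8 * i));
}

/* Compare two keys via their packed 136-bit form; returns -1, 0 or 1. */
static int packed_cmp(const struct disk_key *a, const struct disk_key *b)
{
    unsigned char pa[17], pb[17];
    key_pack(a, pa);
    key_pack(b, pb);
    int c = memcmp(pa, pb, sizeof pa);
    return (c > 0) - (c < 0);
}

/* What the search returns: keys between min and max in lexicographic order. */
static bool in_lex_range(const struct disk_key *k,
                         const struct disk_key *min, const struct disk_key *max)
{
    return key_cmp(k, min) >= 0 && key_cmp(k, max) <= 0;
}

/* The "3-dimensional box" reading: each component checked independently. */
static bool in_box(const struct disk_key *k,
                   const struct disk_key *min, const struct disk_key *max)
{
    return k->objectid >= min->objectid && k->objectid <= max->objectid &&
           k->type >= min->type && k->type <= max->type &&
           k->offset >= min->offset && k->offset <= max->offset;
}
```

With min = (0, TYPE_A, 256) and max = (0, TYPE_B, 256), keys such as (0, TYPE_A, 257) or (0, TYPE_B, 5) fall inside the lexicographic range although their offset is not 256, which matches the behaviour described in the question. The superset property also follows directly: if every component of a key lies within its own min/max pair, the key cannot compare below min or above max.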
* Re: BTRFS_IOC_TREE_SEARCH ioctl
From: Lennart Poettering @ 2015-01-05 19:11 UTC
To: Hugo Mills, linux-btrfs

On Mon, 05.01.15 18:22, Hugo Mills (hugo@carfax.org.uk) wrote:

> On Mon, Jan 05, 2015 at 06:15:12PM +0100, Lennart Poettering wrote:
> > [...]
> > What precisely are the semantics of the ioctl regarding the search
> > key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> > kinda assumed that setting them would result in only objects being
> > returned that are within the min/max ranges. However, that appears not
> > to be the case. At least the min_offset/max_offset setting appears to
> > be ignored?
>
> This is an old argument. :)
>
> Keys have three parts, so it's plausible (but, in this case, wrong)
> to consider the space you're searching to be a 3-dimensional space of
> (object, type, offset), which seems to be what you're expecting. A
> min, max pair would then define an oblong subset of the keyspace from
> which to retrieve keys.
>
> However, that's not actually what's happening. Keys are indexed
> within their tree(s) by a concatenation of the items in the key. A
> key, therefore, should be thought of as a single 136-bit integer, and
> the keys are lexically ordered, (object||type||offset), where "||" is
> the concatenation operator. You get every key _lexically ordered_
> between the min and max values. This is a superset of the
> 3-dimensional results above.

Ah, I see. Makes sense.

I figure the comments in btrfs.h next to "struct
btrfs_ioctl_search_key" could use some updating in this regard. They
pretty explicitly suggest that the 3 axes were independent and that
each element individually would be between the respective min/max when
returning...

Ideally the structure would just have two fields called "max" and
"min" or so, of type btrfs_disk_key, right? In that case I figure the
behaviour would have been clear. It's particularly confusing that the
disk key fields appear in a different order than otherwise used, and
with min_transid+max_transid in the middle...

Which brings me to my next question: how does {min|max}_transid affect
the search result? Is this axis orthogonal or is it neither?

Thanks for the explanations!

Lennart

-- 
Lennart Poettering, Red Hat
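The two-key shape suggested above can be sketched as follows. This is a hypothetical illustration, not an existing API: `flat_search_key` is a trimmed local stand-in holding only the six key bounds, listed in the same relative order as in struct btrfs_ioctl_search_key as we read <linux/btrfs.h> (the real struct also carries tree_id, the transid bounds, nr_items and more), and `key_range` is the proposed "min"/"max" pair. Collapsing the flat fields makes it plain that the bounds are two points in one ordering rather than three independent intervals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: only the key bounds of the search key. */
struct flat_search_key {
    uint64_t min_objectid, max_objectid;
    uint64_t min_offset, max_offset;
    uint32_t min_type, max_type;
};

/* One complete key, components from most to least significant. */
struct disk_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* The proposed shape: just a "min" key and a "max" key. */
struct key_range {
    struct disk_key min, max;
};

/* Regroup the flat bounds into two full keys. Item types are 8-bit in a
   btrfs key, so the 32-bit type bounds are narrowed here. */
static struct key_range range_from_flat(const struct flat_search_key *sk)
{
    struct key_range r = {
        .min = { sk->min_objectid, (uint8_t)sk->min_type, sk->min_offset },
        .max = { sk->max_objectid, (uint8_t)sk->max_type, sk->max_offset },
    };
    return r;
}

/* Lexicographic comparison on (objectid, type, offset); returns -1, 0, 1. */
static int key_cmp(const struct disk_key *a, const struct disk_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* A key is returned iff it lies between the two end points. */
static bool key_range_contains(const struct key_range *r,
                               const struct disk_key *k)
{
    return key_cmp(k, &r->min) >= 0 && key_cmp(k, &r->max) <= 0;
}
```

Seen this way, setting min_offset = max_offset only pins the offset at the two end points; any key whose type lies strictly between min_type and max_type is returned whatever its offset.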
* Re: BTRFS_IOC_TREE_SEARCH ioctl
From: Hugo Mills @ 2015-01-05 19:35 UTC
To: Lennart Poettering; +Cc: linux-btrfs

On Mon, Jan 05, 2015 at 08:11:56PM +0100, Lennart Poettering wrote:
> On Mon, 05.01.15 18:22, Hugo Mills (hugo@carfax.org.uk) wrote:
>
> > On Mon, Jan 05, 2015 at 06:15:12PM +0100, Lennart Poettering wrote:
> > > [...]
> >
> > This is an old argument. :)
> >
> > [...]
> >
> > However, that's not actually what's happening. Keys are indexed
> > within their tree(s) by a concatenation of the items in the key. A
> > key, therefore, should be thought of as a single 136-bit integer, and
> > the keys are lexically ordered, (object||type||offset), where "||" is
> > the concatenation operator. You get every key _lexically ordered_
> > between the min and max values. This is a superset of the
> > 3-dimensional results above.
>
> Ah, I see. Makes sense.
>
> I figure the comments in btrfs.h next to "struct
> btrfs_ioctl_search_key" could use some updating in this regard. They
> pretty explicitly suggest that the 3 axes were independent and that
> each element individually would be between the respective min/max when
> returning...
>
> Ideally the structure would just have two fields called "max" and
> "min" or so, of type btrfs_disk_key, right? In that case I figure the
> behaviour would have been clear. It's particularly confusing that the
> disk key fields appear in a different order than otherwise used, and
> with min_transid+max_transid in the middle...

Yes, it's not exactly the most obvious structure.

> Which brings me to my next question: how does {min|max}_transid affect
> the search result? Is this axis orthogonal or is it neither?

Hmm. Good question. I don't know the answer to that one, I'm afraid.
I _think_ it's orthogonal (since it's not indexed in the same B-tree
structures).

Hugo.

-- 
Hugo Mills             | What do you give the man who has everything?
hugo@... carfax.org.uk | Penicillin is a good start...
http://carfax.org.uk/  |
PGP: 65E74AC0          |
* Re: BTRFS_IOC_TREE_SEARCH ioctl
From: Lennart Poettering @ 2015-01-07 12:14 UTC
To: Nehemiah Dacres; +Cc: Hugo Mills, linux-btrfs

On Mon, 05.01.15 19:14, Nehemiah Dacres (vivacarlie@gmail.com) wrote:

> Is libbtrfs documented or even stable yet? What stage of development is it
> in, anyway? Is there a design spec yet?

Note that the code we use in systemd is not based on libbtrfs; we just
call the ioctls directly.

Lennart

-- 
Lennart Poettering, Red Hat
* Re: BTRFS_IOC_TREE_SEARCH ioctl
From: Goffredo Baroncelli @ 2015-01-05 18:54 UTC
To: Lennart Poettering, linux-btrfs

On 2015-01-05 18:15, Lennart Poettering wrote:
> [...]
> What precisely are the semantics of the ioctl regarding the search
> key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> kinda assumed that setting them would result in only objects being
> returned that are within the min/max ranges. However, that appears not
> to be the case. At least the min_offset/max_offset setting appears to
> be ignored?
>
> The code I hacked up is this one:
>
> http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/btrfs-util.c#n427
>
> I try to read the BTRFS_QGROUP_STATUS_KEY and BTRFS_QGROUP_LIMIT_KEY
> objects for the subvolume I care about. Hence I initialize .min_type
> and .max_type to the two types (in the right order), and then
> .min_offset and .max_offset to the subvolume id. However, the search
> ioctl will still give me entries back with offsets != the subvolume id...
>
> Is this intended behaviour of the search ioctl? If so, what's the
> rationale?

The search is done linearly: the min_* fields are the starting point
and the max_* fields are the ending point. In the past someone gave me
this example: if you think in two dimensions, the scan is *not*
performed over a rectangular region but over a horizontal band...

My ASCII art; this is what you are expecting:

  ............
  ..XXXXXXXX..
  ..XXXXXXXX..
  ..XXXXXXXX..
  ............

this is what BTRFS_IOC_TREE_SEARCH returns:

  ............
  ..XXXXXXXXXX
  XXXXXXXXXXXX
  XXXXXXXXXX..
  ............

> My code currently invokes the search ioctl in a loop to work around
> the fact that .min_offset/.max_offset don't work as I wish they
> did...

To the best of my (limited) btrfs knowledge, your "workaround" is
needed due to the ioctl behavior.

> I wish I could get rid of this loop and the filtering out of
> entries I get back that aren't in the range I specified...

See this thread [1] for what happened to me a long time ago.

> Lennart

Goffredo

[1] http://www.spinics.net/lists/linux-btrfs/msg07641.html

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
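The workaround discussed in this thread can be sketched in plain C. This is a simulation, not real ioctl code: `fake_search()` is a hypothetical stand-in for one BTRFS_IOC_TREE_SEARCH call over a sorted in-memory array, returning at most `nr_items` keys that lie lexicographically in [min, max]. The caller resumes just past the last key it received (bumping the offset and carrying into type and then objectid, one common way to restart such a scan) and drops keys whose offset is not the one it wants. All helper names, type codes and offsets are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_ITEMS 2 /* deliberately tiny so the loop needs several rounds */

struct disk_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Lexicographic comparison on (objectid, type, offset); returns -1, 0, 1. */
static int key_cmp(const struct disk_key *a, const struct disk_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Move *k to the smallest key strictly greater than it. Returns false if
   *k was already the largest possible key (then *k must not be reused). */
static bool key_advance(struct disk_key *k)
{
    if (k->offset < UINT64_MAX) {
        k->offset++;
        return true;
    }
    k->offset = 0;
    if (k->type < UINT8_MAX) {
        k->type++;
        return true;
    }
    k->type = 0;
    if (k->objectid < UINT64_MAX) {
        k->objectid++;
        return true;
    }
    return false;
}

/* Hypothetical stand-in for one search call over a sorted array: copy up
   to nr_items keys that lie lexicographically within [*min, *max]. */
static size_t fake_search(const struct disk_key *tree, size_t n,
                          const struct disk_key *min, const struct disk_key *max,
                          size_t nr_items, struct disk_key *out)
{
    size_t got = 0;
    for (size_t i = 0; i < n && got < nr_items; i++) {
        if (key_cmp(&tree[i], min) < 0)
            continue; /* before the start key */
        if (key_cmp(&tree[i], max) > 0)
            break;    /* past the end key; the array is sorted */
        out[got++] = tree[i];
    }
    return got;
}

/* Page through [min, max] and keep only keys whose offset equals `want`,
   i.e. the per-component filter the search itself does not apply. */
static size_t collect_offset(const struct disk_key *tree, size_t n,
                             struct disk_key min, struct disk_key max,
                             uint64_t want, struct disk_key *out, size_t cap)
{
    struct disk_key page[PAGE_ITEMS];
    size_t kept = 0;

    for (;;) {
        size_t got = fake_search(tree, n, &min, &max, PAGE_ITEMS, page);
        if (got == 0)
            break;
        for (size_t i = 0; i < got; i++)
            if (page[i].offset == want && kept < cap)
                out[kept++] = page[i];
        /* Resume just after the last key of this page. */
        min = page[got - 1];
        if (!key_advance(&min) || key_cmp(&min, &max) > 0)
            break;
    }
    return kept;
}
```

With seven quota-like keys and the range (0, 242, 256) to (0, 244, 256), the lexicographic search yields four keys, of which only the two with offset 256 survive the filter. The extra round trips and userspace filtering are the price of not having the 3D semantics in the kernel, as Hugo notes.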
Thread overview: 6+ messages
2015-01-05 17:15 BTRFS_IOC_TREE_SEARCH ioctl Lennart Poettering
2015-01-05 18:22 ` Hugo Mills
2015-01-05 19:11 ` Lennart Poettering
2015-01-05 19:35 ` Hugo Mills
[not found] ` <CAJSBqdfJ9EpR3AgLFkCEU+yYSPtJTyVvo5r15WaeF1UszQ_3Yg@mail.gmail.com>
2015-01-07 12:14 ` Lennart Poettering
2015-01-05 18:54 ` Goffredo Baroncelli