linux-btrfs.vger.kernel.org archive mirror
* BTRFS_IOC_TREE_SEARCH ioctl
@ 2015-01-05 17:15 Lennart Poettering
  2015-01-05 18:22 ` Hugo Mills
  2015-01-05 18:54 ` Goffredo Baroncelli
  0 siblings, 2 replies; 6+ messages in thread
From: Lennart Poettering @ 2015-01-05 17:15 UTC
  To: linux-btrfs

Heya,

I recently added some btrfs magic to systemd's machinectl/nspawn
tool. More specifically it can now show the disk usage of a container
that is stored in a btrfs subvolume. For that I made use of the btrfs
quota logic. To read the current disk usage of a subvolume I took
inspiration from btrfs-progs, most specifically the
BTRFS_IOC_TREE_SEARCH ioctl(). Unfortunately, documentation for the
ioctl seems to be lacking, and there are some things about it I
fail to grok:

What precisely are the semantics of the ioctl, regarding the search
key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
kinda assumed that setting them would result in only objects being
returned that are within the min/max ranges. However, that appears not
to be the case. At least the min_offset/max_offset setting appears to
be ignored?

The code I hacked up is this one:

http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/btrfs-util.c#n427

I try to read the BTRFS_QGROUP_STATUS_KEY and BTRFS_QGROUP_LIMIT_KEY
objects for the subvolume I care about. Hence I initialize .min_type
and .max_type to the two types (in the right order), and then
.min_offset and .max_offset to the subvolume id. However, the search ioctl
will still give me entries back with offsets != the subvolume id...

Is this intended behaviour of the search ioctl? If so, what's the
rationale?

My code currently invokes the search ioctl in a loop to work around
the fact that .min_offset/.max_offset don't work as I wish they
did... I wish I could get rid of this loop and of filtering out the
entries I get back that aren't in the range I specified...

Lennart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BTRFS_IOC_TREE_SEARCH ioctl
  2015-01-05 17:15 BTRFS_IOC_TREE_SEARCH ioctl Lennart Poettering
@ 2015-01-05 18:22 ` Hugo Mills
  2015-01-05 19:11   ` Lennart Poettering
  2015-01-05 18:54 ` Goffredo Baroncelli
  1 sibling, 1 reply; 6+ messages in thread
From: Hugo Mills @ 2015-01-05 18:22 UTC
  To: Lennart Poettering; +Cc: linux-btrfs

On Mon, Jan 05, 2015 at 06:15:12PM +0100, Lennart Poettering wrote:
> Heya,
> 
> I recently added some btrfs magic to systemd's machinectl/nspawn
> tool. More specifically it can now show the disk usage of a container
> that is stored in a btrfs subvolume. For that I made use of the btrfs
> quota logic. To read the current disk usage of a subvolume I took
> inspiration from btrfs-progs, most specifically the
> BTRFS_IOC_TREE_SEARCH ioctl(). Unfortunately, documentation for the
> ioctl seems to be lacking, and there are some things about it I
> fail to grok:
> 
> What precisely are the semantics of the ioctl, regarding the search
> key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> kinda assumed that setting them would result in only objects being
> returned that are within the min/max ranges. However, that appears not
> to be the case. At least the min_offset/max_offset setting appears to
> be ignored?

   This is an old argument. :)

   Keys have three parts, so it's plausible (but, in this case, wrong)
to consider the space you're searching to be a 3-dimensional space of
(object, type, offset), which seems to be what you're expecting. A
min, max pair would then define an oblong subset of the keyspace from
which to retrieve keys.

   However, that's not actually what's happening. Keys are indexed
within their tree(s) by a concatenation of the items in the key. A
key, therefore, should be thought of as a single 136-bit integer, and
the keys are lexically ordered, (object||type||offset), where "||" is
the concatenation operator. You get every key _lexically ordered_
between the min and max values. This is a superset of the
3-dimensional results above.
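As an illustration, here is a minimal C sketch (not code from the kernel or btrfs-progs; `struct key`, `key_encode`, `lexical_in_range` and `box_in_range` are hypothetical names). It models the 136-bit view by serializing objectid||type||offset big-endian into 17 bytes, so that memcmp() orders keys exactly as described, and puts the per-field "oblong" test one might expect next to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a btrfs key: (objectid, type, offset). */
struct key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Serialize objectid||type||offset big-endian into 17 bytes (136 bits),
 * so memcmp() compares two keys as one big unsigned integer. */
static void key_encode(const struct key *k, unsigned char out[17])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(k->objectid >> (56 - 8 * i));
    out[8] = k->type;
    for (int i = 0; i < 8; i++)
        out[9 + i] = (unsigned char)(k->offset >> (56 - 8 * i));
}

static int key_cmp(const struct key *a, const struct key *b)
{
    unsigned char ea[17], eb[17];
    key_encode(a, ea);
    key_encode(b, eb);
    return memcmp(ea, eb, sizeof ea);
}

/* The concatenated semantics: min <= k <= max as 136-bit integers. */
static bool lexical_in_range(const struct key *k,
                             const struct key *min, const struct key *max)
{
    assert(key_cmp(min, max) <= 0);  /* caller must pass an ordered range */
    return key_cmp(min, k) <= 0 && key_cmp(k, max) <= 0;
}

/* The 3-D "oblong" semantics: each field inside its own [min, max]. */
static bool box_in_range(const struct key *k,
                         const struct key *min, const struct key *max)
{
    return k->objectid >= min->objectid && k->objectid <= max->objectid &&
           k->type     >= min->type     && k->type     <= max->type &&
           k->offset   >= min->offset   && k->offset   <= max->offset;
}
```

For example, with min = (0, 242, 257) and max = (0, 244, 257) (242 and 244 being BTRFS_QGROUP_INFO_KEY and BTRFS_QGROUP_LIMIT_KEY), the key (0, 242, 300) passes lexical_in_range() but fails box_in_range().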

   About 3-4 years ago, we see-sawed through several messy patches in
userspace (and at least one in the kernel) before this distinction and
difference in semantics was understood.

> The code I hacked up is this one:
> 
> http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/btrfs-util.c#n427
> 
> I try to read the BTRFS_QGROUP_STATUS_KEY and BTRFS_QGROUP_LIMIT_KEY
> objects for the subvolume I care about. Hence I initialize .min_type
> and .max_type to the two types (in the right order), and then
> .min_offset and .max_offset to the subvolume id. However, the search ioctl
> will still give me entries back with offsets != the subvolume id...
> 
> Is this intended behaviour of the search ioctl? If so, what's the
> rationale?

   Yes, it is. The rationale is that it's simply walking through the
key values in the tree linearly until the max value is found.
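A toy model of that walk, assuming the tree's keys can be flattened into one sorted array (the kernel actually walks B-tree leaves; `struct key` and `search_range` are hypothetical): skip to the first key not below min, then copy keys until one compares greater than max.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct key { uint64_t objectid; uint8_t type; uint64_t offset; };

/* Field-by-field comparison, equivalent to comparing objectid||type||offset. */
static int key_cmp(const struct key *a, const struct key *b)
{
    if (a->objectid != b->objectid) return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)         return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)     return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Copy every key k with min <= k <= max from the sorted array `keys`
 * into `out` (capacity `cap`); returns the number of keys copied. */
static size_t search_range(const struct key *keys, size_t n,
                           const struct key *min, const struct key *max,
                           struct key *out, size_t cap)
{
    assert(key_cmp(min, max) <= 0);
    size_t i = 0, found = 0;
    while (i < n && key_cmp(&keys[i], min) < 0)  /* seek to first key >= min */
        i++;
    for (; i < n && key_cmp(&keys[i], max) <= 0 && found < cap; i++)
        out[found++] = keys[i];                  /* stop at first key > max */
    return found;
}
```

Run against a quota tree holding INFO (242) and LIMIT (244) items for qgroups 5, 257 and 300, a search from (0, 242, 257) to (0, 244, 257) returns four keys: (0, 242, 257), (0, 242, 300), (0, 244, 5) and (0, 244, 257). Two of them have offsets other than 257.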

> My code currently invokes the search ioctl in a loop to work around
> the fact that .min_offset/.max_offset don't work as I wish they
> did... I wish I could get rid of this loop and of filtering out the
> entries I get back that aren't in the range I specified...

   You'd have to do this in kernel space if you wanted the 3D
semantics instead of the concatenated semantics. There's no free lunch
here. It might be a good idea for "libbtrfs" (such as it is) to
implement this, as it's a (moderately rare) repeat request.
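Until something like that exists, the userspace side can be sketched roughly as follows. This is a hedged sketch, not the systemd or btrfs-progs code: `search_box` and `item_cb` are hypothetical names, while struct btrfs_ioctl_search_args, struct btrfs_ioctl_search_header and BTRFS_IOC_TREE_SEARCH come from <linux/btrfs.h>. It requests the lexical range, drops items outside the per-field box, and restarts just after the last key returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

#ifndef BTRFS_QUOTA_TREE_OBJECTID
#define BTRFS_QUOTA_TREE_OBJECTID 8ULL  /* as in linux/btrfs_tree.h */
#endif

/* Hypothetical callback: receives each item that passes the box filter. */
typedef void (*item_cb)(const struct btrfs_ioctl_search_header *sh,
                        const void *data, void *userdata);

/* Deliver every item of tree `tree_id` whose objectid, type and offset each
 * lie within their own [min, max]. The kernel returns the lexical range;
 * items outside the box are dropped here. Returns 0 or a negative errno. */
static int search_box(int fd, uint64_t tree_id,
                      uint64_t min_obj, uint8_t min_type, uint64_t min_off,
                      uint64_t max_obj, uint8_t max_type, uint64_t max_off,
                      item_cb cb, void *userdata)
{
    assert(min_obj <= max_obj && min_type <= max_type && min_off <= max_off);
    struct btrfs_ioctl_search_args args;
    memset(&args, 0, sizeof args);
    args.key.tree_id = tree_id;
    args.key.min_objectid = min_obj;
    args.key.max_objectid = max_obj;
    args.key.min_type = min_type;
    args.key.max_type = max_type;
    args.key.min_offset = min_off;
    args.key.max_offset = max_off;
    args.key.max_transid = UINT64_MAX;  /* min_transid stays 0 */

    for (;;) {
        args.key.nr_items = 256;  /* in: items wanted; out: items returned */
        if (ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args) < 0)
            return -errno;
        if (args.key.nr_items == 0)
            return 0;  /* lexical range exhausted */

        struct btrfs_ioctl_search_header sh;
        memset(&sh, 0, sizeof sh);
        size_t pos = 0;
        for (uint32_t i = 0; i < args.key.nr_items; i++) {
            if (pos + sizeof sh > sizeof args.buf)
                break;
            memcpy(&sh, args.buf + pos, sizeof sh);  /* buf may be unaligned */
            const char *data = args.buf + pos + sizeof sh;
            pos += sizeof sh + sh.len;
            if (sh.objectid >= min_obj && sh.objectid <= max_obj &&
                sh.type >= min_type && sh.type <= max_type &&
                sh.offset >= min_off && sh.offset <= max_off)
                cb(&sh, data, userdata);
        }

        /* Resume at (last key) + 1, carrying offset -> type -> objectid. */
        args.key.min_objectid = sh.objectid;
        args.key.min_type = sh.type;
        args.key.min_offset = sh.offset;
        if (sh.offset < UINT64_MAX) {
            args.key.min_offset++;
        } else if (sh.type < UINT8_MAX) {
            args.key.min_type++;
            args.key.min_offset = 0;
        } else if (sh.objectid < UINT64_MAX) {
            args.key.min_objectid++;
            args.key.min_type = 0;
            args.key.min_offset = 0;
        } else {
            return 0;  /* last key was the largest possible key */
        }
    }
}
```

For the qgroup case one might call search_box(fd, BTRFS_QUOTA_TREE_OBJECTID, 0, 242, subvol_id, 0, 244, subvol_id, cb, NULL) on a descriptor opened inside the filesystem; the kernel requires CAP_SYS_ADMIN for this ioctl.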

   Hugo.

-- 
Hugo Mills             | Klytus, I'm bored. What plaything can you offer me
hugo@... carfax.org.uk | today?
http://carfax.org.uk/  |
PGP: 65E74AC0          |                      Ming the Merciless, Flash Gordon

* Re: BTRFS_IOC_TREE_SEARCH ioctl
  2015-01-05 17:15 BTRFS_IOC_TREE_SEARCH ioctl Lennart Poettering
  2015-01-05 18:22 ` Hugo Mills
@ 2015-01-05 18:54 ` Goffredo Baroncelli
  1 sibling, 0 replies; 6+ messages in thread
From: Goffredo Baroncelli @ 2015-01-05 18:54 UTC
  To: Lennart Poettering, linux-btrfs

On 2015-01-05 18:15, Lennart Poettering wrote:
> Heya,
> 
> I recently added some btrfs magic to systemd's machinectl/nspawn
> tool. More specifically it can now show the disk usage of a container
> that is stored in a btrfs subvolume. For that I made use of the btrfs
> quota logic. To read the current disk usage of a subvolume I took
> inspiration from btrfs-progs, most specifically the
> BTRFS_IOC_TREE_SEARCH ioctl(). Unfortunately, documentation for the
> ioctl seems to be lacking, and there are some things about it I
> fail to grok:
> 
> What precisely are the semantics of the ioctl, regarding the search
> key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> kinda assumed that setting them would result in only objects being
> returned that are within the min/max ranges. However, that appears not
> to be the case. At least the min_offset/max_offset setting appears to
> be ignored?
> 
> The code I hacked up is this one:
> 
> http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/btrfs-util.c#n427
> 
> I try to read the BTRFS_QGROUP_STATUS_KEY and BTRFS_QGROUP_LIMIT_KEY
> objects for the subvolume I care about. Hence I initialize .min_type
> and .max_type to the two types (in the right order), and then
> .min_offset and .max_offset to the subvolume id. However, the search ioctl
> will still give me entries back with offsets != the subvolume id...
> 
> Is this intended behaviour of the search ioctl? If so, what's the
> rationale?

The search is done linearly; the min_* fields are the starting point, and
the max_* fields are the ending point. In the past someone gave me this example:
if you think in two dimensions, the scan is *not* performed over a rectangular region but row by row, from the min point to the max point...

My ASCII art: this is what you are expecting:

	............
	..XXXXXXXX..
	..XXXXXXXX..
	..XXXXXXXX..
	............


This is what BTRFS_IOC_TREE_SEARCH returns:

	............
	..XXXXXXXXXX
	XXXXXXXXXXXX
	XXXXXXXXXX..
	............
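The two pictures can be reproduced with a small C sketch (`render` is a hypothetical helper). It treats a two-field key (row, column) as the single number row * COLS + column, the 2-D analogue of the 136-bit concatenation. With min = (1, 2) and max = (3, 9) on a 5 x 12 grid, the box test yields the first drawing and the lexical test yields the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ROWS 5
#define COLS 12

/* Mark the cells (r, c) selected by a [min, max] query with 'X'.
 * lexical == true:  compare r * COLS + c as one number (a linear scan
 *                   from min to max, like the search ioctl);
 * lexical == false: require min_r <= r <= max_r and min_c <= c <= max_c
 *                   independently (the rectangle one might expect). */
static void render(char grid[ROWS][COLS + 1], int min_r, int min_c,
                   int max_r, int max_c, bool lexical)
{
    assert(min_r <= max_r);
    for (int r = 0; r < ROWS; r++) {
        memset(grid[r], '.', COLS);
        grid[r][COLS] = '\0';
        for (int c = 0; c < COLS; c++) {
            bool hit;
            if (lexical) {
                int pos = r * COLS + c;
                hit = pos >= min_r * COLS + min_c && pos <= max_r * COLS + max_c;
            } else {
                hit = r >= min_r && r <= max_r && c >= min_c && c <= max_c;
            }
            if (hit)
                grid[r][c] = 'X';
        }
    }
}
```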


> 
> My code currently invokes the search ioctl in a loop to work around
> the fact that .min_offset/.max_offset don't work as I wish they
> did... 

To the best of my (limited) btrfs knowledge, your "workaround"
is needed because of how the ioctl behaves.

> I wish I could get rid of this loop and of filtering out the
> entries I get back that aren't in the range I specified...

See this thread [1] for what happened to me a long time ago.

> 
> Lennart

Goffredo

[1] http://www.spinics.net/lists/linux-btrfs/msg07641.html
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

* Re: BTRFS_IOC_TREE_SEARCH ioctl
  2015-01-05 18:22 ` Hugo Mills
@ 2015-01-05 19:11   ` Lennart Poettering
  2015-01-05 19:35     ` Hugo Mills
  0 siblings, 1 reply; 6+ messages in thread
From: Lennart Poettering @ 2015-01-05 19:11 UTC
  To: Hugo Mills, linux-btrfs

On Mon, 05.01.15 18:22, Hugo Mills (hugo@carfax.org.uk) wrote:

> On Mon, Jan 05, 2015 at 06:15:12PM +0100, Lennart Poettering wrote:
> > Heya,
> > 
> > I recently added some btrfs magic to systemd's machinectl/nspawn
> > tool. More specifically it can now show the disk usage of a container
> > that is stored in a btrfs subvolume. For that I made use of the btrfs
> > quota logic. To read the current disk usage of a subvolume I took
> > inspiration from btrfs-progs, most specifically the
> > BTRFS_IOC_TREE_SEARCH ioctl(). Unfortunately, documentation for the
> > ioctl seems to be lacking, and there are some things about it I
> > fail to grok:
> > 
> > What precisely are the semantics of the ioctl, regarding the search
> > key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> > kinda assumed that setting them would result in only objects being
> > returned that are within the min/max ranges. However, that appears not
> > to be the case. At least the min_offset/max_offset setting appears to
> > be ignored?
> 
>    This is an old argument. :)
> 
>    Keys have three parts, so it's plausible (but, in this case, wrong)
> to consider the space you're searching to be a 3-dimensional space of
> (object, type, offset), which seems to be what you're expecting. A
> min, max pair would then define an oblong subset of the keyspace from
> which to retrieve keys.
>
>    However, that's not actually what's happening. Keys are indexed
> within their tree(s) by a concatenation of the items in the key. A
> key, therefore, should be thought of as a single 136-bit integer, and
> the keys are lexically ordered, (object||type||offset), where "||" is
> the concatenation operator. You get every key _lexically ordered_
> between the min and max values. This is a superset of the
> 3-dimensional results above.

Ah, I see. Makes sense.

I figure the comments in btrfs.h next to "struct
btrfs_ioctl_search_key" could use some updating in this regard. They
pretty explicitly suggest that the 3 axes are independent and that each
element individually would be between the respective min/max when
returning...

Ideally the structure would just have two fields called "min" and
"max" or so, of type btrfs_disk_key, right? In that case I figure the
behaviour would have been clear. It's particularly confusing that the
disk key fields appear in a different order than otherwise used, and
with the min_transid+max_transid in the middle...
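To illustrate the "two whole keys" view, here is a hypothetical helper (`set_search_range` and `struct key` are made up for this sketch; struct btrfs_ioctl_search_key is the real type from <linux/btrfs.h>) that fills the scattered min_* and max_* fields from one min key and one max key, leaving the transid window fully open.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/btrfs.h>

/* Hypothetical whole key: what a single "min" or "max" field could hold. */
struct key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* The kernel compares (objectid, type, offset) lexically, so these six
 * fields act as two 136-bit bounds, not three independent ranges. */
static void set_search_range(struct btrfs_ioctl_search_key *sk,
                             uint64_t tree_id,
                             const struct key *min, const struct key *max)
{
    assert(sk != NULL && min != NULL && max != NULL);
    memset(sk, 0, sizeof *sk);      /* also clears nr_items and unused */
    sk->tree_id      = tree_id;
    sk->min_objectid = min->objectid;
    sk->min_type     = min->type;
    sk->min_offset   = min->offset;
    sk->max_objectid = max->objectid;
    sk->max_type     = max->type;
    sk->max_offset   = max->offset;
    sk->min_transid  = 0;           /* open transid window */
    sk->max_transid  = UINT64_MAX;
}
```

Callers would still set nr_items before each BTRFS_IOC_TREE_SEARCH call.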

Which brings me to my question: how does {min|max}_transid affect the
search result? Is this axis orthogonal or is it neither?

Thanks for the explanations!

Lennart

-- 
Lennart Poettering, Red Hat

* Re: BTRFS_IOC_TREE_SEARCH ioctl
  2015-01-05 19:11   ` Lennart Poettering
@ 2015-01-05 19:35     ` Hugo Mills
       [not found]       ` <CAJSBqdfJ9EpR3AgLFkCEU+yYSPtJTyVvo5r15WaeF1UszQ_3Yg@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Hugo Mills @ 2015-01-05 19:35 UTC
  To: Lennart Poettering; +Cc: linux-btrfs

On Mon, Jan 05, 2015 at 08:11:56PM +0100, Lennart Poettering wrote:
> On Mon, 05.01.15 18:22, Hugo Mills (hugo@carfax.org.uk) wrote:
> 
> > On Mon, Jan 05, 2015 at 06:15:12PM +0100, Lennart Poettering wrote:
> > > Heya,
> > > 
> > > I recently added some btrfs magic to systemd's machinectl/nspawn
> > > tool. More specifically it can now show the disk usage of a container
> > > that is stored in a btrfs subvolume. For that I made use of the btrfs
> > > quota logic. To read the current disk usage of a subvolume I took
> > > inspiration from btrfs-progs, most specifically the
> > > BTRFS_IOC_TREE_SEARCH ioctl(). Unfortunately, documentation for the
> > > ioctl seems to be lacking, and there are some things about it I
> > > fail to grok:
> > > 
> > > What precisely are the semantics of the ioctl, regarding the search
> > > key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> > > kinda assumed that setting them would result in only objects being
> > > returned that are within the min/max ranges. However, that appears not
> > > to be the case. At least the min_offset/max_offset setting appears to
> > > be ignored?
> > 
> >    This is an old argument. :)
> > 
> >    Keys have three parts, so it's plausible (but, in this case, wrong)
> > to consider the space you're searching to be a 3-dimensional space of
> > (object, type, offset), which seems to be what you're expecting. A
> > min, max pair would then define an oblong subset of the keyspace from
> > which to retrieve keys.
> >
> >    However, that's not actually what's happening. Keys are indexed
> > within their tree(s) by a concatenation of the items in the key. A
> > key, therefore, should be thought of as a single 136-bit integer, and
> > the keys are lexically ordered, (object||type||offset), where "||" is
> > the concatenation operator. You get every key _lexically ordered_
> > between the min and max values. This is a superset of the
> > 3-dimensional results above.
> 
> Ah, I see. Makes sense.
> 
> I figure the comments in btrfs.h next to "struct
> btrfs_ioctl_search_key" could use some updating in this regard. They
> pretty explicitly suggest that the 3 axes are independent and that each
> element individually would be between the respective min/max when
> returning...
> 
> Ideally the structure would just have two fields called "min" and
> "max" or so, of type btrfs_disk_key, right? In that case I figure the
> behaviour would have been clear. It's particularly confusing that the
> disk key fields appear in a different order than otherwise used, and
> with the min_transid+max_transid in the middle...

   Yes, it's not exactly the most obvious structure.

> Which brings me to my question: how does {min|max}_transid affect the
> search result? Is this axis orthogonal or is it neither?

   Hmm. Good question. I don't know the answer to that one, I'm
afraid. I _think_ it's orthogonal (since it's not indexed in the same
B-tree structures).

   Hugo.

-- 
Hugo Mills             | What do you give the man who has everything?
hugo@... carfax.org.uk | Penicillin is a good start...
http://carfax.org.uk/  |
PGP: 65E74AC0          |

* Re: BTRFS_IOC_TREE_SEARCH ioctl
       [not found]       ` <CAJSBqdfJ9EpR3AgLFkCEU+yYSPtJTyVvo5r15WaeF1UszQ_3Yg@mail.gmail.com>
@ 2015-01-07 12:14         ` Lennart Poettering
  0 siblings, 0 replies; 6+ messages in thread
From: Lennart Poettering @ 2015-01-07 12:14 UTC
  To: Nehemiah Dacres; +Cc: Hugo Mills, linux-btrfs

On Mon, 05.01.15 19:14, Nehemiah Dacres (vivacarlie@gmail.com) wrote:

> Is libbtrfs documented or even stable yet? What stage of development is it
> in, anyway? Is there a design spec yet?

Note that the code we use in systemd is not based on libbtrfs, we just
call the ioctls directly.

Lennart

-- 
Lennart Poettering, Red Hat

Thread overview: 6+ messages
2015-01-05 17:15 BTRFS_IOC_TREE_SEARCH ioctl Lennart Poettering
2015-01-05 18:22 ` Hugo Mills
2015-01-05 19:11   ` Lennart Poettering
2015-01-05 19:35     ` Hugo Mills
     [not found]       ` <CAJSBqdfJ9EpR3AgLFkCEU+yYSPtJTyVvo5r15WaeF1UszQ_3Yg@mail.gmail.com>
2015-01-07 12:14         ` Lennart Poettering
2015-01-05 18:54 ` Goffredo Baroncelli
