About 'key type for persistent [...] items'

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* About 'key type for persistent [...] items'
@ 2017-11-24 19:16 Hans van Kranenburg
  2017-11-25  1:12 ` Qu Wenruo
  2017-11-28 17:34 ` David Sterba
  0 siblings, 2 replies; 9+ messages in thread
From: Hans van Kranenburg @ 2017-11-24 19:16 UTC (permalink / raw)
  To: linux-btrfs, David Sterba

Hi,

Last week, when implementing the automatic classifier to dynamically
create tree item data objects by key type in python-btrfs, I ran into
the following commits in btrfs-progs:

  commit 8609c8bad68528f668d9ce564b868aa4828107a0
  btrfs-progs: print-tree: factor out temporary_item dump
and
  commit a4b65f00d53deb1b495728dd58253af44fcf70df
  btrfs-progs: print-tree: factor out persistent_item dump

...which are related to kernel...

  commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
  btrfs: introduce key type for persistent permanent items
and
  commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
  btrfs: introduce key type for persistent temporary items

Afaics the goal is to overload types because there can be only 256 in
total. However, I'm missing the design decisions behind the
implementation of it. It's not in the commit messages, and it hasn't
been on the mailing list.

Before, there was an 1:1 mapping from key types to data structures. Now,
with the new PERSISTENT_ITEM_KEY and TEMPORARY_ITEM_KEY, it seems items
which use this type can be using any data structure they want, so it's
some kind of YOLO_ITEM_KEY.

The print-tree code in progs 8609c8b and a4b65f0 seems incomplete. For
example, for the PERSISTENT_ITEM_KEY, there's a switch (objectid) with
case BTRFS_DEV_STATS_OBJECTID.

However, BTRFS_DEV_STATS_OBJECTID is just the value 0. So, that means
that if I want to have another tree where BTRFS_MOUTON_OBJECTID is also
0, and I'm storing a btrfs_kebab_item struct indexed at
(BTRFS_MOUTON_OBJECTID, PERSISTENT_ITEM_KEY, 31337), then print_tree.c
will try to parse the data by calling print_dev_stats?

What's the idea behind that? Instead of having the key type field define
the struct and meaning, we now suddenly need the tuple (tree, objectid,
type), and we need all three to determine what's inside the item data?
So, the code in print_tree.c would also need to know about the tree
number and pass that into the different functions.

Am I missing something, or is my observation correct?

Thanks,--
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-24 19:16 About 'key type for persistent [...] items' Hans van Kranenburg
@ 2017-11-25  1:12 ` Qu Wenruo
  2017-11-25  1:44   ` Qu Wenruo
  2017-11-28 17:40   ` David Sterba
  2017-11-28 17:34 ` David Sterba
  1 sibling, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2017-11-25  1:12 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs, David Sterba


[-- Attachment #1.1: Type: text/plain, Size: 3193 bytes --]



On 2017年11月25日 03:16, Hans van Kranenburg wrote:
> Hi,
> 
> Last week, when implementing the automatic classifier to dynamically
> create tree item data objects by key type in python-btrfs, I ran into
> the following commits in btrfs-progs:
> 
>   commit 8609c8bad68528f668d9ce564b868aa4828107a0
>   btrfs-progs: print-tree: factor out temporary_item dump
> and
>   commit a4b65f00d53deb1b495728dd58253af44fcf70df
>   btrfs-progs: print-tree: factor out persistent_item dump
> 
> ...which are related to kernel...
> 
>   commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
>   btrfs: introduce key type for persistent permanent items
> and
>   commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
>   btrfs: introduce key type for persistent temporary items
> 
> Afaics the goal is to overload types because there can be only 256 in
> total.

Personally speaking, to overload types, we can easily make different
meanings of type based on tree owner.

> However, I'm missing the design decisions behind the
> implementation of it. It's not in the commit messages, and it hasn't
> been on the mailing list.

Btrfs_tree.h has a good enough description for it.

> 
> Before, there was an 1:1 mapping from key types to data structures. Now,
> with the new PERSISTENT_ITEM_KEY and TEMPORARY_ITEM_KEY, it seems items
> which use this type can be using any data structure they want, so it's
> some kind of YOLO_ITEM_KEY.

For PERSISTENT and TEMPORARY item, they use the objectid and type to
determine the real data structure type.

Since in case of existing PERSISTENT/TEMPORARY item usage, their
objectid is not really used for storing data.

> 
> The print-tree code in progs 8609c8b and a4b65f0 seems incomplete. For
> example, for the PERSISTENT_ITEM_KEY, there's a switch (objectid) with
> case BTRFS_DEV_STATS_OBJECTID.
> 
> However, BTRFS_DEV_STATS_OBJECTID is just the value 0. So, that means
> that if I want to have another tree where BTRFS_MOUTON_OBJECTID is also
> 0, and I'm storing a btrfs_kebab_item struct indexed at
> (BTRFS_MOUTON_OBJECTID, PERSISTENT_ITEM_KEY, 31337), then print_tree.c
> will try to parse the data by calling print_dev_stats?

In this case, you shouldn't have your "BTRFS_MOUNTON_OBJECTID" assigned
to 0.

> 
> What's the idea behind that? Instead of having the key type field define
> the struct and meaning, we now suddenly need the tuple (tree, objectid,
> type),

Not exactly, it's (objectid, type) only to determine the data structure
type for PERSISTENT/TEMPORARY key type.
Btrfs doesn't (yet) use root to determine the meaning.

So current btrfs-progs works fine.

Thanks,
Qu

> and we need all three to determine what's inside the item data?
> So, the code in print_tree.c would also need to know about the tree
> number and pass that into the different functions.
> 
> Am I missing something, or is my observation correct?
> 
> Thanks,--
> Hans van Kranenburg
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-25  1:12 ` Qu Wenruo
@ 2017-11-25  1:44   ` Qu Wenruo
  2017-11-28 17:40   ` David Sterba
  1 sibling, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2017-11-25  1:44 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs, David Sterba


[-- Attachment #1.1: Type: text/plain, Size: 3807 bytes --]



On 2017年11月25日 09:12, Qu Wenruo wrote:
> 
> 
> On 2017年11月25日 03:16, Hans van Kranenburg wrote:
>> Hi,
>>
>> Last week, when implementing the automatic classifier to dynamically
>> create tree item data objects by key type in python-btrfs, I ran into
>> the following commits in btrfs-progs:
>>
>>   commit 8609c8bad68528f668d9ce564b868aa4828107a0
>>   btrfs-progs: print-tree: factor out temporary_item dump
>> and
>>   commit a4b65f00d53deb1b495728dd58253af44fcf70df
>>   btrfs-progs: print-tree: factor out persistent_item dump
>>
>> ...which are related to kernel...
>>
>>   commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
>>   btrfs: introduce key type for persistent permanent items
>> and
>>   commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
>>   btrfs: introduce key type for persistent temporary items
>>
>> Afaics the goal is to overload types because there can be only 256 in
>> total.
> 
> Personally speaking, to overload types, we can easily make different
> meanings of type based on tree owner.
> 
>> However, I'm missing the design decisions behind the
>> implementation of it. It's not in the commit messages, and it hasn't
>> been on the mailing list.
> 
> Btrfs_tree.h has a good enough description for it.
> 
>>
>> Before, there was an 1:1 mapping from key types to data structures. Now,
>> with the new PERSISTENT_ITEM_KEY and TEMPORARY_ITEM_KEY, it seems items
>> which use this type can be using any data structure they want, so it's
>> some kind of YOLO_ITEM_KEY.
> 
> For PERSISTENT and TEMPORARY item, they use the objectid and type to
> determine the real data structure type.
> 
> Since in case of existing PERSISTENT/TEMPORARY item usage, their
> objectid is not really used for storing data.

PS: PERSISTENT/TEMPORARY keys are not the only keys where only
key->offset is really used.

CHUNK_ITME also has a fixed objecitd, BTRFS_FIRST_CHUNK_TREE_OBJECTID.

And QGROUP_STATUS is even a step further, with 0 as both objectid and
offset.

So from this point, such PERSISTENT/TEMPORARY design doesn't really
follow the existing type design.
It may only help for later expansion, but I doubt if we will use such
schema.

Thanks,
Qu

> 
>>
>> The print-tree code in progs 8609c8b and a4b65f0 seems incomplete. For
>> example, for the PERSISTENT_ITEM_KEY, there's a switch (objectid) with
>> case BTRFS_DEV_STATS_OBJECTID.
>>
>> However, BTRFS_DEV_STATS_OBJECTID is just the value 0. So, that means
>> that if I want to have another tree where BTRFS_MOUTON_OBJECTID is also
>> 0, and I'm storing a btrfs_kebab_item struct indexed at
>> (BTRFS_MOUTON_OBJECTID, PERSISTENT_ITEM_KEY, 31337), then print_tree.c
>> will try to parse the data by calling print_dev_stats?
> 
> In this case, you shouldn't have your "BTRFS_MOUNTON_OBJECTID" assigned
> to 0.
> 
>>
>> What's the idea behind that? Instead of having the key type field define
>> the struct and meaning, we now suddenly need the tuple (tree, objectid,
>> type),
> 
> Not exactly, it's (objectid, type) only to determine the data structure
> type for PERSISTENT/TEMPORARY key type.
> Btrfs doesn't (yet) use root to determine the meaning.
> 
> So current btrfs-progs works fine.
> 
> Thanks,
> Qu
> 
>> and we need all three to determine what's inside the item data?
>> So, the code in print_tree.c would also need to know about the tree
>> number and pass that into the different functions.
>>
>> Am I missing something, or is my observation correct?
>>
>> Thanks,--
>> Hans van Kranenburg
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-24 19:16 About 'key type for persistent [...] items' Hans van Kranenburg
  2017-11-25  1:12 ` Qu Wenruo
@ 2017-11-28 17:34 ` David Sterba
  2017-11-28 18:00   ` Hans van Kranenburg
  1 sibling, 1 reply; 9+ messages in thread
From: David Sterba @ 2017-11-28 17:34 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: linux-btrfs, David Sterba

On Fri, Nov 24, 2017 at 08:16:05PM +0100, Hans van Kranenburg wrote:
> Last week, when implementing the automatic classifier to dynamically
> create tree item data objects by key type in python-btrfs, I ran into
> the following commits in btrfs-progs:
> 
>   commit 8609c8bad68528f668d9ce564b868aa4828107a0
>   btrfs-progs: print-tree: factor out temporary_item dump
> and
>   commit a4b65f00d53deb1b495728dd58253af44fcf70df
>   btrfs-progs: print-tree: factor out persistent_item dump
> 
> ...which are related to kernel...
> 
>   commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
>   btrfs: introduce key type for persistent permanent items
> and
>   commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
>   btrfs: introduce key type for persistent temporary items
> 
> Afaics the goal is to overload types because there can be only 256 in
> total. However, I'm missing the design decisions behind the
> implementation of it. It's not in the commit messages, and it hasn't
> been on the mailing list.

The reason is avoid wasting key types but still allow to store new types
of data to the btrees, if they were not part of the on-disk format.

I'm not sure if this has been discussed or mentioned under some patches
or maybe unrelated patches. I do remember that I discussed that with
Chris in private on IRC and have the logs, dated 2015-09-02.

At that time the balance item and dev stats item were introduced, maybe
also the qgroup status item type. This had me alarmed enough to
reconsider how the keys are allocated.

> Before, there was an 1:1 mapping from key types to data structures. Now,
> with the new PERSISTENT_ITEM_KEY and TEMPORARY_ITEM_KEY, it seems items
> which use this type can be using any data structure they want, so it's
> some kind of YOLO_ITEM_KEY.

In some sense it is, so it's key+objectid to determine the structure.

> The print-tree code in progs 8609c8b and a4b65f0 seems incomplete. For
> example, for the PERSISTENT_ITEM_KEY, there's a switch (objectid) with
> case BTRFS_DEV_STATS_OBJECTID.
> 
> However, BTRFS_DEV_STATS_OBJECTID is just the value 0. So, that means
> that if I want to have another tree where BTRFS_MOUTON_OBJECTID is also
> 0, and I'm storing a btrfs_kebab_item struct indexed at
> (BTRFS_MOUTON_OBJECTID, PERSISTENT_ITEM_KEY, 31337), then print_tree.c
> will try to parse the data by calling print_dev_stats?

As answered by Qu, you can't use 0 for BTRFS_MOUTON_OBJECTID in that
case.

> What's the idea behind that? Instead of having the key type field define
> the struct and meaning, we now suddenly need the tuple (tree, objectid,
> type), and we need all three to determine what's inside the item data?
> So, the code in print_tree.c would also need to know about the tree
> number and pass that into the different functions.

No, all key types, even the persistent/temporary are independent of the
tree type. So it's only type <-> structure mapping, besides
persistent/temporary types.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-25  1:12 ` Qu Wenruo
  2017-11-25  1:44   ` Qu Wenruo
@ 2017-11-28 17:40   ` David Sterba
  1 sibling, 0 replies; 9+ messages in thread
From: David Sterba @ 2017-11-28 17:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hans van Kranenburg, linux-btrfs, David Sterba

On Sat, Nov 25, 2017 at 09:12:35AM +0800, Qu Wenruo wrote:
> On 2017年11月25日 03:16, Hans van Kranenburg wrote:
> > Last week, when implementing the automatic classifier to dynamically
> > create tree item data objects by key type in python-btrfs, I ran into
> > the following commits in btrfs-progs:
> > 
> >   commit 8609c8bad68528f668d9ce564b868aa4828107a0
> >   btrfs-progs: print-tree: factor out temporary_item dump
> > and
> >   commit a4b65f00d53deb1b495728dd58253af44fcf70df
> >   btrfs-progs: print-tree: factor out persistent_item dump
> > 
> > ...which are related to kernel...
> > 
> >   commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
> >   btrfs: introduce key type for persistent permanent items
> > and
> >   commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
> >   btrfs: introduce key type for persistent temporary items
> > 
> > Afaics the goal is to overload types because there can be only 256 in
> > total.
> 
> Personally speaking, to overload types, we can easily make different
> meanings of type based on tree owner.

Key type tied to a tree would cause some ambiguity, while now we always
know what's in the type item if we eg. read a random metadata block.

We're now at ~40 key types, scattered in the 256 range so further
extensions are possible should we need a particular number due to the
ordering.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-28 17:34 ` David Sterba
@ 2017-11-28 18:00   ` Hans van Kranenburg
  2017-11-28 19:12     ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2017-11-28 18:00 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 11/28/2017 06:34 PM, David Sterba wrote:
> On Fri, Nov 24, 2017 at 08:16:05PM +0100, Hans van Kranenburg wrote:
>> Last week, when implementing the automatic classifier to dynamically
>> create tree item data objects by key type in python-btrfs, I ran into
>> the following commits in btrfs-progs:
>>
>>   commit 8609c8bad68528f668d9ce564b868aa4828107a0
>>   btrfs-progs: print-tree: factor out temporary_item dump
>> and
>>   commit a4b65f00d53deb1b495728dd58253af44fcf70df
>>   btrfs-progs: print-tree: factor out persistent_item dump
>>
>> ...which are related to kernel...
>>
>>   commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
>>   btrfs: introduce key type for persistent permanent items
>> and
>>   commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
>>   btrfs: introduce key type for persistent temporary items
>>
>> Afaics the goal is to overload types because there can be only 256 in
>> total. However, I'm missing the design decisions behind the
>> implementation of it. It's not in the commit messages, and it hasn't
>> been on the mailing list.
> 
> The reason is avoid wasting key types but still allow to store new types
> of data to the btrees, if they were not part of the on-disk format.
> 
> I'm not sure if this has been discussed or mentioned under some patches
> or maybe unrelated patches. I do remember that I discussed that with
> Chris in private on IRC and have the logs, dated 2015-09-02.
> 
> At that time the balance item and dev stats item were introduced, maybe
> also the qgroup status item type. This had me alarmed enough to
> reconsider how the keys are allocated.
> 
>> Before, there was an 1:1 mapping from key types to data structures. Now,
>> with the new PERSISTENT_ITEM_KEY and TEMPORARY_ITEM_KEY, it seems items
>> which use this type can be using any data structure they want, so it's
>> some kind of YOLO_ITEM_KEY.
> 
> In some sense it is, so it's key+objectid to determine the structure.
> 
>> The print-tree code in progs 8609c8b and a4b65f0 seems incomplete. For
>> example, for the PERSISTENT_ITEM_KEY, there's a switch (objectid) with
>> case BTRFS_DEV_STATS_OBJECTID.
>>
>> However, BTRFS_DEV_STATS_OBJECTID is just the value 0. So, that means
>> that if I want to have another tree where BTRFS_MOUTON_OBJECTID is also
>> 0, and I'm storing a btrfs_kebab_item struct indexed at
>> (BTRFS_MOUTON_OBJECTID, PERSISTENT_ITEM_KEY, 31337), then print_tree.c
>> will try to parse the data by calling print_dev_stats?
> 
> As answered by Qu, you can't use 0 for BTRFS_MOUTON_OBJECTID in that
> case.

(I'm just thinking out loud here, if you think I'm wasting your time
just say.)

Yes, so the objectid numbers have to be "registered" / "reserved" in the
documentation, and they have to be unique over all trees.

Maybe the information I was looking for is... in what cases should or
shouldn't this be used? Because that limits the possible usage quite a
bit. Or is it only for very very specific things.

E.g. if I wanted to (just a random idea) add per device statistics, and
use this, I'd need to use the key also with objectid 1, 2, 3, etc... if
I have multiple devices. That's already a no go if there's anyone in any
other tree that is doing anything with any objectid in the range of
valid device numbers.

>> What's the idea behind that? Instead of having the key type field define
>> the struct and meaning, we now suddenly need the tuple (tree, objectid,
>> type), and we need all three to determine what's inside the item data?
>> So, the code in print_tree.c would also need to know about the tree
>> number and pass that into the different functions.
> 
> No, all key types, even the persistent/temporary are independent of the
> tree type. So it's only type <-> structure mapping, besides
> persistent/temporary types.

Yeah, I wasn't explicit about that, I meant only for the
persistent/temporary case yes.

-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-28 18:00   ` Hans van Kranenburg
@ 2017-11-28 19:12     ` David Sterba
  2017-11-28 19:28       ` Hans van Kranenburg
  0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2017-11-28 19:12 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: dsterba, linux-btrfs

On Tue, Nov 28, 2017 at 07:00:28PM +0100, Hans van Kranenburg wrote:
> On 11/28/2017 06:34 PM, David Sterba wrote:
> > On Fri, Nov 24, 2017 at 08:16:05PM +0100, Hans van Kranenburg wrote:
> >> Last week, when implementing the automatic classifier to dynamically
> >> create tree item data objects by key type in python-btrfs, I ran into
> >> the following commits in btrfs-progs:
> >>
> >>   commit 8609c8bad68528f668d9ce564b868aa4828107a0
> >>   btrfs-progs: print-tree: factor out temporary_item dump
> >> and
> >>   commit a4b65f00d53deb1b495728dd58253af44fcf70df
> >>   btrfs-progs: print-tree: factor out persistent_item dump
> >>
> >> ...which are related to kernel...
> >>
> >>   commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
> >>   btrfs: introduce key type for persistent permanent items
> >> and
> >>   commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
> >>   btrfs: introduce key type for persistent temporary items
> >>
> >> Afaics the goal is to overload types because there can be only 256 in
> >> total. However, I'm missing the design decisions behind the
> >> implementation of it. It's not in the commit messages, and it hasn't
> >> been on the mailing list.
> > 
> > The reason is avoid wasting key types but still allow to store new types
> > of data to the btrees, if they were not part of the on-disk format.
> > 
> > I'm not sure if this has been discussed or mentioned under some patches
> > or maybe unrelated patches. I do remember that I discussed that with
> > Chris in private on IRC and have the logs, dated 2015-09-02.
> > 
> > At that time the balance item and dev stats item were introduced, maybe
> > also the qgroup status item type. This had me alarmed enough to
> > reconsider how the keys are allocated.
> > 
> >> Before, there was an 1:1 mapping from key types to data structures. Now,
> >> with the new PERSISTENT_ITEM_KEY and TEMPORARY_ITEM_KEY, it seems items
> >> which use this type can be using any data structure they want, so it's
> >> some kind of YOLO_ITEM_KEY.
> > 
> > In some sense it is, so it's key+objectid to determine the structure.
> > 
> >> The print-tree code in progs 8609c8b and a4b65f0 seems incomplete. For
> >> example, for the PERSISTENT_ITEM_KEY, there's a switch (objectid) with
> >> case BTRFS_DEV_STATS_OBJECTID.
> >>
> >> However, BTRFS_DEV_STATS_OBJECTID is just the value 0. So, that means
> >> that if I want to have another tree where BTRFS_MOUTON_OBJECTID is also
> >> 0, and I'm storing a btrfs_kebab_item struct indexed at
> >> (BTRFS_MOUTON_OBJECTID, PERSISTENT_ITEM_KEY, 31337), then print_tree.c
> >> will try to parse the data by calling print_dev_stats?
> > 
> > As answered by Qu, you can't use 0 for BTRFS_MOUTON_OBJECTID in that
> > case.
> 
> (I'm just thinking out loud here, if you think I'm wasting your time
> just say.)
> 
> Yes, so the objectid numbers have to be "registered" / "reserved" in the
> documentation, and they have to be unique over all trees.

Right.

> Maybe the information I was looking for is... in what cases should or
> shouldn't this be used? Because that limits the possible usage quite a
> bit. Or is it only for very very specific things.

The keys are not free for use, they need to have a defined meaning and
are sort of part of the on-disk format. There must be some usecase and
reasoning why it's necessary to be done that way, and not in another.
Like xattr.

I don't have an example now, but I could imagine some per-tree
information that can be tracked and updated at commit time.

> E.g. if I wanted to (just a random idea) add per device statistics, and
> use this, I'd need to use the key also with objectid 1, 2, 3, etc... if
> I have multiple devices. That's already a no go if there's anyone in any
> other tree that is doing anything with any objectid in the range of
> valid device numbers.

In that case there would be a new objectid PER_DEVICE_OBJECTID, with the
persistent key, and all the device ids can go to the offset field. The
objectid field should not be dynamic, by design.

> >> What's the idea behind that? Instead of having the key type field define
> >> the struct and meaning, we now suddenly need the tuple (tree, objectid,
> >> type), and we need all three to determine what's inside the item data?
> >> So, the code in print_tree.c would also need to know about the tree
> >> number and pass that into the different functions.
> > 
> > No, all key types, even the persistent/temporary are independent of the
> > tree type. So it's only type <-> structure mapping, besides
> > persistent/temporary types.
> 
> Yeah, I wasn't explicit about that, I meant only for the
> persistent/temporary case yes.

So for this case it's type + objectid, the tree independence stays.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-28 19:12     ` David Sterba
@ 2017-11-28 19:28       ` Hans van Kranenburg
  2017-11-28 19:32         ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2017-11-28 19:28 UTC (permalink / raw)
  To: dsterba, linux-btrfs

Ok, just want to add one more thing :)

On 11/28/2017 08:12 PM, David Sterba wrote:
> On Tue, Nov 28, 2017 at 07:00:28PM +0100, Hans van Kranenburg wrote:
>> On 11/28/2017 06:34 PM, David Sterba wrote:
>>> On Fri, Nov 24, 2017 at 08:16:05PM +0100, Hans van Kranenburg wrote:
>>>> Last week, when implementing the automatic classifier to dynamically
>>>> create tree item data objects by key type in python-btrfs, I ran into
>>>> the following commits in btrfs-progs:
>>>>
>>>>   commit 8609c8bad68528f668d9ce564b868aa4828107a0
>>>>   btrfs-progs: print-tree: factor out temporary_item dump
>>>> and
>>>>   commit a4b65f00d53deb1b495728dd58253af44fcf70df
>>>>   btrfs-progs: print-tree: factor out persistent_item dump
>>>>
>>>> ...which are related to kernel...
>>>>
>>>>   commit 50c2d5abe64c1726b48d292a2ab04f60e8238933
>>>>   btrfs: introduce key type for persistent permanent items
>>>> and
>>>>   commit 0bbbccb17fea86818e1a058faf5903aefd20b31a
>>>>   btrfs: introduce key type for persistent temporary items
>>>>
>>>> Afaics the goal is to overload types because there can be only 256 in
>>>> total. However, I'm missing the design decisions behind the
>>>> implementation of it. It's not in the commit messages, and it hasn't
>>>> been on the mailing list.
>>>
>>> The reason is avoid wasting key types but still allow to store new types
>>> of data to the btrees, if they were not part of the on-disk format.
>>>
>>> I'm not sure if this has been discussed or mentioned under some patches
>>> or maybe unrelated patches. I do remember that I discussed that with
>>> Chris in private on IRC and have the logs, dated 2015-09-02.
>>>
>>> At that time the balance item and dev stats item were introduced, maybe
>>> also the qgroup status item type. This had me alarmed enough to
>>> reconsider how the keys are allocated.
>>>
>>>> Before, there was an 1:1 mapping from key types to data structures. Now,
>>>> with the new PERSISTENT_ITEM_KEY and TEMPORARY_ITEM_KEY, it seems items
>>>> which use this type can be using any data structure they want, so it's
>>>> some kind of YOLO_ITEM_KEY.
>>>
>>> In some sense it is, so it's key+objectid to determine the structure.
>>>
>>>> The print-tree code in progs 8609c8b and a4b65f0 seems incomplete. For
>>>> example, for the PERSISTENT_ITEM_KEY, there's a switch (objectid) with
>>>> case BTRFS_DEV_STATS_OBJECTID.
>>>>
>>>> However, BTRFS_DEV_STATS_OBJECTID is just the value 0. So, that means
>>>> that if I want to have another tree where BTRFS_MOUTON_OBJECTID is also
>>>> 0, and I'm storing a btrfs_kebab_item struct indexed at
>>>> (BTRFS_MOUTON_OBJECTID, PERSISTENT_ITEM_KEY, 31337), then print_tree.c
>>>> will try to parse the data by calling print_dev_stats?
>>>
>>> As answered by Qu, you can't use 0 for BTRFS_MOUTON_OBJECTID in that
>>> case.
>>
>> (I'm just thinking out loud here, if you think I'm wasting your time
>> just say.)
>>
>> Yes, so the objectid numbers have to be "registered" / "reserved" in the
>> documentation, and they have to be unique over all trees.
> 
> Right.
> 
>> Maybe the information I was looking for is... in what cases should or
>> shouldn't this be used? Because that limits the possible usage quite a
>> bit. Or is it only for very very specific things.
> 
> The keys are not free for use, they need to have a defined meaning and
> are sort of part of the on-disk format. There must be some usecase and
> reasoning why it's necessary to be done that way, and not in another.
> Like xattr.
> 
> I don't have an example now, but I could imagine some per-tree
> information that can be tracked and updated at commit time.
> 
>> E.g. if I wanted to (just a random idea) add per device statistics, and
>> use this, I'd need to use the key also with objectid 1, 2, 3, etc... if
>> I have multiple devices. That's already a no go if there's anyone in any
>> other tree that is doing anything with any objectid in the range of
>> valid device numbers.
> 
> In that case there would be a new objectid PER_DEVICE_OBJECTID, with the
> persistent key, and all the device ids can go to the offset field. The
> objectid field should not be dynamic, by design.

Ok, didn't think of that yet, clear.

So... just a random idea...

How cool would it be to have the block group items being done this
way... PER_CHUNK_OBJECTID.

Imagine the time to mount a large btrfs filesystem going down from
minutes or tens of minutes to just a few seconds...

\:D/

>>>> What's the idea behind that? Instead of having the key type field define
>>>> the struct and meaning, we now suddenly need the tuple (tree, objectid,
>>>> type), and we need all three to determine what's inside the item data?
>>>> So, the code in print_tree.c would also need to know about the tree
>>>> number and pass that into the different functions.
>>>
>>> No, all key types, even the persistent/temporary are independent of the
>>> tree type. So it's only type <-> structure mapping, besides
>>> persistent/temporary types.
>>
>> Yeah, I wasn't explicit about that, I meant only for the
>> persistent/temporary case yes.
> 
> So for this case it's type + objectid, the tree independence stays.

Clear, thanks.

-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: About 'key type for persistent [...] items'
  2017-11-28 19:28       ` Hans van Kranenburg
@ 2017-11-28 19:32         ` David Sterba
  0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2017-11-28 19:32 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: dsterba, linux-btrfs

On Tue, Nov 28, 2017 at 08:28:58PM +0100, Hans van Kranenburg wrote:
> >> E.g. if I wanted to (just a random idea) add per device statistics, and
> >> use this, I'd need to use the key also with objectid 1, 2, 3, etc... if
> >> I have multiple devices. That's already a no go if there's anyone in any
> >> other tree that is doing anything with any objectid in the range of
> >> valid device numbers.
> > 
> > In that case there would be a new objectid PER_DEVICE_OBJECTID, with the
> > persistent key, and all the device ids can go to the offset field. The
> > objectid field should not be dynamic, by design.
> 
> Ok, didn't think of that yet, clear.
> 
> So... just a random idea...

You're not the first one to have that idea :)

> How cool would it be to have the block group items being done this
> way... PER_CHUNK_OBJECTID.
> 
> Imagine the time to mount a large btrfs filesystem going down from
> minutes or tens of minutes to just a few seconds...

Exactly for that reason, but in such case we're allowed to grab a new
key type and set objectid to a value that would group all the bg items,
mayb also moving the to the beginning of the key space.

The slightly tricky part would be the backward compatibility when we'd
have to keep the existing bg and new-bg in sync, but there are no
fundamental obstacles.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-11-28 19:34 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-24 19:16 About 'key type for persistent [...] items' Hans van Kranenburg
2017-11-25  1:12 ` Qu Wenruo
2017-11-25  1:44   ` Qu Wenruo
2017-11-28 17:40   ` David Sterba
2017-11-28 17:34 ` David Sterba
2017-11-28 18:00   ` Hans van Kranenburg
2017-11-28 19:12     ` David Sterba
2017-11-28 19:28       ` Hans van Kranenburg
2017-11-28 19:32         ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).