All of lore.kernel.org
 help / color / mirror / Atom feed
* packing structures and numbers
@ 2008-10-03 11:42 Avi Kivity
  2008-10-03 12:22 ` Chris Mason
  2008-10-03 18:37 ` Zach Brown
  0 siblings, 2 replies; 12+ messages in thread
From: Avi Kivity @ 2008-10-03 11:42 UTC (permalink / raw)
  To: linux-btrfs

I've been reading btrfs's on-disk format, and two things caught my eye

- attribute((packed)) structures everywhere, often with misaligned 
fields.  This conserves space, but can be harmful to in-memory 
performance on some archs.
- le64's everywhere.   This scales nicely, but wastes space.  My home 
directory is unlikely to have more than 4G objects or 4GB extents (let 
alone >2 devices).

I think the two issues can be improved by separating the on-disk format 
and the in-memory structure, and by using uleb128 as the on-disk format 
for numbers.  uleb128 is a variable-length format that encodes 7 bits of 
a number in each byte, using the eighth bit as a stop bit.

So, for example

struct btrfs_disk_key {
    __le64 objectid;
    u8 type;
    __le64 offset;
} __attribute__ ((__packed__));

With 1M objectids, and 1T offsets, this reduces in size from 17 bytes to 
10 bytes.  Most other structures show similar gains.  We can also have 
more than 256 types if the need arises.

There are, off course, disadvantages to switching to uleb128:

- need to write encode and decode functions, which is tedious.  This can 
be automated a la xdr.
- increased cpu utilization for decoding and encoding
- can no longer know the size of the in-memory structures in advance
- it's just wonderful to rewrite the entire disk format so close to 
freezing it

The advantages, IMO, outweigh the disadvantages:

- better packing reduces tree depth and therefore seekage, the most 
important cost on rotating media
- the disk format is infinitely growable
- in-memory format is more efficient for archs which prefer aligned accesses

I'm not volunteering to do this, but please consider this proposal.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-03 11:42 packing structures and numbers Avi Kivity
@ 2008-10-03 12:22 ` Chris Mason
  2008-10-03 13:40   ` Avi Kivity
  2008-10-04  0:22   ` Daniel Phillips
  2008-10-03 18:37 ` Zach Brown
  1 sibling, 2 replies; 12+ messages in thread
From: Chris Mason @ 2008-10-03 12:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-btrfs

On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote:
> I've been reading btrfs's on-disk format, and two things caught my eye
> 
> - attribute((packed)) structures everywhere, often with misaligned 
> fields.  This conserves space, but can be harmful to in-memory 
> performance on some archs.

packed is important to make sure that a given field takes exactly the
same amount of space everywhere, regardless of compiler optimization or
arch.

> - le64's everywhere.   This scales nicely, but wastes space.  My home 
> directory is unlikely to have more than 4G objects or 4GB extents (let 
> alone >2 devices).
> 
> I think the two issues can be improved by separating the on-disk format 
> and the in-memory structure, and by using uleb128 as the on-disk format 
> for numbers.  uleb128 is a variable-length format that encodes 7 bits of 
> a number in each byte, using the eighth bit as a stop bit.
> 

This couldn't be used everywhere, as the array of items headers and keys
need to be a fixed sized the current bin_search code.  The items can be
variable sized but in general they don't have as many le64s.

The place we're really suffering from poor packing is in the extent
allocation tree.

But, most of the space being wasted there is in the item headers and
keys ;)  I'd prefer to address that by consolidating the items stored,
especially for the btree block extents.

-chris



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-03 12:22 ` Chris Mason
@ 2008-10-03 13:40   ` Avi Kivity
  2008-10-04  0:22   ` Daniel Phillips
  1 sibling, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2008-10-03 13:40 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

Chris Mason wrote:
> On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote:
>   
>> I've been reading btrfs's on-disk format, and two things caught my eye
>>
>> - attribute((packed)) structures everywhere, often with misaligned 
>> fields.  This conserves space, but can be harmful to in-memory 
>> performance on some archs.
>>     
>
> packed is important to make sure that a given field takes exactly the
> same amount of space everywhere, regardless of compiler optimization or
> arch.
>   

Yes, of course.

>> - le64's everywhere.   This scales nicely, but wastes space.  My home 
>> directory is unlikely to have more than 4G objects or 4GB extents (let 
>> alone >2 devices).
>>
>> I think the two issues can be improved by separating the on-disk format 
>> and the in-memory structure, and by using uleb128 as the on-disk format 
>> for numbers.  uleb128 is a variable-length format that encodes 7 bits of 
>> a number in each byte, using the eighth bit as a stop bit.
>>
>>     
>
> This couldn't be used everywhere, as the array of items headers and keys
> need to be a fixed sized the current bin_search code.  The items can be
> variable sized but in general they don't have as many le64s.
>
>   

You'd decode the keys and headers before searching.  This of couse 
negates the idea behind a binary search, unless you cache the decoded nodes.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-03 11:42 packing structures and numbers Avi Kivity
  2008-10-03 12:22 ` Chris Mason
@ 2008-10-03 18:37 ` Zach Brown
  2008-10-04  8:31   ` Avi Kivity
  1 sibling, 1 reply; 12+ messages in thread
From: Zach Brown @ 2008-10-03 18:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-btrfs

Avi Kivity wrote:
> I've been reading btrfs's on-disk format, and two things caught my eye
> 
> - attribute((packed)) structures everywhere, often with misaligned
> fields.  This conserves space, but can be harmful to in-memory
> performance on some archs.

How harmful?  Do you have any profiles that can even pick this out of
the noise?

- z

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-03 12:22 ` Chris Mason
  2008-10-03 13:40   ` Avi Kivity
@ 2008-10-04  0:22   ` Daniel Phillips
  2008-10-04  0:28     ` Chris Mason
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2008-10-04  0:22 UTC (permalink / raw)
  To: Chris Mason; +Cc: Avi Kivity, linux-btrfs

On Friday 03 October 2008 05:22, Chris Mason wrote:
> On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote:
> > I've been reading btrfs's on-disk format, and two things caught my eye
> > 
> > - attribute((packed)) structures everywhere, often with misaligned 
> > fields.  This conserves space, but can be harmful to in-memory 
> > performance on some archs.
> 
> packed is important to make sure that a given field takes exactly the
> same amount of space everywhere, regardless of compiler optimization or
> arch.

Amen.  Are we sure that attribute ((packed)) works the same on all
arches?

And what about bitfields, does GCC pack them to the same size on all
arches in this case or take the easy way out as permitted by the
standard?

Regards,

Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-04  0:22   ` Daniel Phillips
@ 2008-10-04  0:28     ` Chris Mason
  2008-10-04  0:35       ` Daniel Phillips
  2008-10-04  0:43       ` Daniel Phillips
  0 siblings, 2 replies; 12+ messages in thread
From: Chris Mason @ 2008-10-04  0:28 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Avi Kivity, linux-btrfs

On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote:
> On Friday 03 October 2008 05:22, Chris Mason wrote:
> > On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote:
> > > I've been reading btrfs's on-disk format, and two things caught my eye
> > > 
> > > - attribute((packed)) structures everywhere, often with misaligned 
> > > fields.  This conserves space, but can be harmful to in-memory 
> > > performance on some archs.
> > 
> > packed is important to make sure that a given field takes exactly the
> > same amount of space everywhere, regardless of compiler optimization or
> > arch.
> 
> Amen.  Are we sure that attribute ((packed)) works the same on all
> arches?

As long as you use types with strictly defined size, yes.

> 
> And what about bitfields, does GCC pack them to the same size on all
> arches in this case or take the easy way out as permitted by the
> standard?

I'm not sure, I believe a bit field on a strictly defined size will be
packed the same everywhere. But bitfields + endian conversion are messy,
so I avoid them.

-chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-04  0:28     ` Chris Mason
@ 2008-10-04  0:35       ` Daniel Phillips
  2008-10-04  0:43       ` Daniel Phillips
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Phillips @ 2008-10-04  0:35 UTC (permalink / raw)
  To: Chris Mason; +Cc: Avi Kivity, linux-btrfs

On Friday 03 October 2008 17:28, Chris Mason wrote:
> On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote:
> > And what about bitfields, does GCC pack them to the same size on all
> > arches in this case or take the easy way out as permitted by the
> > standard?
> 
> I'm not sure, I believe a bit field on a strictly defined size will be
> packed the same everywhere. But bitfields + endian conversion are messy,
> so I avoid them.

Likewise, sigh.  On the other hand, packed + endian can't really be
avoided.

Regards,

Daniel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-04  0:28     ` Chris Mason
  2008-10-04  0:35       ` Daniel Phillips
@ 2008-10-04  0:43       ` Daniel Phillips
  2008-10-04  8:27         ` Avi Kivity
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Phillips @ 2008-10-04  0:43 UTC (permalink / raw)
  To: Chris Mason; +Cc: Avi Kivity, linux-btrfs

On Friday 03 October 2008 17:28, Chris Mason wrote:
> On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote:
> > ...Are we sure that attribute ((packed)) works the same on all
> > arches?
> 
> As long as you use types with strictly defined size, yes.

Just to be a language lawyer about that... int does not have a
strictly defined size, yet we define uint32_t as a typedef of int
on 32 bit arches do we not?

Regards,

Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-04  0:43       ` Daniel Phillips
@ 2008-10-04  8:27         ` Avi Kivity
  0 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2008-10-04  8:27 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Chris Mason, linux-btrfs

Daniel Phillips wrote:
> On Friday 03 October 2008 17:28, Chris Mason wrote:
>   
>> On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote:
>>     
>>> ...Are we sure that attribute ((packed)) works the same on all
>>> arches?
>>>       
>> As long as you use types with strictly defined size, yes.
>>     
>
> Just to be a language lawyer about that... int does not have a
> strictly defined size, yet we define uint32_t as a typedef of int
> on 32 bit arches do we not?
>   

What else would you define it to?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-03 18:37 ` Zach Brown
@ 2008-10-04  8:31   ` Avi Kivity
  2008-10-04 16:34     ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2008-10-04  8:31 UTC (permalink / raw)
  To: Zach Brown; +Cc: linux-btrfs

Zach Brown wrote:
> Avi Kivity wrote:
>   
>> I've been reading btrfs's on-disk format, and two things caught my eye
>>
>> - attribute((packed)) structures everywhere, often with misaligned
>> fields.  This conserves space, but can be harmful to in-memory
>> performance on some archs.
>>     
>
> How harmful?  Do you have any profiles that can even pick this out of
> the noise?
>   

On those archs that take faults on unaligned accesses it's unlikely to 
be in the noise.  But we could (and should) stick a get_unaligned() in 
the accessor functions.

uleb128 encoding is orthogonal to this issue, however.  It's only about 
getting better density.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-04  8:31   ` Avi Kivity
@ 2008-10-04 16:34     ` Andi Kleen
  2008-10-05  5:31       ` Avi Kivity
  0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2008-10-04 16:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Zach Brown, linux-btrfs

Avi Kivity <avi@redhat.com> writes:

> Zach Brown wrote:
>> Avi Kivity wrote:
>>
>>> I've been reading btrfs's on-disk format, and two things caught my eye
>>>
>>> - attribute((packed)) structures everywhere, often with misaligned
>>> fields.  This conserves space, but can be harmful to in-memory
>>> performance on some archs.
>>>
>>
>> How harmful?  Do you have any profiles that can even pick this out of
>> the noise?
>>
>
> On those archs that take faults on unaligned accesses it's unlikely to
> be in the noise.  But we could (and should) stick a get_unaligned() in
> the accessor functions.

Normally the compiler on such architectures generates special load/store code
for known unaligned types that does not actually fault, but just uses multiple
instructions. That is slower than a normal memory access, but still much
faster than a exception.

You only really get the full exception fault penalty when the compiler
cannot figure out at compile time that a given variable is unaligned.
But with packed it normally assumes that (I think).

-Andi

-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: packing structures and numbers
  2008-10-04 16:34     ` Andi Kleen
@ 2008-10-05  5:31       ` Avi Kivity
  0 siblings, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2008-10-05  5:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Zach Brown, linux-btrfs

Andi Kleen wrote:
>> On those archs that take faults on unaligned accesses it's unlikely to
>> be in the noise.  But we could (and should) stick a get_unaligned() in
>> the accessor functions.
>>     
>
> Normally the compiler on such architectures generates special load/store code
> for known unaligned types that does not actually fault, but just uses multiple
> instructions. That is slower than a normal memory access, but still much
> faster than a exception.
>
> You only really get the full exception fault penalty when the compiler
> cannot figure out at compile time that a given variable is unaligned.
> But with packed it normally assumes that (I think)

Sounds reasonable.  In which case the unaligned access issue I raised is 
a red herring.  So using uleb128 or not is down to whether the improved 
packing efficiency is worth the increased complexity; it seems unlikely 
that it is.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-10-05  5:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-03 11:42 packing structures and numbers Avi Kivity
2008-10-03 12:22 ` Chris Mason
2008-10-03 13:40   ` Avi Kivity
2008-10-04  0:22   ` Daniel Phillips
2008-10-04  0:28     ` Chris Mason
2008-10-04  0:35       ` Daniel Phillips
2008-10-04  0:43       ` Daniel Phillips
2008-10-04  8:27         ` Avi Kivity
2008-10-03 18:37 ` Zach Brown
2008-10-04  8:31   ` Avi Kivity
2008-10-04 16:34     ` Andi Kleen
2008-10-05  5:31       ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.