Hi Russell & Eric,

Basing on Russell's patch, I was thinking of something like the
attached patch.
However, I'm wondering if the xfs_attr_set_int() change to use
a req_size for non-fitting shortform EA's is worth it - as it is a bit
of a prediction (trying to codify what I was thinking).
Russell, perhaps I should just send in sf_size like you initially intended.
In fact, the more I think about it, I'm more inclined to just pass
in sf_size.

Haven't tested the patch out yet. Just wanted to discuss it a bit.

Cheers,
Tim.

Russell Cattelan wrote:
> Timothy Shimmin wrote:
> 
>> Hi Guys,
>>
>> So just looking at the first part, which as Eric suggested can be 
>> considered
>> on its own.
>>
>> Index: work_gfs/fs/xfs/xfs_attr.c
>> ===================================================================
>> --- work_gfs.orig/fs/xfs/xfs_attr.c    2006-11-21 18:38:27.572949303 
>> -0600
>> +++ work_gfs/fs/xfs/xfs_attr.c    2006-11-21 18:44:51.666033422 -0600
>> @@ -210,8 +210,20 @@ xfs_attr_set_int(xfs_inode_t *dp, const
>>      * (inode must not be locked when we call this routine)
>>      */
>>     if (XFS_IFORK_Q(dp) == 0) {
>> -        if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
>> -            return(error);
>> +        if ((dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) ||
>> +            ((dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) &&
>> +             (dp->i_d.di_anextents == 0))) {
>> +            /* xfs_bmap_add_attrfork will set the forkoffset based on
>> +             * the size needed, the local attr case needs the size
>> +             * attr plus the size of the hdr, if the size of
>> +             * header is not accounted for initially the forkoffset
>> +             * won't allow enough space, the actually attr add will
>> +             * then be forced out out line to extents
>> +             */
>> +            size += sizeof(xfs_attr_sf_hdr_t);
>> +            if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
>> +                return(error);
>> +        }
>>     }
>>
>> --On 22 November 2006 10:38:16 AM -0600 Eric Sandeen 
>> <sandeen@sandeen.net> wrote:
>>
>>>> By fixing the initial size calculation at least things like SElinux
>>>> which is adding one attr won't cause the attr segment to flip to 
>>>> extents
>>>> immediately.
>>>> The second attr will cause the flip but not the first one.
>>>
>>>
>>> I'd say this part (fixing up proper space for the initial attr fork 
>>> setup) should probably go in
>>> soon if it gets good reviews (with the removal of the extra tests, as 
>>> we discussed on irc last
>>> night).  I think this proper change stands on its own just fine.
>>>
>>
>> So yeah, as you said in IRC, the brace is in the wrong spot and
>> the di_aformat tests don't make any sense here.
>> Basically, we know that fork offset is zero and therefore that the 
>> di_aformat should be
>> set at XFS_DINODE_FMT_EXTENTS and di_anetents will be zero.
>> As this is the state before we add in an attribute fork.
>> Why we have this initial state as extents, I'm not too sure and 
>> wondered in the
>> past. Maybe because this state is one which doesn't occupy any space 
>> in the literal area.
>> A shortform EA has a header at least.
>>
>> My next concern is that the size that is calculated is presumably 
>> trying to accomodate
>> the shortform EA. However, the calculation is for the sf header and 
>> the space for a
>> a xfs_attr_leaf_name_local with given namelen and valuelen.
>> It would be better to base it on an xfs_attr_sf_entry type.
>> So I think we need to rework this calculation.
>>
>> Which leads me on to the next issue.
>> We don't know what EA form we are going to take,
>> so we can't really assume that it will be shortform.
>> If the EA name or value is big then the EA will go into extents and 
>> could occupy very
>> little room in the inode.
>> With the current & proposed test this could make the bytesfit function 
>> return 0
>> (the offset calculated in bytesfit could also go negative) and
>> then we would set the forkoff back at the old attr1 default.
>> So we might have 1 EA extent in the inode taking little space and yet 
>> setting the forkoff
>> in the middle.
> 
> Yes I agree worst cast scenario is that the inode has reverted to an 
> attr1 split and
> that space is being wasted in the attr portion. By the time an inode has 
> flipped to
> btree mode for di_u how much of a performance hit is really going to be 
> noticed?
> mapping the blocks for that inode is going to take multiple reads.
> Attr2 seems most effective at space optimization for the local and extent
> versions of di_u and probably not so much for btree.
> 
> At least by fixing the size calculation the shortforms that do fit  into 
> di_a will now
> be added inline.  What is happening now the btree is being re factored
> which is probably expensive and the attr is being added as extents since 
> the
> original size used for the btree refactoring wasn't enough.
> 
> So the change to add in the header size will at least make single case
> attrs more efficient since they will now be inline.
> If the attr does not fit inline then worst case the forkoff flips to attr1
> default  or a half and half split.
> 
> Given the cost of refactoring a btree it might be better to have attr1 
> behavior?
> Since di_a will have extra space additional attr adds won't  cause 
> forkoff to
> move and thus won't cause a rebalance of di_u.
> 
> So in thinking about this more does it make sense to actually not
> try to optimize the space needed for di_u when it is a btree?
> Maybe the first time an attr is added simply split the inode
> space doing the rebalance once?
> That would allow for more attrs to be added without rebalancing
> the data btree.
> The other scheme of space optimzation if the root btree node
> is sparse would say sure give more space to di_a but at
> the expense of a reblanace.
> 
> So ya it's a bit of a guessing game.
> 
>>
>> Of course the setting of the forkoff is a bit of a guessing game since 
>> we can't
>> predict the future usage but I think the plan is to set it to the 
>> minimum to fit
>> on a first come first served basis.
>>
>> So I'm thinking that we should set it based on the size of shortform 
>> if that
>> is how it will be stored or to the size taken up by the EA extents -
>> I was initially thinking that this would be 1 extent but with a remote 
>> value
>> block of up to 64K this could in theory be an extent for each fsb of 
>> the value
>> I guess.
>> Have to think about this some more.
>>
>> --Tim
>>