From: Mark Harmstone <mark@harmstone.com>
To: Sun YangKai <sunk67188@gmail.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Newbie questions about DIR_ITEM and DIR_INDEX design
Date: Wed, 20 Aug 2025 18:06:22 +0100 [thread overview]
Message-ID: <cefd585a-0934-436d-b65f-5ec729ea99f3@harmstone.com> (raw)
In-Reply-To: <12726025.O9o76ZdvQC@saltykitkat>
Hi Sun,
On 20/08/2025 2.59 pm, Sun YangKai wrote:
> Hello btrfs developers
>
> I am a beginner studying the implementation of btrfs. While examining the
> structures of DIR_ITEM and DIR_INDEX, I noticed that both store similar
> content but differ in their offset values. This led me to two questions:
>
> 1. The offset field in DIR_ITEM is computed as a CRC32 hash of the filename. In
> practice, is there a risk of hash collisions? If so, how does the current
> implementation handle such collisions?
I think the mathematics is something like for a hash of n bits, you need
approximately 2^(n/2) instances for a 50% chance of a collision. So 65,536
filenames in this case.
If we have a collision, we append a second DIR_ITEM after the first, in
the same slot. So it'd only be an issue if you could generate enough hash
collisions to fill a 16KB leaf, but I think coming up with such a list
would be difficult enough in itself.
The collisions in /usr/share/dict/words, if you want to play around with
this yourself:
dolent, Tang
dunghill, fricatrice
accusable, aggrandizement
acarodermatitis, unfallacious
hydrotical, hylomorphist
gratillity, radiobserver
creophagist, machinability
boomdas, Loxiinae
> 2. The offset field in DIR_INDEX appears to be an auto-incrementing number,
> possibly indicating the creation order of entries within a directory. Why is
> this necessary, given that DIR_ITEM already exists?
Well, the collisions are one reason: each dentry is a directory has to have
a unique d_off value. I wouldn't be surprised if it also simplifies the
implementation of readdir.
Mark
> I would greatly appreciate any insights or references to relevant
> documentation or code snippets to help me understand these design choices.
>
> Thank you for your time and guidance.
>
> Best regards,
> Sun YangKai
next prev parent reply other threads:[~2025-08-20 17:06 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-20 13:59 Newbie questions about DIR_ITEM and DIR_INDEX design Sun YangKai
2025-08-20 17:06 ` Mark Harmstone [this message]
2025-08-21 8:26 ` Sun YangKai
2025-08-21 9:20 ` Mark Harmstone
2025-08-21 10:06 ` Sun YangKai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cefd585a-0934-436d-b65f-5ec729ea99f3@harmstone.com \
--to=mark@harmstone.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=sunk67188@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox