public inbox for linux-fsdevel@vger.kernel.org

* [LSF/MM/BPF TOPIC] Should we make inode->i_ino a u64?
@ 2026-02-18 15:36 Jeff Layton
  2026-02-19 14:31 ` Christian Brauner
  0 siblings, 1 reply; 2+ messages in thread
From: Jeff Layton @ 2026-02-18 15:36 UTC
  To: lsf-pc
  Cc: linux-fsdevel, Christian Brauner, Christoph Hellwig, Jan Kara,
	Al Viro

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

For historical reasons, the inode->i_ino field is an unsigned long.
Because it's only 32 bits on 32-bit CPUs, this has caused a lot of fs-
specific hacks on filesystems that have native 64-bit inode numbers
when running a 32-bit arch.

It would be a lot simpler if we just converted i_ino to be 64 bits and
dealt with the conversion at the kernel's edges. This would be a non-
event for the most part on 64-bit arches since unsigned long is already
64 bits there.

The kernel itself doesn't deal much with i_ino, so the internal changes
look fairly straightforward. The bulk of the patches will be to format
strings and to tracepoints.

I think that the biggest problem will be that this will grow struct
inode on 32-bit arches by at least 4 bytes. That may have cacheline
alignment and slab sizing implications. We're actively talking about
deprecating 32-bit arches in the future however, so maybe we can
rationalize that away.

FWIW, I had Claude spin up a plan to do this (attached). It's not bad.
I'm tempted to tell it to generate patches for this, since this is mostly
a mechanical change, but I'm curious whether anyone else might have
reasons that we shouldn't go ahead and do it.
-- 
Jeff Layton <jlayton@kernel.org>

[-- Attachment #2: lazy-swimming-alpaca.md --]
[-- Type: text/markdown, Size: 12037 bytes --]

# Plan: Change `i_ino` in `struct inode` from `unsigned long` to `u64`

## Motivation

On 32-bit architectures, `unsigned long` is 32 bits, limiting `i_ino` to 2^32
distinct values. Several filesystems (NFS, CIFS, XFS, Ceph, FUSE) natively use
64-bit inode numbers and must hash/fold them to fit `ino_t`/`unsigned long` on 32-bit.
The VFS intermediate structure `kstat.ino` is already `u64`, and `statx.stx_ino`
is already `__u64`. Making `i_ino` itself `u64` eliminates a type mismatch at
the VFS core and removes the need for filesystem-specific folding hacks on
32-bit architectures.

On 64-bit architectures, `unsigned long` is already 64 bits, so the change
alters the field's declared type but not its width. There is no runtime impact,
only format string and type signature cleanups.

## Scope Assessment

This is a tree-wide change touching hundreds of files. It should be broken into
a series of patches, each handling one logical subsystem or concern. The series
should be structured so that the core type change is a single commit, preceded
by preparatory patches and followed by cleanups.

## Phase 1: Preparatory Patches (before the type change)

### 1.1 Update VFS inode hash/lookup API signatures

**Files:**
- `include/linux/fs.h` — function declarations
- `fs/inode.c` — function definitions

**Functions to update** (change `unsigned long` ino/hashval params to `u64`):

| Function | File:Line | Change |
|----------|-----------|--------|
| `hash()` (static) | `fs/inode.c:675` | `unsigned long hashval` → `u64 hashval` |
| `find_inode_fast()` | `fs/inode.c:1089` | `unsigned long ino` → `u64 ino` |
| `find_inode_by_ino_rcu()` | `fs/inode.c:1814` | `unsigned long ino` → `u64 ino` |
| `test_inode_iunique()` | `fs/inode.c:1530` | `unsigned long ino` → `u64 ino` |
| `__insert_inode_hash()` | `fs/inode.c:693`, `include/linux/fs.h:3018` | `unsigned long hashval` → `u64 hashval` |
| `iget_locked()` | `fs/inode.c:1458`, `include/linux/fs.h` | `unsigned long ino` → `u64 ino` |
| `iget5_locked()` | `fs/inode.c:1381`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `iget5_locked_rcu()` | `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `ilookup()` | `fs/inode.c:1680`, `include/linux/fs.h` | `unsigned long ino` → `u64 ino` |
| `ilookup5()` | `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `ilookup5_nowait()` | `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `find_inode_nowait()` | `fs/inode.c:1728`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `find_inode_rcu()` | `include/linux/fs.h:2960` | `unsigned long` → `u64` |
| `inode_insert5()` | `fs/inode.c:1304`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `insert_inode_locked4()` | `fs/inode.c:1887`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |

**Note:** `insert_inode_locked()` at `fs/inode.c:1833` uses `ino_t ino = inode->i_ino;`
internally. This local variable should change to `u64` (or just use `inode->i_ino`
directly).

The `hash()` function currently takes `unsigned long hashval`. On 32-bit, once
`i_ino` becomes `u64`, the hash function must handle 64-bit values. Consider
using `hash_64()` or folding the upper 32 bits on 32-bit architectures. The
signature change alone looks like this:
```c
static unsigned long hash(struct super_block *sb, u64 hashval)
{
    unsigned long tmp;

    /*
     * On 32-bit the u64 expression is truncated when stored in tmp, so most
     * of the upper half of hashval is lost unless it is folded in first.
     */
    tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES;
    ...
}
```

This patch can be done first because `unsigned long` implicitly converts to `u64`
on all architectures (widening), so changing function signatures to `u64` is
backward-compatible even before `i_ino` itself changes.
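
As a rough illustration of the folding option mentioned for `hash()`, the
userspace sketch below uses a multiplicative hash in the style of the kernel's
`hash_64()`. The `sketch_*` names are hypothetical, `uint64_t` stands in for
`u64`, and the superblock mixing done by the real `hash()` is left out. It is a
sketch under those assumptions, not the proposed implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Same value the kernel defines as GOLDEN_RATIO_64 in <linux/hash.h>. */
#define SKETCH_GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Hypothetical helper (not a kernel function): map a 64-bit hash value to a
 * bucket index in [0, 2^bits). Multiplying by the constant and keeping the
 * top bits lets every input bit influence the result, so the upper half of
 * a 64-bit inode number is not silently dropped.
 */
static inline uint32_t sketch_hash_bucket(uint64_t hashval, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)((hashval * SKETCH_GOLDEN_RATIO_64) >> (64 - bits));
}

/* What plain 32-bit truncation does instead: the upper half is lost. */
static inline uint32_t sketch_truncate_bucket(uint64_t hashval, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)hashval & (uint32_t)((1ull << bits) - 1);
}
```

Two values that differ only in bit 32 land in the same bucket under plain
truncation, but in different buckets with the multiplicative hash.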

### 1.2 Update `ino_t` typedef (optional, separate discussion)

**File:** `include/linux/types.h:21`

Currently: `typedef __kernel_ulong_t ino_t;`

Options:
- **Option A**: Change `ino_t` to `u64` — this is the cleanest approach but
  affects every use of `ino_t` across the tree. Many local variables use `ino_t`.
- **Option B**: Leave `ino_t` as-is and only change `i_ino` — creates a type
  mismatch between `i_ino` (u64) and `ino_t` (unsigned long) on 32-bit.
- **Recommendation**: Option A is preferred for consistency, but this is a
  separate patch and a separate discussion with the community. The `i_ino`
  change can proceed without it by using `u64` directly.

**Note:** `__kernel_ino_t` in UAPI headers MUST NOT change — it's userspace ABI.

## Phase 2: The Core Type Change

### 2.1 Change `i_ino` field type

**File:** `include/linux/fs.h:786`

```c
// Before:
unsigned long       i_ino;

// After:
u64                 i_ino;
```

This single change will trigger format string warnings (`%lu` passed a `u64`)
across the tree on every architecture, including 64-bit ones: there `u64` is
`unsigned long long`, a distinct type from `unsigned long` even though both are
64 bits wide. On 32-bit, the field's width actually changes.

### 2.2 Struct size impact on 32-bit

`struct inode` will grow by at least 4 bytes on 32-bit architectures due to
`i_ino` expanding from 4 to 8 bytes. On ABIs where `u64` requires 8-byte
alignment, the field may also need padding in front of it. Review the field
ordering in `struct inode` to minimize padding impact.
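
To make the padding concern concrete, here is a small userspace sketch. The
three structs are hypothetical stand-ins with a handful of fields, not the real
`struct inode` layout, and `uint32_t` models a 32-bit `unsigned long`. Exact
sizes depend on the ABI, so the sketch only compares layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field set; models the layout before the change on 32-bit. */
struct sketch_inode_before {
    uint16_t mode;
    uint32_t ino;          /* stands in for a 32-bit unsigned long i_ino */
    uint32_t generation;
};

/* i_ino widened in place; the compiler may insert padding before it. */
struct sketch_inode_naive {
    uint16_t mode;
    uint64_t ino;
    uint32_t generation;
};

/* Widest member first, so no interior padding is needed. */
struct sketch_inode_reordered {
    uint64_t ino;
    uint32_t generation;
    uint16_t mode;
};

/* Bytes the widening costs when the original field order is kept. */
static inline size_t sketch_growth_naive(void)
{
    return sizeof(struct sketch_inode_naive) -
           sizeof(struct sketch_inode_before);
}
```

On ABIs that align 64-bit integers to 8 bytes, the in-place layout grows by more
than 4 bytes, while the reordered layout is never larger than the in-place one.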

## Phase 3: Tree-Wide Format String Fixes

### 3.1 Strategy for format strings

**Recommended approach**: Use `%llu`/`%llx` for `i_ino`. In-kernel `u64` is
`unsigned long long` on every architecture, so no cast is needed once the field
is `u64`; keep a `(unsigned long long)` cast only where the argument is still an
`unsigned long` or `ino_t`. Do NOT introduce a new format helper macro, since
the kernel community generally prefers explicit types over macros for printk
formats.

Pattern to search and replace:
```
%lu.*i_ino  →  %llu
%lx.*i_ino  →  %llx
```
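
A converted call site might look roughly like the userspace sketch below.
`sketch_format_ino()` is a hypothetical helper, `snprintf()` stands in for the
kernel's print functions, and `uint64_t` stands in for `u64`. The cast is
required in this sketch because userspace `uint64_t` may be `unsigned long`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an inode number the way a "%lu" call site
 * would after conversion to "%llu". Returns snprintf()'s result.
 */
static inline int sketch_format_ino(char *buf, size_t len, uint64_t ino)
{
    /* Userspace uint64_t may be unsigned long, hence the explicit cast. */
    return snprintf(buf, len, "ino=%llu", (unsigned long long)ino);
}
```

An inode number just above the 32-bit range, such as `1ull << 32`, prints in
full instead of being truncated.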

### 3.2 Trace events (highest volume of changes)

Each trace event file should be a separate patch:

| File | Est. Changes | Description |
|------|-------------|-------------|
| `include/trace/events/ext4.h` | ~100+ sites | Change `__field(ino_t, ino)` → `__field(u64, ino)`, update all `%lu` → `%llu`, remove `(unsigned long)` casts |
| `include/trace/events/f2fs.h` | ~50+ sites | Update `show_dev_ino()` macro and all trace events |
| `include/trace/events/writeback.h` | ~15+ sites | Update all writeback trace events |
| `include/trace/events/filelock.h` | ~10 sites | Change `__field(unsigned long, i_ino)` → `__field(u64, i_ino)`, `%lx` → `%llx` |

**Important:** Changing trace event field types changes the binary trace format.
This will affect `trace-cmd` and `perf` trace parsing. This is acceptable but
should be noted in commit messages.

### 3.3 Filesystem-specific format strings

Each filesystem should be a separate patch:

| Filesystem | Key Files | Notes |
|-----------|-----------|-------|
| ext4 | `fs/ext4/super.c`, `fs/ext4/ext4.h` | `__ext4_error_inode()`, `__ext4_warning_inode()`, `__ext4_grp_locked_error()` — change `%lu` → `%llu`, update `unsigned long ino` parameters |
| btrfs | Various | May already use `u64` internally |
| f2fs | Various | Fix format strings |
| Other fs | Various | Grep for `i_ino` with `%lu` in each |

### 3.4 /proc interfaces (ABI-sensitive)

| File | Interface | Current | Notes |
|------|-----------|---------|-------|
| `fs/locks.c:2899` | `/proc/locks` | `%lu` for `inode->i_ino` | Change to `%llu`, values stay the same on 64-bit; on 32-bit, wider numbers possible but no parsers should break since field is positional/decimal |
| `fs/proc/task_mmu.c:468` | `/proc/PID/maps` | `unsigned long ino = 0;` → assigned from `inode->i_ino` | Change local var to `u64`, `seq_put_decimal_ull()` already accepts `unsigned long long` |

## Phase 4: Filesystem-Specific Inode Number Handling

### 4.1 NFS

**Files:**
- `include/linux/nfs_fs.h:670-677` — `nfs_fileid_to_ino_t()`
- `fs/nfs/inode.c:69-73` — `nfs_fattr_to_ino_t()`
- `fs/nfs/inode.c:93-107` — `nfs_compat_user_ino64()`

If `i_ino` is `u64`, NFS can store the full 64-bit fileid in `i_ino` without
folding. The `nfs_fileid_to_ino_t()` function could be simplified or its callers
updated to assign directly. However, the hash value passed to `iget5_locked()`
may still need folding for hash distribution on 32-bit — this is separate from
`i_ino` storage.

**Approach**: Update `nfs_fattr_to_ino_t()` to return `u64` and store the full
fileid. Keep hash folding only for the hash parameter to `iget5_locked()`.
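
The difference between folding and direct storage can be sketched in userspace
as follows. This assumes the current 32-bit fold is an XOR of the two halves of
the fileid; the `sketch_*` names are hypothetical and `uint32_t` stands in for a
32-bit `ino_t`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the fold applied on a 32-bit kernel today (assumption: XOR of
 * the upper and lower halves of the 64-bit fileid).
 */
static inline uint32_t sketch_fold_fileid(uint64_t fileid)
{
    return (uint32_t)fileid ^ (uint32_t)(fileid >> 32);
}

/* With a u64 i_ino, the fileid can be stored unchanged. */
static inline uint64_t sketch_store_fileid(uint64_t fileid)
{
    return fileid;
}
```

Folding maps distinct fileids such as `0x100000002` and `3` to the same 32-bit
value, whereas direct storage keeps them distinct.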

### 4.2 CIFS/SMB

**File:** `fs/smb/client/cifsfs.h:16-29` — `cifs_uniqueid_to_ino_t()`

Same pattern as NFS. With `u64 i_ino`, CIFS can store the full 64-bit uniqueid
without hashing. Update to return `u64`.

**File:** `fs/smb/client/inode.c:1614-1623` — update hash variable type.

### 4.3 Other filesystems

Filesystems that already use 64-bit inode numbers internally (XFS, Ceph, FUSE)
may benefit from simplified code paths. Each is a separate cleanup patch.

## Phase 5: Related Type Updates

### 5.1 `get_next_ino()` (`fs/inode.c:1145`)

Returns `unsigned int` (32-bit). Used by pseudo-filesystems (tmpfs, sysfs,
debugfs, etc.). This is deliberately limited to 32 bits to avoid EOVERFLOW from
`stat()` on 32-bit userspace. **No change needed** — widening to `u64` storage
is fine; the values still fit.

### 5.2 `iunique()` (`fs/inode.c:1556`)

Uses `static unsigned int counter`. Returns `ino_t`. Same consideration as
`get_next_ino()` — keeps values in 32-bit range for compat. **No change needed**
for the counter, but return type should follow `ino_t` (which may or may not
change per Phase 1.2).

### 5.3 `is_zero_ino()` (`include/linux/fs.h:2986`)

Already casts to `(u32)` explicitly. **No change needed.**
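
Sections 5.1 and 5.3 can be illustrated together with a simplified userspace
model. It assumes a single global counter with no per-CPU batching or locking,
so it is not how the kernel implements `get_next_ino()`; the `sketch_*` names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: one global 32-bit counter. */
static uint32_t sketch_last_ino;

/* Hand out the next 32-bit number, skipping 0 when the counter wraps. */
static inline uint32_t sketch_get_next_ino(void)
{
    uint32_t res = ++sketch_last_ino;

    if (res == 0)
        res = ++sketch_last_ino;
    return res;
}

/* Mirrors the (u32) cast described above: only the low 32 bits are tested. */
static inline bool sketch_is_zero_ino(uint64_t ino)
{
    return (uint32_t)ino == 0;
}
```

Storing the 32-bit result in a 64-bit field is a lossless widening, and the
zero test behaves the same whether the argument is 32 or 64 bits wide.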

## Patch Series Structure (Recommended Ordering)

```
1/N  vfs: widen inode hash/lookup functions to take u64
2/N  vfs: change i_ino from unsigned long to u64
3/N  tracefs: update writeback trace events for u64 i_ino
4/N  tracefs: update filelock trace events for u64 i_ino
5/N  ext4: update trace events for u64 i_ino
6/N  ext4: update error/warning functions for u64 i_ino
7/N  f2fs: update trace events for u64 i_ino
8/N  proc: update /proc/locks format for u64 i_ino
9/N  proc: update /proc/PID/maps for u64 i_ino
10/N nfs: store full 64-bit fileid in i_ino
11/N cifs: store full 64-bit uniqueid in i_ino
12/N <per-filesystem format string fixes as needed>
...
N/N  vfs: update ino_t typedef to u64 (if community agrees)
```

Each patch should:
- Build cleanly (no new warnings with W=1)
- Pass `scripts/checkpatch.pl --strict`
- Be bisectable (every commit in the series must compile and run)

## Risks and Considerations

1. **32-bit struct inode size increase**: At least 4 bytes larger on 32-bit
   archs (more if alignment padding is added). May affect slab cache efficiency
   for inode allocation. Embedded systems with constrained memory could be
   affected.

2. **Trace event binary format change**: Tools parsing binary trace data
   (trace-cmd, perf) will see different field widths. Self-describing format
   metadata handles this, but older tools may need updating.

3. **BPF programs**: CO-RE programs handle field relocation automatically.
   Legacy BPF programs with hardcoded offsets into `struct inode` on 32-bit
   will break. This is expected and acceptable.

4. **Userspace ABI**: No change — `struct stat`, `struct stat64`, `struct statx`
   are frozen. The existing overflow checks in `cp_new_stat()`, `cp_old_stat()`,
   `cp_compat_stat()` already handle 64-bit ino → 32-bit `st_ino` narrowing
   with `-EOVERFLOW`.

5. **`/proc/locks` and `/proc/PID/maps` format**: Wider numbers in output on
   32-bit. Parsers using positional fields should be unaffected; parsers using
   fixed-width expectations could break.
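
The narrowing check mentioned in item 4 can be sketched in userspace as below.
`sketch_copy_ino()` is a hypothetical helper that only models the idea of the
existing overflow checks; `uint32_t` stands in for a 32-bit `st_ino`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy a 64-bit inode number into a 32-bit
 * st_ino-like field, failing if the value does not fit.
 */
static inline int sketch_copy_ino(uint64_t ino, uint32_t *st_ino)
{
    *st_ino = (uint32_t)ino;
    if (*st_ino != ino)
        return -EOVERFLOW;    /* the value was truncated */
    return 0;
}
```

Values up to `0xFFFFFFFF` pass through unchanged; anything larger is reported
as `-EOVERFLOW` rather than silently truncated.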

## Verification

- Build with `make allmodconfig` on x86_64 (catches format string warnings)
- Build with `make allmodconfig ARCH=arm` (32-bit, validates u64 expansion)
- Build with `make allmodconfig ARCH=i386` (32-bit x86)
- Run `scripts/checkpatch.pl --strict` on each patch
- Run `make C=2` (sparse) to catch type mismatches
- Run filesystem-specific selftests (`make kselftest TARGETS=filesystems`)
- Test NFS and CIFS with 64-bit inode numbers on 32-bit kernel


* Re: [LSF/MM/BPF TOPIC] Should we make inode->i_ino a u64?
  2026-02-18 15:36 [LSF/MM/BPF TOPIC] Should we make inode->i_ino a u64? Jeff Layton
@ 2026-02-19 14:31 ` Christian Brauner
  0 siblings, 0 replies; 2+ messages in thread
From: Christian Brauner @ 2026-02-19 14:31 UTC
  To: Jeff Layton; +Cc: lsf-pc, linux-fsdevel, Christoph Hellwig, Jan Kara, Al Viro

On Wed, Feb 18, 2026 at 10:36:01AM -0500, Jeff Layton wrote:
> For historical reasons, the inode->i_ino field is an unsigned long.
> Because it's only 32 bits on 32-bit CPUs, this has caused a lot of fs-
> specific hacks on filesystems that have native 64-bit inode numbers
> when running a 32-bit arch.
> 
> It would be a lot simpler if we just converted i_ino to be 64 bits and
> dealt with the conversion at the kernel's edges. This would be a non-
> event for the most part on 64-bit arches since unsigned long is already
> 64 bits there.
> 
> The kernel itself doesn't deal much with i_ino, so the internal changes
> look fairly straightforward. The bulk of the patches will be to format
> strings and to tracepoints.
> 
> I think that the biggest problem will be that this will grow struct
> inode on 32-bit arches by at least 4 bytes. That may have cacheline
> alignment and slab sizing implications. We're actively talking about
> deprecating 32-bit arches in the future however, so maybe we can
> rationalize that away.

If you already have a Claude instance open you may want to ask it to please
find the last ten mails about 32-bit that Linus sent and what his
opinions are about worrying about it when doing such changes... :)

> FWIW, I had Claude spin up a plan to do this (attached). It's not bad.
> I'm tempted to tell it to generate patches for this, since this is mostly
> a mechanical change, but I'm curious whether anyone else might have
> reasons that we shouldn't go ahead and do it.

Please just do it. I didn't have time to do it myself yet.
