public inbox for linux-fsdevel@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>,
	 Christoph Hellwig <hch@lst.de>, Jan Kara <jack@suse.cz>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: [LSF/MM/BPF TOPIC] Should we make inode->i_ino a u64?
Date: Wed, 18 Feb 2026 10:36:01 -0500
Message-ID: <08f8444c7237566ffb4ba8c9eb0ab4b4a5f14440.camel@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

For historical reasons, the inode->i_ino field is an unsigned long.
Because it's only 32 bits on 32-bit CPUs, this has caused a lot of fs-
specific hacks on filesystems that have native 64-bit inode numbers
when running a 32-bit arch.

It would be a lot simpler if we just converted i_ino to be 64 bits and
dealt with the conversion at the kernel's edges. This would be a non-
event for the most part on 64-bit arches since unsigned long is already
64 bits there.

The kernel itself doesn't deal much with i_ino, so the internal changes
look fairly straightforward. The bulk of the patches will be to format
strings and to tracepoints.

I think that the biggest problem will be that this will grow struct
inode on 32-bit arches by at least 4 bytes. That may have cacheline
alignment and slab sizing implications. We're actively talking about
deprecating 32-bit arches in the future however, so maybe we can
rationalize that away.

FWIW, I had Claude spin up a plan to do this (attached). It's not bad.
I'm tempted to tell it to generate patches for this, since this is mostly
a mechanical change, but I'm curious whether anyone else might have
reasons that we shouldn't go ahead and do it.
-- 
Jeff Layton <jlayton@kernel.org>

[-- Attachment #2: lazy-swimming-alpaca.md --]
[-- Type: text/markdown, Size: 12037 bytes --]

# Plan: Change `i_ino` in `struct inode` from `unsigned long` to `u64`

## Motivation

On 32-bit architectures, `unsigned long` is 32 bits, limiting `i_ino` to 2^32
distinct values. Several filesystems (NFS, CIFS, XFS, Ceph, FUSE) natively use 64-bit
inode numbers and must hash/fold them to fit `ino_t`/`unsigned long` on 32-bit.
The VFS intermediate structure `kstat.ino` is already `u64`, and `statx.stx_ino`
is already `__u64`. Making `i_ino` itself `u64` eliminates a type mismatch at
the VFS core and removes the need for filesystem-specific folding hacks on
32-bit architectures.

On 64-bit architectures, `unsigned long` is already 64 bits, so this change is
effectively a type alias change with no runtime impact (only format string and
type signature cleanups).

## Scope Assessment

This is a tree-wide change touching hundreds of files. It should be broken into
a series of patches, each handling one logical subsystem or concern. The series
should be structured so that the core type change is a single commit, preceded
by preparatory patches and followed by cleanups.

## Phase 1: Preparatory Patches (before the type change)

### 1.1 Update VFS inode hash/lookup API signatures

**Files:**
- `include/linux/fs.h` — function declarations
- `fs/inode.c` — function definitions

**Functions to update** (change `unsigned long` ino/hashval params to `u64`):

| Function | File:Line | Change |
|----------|-----------|--------|
| `hash()` (static) | `fs/inode.c:675` | `unsigned long hashval` → `u64 hashval` |
| `find_inode_fast()` | `fs/inode.c:1089` | `unsigned long ino` → `u64 ino` |
| `find_inode_by_ino_rcu()` | `fs/inode.c:1814` | `unsigned long ino` → `u64 ino` |
| `test_inode_iunique()` | `fs/inode.c:1530` | `unsigned long ino` → `u64 ino` |
| `__insert_inode_hash()` | `fs/inode.c:693`, `include/linux/fs.h:3018` | `unsigned long hashval` → `u64 hashval` |
| `iget_locked()` | `fs/inode.c:1458`, `include/linux/fs.h` | `unsigned long ino` → `u64 ino` |
| `iget5_locked()` | `fs/inode.c:1381`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `iget5_locked_rcu()` | `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `ilookup()` | `fs/inode.c:1680`, `include/linux/fs.h` | `unsigned long ino` → `u64 ino` |
| `ilookup5()` | `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `ilookup5_nowait()` | `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `find_inode_nowait()` | `fs/inode.c:1728`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `find_inode_rcu()` | `include/linux/fs.h:2960` | `unsigned long` → `u64` |
| `inode_insert5()` | `fs/inode.c:1304`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |
| `insert_inode_locked4()` | `fs/inode.c:1887`, `include/linux/fs.h` | `unsigned long hashval` → `u64 hashval` |

**Note:** `insert_inode_locked()` at `fs/inode.c:1833` uses `ino_t ino = inode->i_ino;`
internally. This local variable should change to `u64` (or just use `inode->i_ino`
directly).

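The `hash()` function currently takes `unsigned long hashval`. On 32-bit, once
`i_ino` becomes `u64`, the hash function has to handle 64-bit values. Consider
using `hash_64()` or folding the upper 32 bits on 32-bit architectures. Merely
widening the parameter, as in the starting point below, is not enough there
because the result is truncated when it is assigned to `tmp`:
```c
static unsigned long hash(struct super_block *sb, u64 hashval)
{
    unsigned long tmp;

    /* On 32-bit, this assignment truncates the u64 expression to 32 bits. */
    tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES;
    ...
}
```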
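
The difference between truncating and folding can be seen in a small userspace
sketch. It assumes `uint32_t` stands in for a 32-bit `unsigned long`;
`truncate64_to_32()` and `fold64_to_32()` are hypothetical names, not kernel
helpers, and this is not a proposal for the final `hash()` body:

```c
#include <stdint.h>

/* Models assigning a u64 to a 32-bit unsigned long: the upper half is dropped. */
static inline uint32_t truncate64_to_32(uint64_t v)
{
    return (uint32_t)v;
}

/* XOR-fold: the upper 32 bits still influence the 32-bit result. */
static inline uint32_t fold64_to_32(uint64_t v)
{
    return (uint32_t)v ^ (uint32_t)(v >> 32);
}
```

Two inode numbers that differ only in their upper 32 bits, such as
`0x100000005` and `0x200000005`, truncate to the same value but fold to
different ones, which is the property the hash input needs on 32-bit.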

This patch can be done first because `unsigned long` implicitly converts to `u64`
on all architectures (widening), so changing function signatures to `u64` is
backward-compatible even before `i_ino` itself changes.
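
The widening argument can be checked in isolation. The sketch below assumes a
hypothetical `lookup_by_ino()` standing in for a converted VFS helper; a caller
that still holds an `unsigned long` passes it unchanged and the callee sees the
same value:

```c
#include <stdint.h>

/* Hypothetical stand-in for a helper whose parameter was widened to u64. */
static inline uint64_t lookup_by_ino(uint64_t ino)
{
    return ino; /* a real helper would search the inode hash here */
}

/* Caller not yet converted: still uses unsigned long for the inode number. */
static inline uint64_t legacy_caller(unsigned long ino)
{
    return lookup_by_ino(ino); /* implicit unsigned long -> u64 widening */
}
```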

### 1.2 Update `ino_t` typedef (optional, separate discussion)

**File:** `include/linux/types.h:21`

Currently: `typedef __kernel_ulong_t ino_t;`

Options:
- **Option A**: Change `ino_t` to `u64` — this is the cleanest approach but
  affects every use of `ino_t` across the tree. Many local variables use `ino_t`.
- **Option B**: Leave `ino_t` as-is and only change `i_ino` — creates a type
  mismatch between `i_ino` (u64) and `ino_t` (unsigned long) on 32-bit.
- **Recommendation**: Option A is preferred for consistency, but this is a
  separate patch and a separate discussion with the community. The `i_ino`
  change can proceed without it by using `u64` directly.

**Note:** `__kernel_ino_t` in UAPI headers MUST NOT change — it's userspace ABI.

## Phase 2: The Core Type Change

### 2.1 Change `i_ino` field type

**File:** `include/linux/fs.h:786`

```c
// Before:
unsigned long       i_ino;

// After:
u64                 i_ino;
```

This single change will trigger format string warnings across the tree wherever
`i_ino` is printed with `%lu` or `%lx`. The kernel defines `u64` as
`unsigned long long` on every architecture, so on 64-bit the warning is purely a
type mismatch (both types are 64 bits wide), while on 32-bit the argument width
actually changes.

### 2.2 Struct size impact on 32-bit

`struct inode` will grow by at least 4 bytes on 32-bit architectures because
`i_ino` expands from 4 to 8 bytes. On ABIs that align 64-bit integers to 8 bytes,
the field may also need padding in front of it. Review the field ordering in
`struct inode` to minimize the padding impact.
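
To illustrate the padding concern, here is a small sketch with two made-up
structs (not the real `struct inode` layout), using `uint32_t` to model a 32-bit
`unsigned long`. Whether a hole appears depends on the ABI: ARM EABI aligns
64-bit integers to 8 bytes, while i386 aligns them to 4 inside structs.

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up layouts for illustration only; field names are not from struct inode. */
struct demo_before {
    uint32_t flags;   /* 4 bytes */
    uint32_t ino;     /* models a 32-bit unsigned long i_ino */
    uint32_t gen;
};

struct demo_after {
    uint32_t flags;   /* 4 bytes, may be followed by padding */
    uint64_t ino;     /* models a u64 i_ino */
    uint32_t gen;     /* may be followed by tail padding */
};

/* Bytes added by widening the field, including any alignment padding. */
static inline size_t demo_growth(void)
{
    return sizeof(struct demo_after) - sizeof(struct demo_before);
}
```

With 8-byte alignment the growth is larger than the 4 bytes the field itself
gained; placing the widened member at an offset that is already 8-byte aligned
avoids the hole.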

## Phase 3: Tree-Wide Format String Fixes

### 3.1 Strategy for format strings

**Recommended approach**: Use `%llu` (or `%llx`) for `i_ino`. The kernel defines
`u64` as `unsigned long long` on every architecture, so an explicit
`(unsigned long long)` cast is not strictly required, although many existing
call sites carry one. Do NOT introduce a new format helper macro; the kernel
community generally prefers explicit types over macros for printk formats.

Pattern to search and replace:
```
%lu.*i_ino  →  %llu (the (unsigned long long) cast is optional)
%lx.*i_ino  →  %llx (the (unsigned long long) cast is optional)
```
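
A userspace sketch of a converted call site, assuming `snprintf()` in place of
`printk()` and a hypothetical `format_ino()` helper. In userspace `uint64_t` may
be `unsigned long`, so here the cast is what keeps `%llu` correct on every ABI:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format an inode number the way a converted call site would. */
static inline int format_ino(char *buf, size_t len, uint64_t ino)
{
    /* Before the change this would have been "%lu" with an unsigned long. */
    return snprintf(buf, len, "ino=%llu", (unsigned long long)ino);
}
```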

### 3.2 Trace events (highest volume of changes)

Each trace event file should be a separate patch:

| File | Est. Changes | Description |
|------|-------------|-------------|
| `include/trace/events/ext4.h` | ~100+ sites | Change `__field(ino_t, ino)` → `__field(u64, ino)`, update all `%lu` → `%llu`, remove `(unsigned long)` casts |
| `include/trace/events/f2fs.h` | ~50+ sites | Update `show_dev_ino()` macro and all trace events |
| `include/trace/events/writeback.h` | ~15+ sites | Update all writeback trace events |
| `include/trace/events/filelock.h` | ~10 sites | Change `__field(unsigned long, i_ino)` → `__field(u64, i_ino)`, `%lx` → `%llx` |

**Important:** Changing trace event field types changes the binary trace format.
This will affect `trace-cmd` and `perf` trace parsing. This is acceptable but
should be noted in commit messages.

### 3.3 Filesystem-specific format strings

Each filesystem should be a separate patch:

| Filesystem | Key Files | Notes |
|-----------|-----------|-------|
| ext4 | `fs/ext4/super.c`, `fs/ext4/ext4.h` | `__ext4_error_inode()`, `__ext4_warning_inode()`, `__ext4_grp_locked_error()` — change `%lu` → `%llu`, update `unsigned long ino` parameters |
| btrfs | Various | May already use `u64` internally |
| f2fs | Various | Fix format strings |
| Other fs | Various | Grep for `i_ino` with `%lu` in each |

### 3.4 /proc interfaces (ABI-sensitive)

| File | Interface | Current | Notes |
|------|-----------|---------|-------|
| `fs/locks.c:2899` | `/proc/locks` | `%lu` for `inode->i_ino` | Change to `%llu`, values stay the same on 64-bit; on 32-bit, wider numbers possible but no parsers should break since field is positional/decimal |
| `fs/proc/task_mmu.c:468` | `/proc/PID/maps` | `unsigned long ino = 0;` → assigned from `inode->i_ino` | Change local var to `u64`, `seq_put_decimal_ull()` already accepts `unsigned long long` |

## Phase 4: Filesystem-Specific Inode Number Handling

### 4.1 NFS

**Files:**
- `include/linux/nfs_fs.h:670-677` — `nfs_fileid_to_ino_t()`
- `fs/nfs/inode.c:69-73` — `nfs_fattr_to_ino_t()`
- `fs/nfs/inode.c:93-107` — `nfs_compat_user_ino64()`

If `i_ino` is `u64`, NFS can store the full 64-bit fileid in `i_ino` without
folding. The `nfs_fileid_to_ino_t()` function could be simplified or its callers
updated to assign directly. However, the hash value passed to `iget5_locked()`
may still need folding for hash distribution on 32-bit — this is separate from
`i_ino` storage.

**Approach**: Update `nfs_fattr_to_ino_t()` to return `u64` and store the full
fileid. Keep hash folding only for the hash parameter to `iget5_locked()`.
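
The split between storage and hashing can be sketched as follows. This is a
userspace model, not NFS code: `struct demo_inode`, `demo_fold_hash()` and
`demo_same_ino()` are hypothetical, and `uint32_t` models a 32-bit
`unsigned long` hash value.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_inode {
    uint64_t i_ino;   /* full 64-bit fileid, stored unfolded */
};

/* Folded value used only to pick a hash bucket on 32-bit. */
static inline uint32_t demo_fold_hash(uint64_t fileid)
{
    return (uint32_t)fileid ^ (uint32_t)(fileid >> 32);
}

/* Compare on the full 64-bit number, never on the folded hash. */
static inline bool demo_same_ino(const struct demo_inode *a,
                                 const struct demo_inode *b)
{
    return a->i_ino == b->i_ino;
}
```

Fileids `0x0000000100000002` and `0x0000000200000001` fold to the same bucket
value, yet their stored `i_ino` values remain distinct.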

### 4.2 CIFS/SMB

**File:** `fs/smb/client/cifsfs.h:16-29` — `cifs_uniqueid_to_ino_t()`

Same pattern as NFS. With `u64 i_ino`, CIFS can store the full 64-bit uniqueid
without hashing. Update to return `u64`.

**File:** `fs/smb/client/inode.c:1614-1623` — update hash variable type.

### 4.3 Other filesystems

Filesystems that already use 64-bit inode numbers internally (XFS, Ceph, FUSE)
may benefit from simplified code paths. Each is a separate cleanup patch.

## Phase 5: Related Type Updates

### 5.1 `get_next_ino()` (`fs/inode.c:1145`)

Returns `unsigned int` (32-bit). Used by pseudo-filesystems (tmpfs, sysfs,
debugfs, etc.). This is deliberately limited to 32 bits to avoid EOVERFLOW from
`stat()` on 32-bit userspace. **No change needed** — widening to `u64` storage
is fine; the values still fit.

### 5.2 `iunique()` (`fs/inode.c:1556`)

Uses `static unsigned int counter`. Returns `ino_t`. Same consideration as
`get_next_ino()` — keeps values in 32-bit range for compat. **No change needed**
for the counter, but return type should follow `ino_t` (which may or may not
change per Phase 1.2).

### 5.3 `is_zero_ino()` (`include/linux/fs.h:2986`)

Already casts to `(u32)` explicitly. **No change needed.**
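
For reference, a userspace model of that check, assuming the kernel helper
compares only the low 32 bits as described above; `demo_is_zero_ino()` is a
stand-in name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Treats an inode number as zero when its low 32 bits are zero. */
static inline bool demo_is_zero_ino(uint64_t ino)
{
    return (uint32_t)ino == 0;
}
```

Because of the cast, a 64-bit value such as `0x100000000` still counts as zero,
so widening the argument does not change the result.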

## Patch Series Structure (Recommended Ordering)

```
1/N  vfs: widen inode hash/lookup functions to take u64
2/N  vfs: change i_ino from unsigned long to u64
3/N  tracefs: update writeback trace events for u64 i_ino
4/N  tracefs: update filelock trace events for u64 i_ino
5/N  ext4: update trace events for u64 i_ino
6/N  ext4: update error/warning functions for u64 i_ino
7/N  f2fs: update trace events for u64 i_ino
8/N  proc: update /proc/locks format for u64 i_ino
9/N  proc: update /proc/PID/maps for u64 i_ino
10/N nfs: store full 64-bit fileid in i_ino
11/N cifs: store full 64-bit uniqueid in i_ino
12/N <per-filesystem format string fixes as needed>
...
N/N  vfs: update ino_t typedef to u64 (if community agrees)
```

Each patch should:
- Build cleanly (no new warnings with W=1)
- Pass `scripts/checkpatch.pl --strict`
- Be bisectable (every commit in the series must compile and run)

## Risks and Considerations

1. **32-bit struct inode size increase**: At least 4 bytes larger on 32-bit
   archs, more if alignment padding is needed. May affect slab cache efficiency
   for inode allocation. Embedded systems with constrained memory could be
   affected.

2. **Trace event binary format change**: Tools parsing binary trace data
   (trace-cmd, perf) will see different field widths. Self-describing format
   metadata handles this, but older tools may need updating.

3. **BPF programs**: CO-RE programs handle field relocation automatically.
   Legacy BPF programs with hardcoded offsets into `struct inode` on 32-bit
   will break. This is expected and acceptable.

4. **Userspace ABI**: No change — `struct stat`, `struct stat64`, `struct statx`
   are frozen. The existing overflow checks in `cp_new_stat()`, `cp_old_stat()`,
   `cp_compat_stat()` already handle 64-bit ino → 32-bit `st_ino` narrowing
   with `-EOVERFLOW`.

5. **`/proc/locks` and `/proc/PID/maps` format**: Wider numbers in output on
   32-bit. Parsers using positional fields should be unaffected; parsers using
   fixed-width expectations could break.
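
The narrowing check mentioned in item 4 can be modelled in userspace as below.
This is a sketch under the assumption that the legacy `st_ino` field is 32 bits
wide; `demo_narrow_ino()` is a hypothetical helper, not the kernel's
`cp_new_stat()`.

```c
#include <errno.h>
#include <stdint.h>

/* Copy a 64-bit inode number into a 32-bit st_ino, failing if it does not fit. */
static inline int demo_narrow_ino(uint64_t ino, uint32_t *st_ino)
{
    uint32_t narrowed = (uint32_t)ino;

    if (narrowed != ino)
        return -EOVERFLOW;   /* value would otherwise be silently truncated */
    *st_ino = narrowed;
    return 0;
}
```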

## Verification

- Build with `make allmodconfig` on x86_64 (catches format string warnings)
- Build with `make allmodconfig ARCH=arm` (32-bit, validates u64 expansion)
- Build with `make allmodconfig ARCH=i386` (32-bit x86)
- Run `scripts/checkpatch.pl --strict` on each patch
- Run `make C=2` (sparse) to catch type mismatches
- Run filesystem-specific selftests (`make kselftest TARGETS=filesystems`)
- Test NFS and CIFS with 64-bit inode numbers on 32-bit kernel

Thread overview: 2+ messages
2026-02-18 15:36 Jeff Layton [this message]
2026-02-19 14:31 ` [LSF/MM/BPF TOPIC] Should we make inode->i_ino a u64? Christian Brauner
