* [PATCH 00/11] ntfsplus: ntfs filesystem remake
@ 2025-10-20 2:07 Namjae Jeon
From: Namjae Jeon @ 2025-10-20 2:07 UTC (permalink / raw)
To: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il
Cc: linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee, Namjae Jeon
Introduction
============
The NTFS filesystem[1] remains the default filesystem for Windows, and a
well-maintained NTFS driver in the Linux kernel improves interoperability
with Windows devices, making it easier for Linux users to work with
NTFS-formatted drives. Until recently, NTFS support in Linux meant the
long-neglected read-only NTFS Classic driver, which has been removed from
the kernel, leaving only the poorly maintained ntfs3. ntfs3 still has
many problems, so users and distributions continue to rely on the old
legacy ntfs-3g.
What is ntfsplus?
=================
ntfsplus is a reimplementation of NTFS, based on the read-only classic
NTFS driver, that adds write support and meets today's essential
requirements (iomap, no buffer heads, utilities, xfstests results).
The old read-only ntfs code is much cleaner and extensively commented,
offering a readability that makes NTFS easier to understand. This is why
ntfsplus was developed on the old read-only NTFS base.
The target is to follow current kernel trends (iomap, no buffer heads,
folios) while providing enhanced performance, stable maintenance, and
utility support including fsck.
Key Features
============
- Write support:
Implement write support on top of the classic read-only NTFS driver.
Additionally, integrate delayed allocation to enhance write performance
through multi-cluster allocation and reduced fragmentation of the
cluster bitmap.
- Switch to using iomap:
Use iomap for buffered writes, buffered reads, direct I/O, file extent
mapping, and the readpages and writepages operations.
- Stop using buffer heads:
The buffer head usage in the old ntfs driver has been removed in favour
of folios. As a result, the CONFIG_BUFFER_HEAD dependency is also dropped
from Kconfig.
- Public utilities including fsck[2]:
While ntfs-3g includes ntfsprogs as a component, it notably lacks
an fsck implementation. So we have launched a new ntfs utilities
project called ntfsprogs-plus, forked from ntfs-3g with the
unnecessary ntfs fuse implementation removed. fsck.ntfs can be used for
ntfs testing with xfstests as well as for recovering a corrupted NTFS
device.
- Performance Enhancements:
- ntfsplus vs. ntfs3:
* Performance was benchmarked using iozone with various chunk sizes.
- In single-thread (1T) write tests, ntfsplus shows approximately
3~5% better performance.
- In multi-thread (4T) write tests, ntfsplus shows approximately
35~110% better performance.
- Read throughput is identical for both ntfs implementations.
1GB file size:4096 size:16384 size:65536
MB/sec ntfsplus | ntfs3 ntfsplus | ntfs3 ntfsplus | ntfs3
─────────────────────────────────────────────────────────────────
read 399 | 399 426 | 424 429 | 430
─────────────────────────────────────────────────────────────────
write(1T) 291 | 276 325 | 305 333 | 317
write(4T) 105 | 50 113 | 78 114 | 99.6
* File list browsing performance. (about 12~14% faster)
files:100000 files:200000 files:400000
Sec ntfsplus | ntfs3 ntfsplus | ntfs3 ntfsplus | ntfs3
─────────────────────────────────────────────────────────────────
ls -lR 7.07 | 8.10 14.03 | 16.35 28.27 | 32.86
* mount time.
parti_size:1TB parti_size:2TB parti_size:4TB
Sec ntfsplus | ntfs3 ntfsplus | ntfs3 ntfsplus | ntfs3
─────────────────────────────────────────────────────────────────
mount 0.38 | 2.03 0.39 | 2.25 0.70 | 4.51
The following are the reasons why ntfsplus performance is higher
than ntfs3's:
- Use of iomap aops.
- Delayed allocation support.
- Optimized zeroing of newly allocated clusters.
- Reduced runlist merge overhead with small chunk sizes.
- Pre-loading of mft (inode) blocks and index (dentry) blocks to improve
readdir + stat performance.
- Loading of the lcn bitmap in the background.
- Stability improvement:
a. Passes more xfstests tests:
ntfsplus passes 287 tests, significantly more than ntfs3's 218.
ntfsplus implements fallocate, idmapped mounts, permissions, etc.,
resulting in a significantly higher number of passing xfstests
compared to ntfs3.
b. Bonnie++ issue[3]:
The Bonnie++ benchmark fails on ntfs3 with a "Directory not empty"
error during file deletion. ntfs3 currently iterates directory
entries by reading index blocks one by one. When entries are deleted
concurrently, index block merging or entry relocation can cause
readdir() to skip some entries, leaving files undeleted in
workloads (such as bonnie++) that mix unlink and directory scans.
ntfsplus implements leaf-chain traversal in readdir to avoid skipping
entries on deletion.
- Journaling support:
ntfs3 does not provide full journaling support. It only implements journal
replay[4], which in our testing did not function correctly. My next task
after upstreaming will be to add full journaling support to ntfsplus.
The feature comparison summary
==============================
Feature ntfsplus ntfs3
=================================== ======== ===========
Write support Yes Yes
iomap support Yes No
No buffer head Yes No
Public utilities(mkfs, fsck, etc.) Yes No
xfstests passed 287 218
Idmapped mount Yes No
Delayed allocation Yes No
Bonnie++ Pass Fail
Journaling Planned Inoperative
=================================== ======== ===========
References
==========
[1] https://en.wikipedia.org/wiki/NTFS
[2] https://github.com/ntfsprogs-plus/ntfsprogs-plus
[3] https://lore.kernel.org/ntfs3/CAOZgwEd7NDkGEpdF6UQTcbYuupDavaHBoj4WwTy3Qe4Bqm6V0g@mail.gmail.com/
[4] https://marc.info/?l=linux-fsdevel&m=161738417018673&q=mbox
Namjae Jeon (11):
ntfsplus: in-memory, on-disk structures and headers
ntfsplus: add super block operations
ntfsplus: add inode operations
ntfsplus: add directory operations
ntfsplus: add file operations
ntfsplus: add iomap and address space operations
ntfsplus: add attrib operations
ntfsplus: add runlist handling and cluster allocator
ntfsplus: add reparse and ea operations
ntfsplus: add misc operations
ntfsplus: add Kconfig and Makefile
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/ntfsplus/Kconfig | 45 +
fs/ntfsplus/Makefile | 18 +
fs/ntfsplus/aops.c | 631 +++++
fs/ntfsplus/aops.h | 92 +
fs/ntfsplus/attrib.c | 5373 +++++++++++++++++++++++++++++++++++++
fs/ntfsplus/attrib.h | 159 ++
fs/ntfsplus/attrlist.c | 276 ++
fs/ntfsplus/attrlist.h | 21 +
fs/ntfsplus/bitmap.c | 193 ++
fs/ntfsplus/bitmap.h | 90 +
fs/ntfsplus/collate.c | 173 ++
fs/ntfsplus/collate.h | 37 +
fs/ntfsplus/compress.c | 1565 +++++++++++
fs/ntfsplus/dir.c | 1226 +++++++++
fs/ntfsplus/dir.h | 33 +
fs/ntfsplus/ea.c | 712 +++++
fs/ntfsplus/ea.h | 25 +
fs/ntfsplus/file.c | 1056 ++++++++
fs/ntfsplus/index.c | 2114 +++++++++++++++
fs/ntfsplus/index.h | 127 +
fs/ntfsplus/inode.c | 3705 +++++++++++++++++++++++++
fs/ntfsplus/inode.h | 354 +++
fs/ntfsplus/layout.h | 2288 ++++++++++++++++
fs/ntfsplus/lcnalloc.c | 993 +++++++
fs/ntfsplus/lcnalloc.h | 127 +
fs/ntfsplus/logfile.c | 773 ++++++
fs/ntfsplus/logfile.h | 316 +++
fs/ntfsplus/mft.c | 2630 ++++++++++++++++++
fs/ntfsplus/mft.h | 93 +
fs/ntfsplus/misc.c | 221 ++
fs/ntfsplus/misc.h | 218 ++
fs/ntfsplus/mst.c | 195 ++
fs/ntfsplus/namei.c | 1606 +++++++++++
fs/ntfsplus/ntfs.h | 172 ++
fs/ntfsplus/ntfs_iomap.c | 704 +++++
fs/ntfsplus/ntfs_iomap.h | 22 +
fs/ntfsplus/reparse.c | 550 ++++
fs/ntfsplus/reparse.h | 15 +
fs/ntfsplus/runlist.c | 1995 ++++++++++++++
fs/ntfsplus/runlist.h | 91 +
fs/ntfsplus/super.c | 2716 +++++++++++++++++++
fs/ntfsplus/unistr.c | 471 ++++
fs/ntfsplus/upcase.c | 73 +
fs/ntfsplus/volume.h | 241 ++
include/uapi/linux/ntfs.h | 23 +
47 files changed, 34560 insertions(+)
create mode 100644 fs/ntfsplus/Kconfig
create mode 100644 fs/ntfsplus/Makefile
create mode 100644 fs/ntfsplus/aops.c
create mode 100644 fs/ntfsplus/aops.h
create mode 100644 fs/ntfsplus/attrib.c
create mode 100644 fs/ntfsplus/attrib.h
create mode 100644 fs/ntfsplus/attrlist.c
create mode 100644 fs/ntfsplus/attrlist.h
create mode 100644 fs/ntfsplus/bitmap.c
create mode 100644 fs/ntfsplus/bitmap.h
create mode 100644 fs/ntfsplus/collate.c
create mode 100644 fs/ntfsplus/collate.h
create mode 100644 fs/ntfsplus/compress.c
create mode 100644 fs/ntfsplus/dir.c
create mode 100644 fs/ntfsplus/dir.h
create mode 100644 fs/ntfsplus/ea.c
create mode 100644 fs/ntfsplus/ea.h
create mode 100644 fs/ntfsplus/file.c
create mode 100644 fs/ntfsplus/index.c
create mode 100644 fs/ntfsplus/index.h
create mode 100644 fs/ntfsplus/inode.c
create mode 100644 fs/ntfsplus/inode.h
create mode 100644 fs/ntfsplus/layout.h
create mode 100644 fs/ntfsplus/lcnalloc.c
create mode 100644 fs/ntfsplus/lcnalloc.h
create mode 100644 fs/ntfsplus/logfile.c
create mode 100644 fs/ntfsplus/logfile.h
create mode 100644 fs/ntfsplus/mft.c
create mode 100644 fs/ntfsplus/mft.h
create mode 100644 fs/ntfsplus/misc.c
create mode 100644 fs/ntfsplus/misc.h
create mode 100644 fs/ntfsplus/mst.c
create mode 100644 fs/ntfsplus/namei.c
create mode 100644 fs/ntfsplus/ntfs.h
create mode 100644 fs/ntfsplus/ntfs_iomap.c
create mode 100644 fs/ntfsplus/ntfs_iomap.h
create mode 100644 fs/ntfsplus/reparse.c
create mode 100644 fs/ntfsplus/reparse.h
create mode 100644 fs/ntfsplus/runlist.c
create mode 100644 fs/ntfsplus/runlist.h
create mode 100644 fs/ntfsplus/super.c
create mode 100644 fs/ntfsplus/unistr.c
create mode 100644 fs/ntfsplus/upcase.c
create mode 100644 fs/ntfsplus/volume.h
create mode 100644 include/uapi/linux/ntfs.h
--
2.34.1
* [PATCH 01/11] ntfsplus: in-memory, on-disk structures and headers
From: Namjae Jeon @ 2025-10-20 2:07 UTC (permalink / raw)
This adds in-memory and on-disk structures and headers.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/ntfsplus/aops.h | 92 ++
fs/ntfsplus/attrib.h | 159 +++
fs/ntfsplus/attrlist.h | 21 +
fs/ntfsplus/bitmap.h | 90 ++
fs/ntfsplus/collate.h | 37 +
fs/ntfsplus/dir.h | 33 +
fs/ntfsplus/ea.h | 25 +
fs/ntfsplus/index.h | 127 ++
fs/ntfsplus/inode.h | 354 ++++++
fs/ntfsplus/layout.h | 2288 +++++++++++++++++++++++++++++++++++++
fs/ntfsplus/lcnalloc.h | 127 ++
fs/ntfsplus/logfile.h | 316 +++++
fs/ntfsplus/mft.h | 93 ++
fs/ntfsplus/misc.h | 218 ++++
fs/ntfsplus/ntfs.h | 172 +++
fs/ntfsplus/ntfs_iomap.h | 22 +
fs/ntfsplus/reparse.h | 15 +
fs/ntfsplus/runlist.h | 91 ++
fs/ntfsplus/volume.h | 241 ++++
include/uapi/linux/ntfs.h | 23 +
20 files changed, 4544 insertions(+)
create mode 100644 fs/ntfsplus/aops.h
create mode 100644 fs/ntfsplus/attrib.h
create mode 100644 fs/ntfsplus/attrlist.h
create mode 100644 fs/ntfsplus/bitmap.h
create mode 100644 fs/ntfsplus/collate.h
create mode 100644 fs/ntfsplus/dir.h
create mode 100644 fs/ntfsplus/ea.h
create mode 100644 fs/ntfsplus/index.h
create mode 100644 fs/ntfsplus/inode.h
create mode 100644 fs/ntfsplus/layout.h
create mode 100644 fs/ntfsplus/lcnalloc.h
create mode 100644 fs/ntfsplus/logfile.h
create mode 100644 fs/ntfsplus/mft.h
create mode 100644 fs/ntfsplus/misc.h
create mode 100644 fs/ntfsplus/ntfs.h
create mode 100644 fs/ntfsplus/ntfs_iomap.h
create mode 100644 fs/ntfsplus/reparse.h
create mode 100644 fs/ntfsplus/runlist.h
create mode 100644 fs/ntfsplus/volume.h
create mode 100644 include/uapi/linux/ntfs.h
diff --git a/fs/ntfsplus/aops.h b/fs/ntfsplus/aops.h
new file mode 100644
index 000000000000..333bbae8c566
--- /dev/null
+++ b/fs/ntfsplus/aops.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/**
+ * Defines for NTFS kernel address space operations and page cache
+ * handling.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_AOPS_H
+#define _LINUX_NTFS_AOPS_H
+
+#include <linux/pagemap.h>
+#include <linux/iomap.h>
+
+#include "volume.h"
+#include "inode.h"
+
+/**
+ * ntfs_unmap_folio - release a folio obtained from ntfs_read_mapping_folio()
+ * @folio: the folio to release
+ * @addr: kernel-mapped address of the folio data, or NULL
+ *
+ * Unmap @addr if it is mapped and drop the reference on @folio that was
+ * taken by ntfs_read_mapping_folio().
+ */
+static inline void ntfs_unmap_folio(struct folio *folio, void *addr)
+{
+ if (addr)
+ kunmap_local(addr);
+ folio_put(folio);
+}
+
+/**
+ * ntfs_read_mapping_folio - read a folio from an address space
+ * @mapping: address space for which to obtain the folio
+ * @index: index into the page cache for @mapping of the folio to read
+ *
+ * Read a folio from the page cache of the address space @mapping at position
+ * @index, where @index is in units of PAGE_SIZE, and not in bytes.
+ *
+ * If the folio is not in memory it is loaded from disk first using the
+ * read_folio method defined in the address space operations of @mapping
+ * and the folio is added to the page cache of @mapping in the process. If
+ * the read is interrupted by a signal (-EINTR), it is simply retried.
+ *
+ * If the folio belongs to an mst protected attribute and it is marked as such
+ * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
+ * error checking is performed. This means the caller has to verify whether
+ * the ntfs record(s) contained in the folio are valid or not using one of the
+ * ntfs_is_XXXX_record{,p}() macros, where XXXX is the record type you are
+ * expecting to see. (For details of the macros, see layout.h.)
+ *
+ * The returned folio holds a reference, pinning it in the page cache. When
+ * finished with the folio, the caller has to call ntfs_unmap_folio() to
+ * unmap (if it was mapped) and release it.
+ *
+ * Note this does not grant exclusive access. If such is desired, the caller
+ * must provide it independently of the ntfs_read_mapping_folio() and
+ * ntfs_unmap_folio() calls by using a {rw_}semaphore or other means of
+ * serialization. A spin lock cannot be used as this function can block.
+ *
+ * The unlocked and uptodate folio is returned on success or an encoded error
+ * on failure. Caller has to test for error using the IS_ERR() macro on the
+ * return value. If that evaluates to 'true', the negative error code can be
+ * obtained using PTR_ERR() on the return value.
+ */
+static inline struct folio *ntfs_read_mapping_folio(struct address_space *mapping,
+ unsigned long index)
+{
+ struct folio *folio;
+
+retry:
+ folio = read_mapping_folio(mapping, index, NULL);
+ if (PTR_ERR(folio) == -EINTR)
+ goto retry;
+
+ return folio;
+}
+
+void mark_ntfs_record_dirty(struct folio *folio);
+struct bio *ntfs_setup_bio(struct ntfs_volume *vol, unsigned int opf, s64 lcn,
+ unsigned int pg_ofs);
+int ntfs_dev_read(struct super_block *sb, void *buf, loff_t start, loff_t end);
+int ntfs_dev_write(struct super_block *sb, void *buf, loff_t start,
+ loff_t size, bool wait);
+#endif /* _LINUX_NTFS_AOPS_H */
diff --git a/fs/ntfsplus/attrib.h b/fs/ntfsplus/attrib.h
new file mode 100644
index 000000000000..e7991851dc9a
--- /dev/null
+++ b/fs/ntfsplus/attrib.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for attribute handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2005 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_ATTRIB_H
+#define _LINUX_NTFS_ATTRIB_H
+
+#include "ntfs.h"
+#include "dir.h"
+
+extern __le16 AT_UNNAMED[];
+
+/**
+ * ntfs_attr_search_ctx - used in attribute search functions
+ * @mrec: buffer containing mft record to search
+ * @attr: attribute record in @mrec where to begin/continue search
+ * @is_first: if true ntfs_attr_lookup() begins search with @attr, else after
+ *
+ * Structure must be initialized to zero before the first call to one of the
+ * attribute search functions. Initialize @mrec to point to the mft record to
+ * search, and @attr to point to the first attribute within @mrec (not necessary
+ * if calling the _first() functions), and set @is_first to 'true' (not necessary
+ * if calling the _first() functions).
+ *
+ * If @is_first is 'true', the search begins with @attr. If @is_first is 'false',
+ * the search begins after @attr. This is so that, after the first call to one
+ * of the search attribute functions, we can call the function again, without
+ * any modification of the search context, to automagically get the next
+ * matching attribute.
+ */
+struct ntfs_attr_search_ctx {
+ struct mft_record *mrec;
+ bool mapped_mrec;
+ struct attr_record *attr;
+ bool is_first;
+ struct ntfs_inode *ntfs_ino;
+ struct attr_list_entry *al_entry;
+ struct ntfs_inode *base_ntfs_ino;
+ struct mft_record *base_mrec;
+ bool mapped_base_mrec;
+ struct attr_record *base_attr;
+};
+
+enum { /* ways of processing holes when expanding */
+ HOLES_NO,
+ HOLES_OK,
+};
+
+int ntfs_map_runlist_nolock(struct ntfs_inode *ni, s64 vcn,
+ struct ntfs_attr_search_ctx *ctx);
+int ntfs_map_runlist(struct ntfs_inode *ni, s64 vcn);
+s64 ntfs_attr_vcn_to_lcn_nolock(struct ntfs_inode *ni, const s64 vcn,
+ const bool write_locked);
+struct runlist_element *ntfs_attr_find_vcn_nolock(struct ntfs_inode *ni,
+ const s64 vcn, struct ntfs_attr_search_ctx *ctx);
+struct runlist_element *__ntfs_attr_find_vcn_nolock(struct runlist *runlist,
+ const s64 vcn);
+int ntfs_attr_map_whole_runlist(struct ntfs_inode *ni);
+int ntfs_attr_lookup(const __le32 type, const __le16 *name,
+ const u32 name_len, const u32 ic,
+ const s64 lowest_vcn, const u8 *val, const u32 val_len,
+ struct ntfs_attr_search_ctx *ctx);
+int load_attribute_list(struct ntfs_inode *base_ni,
+ u8 *al_start, const s64 size);
+
+static inline s64 ntfs_attr_size(const struct attr_record *a)
+{
+ if (!a->non_resident)
+ return (s64)le32_to_cpu(a->data.resident.value_length);
+ return le64_to_cpu(a->data.non_resident.data_size);
+}
+
+void ntfs_attr_reinit_search_ctx(struct ntfs_attr_search_ctx *ctx);
+struct ntfs_attr_search_ctx *ntfs_attr_get_search_ctx(struct ntfs_inode *ni,
+ struct mft_record *mrec);
+void ntfs_attr_put_search_ctx(struct ntfs_attr_search_ctx *ctx);
+int ntfs_attr_size_bounds_check(const struct ntfs_volume *vol,
+ const __le32 type, const s64 size);
+int ntfs_attr_can_be_resident(const struct ntfs_volume *vol,
+ const __le32 type);
+int ntfs_attr_map_cluster(struct ntfs_inode *ni, s64 vcn_start, s64 *lcn_start,
+ s64 *lcn_count, s64 max_clu_count, bool *balloc, bool update_mp, bool skip_holes);
+int ntfs_attr_record_resize(struct mft_record *m, struct attr_record *a, u32 new_size);
+int ntfs_resident_attr_value_resize(struct mft_record *m, struct attr_record *a,
+ const u32 new_size);
+int ntfs_attr_make_non_resident(struct ntfs_inode *ni, const u32 data_size);
+int ntfs_attr_set(struct ntfs_inode *ni, const s64 ofs, const s64 cnt,
+ const u8 val);
+int ntfs_attr_set_initialized_size(struct ntfs_inode *ni, loff_t new_size);
+int ntfs_attr_open(struct ntfs_inode *ni, const __le32 type,
+ __le16 *name, u32 name_len);
+void ntfs_attr_close(struct ntfs_inode *n);
+int ntfs_attr_fallocate(struct ntfs_inode *ni, loff_t start, loff_t byte_len, bool keep_size);
+int ntfs_non_resident_attr_insert_range(struct ntfs_inode *ni, s64 start_vcn, s64 len);
+int ntfs_non_resident_attr_collapse_range(struct ntfs_inode *ni, s64 start_vcn, s64 len);
+int ntfs_non_resident_attr_punch_hole(struct ntfs_inode *ni, s64 start_vcn, s64 len);
+int __ntfs_attr_truncate_vfs(struct ntfs_inode *ni, const s64 newsize,
+ const s64 i_size);
+int ntfs_attr_expand(struct ntfs_inode *ni, const s64 newsize, const s64 prealloc_size);
+int ntfs_attr_truncate_i(struct ntfs_inode *ni, const s64 newsize, unsigned int holes);
+int ntfs_attr_truncate(struct ntfs_inode *ni, const s64 newsize);
+int ntfs_attr_rm(struct ntfs_inode *ni);
+int ntfs_attr_exist(struct ntfs_inode *ni, const __le32 type, __le16 *name,
+ u32 name_len);
+int ntfs_attr_remove(struct ntfs_inode *ni, const __le32 type, __le16 *name,
+ u32 name_len);
+int ntfs_attr_record_rm(struct ntfs_attr_search_ctx *ctx);
+int ntfs_attr_record_move_to(struct ntfs_attr_search_ctx *ctx, struct ntfs_inode *ni);
+int ntfs_attr_add(struct ntfs_inode *ni, __le32 type,
+ __le16 *name, u8 name_len, u8 *val, s64 size);
+int ntfs_attr_record_move_away(struct ntfs_attr_search_ctx *ctx, int extra);
+char *ntfs_attr_name_get(const struct ntfs_volume *vol, const __le16 *uname,
+ const int uname_len);
+void ntfs_attr_name_free(unsigned char **name);
+void *ntfs_attr_readall(struct ntfs_inode *ni, const __le32 type,
+ __le16 *name, u32 name_len, s64 *data_size);
+int ntfs_resident_attr_record_add(struct ntfs_inode *ni, __le32 type,
+ __le16 *name, u8 name_len, u8 *val, u32 size,
+ __le16 flags);
+int ntfs_attr_update_mapping_pairs(struct ntfs_inode *ni, s64 from_vcn);
+struct runlist_element *ntfs_attr_vcn_to_rl(struct ntfs_inode *ni, s64 vcn, s64 *lcn);
+
+/**
+ * ntfs_attrs_walk - syntactic sugar for walking all attributes in an inode
+ * @ctx: initialised attribute search context
+ *
+ * Syntactic sugar for walking attributes in an inode.
+ *
+ * Return 0 on success and a negative error code on error, as returned by
+ * ntfs_attr_lookup().
+ *
+ * Example: When you want to enumerate all attributes in an open ntfs inode
+ * @ni, you can simply do:
+ *
+ * int err;
+ * struct ntfs_attr_search_ctx *ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ * if (!ctx)
+ * // Failed to allocate the search context. Handle this case.
+ * while (!(err = ntfs_attrs_walk(ctx))) {
+ * struct attr_record *attr = ctx->attr;
+ * // attr now contains the next attribute. Do whatever you want
+ * // with it and then just continue with the while loop.
+ * }
+ * if (err && err != -ENOENT)
+ * // Ooops. An error occurred! You should handle this case.
+ * // Now finished with all attributes in the inode.
+ */
+static inline int ntfs_attrs_walk(struct ntfs_attr_search_ctx *ctx)
+{
+ return ntfs_attr_lookup(AT_UNUSED, NULL, 0, CASE_SENSITIVE, 0,
+ NULL, 0, ctx);
+}
+#endif /* _LINUX_NTFS_ATTRIB_H */
diff --git a/fs/ntfsplus/attrlist.h b/fs/ntfsplus/attrlist.h
new file mode 100644
index 000000000000..d0eadc5db1b0
--- /dev/null
+++ b/fs/ntfsplus/attrlist.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Exports for attribute list attribute handling.
+ * Originated from Linux-NTFS project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ * Copyright (c) 2004 Yura Pakhuchiy
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _NTFS_ATTRLIST_H
+#define _NTFS_ATTRLIST_H
+
+#include "attrib.h"
+
+int ntfs_attrlist_need(struct ntfs_inode *ni);
+int ntfs_attrlist_entry_add(struct ntfs_inode *ni, struct attr_record *attr);
+int ntfs_attrlist_entry_rm(struct ntfs_attr_search_ctx *ctx);
+int ntfs_attrlist_update(struct ntfs_inode *base_ni);
+
+#endif /* defined _NTFS_ATTRLIST_H */
diff --git a/fs/ntfsplus/bitmap.h b/fs/ntfsplus/bitmap.h
new file mode 100644
index 000000000000..9d8c3c5b16ac
--- /dev/null
+++ b/fs/ntfsplus/bitmap.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel bitmap handling. Part of the Linux-NTFS
+ * project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_BITMAP_H
+#define _LINUX_NTFS_BITMAP_H
+
+#include <linux/fs.h>
+
+int __ntfs_bitmap_set_bits_in_run(struct inode *vi, const s64 start_bit,
+ const s64 count, const u8 value, const bool is_rollback);
+
+/**
+ * ntfs_bitmap_set_bits_in_run - set a run of bits in a bitmap to a value
+ * @vi: vfs inode describing the bitmap
+ * @start_bit: first bit to set
+ * @count: number of bits to set
+ * @value: value to set the bits to (i.e. 0 or 1)
+ *
+ * Set @count bits starting at bit @start_bit in the bitmap described by the
+ * vfs inode @vi to @value, where @value is either 0 or 1.
+ */
+static inline int ntfs_bitmap_set_bits_in_run(struct inode *vi,
+ const s64 start_bit, const s64 count, const u8 value)
+{
+ return __ntfs_bitmap_set_bits_in_run(vi, start_bit, count, value,
+ false);
+}
+
+/**
+ * ntfs_bitmap_set_run - set a run of bits in a bitmap
+ * @vi: vfs inode describing the bitmap
+ * @start_bit: first bit to set
+ * @count: number of bits to set
+ *
+ * Set @count bits starting at bit @start_bit in the bitmap described by the
+ * vfs inode @vi.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static inline int ntfs_bitmap_set_run(struct inode *vi, const s64 start_bit,
+ const s64 count)
+{
+ return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 1);
+}
+
+/**
+ * ntfs_bitmap_clear_run - clear a run of bits in a bitmap
+ * @vi: vfs inode describing the bitmap
+ * @start_bit: first bit to clear
+ * @count: number of bits to clear
+ *
+ * Clear @count bits starting at bit @start_bit in the bitmap described by the
+ * vfs inode @vi.
+ */
+static inline int ntfs_bitmap_clear_run(struct inode *vi, const s64 start_bit,
+ const s64 count)
+{
+ return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 0);
+}
+
+/**
+ * ntfs_bitmap_set_bit - set a bit in a bitmap
+ * @vi: vfs inode describing the bitmap
+ * @bit: bit to set
+ *
+ * Set bit @bit in the bitmap described by the vfs inode @vi.
+ */
+static inline int ntfs_bitmap_set_bit(struct inode *vi, const s64 bit)
+{
+ return ntfs_bitmap_set_run(vi, bit, 1);
+}
+
+/**
+ * ntfs_bitmap_clear_bit - clear a bit in a bitmap
+ * @vi: vfs inode describing the bitmap
+ * @bit: bit to clear
+ *
+ * Clear bit @bit in the bitmap described by the vfs inode @vi.
+ */
+static inline int ntfs_bitmap_clear_bit(struct inode *vi, const s64 bit)
+{
+ return ntfs_bitmap_clear_run(vi, bit, 1);
+}
+
+#endif /* defined _LINUX_NTFS_BITMAP_H */
diff --git a/fs/ntfsplus/collate.h b/fs/ntfsplus/collate.h
new file mode 100644
index 000000000000..cf04508340f0
--- /dev/null
+++ b/fs/ntfsplus/collate.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel collation handling.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ *
+ * Part of this file is based on code from the NTFS-3G project.
+ * and is copyrighted by the respective authors below:
+ * Copyright (c) 2004 Anton Altaparmakov
+ * Copyright (c) 2005 Yura Pakhuchiy
+ */
+
+#ifndef _LINUX_NTFS_COLLATE_H
+#define _LINUX_NTFS_COLLATE_H
+
+#include "volume.h"
+
+static inline bool ntfs_is_collation_rule_supported(__le32 cr)
+{
+ int i;
+
+ if (unlikely(cr != COLLATION_BINARY && cr != COLLATION_NTOFS_ULONG &&
+ cr != COLLATION_FILE_NAME && cr != COLLATION_NTOFS_ULONGS))
+ return false;
+ i = le32_to_cpu(cr);
+ if (likely(((i >= 0) && (i <= 0x02)) ||
+ ((i >= 0x10) && (i <= 0x13))))
+ return true;
+ return false;
+}
+
+int ntfs_collate(struct ntfs_volume *vol, __le32 cr,
+ const void *data1, const int data1_len,
+ const void *data2, const int data2_len);
+
+#endif /* _LINUX_NTFS_COLLATE_H */
diff --git a/fs/ntfsplus/dir.h b/fs/ntfsplus/dir.h
new file mode 100644
index 000000000000..5abe21c3d938
--- /dev/null
+++ b/fs/ntfsplus/dir.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for directory handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2002-2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_DIR_H
+#define _LINUX_NTFS_DIR_H
+
+#include "inode.h"
+
+/*
+ * ntfs_name is used to return the file name to the caller of
+ * ntfs_lookup_inode_by_name() in order for the caller (namei.c::ntfs_lookup())
+ * to be able to deal with dcache aliasing issues.
+ */
+struct ntfs_name {
+ u64 mref;
+ u8 type;
+ u8 len;
+ __le16 name[];
+} __packed;
+
+/* The little endian Unicode string $I30 as a global constant. */
+extern __le16 I30[5];
+
+u64 ntfs_lookup_inode_by_name(struct ntfs_inode *dir_ni,
+ const __le16 *uname, const int uname_len, struct ntfs_name **res);
+int ntfs_check_empty_dir(struct ntfs_inode *ni, struct mft_record *ni_mrec);
+
+#endif /* _LINUX_NTFS_DIR_H */
diff --git a/fs/ntfsplus/ea.h b/fs/ntfsplus/ea.h
new file mode 100644
index 000000000000..b2e678566eb0
--- /dev/null
+++ b/fs/ntfsplus/ea.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#define NTFS_EA_UID BIT(1)
+#define NTFS_EA_GID BIT(2)
+#define NTFS_EA_MODE BIT(3)
+
+extern const struct xattr_handler *const ntfs_xattr_handlers[];
+
+int ntfs_ea_set_wsl_not_symlink(struct ntfs_inode *ni, mode_t mode, dev_t dev);
+int ntfs_ea_get_wsl_inode(struct inode *inode, dev_t *rdevp, unsigned int flags);
+int ntfs_ea_set_wsl_inode(struct inode *inode, dev_t rdev, __le16 *ea_size,
+ unsigned int flags);
+ssize_t ntfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
+
+#ifdef CONFIG_NTFSPLUS_FS_POSIX_ACL
+struct posix_acl *ntfs_get_acl(struct mnt_idmap *idmap, struct dentry *dentry,
+ int type);
+int ntfs_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
+ struct posix_acl *acl, int type);
+int ntfs_init_acl(struct mnt_idmap *idmap, struct inode *inode,
+ struct inode *dir);
+#else
+#define ntfs_get_acl NULL
+#define ntfs_set_acl NULL
+#endif
diff --git a/fs/ntfsplus/index.h b/fs/ntfsplus/index.h
new file mode 100644
index 000000000000..b5c719910ab6
--- /dev/null
+++ b/fs/ntfsplus/index.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel index handling. Part of the Linux-NTFS
+ * project.
+ *
+ * Copyright (c) 2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_INDEX_H
+#define _LINUX_NTFS_INDEX_H
+
+#include <linux/fs.h>
+
+#include "attrib.h"
+#include "mft.h"
+#include "aops.h"
+
+#define VCN_INDEX_ROOT_PARENT ((s64)-2)
+
+#define MAX_PARENT_VCN 32
+
+/**
+ * struct ntfs_index_context - NTFS index search context
+ * @idx_ni: index inode containing the @entry described by this context
+ * @entry: index entry (points into @ir or @ia)
+ * @data: index entry data (points into @entry)
+ * @data_len: length in bytes of @data
+ * @is_in_root: 'true' if @entry is in @ir and 'false' if it is in @ia
+ * @ir: index root if @is_in_root and NULL otherwise
+ * @actx: attribute search context if @is_in_root and NULL otherwise
+ * @base_ni: base inode if @is_in_root and NULL otherwise
+ * @ia: index block if @is_in_root is 'false' and NULL otherwise
+ * @page: page if @is_in_root is 'false' and NULL otherwise
+ *
+ * @idx_ni is the index inode this context belongs to.
+ *
+ * @entry is the index entry described by this context. @data and @data_len
+ * are the index entry data and its length in bytes, respectively. @data
+ * simply points into @entry. This is probably what the user is interested in.
+ *
+ * If @is_in_root is 'true', @entry is in the index root attribute @ir described
+ * by the attribute search context @actx and the base inode @base_ni. @ia and
+ * @page are NULL in this case.
+ *
+ * If @is_in_root is 'false', @entry is in the index allocation attribute and @ia
+ * and @page point to the index allocation block and the mapped, locked page it
+ * is in, respectively. @ir, @actx and @base_ni are NULL in this case.
+ *
+ * To obtain a context call ntfs_index_ctx_get().
+ *
+ * We use this context to allow ntfs_index_lookup() to return the found index
+ * @entry and its @data without having to allocate a buffer and copy the @entry
+ * and/or its @data into it.
+ *
+ * When finished with the @entry and its @data, call ntfs_index_ctx_put() to
+ * free the context and other associated resources.
+ *
+ * If the index entry was modified, call flush_dcache_index_entry_page()
+ * immediately after the modification and either ntfs_index_entry_mark_dirty()
+ * or ntfs_index_entry_write() before the call to ntfs_index_ctx_put() to
+ * ensure that the changes are written to disk.
+ */
+struct ntfs_index_context {
+ struct ntfs_inode *idx_ni;
+ __le16 *name;
+ u32 name_len;
+ struct index_entry *entry;
+ __le32 cr;
+ void *data;
+ u16 data_len;
+ bool is_in_root;
+ struct index_root *ir;
+ struct ntfs_attr_search_ctx *actx;
+ struct index_block *ib;
+ struct ntfs_inode *base_ni;
+ struct index_block *ia;
+ struct page *page;
+ struct ntfs_inode *ia_ni;
+ int parent_pos[MAX_PARENT_VCN]; /* Positions of the parent entries. */
+ s64 parent_vcn[MAX_PARENT_VCN]; /* Vcns of the entry's parent nodes. */
+ int pindex; /* Current depth; at most the number of parent nodes. */
+ bool ib_dirty;
+ u32 block_size;
+ u8 vcn_size_bits;
+ bool sync_write;
+};
+
+int ntfs_index_entry_inconsistent(struct ntfs_index_context *icx, struct ntfs_volume *vol,
+ const struct index_entry *ie, __le32 collation_rule, u64 inum);
+struct ntfs_index_context *ntfs_index_ctx_get(struct ntfs_inode *ni, __le16 *name,
+ u32 name_len);
+void ntfs_index_ctx_put(struct ntfs_index_context *ictx);
+int ntfs_index_lookup(const void *key, const int key_len,
+ struct ntfs_index_context *ictx);
+
+/**
+ * ntfs_index_entry_flush_dcache_page - flush_dcache_page() for index entries
+ * @ictx: ntfs index context describing the index entry
+ *
+ * Call flush_dcache_page() for the page in which an index entry resides.
+ *
+ * This must be called every time an index entry is modified, just after the
+ * modification.
+ *
+ * If the index entry is in the index root attribute, nothing is done here;
+ * the page containing the mft record is flushed separately when the mft
+ * record is marked dirty.
+ *
+ * If the index entry is in an index block belonging to the index allocation
+ * attribute, simply flush the page cache page containing the index block.
+ */
+static inline void ntfs_index_entry_flush_dcache_page(struct ntfs_index_context *ictx)
+{
+ if (!ictx->is_in_root)
+ flush_dcache_page(ictx->page);
+}
+
+void ntfs_index_entry_mark_dirty(struct ntfs_index_context *ictx);
+int ntfs_index_add_filename(struct ntfs_inode *ni, struct file_name_attr *fn, u64 mref);
+int ntfs_index_remove(struct ntfs_inode *ni, const void *key, const int keylen);
+struct ntfs_inode *ntfs_ia_open(struct ntfs_index_context *icx, struct ntfs_inode *ni);
+struct index_entry *ntfs_index_walk_down(struct index_entry *ie, struct ntfs_index_context *ictx);
+struct index_entry *ntfs_index_next(struct index_entry *ie, struct ntfs_index_context *ictx);
+int ntfs_index_rm(struct ntfs_index_context *icx);
+void ntfs_index_ctx_reinit(struct ntfs_index_context *icx);
+int ntfs_ie_add(struct ntfs_index_context *icx, struct index_entry *ie);
+int ntfs_icx_ib_sync_write(struct ntfs_index_context *icx);
+
+#endif /* _LINUX_NTFS_INDEX_H */
diff --git a/fs/ntfsplus/inode.h b/fs/ntfsplus/inode.h
new file mode 100644
index 000000000000..0966f59160df
--- /dev/null
+++ b/fs/ntfsplus/inode.h
@@ -0,0 +1,354 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for the inode structures of the NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2007 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_INODE_H
+#define _LINUX_NTFS_INODE_H
+
+#include "misc.h"
+
+enum ntfs_inode_mutex_lock_class {
+ NTFS_INODE_MUTEX_PARENT,
+ NTFS_INODE_MUTEX_NORMAL,
+ NTFS_INODE_MUTEX_PARENT_2,
+ NTFS_INODE_MUTEX_NORMAL_2,
+ NTFS_REPARSE_MUTEX_PARENT,
+ NTFS_EA_MUTEX_NORMAL
+};
+
+/*
+ * The NTFS in-memory inode structure. It is just used as an extension to the
+ * fields already provided in the VFS inode.
+ */
+struct ntfs_inode {
+ rwlock_t size_lock; /* Lock serializing access to inode sizes. */
+ unsigned long state; /*
+ * NTFS specific flags describing this inode.
+ * See ntfs_inode_state_bits below.
+ */
+ __le32 flags; /* Flags describing the file. (Copy from STANDARD_INFORMATION) */
+ unsigned long mft_no; /* Number of the mft record / inode. */
+ u16 seq_no; /* Sequence number of the mft record. */
+ atomic_t count; /* Inode reference count for book keeping. */
+ struct ntfs_volume *vol; /* Pointer to the ntfs volume of this inode. */
+
+ /*
+ * If NInoAttr() is true, the below fields describe the attribute which
+ * this fake inode belongs to. The actual inode of this attribute is
+ * pointed to by base_ntfs_ino and nr_extents is always set to -1 (see
+ * below). For real inodes, we also set the type (AT_DATA for files and
+ * AT_INDEX_ALLOCATION for directories), with the name = NULL and
+ * name_len = 0 for files and name = I30 (global constant) and
+ * name_len = 4 for directories.
+ */
+ __le32 type; /* Attribute type of this fake inode. */
+ __le16 *name; /* Attribute name of this fake inode. */
+ u32 name_len; /* Attribute name length of this fake inode. */
+ struct runlist runlist; /*
+ * If state has the NI_NonResident bit set,
+ * the runlist of the unnamed data attribute
+ * (if a file) or of the index allocation
+ * attribute (directory) or of the attribute
+ * described by the fake inode (if NInoAttr()).
+ * If runlist.rl is NULL, the runlist has not
+ * been read in yet or has been unmapped. If
+ * NI_NonResident is clear, the attribute is
+ * resident (file and fake inode) or there is
+ * no $I30 index allocation attribute
+ * (small directory). In the latter case
+ * runlist.rl is always NULL.
+ */
+ s64 lcn_seek_trunc;
+
+ s64 data_size; /* Copy from the attribute record. */
+ s64 initialized_size; /* Copy from the attribute record. */
+ s64 allocated_size; /* Copy from the attribute record. */
+
+ struct timespec64 i_crtime;
+
+ /*
+ * The following fields are only valid for real inodes and extent
+ * inodes.
+ */
+ void *mrec;
+ struct mutex mrec_lock; /*
+ * Lock for serializing access to the
+ * mft record belonging to this inode.
+ */
+ struct folio *folio; /*
+ * The folio containing the mft record of the
+ * inode. This should only be touched by the
+ * (un)map_mft_record*() functions.
+ */
+ int folio_ofs; /*
+ * Offset into the folio at which the mft record
+ * begins. This should only be touched by the
+ * (un)map_mft_record*() functions.
+ */
+ s64 mft_lcn[2]; /* Lcns of the clusters holding this mft record. */
+ unsigned int mft_lcn_count;
+
+ /*
+ * Attribute list support (only for use by the attribute lookup
+ * functions). Setup during read_inode for all inodes with attribute
+ * lists. Only valid if NI_AttrList is set in state.
+ */
+ u32 attr_list_size; /* Length of attribute list value in bytes. */
+ u8 *attr_list; /* Attribute list value itself. */
+
+ union {
+ struct { /* It is a directory, $MFT, or an index inode. */
+ u32 block_size; /* Size of an index block. */
+ u32 vcn_size; /* Size of a vcn in this index. */
+ __le32 collation_rule; /* The collation rule for the index. */
+ u8 block_size_bits; /* Log2 of block_size. */
+ u8 vcn_size_bits; /* Log2 of vcn_size. */
+ } index;
+ struct { /* It is a compressed/sparse file/attribute inode. */
+ s64 size; /* Copy of compressed_size from $DATA. */
+ u32 block_size; /* Size of a compression block (cb). */
+ u8 block_size_bits; /* Log2 of the size of a cb. */
+ u8 block_clusters; /* Number of clusters per cb. */
+ } compressed;
+ } itype;
+ struct mutex extent_lock; /* Lock for accessing/modifying the below. */
+ s32 nr_extents; /*
+ * For a base mft record, the number of attached extent
+ * inodes (0 if none); for extent records and for fake
+ * inodes describing an attribute this is -1.
+ */
+ union { /* This union is only used if nr_extents != 0. */
+ struct ntfs_inode **extent_ntfs_inos; /*
+ * For nr_extents > 0, array of
+ * the ntfs inodes of the extent
+ * mft records belonging to
+ * this base inode which have
+ * been loaded.
+ */
+ struct ntfs_inode *base_ntfs_ino; /*
+ * For nr_extents == -1, the
+ * ntfs inode of the base mft
+ * record. For fake inodes, the
+ * real (base) inode to which
+ * the attribute belongs.
+ */
+ } ext;
+
+ unsigned int i_dealloc_clusters;
+ char *target;
+};
+
+/*
+ * Defined bits for the state field in the ntfs_inode structure.
+ * (f) = files only, (d) = directories only, (a) = attributes/fake inodes only
+ */
+enum {
+ NI_Dirty, /* 1: Mft record needs to be written to disk. */
+ NI_AttrListDirty, /* 1: Attribute list needs to be written to disk. */
+ NI_AttrList, /* 1: Mft record contains an attribute list. */
+ NI_AttrListNonResident, /*
+ * 1: Attribute list is non-resident. Implies
+ * NI_AttrList is set.
+ */
+
+ NI_Attr, /*
+ * 1: Fake inode for attribute i/o.
+ * 0: Real inode or extent inode.
+ */
+
+ NI_MstProtected, /*
+ * 1: Attribute is protected by MST fixups.
+ * 0: Attribute is not protected by fixups.
+ */
+ NI_NonResident, /*
+ * 1: Unnamed data attr is non-resident (f).
+ * 1: Attribute is non-resident (a).
+ */
+ NI_IndexAllocPresent, /* 1: $I30 index alloc attr is present (d). */
+ NI_Compressed, /*
+ * 1: Unnamed data attr is compressed (f).
+ * 1: Create compressed files by default (d).
+ * 1: Attribute is compressed (a).
+ */
+ NI_Encrypted, /*
+ * 1: Unnamed data attr is encrypted (f).
+ * 1: Create encrypted files by default (d).
+ * 1: Attribute is encrypted (a).
+ */
+ NI_Sparse, /*
+ * 1: Unnamed data attr is sparse (f).
+ * 1: Create sparse files by default (d).
+ * 1: Attribute is sparse (a).
+ */
+ NI_SparseDisabled, /* 1: May not create sparse regions. */
+ NI_FullyMapped,
+ NI_FileNameDirty,
+ NI_BeingDeleted,
+ NI_BeingCreated,
+ NI_HasEA,
+ NI_RunlistDirty,
+};
+
+/*
+ * NOTE: We should be adding dirty mft records to a list somewhere and they
+ * should be independent of the (ntfs/vfs) inode structure so that an inode can
+ * be removed but the record can be left dirty for syncing later.
+ */
+
+/*
+ * Macro tricks to expand the NInoFoo(), NInoSetFoo(), and NInoClearFoo()
+ * functions.
+ */
+#define NINO_FNS(flag) \
+static inline int NIno##flag(struct ntfs_inode *ni) \
+{ \
+ return test_bit(NI_##flag, &(ni)->state); \
+} \
+static inline void NInoSet##flag(struct ntfs_inode *ni) \
+{ \
+ set_bit(NI_##flag, &(ni)->state); \
+} \
+static inline void NInoClear##flag(struct ntfs_inode *ni) \
+{ \
+ clear_bit(NI_##flag, &(ni)->state); \
+}
+
+/*
+ * As above for NInoTestSetFoo() and NInoTestClearFoo().
+ */
+#define TAS_NINO_FNS(flag) \
+static inline int NInoTestSet##flag(struct ntfs_inode *ni) \
+{ \
+ return test_and_set_bit(NI_##flag, &(ni)->state); \
+} \
+static inline int NInoTestClear##flag(struct ntfs_inode *ni) \
+{ \
+ return test_and_clear_bit(NI_##flag, &(ni)->state); \
+}
+
+/* Emit the ntfs inode bitops functions. */
+NINO_FNS(Dirty)
+TAS_NINO_FNS(Dirty)
+NINO_FNS(AttrList)
+NINO_FNS(AttrListDirty)
+NINO_FNS(AttrListNonResident)
+NINO_FNS(Attr)
+NINO_FNS(MstProtected)
+NINO_FNS(NonResident)
+NINO_FNS(IndexAllocPresent)
+NINO_FNS(Compressed)
+NINO_FNS(Encrypted)
+NINO_FNS(Sparse)
+NINO_FNS(SparseDisabled)
+NINO_FNS(FullyMapped)
+NINO_FNS(FileNameDirty)
+TAS_NINO_FNS(FileNameDirty)
+NINO_FNS(BeingDeleted)
+NINO_FNS(HasEA)
+NINO_FNS(RunlistDirty)
+
+/*
+ * The full structure containing a ntfs_inode and a vfs struct inode. Used for
+ * all real and fake inodes but not for extent inodes which lack the vfs struct
+ * inode.
+ */
+struct big_ntfs_inode {
+ struct ntfs_inode ntfs_inode;
+ struct inode vfs_inode; /* The vfs inode structure. */
+};
+
+/**
+ * NTFS_I - return the ntfs inode given a vfs inode
+ * @inode: VFS inode
+ *
+ * NTFS_I() returns the ntfs inode associated with the VFS @inode.
+ */
+static inline struct ntfs_inode *NTFS_I(struct inode *inode)
+{
+ return &container_of(inode, struct big_ntfs_inode, vfs_inode)->ntfs_inode;
+}
+
+static inline struct inode *VFS_I(struct ntfs_inode *ni)
+{
+ return &((struct big_ntfs_inode *)ni)->vfs_inode;
+}
+
+/**
+ * ntfs_attr - ntfs in memory attribute structure
+ *
+ * This structure exists only to provide a small structure for the
+ * ntfs_{attr_}iget()/ntfs_test_inode()/ntfs_init_locked_inode() mechanism.
+ *
+ * NOTE: Elements are ordered by size to make the structure as compact as
+ * possible on all architectures.
+ */
+struct ntfs_attr {
+ unsigned long mft_no;
+ __le16 *name;
+ u32 name_len;
+ __le32 type;
+ unsigned long state;
+};
+
+int ntfs_test_inode(struct inode *vi, void *data);
+struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no);
+struct inode *ntfs_attr_iget(struct inode *base_vi, __le32 type,
+ __le16 *name, u32 name_len);
+struct inode *ntfs_index_iget(struct inode *base_vi, __le16 *name,
+ u32 name_len);
+struct inode *ntfs_alloc_big_inode(struct super_block *sb);
+void ntfs_free_big_inode(struct inode *inode);
+int ntfs_drop_big_inode(struct inode *inode);
+void ntfs_evict_big_inode(struct inode *vi);
+void __ntfs_init_inode(struct super_block *sb, struct ntfs_inode *ni);
+
+static inline void ntfs_init_big_inode(struct inode *vi)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+
+ ntfs_debug("Entering.");
+ __ntfs_init_inode(vi->i_sb, ni);
+ ni->mft_no = vi->i_ino;
+}
+
+struct ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
+ unsigned long mft_no);
+void ntfs_clear_extent_inode(struct ntfs_inode *ni);
+int ntfs_read_inode_mount(struct inode *vi);
+int ntfs_show_options(struct seq_file *sf, struct dentry *root);
+int ntfs_truncate_vfs(struct inode *vi, loff_t new_size, loff_t i_size);
+
+int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+ struct iattr *attr);
+int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
+ struct kstat *stat, unsigned int request_mask,
+ unsigned int query_flags);
+
+int __ntfs_write_inode(struct inode *vi, int sync);
+int ntfs_inode_attach_all_extents(struct ntfs_inode *ni);
+int ntfs_inode_add_attrlist(struct ntfs_inode *ni);
+void ntfs_destroy_ext_inode(struct ntfs_inode *ni);
+int ntfs_inode_free_space(struct ntfs_inode *ni, int size);
+s64 ntfs_inode_attr_pread(struct inode *vi, s64 pos, s64 count, u8 *buf);
+s64 ntfs_inode_attr_pwrite(struct inode *vi, s64 pos, s64 count, u8 *buf,
+ bool sync);
+int ntfs_inode_close(struct ntfs_inode *ni);
+
+static inline void ntfs_commit_inode(struct inode *vi)
+{
+ __ntfs_write_inode(vi, 1);
+}
+
+int ntfs_inode_sync_filename(struct ntfs_inode *ni);
+int ntfs_extend_initialized_size(struct inode *vi, const loff_t offset,
+ const loff_t new_size);
+void ntfs_set_vfs_operations(struct inode *inode, mode_t mode, dev_t dev);
+
+#endif /* _LINUX_NTFS_INODE_H */
diff --git a/fs/ntfsplus/layout.h b/fs/ntfsplus/layout.h
new file mode 100644
index 000000000000..d0067e4c975a
--- /dev/null
+++ b/fs/ntfsplus/layout.h
@@ -0,0 +1,2288 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * All NTFS associated on-disk structures. Part of the Linux-NTFS
+ * project.
+ *
+ * Copyright (c) 2001-2005 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ */
+
+#ifndef _LINUX_NTFS_LAYOUT_H
+#define _LINUX_NTFS_LAYOUT_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/list.h>
+#include <asm/byteorder.h>
+
+/* The NTFS oem_id "NTFS " */
+#define magicNTFS cpu_to_le64(0x202020205346544eULL)
+
+/*
+ * Location of bootsector on partition:
+ * The standard NTFS_BOOT_SECTOR is on sector 0 of the partition.
+ * On NT4 and above there is one backup copy of the boot sector to
+ * be found on the last sector of the partition (not normally accessible
+ * from within Windows as the number of sectors value stored in the
+ * bootsector is one less than the actual value!).
+ * On versions of NT 3.51 and earlier, the backup copy was located at
+ * number of sectors/2 (integer divide), i.e. in the middle of the volume.
+ */
+
+/*
+ * BIOS parameter block (bpb) structure.
+ */
+struct bios_parameter_block {
+ __le16 bytes_per_sector; /* Size of a sector in bytes. */
+ u8 sectors_per_cluster; /* Size of a cluster in sectors. */
+ __le16 reserved_sectors; /* zero */
+ u8 fats; /* zero */
+ __le16 root_entries; /* zero */
+ __le16 sectors; /* zero */
+ u8 media_type; /* 0xf8 = hard disk */
+ __le16 sectors_per_fat; /* zero */
+ __le16 sectors_per_track; /* irrelevant */
+ __le16 heads; /* irrelevant */
+ __le32 hidden_sectors; /* zero */
+ __le32 large_sectors; /* zero */
+} __packed;
+
+/*
+ * NTFS boot sector structure.
+ */
+struct ntfs_boot_sector {
+ u8 jump[3]; /* Irrelevant (jump to boot up code).*/
+ __le64 oem_id; /* Magic "NTFS ". */
+ struct bios_parameter_block bpb; /* See BIOS_PARAMETER_BLOCK. */
+ u8 unused[4]; /*
+ * zero, NTFS diskedit.exe states that
+ * this is actually:
+ * __u8 physical_drive; // 0x80
+ * __u8 current_head; // zero
+ * __u8 extended_boot_signature;
+ * // 0x80
+ * __u8 unused; // zero
+ */
+ __le64 number_of_sectors; /*
+ * Number of sectors in volume. Gives
+ * maximum volume size of 2^63 sectors.
+ * Assuming standard sector size of 512
+ * bytes, the maximum byte size is
+ * approx. 4.7x10^21 bytes. (-;
+ */
+ __le64 mft_lcn; /* Cluster location of mft data. */
+ __le64 mftmirr_lcn; /* Cluster location of copy of mft. */
+ s8 clusters_per_mft_record; /* Mft record size in clusters. */
+ u8 reserved0[3]; /* zero */
+ s8 clusters_per_index_record; /* Index block size in clusters. */
+ u8 reserved1[3]; /* zero */
+ __le64 volume_serial_number; /* Irrelevant (serial number). */
+ __le32 checksum; /* Boot sector checksum. */
+ u8 bootstrap[426]; /* Irrelevant (boot up code). */
+ __le16 end_of_sector_marker; /*
+ * End of bootsector magic. Always is
+ * 0xaa55 in little endian.
+ */
+/* sizeof() = 512 (0x200) bytes */
+} __packed;
+
+/*
+ * Magic identifiers present at the beginning of all ntfs record containing
+ * records (like mft records for example).
+ */
+enum {
+ /* Found in $MFT/$DATA. */
+ magic_FILE = cpu_to_le32(0x454c4946), /* Mft entry. */
+ magic_INDX = cpu_to_le32(0x58444e49), /* Index buffer. */
+ magic_HOLE = cpu_to_le32(0x454c4f48), /* ? (NTFS 3.0+?) */
+
+ /* Found in LogFile/DATA. */
+ magic_RSTR = cpu_to_le32(0x52545352), /* Restart page. */
+ magic_RCRD = cpu_to_le32(0x44524352), /* Log record page. */
+
+ /* Found in LogFile/DATA. (May be found in $MFT/$DATA, also?) */
+ magic_CHKD = cpu_to_le32(0x444b4843), /* Modified by chkdsk. */
+
+ /* Found in all ntfs record containing records. */
+ magic_BAAD = cpu_to_le32(0x44414142), /*
+ * Failed multi sector
+ * transfer was detected.
+ */
+ /*
+ * Found in LogFile/DATA when a page is full of 0xff bytes and is
+ * thus not initialized. Page must be initialized before using it.
+ */
+ magic_empty = cpu_to_le32(0xffffffff) /* Record is empty. */
+};
+
+/*
+ * Generic magic comparison macros. Finally found a use for the ## preprocessor
+ * operator! (-8
+ */
+
+static inline bool __ntfs_is_magic(__le32 x, __le32 r)
+{
+ return (x == r);
+}
+#define ntfs_is_magic(x, m) __ntfs_is_magic(x, magic_##m)
+
+static inline bool __ntfs_is_magicp(__le32 *p, __le32 r)
+{
+ return (*p == r);
+}
+#define ntfs_is_magicp(p, m) __ntfs_is_magicp(p, magic_##m)
+
+/*
+ * Specialised magic comparison macros for the NTFS_RECORD_TYPEs defined above.
+ */
+#define ntfs_is_file_record(x) (ntfs_is_magic(x, FILE))
+#define ntfs_is_file_recordp(p) (ntfs_is_magicp(p, FILE))
+#define ntfs_is_mft_record(x) (ntfs_is_file_record(x))
+#define ntfs_is_mft_recordp(p) (ntfs_is_file_recordp(p))
+#define ntfs_is_indx_record(x) (ntfs_is_magic(x, INDX))
+#define ntfs_is_indx_recordp(p) (ntfs_is_magicp(p, INDX))
+#define ntfs_is_hole_record(x) (ntfs_is_magic(x, HOLE))
+#define ntfs_is_hole_recordp(p) (ntfs_is_magicp(p, HOLE))
+
+#define ntfs_is_rstr_record(x) (ntfs_is_magic(x, RSTR))
+#define ntfs_is_rstr_recordp(p) (ntfs_is_magicp(p, RSTR))
+#define ntfs_is_rcrd_record(x) (ntfs_is_magic(x, RCRD))
+#define ntfs_is_rcrd_recordp(p) (ntfs_is_magicp(p, RCRD))
+
+#define ntfs_is_chkd_record(x) (ntfs_is_magic(x, CHKD))
+#define ntfs_is_chkd_recordp(p) (ntfs_is_magicp(p, CHKD))
+
+#define ntfs_is_baad_record(x) (ntfs_is_magic(x, BAAD))
+#define ntfs_is_baad_recordp(p) (ntfs_is_magicp(p, BAAD))
+
+#define ntfs_is_empty_record(x) (ntfs_is_magic(x, empty))
+#define ntfs_is_empty_recordp(p) (ntfs_is_magicp(p, empty))
+
+/*
+ * The Update Sequence Array (usa) is an array of __le16 values which belong
+ * to the end of each sector protected by the update sequence record in which
+ * this array is contained. Note that the first entry is the Update Sequence
+ * Number (usn), a cyclic counter of how many times the protected record has
+ * been written to disk. The values 0 and -1 (i.e. 0xffff) are not used. The
+ * last __le16 of each sector has to be equal to the usn (during reading) or
+ * is set to it (during writing). If it is not, an incomplete multi sector
+ * transfer has occurred when the data was written.
+ * The maximum size for the update sequence array is fixed to:
+ * maximum size = usa_ofs + (usa_count * 2) = 510 bytes
+ * The 510 bytes comes from the fact that the last __le16 in the array has to
+ * (obviously) finish before the last __le16 of the first 512-byte sector.
+ * This formula can be used as a consistency check in that usa_ofs +
+ * (usa_count * 2) has to be less than or equal to 510.
+ */
+struct ntfs_record {
+ __le32 magic; /*
+ * A four-byte magic identifying the record
+ * type and/or status.
+ */
+ __le16 usa_ofs; /*
+ * Offset to the Update Sequence Array (usa)
+ * from the start of the ntfs record.
+ */
+ __le16 usa_count; /*
+ * Number of __le16 sized entries in the usa
+ * including the Update Sequence Number (usn),
+ * thus the number of fixups is the usa_count
+ * minus 1.
+ */
+} __packed;
+
+/*
+ * System files mft record numbers. All these files are always marked as used
+ * in the bitmap attribute of the mft; presumably in order to avoid accidental
+ * allocation for random other mft records. Also, the sequence number for each
+ * of the system files is always equal to their mft record number and it is
+ * never modified.
+ */
+enum {
+ FILE_MFT = 0, /*
+ * Master file table (mft). Data attribute
+ * contains the entries and bitmap attribute
+ * records which ones are in use (bit==1).
+ */
+ FILE_MFTMirr = 1, /* Mft mirror: copy of first four mft records
+ * in data attribute. If cluster size > 4kiB,
+ * copy of first N mft records, with
+ * N = cluster_size / mft_record_size.
+ */
+ FILE_LogFile = 2, /* Journalling log in data attribute. */
+ FILE_Volume = 3, /*
+ * Volume name attribute and volume information
+ * attribute (flags and ntfs version). Windows
+ * refers to this file as volume DASD (Direct
+ * Access Storage Device).
+ */
+ FILE_AttrDef = 4, /*
+ * Array of attribute definitions in data
+ * attribute.
+ */
+ FILE_root = 5, /* Root directory. */
+ FILE_Bitmap = 6, /*
+ * Allocation bitmap of all clusters (lcns) in
+ * data attribute.
+ */
+ FILE_Boot = 7, /*
+ * Boot sector (always at cluster 0) in data
+ * attribute.
+ */
+ FILE_BadClus = 8, /*
+ * Contains all bad clusters in the non-resident
+ * data attribute.
+ */
+ FILE_Secure = 9, /*
+ * Shared security descriptors in data attribute
+ * and two indexes into the descriptors.
+ * Appeared in Windows 2000. Before that, this
+ * file was named $Quota but was unused.
+ */
+ FILE_UpCase = 10, /*
+ * Uppercase equivalents of all 65536 Unicode
+ * characters in data attribute.
+ */
+ FILE_Extend = 11, /*
+ * Directory containing other system files (eg.
+ * $ObjId, $Quota, $Reparse and $UsnJrnl). This
+ * is new to NTFS3.0.
+ */
+ FILE_reserved12 = 12, /* Reserved for future use (records 12-15). */
+ FILE_reserved13 = 13,
+ FILE_reserved14 = 14,
+ FILE_reserved15 = 15,
+ FILE_first_user = 16, /*
+ * First user file, used as test limit for
+ * whether to allow opening a file or not.
+ */
+};
+
+/*
+ * These are the so far known MFT_RECORD_* flags (16-bit) which contain
+ * information about the mft record in which they are present.
+ */
+enum {
+ MFT_RECORD_IN_USE = cpu_to_le16(0x0001),
+ MFT_RECORD_IS_DIRECTORY = cpu_to_le16(0x0002),
+ MFT_RECORD_IS_4 = cpu_to_le16(0x0004),
+ MFT_RECORD_IS_VIEW_INDEX = cpu_to_le16(0x0008),
+ MFT_REC_SPACE_FILLER = 0xffff, /* Just to make flags 16-bit. */
+} __packed;
+
+/*
+ * mft references (aka file references or file record segment references) are
+ * used whenever a structure needs to refer to a record in the mft.
+ *
+ * A reference consists of a 48-bit index into the mft and a 16-bit sequence
+ * number used to detect stale references.
+ *
+ * For error reporting purposes we treat the 48-bit index as a signed quantity.
+ *
+ * The sequence number is a circular counter (skipping 0) describing how many
+ * times the referenced mft record has been (re)used. This has to match the
+ * sequence number of the mft record being referenced, otherwise the reference
+ * is considered stale and removed.
+ *
+ * If the sequence number is zero it is assumed that no sequence number
+ * consistency checking should be performed.
+ */
+
+/*
+ * Define two unpacking macros to get to the reference (MREF) and
+ * sequence number (MSEQNO) respectively.
+ * The _LE versions are to be applied on little endian MFT_REFs.
+ * Note: The _LE versions will return a CPU endian formatted value!
+ */
+#define MFT_REF_MASK_CPU 0x0000ffffffffffffULL
+#define MFT_REF_MASK_LE cpu_to_le64(MFT_REF_MASK_CPU)
+
+#define MK_MREF(m, s) ((u64)(((u64)(s) << 48) | \
+ ((u64)(m) & MFT_REF_MASK_CPU)))
+#define MK_LE_MREF(m, s) cpu_to_le64(MK_MREF(m, s))
+
+#define MREF(x) ((unsigned long)((x) & MFT_REF_MASK_CPU))
+#define MSEQNO(x) ((u16)(((x) >> 48) & 0xffff))
+#define MREF_LE(x) ((unsigned long)(le64_to_cpu(x) & MFT_REF_MASK_CPU))
+#define MREF_INO(x) ((unsigned long)MREF_LE(x))
+#define MSEQNO_LE(x) ((u16)((le64_to_cpu(x) >> 48) & 0xffff))
+
+#define IS_ERR_MREF(x) (((x) & 0x0000800000000000ULL) ? true : false)
+#define ERR_MREF(x) ((u64)((s64)(x)))
+#define MREF_ERR(x) ((int)((s64)(x)))
+
+/*
+ * The mft record header present at the beginning of every record in the mft.
+ * This is followed by a sequence of variable length attribute records which
+ * is terminated by an attribute of type AT_END which is a truncated attribute
+ * in that it only consists of the attribute type code AT_END and none of the
+ * other members of the attribute structure are present.
+ */
+struct mft_record {
+ __le32 magic; /* Usually the magic is "FILE". */
+ __le16 usa_ofs; /* See ntfs_record struct definition above. */
+ __le16 usa_count; /* See ntfs_record struct definition above. */
+
+ __le64 lsn; /*
+ * LogFile sequence number for this record.
+ * Changed every time the record is modified.
+ */
+ __le16 sequence_number; /*
+ * Number of times this mft record has been
+ * reused. (See description for MFT_REF
+ * above.) NOTE: The increment (skipping zero)
+ * is done when the file is deleted. NOTE: If
+ * this is zero it is left zero.
+ */
+ __le16 link_count; /*
+ * Number of hard links, i.e. the number of
+ * directory entries referencing this record.
+ * NOTE: Only used in mft base records.
+ * NOTE: When deleting a directory entry we
+ * check the link_count and if it is 1 we
+ * delete the file. Otherwise we delete the
+ * struct file_name_attr being referenced by the
+ * directory entry from the mft record and
+ * decrement the link_count.
+ */
+ __le16 attrs_offset; /*
+ * Byte offset to the first attribute in this
+ * mft record from the start of the mft record.
+ * NOTE: Must be aligned to 8-byte boundary.
+ */
+ __le16 flags; /*
+ * Bit array of MFT_RECORD_FLAGS. When a file
+ * is deleted, the MFT_RECORD_IN_USE flag is
+ * set to zero.
+ */
+ __le32 bytes_in_use; /*
+ * Number of bytes used in this mft record.
+ * NOTE: Must be aligned to 8-byte boundary.
+ */
+ __le32 bytes_allocated; /*
+ * Number of bytes allocated for this mft
+ * record. This should be equal to the mft
+ * record size.
+ */
+ __le64 base_mft_record; /*
+ * This is zero for base mft records.
+ * When it is not zero it is a mft reference
+ * pointing to the base mft record to which
+ * this record belongs. The reference is used
+ * to locate the attribute list attribute in
+ * the base record, which describes this
+ * extension record and hence might need
+ * modification when the extension record
+ * itself is modified. The attribute list
+ * also locates the other potential extents
+ * belonging to this non-base mft record.
+ */
+ __le16 next_attr_instance; /*
+ * The instance number that will be assigned to
+ * the next attribute added to this mft record.
+ * NOTE: Incremented each time after it is used.
+ * NOTE: Every time the mft record is reused
+ * this number is set to zero. NOTE: The first
+ * instance number is always 0.
+ */
+/* The below fields are specific to NTFS 3.1+ (Windows XP and above): */
+ __le16 reserved; /* Reserved/alignment. */
+ __le32 mft_record_number; /* Number of this mft record. */
+/* sizeof() = 48 bytes */
+/*
+ * When (re)using the mft record, we place the update sequence array at this
+ * offset, i.e. before we start with the attributes. This also makes sense,
+ * otherwise we could run into problems with the update sequence array
+ * containing in itself the last two bytes of a sector which would mean that
+ * multi sector transfer protection wouldn't work. As you can't protect data
+ * by overwriting it since you then can't get it back...
+ * When reading we obviously use the data from the ntfs record header.
+ */
+} __packed;
+
+/* This is the version without the NTFS 3.1+ specific fields. */
+struct mft_record_old {
+ __le32 magic; /* Usually the magic is "FILE". */
+ __le16 usa_ofs; /* See ntfs_record struct definition above. */
+ __le16 usa_count; /* See ntfs_record struct definition above. */
+
+ __le64 lsn; /*
+ * LogFile sequence number for this record.
+ * Changed every time the record is modified.
+ */
+ __le16 sequence_number; /*
+ * Number of times this mft record has been
+ * reused. (See description for MFT_REF
+ * above.) NOTE: The increment (skipping zero)
+ * is done when the file is deleted. NOTE: If
+ * this is zero it is left zero.
+ */
+ __le16 link_count; /*
+ * Number of hard links, i.e. the number of
+ * directory entries referencing this record.
+ * NOTE: Only used in mft base records.
+ * NOTE: When deleting a directory entry we
+ * check the link_count and if it is 1 we
+ * delete the file. Otherwise we delete the
+ * struct file_name_attr being referenced by the
+ * directory entry from the mft record and
+ * decrement the link_count.
+ */
+ __le16 attrs_offset; /*
+ * Byte offset to the first attribute in this
+ * mft record from the start of the mft record.
+ * NOTE: Must be aligned to 8-byte boundary.
+ */
+ __le16 flags; /*
+ * Bit array of MFT_RECORD_FLAGS. When a file
+ * is deleted, the MFT_RECORD_IN_USE flag is
+ * set to zero.
+ */
+ __le32 bytes_in_use; /*
+ * Number of bytes used in this mft record.
+ * NOTE: Must be aligned to 8-byte boundary.
+ */
+ __le32 bytes_allocated; /*
+ * Number of bytes allocated for this mft
+ * record. This should be equal to the mft
+ * record size.
+ */
+ __le64 base_mft_record; /*
+ * This is zero for base mft records.
+ * When it is not zero it is a mft reference
+ * pointing to the base mft record to which
+ * this record belongs. The reference is used
+ * to locate the attribute list attribute in
+ * the base record, which describes this
+ * extension record and hence might need
+ * modification when the extension record
+ * itself is modified. The attribute list
+ * also locates the other potential extents
+ * belonging to this non-base mft record.
+ */
+ __le16 next_attr_instance; /*
+ * The instance number that will be assigned to
+ * the next attribute added to this mft record.
+ * NOTE: Incremented each time after it is used.
+ * NOTE: Every time the mft record is reused
+ * this number is set to zero. NOTE: The first
+ * instance number is always 0.
+ */
+/* sizeof() = 42 bytes */
+/*
+ * When (re)using the mft record, we place the update sequence array at this
+ * offset, i.e. before we start with the attributes. This also makes sense,
+ * otherwise we could run into problems with the update sequence array
+ * containing in itself the last two bytes of a sector which would mean that
+ * multi sector transfer protection wouldn't work. As you can't protect data
+ * by overwriting it since you then can't get it back...
+ * When reading we obviously use the data from the ntfs record header.
+ */
+} __packed;
+
+/*
+ * System defined attributes (32-bit). Each attribute type has a corresponding
+ * attribute name (Unicode string of maximum 64 character length) as described
+ * by the attribute definitions present in the data attribute of the $AttrDef
+ * system file. On NTFS 3.0 volumes the names are just as the types are named
+ * in the below defines exchanging AT_ for the dollar sign ($). If that is not
+ * a revealing choice of symbol I do not know what is... (-;
+ */
+enum {
+ AT_UNUSED = cpu_to_le32(0),
+ AT_STANDARD_INFORMATION = cpu_to_le32(0x10),
+ AT_ATTRIBUTE_LIST = cpu_to_le32(0x20),
+ AT_FILE_NAME = cpu_to_le32(0x30),
+ AT_OBJECT_ID = cpu_to_le32(0x40),
+ AT_SECURITY_DESCRIPTOR = cpu_to_le32(0x50),
+ AT_VOLUME_NAME = cpu_to_le32(0x60),
+ AT_VOLUME_INFORMATION = cpu_to_le32(0x70),
+ AT_DATA = cpu_to_le32(0x80),
+ AT_INDEX_ROOT = cpu_to_le32(0x90),
+ AT_INDEX_ALLOCATION = cpu_to_le32(0xa0),
+ AT_BITMAP = cpu_to_le32(0xb0),
+ AT_REPARSE_POINT = cpu_to_le32(0xc0),
+ AT_EA_INFORMATION = cpu_to_le32(0xd0),
+ AT_EA = cpu_to_le32(0xe0),
+ AT_PROPERTY_SET = cpu_to_le32(0xf0),
+ AT_LOGGED_UTILITY_STREAM = cpu_to_le32(0x100),
+ AT_FIRST_USER_DEFINED_ATTRIBUTE = cpu_to_le32(0x1000),
+ AT_END = cpu_to_le32(0xffffffff)
+};
+
+/*
+ * The collation rules for sorting views/indexes/etc (32-bit).
+ *
+ * COLLATION_BINARY - Collate by binary compare where the first byte is most
+ * significant.
+ * COLLATION_UNICODE_STRING - Collate Unicode strings by comparing their binary
+ * Unicode values, except that when a character can be uppercased, the
+ * upper case value collates before the lower case one.
+ * COLLATION_FILE_NAME - Collate file names as Unicode strings. The collation
+ *	is done very much like COLLATION_UNICODE_STRING; the exact difference
+ *	is unknown. Perhaps file names treat some special characters in an odd
+ *	way (see unistr.c::ntfs_collate_names() and
+ *	unistr.c::legal_ansi_char_array[] for what is meant), while
+ *	COLLATION_UNICODE_STRING would not give any special treatment to any
+ *	characters at all, but this is speculation.
+ * COLLATION_NTOFS_ULONG - Sorting is done according to ascending __le32 key
+ * values. E.g. used for $SII index in FILE_Secure, which sorts by
+ * security_id (le32).
+ * COLLATION_NTOFS_SID - Sorting is done according to ascending SID values.
+ * E.g. used for $O index in FILE_Extend/$Quota.
+ * COLLATION_NTOFS_SECURITY_HASH - Sorting is done first by ascending hash
+ * values and second by ascending security_id values. E.g. used for $SDH
+ * index in FILE_Secure.
+ * COLLATION_NTOFS_ULONGS - Sorting is done according to a sequence of ascending
+ * __le32 key values. E.g. used for $O index in FILE_Extend/$ObjId, which
+ * sorts by object_id (16-byte), by splitting up the object_id in four
+ * __le32 values and using them as individual keys. E.g. take the following
+ *	two object_ids, stored as follows on disk:
+ * 1st: a1 61 65 b7 65 7b d4 11 9e 3d 00 e0 81 10 42 59
+ * 2nd: 38 14 37 d2 d2 f3 d4 11 a5 21 c8 6b 79 b1 97 45
+ * To compare them, they are split into four __le32 values each, like so:
+ * 1st: 0xb76561a1 0x11d47b65 0xe0003d9e 0x59421081
+ * 2nd: 0xd2371438 0x11d4f3d2 0x6bc821a5 0x4597b179
+ * Now, it is apparent why the 2nd object_id collates after the 1st: the
+ * first __le32 value of the 1st object_id is less than the first __le32 of
+ * the 2nd object_id. If the first __le32 values of both object_ids were
+ * equal then the second __le32 values would be compared, etc.
+ */
+enum {
+ COLLATION_BINARY = cpu_to_le32(0x00),
+ COLLATION_FILE_NAME = cpu_to_le32(0x01),
+ COLLATION_UNICODE_STRING = cpu_to_le32(0x02),
+ COLLATION_NTOFS_ULONG = cpu_to_le32(0x10),
+ COLLATION_NTOFS_SID = cpu_to_le32(0x11),
+ COLLATION_NTOFS_SECURITY_HASH = cpu_to_le32(0x12),
+ COLLATION_NTOFS_ULONGS = cpu_to_le32(0x13),
+};
+
+/*
+ * The flags (32-bit) describing attribute properties in the attribute
+ * definition structure.
+ * The INDEXABLE flag is fairly certainly correct as only the file
+ * name attribute has this flag set and this is the only attribute indexed in
+ * NT4.
+ */
+enum {
+ ATTR_DEF_INDEXABLE = cpu_to_le32(0x02), /* Attribute can be indexed. */
+ ATTR_DEF_MULTIPLE = cpu_to_le32(0x04), /*
+ * Attribute type can be present
+ * multiple times in the mft records
+ * of an inode.
+ */
+ ATTR_DEF_NOT_ZERO = cpu_to_le32(0x08), /*
+ * Attribute value must contain
+ * at least one non-zero byte.
+ */
+ ATTR_DEF_INDEXED_UNIQUE = cpu_to_le32(0x10), /*
+ * Attribute must be indexed and
+ * the attribute value must be unique
+ * for the attribute type in all of
+ * the mft records of an inode.
+ */
+ ATTR_DEF_NAMED_UNIQUE = cpu_to_le32(0x20), /*
+ * Attribute must be named and
+ * the name must be unique for
+ * the attribute type in all of the mft
+ * records of an inode.
+ */
+ ATTR_DEF_RESIDENT = cpu_to_le32(0x40), /* Attribute must be resident. */
+ ATTR_DEF_ALWAYS_LOG = cpu_to_le32(0x80), /*
+ * Always log modifications to this attribute,
+ * regardless of whether it is resident or
+ * non-resident. Without this, only log
+ * modifications if the attribute is resident.
+ */
+};
+
+/*
+ * The data attribute of FILE_AttrDef contains a sequence of attribute
+ * definitions for the NTFS volume. With this, it is supposed to be safe for an
+ * older NTFS driver to mount a volume containing a newer NTFS version without
+ * damaging it (that's the theory. In practice it's: not damaging it too much).
+ * Entries are sorted by attribute type. The flags describe whether the
+ * attribute can be resident/non-resident and possibly other things, but the
+ * actual bits are unknown.
+ */
+struct attr_def {
+ __le16 name[0x40]; /* Unicode name of the attribute. Zero terminated. */
+ __le32 type; /* Type of the attribute. */
+ __le32 display_rule; /* Default display rule. */
+ __le32 collation_rule; /* Default collation rule. */
+ __le32 flags; /* Flags describing the attribute. */
+ __le64 min_size; /* Optional minimum attribute size. */
+ __le64 max_size; /* Maximum size of attribute. */
+/* sizeof() = 0xa0 or 160 bytes */
+} __packed;
+
+/*
+ * Attribute flags (16-bit).
+ */
+enum {
+ ATTR_IS_COMPRESSED = cpu_to_le16(0x0001),
+ ATTR_COMPRESSION_MASK = cpu_to_le16(0x00ff), /*
+ * Compression method mask.
+ * Also, first illegal value.
+ */
+ ATTR_IS_ENCRYPTED = cpu_to_le16(0x4000),
+ ATTR_IS_SPARSE = cpu_to_le16(0x8000),
+} __packed;
+
+/*
+ * Attribute compression.
+ *
+ * Only the data attribute is ever compressed in the current ntfs driver in
+ * Windows. Further, compression is only applied when the data attribute is
+ * non-resident. Finally, compression is only supported when the volume
+ * cluster size is at most 4KiB.
+ *
+ * The compression method is based on independently compressing blocks of X
+ * clusters, where X is determined from the compression_unit value found in the
+ * non-resident attribute record header (more precisely: X = 2^compression_unit
+ * clusters). On Windows NT/2k, X is always 16 clusters (compression_unit = 4).
+ *
+ * There are three different cases of how a compression block of X clusters
+ * can be stored:
+ *
+ * 1) The data in the block is all zero (a sparse block):
+ * This is stored as a sparse block in the runlist, i.e. the runlist
+ * entry has length = X and lcn = -1. The mapping pairs array actually
+ * uses a delta_lcn value length of 0, i.e. delta_lcn is not present at
+ * all, which is then interpreted by the driver as lcn = -1.
+ * NOTE: Even uncompressed files can be sparse on NTFS 3.0 volumes, then
+ * the same principles apply as above, except that the length is not
+ * restricted to being any particular value.
+ *
+ * 2) The data in the block is not compressed:
+ * This happens when compression doesn't reduce the size of the block
+ * in clusters. I.e. if compression has a small effect so that the
+ * compressed data still occupies X clusters, then the uncompressed data
+ * is stored in the block.
+ * This case is recognised by the fact that the runlist entry has
+ * length = X and lcn >= 0. The mapping pairs array stores this as
+ * normal with a run length of X and some specific delta_lcn, i.e.
+ * delta_lcn has to be present.
+ *
+ * 3) The data in the block is compressed:
+ * The common case. This case is recognised by the fact that the run
+ * list entry has length L < X and lcn >= 0. The mapping pairs array
+ * stores this as normal with a run length of X and some specific
+ * delta_lcn, i.e. delta_lcn has to be present. This runlist entry is
+ * immediately followed by a sparse entry with length = X - L and
+ * lcn = -1. The latter entry is to make up the vcn counting to the
+ * full compression block size X.
+ *
+ * In fact, life is more complicated because adjacent entries of the same type
+ * can be coalesced. This means that one has to keep track of the number of
+ * clusters handled and work on a basis of X clusters at a time being one
+ * block. An example: if length L > X this means that this particular runlist
+ * entry contains a block of length X and part of one or more blocks of length
+ * L - X. Another example: if length L < X, this does not necessarily mean that
+ * the block is compressed as it might be that the lcn changes inside the block
+ * and hence the following runlist entry describes the continuation of the
+ * potentially compressed block. The block would be compressed if the
+ * following runlist entry describes at least X - L sparse clusters, thus
+ * making up the compression block length as described in point 3 above. (Of
+ * course, there can be several runlist entries with small lengths so that the
+ * sparse entry does not follow the first data containing entry with
+ * length < X.)
+ *
+ * NOTE: At the end of the compressed attribute value, there most likely is not
+ * just the right amount of data to make up a compression block, thus this data
+ * is not even attempted to be compressed. It is just stored as is, unless
+ * the number of clusters it occupies is reduced when compressed in which case
+ * it is stored as a compressed compression block, complete with sparse
+ * clusters at the end.
+ */
+
+/*
+ * Flags of resident attributes (8-bit).
+ */
+enum {
+ RESIDENT_ATTR_IS_INDEXED = 0x01, /*
+ * Attribute is referenced in an index
+ * (has implications for deleting and
+ * modifying the attribute).
+ */
+} __packed;
+
+/*
+ * Attribute record header. Always aligned to 8-byte boundary.
+ */
+struct attr_record {
+ __le32 type; /* The (32-bit) type of the attribute. */
+ __le32 length; /*
+ * Byte size of the resident part of the
+ * attribute (aligned to 8-byte boundary).
+ * Used to get to the next attribute.
+ */
+ u8 non_resident; /*
+ * If 0, attribute is resident.
+ * If 1, attribute is non-resident.
+ */
+ u8 name_length; /* Unicode character size of name of attribute. 0 if unnamed. */
+ __le16 name_offset; /*
+ * If name_length != 0, the byte offset to the
+ * beginning of the name from the attribute
+ * record. Note that the name is stored as a
+ * Unicode string. When creating, place offset
+ * just at the end of the record header. Then,
+ * follow with attribute value or mapping pairs
+ * array, resident and non-resident attributes
+ * respectively, aligning to an 8-byte
+ * boundary.
+ */
+ __le16 flags; /* Flags describing the attribute. */
+ __le16 instance; /*
+ * The instance of this attribute record. This
+ * number is unique within this mft record (see
+ * MFT_RECORD/next_attribute_instance notes in
+ * mft.h for more details).
+ */
+ union {
+ /* Resident attributes. */
+ struct {
+ __le32 value_length; /* Byte size of attribute value. */
+ __le16 value_offset; /*
+ * Byte offset of the attribute
+ * value from the start of the
+ * attribute record. When creating,
+ * align to 8-byte boundary if we
+ * have a name present as this might
+ * not have a length of a multiple
+ * of 8-bytes.
+ */
+ u8 flags; /* See above. */
+ s8 reserved; /* Reserved/alignment to 8-byte boundary. */
+ } __packed resident;
+ /* Non-resident attributes. */
+ struct {
+ __le64 lowest_vcn; /*
+ * Lowest valid virtual cluster number
+ * for this portion of the attribute value or
+ * 0 if this is the only extent (usually the
+ * case). - Only when an attribute list is used
+ * does lowest_vcn != 0 ever occur.
+ */
+ __le64 highest_vcn; /*
+ * Highest valid vcn of this extent of
+ * the attribute value. - Usually there is only one
+ * portion, so this usually equals the attribute
+ * value size in clusters minus 1. Can be -1 for
+ * zero length files. Can be 0 for "single extent"
+ * attributes.
+ */
+ __le16 mapping_pairs_offset; /*
+ * Byte offset from the beginning of
+ * the structure to the mapping pairs
+ * array which contains the mappings
+ * between the vcns and the logical cluster
+ * numbers (lcns).
+ * When creating, place this at the end of
+ * this record header aligned to 8-byte
+ * boundary.
+ */
+ u8 compression_unit; /*
+ * The compression unit expressed as the log
+ * to the base 2 of the number of
+ * clusters in a compression unit. 0 means not
+ * compressed. (This effectively limits the
+ * compression unit size to be a power of two
+ * clusters.) WinNT4 only uses a value of 4.
+ * Sparse files have this set to 0 on XPSP2.
+ */
+ u8 reserved[5]; /* Align to 8-byte boundary. */
+/*
+ * The sizes below are only used when lowest_vcn is zero, as otherwise it would
+ * be difficult to keep them up-to-date.
+ */
+ __le64 allocated_size; /*
+ * Byte size of disk space allocated
+ * to hold the attribute value. Always
+ * is a multiple of the cluster size.
+ * When a file is compressed, this field
+ * is a multiple of the compression block
+ * size (2^compression_unit) and it represents
+ * the logically allocated space rather than
+ * the actual on disk usage. For this use
+ * the compressed_size (see below).
+ */
+ __le64 data_size; /*
+ * Byte size of the attribute value. Can be
+ * larger than allocated_size if attribute value
+ * is compressed or sparse.
+ */
+ __le64 initialized_size; /*
+ * Byte size of initialized portion of
+ * the attribute value. Usually equals data_size.
+ */
+/* sizeof(uncompressed attr) = 64 */
+ __le64 compressed_size; /*
+ * Byte size of the attribute value after
+ * compression. Only present when compressed
+ * or sparse. Always is a multiple of the cluster
+ * size. Represents the actual amount of disk
+ * space being used on the disk.
+ */
+/* sizeof(compressed attr) = 72 */
+ } __packed non_resident;
+ } __packed data;
+} __packed;
+
+/*
+ * File attribute flags (32-bit) appearing in the file_attributes fields of the
+ * STANDARD_INFORMATION attribute of MFT_RECORDs and the FILENAME_ATTR
+ * attributes of MFT_RECORDs and directory index entries.
+ *
+ * All of the below flags appear in the directory index entries but only some
+ * appear in the STANDARD_INFORMATION attribute whilst only some others appear
+ * in the FILENAME_ATTR attribute of MFT_RECORDs. Unless otherwise stated the
+ * flags appear in all of the above.
+ */
+enum {
+ FILE_ATTR_READONLY = cpu_to_le32(0x00000001),
+ FILE_ATTR_HIDDEN = cpu_to_le32(0x00000002),
+ FILE_ATTR_SYSTEM = cpu_to_le32(0x00000004),
+ /* Old DOS volid. Unused in NT. = cpu_to_le32(0x00000008), */
+
+ FILE_ATTR_DIRECTORY = cpu_to_le32(0x00000010),
+ /*
+ * Note, FILE_ATTR_DIRECTORY is not considered valid in NT. It is
+ * reserved for the DOS SUBDIRECTORY flag.
+ */
+ FILE_ATTR_ARCHIVE = cpu_to_le32(0x00000020),
+ FILE_ATTR_DEVICE = cpu_to_le32(0x00000040),
+ FILE_ATTR_NORMAL = cpu_to_le32(0x00000080),
+
+ FILE_ATTR_TEMPORARY = cpu_to_le32(0x00000100),
+ FILE_ATTR_SPARSE_FILE = cpu_to_le32(0x00000200),
+ FILE_ATTR_REPARSE_POINT = cpu_to_le32(0x00000400),
+ FILE_ATTR_COMPRESSED = cpu_to_le32(0x00000800),
+
+ FILE_ATTR_OFFLINE = cpu_to_le32(0x00001000),
+ FILE_ATTR_NOT_CONTENT_INDEXED = cpu_to_le32(0x00002000),
+ FILE_ATTR_ENCRYPTED = cpu_to_le32(0x00004000),
+
+ FILE_ATTR_VALID_FLAGS = cpu_to_le32(0x00007fb7),
+ /*
+ * Note, FILE_ATTR_VALID_FLAGS masks out the old DOS VolId and the
+ * FILE_ATTR_DEVICE and preserves everything else. This mask is used
+ * to obtain all flags that are valid for reading.
+ */
+ FILE_ATTR_VALID_SET_FLAGS = cpu_to_le32(0x000031a7),
+ /*
+ * Note, FILE_ATTR_VALID_SET_FLAGS masks out the old DOS VolId, the
+ * F_A_DEVICE, F_A_DIRECTORY, F_A_SPARSE_FILE, F_A_REPARSE_POINT,
+ * F_A_COMPRESSED, and F_A_ENCRYPTED and preserves the rest. This mask
+ * is used to obtain all flags that are valid for setting.
+ */
+ /* Supposed to mean no data locally, possibly repurposed */
+ FILE_ATTRIBUTE_RECALL_ON_OPEN = cpu_to_le32(0x00040000),
+ /*
+ * The flag FILE_ATTR_DUP_FILENAME_INDEX_PRESENT is present in all
+ * FILENAME_ATTR attributes but not in the STANDARD_INFORMATION
+ * attribute of an mft record.
+ */
+ FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT = cpu_to_le32(0x10000000),
+ /*
+ * Note, this is a copy of the corresponding bit from the mft record,
+ * telling us whether this is a directory or not, i.e. whether it has
+ * an index root attribute or not.
+ */
+ FILE_ATTR_DUP_VIEW_INDEX_PRESENT = cpu_to_le32(0x20000000),
+ /*
+ * Note, this is a copy of the corresponding bit from the mft record,
+ * telling us whether this file has a view index present (eg. object id
+ * index, quota index, one of the security indexes or the encrypting
+ * filesystem related indexes).
+ */
+};
+
+/*
+ * NOTE on times in NTFS: All times are in MS standard time format, i.e. they
+ * are the number of 100-nanosecond intervals since 1st January 1601, 00:00:00
+ * universal coordinated time (UTC). (In Linux time starts 1st January 1970,
+ * 00:00:00 UTC and is stored as the number of 1-second intervals since then.)
+ */
+
+/*
+ * Attribute: Standard information (0x10).
+ *
+ * NOTE: Always resident.
+ * NOTE: Present in all base file records on a volume.
+ * NOTE: There is conflicting information about the meaning of each of the time
+ * fields but the meaning as defined below has been verified to be
+ * correct by practical experimentation on Windows NT4 SP6a and is hence
+ * assumed to be the one and only correct interpretation.
+ */
+struct standard_information {
+ __le64 creation_time; /*
+ * Time file was created. Updated when
+ * a filename is changed(?).
+ */
+ __le64 last_data_change_time; /* Time the data attribute was last modified. */
+ __le64 last_mft_change_time; /* Time this mft record was last modified. */
+ __le64 last_access_time; /*
+ * Approximate time when the file was
+ * last accessed (obviously this is not
+ * updated on read-only volumes). In
+ * Windows this is only updated when
+ * accessed if some time delta has
+ * passed since the last update. Also,
+ * last access time updates can be
+ * disabled altogether for speed.
+ */
+ __le32 file_attributes; /* Flags describing the file. */
+ union {
+ /* NTFS 1.2 */
+ struct {
+ u8 reserved12[12]; /* Reserved/alignment to 8-byte boundary. */
+ } __packed v1;
+ /* sizeof() = 48 bytes */
+ /* NTFS 3.x */
+ struct {
+/*
+ * If a volume has been upgraded from a previous NTFS version, then these
+ * fields are present only if the file has been accessed since the upgrade.
+ * Recognize the difference by comparing the length of the resident attribute
+ * value. If it is 48, then the following fields are missing. If it is 72 then
+ * the fields are present. Maybe just check like this:
+ * if (resident.ValueLength < sizeof(struct standard_information)) {
+ * Assume NTFS 1.2- format.
+ * If (volume version is 3.x)
+ * Upgrade attribute to NTFS 3.x format.
+ * else
+ * Use NTFS 1.2- format for access.
+ * } else
+ * Use NTFS 3.x format for access.
+ * Only problem is that it might be legal to set the length of the value to
+ * arbitrarily large values thus spoiling this check. - But chkdsk probably
+ * views that as a corruption, assuming that it behaves like this for all
+ * attributes.
+ */
+ __le32 maximum_versions; /*
+ * Maximum allowed versions for
+ * file. Zero if version numbering
+ * is disabled.
+ */
+ __le32 version_number; /*
+ * This file's version (if any).
+ * Set to zero if maximum_versions
+ * is zero.
+ */
+ __le32 class_id; /*
+ * Class id from bidirectional
+ * class id index (?).
+ */
+ __le32 owner_id; /*
+ * Owner_id of the user owning
+ * the file. Translate via $Q index
+ * in FILE_Extend /$Quota to the quota
+ * control entry for the user owning
+ * the file. Zero if quotas are disabled.
+ */
+ __le32 security_id; /*
+ * Security_id for the file. Translate via
+ * $SII index and $SDS data stream in
+ * FILE_Secure to the security descriptor.
+ */
+ __le64 quota_charged; /*
+ * Byte size of the charge to the quota for
+ * all streams of the file. Note: Is zero
+ * if quotas are disabled.
+ */
+ __le64 usn; /*
+ * Last update sequence number of the file.
+ * This is a direct index into the transaction
+ * log file ($UsnJrnl). It is zero if the usn
+ * journal is disabled or this file has not been
+ * subject to logging yet. See usnjrnl.h
+ * for details.
+ */
+ } __packed v3;
+ /* sizeof() = 72 bytes (NTFS 3.x) */
+ } __packed ver;
+} __packed;
+
+/*
+ * Attribute: Attribute list (0x20).
+ *
+ * - Can be either resident or non-resident.
+ * - Value consists of a sequence of variable length, 8-byte aligned,
+ * ATTR_LIST_ENTRY records.
+ * - The list is not terminated by anything at all! The only way to know when
+ * the end is reached is to keep track of the current offset and compare it to
+ * the attribute value size.
+ * - The attribute list attribute contains one entry for each attribute of
+ * the file in which the list is located, except for the list attribute
+ * itself. The list is sorted: first by attribute type, second by attribute
+ * name (if present), third by instance number. The extents of one
+ * non-resident attribute (if present) immediately follow after the initial
+ * extent. They are ordered by lowest_vcn and have their instance set to zero.
+ * It is not allowed to have two attributes with all sorting keys equal.
+ * - Further restrictions:
+ * - If not resident, the vcn to lcn mapping array has to fit inside the
+ * base mft record.
+ * - The attribute list attribute value has a maximum size of 256kb. This
+ * is imposed by the Windows cache manager.
+ * - Attribute lists are only used when the attributes of an mft record do not
+ * fit inside the mft record despite all attributes (that can be made
+ * non-resident) having been made non-resident. This can happen e.g. when:
+ * - File has a large number of hard links (lots of file name
+ * attributes present).
+ * - The mapping pairs array of some non-resident attribute becomes so
+ * large due to fragmentation that it overflows the mft record.
+ * - The security descriptor is very complex (not applicable to
+ * NTFS 3.0 volumes).
+ * - There are many named streams.
+ */
+struct attr_list_entry {
+ __le32 type; /* Type of referenced attribute. */
+ __le16 length; /* Byte size of this entry (8-byte aligned). */
+ u8 name_length; /*
+ * Size in Unicode chars of the name of the
+ * attribute or 0 if unnamed.
+ */
+ u8 name_offset; /*
+ * Byte offset to beginning of attribute name
+ * (always set this to where the name would
+ * start even if unnamed).
+ */
+ __le64 lowest_vcn; /*
+ * Lowest virtual cluster number of this portion
+ * of the attribute value. This is usually 0. It
+ * is non-zero for the case where one attribute
+ * does not fit into one mft record and thus
+ * several mft records are allocated to hold
+ * this attribute. In the latter case, each mft
+ * record holds one extent of the attribute and
+ * there is one attribute list entry for each
+ * extent. NOTE: This is DEFINITELY a signed
+ * value! The windows driver uses cmp, followed
+ * by jg when comparing this, thus it treats it
+ * as signed.
+ */
+ __le64 mft_reference; /*
+ * The reference of the mft record holding
+ * the attr record for this portion of the
+ * attribute value.
+ */
+ __le16 instance; /*
+ * If lowest_vcn = 0, the instance of the
+ * attribute being referenced; otherwise 0.
+ */
+ __le16 name[]; /*
+ * Use when creating only. When reading use
+ * name_offset to determine the location of the name.
+ */
+/* sizeof() = 26 + (attribute_name_length * 2) bytes */
+} __packed;
+
+/*
+ * The maximum allowed length for a file name.
+ */
+#define MAXIMUM_FILE_NAME_LENGTH 255
+
+/*
+ * Possible namespaces for filenames in ntfs (8-bit).
+ */
+enum {
+ FILE_NAME_POSIX = 0x00,
+ /*
+ * This is the largest namespace. It is case sensitive and allows all
+ * Unicode characters except for: '\0' and '/'. Beware that in
+ * WinNT/2k/2003 by default files which eg have the same name except
+ * for their case will not be distinguished by the standard utilities
+ * and thus a "del filename" will delete both "filename" and "fileName"
+ * without warning. However if for example Services For Unix (SFU) are
+ * installed and the case sensitive option was enabled at installation
+ * time, then you can create/access/delete such files.
+ * Note that even SFU places restrictions on the filenames beyond the
+	 * '\0' and '/'; in particular the following set of characters is
+	 * not allowed: '"', '/', '<', '>', '\'. All other characters,
+	 * including the ones not allowed in the WIN32 namespace, are allowed.
+ * Tested with SFU 3.5 (this is now free) running on Windows XP.
+ */
+ FILE_NAME_WIN32 = 0x01,
+ /*
+ * The standard WinNT/2k NTFS long filenames. Case insensitive. All
+ * Unicode chars except: '\0', '"', '*', '/', ':', '<', '>', '?', '\',
+ * and '|'. Further, names cannot end with a '.' or a space.
+ */
+ FILE_NAME_DOS = 0x02,
+ /*
+ * The standard DOS filenames (8.3 format). Uppercase only. All 8-bit
+	 * characters greater than space, except: '"', '*', '+', ',', '/', ':',
+	 * ';', '<', '=', '>', '?', and '\'.
+ */
+ FILE_NAME_WIN32_AND_DOS = 0x03,
+ /*
+ * 3 means that both the Win32 and the DOS filenames are identical and
+ * hence have been saved in this single filename record.
+ */
+} __packed;
+
+/*
+ * Attribute: Filename (0x30).
+ *
+ * NOTE: Always resident.
+ * NOTE: All fields, except the parent_directory, are only updated when the
+ * filename is changed. Until then, they just become out of sync with
+ * reality and the more up to date values are present in the standard
+ * information attribute.
+ * NOTE: There is conflicting information about the meaning of each of the time
+ * fields but the meaning as defined below has been verified to be
+ * correct by practical experimentation on Windows NT4 SP6a and is hence
+ * assumed to be the one and only correct interpretation.
+ */
+struct file_name_attr {
+/*hex ofs*/
+ __le64 parent_directory; /* Directory this filename is referenced from. */
+ __le64 creation_time; /* Time file was created. */
+ __le64 last_data_change_time; /* Time the data attribute was last modified. */
+ __le64 last_mft_change_time; /* Time this mft record was last modified. */
+ __le64 last_access_time; /* Time this mft record was last accessed. */
+ __le64 allocated_size; /*
+ * Byte size of on-disk allocated space
+ * for the unnamed data attribute. So for normal
+ * $DATA, this is the allocated_size from
+ * the unnamed $DATA attribute and for compressed
+ * and/or sparse $DATA, this is the
+ * compressed_size from the unnamed
+ * $DATA attribute. For a directory or
+ * other inode without an unnamed $DATA attribute,
+ * this is always 0. NOTE: This is a multiple of
+ * the cluster size.
+ */
+ __le64 data_size; /*
+ * Byte size of actual data in unnamed
+ * data attribute. For a directory or
+ * other inode without an unnamed $DATA
+ * attribute, this is always 0.
+ */
+ __le32 file_attributes; /* Flags describing the file. */
+ union {
+ struct {
+ __le16 packed_ea_size; /*
+ * Size of the buffer needed to
+ * pack the extended attributes
+ * (EAs), if such are present.
+ */
+ __le16 reserved; /* Reserved for alignment. */
+ } __packed ea;
+ struct {
+ __le32 reparse_point_tag; /*
+ * Type of reparse point,
+ * present only in reparse
+ * points and only if there are
+ * no EAs.
+ */
+ } __packed rp;
+ } __packed type;
+ u8 file_name_length; /* Length of file name in (Unicode) characters. */
+ u8 file_name_type; /* Namespace of the file name.*/
+ __le16 file_name[]; /* File name in Unicode. */
+} __packed;
+
+/*
+ * GUID structures store globally unique identifiers (GUID). A GUID is a
+ * 128-bit value consisting of one group of eight hexadecimal digits, followed
+ * by three groups of four hexadecimal digits each, followed by one group of
+ * twelve hexadecimal digits. GUIDs are Microsoft's implementation of the
+ * distributed computing environment (DCE) universally unique identifier (UUID).
+ * Example of a GUID:
+ *	1F010768-5A73-BC91-0010-A52216A70000
+ */
+struct guid {
+ __le32 data1; /* The first eight hexadecimal digits of the GUID. */
+ __le16 data2; /* The first group of four hexadecimal digits. */
+ __le16 data3; /* The second group of four hexadecimal digits. */
+ u8 data4[8]; /*
+ * The first two bytes are the third group of four
+ * hexadecimal digits. The remaining six bytes are the
+ * final 12 hexadecimal digits.
+ */
+} __packed;
+
+/*
+ * These relative identifiers (RIDs) are used with the above identifier
+ * authorities to make up universal well-known SIDs.
+ *
+ * Note: The relative identifier (RID) refers to the portion of a SID, which
+ * identifies a user or group in relation to the authority that issued the SID.
+ * For example, the universal well-known SID Creator Owner ID (S-1-3-0) is
+ * made up of the identifier authority SECURITY_CREATOR_SID_AUTHORITY (3) and
+ * the relative identifier SECURITY_CREATOR_OWNER_RID (0).
+ */
+enum { /* Identifier authority. */
+ SECURITY_NULL_RID = 0, /* S-1-0 */
+ SECURITY_WORLD_RID = 0, /* S-1-1 */
+ SECURITY_LOCAL_RID = 0, /* S-1-2 */
+
+ SECURITY_CREATOR_OWNER_RID = 0, /* S-1-3 */
+ SECURITY_CREATOR_GROUP_RID = 1, /* S-1-3 */
+
+ SECURITY_CREATOR_OWNER_SERVER_RID = 2, /* S-1-3 */
+ SECURITY_CREATOR_GROUP_SERVER_RID = 3, /* S-1-3 */
+
+ SECURITY_DIALUP_RID = 1,
+ SECURITY_NETWORK_RID = 2,
+ SECURITY_BATCH_RID = 3,
+ SECURITY_INTERACTIVE_RID = 4,
+ SECURITY_SERVICE_RID = 6,
+ SECURITY_ANONYMOUS_LOGON_RID = 7,
+ SECURITY_PROXY_RID = 8,
+ SECURITY_ENTERPRISE_CONTROLLERS_RID = 9,
+ SECURITY_SERVER_LOGON_RID = 9,
+ SECURITY_PRINCIPAL_SELF_RID = 0xa,
+ SECURITY_AUTHENTICATED_USER_RID = 0xb,
+ SECURITY_RESTRICTED_CODE_RID = 0xc,
+ SECURITY_TERMINAL_SERVER_RID = 0xd,
+
+ SECURITY_LOGON_IDS_RID = 5,
+ SECURITY_LOGON_IDS_RID_COUNT = 3,
+
+ SECURITY_LOCAL_SYSTEM_RID = 0x12,
+
+ SECURITY_NT_NON_UNIQUE = 0x15,
+
+ SECURITY_BUILTIN_DOMAIN_RID = 0x20,
+
+ /*
+ * Well-known domain relative sub-authority values (RIDs).
+ */
+
+ /* Users. */
+ DOMAIN_USER_RID_ADMIN = 0x1f4,
+ DOMAIN_USER_RID_GUEST = 0x1f5,
+ DOMAIN_USER_RID_KRBTGT = 0x1f6,
+
+ /* Groups. */
+ DOMAIN_GROUP_RID_ADMINS = 0x200,
+ DOMAIN_GROUP_RID_USERS = 0x201,
+ DOMAIN_GROUP_RID_GUESTS = 0x202,
+ DOMAIN_GROUP_RID_COMPUTERS = 0x203,
+ DOMAIN_GROUP_RID_CONTROLLERS = 0x204,
+ DOMAIN_GROUP_RID_CERT_ADMINS = 0x205,
+ DOMAIN_GROUP_RID_SCHEMA_ADMINS = 0x206,
+ DOMAIN_GROUP_RID_ENTERPRISE_ADMINS = 0x207,
+ DOMAIN_GROUP_RID_POLICY_ADMINS = 0x208,
+
+ /* Aliases. */
+ DOMAIN_ALIAS_RID_ADMINS = 0x220,
+ DOMAIN_ALIAS_RID_USERS = 0x221,
+ DOMAIN_ALIAS_RID_GUESTS = 0x222,
+ DOMAIN_ALIAS_RID_POWER_USERS = 0x223,
+
+ DOMAIN_ALIAS_RID_ACCOUNT_OPS = 0x224,
+ DOMAIN_ALIAS_RID_SYSTEM_OPS = 0x225,
+ DOMAIN_ALIAS_RID_PRINT_OPS = 0x226,
+ DOMAIN_ALIAS_RID_BACKUP_OPS = 0x227,
+
+ DOMAIN_ALIAS_RID_REPLICATOR = 0x228,
+ DOMAIN_ALIAS_RID_RAS_SERVERS = 0x229,
+ DOMAIN_ALIAS_RID_PREW2KCOMPACCESS = 0x22a,
+};
+
+/*
+ * The universal well-known SIDs:
+ *
+ * NULL_SID S-1-0-0
+ * WORLD_SID S-1-1-0
+ * LOCAL_SID S-1-2-0
+ * CREATOR_OWNER_SID S-1-3-0
+ * CREATOR_GROUP_SID S-1-3-1
+ * CREATOR_OWNER_SERVER_SID S-1-3-2
+ * CREATOR_GROUP_SERVER_SID S-1-3-3
+ *
+ * (Non-unique IDs) S-1-4
+ *
+ * NT well-known SIDs:
+ *
+ * NT_AUTHORITY_SID S-1-5
+ * DIALUP_SID S-1-5-1
+ *
+ * NETWORK_SID S-1-5-2
+ * BATCH_SID S-1-5-3
+ * INTERACTIVE_SID S-1-5-4
+ * SERVICE_SID S-1-5-6
+ * ANONYMOUS_LOGON_SID S-1-5-7 (aka null logon session)
+ * PROXY_SID S-1-5-8
+ * SERVER_LOGON_SID S-1-5-9 (aka domain controller account)
+ * SELF_SID S-1-5-10 (self RID)
+ * AUTHENTICATED_USER_SID S-1-5-11
+ * RESTRICTED_CODE_SID S-1-5-12 (running restricted code)
+ * TERMINAL_SERVER_SID S-1-5-13 (running on terminal server)
+ *
+ * (Logon IDs) S-1-5-5-X-Y
+ *
+ * (NT non-unique IDs) S-1-5-0x15-...
+ *
+ * (Built-in domain) S-1-5-0x20
+ */
+
+/*
+ * The SID structure is a variable-length structure used to uniquely identify
+ * users or groups. SID stands for security identifier.
+ *
+ * The standard textual representation of the SID is of the form:
+ * S-R-I-S-S...
+ * Where:
+ * - The first "S" is the literal character 'S' identifying the following
+ * digits as a SID.
+ * - R is the revision level of the SID expressed as a sequence of digits
+ * either in decimal or hexadecimal (if the latter, prefixed by "0x").
+ * - I is the 48-bit identifier_authority, expressed as digits as R above.
+ * - S... is one or more sub_authority values, expressed as digits as above.
+ *
+ * Example SID; the domain-relative SID of the local Administrators group on
+ * Windows NT/2k:
+ * S-1-5-32-544
+ * This translates to a SID with:
+ * revision = 1,
+ * sub_authority_count = 2,
+ * identifier_authority = {0,0,0,0,0,5}, // SECURITY_NT_AUTHORITY
+ * sub_authority[0] = 32, // SECURITY_BUILTIN_DOMAIN_RID
+ * sub_authority[1] = 544 // DOMAIN_ALIAS_RID_ADMINS
+ */
+struct ntfs_sid {
+ u8 revision;
+ u8 sub_authority_count;
+ union {
+ struct {
+ u16 high_part; /* High 16-bits. */
+ u32 low_part; /* Low 32-bits. */
+ } __packed parts;
+ u8 value[6]; /* Value as individual bytes. */
+ } identifier_authority;
+ __le32 sub_authority[]; /* At least one sub_authority. */
+} __packed;
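As an illustration of the textual form described above, here is a minimal userspace sketch (plain integer types in CPU order instead of the on-disk `__le32`/`__packed` layout, which are assumptions of this example) that renders a SID in the standard S-R-I-S... representation:

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for struct ntfs_sid: byte-array view of the
 * 48-bit identifier authority, sub-authorities already byte-swapped. */
struct sid_example {
	uint8_t revision;
	uint8_t sub_authority_count;
	uint8_t identifier_authority[6];	/* big-endian 48-bit value */
	uint32_t sub_authority[8];
};

/* Render the standard S-R-I-S... textual representation into buf. */
static int sid_to_string(const struct sid_example *s, char *buf, size_t len)
{
	uint64_t auth = 0;
	int n, i;

	for (i = 0; i < 6; i++)
		auth = (auth << 8) | s->identifier_authority[i];
	n = snprintf(buf, len, "S-%u-%llu", (unsigned)s->revision,
		     (unsigned long long)auth);
	for (i = 0; i < s->sub_authority_count; i++)
		n += snprintf(buf + n, len - n, "-%u",
			      (unsigned)s->sub_authority[i]);
	return n;
}
```

Feeding it the example from the comment above (revision 1, authority SECURITY_NT_AUTHORITY, sub-authorities 32 and 544) yields "S-1-5-32-544".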
+
+/*
+ * The predefined ACE types (8-bit, see below).
+ */
+enum {
+ ACCESS_MIN_MS_ACE_TYPE = 0,
+ ACCESS_ALLOWED_ACE_TYPE = 0,
+ ACCESS_DENIED_ACE_TYPE = 1,
+ SYSTEM_AUDIT_ACE_TYPE = 2,
+ SYSTEM_ALARM_ACE_TYPE = 3, /* Not implemented as of Win2k. */
+ ACCESS_MAX_MS_V2_ACE_TYPE = 3,
+
+ ACCESS_ALLOWED_COMPOUND_ACE_TYPE = 4,
+ ACCESS_MAX_MS_V3_ACE_TYPE = 4,
+
+ /* The following are Win2k only. */
+ ACCESS_MIN_MS_OBJECT_ACE_TYPE = 5,
+ ACCESS_ALLOWED_OBJECT_ACE_TYPE = 5,
+ ACCESS_DENIED_OBJECT_ACE_TYPE = 6,
+ SYSTEM_AUDIT_OBJECT_ACE_TYPE = 7,
+ SYSTEM_ALARM_OBJECT_ACE_TYPE = 8,
+ ACCESS_MAX_MS_OBJECT_ACE_TYPE = 8,
+
+ ACCESS_MAX_MS_V4_ACE_TYPE = 8,
+
+ /* This one is for WinNT/2k. */
+ ACCESS_MAX_MS_ACE_TYPE = 8,
+} __packed;
+
+/*
+ * The ACE flags (8-bit) for audit and inheritance (see below).
+ *
+ * SUCCESSFUL_ACCESS_ACE_FLAG is only used with system audit and alarm ACE
+ * types to indicate that a message is generated (in Windows!) for successful
+ * accesses.
+ *
+ * FAILED_ACCESS_ACE_FLAG is only used with system audit and alarm ACE types
+ * to indicate that a message is generated (in Windows!) for failed accesses.
+ */
+enum {
+ /* The inheritance flags. */
+ OBJECT_INHERIT_ACE = 0x01,
+ CONTAINER_INHERIT_ACE = 0x02,
+ NO_PROPAGATE_INHERIT_ACE = 0x04,
+ INHERIT_ONLY_ACE = 0x08,
+ INHERITED_ACE = 0x10, /* Win2k only. */
+ VALID_INHERIT_FLAGS = 0x1f,
+
+ /* The audit flags. */
+ SUCCESSFUL_ACCESS_ACE_FLAG = 0x40,
+ FAILED_ACCESS_ACE_FLAG = 0x80,
+} __packed;
+
+/*
+ * The access mask (32-bit). Defines the access rights.
+ *
+ * The specific rights (bits 0 to 15). These depend on the type of the object
+ * being secured by the ACE.
+ */
+enum {
+ /* Specific rights for files and directories are as follows: */
+
+ /* Right to read data from the file. (FILE) */
+ FILE_READ_DATA = cpu_to_le32(0x00000001),
+ /* Right to list contents of a directory. (DIRECTORY) */
+ FILE_LIST_DIRECTORY = cpu_to_le32(0x00000001),
+
+ /* Right to write data to the file. (FILE) */
+ FILE_WRITE_DATA = cpu_to_le32(0x00000002),
+ /* Right to create a file in the directory. (DIRECTORY) */
+ FILE_ADD_FILE = cpu_to_le32(0x00000002),
+
+ /* Right to append data to the file. (FILE) */
+ FILE_APPEND_DATA = cpu_to_le32(0x00000004),
+ /* Right to create a subdirectory. (DIRECTORY) */
+ FILE_ADD_SUBDIRECTORY = cpu_to_le32(0x00000004),
+
+ /* Right to read extended attributes. (FILE/DIRECTORY) */
+ FILE_READ_EA = cpu_to_le32(0x00000008),
+
+ /* Right to write extended attributes. (FILE/DIRECTORY) */
+ FILE_WRITE_EA = cpu_to_le32(0x00000010),
+
+ /* Right to execute a file. (FILE) */
+ FILE_EXECUTE = cpu_to_le32(0x00000020),
+ /* Right to traverse the directory. (DIRECTORY) */
+ FILE_TRAVERSE = cpu_to_le32(0x00000020),
+
+ /*
+ * Right to delete a directory and all the files it contains (its
+ * children), even if the files are read-only. (DIRECTORY)
+ */
+ FILE_DELETE_CHILD = cpu_to_le32(0x00000040),
+
+ /* Right to read file attributes. (FILE/DIRECTORY) */
+ FILE_READ_ATTRIBUTES = cpu_to_le32(0x00000080),
+
+ /* Right to change file attributes. (FILE/DIRECTORY) */
+ FILE_WRITE_ATTRIBUTES = cpu_to_le32(0x00000100),
+
+ /*
+ * The standard rights (bits 16 to 23). These are independent of the
+ * type of object being secured.
+ */
+
+ /* Right to delete the object. */
+ DELETE = cpu_to_le32(0x00010000),
+
+ /*
+ * Right to read the information in the object's security descriptor,
+ * not including the information in the SACL, i.e. right to read the
+ * security descriptor and owner.
+ */
+ READ_CONTROL = cpu_to_le32(0x00020000),
+
+ /* Right to modify the DACL in the object's security descriptor. */
+ WRITE_DAC = cpu_to_le32(0x00040000),
+
+ /* Right to change the owner in the object's security descriptor. */
+ WRITE_OWNER = cpu_to_le32(0x00080000),
+
+ /*
+ * Right to use the object for synchronization. Enables a process to
+ * wait until the object is in the signalled state. Some object types
+ * do not support this access right.
+ */
+ SYNCHRONIZE = cpu_to_le32(0x00100000),
+
+ /*
+ * The following STANDARD_RIGHTS_* are combinations of the above for
+ * convenience and are defined by the Win32 API.
+ */
+
+ /* These are currently defined to READ_CONTROL. */
+ STANDARD_RIGHTS_READ = cpu_to_le32(0x00020000),
+ STANDARD_RIGHTS_WRITE = cpu_to_le32(0x00020000),
+ STANDARD_RIGHTS_EXECUTE = cpu_to_le32(0x00020000),
+
+ /* Combines DELETE, READ_CONTROL, WRITE_DAC, and WRITE_OWNER access. */
+ STANDARD_RIGHTS_REQUIRED = cpu_to_le32(0x000f0000),
+
+ /*
+ * Combines DELETE, READ_CONTROL, WRITE_DAC, WRITE_OWNER, and
+ * SYNCHRONIZE access.
+ */
+ STANDARD_RIGHTS_ALL = cpu_to_le32(0x001f0000),
+
+ /*
+ * The access system ACL and maximum allowed access types (bits 24 to
+ * 25, bits 26 to 27 are reserved).
+ */
+ ACCESS_SYSTEM_SECURITY = cpu_to_le32(0x01000000),
+ MAXIMUM_ALLOWED = cpu_to_le32(0x02000000),
+
+ /*
+ * The generic rights (bits 28 to 31). These map onto the standard and
+ * specific rights.
+ */
+
+ /* Read, write, and execute access. */
+ GENERIC_ALL = cpu_to_le32(0x10000000),
+
+ /* Execute access. */
+ GENERIC_EXECUTE = cpu_to_le32(0x20000000),
+
+ /*
+ * Write access. For files, this maps onto:
+ * FILE_APPEND_DATA | FILE_WRITE_ATTRIBUTES | FILE_WRITE_DATA |
+ * FILE_WRITE_EA | STANDARD_RIGHTS_WRITE | SYNCHRONIZE
+ * For directories, the mapping has the same numerical value. See
+ * above for the descriptions of the rights granted.
+ */
+ GENERIC_WRITE = cpu_to_le32(0x40000000),
+
+ /*
+ * Read access. For files, this maps onto:
+ * FILE_READ_ATTRIBUTES | FILE_READ_DATA | FILE_READ_EA |
+ * STANDARD_RIGHTS_READ | SYNCHRONIZE
+ * For directories, the mapping has the same numerical value. See
+ * above for the descriptions of the rights granted.
+ */
+ GENERIC_READ = cpu_to_le32(0x80000000),
+};
+
+/*
+ * The predefined ACE type structures are as defined below.
+ */
+
+struct ntfs_ace {
+ u8 type; /* Type of the ACE. */
+ u8 flags; /* Flags describing the ACE. */
+ __le16 size; /* Size in bytes of the ACE. */
+ __le32 mask; /* Access mask associated with the ACE. */
+ struct ntfs_sid sid; /* The SID associated with the ACE. */
+} __packed;
+
+/*
+ * The object ACE flags (32-bit).
+ */
+enum {
+ ACE_OBJECT_TYPE_PRESENT = cpu_to_le32(1),
+ ACE_INHERITED_OBJECT_TYPE_PRESENT = cpu_to_le32(2),
+};
+
+/*
+ * An ACL is an access-control list.
+ * An ACL starts with an ACL header structure, which specifies the size of
+ * the ACL and the number of ACEs it contains. The ACL header is followed by
+ * zero or more access control entries (ACEs). The ACL as well as each ACE
+ * are aligned on 4-byte boundaries.
+ */
+struct ntfs_acl {
+ u8 revision; /* Revision of this ACL. */
+ u8 alignment1;
+ __le16 size; /*
+ * Allocated space in bytes for ACL. Includes this
+ * header, the ACEs and the remaining free space.
+ */
+ __le16 ace_count; /* Number of ACEs in the ACL. */
+ __le16 alignment2;
+/* sizeof() = 8 bytes */
+} __packed;
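Because each ACE is variable-length (the SID at its tail varies in size), an ACL is walked by advancing `size` bytes per entry. A hedged userspace sketch, using simplified already-byte-swapped header mirrors rather than the on-disk `__le16` fields:

```c
#include <stdint.h>
#include <string.h>

/* Simplified CPU-order views of the on-disk headers (sketch only). */
struct acl_hdr {
	uint8_t revision, alignment1;
	uint16_t size;		/* allocated bytes incl. this header */
	uint16_t ace_count;
	uint16_t alignment2;
};

struct ace_hdr {
	uint8_t type, flags;
	uint16_t size;		/* byte size of this ACE */
};

/* Count ACEs by stepping over the variable-length entries that
 * follow the 8-byte ACL header; returns -1 on a malformed ACL. */
static int walk_aces(const uint8_t *buf, size_t buflen)
{
	struct acl_hdr acl;
	size_t off;
	int seen = 0;
	uint16_t i;

	if (buflen < sizeof(acl))
		return -1;
	memcpy(&acl, buf, sizeof(acl));
	off = sizeof(acl);
	for (i = 0; i < acl.ace_count; i++) {
		struct ace_hdr ace;

		if (off + sizeof(ace) > buflen)
			return -1;	/* truncated ACL */
		memcpy(&ace, buf + off, sizeof(ace));
		if (ace.size < sizeof(ace) || off + ace.size > buflen)
			return -1;	/* corrupt ACE size */
		off += ace.size;
		seen++;
	}
	return seen;
}
```

A real driver would additionally validate the 4-byte alignment of each ACE and that `off` never exceeds the ACL's own `size` field.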
+
+/*
+ * The security descriptor control flags (16-bit).
+ *
+ * SE_OWNER_DEFAULTED - This boolean flag, when set, indicates that the SID
+ * pointed to by the Owner field was provided by a defaulting mechanism
+ * rather than explicitly provided by the original provider of the
+ * security descriptor. This may affect the treatment of the SID with
+ * respect to inheritance of an owner.
+ *
+ * SE_GROUP_DEFAULTED - This boolean flag, when set, indicates that the SID in
+ * the Group field was provided by a defaulting mechanism rather than
+ * explicitly provided by the original provider of the security
+ * descriptor. This may affect the treatment of the SID with respect to
+ * inheritance of a primary group.
+ *
+ * SE_DACL_PRESENT - This boolean flag, when set, indicates that the security
+ * descriptor contains a discretionary ACL. If this flag is set and the
+ * Dacl field of the SECURITY_DESCRIPTOR is null, then a null ACL is
+ * explicitly being specified.
+ *
+ * SE_DACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
+ * pointed to by the Dacl field was provided by a defaulting mechanism
+ * rather than explicitly provided by the original provider of the
+ * security descriptor. This may affect the treatment of the ACL with
+ * respect to inheritance of an ACL. This flag is ignored if the
+ * DaclPresent flag is not set.
+ *
+ * SE_SACL_PRESENT - This boolean flag, when set, indicates that the security
+ * descriptor contains a system ACL pointed to by the Sacl field. If this
+ * flag is set and the Sacl field of the SECURITY_DESCRIPTOR is null, then
+ * an empty (but present) ACL is being specified.
+ *
+ * SE_SACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
+ * pointed to by the Sacl field was provided by a defaulting mechanism
+ * rather than explicitly provided by the original provider of the
+ * security descriptor. This may affect the treatment of the ACL with
+ * respect to inheritance of an ACL. This flag is ignored if the
+ * SaclPresent flag is not set.
+ *
+ * SE_SELF_RELATIVE - This boolean flag, when set, indicates that the security
+ * descriptor is in self-relative form. In this form, all fields of the
+ * security descriptor are contiguous in memory and all pointer fields are
+ * expressed as offsets from the beginning of the security descriptor.
+ */
+enum {
+ SE_OWNER_DEFAULTED = cpu_to_le16(0x0001),
+ SE_GROUP_DEFAULTED = cpu_to_le16(0x0002),
+ SE_DACL_PRESENT = cpu_to_le16(0x0004),
+ SE_DACL_DEFAULTED = cpu_to_le16(0x0008),
+
+ SE_SACL_PRESENT = cpu_to_le16(0x0010),
+ SE_SACL_DEFAULTED = cpu_to_le16(0x0020),
+
+ SE_DACL_AUTO_INHERIT_REQ = cpu_to_le16(0x0100),
+ SE_SACL_AUTO_INHERIT_REQ = cpu_to_le16(0x0200),
+ SE_DACL_AUTO_INHERITED = cpu_to_le16(0x0400),
+ SE_SACL_AUTO_INHERITED = cpu_to_le16(0x0800),
+
+ SE_DACL_PROTECTED = cpu_to_le16(0x1000),
+ SE_SACL_PROTECTED = cpu_to_le16(0x2000),
+ SE_RM_CONTROL_VALID = cpu_to_le16(0x4000),
+ SE_SELF_RELATIVE = cpu_to_le16(0x8000)
+} __packed;
+
+/*
+ * Self-relative security descriptor. Contains the owner and group SIDs as well
+ * as the sacl and dacl ACLs inside the security descriptor itself.
+ */
+struct security_descriptor_relative {
+ u8 revision; /* Revision level of the security descriptor. */
+ u8 alignment;
+ __le16 control; /*
+ * Flags qualifying the type of the descriptor as well as
+ * the following fields.
+ */
+ __le32 owner; /*
+ * Byte offset to a SID representing an object's
+ * owner. If this is NULL, no owner SID is present in
+ * the descriptor.
+ */
+ __le32 group; /*
+ * Byte offset to a SID representing an object's
+ * primary group. If this is NULL, no primary group
+ * SID is present in the descriptor.
+ */
+ __le32 sacl; /*
+ * Byte offset to a system ACL. Only valid, if
+ * SE_SACL_PRESENT is set in the control field. If
+ * SE_SACL_PRESENT is set but sacl is NULL, a NULL ACL
+ * is specified.
+ */
+ __le32 dacl; /*
+ * Byte offset to a discretionary ACL. Only valid, if
+ * SE_DACL_PRESENT is set in the control field. If
+ * SE_DACL_PRESENT is set but dacl is NULL, a NULL ACL
+ * (unconditionally granting access) is specified.
+ */
+/* sizeof() = 0x14 bytes */
+} __packed;
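Since all four fields above are byte offsets from the start of the descriptor (SE_SELF_RELATIVE form), locating a SID or ACL is simple pointer arithmetic. A hedged sketch on an already byte-swapped offset:

```c
#include <stdint.h>
#include <stddef.h>

/* Resolve an offset field of a self-relative security descriptor.
 * A zero offset means the field is absent (or, for a DACL with
 * SE_DACL_PRESENT set in control, an explicit NULL ACL granting
 * everyone access). Userspace sketch; bounds checking against the
 * descriptor's total size is left out for brevity. */
static const uint8_t *sd_resolve(const uint8_t *sd, uint32_t offset)
{
	return offset ? sd + offset : NULL;
}
```

A production parser must of course verify that the offset plus the pointed-to structure's size stays within the attribute before dereferencing.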
+
+/*
+ * On NTFS 3.0+, all security descriptors are stored in FILE_Secure. Only one
+ * referenced instance of each unique security descriptor is stored.
+ *
+ * FILE_Secure contains no unnamed data attribute, i.e. it has zero length. It
+ * does, however, contain two indexes ($SDH and $SII) as well as a named data
+ * stream ($SDS).
+ *
+ * Every unique security descriptor is assigned a unique security identifier
+ * (security_id, not to be confused with a SID). The security_id is unique for
+ * the NTFS volume and is used as an index into the $SII index, which maps
+ * security_ids to the security descriptor's storage location within the $SDS
+ * data attribute. The $SII index is sorted by ascending security_id.
+ *
+ * A simple hash is computed from each security descriptor. This hash is used
+ * as an index into the $SDH index, which maps security descriptor hashes to
+ * the security descriptor's storage location within the $SDS data attribute.
+ * The $SDH index is sorted by security descriptor hash and is stored in a B+
+ * tree. When searching $SDH (with the intent of determining whether or not a
+ * new security descriptor is already present in the $SDS data stream), if a
+ * matching hash is found, but the security descriptors do not match, the
+ * search in the $SDH index is continued, searching for a next matching hash.
+ *
+ * When a precise match is found, the security_id corresponding to the security
+ * descriptor in the $SDS attribute is read from the found $SDH index entry and
+ * is stored in the $STANDARD_INFORMATION attribute of the file/directory to
+ * which the security descriptor is being applied. The $STANDARD_INFORMATION
+ * attribute is present in all base mft records (i.e. in all files and
+ * directories).
+ *
+ * If a match is not found, the security descriptor is assigned a new unique
+ * security_id and is added to the $SDS data attribute. Then, entries
+ * referencing this security descriptor in the $SDS data attribute are
+ * added to the $SDH and $SII indexes.
+ *
+ * Note: Entries are never deleted from FILE_Secure, even if nothing
+ * references an entry any more.
+ */
+
+/*
+ * The index entry key used in the $SII index. The collation type is
+ * COLLATION_NTOFS_ULONG.
+ */
+struct sii_index_key {
+ __le32 security_id; /* The security_id assigned to the descriptor. */
+} __packed;
+
+/*
+ * The index entry key used in the $SDH index. The keys are sorted first by
+ * hash and then by security_id. The collation rule is
+ * COLLATION_NTOFS_SECURITY_HASH.
+ */
+struct sdh_index_key {
+ __le32 hash; /* Hash of the security descriptor. */
+ __le32 security_id; /* The security_id assigned to the descriptor. */
+} __packed;
+
+/*
+ * Possible flags for the volume (16-bit).
+ */
+enum {
+ VOLUME_IS_DIRTY = cpu_to_le16(0x0001),
+ VOLUME_RESIZE_LOG_FILE = cpu_to_le16(0x0002),
+ VOLUME_UPGRADE_ON_MOUNT = cpu_to_le16(0x0004),
+ VOLUME_MOUNTED_ON_NT4 = cpu_to_le16(0x0008),
+
+ VOLUME_DELETE_USN_UNDERWAY = cpu_to_le16(0x0010),
+ VOLUME_REPAIR_OBJECT_ID = cpu_to_le16(0x0020),
+
+ VOLUME_CHKDSK_UNDERWAY = cpu_to_le16(0x4000),
+ VOLUME_MODIFIED_BY_CHKDSK = cpu_to_le16(0x8000),
+
+ VOLUME_FLAGS_MASK = cpu_to_le16(0xc03f),
+
+ /* To make our life easier when checking if we must mount read-only. */
+ VOLUME_MUST_MOUNT_RO_MASK = cpu_to_le16(0xc027),
+} __packed;
+
+/*
+ * Attribute: Volume information (0x70).
+ *
+ * NOTE: Always resident.
+ * NOTE: Present only in FILE_Volume.
+ * NOTE: Windows 2000 uses NTFS 3.0 while Windows NT4 service pack 6a uses
+ * NTFS 1.2. I haven't personally seen other values yet.
+ */
+struct volume_information {
+ __le64 reserved; /* Not used (yet?). */
+ u8 major_ver; /* Major version of the ntfs format. */
+ u8 minor_ver; /* Minor version of the ntfs format. */
+ __le16 flags; /* Bit array of VOLUME_* flags. */
+} __packed;
+
+/*
+ * Index header flags (8-bit).
+ */
+enum {
+ /*
+ * When index header is in an index root attribute:
+ */
+ SMALL_INDEX = 0, /*
+ * The index is small enough to fit inside the index
+ * root attribute and there is no index allocation
+ * attribute present.
+ */
+ LARGE_INDEX = 1, /*
+ * The index is too large to fit in the index root
+ * attribute and/or an index allocation attribute is
+ * present.
+ */
+ /*
+ * When index header is in an index block, i.e. is part of index
+ * allocation attribute:
+ */
+ LEAF_NODE = 0, /*
+ * This is a leaf node, i.e. there are no more nodes
+ * branching off it.
+ */
+ INDEX_NODE = 1, /*
+ * This node indexes other nodes, i.e. it is not a leaf
+ * node.
+ */
+ NODE_MASK = 1, /* Mask for accessing the *_NODE bits. */
+} __packed;
+
+/*
+ * This is the header for indexes, describing the INDEX_ENTRY records, which
+ * follow the index_header. Together the index header and the index entries
+ * make up a complete index.
+ *
+ * IMPORTANT NOTE: The offset, length and size structure members are counted
+ * relative to the start of the index header structure and not relative to the
+ * start of the index root or index allocation structures themselves.
+ */
+struct index_header {
+ __le32 entries_offset; /*
+ * Byte offset to first INDEX_ENTRY
+ * aligned to 8-byte boundary.
+ */
+ __le32 index_length; /*
+ * Data size of the index in bytes,
+ * i.e. bytes used from allocated
+ * size, aligned to 8-byte boundary.
+ */
+ __le32 allocated_size; /*
+ * Byte size of this index (block),
+ * multiple of 8 bytes.
+ */
+ /*
+ * NOTE: For the index root attribute, the above two numbers are always
+ * equal, as the attribute is resident and it is resized as needed. In
+ * the case of the index allocation attribute the attribute is not
+ * resident and hence the allocated_size is a fixed value and must
+ * equal the index_block_size specified by the INDEX_ROOT attribute
+ * corresponding to the INDEX_ALLOCATION attribute this INDEX_BLOCK
+ * belongs to.
+ */
+ u8 flags; /* Bit field of INDEX_HEADER_FLAGS. */
+ u8 reserved[3]; /* Reserved/align to 8-byte boundary. */
+} __packed;
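The invariants stated in the comments (8-byte alignment, index_length bounded by allocated_size, entries starting past the 16-byte header) suggest a simple sanity check. A hedged sketch operating on already byte-swapped values:

```c
#include <stdint.h>

/* Basic consistency check on index header fields (CPU-order values).
 * The >= 16 bound is the size of the index header itself; real
 * entries_offset values may be larger, e.g. in index blocks where
 * the update sequence array precedes the entries. */
static int index_header_ok(uint32_t entries_offset, uint32_t index_length,
			   uint32_t allocated_size)
{
	return !(entries_offset & 7) && !(index_length & 7) &&
	       !(allocated_size & 7) &&
	       entries_offset >= 16 &&
	       entries_offset <= index_length &&
	       index_length <= allocated_size;
}
```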
+
+/*
+ * Attribute: Index root (0x90).
+ *
+ * NOTE: Always resident.
+ *
+ * This is followed by a sequence of index entries (INDEX_ENTRY structures)
+ * as described by the index header.
+ *
+ * When a directory is small enough to fit inside the index root then this
+ * is the only attribute describing the directory. When the directory is too
+ * large to fit in the index root, on the other hand, two additional attributes
+ * are present: an index allocation attribute, containing sub-nodes of the B+
+ * directory tree (see below), and a bitmap attribute, describing which virtual
+ * cluster numbers (vcns) in the index allocation attribute are in use by an
+ * index block.
+ *
+ * NOTE: The root directory (FILE_root) contains an entry for itself. Other
+ * directories do not contain entries for themselves, though.
+ */
+struct index_root {
+ __le32 type; /*
+ * Type of the indexed attribute. Is
+ * $FILE_NAME for directories, zero
+ * for view indexes. No other values
+ * allowed.
+ */
+ __le32 collation_rule; /*
+ * Collation rule used to sort the index
+ * entries. If type is $FILE_NAME, this
+ * must be COLLATION_FILE_NAME.
+ */
+ __le32 index_block_size; /*
+ * Size of each index block in bytes (in
+ * the index allocation attribute).
+ */
+	u8 clusters_per_index_block; /*
+					* Size in clusters of each index block
+					* (in the index allocation attribute),
+					* when an index block is >= a cluster;
+					* otherwise this is the log of the
+					* size (like the encoding of the mft
+					* record size and the index record
+					* size found in the boot sector).
+					* Has to be a power of 2.
+					*/
+ u8 reserved[3]; /* Reserved/align to 8-byte boundary. */
+ struct index_header index; /* Index header describing the following index entries. */
+} __packed;
+
+/*
+ * Attribute: Index allocation (0xa0).
+ *
+ * NOTE: Always non-resident (doesn't make sense to be resident anyway!).
+ *
+ * This is an array of index blocks. Each index block starts with an
+ * index_block structure containing an index header, followed by a sequence of
+ * index entries (INDEX_ENTRY structures), as described by the struct index_header.
+ */
+struct index_block {
+ __le32 magic; /* Magic is "INDX". */
+ __le16 usa_ofs; /* See ntfs_record struct definition. */
+ __le16 usa_count; /* See ntfs_record struct definition. */
+
+ __le64 lsn; /*
+ * LogFile sequence number of the last
+ * modification of this index block.
+ */
+ __le64 index_block_vcn; /*
+ * Virtual cluster number of the index block.
+ * If the cluster_size on the volume is <= the
+ * index_block_size of the directory,
+ * index_block_vcn counts in units of clusters,
+ * and in units of sectors otherwise.
+ */
+ struct index_header index; /* Describes the following index entries. */
+/* sizeof() = 40 (0x28) bytes */
+/*
+ * When creating the index block, we place the update sequence array at this
+ * offset, i.e. before we start with the index entries. This also makes
+ * sense, as otherwise the update sequence array could itself contain the
+ * last two bytes of a sector, which would break multi sector transfer
+ * protection: you cannot protect data by overwriting it, since you then
+ * cannot get it back.
+ * When reading, use the data from the ntfs record header.
+ */
+} __packed;
+
+/*
+ * The system file FILE_Extend/$Reparse contains an index named $R listing
+ * all reparse points on the volume. The index entry keys are as defined
+ * below. Note, that there is no index data associated with the index entries.
+ *
+ * The index entries are sorted by the index key file_id. The collation rule is
+ * COLLATION_NTOFS_ULONGS.
+ */
+struct reparse_index_key {
+ __le32 reparse_tag; /* Reparse point type (inc. flags). */
+ __le64 file_id; /*
+ * Mft record of the file containing
+ * the reparse point attribute.
+ */
+} __packed;
+
+/*
+ * Quota flags (32-bit).
+ *
+ * The user quota flags. Names explain meaning.
+ */
+enum {
+ QUOTA_FLAG_DEFAULT_LIMITS = cpu_to_le32(0x00000001),
+ QUOTA_FLAG_LIMIT_REACHED = cpu_to_le32(0x00000002),
+ QUOTA_FLAG_ID_DELETED = cpu_to_le32(0x00000004),
+
+ QUOTA_FLAG_USER_MASK = cpu_to_le32(0x00000007),
+ /* This is a bit mask for the user quota flags. */
+
+ /*
+ * These flags are only present in the quota defaults index entry, i.e.
+ * in the entry where owner_id = QUOTA_DEFAULTS_ID.
+ */
+ QUOTA_FLAG_TRACKING_ENABLED = cpu_to_le32(0x00000010),
+ QUOTA_FLAG_ENFORCEMENT_ENABLED = cpu_to_le32(0x00000020),
+ QUOTA_FLAG_TRACKING_REQUESTED = cpu_to_le32(0x00000040),
+ QUOTA_FLAG_LOG_THRESHOLD = cpu_to_le32(0x00000080),
+
+ QUOTA_FLAG_LOG_LIMIT = cpu_to_le32(0x00000100),
+ QUOTA_FLAG_OUT_OF_DATE = cpu_to_le32(0x00000200),
+ QUOTA_FLAG_CORRUPT = cpu_to_le32(0x00000400),
+ QUOTA_FLAG_PENDING_DELETES = cpu_to_le32(0x00000800),
+};
+
+/*
+ * The system file FILE_Extend/$Quota contains two indexes $O and $Q. Quotas
+ * are on a per volume and per user basis.
+ *
+ * The $Q index contains one entry for each existing user_id on the volume. The
+ * index key is the user_id of the user/group owning this quota control entry,
+ * i.e. the key is the owner_id. The user_id of the owner of a file, i.e. the
+ * owner_id, is found in the standard information attribute. The collation rule
+ * for $Q is COLLATION_NTOFS_ULONG.
+ *
+ * The $O index contains one entry for each user/group who has been assigned
+ * a quota on that volume. The index key holds the SID of the user_id the
+ * entry belongs to, i.e. the owner_id. The collation rule for $O is
+ * COLLATION_NTOFS_SID.
+ *
+ * The $O index entry data is the user_id of the user corresponding to the SID.
+ * This user_id is used as an index into $Q to find the quota control entry
+ * associated with the SID.
+ *
+ * The $Q index entry data is the quota control entry and is defined below.
+ */
+struct quota_control_entry {
+ __le32 version; /* Currently equals 2. */
+ __le32 flags; /* Flags describing this quota entry. */
+ __le64 bytes_used; /* How many bytes of the quota are in use. */
+ __le64 change_time; /* Last time this quota entry was changed. */
+ __le64 threshold; /* Soft quota (-1 if not limited). */
+ __le64 limit; /* Hard quota (-1 if not limited). */
+ __le64 exceeded_time; /* How long the soft quota has been exceeded. */
+ struct ntfs_sid sid; /*
+ * The SID of the user/object associated with
+ * this quota entry. Equals zero for the quota
+ * defaults entry (and in fact on a WinXP
+ * volume, it is not present at all).
+ */
+} __packed;
+
+/*
+ * Predefined owner_id values (32-bit).
+ */
+enum {
+ QUOTA_INVALID_ID = cpu_to_le32(0x00000000),
+ QUOTA_DEFAULTS_ID = cpu_to_le32(0x00000001),
+ QUOTA_FIRST_USER_ID = cpu_to_le32(0x00000100),
+};
+
+/*
+ * Current constants for quota control entries.
+ */
+enum {
+ /* Current version. */
+ QUOTA_VERSION = 2,
+};
+
+/*
+ * Index entry flags (16-bit).
+ */
+enum {
+ INDEX_ENTRY_NODE = cpu_to_le16(1), /*
+ * This entry contains a sub-node,
+ * i.e. a reference to an index block
+ * in form of a virtual cluster number
+ * (see below).
+ */
+ INDEX_ENTRY_END = cpu_to_le16(2), /*
+ * This signifies the last entry in an
+ * index block. The index entry does not
+ * represent a file but it can point
+ * to a sub-node.
+ */
+
+ INDEX_ENTRY_SPACE_FILLER = cpu_to_le16(0xffff), /* gcc: Force enum bit width to 16-bit. */
+} __packed;
+
+/*
+ * This is the index entry header (see below).
+ */
+struct index_entry_header {
+/* 0*/ union {
+ struct { /* Only valid when INDEX_ENTRY_END is not set. */
+ __le64 indexed_file; /*
+ * The mft reference of the file
+ * described by this index entry.
+ * Used for directory indexes.
+ */
+ } __packed dir;
+ struct {
+ /* Used for views/indexes to find the entry's data. */
+ __le16 data_offset; /*
+ * Data byte offset from this
+ * INDEX_ENTRY. Follows the index key.
+ */
+ __le16 data_length; /* Data length in bytes. */
+ __le32 reservedV; /* Reserved (zero). */
+ } __packed vi;
+ } __packed data;
+ __le16 length; /* Byte size of this index entry, multiple of 8-bytes. */
+ __le16 key_length; /*
+ * Byte size of the key value, which is in the index entry.
+ * It follows field reserved. Not multiple of 8-bytes.
+ */
+ __le16 flags; /* Bit field of INDEX_ENTRY_* flags. */
+ __le16 reserved; /* Reserved/align to 8-byte boundary. */
+/* sizeof() = 16 bytes */
+} __packed;
+
+/*
+ * This is an index entry. A sequence of such entries follows each index_header
+ * structure. Together they make up a complete index. The index follows either
+ * an index root attribute or an index allocation attribute.
+ *
+ * NOTE: Before NTFS 3.0 only filename attributes were indexed.
+ */
+struct index_entry {
+ union {
+ struct { /* Only valid when INDEX_ENTRY_END is not set. */
+ __le64 indexed_file; /*
+ * The mft reference of the file
+ * described by this index entry.
+ * Used for directory indexes.
+ */
+ } __packed dir;
+ struct { /* Used for views/indexes to find the entry's data. */
+ __le16 data_offset; /*
+ * Data byte offset from this INDEX_ENTRY.
+ * Follows the index key.
+ */
+ __le16 data_length; /* Data length in bytes. */
+ __le32 reservedV; /* Reserved (zero). */
+ } __packed vi;
+ } __packed data;
+ __le16 length; /* Byte size of this index entry, multiple of 8-bytes. */
+ __le16 key_length; /*
+ * Byte size of the key value, which is in the index entry.
+ * It follows field reserved. Not multiple of 8-bytes.
+ */
+ __le16 flags; /* Bit field of INDEX_ENTRY_* flags. */
+ __le16 reserved; /* Reserved/align to 8-byte boundary. */
+
+ union {
+ /*
+ * The key of the indexed attribute. NOTE: Only present
+ * if INDEX_ENTRY_END bit in flags is not set. NOTE: On
+ * NTFS versions before 3.0 the only valid key is the
+ * struct file_name_attr. On NTFS 3.0+ the following
+ * additional index keys are defined:
+ */
+ struct file_name_attr file_name; /* $I30 index in directories. */
+ struct sii_index_key sii; /* $SII index in $Secure. */
+ struct sdh_index_key sdh; /* $SDH index in $Secure. */
+ struct guid object_id; /*
+ * $O index in FILE_Extend/$ObjId: The object_id
+ * of the mft record found in the data part of
+ * the index.
+ */
+ struct reparse_index_key reparse; /* $R index in FILE_Extend/$Reparse. */
+ struct ntfs_sid sid; /*
+ * $O index in FILE_Extend/$Quota:
+ * SID of the owner of the user_id.
+ */
+ __le32 owner_id; /*
+ * $Q index in FILE_Extend/$Quota:
+ * user_id of the owner of the quota
+ * control entry in the data part of
+ * the index.
+ */
+ } __packed key;
+ /*
+ * The (optional) index data is inserted here when creating.
+ * __le64 vcn; If INDEX_ENTRY_NODE bit in flags is set, the last
+ * eight bytes of this index entry contain the virtual
+ * cluster number of the index block that holds the
+ * entries immediately preceding the current entry (the
+ * vcn references the corresponding cluster in the data
+ * of the non-resident index allocation attribute). If
+ * the key_length is zero, then the vcn immediately
+ * follows the INDEX_ENTRY_HEADER. Regardless of
+ * key_length, the address of the 8-byte boundary
+ * aligned vcn of INDEX_ENTRY{_HEADER} *ie is given by
+ * (char*)ie + le16_to_cpu(ie->length) - sizeof(VCN),
+ * where sizeof(VCN) can be hardcoded as 8 if wanted.
+ */
+} __packed;
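The address arithmetic in the comment above can be sketched in userspace as follows; the entry buffer and the already byte-swapped `length` are assumptions of the example, and a little-endian host stands in for the `sle64_to_cpu()` conversion a driver would do:

```c
#include <stdint.h>
#include <string.h>

/* Fetch the sub-node VCN stored in the last eight bytes of an index
 * entry that has INDEX_ENTRY_NODE set, per the layout comment:
 * (char *)ie + length - sizeof(VCN). memcpy avoids any misaligned
 * access, since only 8-byte alignment of the entry is guaranteed. */
static uint64_t index_entry_subnode_vcn(const uint8_t *ie, uint16_t length)
{
	uint64_t vcn;

	memcpy(&vcn, ie + length - sizeof(vcn), sizeof(vcn));
	return vcn;
}
```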
+
+/*
+ * The reparse point tag defines the type of the reparse point. It also
+ * includes several flags, which further describe the reparse point.
+ *
+ * The reparse point tag is an unsigned 32-bit value divided in three parts:
+ *
+ * 1. The least significant 16 bits (i.e. bits 0 to 15) specify the type of
+ * the reparse point.
+ * 2. The 12 bits after this (i.e. bits 16 to 27) are reserved for future use.
+ * 3. The most significant four bits are flags describing the reparse point.
+ * They are defined as follows:
+ * bit 28: Directory bit. If set, the directory is not a surrogate
+ * and can be used in the usual way.
+ * bit 29: Name surrogate bit. If set, the filename is an alias for
+ * another object in the system.
+ * bit 30: High-latency bit. If set, accessing the first byte of data will
+ * be slow. (E.g. the data is stored on a tape drive.)
+ * bit 31: Microsoft bit. If set, the tag is owned by Microsoft. User
+ * defined tags have to use zero here.
+ * 4. Moreover, on Windows 10:
+ * Some flags may be used in bits 12 to 15 to further describe the
+ * reparse point.
+ */
+enum {
+ IO_REPARSE_TAG_DIRECTORY = cpu_to_le32(0x10000000),
+ IO_REPARSE_TAG_IS_ALIAS = cpu_to_le32(0x20000000),
+ IO_REPARSE_TAG_IS_HIGH_LATENCY = cpu_to_le32(0x40000000),
+ IO_REPARSE_TAG_IS_MICROSOFT = cpu_to_le32(0x80000000),
+
+ IO_REPARSE_TAG_RESERVED_ZERO = cpu_to_le32(0x00000000),
+ IO_REPARSE_TAG_RESERVED_ONE = cpu_to_le32(0x00000001),
+ IO_REPARSE_TAG_RESERVED_RANGE = cpu_to_le32(0x00000001),
+
+ IO_REPARSE_TAG_CSV = cpu_to_le32(0x80000009),
+ IO_REPARSE_TAG_DEDUP = cpu_to_le32(0x80000013),
+ IO_REPARSE_TAG_DFS = cpu_to_le32(0x8000000A),
+ IO_REPARSE_TAG_DFSR = cpu_to_le32(0x80000012),
+ IO_REPARSE_TAG_HSM = cpu_to_le32(0xC0000004),
+ IO_REPARSE_TAG_HSM2 = cpu_to_le32(0x80000006),
+ IO_REPARSE_TAG_MOUNT_POINT = cpu_to_le32(0xA0000003),
+ IO_REPARSE_TAG_NFS = cpu_to_le32(0x80000014),
+ IO_REPARSE_TAG_SIS = cpu_to_le32(0x80000007),
+ IO_REPARSE_TAG_SYMLINK = cpu_to_le32(0xA000000C),
+ IO_REPARSE_TAG_WIM = cpu_to_le32(0x80000008),
+ IO_REPARSE_TAG_DFM = cpu_to_le32(0x80000016),
+ IO_REPARSE_TAG_WOF = cpu_to_le32(0x80000017),
+ IO_REPARSE_TAG_WCI = cpu_to_le32(0x80000018),
+ IO_REPARSE_TAG_CLOUD = cpu_to_le32(0x9000001A),
+ IO_REPARSE_TAG_APPEXECLINK = cpu_to_le32(0x8000001B),
+ IO_REPARSE_TAG_GVFS = cpu_to_le32(0x9000001C),
+ IO_REPARSE_TAG_LX_SYMLINK = cpu_to_le32(0xA000001D),
+ IO_REPARSE_TAG_AF_UNIX = cpu_to_le32(0x80000023),
+ IO_REPARSE_TAG_LX_FIFO = cpu_to_le32(0x80000024),
+ IO_REPARSE_TAG_LX_CHR = cpu_to_le32(0x80000025),
+ IO_REPARSE_TAG_LX_BLK = cpu_to_le32(0x80000026),
+
+ IO_REPARSE_TAG_VALID_VALUES = cpu_to_le32(0xf000ffff),
+ IO_REPARSE_PLUGIN_SELECT = cpu_to_le32(0xffff0fff),
+};
+
+/*
+ * Attribute: Reparse point (0xc0).
+ *
+ * NOTE: Can be resident or non-resident.
+ */
+struct reparse_point {
+ __le32 reparse_tag; /* Reparse point type (inc. flags). */
+ __le16 reparse_data_length; /* Byte size of reparse data. */
+ __le16 reserved; /* Align to 8-byte boundary. */
+ u8 reparse_data[]; /* Meaning depends on reparse_tag. */
+} __packed;
+
+/*
+ * Attribute: Extended attribute (EA) information (0xd0).
+ *
+ * NOTE: Always resident. (Is this true???)
+ */
+struct ea_information {
+ __le16 ea_length; /* Byte size of the packed extended attributes. */
+ __le16 need_ea_count; /*
+ * The number of extended attributes which have
+ * the NEED_EA bit set.
+ */
+ __le32 ea_query_length; /*
+ * Byte size of the buffer required to query
+ * the extended attributes when calling
+ * ZwQueryEaFile() in Windows NT/2k. I.e.
+ * the byte size of the unpacked extended attributes.
+ */
+} __packed;
+
+/*
+ * Extended attribute flags (8-bit).
+ */
+enum {
+ NEED_EA = 0x80 /*
+ * If set, the file to which the EA belongs
+ * cannot be interpreted without understanding
+ * its associated extended attributes.
+ */
+} __packed;
+
+/*
+ * Attribute: Extended attribute (EA) (0xe0).
+ *
+ * NOTE: Can be resident or non-resident.
+ *
+ * Like the attribute list and the index buffer list, the EA attribute value is
+ * a sequence of EA_ATTR variable length records.
+ */
+struct ea_attr {
+ __le32 next_entry_offset; /* Offset to the next EA_ATTR. */
+ u8 flags; /* Flags describing the EA. */
+ u8 ea_name_length; /*
+ * Length of the name of the EA in bytes
+ * excluding the '\0' byte terminator.
+ */
+ __le16 ea_value_length; /* Byte size of the EA's value. */
+ u8 ea_name[]; /*
+ * Name of the EA. Note this is ASCII, not
+ * Unicode and it is zero terminated.
+ */
+ /* u8 ea_value[]; */ /* The value of the EA. Immediately follows the name. */
+} __packed;
+
+#endif /* _LINUX_NTFS_LAYOUT_H */
diff --git a/fs/ntfsplus/lcnalloc.h b/fs/ntfsplus/lcnalloc.h
new file mode 100644
index 000000000000..a1c66b8b73ac
--- /dev/null
+++ b/fs/ntfsplus/lcnalloc.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Exports for NTFS kernel cluster (de)allocation.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_LCNALLOC_H
+#define _LINUX_NTFS_LCNALLOC_H
+
+#include <linux/sched/mm.h>
+
+#include "attrib.h"
+
+enum {
+ FIRST_ZONE = 0, /* For sanity checking. */
+ MFT_ZONE = 0, /* Allocate from $MFT zone. */
+ DATA_ZONE = 1, /* Allocate from $DATA zone. */
+ LAST_ZONE = 1, /* For sanity checking. */
+};
+
+struct runlist_element *ntfs_cluster_alloc(struct ntfs_volume *vol,
+ const s64 start_vcn, const s64 count, const s64 start_lcn,
+ const int zone,
+ const bool is_extension,
+ const bool is_contig,
+ const bool is_dealloc);
+s64 __ntfs_cluster_free(struct ntfs_inode *ni, const s64 start_vcn,
+ s64 count, struct ntfs_attr_search_ctx *ctx, const bool is_rollback);
+
+/**
+ * ntfs_cluster_free - free clusters on an ntfs volume
+ * @ni: ntfs inode whose runlist describes the clusters to free
+ * @start_vcn: vcn in the runlist of @ni at which to start freeing clusters
+ * @count: number of clusters to free or -1 for all clusters
+ * @ctx: active attribute search context if present or NULL if not
+ *
+ * Free @count clusters starting at the cluster @start_vcn in the runlist
+ * described by the ntfs inode @ni.
+ *
+ * If @count is -1, all clusters from @start_vcn to the end of the runlist are
+ * deallocated. Thus, to completely free all clusters in a runlist, use
+ * @start_vcn = 0 and @count = -1.
+ *
+ * If @ctx is specified, it is an active search context of @ni and its base mft
+ * record. This is needed when ntfs_cluster_free() encounters unmapped runlist
+ * fragments and allows their mapping. If you do not have the mft record
+ * mapped, you can specify @ctx as NULL and ntfs_cluster_free() will perform
+ * the necessary mapping and unmapping.
+ *
+ * Note, ntfs_cluster_free() saves the state of @ctx on entry and restores it
+ * before returning. Thus, @ctx will be left pointing to the same attribute on
+ * return as on entry. However, the actual pointers in @ctx may point to
+ * different memory locations on return, so you must remember to reset any
+ * cached pointers from the @ctx, i.e. after the call to ntfs_cluster_free(),
+ * you will probably want to do:
+ * m = ctx->mrec;
+ * a = ctx->attr;
+ * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
+ * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
+ *
+ * Note, ntfs_cluster_free() does not modify the runlist, so you have to remove
+ * from the runlist or mark sparse the freed runs later.
+ *
+ * Return the number of deallocated clusters (not counting sparse ones) on
+ * success and -errno on error.
+ *
+ * WARNING: If @ctx is supplied, regardless of whether success or failure is
+ * returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
+ * is no longer valid, i.e. you need to either call
+ * ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
+ * In that case PTR_ERR(@ctx->mrec) will give you the error code for
+ * why the mapping of the old inode failed.
+ *
+ * Locking: - The runlist described by @ni must be locked for writing on entry
+ * and is locked on return. Note the runlist may be modified when
+ * needed runlist fragments need to be mapped.
+ * - The volume lcn bitmap must be unlocked on entry and is unlocked
+ * on return.
+ * - This function takes the volume lcn bitmap lock for writing and
+ * modifies the bitmap contents.
+ * - If @ctx is NULL, the base mft record of @ni must not be mapped on
+ * entry and it will be left unmapped on return.
+ * - If @ctx is not NULL, the base mft record must be mapped on entry
+ * and it will be left mapped on return.
+ */
+static inline s64 ntfs_cluster_free(struct ntfs_inode *ni, const s64 start_vcn,
+ s64 count, struct ntfs_attr_search_ctx *ctx)
+{
+ return __ntfs_cluster_free(ni, start_vcn, count, ctx, false);
+}
+
+int ntfs_cluster_free_from_rl_nolock(struct ntfs_volume *vol,
+ const struct runlist_element *rl);
+
+/**
+ * ntfs_cluster_free_from_rl - free clusters from runlist
+ * @vol: mounted ntfs volume on which to free the clusters
+ * @rl: runlist describing the clusters to free
+ *
+ * Free all the clusters described by the runlist @rl on the volume @vol. In
+ * the case of an error being returned, at least some of the clusters were not
+ * freed.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - This function takes the volume lcn bitmap lock for writing and
+ * modifies the bitmap contents.
+ * - The caller must have locked the runlist @rl for reading or
+ * writing.
+ */
+static inline int ntfs_cluster_free_from_rl(struct ntfs_volume *vol,
+ const struct runlist_element *rl)
+{
+ int ret;
+ unsigned int memalloc_flags;
+
+ memalloc_flags = memalloc_nofs_save();
+ down_write(&vol->lcnbmp_lock);
+ ret = ntfs_cluster_free_from_rl_nolock(vol, rl);
+ up_write(&vol->lcnbmp_lock);
+ memalloc_nofs_restore(memalloc_flags);
+ return ret;
+}
+
+#endif /* defined _LINUX_NTFS_LCNALLOC_H */
diff --git a/fs/ntfsplus/logfile.h b/fs/ntfsplus/logfile.h
new file mode 100644
index 000000000000..3c7e42425503
--- /dev/null
+++ b/fs/ntfsplus/logfile.h
@@ -0,0 +1,316 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS kernel journal (LogFile) handling.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2000-2005 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_LOGFILE_H
+#define _LINUX_NTFS_LOGFILE_H
+
+#include "layout.h"
+
+/*
+ * Journal (LogFile) organization:
+ *
+ * Two restart areas present in the first two pages (restart pages, one restart
+ * area in each page). When the volume is dismounted they should be identical,
+ * except for the update sequence array which usually has a different update
+ * sequence number.
+ *
+ * These are followed by log records organized in pages headed by a log record
+ * header going up to log file size. Not all pages contain log records when a
+ * volume is first formatted, but as the volume ages, all records will be used.
+ * When the log file fills up, the records at the beginning are purged (by
+ * modifying the oldest_lsn to a higher value presumably) and writing begins
+ * at the beginning of the file. Effectively, the log file is viewed as a
+ * circular entity.
+ *
+ * NOTE: Windows NT, 2000, and XP all use log file version 1.1 but they accept
+ * versions <= 1.x, including 0.-1. (Yes, that is a minus one in there!) We
+ * probably only want to support 1.1 as this seems to be the current version
+ * and we don't know how that differs from the older versions. The only
+ * exception is if the journal is clean as marked by the two restart pages
+ * then it doesn't matter whether we are on an earlier version. We can just
+ * reinitialize the logfile and start again with version 1.1.
+ */
+
+/* Some LogFile related constants. */
+#define MaxLogFileSize 0x100000000ULL
+#define DefaultLogPageSize 4096
+#define MinLogRecordPages 48
+
+/*
+ * Log file restart page header (begins the restart area).
+ */
+struct restart_page_header {
+ __le32 magic; /* The magic is "RSTR". */
+ __le16 usa_ofs; /*
+ * See ntfs_record struct definition in layout.h.
+ * When creating, set this to be immediately after
+ * this header structure (without any alignment).
+ */
+ __le16 usa_count; /* See ntfs_record struct definition in layout.h. */
+
+ __le64 chkdsk_lsn; /*
+ * The last log file sequence number found by chkdsk.
+ * Only used when the magic is changed to "CHKD".
+ * Otherwise this is zero.
+ */
+ __le32 system_page_size; /*
+ * Byte size of system pages when the log file was created,
+ * has to be >= 512 and a power of 2. Use this to calculate
+ * the required size of the usa (usa_count) and add it to
+ * usa_ofs. Then verify that the result is less than
+ * the value of the restart_area_offset.
+ */
+ __le32 log_page_size; /*
+ * Byte size of log file pages, has to be >= 512 and
+ * a power of 2. The default is 4096 and is used
+ * when the system page size is between 4096 and 8192.
+ * Otherwise this is set to the system page size instead.
+ */
+ __le16 restart_area_offset; /*
+ * Byte offset from the start of this header to
+ * the RESTART_AREA. Value has to be aligned to 8-byte
+ * boundary. When creating, set this to be after the usa.
+ */
+ __le16 minor_ver; /* Log file minor version. Only check if major version is 1. */
+ __le16 major_ver; /* Log file major version. We only support version 1.1. */
+/* sizeof() = 30 (0x1e) bytes */
+} __packed;
+
+/*
+ * Constant for the log client indices meaning that there are no client records
+ * in this particular client array. Also inside the client records themselves,
+ * this means that there are no client records preceding or following this one.
+ */
+#define LOGFILE_NO_CLIENT cpu_to_le16(0xffff)
+#define LOGFILE_NO_CLIENT_CPU 0xffff
+
+/*
+ * These are the so far known RESTART_AREA_* flags (16-bit) which contain
+ * information about the log file in which they are present.
+ */
+enum {
+ RESTART_VOLUME_IS_CLEAN = cpu_to_le16(0x0002),
+ RESTART_SPACE_FILLER = cpu_to_le16(0xffff), /* gcc: Force enum bit width to 16. */
+} __packed;
+
+/*
+ * Log file restart area record. The offset of this record is found by adding
+ * the offset of the RESTART_PAGE_HEADER to the restart_area_offset value found
+ * in it. See notes at restart_area_offset above.
+ */
+struct restart_area {
+ __le64 current_lsn; /*
+ * The current, i.e. last LSN inside the log
+ * when the restart area was last written.
+ * This happens often but what is the interval?
+ * Is it a fixed interval or is it every time
+ * a checkpoint is written or something else?
+ * On create set to 0.
+ */
+ __le16 log_clients; /*
+ * Number of log client records in the array of
+ * log client records which follows this
+ * restart area. Must be 1.
+ */
+ __le16 client_free_list; /*
+ * The index of the first free log client record
+ * in the array of log client records.
+ * LOGFILE_NO_CLIENT means that there are no
+ * free log client records in the array.
+ * If != LOGFILE_NO_CLIENT, check that
+ * log_clients > client_free_list. On Win2k
+ * and presumably earlier, on a clean volume
+ * this is != LOGFILE_NO_CLIENT, and it should
+ * be 0, i.e. the first (and only) client
+ * record is free and thus the logfile is
+ * closed and hence clean. A dirty volume
+ * would have left the logfile open and hence
+ * this would be LOGFILE_NO_CLIENT. On WinXP
+ * and presumably later, the logfile is always
+ * open, even on clean shutdown so this should
+ * always be LOGFILE_NO_CLIENT.
+ */
+ __le16 client_in_use_list; /*
+ * The index of the first in-use log client
+ * record in the array of log client records.
+ * LOGFILE_NO_CLIENT means that there are no
+ * in-use log client records in the array. If
+ * != LOGFILE_NO_CLIENT check that log_clients
+ * > client_in_use_list. On Win2k and
+ * presumably earlier, on a clean volume this
+ * is LOGFILE_NO_CLIENT, i.e. there are no
+ * client records in use and thus the logfile
+ * is closed and hence clean. A dirty volume
+ * would have left the logfile open and hence
+ * this would be != LOGFILE_NO_CLIENT, and it
+ * should be 0, i.e. the first (and only)
+ * client record is in use. On WinXP and
+ * presumably later, the logfile is always
+ * open, even on clean shutdown so this should
+ * always be 0.
+ */
+ __le16 flags; /*
+ * Flags modifying LFS behaviour. On Win2k
+ * and presumably earlier this is always 0. On
+ * WinXP and presumably later, if the logfile
+ * was shutdown cleanly, the second bit,
+ * RESTART_VOLUME_IS_CLEAN, is set. This bit
+ * is cleared when the volume is mounted by
+ * WinXP and set when the volume is dismounted,
+ * thus if the logfile is dirty, this bit is
+ * clear. Thus we don't need to check the
+ * Windows version to determine if the logfile
+ * is clean. Instead if the logfile is closed,
+ * we know it must be clean. If it is open and
+ * this bit is set, we also know it must be
+ * clean. If on the other hand the logfile is
+ * open and this bit is clear, we can be almost
+ * certain that the logfile is dirty.
+ */
+ __le32 seq_number_bits; /*
+ * How many bits to use for the sequence
+ * number. This is calculated as 67 - the
+ * number of bits required to store the logfile
+ * size in bytes and this can be used in with
+ * the specified file_size as a consistency
+ * check.
+ */
+ __le16 restart_area_length; /*
+ * Length of the restart area including the
+ * client array. Following checks required if
+ * version matches. Otherwise, skip them.
+ * restart_area_offset + restart_area_length
+ * has to be <= system_page_size. Also,
+ * restart_area_length has to be >=
+ * client_array_offset + (log_clients *
+ * sizeof(log client record)).
+ */
+ __le16 client_array_offset; /*
+ * Offset from the start of this record to
+ * the first log client record if versions are
+ * matched. When creating, set this to be
+ * after this restart area structure, aligned
+ * to an 8-byte boundary. If the versions do not
+ * match, this is ignored and the offset is
+ * assumed to be (sizeof(RESTART_AREA) + 7) &
+ * ~7, i.e. rounded up to first 8-byte
+ * boundary. Either way, client_array_offset
+ * has to be aligned to an 8-byte boundary.
+ * Also, restart_area_offset +
+ * client_array_offset has to be <= 510.
+ * Finally, client_array_offset + (log_clients *
+ * sizeof(log client record)) has to be <=
+ * system_page_size. On Win2k and presumably
+ * earlier, this is 0x30, i.e. immediately
+ * following this record. On WinXP and
+ * presumably later, this is 0x40, i.e. there
+ * are 16 extra bytes between this record and
+ * the client array. This probably means that
+ * the RESTART_AREA record is actually bigger
+ * in WinXP and later.
+ */
+ __le64 file_size; /*
+ * Usable byte size of the log file. If the
+ * restart_area_offset + the offset of the
+ * file_size are > 510 then corruption has
+ * occurred. This is the very first check when
+ * starting with the restart_area as if it
+ * fails it means that some of the above values
+ * will be corrupted by the multi sector
+ * transfer protection. The file_size has to
+ * be rounded down to be a multiple of the
+ * log_page_size in the RESTART_PAGE_HEADER and
+ * then it has to be at least big enough to
+ * store the two restart pages and 48 (0x30)
+ * log record pages.
+ */
+ __le32 last_lsn_data_length; /*
+ * Length of data of last LSN, not including
+ * the log record header. On create set to 0.
+ */
+ __le16 log_record_header_length; /*
+ * Byte size of the log record header.
+ * If the version matches then check that the
+ * value of log_record_header_length is a
+ * multiple of 8,
+ * i.e. (log_record_header_length + 7) & ~7 ==
+ * log_record_header_length. When creating set
+ * it to sizeof(LOG_RECORD_HEADER), aligned to
+ * 8 bytes.
+ */
+ __le16 log_page_data_offset; /*
+ * Offset to the start of data in a log record
+ * page. Must be a multiple of 8. On create
+ * set it to immediately after the update sequence
+ * array of the log record page.
+ */
+ __le32 restart_log_open_count; /*
+ * A counter that gets incremented every time
+ * the logfile is restarted which happens at mount
+ * time when the logfile is opened. When creating
+ * set to a random value. Win2k sets it to the low
+ * 32 bits of the current system time in NTFS format
+ * (see time.h).
+ */
+ __le32 reserved; /* Reserved/alignment to 8-byte boundary. */
+/* sizeof() = 48 (0x30) bytes */
+} __packed;
+
+/*
+ * Log client record. The offset of this record is found by adding the offset
+ * of the RESTART_AREA to the client_array_offset value found in it.
+ */
+struct log_client_record {
+ __le64 oldest_lsn; /*
+ * Oldest LSN needed by this client. On create
+ * set to 0.
+ */
+ __le64 client_restart_lsn; /*
+ * LSN at which this client needs to restart
+ * the volume, i.e. the current position within
+ * the log file. At present, if clean this
+ * should = current_lsn in restart area but it
+ * probably also = current_lsn when dirty most
+ * of the time. At create set to 0.
+ */
+ __le16 prev_client; /*
+ * The offset to the previous log client record
+ * in the array of log client records.
+ * LOGFILE_NO_CLIENT means there is no previous
+ * client record, i.e. this is the first one.
+ * This is always LOGFILE_NO_CLIENT.
+ */
+ __le16 next_client; /*
+ * The offset to the next log client record in
+ * the array of log client records.
+ * LOGFILE_NO_CLIENT means there are no next
+ * client records, i.e. this is the last one.
+ * This is always LOGFILE_NO_CLIENT.
+ */
+ __le16 seq_number; /*
+ * On Win2k and presumably earlier, this is set
+ * to zero every time the logfile is restarted
+ * and it is incremented when the logfile is
+ * closed at dismount time. Thus it is 0 when
+ * dirty and 1 when clean. On WinXP and
+ * presumably later, this is always 0.
+ */
+ u8 reserved[6]; /* Reserved/alignment. */
+ __le32 client_name_length; /* Length of client name in bytes. Should always be 8. */
+ __le16 client_name[64]; /*
+ * Name of the client in Unicode.
+ * Should always be "NTFS" with the remaining bytes
+ * set to 0.
+ */
+/* sizeof() = 160 (0xa0) bytes */
+} __packed;
+
+bool ntfs_check_logfile(struct inode *log_vi,
+ struct restart_page_header **rp);
+bool ntfs_empty_logfile(struct inode *log_vi);
+#endif /* _LINUX_NTFS_LOGFILE_H */
diff --git a/fs/ntfsplus/mft.h b/fs/ntfsplus/mft.h
new file mode 100644
index 000000000000..19c05ec2f278
--- /dev/null
+++ b/fs/ntfsplus/mft.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for mft record handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_MFT_H
+#define _LINUX_NTFS_MFT_H
+
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+
+#include "inode.h"
+
+struct mft_record *map_mft_record(struct ntfs_inode *ni);
+void unmap_mft_record(struct ntfs_inode *ni);
+struct mft_record *map_extent_mft_record(struct ntfs_inode *base_ni, u64 mref,
+ struct ntfs_inode **ntfs_ino);
+
+static inline void unmap_extent_mft_record(struct ntfs_inode *ni)
+{
+ unmap_mft_record(ni);
+}
+
+void __mark_mft_record_dirty(struct ntfs_inode *ni);
+
+/**
+ * mark_mft_record_dirty - set the mft record and the page containing it dirty
+ * @ni: ntfs inode describing the mapped mft record
+ *
+ * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
+ * as well as the page containing the mft record, dirty. Also, mark the base
+ * vfs inode dirty. This ensures that any changes to the mft record are
+ * written out to disk.
+ *
+ * NOTE: Do not do anything if the mft record is already marked dirty.
+ */
+static inline void mark_mft_record_dirty(struct ntfs_inode *ni)
+{
+ if (!NInoTestSetDirty(ni))
+ __mark_mft_record_dirty(ni);
+}
+
+int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const unsigned long mft_no,
+ struct mft_record *m);
+int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int sync);
+
+/**
+ * write_mft_record - write out a mapped (extent) mft record
+ * @ni: ntfs inode describing the mapped (extent) mft record
+ * @m: mapped (extent) mft record to write
+ * @sync: if true, wait for i/o completion
+ *
+ * This is just a wrapper for write_mft_record_nolock() (see mft.c), which
+ * locks the page for the duration of the write. This ensures that there are
+ * no race conditions between writing the mft record via the dirty inode code
+ * paths and via the page cache write back code paths or between writing
+ * neighbouring mft records residing in the same page.
+ *
+ * Locking the page also serializes us against ->read_folio() if the page is not
+ * uptodate.
+ *
+ * On success, clean the mft record and return 0. On error, leave the mft
+ * record dirty and return -errno.
+ */
+static inline int write_mft_record(struct ntfs_inode *ni, struct mft_record *m, int sync)
+{
+ struct folio *folio = ni->folio;
+ int err;
+
+ BUG_ON(!folio);
+ folio_lock(folio);
+ err = write_mft_record_nolock(ni, m, sync);
+ folio_unlock(folio);
+
+ return err;
+}
+
+bool ntfs_may_write_mft_record(struct ntfs_volume *vol,
+ const unsigned long mft_no, const struct mft_record *m,
+ struct ntfs_inode **locked_ni);
+int ntfs_mft_record_alloc(struct ntfs_volume *vol, const int mode,
+ struct ntfs_inode **ni, struct ntfs_inode *base_ni,
+ struct mft_record **ni_mrec);
+int ntfs_mft_record_free(struct ntfs_volume *vol, struct ntfs_inode *ni);
+int ntfs_mft_records_write(const struct ntfs_volume *vol, const u64 mref,
+ const s64 count, struct mft_record *b);
+int ntfs_mft_record_check(const struct ntfs_volume *vol, struct mft_record *m,
+ unsigned long mft_no);
+
+#endif /* _LINUX_NTFS_MFT_H */
diff --git a/fs/ntfsplus/misc.h b/fs/ntfsplus/misc.h
new file mode 100644
index 000000000000..3952c6c18bd0
--- /dev/null
+++ b/fs/ntfsplus/misc.h
@@ -0,0 +1,218 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * NTFS kernel debug support. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ */
+
+#ifndef _LINUX_NTFS_MISC_H
+#define _LINUX_NTFS_MISC_H
+
+#include <linux/fs.h>
+#include <linux/vmalloc.h>
+#include <linux/highmem.h>
+
+#include "runlist.h"
+
+#ifdef DEBUG
+
+extern int debug_msgs;
+
+extern __printf(4, 5)
+void __ntfs_debug(const char *file, int line, const char *function,
+ const char *format, ...);
+/**
+ * ntfs_debug - write a debug level message to syslog
+ * @f: a printf format string containing the message
+ * @...: the variables to substitute into @f
+ *
+ * ntfs_debug() writes a DEBUG level message to the syslog but only if the
+ * driver was compiled with -DDEBUG. Otherwise, the call turns into a NOP.
+ */
+#define ntfs_debug(f, a...) \
+ __ntfs_debug(__FILE__, __LINE__, __func__, f, ##a)
+
+void ntfs_debug_dump_runlist(const struct runlist_element *rl);
+
+#else /* !DEBUG */
+
+#define ntfs_debug(fmt, ...) \
+do { \
+ if (0) \
+ no_printk(fmt, ##__VA_ARGS__); \
+} while (0)
+
+#define ntfs_debug_dump_runlist(rl) \
+do { \
+ if (0) \
+ (void)rl; \
+} while (0)
+
+#endif /* !DEBUG */
+
+extern __printf(3, 4)
+void __ntfs_warning(const char *function, const struct super_block *sb,
+ const char *fmt, ...);
+#define ntfs_warning(sb, f, a...) __ntfs_warning(__func__, sb, f, ##a)
+
+extern __printf(3, 4)
+void __ntfs_error(const char *function, struct super_block *sb,
+ const char *fmt, ...);
+#define ntfs_error(sb, f, a...) __ntfs_error(__func__, sb, f, ##a)
+
+void ntfs_handle_error(struct super_block *sb);
+
+#if defined(DEBUG) && defined(CONFIG_SYSCTL)
+int ntfs_sysctl(int add);
+#else
+/* Just return success. */
+static inline int ntfs_sysctl(int add)
+{
+ return 0;
+}
+#endif
+
+#define NTFS_TIME_OFFSET ((s64)(369 * 365 + 89) * 24 * 3600 * 10000000)
+
+/**
+ * utc2ntfs - convert Linux UTC time to NTFS time
+ * @ts: Linux UTC time to convert to NTFS time
+ *
+ * Convert the Linux UTC time @ts to its corresponding NTFS time and return
+ * that in little endian format.
+ *
+ * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
+ * and a long tv_nsec where tv_sec is the number of 1-second intervals since
+ * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
+ * intervals since the value of tv_sec.
+ *
+ * NTFS uses Microsoft's standard time format which is stored in a s64 and is
+ * measured as the number of 100-nano-second intervals since 1st January 1601,
+ * 00:00:00 UTC.
+ */
+static inline __le64 utc2ntfs(const struct timespec64 ts)
+{
+ /*
+ * Convert the seconds to 100ns intervals, add the nano-seconds
+ * converted to 100ns intervals, and then add the NTFS time offset.
+ */
+ return cpu_to_le64((s64)ts.tv_sec * 10000000 + ts.tv_nsec / 100 +
+ NTFS_TIME_OFFSET);
+}
+
+/**
+ * ntfs2utc - convert NTFS time to Linux time
+ * @time: NTFS time (little endian) to convert to Linux UTC
+ *
+ * Convert the little endian NTFS time @time to its corresponding Linux UTC
+ * time and return that in cpu format.
+ *
+ * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
+ * and a long tv_nsec where tv_sec is the number of 1-second intervals since
+ * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
+ * intervals since the value of tv_sec.
+ *
+ * NTFS uses Microsoft's standard time format which is stored in a s64 and is
+ * measured as the number of 100 nano-second intervals since 1st January 1601,
+ * 00:00:00 UTC.
+ */
+static inline struct timespec64 ntfs2utc(const __le64 time)
+{
+ struct timespec64 ts;
+
+ /* Subtract the NTFS time offset. */
+ u64 t = (u64)(le64_to_cpu(time) - NTFS_TIME_OFFSET);
+ /*
+ * Convert the time to 1-second intervals and the remainder to
+ * 1-nano-second intervals.
+ */
+ ts.tv_nsec = do_div(t, 10000000) * 100;
+ ts.tv_sec = t;
+ return ts;
+}
+
+/**
+ * __ntfs_malloc - allocate memory in multiples of pages
+ * @size: number of bytes to allocate
+ * @gfp_mask: extra flags for the allocator
+ *
+ * Internal function. You probably want ntfs_malloc_nofs()...
+ *
+ * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
+ * returns a pointer to the allocated memory.
+ *
+ * If there was insufficient memory to complete the request, return NULL.
+ * Depending on @gfp_mask the allocation may be guaranteed to succeed.
+ */
+static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
+{
+ if (likely(size <= PAGE_SIZE)) {
+ if (!size)
+ return NULL;
+ /* kmalloc() has per-CPU caches so is faster for now. */
+ return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
+ /* return (void *)__get_free_page(gfp_mask); */
+ }
+ if (likely((size >> PAGE_SHIFT) < totalram_pages()))
+ return __vmalloc(size, gfp_mask);
+ return NULL;
+}
+
+/**
+ * ntfs_malloc_nofs - allocate memory in multiples of pages
+ * @size: number of bytes to allocate
+ *
+ * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
+ * returns a pointer to the allocated memory.
+ *
+ * If there was insufficient memory to complete the request, return NULL.
+ */
+static inline void *ntfs_malloc_nofs(unsigned long size)
+{
+ return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM | __GFP_ZERO);
+}
+
+/**
+ * ntfs_malloc_nofs_nofail - allocate memory in multiples of pages
+ * @size: number of bytes to allocate
+ *
+ * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
+ * returns a pointer to the allocated memory.
+ *
+ * This function guarantees that the allocation will succeed. It will sleep
+ * for as long as it takes to complete the allocation.
+ *
+ * NULL is only returned for a zero @size or a @size exceeding total RAM.
+ */
+static inline void *ntfs_malloc_nofs_nofail(unsigned long size)
+{
+ return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM | __GFP_NOFAIL);
+}
+
+static inline void ntfs_free(void *addr)
+{
+ kvfree(addr);
+}
+
+static inline void *ntfs_realloc_nofs(void *addr, unsigned long new_size,
+ unsigned long cpy_size)
+{
+ void *pnew_addr;
+
+ if (new_size == 0) {
+ ntfs_free(addr);
+ return NULL;
+ }
+
+ pnew_addr = ntfs_malloc_nofs(new_size);
+ if (pnew_addr == NULL)
+ return NULL;
+ if (addr) {
+ cpy_size = min(cpy_size, new_size);
+ if (cpy_size)
+ memcpy(pnew_addr, addr, cpy_size);
+ ntfs_free(addr);
+ }
+ return pnew_addr;
+}
+#endif /* _LINUX_NTFS_MISC_H */
diff --git a/fs/ntfsplus/ntfs.h b/fs/ntfsplus/ntfs.h
new file mode 100644
index 000000000000..abcd65860de7
--- /dev/null
+++ b/fs/ntfsplus/ntfs.h
@@ -0,0 +1,172 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for NTFS Linux kernel driver.
+ *
+ * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (C) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_H
+#define _LINUX_NTFS_H
+
+#include <linux/stddef.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/compiler.h>
+#include <linux/fs.h>
+#include <linux/nls.h>
+#include <linux/smp.h>
+#include <linux/pagemap.h>
+#include <linux/uidgid.h>
+
+#include "volume.h"
+#include "layout.h"
+#include "inode.h"
+
+#define NTFS_DEF_PREALLOC_SIZE (64*1024*1024)
+
+#define STANDARD_COMPRESSION_UNIT 4
+#define MAX_COMPRESSION_CLUSTER_SIZE 4096
+
+#define UCHAR_T_SIZE_BITS 1
+
+enum {
+ NTFS_BLOCK_SIZE = 512,
+ NTFS_BLOCK_SIZE_BITS = 9,
+ NTFS_SB_MAGIC = 0x5346544e, /* 'NTFS' */
+ NTFS_MAX_NAME_LEN = 255,
+};
+
+enum {
+ CASE_SENSITIVE = 0,
+ IGNORE_CASE = 1,
+};
+
+/* Global variables. */
+
+/* Slab caches (from super.c). */
+extern struct kmem_cache *ntfs_name_cache;
+extern struct kmem_cache *ntfs_inode_cache;
+extern struct kmem_cache *ntfs_big_inode_cache;
+extern struct kmem_cache *ntfs_attr_ctx_cache;
+extern struct kmem_cache *ntfs_index_ctx_cache;
+
+/* The various operations structs defined throughout the driver files. */
+extern const struct address_space_operations ntfs_normal_aops;
+extern const struct address_space_operations ntfs_compressed_aops;
+extern const struct address_space_operations ntfs_mst_aops;
+
+extern const struct file_operations ntfs_file_ops;
+extern const struct inode_operations ntfs_file_inode_ops;
+extern const struct inode_operations ntfs_symlink_inode_operations;
+extern const struct inode_operations ntfs_special_inode_operations;
+
+extern const struct file_operations ntfs_dir_ops;
+extern const struct inode_operations ntfs_dir_inode_ops;
+
+extern const struct file_operations ntfs_empty_file_ops;
+extern const struct inode_operations ntfs_empty_inode_ops;
+
+extern const struct export_operations ntfs_export_ops;
+
+/**
+ * NTFS_SB - return the ntfs volume given a vfs super block
+ * @sb: VFS super block
+ *
+ * NTFS_SB() returns the ntfs volume associated with the VFS super block @sb.
+ */
+static inline struct ntfs_volume *NTFS_SB(struct super_block *sb)
+{
+ return sb->s_fs_info;
+}
+
+/* Declarations of functions and global variables. */
+
+/* From fs/ntfs/compress.c */
+int ntfs_read_compressed_block(struct folio *folio);
+int allocate_compression_buffers(void);
+void free_compression_buffers(void);
+int ntfs_compress_write(struct ntfs_inode *ni, loff_t pos, size_t count,
+ struct iov_iter *from);
+
+/* From fs/ntfs/super.c */
+#define default_upcase_len 0x10000
+extern struct mutex ntfs_lock;
+
+struct option_t {
+ int val;
+ char *str;
+};
+extern const struct option_t on_errors_arr[];
+int ntfs_set_volume_flags(struct ntfs_volume *vol, __le16 flags);
+int ntfs_clear_volume_flags(struct ntfs_volume *vol, __le16 flags);
+
+/* From fs/ntfs/mst.c */
+int post_read_mst_fixup(struct ntfs_record *b, const u32 size);
+int pre_write_mst_fixup(struct ntfs_record *b, const u32 size);
+void post_write_mst_fixup(struct ntfs_record *b);
+
+/* From fs/ntfs/unistr.c */
+bool ntfs_are_names_equal(const __le16 *s1, size_t s1_len,
+ const __le16 *s2, size_t s2_len,
+ const u32 ic,
+ const __le16 *upcase, const u32 upcase_size);
+int ntfs_collate_names(const __le16 *name1, const u32 name1_len,
+ const __le16 *name2, const u32 name2_len,
+ const int err_val, const u32 ic,
+ const __le16 *upcase, const u32 upcase_len);
+int ntfs_ucsncmp(const __le16 *s1, const __le16 *s2, size_t n);
+int ntfs_ucsncasecmp(const __le16 *s1, const __le16 *s2, size_t n,
+ const __le16 *upcase, const u32 upcase_size);
+int ntfs_file_compare_values(const struct file_name_attr *file_name_attr1,
+ const struct file_name_attr *file_name_attr2,
+ const int err_val, const u32 ic,
+ const __le16 *upcase, const u32 upcase_len);
+int ntfs_nlstoucs(const struct ntfs_volume *vol, const char *ins,
+ const int ins_len, __le16 **outs, int max_name_len);
+int ntfs_ucstonls(const struct ntfs_volume *vol, const __le16 *ins,
+ const int ins_len, unsigned char **outs, int outs_len);
+__le16 *ntfs_ucsndup(const __le16 *s, u32 maxlen);
+bool ntfs_names_are_equal(const __le16 *s1, size_t s1_len,
+ const __le16 *s2, size_t s2_len,
+ const u32 ic,
+ const __le16 *upcase, const u32 upcase_size);
+int ntfs_force_shutdown(struct super_block *sb, u32 flags);
+long ntfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+#ifdef CONFIG_COMPAT
+long ntfs_compat_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long arg);
+#endif
+
+/* From fs/ntfs/upcase.c */
+__le16 *generate_default_upcase(void);
+
+static inline int ntfs_ffs(int x)
+{
+ int r = 1;
+
+ if (!x)
+ return 0;
+ if (!(x & 0xffff)) {
+ x >>= 16;
+ r += 16;
+ }
+ if (!(x & 0xff)) {
+ x >>= 8;
+ r += 8;
+ }
+ if (!(x & 0xf)) {
+ x >>= 4;
+ r += 4;
+ }
+ if (!(x & 3)) {
+ x >>= 2;
+ r += 2;
+ }
+ if (!(x & 1))
+ r += 1;
+ return r;
+}
+
+#endif /* _LINUX_NTFS_H */
diff --git a/fs/ntfsplus/ntfs_iomap.h b/fs/ntfsplus/ntfs_iomap.h
new file mode 100644
index 000000000000..b1a5d55fa077
--- /dev/null
+++ b/fs/ntfsplus/ntfs_iomap.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_IOMAP_H
+#define _LINUX_NTFS_IOMAP_H
+
+#include <linux/pagemap.h>
+#include <linux/iomap.h>
+
+#include "volume.h"
+#include "inode.h"
+
+extern const struct iomap_ops ntfs_write_iomap_ops;
+extern const struct iomap_ops ntfs_read_iomap_ops;
+extern const struct iomap_ops ntfs_page_mkwrite_iomap_ops;
+extern const struct iomap_ops ntfs_dio_iomap_ops;
+extern const struct iomap_writeback_ops ntfs_writeback_ops;
+extern const struct iomap_write_ops ntfs_iomap_folio_ops;
+int ntfs_zeroed_clusters(struct inode *vi, s64 lcn, s64 num);
+#endif /* _LINUX_NTFS_IOMAP_H */
diff --git a/fs/ntfsplus/reparse.h b/fs/ntfsplus/reparse.h
new file mode 100644
index 000000000000..a1f3829a89da
--- /dev/null
+++ b/fs/ntfsplus/reparse.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2008-2021 Jean-Pierre Andre
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+extern __le16 reparse_index_name[];
+
+unsigned int ntfs_make_symlink(struct ntfs_inode *ni);
+unsigned int ntfs_reparse_tag_dt_types(struct ntfs_volume *vol, unsigned long mref);
+int ntfs_reparse_set_wsl_symlink(struct ntfs_inode *ni,
+ const __le16 *target, int target_len);
+int ntfs_reparse_set_wsl_not_symlink(struct ntfs_inode *ni, umode_t mode);
+int ntfs_delete_reparse_index(struct ntfs_inode *ni);
+int ntfs_remove_ntfs_reparse_data(struct ntfs_inode *ni);
diff --git a/fs/ntfsplus/runlist.h b/fs/ntfsplus/runlist.h
new file mode 100644
index 000000000000..c9d88116371d
--- /dev/null
+++ b/fs/ntfsplus/runlist.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for runlist handling in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2005 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_RUNLIST_H
+#define _LINUX_NTFS_RUNLIST_H
+
+#include "volume.h"
+
+/**
+ * runlist_element - in memory vcn to lcn mapping array element
+ * @vcn: starting vcn of the current array element
+ * @lcn: starting lcn of the current array element
+ * @length: length in clusters of the current array element
+ *
+ * The last vcn (in fact the last vcn + 1) is reached when length == 0.
+ *
+ * When @lcn is negative the run is not physically allocated; see the LCN_*
+ * constants below (LCN_HOLE marks sparse runs, LCN_DELALLOC delayed allocation).
+ */
+struct runlist_element { /* In memory vcn to lcn mapping structure element. */
+ s64 vcn; /* vcn = Starting virtual cluster number. */
+ s64 lcn; /* lcn = Starting logical cluster number. */
+ s64 length; /* Run length in clusters. */
+};
+
+/**
+ * runlist - in memory vcn to lcn mapping array including a read/write lock
+ * @rl: pointer to an array of runlist elements
+ * @lock: read/write semaphore for serializing access to @rl
+ * @count: number of elements in the @rl array
+ */
+struct runlist {
+ struct runlist_element *rl;
+ struct rw_semaphore lock;
+ size_t count;
+};
+
+static inline void ntfs_init_runlist(struct runlist *rl)
+{
+ rl->rl = NULL;
+ init_rwsem(&rl->lock);
+ rl->count = 0;
+}
+
+enum {
+ LCN_DELALLOC = -1,
+ LCN_HOLE = -2,
+ LCN_RL_NOT_MAPPED = -3,
+ LCN_ENOENT = -4,
+ LCN_ENOMEM = -5,
+ LCN_EIO = -6,
+ LCN_EINVAL = -7,
+};
+
+struct runlist_element *ntfs_runlists_merge(struct runlist *d_runlist,
+ struct runlist_element *srl, size_t s_rl_count,
+ size_t *new_rl_count);
+struct runlist_element *ntfs_mapping_pairs_decompress(const struct ntfs_volume *vol,
+ const struct attr_record *attr, struct runlist *old_runlist,
+ size_t *new_rl_count);
+s64 ntfs_rl_vcn_to_lcn(const struct runlist_element *rl, const s64 vcn);
+struct runlist_element *ntfs_rl_find_vcn_nolock(struct runlist_element *rl, const s64 vcn);
+int ntfs_get_size_for_mapping_pairs(const struct ntfs_volume *vol,
+ const struct runlist_element *rl, const s64 first_vcn,
+ const s64 last_vcn, int max_mp_size);
+int ntfs_mapping_pairs_build(const struct ntfs_volume *vol, s8 *dst,
+ const int dst_len, const struct runlist_element *rl,
+ const s64 first_vcn, const s64 last_vcn, s64 *const stop_vcn,
+ struct runlist_element **stop_rl, unsigned int *de_cluster_count);
+int ntfs_rl_truncate_nolock(const struct ntfs_volume *vol,
+ struct runlist *const runlist, const s64 new_length);
+int ntfs_rl_sparse(struct runlist_element *rl);
+s64 ntfs_rl_get_compressed_size(struct ntfs_volume *vol, struct runlist_element *rl);
+struct runlist_element *ntfs_rl_insert_range(struct runlist_element *dst_rl, int dst_cnt,
+ struct runlist_element *src_rl, int src_cnt, size_t *new_cnt);
+struct runlist_element *ntfs_rl_punch_hole(struct runlist_element *dst_rl, int dst_cnt,
+ s64 start_vcn, s64 len, struct runlist_element **punch_rl,
+ size_t *new_rl_cnt);
+struct runlist_element *ntfs_rl_collapse_range(struct runlist_element *dst_rl, int dst_cnt,
+ s64 start_vcn, s64 len, struct runlist_element **punch_rl,
+ size_t *new_rl_cnt);
+struct runlist_element *ntfs_rl_realloc(struct runlist_element *rl, int old_size,
+ int new_size);
+#endif /* _LINUX_NTFS_RUNLIST_H */
diff --git a/fs/ntfsplus/volume.h b/fs/ntfsplus/volume.h
new file mode 100644
index 000000000000..0bc8df650225
--- /dev/null
+++ b/fs/ntfsplus/volume.h
@@ -0,0 +1,241 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Defines for volume structures in NTFS Linux kernel driver.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2006 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _LINUX_NTFS_VOLUME_H
+#define _LINUX_NTFS_VOLUME_H
+
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/uidgid.h>
+#include <linux/workqueue.h>
+#include <linux/errseq.h>
+
+#include "layout.h"
+
+#define NTFS_VOL_UID BIT(1)
+#define NTFS_VOL_GID BIT(2)
+
+/*
+ * The NTFS in memory super block structure.
+ */
+struct ntfs_volume {
+ /* Device specifics. */
+ struct super_block *sb; /* Pointer back to the super_block. */
+ s64 nr_blocks; /*
+ * Number of sb->s_blocksize bytes
+ * sized blocks on the device.
+ */
+ /* Configuration provided by user at mount time. */
+ unsigned long flags; /* Miscellaneous flags, see below. */
+ kuid_t uid; /* uid that files will be mounted as. */
+ kgid_t gid; /* gid that files will be mounted as. */
+ umode_t fmask; /* The mask for file permissions. */
+ umode_t dmask; /* The mask for directory permissions. */
+ u8 mft_zone_multiplier; /* Initial mft zone multiplier. */
+ u8 on_errors; /* What to do on filesystem errors. */
+ errseq_t wb_err;
+ /* NTFS bootsector provided information. */
+ u16 sector_size; /* in bytes */
+ u8 sector_size_bits; /* log2(sector_size) */
+ u32 cluster_size; /* in bytes */
+ u32 cluster_size_mask; /* cluster_size - 1 */
+ u8 cluster_size_bits; /* log2(cluster_size) */
+ u32 mft_record_size; /* in bytes */
+ u32 mft_record_size_mask; /* mft_record_size - 1 */
+ u8 mft_record_size_bits; /* log2(mft_record_size) */
+ u32 index_record_size; /* in bytes */
+ u32 index_record_size_mask; /* index_record_size - 1 */
+ u8 index_record_size_bits; /* log2(index_record_size) */
+ s64 nr_clusters; /*
+ * Volume size in clusters == number of
+ * bits in lcn bitmap.
+ */
+ s64 mft_lcn; /* Cluster location of mft data. */
+ s64 mftmirr_lcn; /* Cluster location of copy of mft. */
+ u64 serial_no; /* The volume serial number. */
+ /* Mount specific NTFS information. */
+ u32 upcase_len; /* Number of entries in upcase[]. */
+ __le16 *upcase; /* The upcase table. */
+
+ s32 attrdef_size; /* Size of the attribute definition table in bytes. */
+ struct attr_def *attrdef; /*
+ * Table of attribute definitions.
+ * Obtained from FILE_AttrDef.
+ */
+
+ /* Variables used by the cluster and mft allocators. */
+ s64 mft_data_pos; /*
+ * Mft record number at which to
+ * allocate the next mft record.
+ */
+ s64 mft_zone_start; /* First cluster of the mft zone. */
+ s64 mft_zone_end; /* First cluster beyond the mft zone. */
+ s64 mft_zone_pos; /* Current position in the mft zone. */
+ s64 data1_zone_pos; /* Current position in the first data zone. */
+ s64 data2_zone_pos; /* Current position in the second data zone. */
+
+ struct inode *mft_ino; /* The VFS inode of $MFT. */
+
+ struct inode *mftbmp_ino; /* Attribute inode for $MFT/$BITMAP. */
+ struct rw_semaphore mftbmp_lock; /*
+ * Lock for serializing accesses to the
+ * mft record bitmap ($MFT/$BITMAP).
+ */
+ struct inode *mftmirr_ino; /* The VFS inode of $MFTMirr. */
+ int mftmirr_size; /* Size of mft mirror in mft records. */
+
+ struct inode *logfile_ino; /* The VFS inode of LogFile. */
+
+ struct inode *lcnbmp_ino; /* The VFS inode of $Bitmap. */
+ struct rw_semaphore lcnbmp_lock; /*
+ * Lock for serializing accesses to the
+ * cluster bitmap ($Bitmap/$DATA).
+ */
+
+ struct inode *vol_ino; /* The VFS inode of $Volume. */
+ __le16 vol_flags; /* Volume flags. */
+ u8 major_ver; /* Ntfs major version of volume. */
+ u8 minor_ver; /* Ntfs minor version of volume. */
+
+ struct inode *root_ino; /* The VFS inode of the root directory. */
+ struct inode *secure_ino; /*
+ * The VFS inode of $Secure (NTFS3.0+
+ * only, otherwise NULL).
+ */
+ struct inode *extend_ino; /*
+ * The VFS inode of $Extend (NTFS3.0+
+ * only, otherwise NULL).
+ */
+ /* $Quota stuff is NTFS3.0+ specific. Unused/NULL otherwise. */
+ struct inode *quota_ino; /* The VFS inode of $Quota. */
+ struct inode *quota_q_ino; /* Attribute inode for $Quota/$Q. */
+ struct nls_table *nls_map;
+ bool nls_utf8;
+ wait_queue_head_t free_waitq;
+
+ atomic64_t free_clusters; /* Track the number of free clusters */
+ atomic64_t free_mft_records; /* Track the free mft records */
+ atomic64_t dirty_clusters;
+ u8 sparse_compression_unit;
+ unsigned int *lcn_empty_bits_per_page;
+ struct work_struct precalc_work;
+ loff_t preallocated_size;
+};
+
+/*
+ * Defined bits for the flags field in the ntfs_volume structure.
+ */
+enum {
+ NV_Errors, /* 1: Volume has errors, prevent remount rw. */
+ NV_ShowSystemFiles, /* 1: Return system files in ntfs_readdir(). */
+ NV_CaseSensitive, /*
+ * 1: Treat file names as case sensitive and
+ * create filenames in the POSIX namespace.
+ * Otherwise be case insensitive but still
+ * create file names in POSIX namespace.
+ */
+ NV_LogFileEmpty, /* 1: LogFile journal is empty. */
+ NV_QuotaOutOfDate, /* 1: Quota is out of date. */
+ NV_UsnJrnlStamped, /* 1: UsnJrnl has been stamped. */
+ NV_ReadOnly,
+ NV_Compression,
+ NV_FreeClusterKnown,
+ NV_Shutdown,
+};
+
+/*
+ * Macro tricks to expand the NVolFoo(), NVolSetFoo(), and NVolClearFoo()
+ * functions.
+ */
+#define DEFINE_NVOL_BIT_OPS(flag) \
+static inline int NVol##flag(struct ntfs_volume *vol) \
+{ \
+ return test_bit(NV_##flag, &(vol)->flags); \
+} \
+static inline void NVolSet##flag(struct ntfs_volume *vol) \
+{ \
+ set_bit(NV_##flag, &(vol)->flags); \
+} \
+static inline void NVolClear##flag(struct ntfs_volume *vol) \
+{ \
+ clear_bit(NV_##flag, &(vol)->flags); \
+}
+
+/* Emit the ntfs volume bitops functions. */
+DEFINE_NVOL_BIT_OPS(Errors)
+DEFINE_NVOL_BIT_OPS(ShowSystemFiles)
+DEFINE_NVOL_BIT_OPS(CaseSensitive)
+DEFINE_NVOL_BIT_OPS(LogFileEmpty)
+DEFINE_NVOL_BIT_OPS(QuotaOutOfDate)
+DEFINE_NVOL_BIT_OPS(UsnJrnlStamped)
+DEFINE_NVOL_BIT_OPS(ReadOnly)
+DEFINE_NVOL_BIT_OPS(Compression)
+DEFINE_NVOL_BIT_OPS(FreeClusterKnown)
+DEFINE_NVOL_BIT_OPS(Shutdown)
+
+static inline void ntfs_inc_free_clusters(struct ntfs_volume *vol, s64 nr)
+{
+ if (!NVolFreeClusterKnown(vol))
+ wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+ atomic64_add(nr, &vol->free_clusters);
+}
+
+static inline void ntfs_dec_free_clusters(struct ntfs_volume *vol, s64 nr)
+{
+ if (!NVolFreeClusterKnown(vol))
+ wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+ atomic64_sub(nr, &vol->free_clusters);
+}
+
+static inline void ntfs_inc_free_mft_records(struct ntfs_volume *vol, s64 nr)
+{
+ if (!NVolFreeClusterKnown(vol))
+ return;
+
+ atomic64_add(nr, &vol->free_mft_records);
+}
+
+static inline void ntfs_dec_free_mft_records(struct ntfs_volume *vol, s64 nr)
+{
+ if (!NVolFreeClusterKnown(vol))
+ return;
+
+ atomic64_sub(nr, &vol->free_mft_records);
+}
+
+static inline void ntfs_set_lcn_empty_bits(struct ntfs_volume *vol, unsigned long index,
+ u8 val, unsigned int count)
+{
+ if (!NVolFreeClusterKnown(vol))
+ wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+ if (val)
+ vol->lcn_empty_bits_per_page[index] -= count;
+ else
+ vol->lcn_empty_bits_per_page[index] += count;
+}
+
+static __always_inline void ntfs_hold_dirty_clusters(struct ntfs_volume *vol, s64 nr_clusters)
+{
+ atomic64_add(nr_clusters, &vol->dirty_clusters);
+}
+
+static __always_inline void ntfs_release_dirty_clusters(struct ntfs_volume *vol, s64 nr_clusters)
+{
+ if (atomic64_read(&vol->dirty_clusters) < nr_clusters)
+ atomic64_set(&vol->dirty_clusters, 0);
+ else
+ atomic64_sub(nr_clusters, &vol->dirty_clusters);
+}
+
+s64 ntfs_available_clusters_count(struct ntfs_volume *vol, s64 nr_clusters);
+s64 get_nr_free_clusters(struct ntfs_volume *vol);
+#endif /* _LINUX_NTFS_VOLUME_H */
diff --git a/include/uapi/linux/ntfs.h b/include/uapi/linux/ntfs.h
new file mode 100644
index 000000000000..e76957285280
--- /dev/null
+++ b/include/uapi/linux/ntfs.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#ifndef _UAPI_LINUX_NTFS_H
+#define _UAPI_LINUX_NTFS_H
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/*
+ * ntfs-specific ioctl commands
+ */
+#define NTFS_IOC_SHUTDOWN _IOR('X', 125, __u32)
+
+/*
+ * Flags used by NTFS_IOC_SHUTDOWN
+ */
+#define NTFS_GOING_DOWN_DEFAULT 0x0 /* default with full sync */
+#define NTFS_GOING_DOWN_FULLSYNC 0x1 /* going down with full sync */
+#define NTFS_GOING_DOWN_NOSYNC 0x2 /* going down */
+
+#endif /* _UAPI_LINUX_NTFS_H */
--
2.34.1
* [PATCH 02/11] ntfsplus: add super block operations
2025-10-20 2:07 [PATCH 00/11] ntfsplus: ntfs filesystem remake Namjae Jeon
2025-10-20 2:07 ` [PATCH 01/11] ntfsplus: in-memory, on-disk structures and headers Namjae Jeon
@ 2025-10-20 2:07 ` Namjae Jeon
2025-10-20 2:07 ` [PATCH 03/11] ntfsplus: add inode operations Namjae Jeon
From: Namjae Jeon @ 2025-10-20 2:07 UTC (permalink / raw)
To: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il
Cc: linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee, Namjae Jeon
This adds the implementation of superblock operations for ntfsplus.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/ntfsplus/super.c | 2716 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 2716 insertions(+)
create mode 100644 fs/ntfsplus/super.c
diff --git a/fs/ntfsplus/super.c b/fs/ntfsplus/super.c
new file mode 100644
index 000000000000..1803eeec5618
--- /dev/null
+++ b/fs/ntfsplus/super.c
@@ -0,0 +1,2716 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel super block handling. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2001,2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/blkdev.h> /* For bdev_logical_block_size(). */
+#include <linux/backing-dev.h>
+#include <linux/vfs.h>
+#include <linux/fs_struct.h>
+#include <linux/sched/mm.h>
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
+#include <uapi/linux/ntfs.h>
+
+#include "misc.h"
+#include "logfile.h"
+#include "index.h"
+#include "ntfs.h"
+#include "ea.h"
+#include "volume.h"
+
+/* A global default upcase table and a corresponding reference count. */
+static __le16 *default_upcase;
+static unsigned long ntfs_nr_upcase_users;
+
+static struct workqueue_struct *ntfs_wq;
+
+/* Error constants/strings used in inode.c::ntfs_show_options(). */
+enum {
+ /* One of these must be present, default is ON_ERRORS_CONTINUE. */
+ ON_ERRORS_PANIC = 0x01,
+ ON_ERRORS_REMOUNT_RO = 0x02,
+ ON_ERRORS_CONTINUE = 0x04,
+};
+
+static const struct constant_table ntfs_param_enums[] = {
+ { "panic", ON_ERRORS_PANIC },
+ { "remount-ro", ON_ERRORS_REMOUNT_RO },
+ { "continue", ON_ERRORS_CONTINUE },
+ {}
+};
+
+enum {
+ Opt_uid,
+ Opt_gid,
+ Opt_umask,
+ Opt_dmask,
+ Opt_fmask,
+ Opt_errors,
+ Opt_nls,
+ Opt_show_sys_files,
+ Opt_case_sensitive,
+ Opt_disable_sparse,
+ Opt_mft_zone_multiplier,
+ Opt_preallocated_size,
+};
+
+static const struct fs_parameter_spec ntfs_parameters[] = {
+ fsparam_u32("uid", Opt_uid),
+ fsparam_u32("gid", Opt_gid),
+ fsparam_u32oct("umask", Opt_umask),
+ fsparam_u32oct("dmask", Opt_dmask),
+ fsparam_u32oct("fmask", Opt_fmask),
+ fsparam_string("nls", Opt_nls),
+ fsparam_enum("errors", Opt_errors, ntfs_param_enums),
+ fsparam_flag("show_sys_files", Opt_show_sys_files),
+ fsparam_flag("case_sensitive", Opt_case_sensitive),
+ fsparam_flag("disable_sparse", Opt_disable_sparse),
+ fsparam_s32("mft_zone_multiplier", Opt_mft_zone_multiplier),
+ fsparam_u64("preallocated_size", Opt_preallocated_size),
+ {}
+};
+
+static int ntfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+ struct ntfs_volume *vol = fc->s_fs_info;
+ struct fs_parse_result result;
+ int opt;
+ char *nls_name = NULL;
+
+ opt = fs_parse(fc, ntfs_parameters, param, &result);
+ if (opt < 0)
+ return opt;
+
+ switch (opt) {
+ case Opt_uid:
+ vol->uid = make_kuid(current_user_ns(), result.uint_32);
+ break;
+ case Opt_gid:
+ vol->gid = make_kgid(current_user_ns(), result.uint_32);
+ break;
+ case Opt_umask:
+ vol->fmask = vol->dmask = result.uint_32;
+ break;
+ case Opt_dmask:
+ vol->dmask = result.uint_32;
+ break;
+ case Opt_fmask:
+ vol->fmask = result.uint_32;
+ break;
+ case Opt_errors:
+ vol->on_errors = result.uint_32;
+ break;
+ case Opt_nls:
+ nls_name = param->string;
+ vol->nls_map = load_nls(nls_name);
+ if (!vol->nls_map)
+ return invalf(fc, "ntfs: failed to load nls charset %s",
+ nls_name);
+ break;
+ case Opt_mft_zone_multiplier:
+ if (vol->mft_zone_multiplier && vol->mft_zone_multiplier !=
+ result.int_32) {
+ ntfs_error(vol->sb, "Cannot change mft_zone_multiplier on remount.");
+ return -EINVAL;
+ }
+ if (result.int_32 < 1 || result.int_32 > 4) {
+ ntfs_error(vol->sb,
+ "Invalid mft_zone_multiplier. Using default value, i.e. 1.");
+ vol->mft_zone_multiplier = 1;
+ } else {
+ vol->mft_zone_multiplier = result.int_32;
+ }
+ break;
+ case Opt_show_sys_files:
+ if (result.boolean)
+ NVolSetShowSystemFiles(vol);
+ else
+ NVolClearShowSystemFiles(vol);
+ break;
+ case Opt_case_sensitive:
+ if (result.boolean)
+ NVolSetCaseSensitive(vol);
+ else
+ NVolClearCaseSensitive(vol);
+ break;
+ case Opt_preallocated_size:
+ vol->preallocated_size = (loff_t)result.uint_64;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * ntfs_mark_quotas_out_of_date - mark the quotas out of date on an ntfs volume
+ * @vol: ntfs volume on which to mark the quotas out of date
+ *
+ * Mark the quotas out of date on the ntfs volume @vol and return 'true' on
+ * success and 'false' on error.
+ */
+static bool ntfs_mark_quotas_out_of_date(struct ntfs_volume *vol)
+{
+ struct ntfs_index_context *ictx;
+ struct quota_control_entry *qce;
+ const __le32 qid = QUOTA_DEFAULTS_ID;
+ int err;
+
+ ntfs_debug("Entering.");
+ if (NVolQuotaOutOfDate(vol))
+ goto done;
+ if (!vol->quota_ino || !vol->quota_q_ino) {
+ ntfs_error(vol->sb, "Quota inodes are not open.");
+ return false;
+ }
+ inode_lock(vol->quota_q_ino);
+ ictx = ntfs_index_ctx_get(NTFS_I(vol->quota_q_ino), I30, 4);
+ if (!ictx) {
+ ntfs_error(vol->sb, "Failed to get index context.");
+ goto err_out;
+ }
+ err = ntfs_index_lookup(&qid, sizeof(qid), ictx);
+ if (err) {
+ if (err == -ENOENT)
+ ntfs_error(vol->sb, "Quota defaults entry is not present.");
+ else
+ ntfs_error(vol->sb, "Lookup of quota defaults entry failed.");
+ goto err_out;
+ }
+ if (ictx->data_len < offsetof(struct quota_control_entry, sid)) {
+ ntfs_error(vol->sb, "Quota defaults entry size is invalid. Run chkdsk.");
+ goto err_out;
+ }
+ qce = (struct quota_control_entry *)ictx->data;
+ if (le32_to_cpu(qce->version) != QUOTA_VERSION) {
+ ntfs_error(vol->sb,
+ "Quota defaults entry version 0x%x is not supported.",
+ le32_to_cpu(qce->version));
+ goto err_out;
+ }
+ ntfs_debug("Quota defaults flags = 0x%x.", le32_to_cpu(qce->flags));
+ /* If quotas are already marked out of date, no need to do anything. */
+ if (qce->flags & QUOTA_FLAG_OUT_OF_DATE)
+ goto set_done;
+ /*
+ * If quota tracking is neither requested, nor enabled and there are no
+ * pending deletes, no need to mark the quotas out of date.
+ */
+ if (!(qce->flags & (QUOTA_FLAG_TRACKING_ENABLED |
+ QUOTA_FLAG_TRACKING_REQUESTED |
+ QUOTA_FLAG_PENDING_DELETES)))
+ goto set_done;
+ /*
+ * Set the QUOTA_FLAG_OUT_OF_DATE bit thus marking quotas out of date.
+ * This is verified on WinXP to be sufficient to cause windows to
+ * rescan the volume on boot and update all quota entries.
+ */
+ qce->flags |= QUOTA_FLAG_OUT_OF_DATE;
+ /* Ensure the modified flags are written to disk. */
+ ntfs_index_entry_flush_dcache_page(ictx);
+ ntfs_index_entry_mark_dirty(ictx);
+set_done:
+ ntfs_index_ctx_put(ictx);
+ inode_unlock(vol->quota_q_ino);
+ /*
+ * We set the flag so we do not try to mark the quotas out of date
+ * again on remount.
+ */
+ NVolSetQuotaOutOfDate(vol);
+done:
+ ntfs_debug("Done.");
+ return true;
+err_out:
+ if (ictx)
+ ntfs_index_ctx_put(ictx);
+ inode_unlock(vol->quota_q_ino);
+ return false;
+}
+
+static int ntfs_reconfigure(struct fs_context *fc)
+{
+ struct super_block *sb = fc->root->d_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+
+ ntfs_debug("Entering with remount");
+
+ sync_filesystem(sb);
+
+ /*
+ * For the read-write compiled driver, if we are remounting read-write,
+ * make sure there are no volume errors and that no unsupported volume
+ * flags are set. Also, empty the logfile journal as it would become
+ * stale as soon as something is written to the volume and mark the
+ * volume dirty so that chkdsk is run if the volume is not umounted
+ * cleanly. Finally, mark the quotas out of date so Windows rescans
+ * the volume on boot and updates them.
+ *
+ * When remounting read-only, mark the volume clean if no volume errors
+ * have occurred.
+ */
+ if (sb_rdonly(sb) && !(fc->sb_flags & SB_RDONLY)) {
+ static const char *es = ". Cannot remount read-write.";
+
+ /* Remounting read-write. */
+ if (NVolErrors(vol)) {
+ ntfs_error(sb, "Volume has errors and is read-only%s",
+ es);
+ return -EROFS;
+ }
+ if (vol->vol_flags & VOLUME_IS_DIRTY) {
+ ntfs_error(sb, "Volume is dirty and read-only%s", es);
+ return -EROFS;
+ }
+ if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
+ ntfs_error(sb, "Volume has been modified by chkdsk and is read-only%s", es);
+ return -EROFS;
+ }
+ if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
+ ntfs_error(sb, "Volume has unsupported flags set (0x%x) and is read-only%s",
+ le16_to_cpu(vol->vol_flags), es);
+ return -EROFS;
+ }
+ if (vol->logfile_ino && !ntfs_empty_logfile(vol->logfile_ino)) {
+ ntfs_error(sb, "Failed to empty journal LogFile%s",
+ es);
+ NVolSetErrors(vol);
+ return -EROFS;
+ }
+ if (!ntfs_mark_quotas_out_of_date(vol)) {
+ ntfs_error(sb, "Failed to mark quotas out of date%s",
+ es);
+ NVolSetErrors(vol);
+ return -EROFS;
+ }
+ } else if (!sb_rdonly(sb) && (fc->sb_flags & SB_RDONLY)) {
+ /* Remounting read-only. */
+ if (!NVolErrors(vol)) {
+ if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
+ ntfs_warning(sb,
+ "Failed to clear dirty bit in volume information flags. Run chkdsk.");
+ }
+ }
+
+ ntfs_debug("Done.");
+ return 0;
+}
+
+const struct option_t on_errors_arr[] = {
+ { ON_ERRORS_PANIC, "panic" },
+ { ON_ERRORS_REMOUNT_RO, "remount-ro", },
+ { ON_ERRORS_CONTINUE, "continue", },
+ { 0, NULL }
+};
+
+void ntfs_handle_error(struct super_block *sb)
+{
+ struct ntfs_volume *vol = NTFS_SB(sb);
+
+ if (sb_rdonly(sb))
+ return;
+
+ if (vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+ sb->s_flags |= SB_RDONLY;
+ pr_crit("(device %s): Filesystem has been set read-only\n",
+ sb->s_id);
+ } else if (vol->on_errors == ON_ERRORS_PANIC) {
+ panic("ntfs: (device %s): panic from previous error\n",
+ sb->s_id);
+ } else if (vol->on_errors == ON_ERRORS_CONTINUE) {
+ if (errseq_check(&sb->s_wb_err, vol->wb_err) == -ENODEV) {
+ NVolSetShutdown(vol);
+ vol->wb_err = sb->s_wb_err;
+ }
+ }
+}
+
+/**
+ * ntfs_write_volume_flags - write new flags to the volume information flags
+ * @vol: ntfs volume on which to modify the flags
+ * @flags: new flags value for the volume information flags
+ *
+ * Internal function. You probably want to use ntfs_{set,clear}_volume_flags()
+ * instead (see below).
+ *
+ * Replace the volume information flags on the volume @vol with the value
+ * supplied in @flags. Note, this overwrites the volume information flags, so
+ * make sure to combine the flags you want to modify with the old flags and use
+ * the result when calling ntfs_write_volume_flags().
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_write_volume_flags(struct ntfs_volume *vol, const __le16 flags)
+{
+ struct ntfs_inode *ni = NTFS_I(vol->vol_ino);
+ struct volume_information *vi;
+ struct ntfs_attr_search_ctx *ctx;
+ int err;
+
+ ntfs_debug("Entering, old flags = 0x%x, new flags = 0x%x.",
+ le16_to_cpu(vol->vol_flags), le16_to_cpu(flags));
+ BUG_ON(!ni);
+ mutex_lock(&ni->mrec_lock);
+ if (vol->vol_flags == flags)
+ goto done;
+ ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!ctx) {
+ err = -ENOMEM;
+ goto put_unm_err_out;
+ }
+ err = ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
+ ctx);
+ if (err)
+ goto put_unm_err_out;
+ vi = (struct volume_information *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+ vol->vol_flags = vi->flags = flags;
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_put_search_ctx(ctx);
+done:
+ mutex_unlock(&ni->mrec_lock);
+ ntfs_debug("Done.");
+ return 0;
+put_unm_err_out:
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+ mutex_unlock(&ni->mrec_lock);
+ ntfs_error(vol->sb, "Failed with error code %i.", -err);
+ return err;
+}
+
+/**
+ * ntfs_set_volume_flags - set bits in the volume information flags
+ * @vol: ntfs volume on which to modify the flags
+ * @flags: flags to set on the volume
+ *
+ * Set the bits in @flags in the volume information flags on the volume @vol.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int ntfs_set_volume_flags(struct ntfs_volume *vol, __le16 flags)
+{
+ flags &= VOLUME_FLAGS_MASK;
+ return ntfs_write_volume_flags(vol, vol->vol_flags | flags);
+}
+
+/**
+ * ntfs_clear_volume_flags - clear bits in the volume information flags
+ * @vol: ntfs volume on which to modify the flags
+ * @flags: flags to clear on the volume
+ *
+ * Clear the bits in @flags in the volume information flags on the volume @vol.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int ntfs_clear_volume_flags(struct ntfs_volume *vol, __le16 flags)
+{
+ flags &= VOLUME_FLAGS_MASK;
+ flags = vol->vol_flags & cpu_to_le16(~le16_to_cpu(flags));
+ return ntfs_write_volume_flags(vol, flags);
+}
+
+/**
+ * is_boot_sector_ntfs - check whether a boot sector is a valid NTFS boot sector
+ * @sb: Super block of the device to which @b belongs.
+ * @b: Boot sector of device @sb to check.
+ * @silent: If 'true', all output will be silenced.
+ *
+ * is_boot_sector_ntfs() checks whether the boot sector @b is a valid NTFS boot
+ * sector. Returns 'true' if it is valid and 'false' if not.
+ *
+ * @sb is only needed for warning/error output, i.e. it can be NULL when silent
+ * is 'true'.
+ */
+static bool is_boot_sector_ntfs(const struct super_block *sb,
+ const struct ntfs_boot_sector *b, const bool silent)
+{
+ /*
+ * Check that checksum == sum of u32 values from b to the checksum
+ * field. If checksum is zero, no checking is done. We will work when
+ * the checksum test fails, since some utilities update the boot sector
+ * ignoring the checksum which leaves the checksum out-of-date. We
+ * report a warning if this is the case.
+ */
+ if ((void *)b < (void *)&b->checksum && b->checksum && !silent) {
+ __le32 *u;
+ u32 i;
+
+ for (i = 0, u = (__le32 *)b; u < (__le32 *)(&b->checksum); ++u)
+ i += le32_to_cpup(u);
+ if (le32_to_cpu(b->checksum) != i)
+ ntfs_warning(sb, "Invalid boot sector checksum.");
+ }
+ /* Check the OEM identifier is "NTFS    ". */
+ if (b->oem_id != magicNTFS)
+ goto not_ntfs;
+ /* Check bytes per sector value is between 256 and 4096. */
+ if (le16_to_cpu(b->bpb.bytes_per_sector) < 0x100 ||
+ le16_to_cpu(b->bpb.bytes_per_sector) > 0x1000)
+ goto not_ntfs;
+ /*
+ * Check sectors per cluster value is valid and the cluster size
+ * is not above the maximum (2MB).
+ */
+ if (b->bpb.sectors_per_cluster > 0x80 &&
+ b->bpb.sectors_per_cluster < 0xf4)
+ goto not_ntfs;
+
+ /* Check reserved/unused fields are really zero. */
+ if (le16_to_cpu(b->bpb.reserved_sectors) ||
+ le16_to_cpu(b->bpb.root_entries) ||
+ le16_to_cpu(b->bpb.sectors) ||
+ le16_to_cpu(b->bpb.sectors_per_fat) ||
+ le32_to_cpu(b->bpb.large_sectors) || b->bpb.fats)
+ goto not_ntfs;
+ /* Check clusters per file mft record value is valid. */
+ if ((u8)b->clusters_per_mft_record < 0xe1 ||
+ (u8)b->clusters_per_mft_record > 0xf7)
+ switch (b->clusters_per_mft_record) {
+ case 1: case 2: case 4: case 8: case 16: case 32: case 64:
+ break;
+ default:
+ goto not_ntfs;
+ }
+ /* Check clusters per index block value is valid. */
+ if ((u8)b->clusters_per_index_record < 0xe1 ||
+ (u8)b->clusters_per_index_record > 0xf7)
+ switch (b->clusters_per_index_record) {
+ case 1: case 2: case 4: case 8: case 16: case 32: case 64:
+ break;
+ default:
+ goto not_ntfs;
+ }
+ /*
+ * Check for valid end of sector marker. We will work without it, but
+ * many BIOSes will refuse to boot from a bootsector if the magic is
+ * incorrect, so we emit a warning.
+ */
+ if (!silent && b->end_of_sector_marker != cpu_to_le16(0xaa55))
+ ntfs_warning(sb, "Invalid end of sector marker.");
+ return true;
+not_ntfs:
+ return false;
+}
+
+/**
+ * read_ntfs_boot_sector - read the NTFS boot sector of a device
+ * @sb: super block of device to read the boot sector from
+ * @silent: if true, suppress all output
+ *
+ * Read the boot sector from the device and validate it. Return the boot
+ * sector buffer on success or NULL on error.
+ */
+static char *read_ntfs_boot_sector(struct super_block *sb,
+ const int silent)
+{
+ char *boot_sector;
+
+ boot_sector = ntfs_malloc_nofs(PAGE_SIZE);
+ if (!boot_sector)
+ return NULL;
+
+ if (ntfs_dev_read(sb, boot_sector, 0, PAGE_SIZE)) {
+ if (!silent)
+ ntfs_error(sb, "Unable to read primary boot sector.");
+ kfree(boot_sector);
+ return NULL;
+ }
+
+ if (!is_boot_sector_ntfs(sb, (struct ntfs_boot_sector *)boot_sector,
+ silent)) {
+ if (!silent)
+ ntfs_error(sb, "Primary boot sector is invalid.");
+ kfree(boot_sector);
+ return NULL;
+ }
+
+ return boot_sector;
+}
+
+/**
+ * parse_ntfs_boot_sector - parse the boot sector and store the data in @vol
+ * @vol: volume structure to initialise with data from boot sector
+ * @b: boot sector to parse
+ *
+ * Parse the ntfs boot sector @b and store all important information therein in
+ * the ntfs super block @vol. Return 'true' on success and 'false' on error.
+ */
+static bool parse_ntfs_boot_sector(struct ntfs_volume *vol,
+ const struct ntfs_boot_sector *b)
+{
+ unsigned int sectors_per_cluster, sectors_per_cluster_bits, nr_hidden_sects;
+ int clusters_per_mft_record, clusters_per_index_record;
+ s64 ll;
+
+ vol->sector_size = le16_to_cpu(b->bpb.bytes_per_sector);
+ vol->sector_size_bits = ffs(vol->sector_size) - 1;
+ ntfs_debug("vol->sector_size = %i (0x%x)", vol->sector_size,
+ vol->sector_size);
+ ntfs_debug("vol->sector_size_bits = %i (0x%x)", vol->sector_size_bits,
+ vol->sector_size_bits);
+ if (vol->sector_size < vol->sb->s_blocksize) {
+ ntfs_error(vol->sb,
+ "Sector size (%i) is smaller than the device block size (%lu). This is not supported.",
+ vol->sector_size, vol->sb->s_blocksize);
+ return false;
+ }
+
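+ /*
+ * Values 0xf4 and above encode sectors per cluster as a negative power
+ * of two, e.g. 0xf4 (-12 as s8) means 1 << 12 = 4096 sectors per
+ * cluster; smaller values are used as-is.
+ */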
+ if (b->bpb.sectors_per_cluster >= 0xf4)
+ sectors_per_cluster = 1U << -(s8)b->bpb.sectors_per_cluster;
+ else
+ sectors_per_cluster = b->bpb.sectors_per_cluster;
+ ntfs_debug("sectors_per_cluster = 0x%x", b->bpb.sectors_per_cluster);
+ sectors_per_cluster_bits = ffs(sectors_per_cluster) - 1;
+ ntfs_debug("sectors_per_cluster_bits = 0x%x",
+ sectors_per_cluster_bits);
+ nr_hidden_sects = le32_to_cpu(b->bpb.hidden_sectors);
+ ntfs_debug("number of hidden sectors = 0x%x", nr_hidden_sects);
+ vol->cluster_size = vol->sector_size << sectors_per_cluster_bits;
+ vol->cluster_size_mask = vol->cluster_size - 1;
+ vol->cluster_size_bits = ffs(vol->cluster_size) - 1;
+ ntfs_debug("vol->cluster_size = %i (0x%x)", vol->cluster_size,
+ vol->cluster_size);
+ ntfs_debug("vol->cluster_size_mask = 0x%x", vol->cluster_size_mask);
+ ntfs_debug("vol->cluster_size_bits = %i", vol->cluster_size_bits);
+ if (vol->cluster_size < vol->sector_size) {
+ ntfs_error(vol->sb,
+ "Cluster size (%i) is smaller than the sector size (%i). This is not supported.",
+ vol->cluster_size, vol->sector_size);
+ return false;
+ }
+ clusters_per_mft_record = b->clusters_per_mft_record;
+ ntfs_debug("clusters_per_mft_record = %i (0x%x)",
+ clusters_per_mft_record, clusters_per_mft_record);
+ if (clusters_per_mft_record > 0)
+ vol->mft_record_size = vol->cluster_size <<
+ (ffs(clusters_per_mft_record) - 1);
+ else
+ /*
+ * When mft_record_size < cluster_size, clusters_per_mft_record
+ * = -log2(mft_record_size). mft_record_size normally is
+ * 1024 bytes, which is encoded as 0xF6 (-10 in decimal).
+ */
+ vol->mft_record_size = 1 << -clusters_per_mft_record;
+ vol->mft_record_size_mask = vol->mft_record_size - 1;
+ vol->mft_record_size_bits = ffs(vol->mft_record_size) - 1;
+ ntfs_debug("vol->mft_record_size = %i (0x%x)", vol->mft_record_size,
+ vol->mft_record_size);
+ ntfs_debug("vol->mft_record_size_mask = 0x%x",
+ vol->mft_record_size_mask);
+ ntfs_debug("vol->mft_record_size_bits = %i (0x%x)",
+ vol->mft_record_size_bits, vol->mft_record_size_bits);
+ /*
+ * We cannot support mft record sizes above the PAGE_SIZE since
+ * we store $MFT/$DATA, the table of mft records in the page cache.
+ */
+ if (vol->mft_record_size > PAGE_SIZE) {
+ ntfs_error(vol->sb,
+ "Mft record size (%i) exceeds the PAGE_SIZE on your system (%lu). This is not supported.",
+ vol->mft_record_size, PAGE_SIZE);
+ return false;
+ }
+ /* We cannot support mft record sizes below the sector size. */
+ if (vol->mft_record_size < vol->sector_size) {
+ ntfs_warning(vol->sb, "Mft record size (%i) is smaller than the sector size (%i).",
+ vol->mft_record_size, vol->sector_size);
+ }
+ clusters_per_index_record = b->clusters_per_index_record;
+ ntfs_debug("clusters_per_index_record = %i (0x%x)",
+ clusters_per_index_record, clusters_per_index_record);
+ if (clusters_per_index_record > 0)
+ vol->index_record_size = vol->cluster_size <<
+ (ffs(clusters_per_index_record) - 1);
+ else
+ /*
+ * When index_record_size < cluster_size,
+ * clusters_per_index_record = -log2(index_record_size).
+ * index_record_size normally equals 4096 bytes, which is
+ * encoded as 0xF4 (-12 in decimal).
+ */
+ vol->index_record_size = 1 << -clusters_per_index_record;
+ vol->index_record_size_mask = vol->index_record_size - 1;
+ vol->index_record_size_bits = ffs(vol->index_record_size) - 1;
+ ntfs_debug("vol->index_record_size = %i (0x%x)",
+ vol->index_record_size, vol->index_record_size);
+ ntfs_debug("vol->index_record_size_mask = 0x%x",
+ vol->index_record_size_mask);
+ ntfs_debug("vol->index_record_size_bits = %i (0x%x)",
+ vol->index_record_size_bits,
+ vol->index_record_size_bits);
+ /* We cannot support index record sizes below the sector size. */
+ if (vol->index_record_size < vol->sector_size) {
+ ntfs_error(vol->sb,
+ "Index record size (%i) is smaller than the sector size (%i). This is not supported.",
+ vol->index_record_size, vol->sector_size);
+ return false;
+ }
+ /*
+ * Get the size of the volume in clusters and check for 64-bit-ness.
+ * Windows currently only uses 32 bits to save the clusters so we do
+ * the same as it is much faster on 32-bit CPUs.
+ */
+ ll = le64_to_cpu(b->number_of_sectors) >> sectors_per_cluster_bits;
+ if ((u64)ll >= 1ULL << 32) {
+ ntfs_error(vol->sb, "Cannot handle 64-bit clusters.");
+ return false;
+ }
+ vol->nr_clusters = ll;
+ ntfs_debug("vol->nr_clusters = 0x%llx", vol->nr_clusters);
+ /*
+ * On an architecture where unsigned long is 32-bits, we restrict the
+ * volume size to 2TiB (2^41). On a 64-bit architecture, the compiler
+ * will hopefully optimize the whole check away.
+ */
+ if (sizeof(unsigned long) < 8) {
+ if ((ll << vol->cluster_size_bits) >= (1ULL << 41)) {
+ ntfs_error(vol->sb,
+ "Volume size (%lluTiB) is too large for this architecture. Maximum supported is 2TiB.",
+ ll >> (40 - vol->cluster_size_bits));
+ return false;
+ }
+ }
+ ll = le64_to_cpu(b->mft_lcn);
+ if (ll >= vol->nr_clusters) {
+ ntfs_error(vol->sb, "MFT LCN (%lli, 0x%llx) is beyond end of volume. Weird.",
+ ll, ll);
+ return false;
+ }
+ vol->mft_lcn = ll;
+ ntfs_debug("vol->mft_lcn = 0x%llx", vol->mft_lcn);
+ ll = le64_to_cpu(b->mftmirr_lcn);
+ if (ll >= vol->nr_clusters) {
+ ntfs_error(vol->sb, "MFTMirr LCN (%lli, 0x%llx) is beyond end of volume. Weird.",
+ ll, ll);
+ return false;
+ }
+ vol->mftmirr_lcn = ll;
+ ntfs_debug("vol->mftmirr_lcn = 0x%llx", vol->mftmirr_lcn);
+ /*
+ * Work out the size of the mft mirror in number of mft records. If the
+ * cluster size is less than or equal to the size taken by four mft
+ * records, the mft mirror stores the first four mft records. If the
+ * cluster size is bigger than the size taken by four mft records, the
+ * mft mirror contains as many mft records as will fit into one
+ * cluster.
+ */
+ if (vol->cluster_size <= (4 << vol->mft_record_size_bits))
+ vol->mftmirr_size = 4;
+ else
+ vol->mftmirr_size = vol->cluster_size >>
+ vol->mft_record_size_bits;
+ ntfs_debug("vol->mftmirr_size = %i", vol->mftmirr_size);
+ vol->serial_no = le64_to_cpu(b->volume_serial_number);
+ ntfs_debug("vol->serial_no = 0x%llx", vol->serial_no);
+
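+ /*
+ * The compression unit is 2^sparse_compression_unit clusters, so for
+ * cluster sizes above 4096 bytes scale it down to keep the unit size
+ * at 64kiB, e.g. a single cluster for 64kiB clusters.
+ */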
+ vol->sparse_compression_unit = 4;
+ if (vol->cluster_size > 4096) {
+ switch (vol->cluster_size) {
+ case 65536:
+ vol->sparse_compression_unit = 0;
+ break;
+ case 32768:
+ vol->sparse_compression_unit = 1;
+ break;
+ case 16384:
+ vol->sparse_compression_unit = 2;
+ break;
+ case 8192:
+ vol->sparse_compression_unit = 3;
+ break;
+ }
+ }
+
+ return true;
+}
+
+/**
+ * ntfs_setup_allocators - initialize the cluster and mft allocators
+ * @vol: volume structure for which to setup the allocators
+ *
+ * Setup the cluster (lcn) and mft allocators to the starting values.
+ */
+static void ntfs_setup_allocators(struct ntfs_volume *vol)
+{
+ s64 mft_zone_size, mft_lcn;
+
+ ntfs_debug("vol->mft_zone_multiplier = 0x%x",
+ vol->mft_zone_multiplier);
+ /* Determine the size of the MFT zone. */
+ mft_zone_size = vol->nr_clusters;
+ switch (vol->mft_zone_multiplier) { /* % of volume size in clusters */
+ case 4:
+ mft_zone_size >>= 1; /* 50% */
+ break;
+ case 3:
+ mft_zone_size = (mft_zone_size +
+ (mft_zone_size >> 1)) >> 2; /* 37.5% */
+ break;
+ case 2:
+ mft_zone_size >>= 2; /* 25% */
+ break;
+ /* case 1: */
+ default:
+ mft_zone_size >>= 3; /* 12.5% */
+ break;
+ }
+ /* Setup the mft zone. */
+ vol->mft_zone_start = vol->mft_zone_pos = vol->mft_lcn;
+ ntfs_debug("vol->mft_zone_pos = 0x%llx", vol->mft_zone_pos);
+ /*
+ * Calculate the mft_lcn for an unmodified NTFS volume (see mkntfs
+ * source) and if the actual mft_lcn is in the expected place or even
+ * further to the front of the volume, extend the mft_zone to cover the
+ * beginning of the volume as well. This is in order to protect the
+ * area reserved for the mft bitmap as well within the mft_zone itself.
+ * On non-standard volumes we do not protect it as the overhead would
+ * be higher than the speed increase we would get by doing it.
+ */
+ mft_lcn = (8192 + 2 * vol->cluster_size - 1) / vol->cluster_size;
+ if (mft_lcn * vol->cluster_size < 16 * 1024)
+ mft_lcn = (16 * 1024 + vol->cluster_size - 1) /
+ vol->cluster_size;
+ if (vol->mft_zone_start <= mft_lcn)
+ vol->mft_zone_start = 0;
+ ntfs_debug("vol->mft_zone_start = 0x%llx", vol->mft_zone_start);
+ /*
+ * Need to cap the mft zone on non-standard volumes so that it does
+ * not point outside the boundaries of the volume. We do this by
+ * halving the zone size until we are inside the volume.
+ */
+ vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
+ while (vol->mft_zone_end >= vol->nr_clusters) {
+ mft_zone_size >>= 1;
+ vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
+ }
+ ntfs_debug("vol->mft_zone_end = 0x%llx", vol->mft_zone_end);
+ /*
+ * Set the current position within each data zone to the start of the
+ * respective zone.
+ */
+ vol->data1_zone_pos = vol->mft_zone_end;
+ ntfs_debug("vol->data1_zone_pos = 0x%llx", vol->data1_zone_pos);
+ vol->data2_zone_pos = 0;
+ ntfs_debug("vol->data2_zone_pos = 0x%llx", vol->data2_zone_pos);
+
+ /* Set the mft data allocation position to mft record 24. */
+ vol->mft_data_pos = 24;
+ ntfs_debug("vol->mft_data_pos = 0x%llx", vol->mft_data_pos);
+}
+
+static struct lock_class_key mftmirr_runlist_lock_key,
+ mftmirr_mrec_lock_key;
+/**
+ * load_and_init_mft_mirror - load and setup the mft mirror inode for a volume
+ * @vol: ntfs super block describing device whose mft mirror to load
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_and_init_mft_mirror(struct ntfs_volume *vol)
+{
+ struct inode *tmp_ino;
+ struct ntfs_inode *tmp_ni;
+
+ ntfs_debug("Entering.");
+ /* Get mft mirror inode. */
+ tmp_ino = ntfs_iget(vol->sb, FILE_MFTMirr);
+ if (IS_ERR(tmp_ino)) {
+ /* Caller will display error message. */
+ return false;
+ }
+ lockdep_set_class(&NTFS_I(tmp_ino)->runlist.lock,
+ &mftmirr_runlist_lock_key);
+ lockdep_set_class(&NTFS_I(tmp_ino)->mrec_lock,
+ &mftmirr_mrec_lock_key);
+ /*
+ * Re-initialize some specifics about $MFTMirr's inode as
+ * ntfs_read_inode() will have set up the default ones.
+ */
+ /* Set uid and gid to root. */
+ tmp_ino->i_uid = GLOBAL_ROOT_UID;
+ tmp_ino->i_gid = GLOBAL_ROOT_GID;
+ /* Regular file. No access for anyone. */
+ tmp_ino->i_mode = S_IFREG;
+ /* No VFS initiated operations allowed for $MFTMirr. */
+ tmp_ino->i_op = &ntfs_empty_inode_ops;
+ tmp_ino->i_fop = &ntfs_empty_file_ops;
+ /* Put in our special address space operations. */
+ tmp_ino->i_mapping->a_ops = &ntfs_mst_aops;
+ tmp_ni = NTFS_I(tmp_ino);
+ /* The $MFTMirr, like the $MFT, is multi sector transfer protected. */
+ NInoSetMstProtected(tmp_ni);
+ NInoSetSparseDisabled(tmp_ni);
+ /*
+ * Set up our little cheat allowing us to reuse the async read io
+ * completion handler for directories.
+ */
+ tmp_ni->itype.index.block_size = vol->mft_record_size;
+ tmp_ni->itype.index.block_size_bits = vol->mft_record_size_bits;
+ vol->mftmirr_ino = tmp_ino;
+ ntfs_debug("Done.");
+ return true;
+}
+
+/**
+ * check_mft_mirror - compare contents of the mft mirror with the mft
+ * @vol: ntfs super block describing device whose mft mirror to check
+ *
+ * Return 'true' on success or 'false' on error.
+ *
+ * Note, this function also results in the mft mirror runlist being completely
+ * mapped into memory. The mft mirror write code requires this and will BUG()
+ * should it find an unmapped runlist element.
+ */
+static bool check_mft_mirror(struct ntfs_volume *vol)
+{
+ struct super_block *sb = vol->sb;
+ struct ntfs_inode *mirr_ni;
+ struct folio *mft_folio = NULL, *mirr_folio = NULL;
+ u8 *kmft = NULL, *kmirr = NULL;
+ struct runlist_element *rl, rl2[2];
+ pgoff_t index;
+ int mrecs_per_page, i;
+
+ ntfs_debug("Entering.");
+ /* Compare contents of $MFT and $MFTMirr. */
+ mrecs_per_page = PAGE_SIZE / vol->mft_record_size;
+ BUG_ON(!mrecs_per_page);
+ BUG_ON(!vol->mftmirr_size);
+ index = i = 0;
+ do {
+ u32 bytes;
+
+ /* Switch pages if necessary. */
+ if (!(i % mrecs_per_page)) {
+ if (index) {
+ ntfs_unmap_folio(mirr_folio, kmirr);
+ ntfs_unmap_folio(mft_folio, kmft);
+ }
+ /* Get the $MFT page. */
+ mft_folio = ntfs_read_mapping_folio(vol->mft_ino->i_mapping,
+ index);
+ if (IS_ERR(mft_folio)) {
+ ntfs_error(sb, "Failed to read $MFT.");
+ return false;
+ }
+ kmft = kmap_local_folio(mft_folio, 0);
+ /* Get the $MFTMirr page. */
+ mirr_folio = ntfs_read_mapping_folio(vol->mftmirr_ino->i_mapping,
+ index);
+ if (IS_ERR(mirr_folio)) {
+ ntfs_error(sb, "Failed to read $MFTMirr.");
+ goto mft_unmap_out;
+ }
+ kmirr = kmap_local_folio(mirr_folio, 0);
+ ++index;
+ }
+
+ /* Do not check the record if it is not in use. */
+ if (((struct mft_record *)kmft)->flags & MFT_RECORD_IN_USE) {
+ /* Make sure the record is ok. */
+ if (ntfs_is_baad_recordp((__le32 *)kmft)) {
+ ntfs_error(sb,
+ "Incomplete multi sector transfer detected in mft record %i.",
+ i);
+mm_unmap_out:
+ ntfs_unmap_folio(mirr_folio, kmirr);
+mft_unmap_out:
+ ntfs_unmap_folio(mft_folio, kmft);
+ return false;
+ }
+ }
+ /* Do not check the mirror record if it is not in use. */
+ if (((struct mft_record *)kmirr)->flags & MFT_RECORD_IN_USE) {
+ if (ntfs_is_baad_recordp((__le32 *)kmirr)) {
+ ntfs_error(sb,
+ "Incomplete multi sector transfer detected in mft mirror record %i.",
+ i);
+ goto mm_unmap_out;
+ }
+ }
+ /* Get the amount of data in the current record. */
+ bytes = le32_to_cpu(((struct mft_record *)kmft)->bytes_in_use);
+ if (bytes < sizeof(struct mft_record_old) ||
+     bytes > vol->mft_record_size ||
+     ntfs_is_baad_recordp((__le32 *)kmft)) {
+ bytes = le32_to_cpu(((struct mft_record *)kmirr)->bytes_in_use);
+ if (bytes < sizeof(struct mft_record_old) ||
+     bytes > vol->mft_record_size ||
+     ntfs_is_baad_recordp((__le32 *)kmirr))
+ bytes = vol->mft_record_size;
+ }
+ /* Compare the records. */
+ if (memcmp(kmft, kmirr, bytes)) {
+ ntfs_error(sb, "$MFT and $MFTMirr record %i do not match. Run chkdsk.",
+ i);
+ goto mm_unmap_out;
+ }
+ kmft += vol->mft_record_size;
+ kmirr += vol->mft_record_size;
+ } while (++i < vol->mftmirr_size);
+ /* Release the last folios. */
+ ntfs_unmap_folio(mirr_folio, kmirr);
+ ntfs_unmap_folio(mft_folio, kmft);
+
+ /* Construct the mft mirror runlist by hand. */
+ rl2[0].vcn = 0;
+ rl2[0].lcn = vol->mftmirr_lcn;
+ rl2[0].length = (vol->mftmirr_size * vol->mft_record_size +
+ vol->cluster_size - 1) / vol->cluster_size;
+ rl2[1].vcn = rl2[0].length;
+ rl2[1].lcn = LCN_ENOENT;
+ rl2[1].length = 0;
+ /*
+ * Because we have just read all of the mft mirror, we know we have
+ * mapped the full runlist for it.
+ */
+ mirr_ni = NTFS_I(vol->mftmirr_ino);
+ down_read(&mirr_ni->runlist.lock);
+ rl = mirr_ni->runlist.rl;
+ /* Compare the two runlists. They must be identical. */
+ i = 0;
+ do {
+ if (rl2[i].vcn != rl[i].vcn || rl2[i].lcn != rl[i].lcn ||
+ rl2[i].length != rl[i].length) {
+ ntfs_error(sb, "$MFTMirr location mismatch. Run chkdsk.");
+ up_read(&mirr_ni->runlist.lock);
+ return false;
+ }
+ } while (rl2[i++].length);
+ up_read(&mirr_ni->runlist.lock);
+ ntfs_debug("Done.");
+ return true;
+}
+
+/**
+ * load_and_check_logfile - load and check the logfile inode for a volume
+ * @vol: ntfs super block describing device whose logfile to load
+ * @rp: address in which to return the current restart page on success
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int load_and_check_logfile(struct ntfs_volume *vol,
+ struct restart_page_header **rp)
+{
+ struct inode *tmp_ino;
+ int err = 0;
+
+ ntfs_debug("Entering.");
+ tmp_ino = ntfs_iget(vol->sb, FILE_LogFile);
+ if (IS_ERR(tmp_ino)) {
+ /* Caller will display error message. */
+ return -ENOENT;
+ }
+ if (!ntfs_check_logfile(tmp_ino, rp))
+ err = -EINVAL;
+ NInoSetSparseDisabled(NTFS_I(tmp_ino));
+ vol->logfile_ino = tmp_ino;
+ ntfs_debug("Done.");
+ return err;
+}
+
+#define NTFS_HIBERFIL_HEADER_SIZE 4096
+
+/**
+ * check_windows_hibernation_status - check if Windows is suspended on a volume
+ * @vol: ntfs super block of device to check
+ *
+ * Check if Windows is hibernated on the ntfs volume @vol. This is done by
+ * looking for the file hiberfil.sys in the root directory of the volume. If
+ * the file is not present Windows is definitely not suspended.
+ *
+ * If hiberfil.sys exists and is less than 4kiB in size it means Windows is
+ * definitely suspended (this volume is not the system volume). Caveat: on a
+ * system with many volumes it is possible that the < 4kiB check is bogus but
+ * for now this should do fine.
+ *
+ * If hiberfil.sys exists and is larger than 4kiB in size, we need to read the
+ * hiberfil header (which is the first 4kiB). If this begins with "hibr",
+ * Windows is definitely suspended. If it is completely full of zeroes,
+ * Windows is definitely not hibernated. Any other case is treated as if
+ * Windows is suspended. This caters for the above mentioned caveat of a
+ * system with many volumes where no "hibr" magic would be present and there is
+ * no zero header.
+ *
+ * Return 0 if Windows is not hibernated on the volume, >0 if Windows is
+ * hibernated on the volume, and -errno on error.
+ */
+static int check_windows_hibernation_status(struct ntfs_volume *vol)
+{
+ static const __le16 hiberfil[13] = { cpu_to_le16('h'),
+ cpu_to_le16('i'), cpu_to_le16('b'),
+ cpu_to_le16('e'), cpu_to_le16('r'),
+ cpu_to_le16('f'), cpu_to_le16('i'),
+ cpu_to_le16('l'), cpu_to_le16('.'),
+ cpu_to_le16('s'), cpu_to_le16('y'),
+ cpu_to_le16('s'), 0 };
+ u64 mref;
+ struct inode *vi;
+ struct folio *folio;
+ u32 *kaddr, *kend, *start_addr = NULL;
+ struct ntfs_name *name = NULL;
+ int ret = 1;
+
+ ntfs_debug("Entering.");
+ /*
+ * Find the inode number for the hibernation file by looking up the
+ * filename hiberfil.sys in the root directory.
+ */
+ inode_lock(vol->root_ino);
+ mref = ntfs_lookup_inode_by_name(NTFS_I(vol->root_ino), hiberfil, 12,
+ &name);
+ inode_unlock(vol->root_ino);
+ kfree(name);
+ if (IS_ERR_MREF(mref)) {
+ ret = MREF_ERR(mref);
+ /* If the file does not exist, Windows is not hibernated. */
+ if (ret == -ENOENT) {
+ ntfs_debug("hiberfil.sys not present. Windows is not hibernated on the volume.");
+ return 0;
+ }
+ /* A real error occurred. */
+ ntfs_error(vol->sb, "Failed to find inode number for hiberfil.sys.");
+ return ret;
+ }
+ /* Get the inode. */
+ vi = ntfs_iget(vol->sb, MREF(mref));
+ if (IS_ERR(vi)) {
+ ntfs_error(vol->sb, "Failed to load hiberfil.sys.");
+ return PTR_ERR(vi);
+ }
+ if (unlikely(i_size_read(vi) < NTFS_HIBERFIL_HEADER_SIZE)) {
+ ntfs_debug("hiberfil.sys is smaller than 4kiB (0x%llx). Windows is hibernated on the volume. This is not the system volume.",
+ i_size_read(vi));
+ goto iput_out;
+ }
+
+ folio = ntfs_read_mapping_folio(vi->i_mapping, 0);
+ if (IS_ERR(folio)) {
+ ntfs_error(vol->sb, "Failed to read from hiberfil.sys.");
+ ret = PTR_ERR(folio);
+ goto iput_out;
+ }
+ start_addr = (u32 *)kmap_local_folio(folio, 0);
+ kaddr = start_addr;
+ if (*(__le32 *)kaddr == cpu_to_le32(0x72626968)/*'hibr'*/) {
+ ntfs_debug("Magic \"hibr\" found in hiberfil.sys. Windows is hibernated on the volume. This is the system volume.");
+ goto unm_iput_out;
+ }
+ kend = kaddr + NTFS_HIBERFIL_HEADER_SIZE/sizeof(*kaddr);
+ do {
+ if (unlikely(*kaddr)) {
+ ntfs_debug("hiberfil.sys is larger than 4kiB (0x%llx), does not contain the \"hibr\" magic, and does not have a zero header. Windows is hibernated on the volume. This is not the system volume.",
+ i_size_read(vi));
+ goto unm_iput_out;
+ }
+ } while (++kaddr < kend);
+ ntfs_debug("hiberfil.sys contains a zero header. Windows is not hibernated on the volume. This is the system volume.");
+ ret = 0;
+unm_iput_out:
+ ntfs_unmap_folio(folio, start_addr);
+iput_out:
+ iput(vi);
+ return ret;
+}
+
+/**
+ * load_and_init_quota - load and setup the quota file for a volume if present
+ * @vol: ntfs super block describing device whose quota file to load
+ *
+ * Return 'true' on success or 'false' on error. If $Quota is not present, we
+ * leave vol->quota_ino as NULL and return success.
+ */
+static bool load_and_init_quota(struct ntfs_volume *vol)
+{
+ static const __le16 Quota[7] = { cpu_to_le16('$'),
+ cpu_to_le16('Q'), cpu_to_le16('u'),
+ cpu_to_le16('o'), cpu_to_le16('t'),
+ cpu_to_le16('a'), 0 };
+ static __le16 Q[3] = { cpu_to_le16('$'),
+ cpu_to_le16('Q'), 0 };
+ struct ntfs_name *name = NULL;
+ u64 mref;
+ struct inode *tmp_ino;
+
+ ntfs_debug("Entering.");
+ /*
+ * Find the inode number for the quota file by looking up the filename
+ * $Quota in the extended system files directory $Extend.
+ */
+ inode_lock(vol->extend_ino);
+ mref = ntfs_lookup_inode_by_name(NTFS_I(vol->extend_ino), Quota, 6,
+ &name);
+ inode_unlock(vol->extend_ino);
+ kfree(name);
+ if (IS_ERR_MREF(mref)) {
+ /*
+ * If the file does not exist, quotas are disabled and have
+ * never been enabled on this volume, just return success.
+ */
+ if (MREF_ERR(mref) == -ENOENT) {
+ ntfs_debug("$Quota not present. Volume does not have quotas enabled.");
+ /*
+ * No need to try to set quotas out of date if they are
+ * not enabled.
+ */
+ NVolSetQuotaOutOfDate(vol);
+ return true;
+ }
+ /* A real error occurred. */
+ ntfs_error(vol->sb, "Failed to find inode number for $Quota.");
+ return false;
+ }
+ /* Get the inode. */
+ tmp_ino = ntfs_iget(vol->sb, MREF(mref));
+ if (IS_ERR(tmp_ino)) {
+ ntfs_error(vol->sb, "Failed to load $Quota.");
+ return false;
+ }
+ vol->quota_ino = tmp_ino;
+ /* Get the $Q index allocation attribute. */
+ tmp_ino = ntfs_index_iget(vol->quota_ino, Q, 2);
+ if (IS_ERR(tmp_ino)) {
+ ntfs_error(vol->sb, "Failed to load $Quota/$Q index.");
+ return false;
+ }
+ vol->quota_q_ino = tmp_ino;
+ ntfs_debug("Done.");
+ return true;
+}
+
+/**
+ * load_and_init_attrdef - load the attribute definitions table for a volume
+ * @vol: ntfs super block describing device whose attrdef to load
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_and_init_attrdef(struct ntfs_volume *vol)
+{
+ loff_t i_size;
+ struct super_block *sb = vol->sb;
+ struct inode *ino;
+ struct folio *folio;
+ u8 *addr;
+ pgoff_t index, max_index;
+ unsigned int size;
+
+ ntfs_debug("Entering.");
+ /* Read attrdef table and setup vol->attrdef and vol->attrdef_size. */
+ ino = ntfs_iget(sb, FILE_AttrDef);
+ if (IS_ERR(ino)) {
+ goto failed;
+ }
+ NInoSetSparseDisabled(NTFS_I(ino));
+ /* The size of FILE_AttrDef must be above 0 and fit inside 31 bits. */
+ i_size = i_size_read(ino);
+ if (i_size <= 0 || i_size > 0x7fffffff)
+ goto iput_failed;
+ vol->attrdef = (struct attr_def *)ntfs_malloc_nofs(i_size);
+ if (!vol->attrdef)
+ goto iput_failed;
+ index = 0;
+ max_index = i_size >> PAGE_SHIFT;
+ size = PAGE_SIZE;
+ while (index < max_index) {
+ /* Read the attrdef table and copy it into the linear buffer. */
+read_partial_attrdef_page:
+ folio = ntfs_read_mapping_folio(ino->i_mapping, index);
+ if (IS_ERR(folio))
+ goto free_iput_failed;
+ addr = kmap_local_folio(folio, 0);
+ memcpy((u8 *)vol->attrdef + (index++ << PAGE_SHIFT),
+ addr, size);
+ ntfs_unmap_folio(folio, addr);
+ }
+ if (size == PAGE_SIZE) {
+ size = i_size & ~PAGE_MASK;
+ if (size)
+ goto read_partial_attrdef_page;
+ }
+ vol->attrdef_size = i_size;
+ ntfs_debug("Read %llu bytes from $AttrDef.", i_size);
+ iput(ino);
+ return true;
+free_iput_failed:
+ ntfs_free(vol->attrdef);
+ vol->attrdef = NULL;
+iput_failed:
+ iput(ino);
+failed:
+ ntfs_error(sb, "Failed to initialize attribute definition table.");
+ return false;
+}
+
+/**
+ * load_and_init_upcase - load the upcase table for an ntfs volume
+ * @vol: ntfs super block describing device whose upcase to load
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_and_init_upcase(struct ntfs_volume *vol)
+{
+ loff_t i_size;
+ struct super_block *sb = vol->sb;
+ struct inode *ino;
+ struct folio *folio;
+ u8 *addr;
+ pgoff_t index, max_index;
+ unsigned int size;
+ int i, max;
+
+ ntfs_debug("Entering.");
+ /* Read upcase table and setup vol->upcase and vol->upcase_len. */
+ ino = ntfs_iget(sb, FILE_UpCase);
+ if (IS_ERR(ino)) {
+ goto upcase_failed;
+ }
+ /*
+ * The upcase size must not be above 64k Unicode characters, must not
+ * be zero and must be a multiple of sizeof(__le16).
+ */
+ i_size = i_size_read(ino);
+ if (!i_size || i_size & (sizeof(__le16) - 1) ||
+ i_size > 64ULL * 1024 * sizeof(__le16))
+ goto iput_upcase_failed;
+ vol->upcase = (__le16 *)ntfs_malloc_nofs(i_size);
+ if (!vol->upcase)
+ goto iput_upcase_failed;
+ index = 0;
+ max_index = i_size >> PAGE_SHIFT;
+ size = PAGE_SIZE;
+ while (index < max_index) {
+ /* Read the upcase table and copy it into the linear buffer. */
+read_partial_upcase_page:
+ folio = ntfs_read_mapping_folio(ino->i_mapping, index);
+ if (IS_ERR(folio))
+ goto iput_upcase_failed;
+ addr = kmap_local_folio(folio, 0);
+ memcpy((char *)vol->upcase + (index++ << PAGE_SHIFT),
+ addr, size);
+ ntfs_unmap_folio(folio, addr);
+ }
+ if (size == PAGE_SIZE) {
+ size = i_size & ~PAGE_MASK;
+ if (size)
+ goto read_partial_upcase_page;
+ }
+ vol->upcase_len = i_size >> UCHAR_T_SIZE_BITS;
+ ntfs_debug("Read %llu bytes from $UpCase (expected %zu bytes).",
+ i_size, 64 * 1024 * sizeof(__le16));
+ iput(ino);
+ mutex_lock(&ntfs_lock);
+ if (!default_upcase) {
+ ntfs_debug("Using volume specified $UpCase since default is not present.");
+ mutex_unlock(&ntfs_lock);
+ return true;
+ }
+ max = default_upcase_len;
+ if (max > vol->upcase_len)
+ max = vol->upcase_len;
+ for (i = 0; i < max; i++)
+ if (vol->upcase[i] != default_upcase[i])
+ break;
+ if (i == max) {
+ ntfs_free(vol->upcase);
+ vol->upcase = default_upcase;
+ vol->upcase_len = max;
+ ntfs_nr_upcase_users++;
+ mutex_unlock(&ntfs_lock);
+ ntfs_debug("Volume specified $UpCase matches default. Using default.");
+ return true;
+ }
+ mutex_unlock(&ntfs_lock);
+ ntfs_debug("Using volume specified $UpCase since it does not match the default.");
+ return true;
+iput_upcase_failed:
+ iput(ino);
+ ntfs_free(vol->upcase);
+ vol->upcase = NULL;
+upcase_failed:
+ mutex_lock(&ntfs_lock);
+ if (default_upcase) {
+ vol->upcase = default_upcase;
+ vol->upcase_len = default_upcase_len;
+ ntfs_nr_upcase_users++;
+ mutex_unlock(&ntfs_lock);
+ ntfs_error(sb, "Failed to load $UpCase from the volume. Using default.");
+ return true;
+ }
+ mutex_unlock(&ntfs_lock);
+ ntfs_error(sb, "Failed to initialize upcase table.");
+ return false;
+}
+
+/*
+ * The lcn and mft bitmap inodes are NTFS-internal inodes with
+ * their own special locking rules:
+ */
+static struct lock_class_key
+ lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key,
+ mftbmp_runlist_lock_key, mftbmp_mrec_lock_key;
+
+/**
+ * load_system_files - open the system files using normal functions
+ * @vol: ntfs super block describing device whose system files to load
+ *
+ * Open the system files with normal access functions and complete setting up
+ * the ntfs super block @vol.
+ *
+ * Return 'true' on success or 'false' on error.
+ */
+static bool load_system_files(struct ntfs_volume *vol)
+{
+ struct super_block *sb = vol->sb;
+ struct mft_record *m;
+ struct volume_information *vi;
+ struct ntfs_attr_search_ctx *ctx;
+ struct restart_page_header *rp;
+ int err;
+
+ ntfs_debug("Entering.");
+ /* Get mft mirror inode and compare the contents of $MFT and $MFTMirr. */
+ if (!load_and_init_mft_mirror(vol) || !check_mft_mirror(vol)) {
+ /* If a read-write mount, convert it to a read-only mount. */
+ if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+ static const char *es1 = "Failed to load $MFTMirr";
+ static const char *es2 = "$MFTMirr does not match $MFT";
+ static const char *es3 = ". Run ntfsck and/or chkdsk.";
+
+ sb->s_flags |= SB_RDONLY;
+ ntfs_error(sb, "%s. Mounting read-only%s",
+ !vol->mftmirr_ino ? es1 : es2, es3);
+ }
+ NVolSetErrors(vol);
+ }
+ /* Get mft bitmap attribute inode. */
+ vol->mftbmp_ino = ntfs_attr_iget(vol->mft_ino, AT_BITMAP, NULL, 0);
+ if (IS_ERR(vol->mftbmp_ino)) {
+ ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute.");
+ goto iput_mirr_err_out;
+ }
+ lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->runlist.lock,
+ &mftbmp_runlist_lock_key);
+ lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->mrec_lock,
+ &mftbmp_mrec_lock_key);
+ /* Read upcase table and setup @vol->upcase and @vol->upcase_len. */
+ if (!load_and_init_upcase(vol))
+ goto iput_mftbmp_err_out;
+ /*
+ * Read attribute definitions table and setup @vol->attrdef and
+ * @vol->attrdef_size.
+ */
+ if (!load_and_init_attrdef(vol))
+ goto iput_upcase_err_out;
+	/*
+	 * Get the cluster allocation bitmap inode and verify the size. No
+	 * locking is needed at this stage as we are running exclusively,
+	 * being the mount-in-progress task.
+	 */
+ vol->lcnbmp_ino = ntfs_iget(sb, FILE_Bitmap);
+ if (IS_ERR(vol->lcnbmp_ino)) {
+ goto bitmap_failed;
+ }
+ lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->runlist.lock,
+ &lcnbmp_runlist_lock_key);
+ lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->mrec_lock,
+ &lcnbmp_mrec_lock_key);
+
+ NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino));
+ if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) {
+ iput(vol->lcnbmp_ino);
+bitmap_failed:
+ ntfs_error(sb, "Failed to load $Bitmap.");
+ goto iput_attrdef_err_out;
+ }
+ /*
+ * Get the volume inode and setup our cache of the volume flags and
+ * version.
+ */
+ vol->vol_ino = ntfs_iget(sb, FILE_Volume);
+ if (IS_ERR(vol->vol_ino)) {
+volume_failed:
+ ntfs_error(sb, "Failed to load $Volume.");
+ goto iput_lcnbmp_err_out;
+ }
+ m = map_mft_record(NTFS_I(vol->vol_ino));
+ if (IS_ERR(m)) {
+iput_volume_failed:
+ iput(vol->vol_ino);
+ goto volume_failed;
+ }
+
+ ctx = ntfs_attr_get_search_ctx(NTFS_I(vol->vol_ino), m);
+ if (!ctx) {
+ ntfs_error(sb, "Failed to get attribute search context.");
+ goto get_ctx_vol_failed;
+ }
+ if (ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
+ ctx) || ctx->attr->non_resident || ctx->attr->flags) {
+err_put_vol:
+ ntfs_attr_put_search_ctx(ctx);
+get_ctx_vol_failed:
+ unmap_mft_record(NTFS_I(vol->vol_ino));
+ goto iput_volume_failed;
+ }
+ vi = (struct volume_information *)((char *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+ /* Some bounds checks. */
+ if ((u8 *)vi < (u8 *)ctx->attr || (u8 *)vi +
+ le32_to_cpu(ctx->attr->data.resident.value_length) >
+ (u8 *)ctx->attr + le32_to_cpu(ctx->attr->length))
+ goto err_put_vol;
+	/* Copy the volume flags and version into the ntfs_volume structure. */
+ vol->vol_flags = vi->flags;
+ vol->major_ver = vi->major_ver;
+ vol->minor_ver = vi->minor_ver;
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(NTFS_I(vol->vol_ino));
+ pr_info("volume version %i.%i, dev %s, cluster size %d\n",
+ vol->major_ver, vol->minor_ver, sb->s_id, vol->cluster_size);
+
+ /* Make sure that no unsupported volume flags are set. */
+ if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
+ static const char *es1a = "Volume is dirty";
+ static const char *es1b = "Volume has been modified by chkdsk";
+ static const char *es1c = "Volume has unsupported flags set";
+ static const char *es2a = ". Run chkdsk and mount in Windows.";
+ static const char *es2b = ". Mount in Windows.";
+ const char *es1, *es2;
+
+ es2 = es2a;
+ if (vol->vol_flags & VOLUME_IS_DIRTY)
+ es1 = es1a;
+ else if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
+ es1 = es1b;
+ es2 = es2b;
+ } else {
+ es1 = es1c;
+ ntfs_warning(sb, "Unsupported volume flags 0x%x encountered.",
+ (unsigned int)le16_to_cpu(vol->vol_flags));
+ }
+ /* If a read-write mount, convert it to a read-only mount. */
+ if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+ sb->s_flags |= SB_RDONLY;
+ ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+ }
+ /*
+	 * Do not set NVolErrors() because ntfs_remount() re-checks the
+	 * flags, which we need it to do in case any flags have changed.
+ */
+ }
+ /*
+ * Get the inode for the logfile, check it and determine if the volume
+ * was shutdown cleanly.
+ */
+ rp = NULL;
+ err = load_and_check_logfile(vol, &rp);
+ if (err) {
+ /* If a read-write mount, convert it to a read-only mount. */
+ if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+ sb->s_flags |= SB_RDONLY;
+ ntfs_error(sb, "Failed to load LogFile. Mounting read-only.");
+ }
+ NVolSetErrors(vol);
+ }
+
+ ntfs_free(rp);
+ /* Get the root directory inode so we can do path lookups. */
+ vol->root_ino = ntfs_iget(sb, FILE_root);
+ if (IS_ERR(vol->root_ino)) {
+ ntfs_error(sb, "Failed to load root directory.");
+ goto iput_logfile_err_out;
+ }
+ /*
+ * Check if Windows is suspended to disk on the target volume. If it
+ * is hibernated, we must not write *anything* to the disk so set
+ * NVolErrors() without setting the dirty volume flag and mount
+ * read-only. This will prevent read-write remounting and it will also
+ * prevent all writes.
+ */
+ err = check_windows_hibernation_status(vol);
+ if (unlikely(err)) {
+ static const char *es1a = "Failed to determine if Windows is hibernated";
+ static const char *es1b = "Windows is hibernated";
+ static const char *es2 = ". Run chkdsk.";
+ const char *es1;
+
+ es1 = err < 0 ? es1a : es1b;
+ /* If a read-write mount, convert it to a read-only mount. */
+ if (!sb_rdonly(sb) && vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+ sb->s_flags |= SB_RDONLY;
+ ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+ }
+ NVolSetErrors(vol);
+ }
+
+ /* If (still) a read-write mount, empty the logfile. */
+ if (!sb_rdonly(sb) &&
+ vol->logfile_ino && !ntfs_empty_logfile(vol->logfile_ino) &&
+ vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+ static const char *es1 = "Failed to empty LogFile";
+ static const char *es2 = ". Mount in Windows.";
+
+ /* Convert to a read-only mount. */
+ ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+ sb->s_flags |= SB_RDONLY;
+ NVolSetErrors(vol);
+ }
+ /* If on NTFS versions before 3.0, we are done. */
+ if (unlikely(vol->major_ver < 3))
+ return true;
+ /* NTFS 3.0+ specific initialization. */
+ /* Get the security descriptors inode. */
+ vol->secure_ino = ntfs_iget(sb, FILE_Secure);
+ if (IS_ERR(vol->secure_ino)) {
+ ntfs_error(sb, "Failed to load $Secure.");
+ goto iput_root_err_out;
+ }
+ /* Get the extended system files' directory inode. */
+ vol->extend_ino = ntfs_iget(sb, FILE_Extend);
+ if (IS_ERR(vol->extend_ino) ||
+ !S_ISDIR(vol->extend_ino->i_mode)) {
+ if (!IS_ERR(vol->extend_ino))
+ iput(vol->extend_ino);
+ ntfs_error(sb, "Failed to load $Extend.");
+ goto iput_sec_err_out;
+ }
+ /* Find the quota file, load it if present, and set it up. */
+ if (!load_and_init_quota(vol) &&
+ vol->on_errors == ON_ERRORS_REMOUNT_RO) {
+ static const char *es1 = "Failed to load $Quota";
+ static const char *es2 = ". Run chkdsk.";
+
+ sb->s_flags |= SB_RDONLY;
+ ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
+ /* This will prevent a read-write remount. */
+ NVolSetErrors(vol);
+ }
+
+ return true;
+
+iput_sec_err_out:
+ iput(vol->secure_ino);
+iput_root_err_out:
+ iput(vol->root_ino);
+iput_logfile_err_out:
+ if (vol->logfile_ino)
+ iput(vol->logfile_ino);
+ iput(vol->vol_ino);
+iput_lcnbmp_err_out:
+ iput(vol->lcnbmp_ino);
+iput_attrdef_err_out:
+ vol->attrdef_size = 0;
+ if (vol->attrdef) {
+ ntfs_free(vol->attrdef);
+ vol->attrdef = NULL;
+ }
+iput_upcase_err_out:
+ vol->upcase_len = 0;
+ mutex_lock(&ntfs_lock);
+ if (vol->upcase == default_upcase) {
+ ntfs_nr_upcase_users--;
+ vol->upcase = NULL;
+ }
+ mutex_unlock(&ntfs_lock);
+ if (vol->upcase) {
+ ntfs_free(vol->upcase);
+ vol->upcase = NULL;
+ }
+iput_mftbmp_err_out:
+ iput(vol->mftbmp_ino);
+iput_mirr_err_out:
+ iput(vol->mftmirr_ino);
+ return false;
+}
+
+static void ntfs_volume_free(struct ntfs_volume *vol)
+{
+ /* Throw away the table of attribute definitions. */
+ vol->attrdef_size = 0;
+ if (vol->attrdef) {
+ ntfs_free(vol->attrdef);
+ vol->attrdef = NULL;
+ }
+ vol->upcase_len = 0;
+ /*
+ * Destroy the global default upcase table if necessary. Also decrease
+ * the number of upcase users if we are a user.
+ */
+ mutex_lock(&ntfs_lock);
+ if (vol->upcase == default_upcase) {
+ ntfs_nr_upcase_users--;
+ vol->upcase = NULL;
+ }
+
+ if (!ntfs_nr_upcase_users && default_upcase) {
+ ntfs_free(default_upcase);
+ default_upcase = NULL;
+ }
+
+ free_compression_buffers();
+
+ mutex_unlock(&ntfs_lock);
+ if (vol->upcase) {
+ ntfs_free(vol->upcase);
+ vol->upcase = NULL;
+ }
+
+ unload_nls(vol->nls_map);
+
+	kvfree(vol->lcn_empty_bits_per_page);
+ kfree(vol);
+}
+
+/**
+ * ntfs_put_super - called by the vfs to unmount a volume
+ * @sb: vfs superblock of volume to unmount
+ */
+static void ntfs_put_super(struct super_block *sb)
+{
+ struct ntfs_volume *vol = NTFS_SB(sb);
+
+ pr_info("Entering %s, dev %s\n", __func__, sb->s_id);
+
+ cancel_work_sync(&vol->precalc_work);
+
+ /*
+ * Commit all inodes while they are still open in case some of them
+ * cause others to be dirtied.
+ */
+ ntfs_commit_inode(vol->vol_ino);
+
+ /* NTFS 3.0+ specific. */
+ if (vol->major_ver >= 3) {
+ if (vol->quota_q_ino)
+ ntfs_commit_inode(vol->quota_q_ino);
+ if (vol->quota_ino)
+ ntfs_commit_inode(vol->quota_ino);
+ if (vol->extend_ino)
+ ntfs_commit_inode(vol->extend_ino);
+ if (vol->secure_ino)
+ ntfs_commit_inode(vol->secure_ino);
+ }
+
+ ntfs_commit_inode(vol->root_ino);
+
+ ntfs_commit_inode(vol->lcnbmp_ino);
+
+ /*
+	 * The GFP_NOFS scope is not needed because ntfs_commit_inode()
+	 * does nothing here.
+ */
+ ntfs_commit_inode(vol->mftbmp_ino);
+
+ if (vol->logfile_ino)
+ ntfs_commit_inode(vol->logfile_ino);
+
+ if (vol->mftmirr_ino)
+ ntfs_commit_inode(vol->mftmirr_ino);
+ ntfs_commit_inode(vol->mft_ino);
+
+ /*
+ * If a read-write mount and no volume errors have occurred, mark the
+ * volume clean. Also, re-commit all affected inodes.
+ */
+ if (!sb_rdonly(sb)) {
+ if (!NVolErrors(vol)) {
+ if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
+ ntfs_warning(sb,
+ "Failed to clear dirty bit in volume information flags. Run chkdsk.");
+ ntfs_commit_inode(vol->vol_ino);
+ ntfs_commit_inode(vol->root_ino);
+ if (vol->mftmirr_ino)
+ ntfs_commit_inode(vol->mftmirr_ino);
+ ntfs_commit_inode(vol->mft_ino);
+ } else {
+ ntfs_warning(sb,
+ "Volume has errors. Leaving volume marked dirty. Run chkdsk.");
+ }
+ }
+
+ iput(vol->vol_ino);
+ vol->vol_ino = NULL;
+
+ /* NTFS 3.0+ specific clean up. */
+ if (vol->major_ver >= 3) {
+ if (vol->quota_q_ino) {
+ iput(vol->quota_q_ino);
+ vol->quota_q_ino = NULL;
+ }
+ if (vol->quota_ino) {
+ iput(vol->quota_ino);
+ vol->quota_ino = NULL;
+ }
+ if (vol->extend_ino) {
+ iput(vol->extend_ino);
+ vol->extend_ino = NULL;
+ }
+ if (vol->secure_ino) {
+ iput(vol->secure_ino);
+ vol->secure_ino = NULL;
+ }
+ }
+
+ iput(vol->root_ino);
+ vol->root_ino = NULL;
+
+ iput(vol->lcnbmp_ino);
+ vol->lcnbmp_ino = NULL;
+
+ iput(vol->mftbmp_ino);
+ vol->mftbmp_ino = NULL;
+
+ if (vol->logfile_ino) {
+ iput(vol->logfile_ino);
+ vol->logfile_ino = NULL;
+ }
+ if (vol->mftmirr_ino) {
+ /* Re-commit the mft mirror and mft just in case. */
+ ntfs_commit_inode(vol->mftmirr_ino);
+ ntfs_commit_inode(vol->mft_ino);
+ iput(vol->mftmirr_ino);
+ vol->mftmirr_ino = NULL;
+ }
+ /*
+ * We should have no dirty inodes left, due to
+ * mft.c::ntfs_mft_writepage() cleaning all the dirty pages as
+ * the underlying mft records are written out and cleaned.
+ */
+ ntfs_commit_inode(vol->mft_ino);
+ write_inode_now(vol->mft_ino, 1);
+
+ iput(vol->mft_ino);
+ vol->mft_ino = NULL;
+
+ ntfs_volume_free(vol);
+}
+
+int ntfs_force_shutdown(struct super_block *sb, u32 flags)
+{
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ int ret;
+
+ if (NVolShutdown(vol))
+ return 0;
+
+ switch (flags) {
+ case NTFS_GOING_DOWN_DEFAULT:
+ case NTFS_GOING_DOWN_FULLSYNC:
+ ret = bdev_freeze(sb->s_bdev);
+ if (ret)
+ return ret;
+ bdev_thaw(sb->s_bdev);
+ NVolSetShutdown(vol);
+ break;
+ case NTFS_GOING_DOWN_NOSYNC:
+ NVolSetShutdown(vol);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void ntfs_shutdown(struct super_block *sb)
+{
+	ntfs_force_shutdown(sb, NTFS_GOING_DOWN_NOSYNC);
+}
+
+static int ntfs_sync_fs(struct super_block *sb, int wait)
+{
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ int err = 0;
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ if (!wait)
+ return 0;
+
+ /* If there are some dirty buffers in the bdev inode */
+ if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY)) {
+ ntfs_warning(sb, "Failed to clear dirty bit in volume information flags. Run chkdsk.");
+ err = -EIO;
+ }
+ sync_inodes_sb(sb);
+ sync_blockdev(sb->s_bdev);
+ blkdev_issue_flush(sb->s_bdev);
+ return err;
+}
+
+/**
+ * get_nr_free_clusters - return the number of free clusters on a volume
+ * @vol: ntfs volume for which to obtain free cluster count
+ *
+ * Calculate the number of free clusters on the mounted NTFS volume @vol. We
+ * actually calculate the number of clusters in use instead because this
+ * allows us to not care about partial pages as these will be just zero filled
+ * and hence not be counted as allocated clusters.
+ *
+ * The only particularity is that clusters beyond the end of the logical ntfs
+ * volume will be marked as allocated to prevent errors which means we have to
+ * discount those at the end. This is important as the cluster bitmap always
+ * has a size in multiples of 8 bytes, i.e. up to 63 clusters could be outside
+ * the logical volume and marked in use when they are not as they do not exist.
+ *
+ * If any pages cannot be read we assume all clusters in the erroring pages are
+ * in use. This means we return an underestimate on errors which is better than
+ * an overestimate.
+ */
+s64 get_nr_free_clusters(struct ntfs_volume *vol)
+{
+ s64 nr_free = vol->nr_clusters;
+ u32 nr_used;
+ struct address_space *mapping = vol->lcnbmp_ino->i_mapping;
+ struct folio *folio;
+ pgoff_t index, max_index;
+ struct file_ra_state *ra;
+
+ ntfs_debug("Entering.");
+
+ if (NVolFreeClusterKnown(vol))
+ return atomic64_read(&vol->free_clusters);
+
+ ra = kzalloc(sizeof(*ra), GFP_NOFS);
+ if (!ra)
+ return 0;
+
+ file_ra_state_init(ra, mapping);
+
+ /*
+ * Convert the number of bits into bytes rounded up, then convert into
+ * multiples of PAGE_SIZE, rounding up so that if we have one
+ * full and one partial page max_index = 2.
+ */
+ max_index = (((vol->nr_clusters + 7) >> 3) + PAGE_SIZE - 1) >>
+ PAGE_SHIFT;
+ /* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
+ ntfs_debug("Reading $Bitmap, max_index = 0x%lx, max_size = 0x%lx.",
+ max_index, PAGE_SIZE / 4);
+ for (index = 0; index < max_index; index++) {
+ unsigned long *kaddr;
+
+ /*
+ * Get folio from page cache, getting it from backing store
+ * if necessary, and increment the use count.
+ */
+ folio = filemap_lock_folio(mapping, index);
+ if (IS_ERR(folio)) {
+ page_cache_sync_readahead(mapping, ra, NULL,
+ index, max_index - index);
+ folio = ntfs_read_mapping_folio(mapping, index);
+ if (!IS_ERR(folio))
+ folio_lock(folio);
+ }
+
+ /* Ignore pages which errored synchronously. */
+ if (IS_ERR(folio)) {
+ ntfs_debug("Skipping page (index 0x%lx).", index);
+ nr_free -= PAGE_SIZE * 8;
+ vol->lcn_empty_bits_per_page[index] = 0;
+ continue;
+ }
+
+ kaddr = kmap_local_folio(folio, 0);
+ /*
+ * Subtract the number of set bits. If this
+ * is the last page and it is partial we don't really care as
+ * it just means we do a little extra work but it won't affect
+ * the result as all out of range bytes are set to zero by
+ * ntfs_readpage().
+ */
+ nr_used = bitmap_weight(kaddr, PAGE_SIZE * BITS_PER_BYTE);
+ nr_free -= nr_used;
+ vol->lcn_empty_bits_per_page[index] = PAGE_SIZE * BITS_PER_BYTE - nr_used;
+ kunmap_local(kaddr);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+ ntfs_debug("Finished reading $Bitmap, last index = 0x%lx.", index - 1);
+ /*
+ * Fixup for eventual bits outside logical ntfs volume (see function
+ * description above).
+ */
+ if (vol->nr_clusters & 63)
+ nr_free += 64 - (vol->nr_clusters & 63);
+
+ /* If errors occurred we may well have gone below zero, fix this. */
+ if (nr_free < 0)
+ nr_free = 0;
+ else
+ atomic64_set(&vol->free_clusters, nr_free);
+
+ kfree(ra);
+ NVolSetFreeClusterKnown(vol);
+ wake_up_all(&vol->free_waitq);
+ ntfs_debug("Exiting.");
+ return nr_free;
+}
+
+/*
+ * @nr_clusters is the number of clusters requested for allocation.
+ *
+ * Return the number of clusters available for allocation, capped at
+ * @nr_clusters. Clusters reserved by delayed allocation are treated as
+ * unavailable.
+ */
+s64 ntfs_available_clusters_count(struct ntfs_volume *vol, s64 nr_clusters)
+{
+ s64 free_clusters;
+
+	/* Wait until the free cluster count has been pre-calculated. */
+ if (!NVolFreeClusterKnown(vol))
+ wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+ free_clusters = atomic64_read(&vol->free_clusters) -
+ atomic64_read(&vol->dirty_clusters);
+ if (free_clusters <= 0)
+ return -ENOSPC;
+ else if (free_clusters < nr_clusters)
+ nr_clusters = free_clusters;
+
+ return nr_clusters;
+}
+
+/**
+ * __get_nr_free_mft_records - return the number of free inodes on a volume
+ * @vol: ntfs volume for which to obtain free inode count
+ * @nr_free:	total number of mft records on the volume
+ * @max_index:	number of mft bitmap pages to scan
+ *
+ * Calculate the number of free mft records (inodes) on the mounted NTFS
+ * volume @vol. We actually calculate the number of mft records in use instead
+ * because this allows us to not care about partial pages as these will be just
+ * zero filled and hence not be counted as allocated mft record.
+ *
+ * If any pages cannot be read we assume all mft records in the erroring pages
+ * are in use. This means we return an underestimate on errors which is better
+ * than an overestimate.
+ *
+ * NOTE: Caller must hold mftbmp_lock rw_semaphore for reading or writing.
+ */
+static unsigned long __get_nr_free_mft_records(struct ntfs_volume *vol,
+ s64 nr_free, const pgoff_t max_index)
+{
+ struct address_space *mapping = vol->mftbmp_ino->i_mapping;
+ struct folio *folio;
+ pgoff_t index;
+ struct file_ra_state *ra;
+
+ ntfs_debug("Entering.");
+
+ ra = kzalloc(sizeof(*ra), GFP_NOFS);
+ if (!ra)
+ return 0;
+
+ file_ra_state_init(ra, mapping);
+
+ /* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
+ ntfs_debug("Reading $MFT/$BITMAP, max_index = 0x%lx, max_size = 0x%lx.",
+ max_index, PAGE_SIZE / 4);
+ for (index = 0; index < max_index; index++) {
+ unsigned long *kaddr;
+
+ /*
+ * Get folio from page cache, getting it from backing store
+ * if necessary, and increment the use count.
+ */
+ folio = filemap_lock_folio(mapping, index);
+ if (IS_ERR(folio)) {
+ page_cache_sync_readahead(mapping, ra, NULL,
+ index, max_index - index);
+ folio = ntfs_read_mapping_folio(mapping, index);
+ if (!IS_ERR(folio))
+ folio_lock(folio);
+ }
+
+ /* Ignore pages which errored synchronously. */
+ if (IS_ERR(folio)) {
+ ntfs_debug("read_mapping_page() error. Skipping page (index 0x%lx).",
+ index);
+ nr_free -= PAGE_SIZE * 8;
+ continue;
+ }
+
+ kaddr = kmap_local_folio(folio, 0);
+ /*
+ * Subtract the number of set bits. If this
+ * is the last page and it is partial we don't really care as
+ * it just means we do a little extra work but it won't affect
+ * the result as all out of range bytes are set to zero by
+ * ntfs_readpage().
+ */
+ nr_free -= bitmap_weight(kaddr,
+ PAGE_SIZE * BITS_PER_BYTE);
+ kunmap_local(kaddr);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+ ntfs_debug("Finished reading $MFT/$BITMAP, last index = 0x%lx.",
+ index - 1);
+ /* If errors occurred we may well have gone below zero, fix this. */
+ if (nr_free < 0)
+ nr_free = 0;
+ else
+ atomic64_set(&vol->free_mft_records, nr_free);
+
+ kfree(ra);
+ ntfs_debug("Exiting.");
+ return nr_free;
+}
+
+/**
+ * ntfs_statfs - return information about mounted NTFS volume
+ * @dentry: dentry from mounted volume
+ * @sfs: statfs structure in which to return the information
+ *
+ * Return information about the mounted NTFS volume identified by @dentry in
+ * the statfs structure pointed to by @sfs (this is initialized with zeros
+ * before ntfs_statfs is called). We interpret the values to be correct at
+ * the moment in time at which we are called. Most values are variable
+ * otherwise and this isn't just the free values but the totals as well. For
+ * example we can increase the total number of file nodes if we run out and
+ * we can keep doing this until there is no more space on the volume left at
+ * all.
+ *
+ * Called from vfs_statfs which is used to handle the statfs, fstatfs, and
+ * ustat system calls.
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int ntfs_statfs(struct dentry *dentry, struct kstatfs *sfs)
+{
+ struct super_block *sb = dentry->d_sb;
+ s64 size;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ struct ntfs_inode *mft_ni = NTFS_I(vol->mft_ino);
+ unsigned long flags;
+
+ ntfs_debug("Entering.");
+ /* Type of filesystem. */
+ sfs->f_type = NTFS_SB_MAGIC;
+ /* Optimal transfer block size. */
+ sfs->f_bsize = vol->cluster_size;
+ /* Fundamental file system block size, used as the unit. */
+ sfs->f_frsize = vol->cluster_size;
+
+ /*
+ * Total data blocks in filesystem in units of f_bsize and since
+	 * inodes are also stored in data blocks ($MFT is a file) this is just
+ * the total clusters.
+ */
+ sfs->f_blocks = vol->nr_clusters;
+
+	/* Wait until the free cluster count has been pre-calculated. */
+ if (!NVolFreeClusterKnown(vol))
+ wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+ /* Free data blocks in filesystem in units of f_bsize. */
+ size = atomic64_read(&vol->free_clusters) -
+ atomic64_read(&vol->dirty_clusters);
+ if (size < 0LL)
+ size = 0LL;
+
+ /* Free blocks avail to non-superuser, same as above on NTFS. */
+ sfs->f_bavail = sfs->f_bfree = size;
+
+ /* Number of inodes in filesystem (at this point in time). */
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ sfs->f_files = i_size_read(vol->mft_ino) >> vol->mft_record_size_bits;
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+
+ /* Free inodes in fs (based on current total count). */
+ sfs->f_ffree = atomic64_read(&vol->free_mft_records);
+
+ /*
+ * File system id. This is extremely *nix flavour dependent and even
+ * within Linux itself all fs do their own thing. I interpret this to
+ * mean a unique id associated with the mounted fs and not the id
+ * associated with the filesystem driver, the latter is already given
+ * by the filesystem type in sfs->f_type. Thus we use the 64-bit
+ * volume serial number splitting it into two 32-bit parts. We enter
+ * the least significant 32-bits in f_fsid[0] and the most significant
+ * 32-bits in f_fsid[1].
+ */
+ sfs->f_fsid = u64_to_fsid(vol->serial_no);
+ /* Maximum length of filenames. */
+ sfs->f_namelen = NTFS_MAX_NAME_LEN;
+
+ return 0;
+}
+
+static int ntfs_write_inode(struct inode *vi, struct writeback_control *wbc)
+{
+ return __ntfs_write_inode(vi, wbc->sync_mode == WB_SYNC_ALL);
+}
+
+/* The complete super operations. */
+static const struct super_operations ntfs_sops = {
+ .alloc_inode = ntfs_alloc_big_inode, /* VFS: Allocate new inode. */
+ .free_inode = ntfs_free_big_inode, /* VFS: Deallocate inode. */
+ .drop_inode = ntfs_drop_big_inode,
+ .write_inode = ntfs_write_inode, /* VFS: Write dirty inode to disk. */
+ .put_super = ntfs_put_super, /* Syscall: umount. */
+ .shutdown = ntfs_shutdown,
+ .sync_fs = ntfs_sync_fs, /* Syscall: sync. */
+ .statfs = ntfs_statfs, /* Syscall: statfs */
+ .evict_inode = ntfs_evict_big_inode,
+ .show_options = ntfs_show_options, /* Show mount options in proc. */
+};
+
+static void precalc_free_clusters(struct work_struct *work)
+{
+ struct ntfs_volume *vol = container_of(work, struct ntfs_volume, precalc_work);
+ s64 nr_free;
+
+ nr_free = get_nr_free_clusters(vol);
+
+ ntfs_debug("pre-calculate free clusters(%lld) using workqueue",
+ nr_free);
+}
+
+static struct lock_class_key ntfs_mft_inval_lock_key;
+
+/**
+ * ntfs_fill_super - mount an ntfs filesystem
+ * @sb: super block of the device to mount
+ * @fc: filesystem context holding the mount options
+ *
+ * ntfs_fill_super() is called by the VFS to mount the device described by @sb
+ * with the mount options in @fc with the NTFS filesystem.
+ *
+ * If the SB_SILENT flag is set in @fc->sb_flags, remain silent even if errors
+ * are detected. This is used during bootup, when the kernel tries to mount
+ * the root filesystem with all registered filesystems one after the other
+ * until one succeeds. This implies that all filesystems except the correct
+ * one will quite correctly and expectedly return an error, but nobody wants
+ * to see error messages when in fact this is what is supposed to happen.
+ */
+static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+ char *boot;
+ struct inode *tmp_ino;
+ int blocksize, result;
+ pgoff_t lcn_bit_pages;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ int silent = fc->sb_flags & SB_SILENT;
+
+ vol->sb = sb;
+
+ /*
+ * We do a pretty difficult piece of bootstrap by reading the
+ * MFT (and other metadata) from disk into memory. We'll only
+ * release this metadata during umount, so the locking patterns
+ * observed during bootstrap do not count. So turn off the
+ * observation of locking patterns (strictly for this context
+ * only) while mounting NTFS. [The validator is still active
+ * otherwise, even for this context: it will for example record
+ * lock class registrations.]
+ */
+ lockdep_off();
+ ntfs_debug("Entering.");
+
+ if (vol->nls_map && !strcmp(vol->nls_map->charset, "utf8"))
+ vol->nls_utf8 = true;
+
+ /* We support sector sizes up to the PAGE_SIZE. */
+ if (bdev_logical_block_size(sb->s_bdev) > PAGE_SIZE) {
+ if (!silent)
+ ntfs_error(sb,
+ "Device has unsupported sector size (%i). The maximum supported sector size on this architecture is %lu bytes.",
+ bdev_logical_block_size(sb->s_bdev),
+ PAGE_SIZE);
+ goto err_out_now;
+ }
+
+ /*
+ * Setup the device access block size to NTFS_BLOCK_SIZE or the hard
+ * sector size, whichever is bigger.
+ */
+ blocksize = sb_min_blocksize(sb, NTFS_BLOCK_SIZE);
+ if (blocksize < NTFS_BLOCK_SIZE) {
+ if (!silent)
+ ntfs_error(sb, "Unable to set device block size.");
+ goto err_out_now;
+ }
+
+ BUG_ON(blocksize != sb->s_blocksize);
+ ntfs_debug("Set device block size to %i bytes (block size bits %i).",
+ blocksize, sb->s_blocksize_bits);
+ /* Determine the size of the device in units of block_size bytes. */
+ if (!bdev_nr_bytes(sb->s_bdev)) {
+ if (!silent)
+ ntfs_error(sb, "Unable to determine device size.");
+ goto err_out_now;
+ }
+ vol->nr_blocks = bdev_nr_bytes(sb->s_bdev) >>
+ sb->s_blocksize_bits;
+ /* Read the boot sector and return unlocked buffer head to it. */
+ boot = read_ntfs_boot_sector(sb, silent);
+ if (!boot) {
+ if (!silent)
+ ntfs_error(sb, "Not an NTFS volume.");
+ goto err_out_now;
+ }
+ /*
+ * Extract the data from the boot sector and setup the ntfs volume
+ * using it.
+ */
+ result = parse_ntfs_boot_sector(vol, (struct ntfs_boot_sector *)boot);
+ kfree(boot);
+ if (!result) {
+ if (!silent)
+ ntfs_error(sb, "Unsupported NTFS filesystem.");
+ goto err_out_now;
+ }
+
+ if (vol->sector_size > blocksize) {
+ blocksize = sb_set_blocksize(sb, vol->sector_size);
+ if (blocksize != vol->sector_size) {
+ if (!silent)
+ ntfs_error(sb,
+ "Unable to set device block size to sector size (%i).",
+ vol->sector_size);
+ goto err_out_now;
+ }
+ BUG_ON(blocksize != sb->s_blocksize);
+ vol->nr_blocks = bdev_nr_bytes(sb->s_bdev) >>
+ sb->s_blocksize_bits;
+ ntfs_debug("Changed device block size to %i bytes (block size bits %i) to match volume sector size.",
+ blocksize, sb->s_blocksize_bits);
+ }
+ /* Initialize the cluster and mft allocators. */
+ ntfs_setup_allocators(vol);
+ /* Setup remaining fields in the super block. */
+ sb->s_magic = NTFS_SB_MAGIC;
+ /*
+ * Ntfs allows 63 bits for the file size, i.e. correct would be:
+ * sb->s_maxbytes = ~0ULL >> 1;
+ * But the kernel uses a long as the page cache page index which on
+ * 32-bit architectures is only 32-bits. MAX_LFS_FILESIZE is kernel
+ * defined to the maximum the page cache page index can cope with
+ * without overflowing the index or to 2^63 - 1, whichever is smaller.
+ */
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ /* Ntfs measures time in 100ns intervals. */
+ sb->s_time_gran = 100;
+
+ sb->s_xattr = ntfs_xattr_handlers;
+ /*
+ * Now load the metadata required for the page cache and our address
+ * space operations to function. We do this by setting up a specialised
+ * read_inode method and then just calling the normal iget() to obtain
+ * the inode for $MFT which is sufficient to allow our normal inode
+ * operations and associated address space operations to function.
+ */
+ sb->s_op = &ntfs_sops;
+ tmp_ino = new_inode(sb);
+ if (!tmp_ino) {
+ if (!silent)
+ ntfs_error(sb, "Failed to load essential metadata.");
+ goto err_out_now;
+ }
+
+ tmp_ino->i_ino = FILE_MFT;
+ insert_inode_hash(tmp_ino);
+ if (ntfs_read_inode_mount(tmp_ino) < 0) {
+ if (!silent)
+ ntfs_error(sb, "Failed to load essential metadata.");
+ goto iput_tmp_ino_err_out_now;
+ }
+ lockdep_set_class(&tmp_ino->i_mapping->invalidate_lock,
+ &ntfs_mft_inval_lock_key);
+
+ mutex_lock(&ntfs_lock);
+
+ /*
+ * Generate the global default upcase table if necessary. Also
+ * temporarily increment the number of upcase users to avoid race
+ * conditions with concurrent (u)mounts.
+ */
+ if (!default_upcase)
+ default_upcase = generate_default_upcase();
+ ntfs_nr_upcase_users++;
+ mutex_unlock(&ntfs_lock);
+
+ lcn_bit_pages = (((vol->nr_clusters + 7) >> 3) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ vol->lcn_empty_bits_per_page = kvmalloc_array(lcn_bit_pages, sizeof(unsigned int),
+ GFP_KERNEL);
+ if (!vol->lcn_empty_bits_per_page) {
+		ntfs_error(sb,
+			"Unable to allocate memory for storing LCN empty bit counts");
+ goto unl_upcase_iput_tmp_ino_err_out_now;
+ }
+
+ /*
+ * From now on, ignore @silent parameter. If we fail below this line,
+ * it will be due to a corrupt fs or a system error, so we report it.
+ */
+ /*
+ * Open the system files with normal access functions and complete
+ * setting up the ntfs super block.
+ */
+ if (!load_system_files(vol)) {
+ ntfs_error(sb, "Failed to load system files.");
+ goto unl_upcase_iput_tmp_ino_err_out_now;
+ }
+
+ /* We grab a reference, simulating an ntfs_iget(). */
+ ihold(vol->root_ino);
+ sb->s_root = d_make_root(vol->root_ino);
+ if (sb->s_root) {
+ s64 nr_records;
+
+ ntfs_debug("Exiting, status successful.");
+
+ /* Release the default upcase if it has no users. */
+ mutex_lock(&ntfs_lock);
+ if (!--ntfs_nr_upcase_users && default_upcase) {
+ ntfs_free(default_upcase);
+ default_upcase = NULL;
+ }
+ mutex_unlock(&ntfs_lock);
+ sb->s_export_op = &ntfs_export_ops;
+ lockdep_on();
+
+ nr_records = __get_nr_free_mft_records(vol,
+ i_size_read(vol->mft_ino) >> vol->mft_record_size_bits,
+ ((((NTFS_I(vol->mft_ino)->initialized_size >>
+ vol->mft_record_size_bits) +
+ 7) >> 3) + PAGE_SIZE - 1) >> PAGE_SHIFT);
+ ntfs_debug("Free mft records(%lld)", nr_records);
+
+ init_waitqueue_head(&vol->free_waitq);
+ INIT_WORK(&vol->precalc_work, precalc_free_clusters);
+ queue_work(ntfs_wq, &vol->precalc_work);
+ return 0;
+ }
+ ntfs_error(sb, "Failed to allocate root directory.");
+ /* Clean up after the successful load_system_files() call from above. */
+ iput(vol->vol_ino);
+ vol->vol_ino = NULL;
+ /* NTFS 3.0+ specific clean up. */
+ if (vol->major_ver >= 3) {
+ if (vol->quota_q_ino) {
+ iput(vol->quota_q_ino);
+ vol->quota_q_ino = NULL;
+ }
+ if (vol->quota_ino) {
+ iput(vol->quota_ino);
+ vol->quota_ino = NULL;
+ }
+ if (vol->extend_ino) {
+ iput(vol->extend_ino);
+ vol->extend_ino = NULL;
+ }
+ if (vol->secure_ino) {
+ iput(vol->secure_ino);
+ vol->secure_ino = NULL;
+ }
+ }
+ iput(vol->root_ino);
+ vol->root_ino = NULL;
+ iput(vol->lcnbmp_ino);
+ vol->lcnbmp_ino = NULL;
+ iput(vol->mftbmp_ino);
+ vol->mftbmp_ino = NULL;
+ if (vol->logfile_ino) {
+ iput(vol->logfile_ino);
+ vol->logfile_ino = NULL;
+ }
+ if (vol->mftmirr_ino) {
+ iput(vol->mftmirr_ino);
+ vol->mftmirr_ino = NULL;
+ }
+ /* Throw away the table of attribute definitions. */
+ vol->attrdef_size = 0;
+ if (vol->attrdef) {
+ ntfs_free(vol->attrdef);
+ vol->attrdef = NULL;
+ }
+ vol->upcase_len = 0;
+ mutex_lock(&ntfs_lock);
+ if (vol->upcase == default_upcase) {
+ ntfs_nr_upcase_users--;
+ vol->upcase = NULL;
+ }
+ mutex_unlock(&ntfs_lock);
+ if (vol->upcase) {
+ ntfs_free(vol->upcase);
+ vol->upcase = NULL;
+ }
+ if (vol->nls_map) {
+ unload_nls(vol->nls_map);
+ vol->nls_map = NULL;
+ }
+ /* Error exit code path. */
+unl_upcase_iput_tmp_ino_err_out_now:
+ kvfree(vol->lcn_empty_bits_per_page);
+ /*
+ * Decrease the number of upcase users and destroy the global default
+ * upcase table if necessary.
+ */
+ mutex_lock(&ntfs_lock);
+ if (!--ntfs_nr_upcase_users && default_upcase) {
+ ntfs_free(default_upcase);
+ default_upcase = NULL;
+ }
+ mutex_unlock(&ntfs_lock);
+iput_tmp_ino_err_out_now:
+ iput(tmp_ino);
+ if (vol->mft_ino && vol->mft_ino != tmp_ino)
+ iput(vol->mft_ino);
+ vol->mft_ino = NULL;
+ /* Errors at this stage are irrelevant. */
+err_out_now:
+ sb->s_fs_info = NULL;
+ kfree(vol);
+ ntfs_debug("Failed, returning -EINVAL.");
+ lockdep_on();
+ return -EINVAL;
+}
+
+/*
+ * This is a slab cache to optimize allocations and deallocations of Unicode
+ * strings of the maximum length allowed by NTFS, which is NTFS_MAX_NAME_LEN
+ * (255) Unicode characters + a terminating NULL Unicode character.
+ */
+struct kmem_cache *ntfs_name_cache;
+
+/* Slab caches for efficient allocation/deallocation of inodes. */
+struct kmem_cache *ntfs_inode_cache;
+struct kmem_cache *ntfs_big_inode_cache;
+
+/* Init once constructor for the inode slab cache. */
+static void ntfs_big_inode_init_once(void *foo)
+{
+ struct ntfs_inode *ni = (struct ntfs_inode *)foo;
+
+ inode_init_once(VFS_I(ni));
+}
+
+/*
+ * Slab caches to optimize allocations and deallocations of attribute search
+ * contexts and index contexts, respectively.
+ */
+struct kmem_cache *ntfs_attr_ctx_cache;
+struct kmem_cache *ntfs_index_ctx_cache;
+
+/* Driver wide mutex. */
+DEFINE_MUTEX(ntfs_lock);
+
+static int ntfs_get_tree(struct fs_context *fc)
+{
+ return get_tree_bdev(fc, ntfs_fill_super);
+}
+
+static void ntfs_free_fs_context(struct fs_context *fc)
+{
+ struct ntfs_volume *vol = fc->s_fs_info;
+
+ if (vol)
+ ntfs_volume_free(vol);
+}
+
+static const struct fs_context_operations ntfs_context_ops = {
+ .parse_param = ntfs_parse_param,
+ .get_tree = ntfs_get_tree,
+ .free = ntfs_free_fs_context,
+ .reconfigure = ntfs_reconfigure,
+};
+
+static int ntfs_init_fs_context(struct fs_context *fc)
+{
+ struct ntfs_volume *vol;
+
+ /* Allocate a new struct ntfs_volume and place it in sb->s_fs_info. */
+ vol = kmalloc(sizeof(struct ntfs_volume), GFP_NOFS);
+ if (!vol)
+ return -ENOMEM;
+
+ /* Initialize struct ntfs_volume structure. */
+ *vol = (struct ntfs_volume) {
+ .uid = INVALID_UID,
+ .gid = INVALID_GID,
+ .fmask = 0,
+ .dmask = 0,
+ .mft_zone_multiplier = 1,
+ .on_errors = ON_ERRORS_CONTINUE,
+ .nls_map = load_nls_default(),
+ .preallocated_size = NTFS_DEF_PREALLOC_SIZE,
+ };
+ init_rwsem(&vol->mftbmp_lock);
+ init_rwsem(&vol->lcnbmp_lock);
+
+ fc->s_fs_info = vol;
+ fc->ops = &ntfs_context_ops;
+ return 0;
+}
+
+static struct file_system_type ntfs_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "ntfsplus",
+ .init_fs_context = ntfs_init_fs_context,
+ .parameters = ntfs_parameters,
+ .kill_sb = kill_block_super,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+};
+MODULE_ALIAS_FS("ntfsplus");
+
+static int ntfs_workqueue_init(void)
+{
+ ntfs_wq = alloc_workqueue("ntfsplus-bg-io", 0, 0);
+ if (!ntfs_wq)
+ return -ENOMEM;
+ return 0;
+}
+
+static void ntfs_workqueue_destroy(void)
+{
+ destroy_workqueue(ntfs_wq);
+ ntfs_wq = NULL;
+}
+
+/* Stable names for the slab caches. */
+static const char ntfs_index_ctx_cache_name[] = "ntfs_index_ctx_cache";
+static const char ntfs_attr_ctx_cache_name[] = "ntfs_attr_ctx_cache";
+static const char ntfs_name_cache_name[] = "ntfs_name_cache";
+static const char ntfs_inode_cache_name[] = "ntfs_inode_cache";
+static const char ntfs_big_inode_cache_name[] = "ntfs_big_inode_cache";
+
+static int __init init_ntfs_fs(void)
+{
+ int err = 0;
+
+ err = ntfs_workqueue_init();
+ if (err) {
+ pr_crit("Failed to register workqueue!\n");
+ return err;
+ }
+
+ ntfs_index_ctx_cache = kmem_cache_create(ntfs_index_ctx_cache_name,
+ sizeof(struct ntfs_index_context), 0 /* offset */,
+ SLAB_HWCACHE_ALIGN, NULL /* ctor */);
+ if (!ntfs_index_ctx_cache) {
+ pr_crit("Failed to create %s!\n", ntfs_index_ctx_cache_name);
+ goto ictx_err_out;
+ }
+ ntfs_attr_ctx_cache = kmem_cache_create(ntfs_attr_ctx_cache_name,
+ sizeof(struct ntfs_attr_search_ctx), 0 /* offset */,
+ SLAB_HWCACHE_ALIGN, NULL /* ctor */);
+ if (!ntfs_attr_ctx_cache) {
+ pr_crit("Failed to create %s!\n", ntfs_attr_ctx_cache_name);
+ goto actx_err_out;
+ }
+
+ ntfs_name_cache = kmem_cache_create(ntfs_name_cache_name,
+ (NTFS_MAX_NAME_LEN+2) * sizeof(__le16), 0,
+ SLAB_HWCACHE_ALIGN, NULL);
+ if (!ntfs_name_cache) {
+ pr_crit("Failed to create %s!\n", ntfs_name_cache_name);
+ goto name_err_out;
+ }
+
+ ntfs_inode_cache = kmem_cache_create(ntfs_inode_cache_name,
+ sizeof(struct ntfs_inode), 0, SLAB_RECLAIM_ACCOUNT, NULL);
+ if (!ntfs_inode_cache) {
+ pr_crit("Failed to create %s!\n", ntfs_inode_cache_name);
+ goto inode_err_out;
+ }
+
+ ntfs_big_inode_cache = kmem_cache_create(ntfs_big_inode_cache_name,
+ sizeof(struct big_ntfs_inode), 0, SLAB_HWCACHE_ALIGN |
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+ ntfs_big_inode_init_once);
+ if (!ntfs_big_inode_cache) {
+ pr_crit("Failed to create %s!\n", ntfs_big_inode_cache_name);
+ goto big_inode_err_out;
+ }
+
+ /* Register the ntfs sysctls. */
+ err = ntfs_sysctl(1);
+ if (err) {
+ pr_crit("Failed to register NTFS sysctls!\n");
+ goto sysctl_err_out;
+ }
+
+ err = register_filesystem(&ntfs_fs_type);
+ if (!err) {
+ ntfs_debug("ntfs+ driver registered successfully.");
+ return 0; /* Success! */
+ }
+ pr_crit("Failed to register ntfs+ filesystem driver!\n");
+
+ /* Unregister the ntfs sysctls. */
+ ntfs_sysctl(0);
+sysctl_err_out:
+ kmem_cache_destroy(ntfs_big_inode_cache);
+big_inode_err_out:
+ kmem_cache_destroy(ntfs_inode_cache);
+inode_err_out:
+ kmem_cache_destroy(ntfs_name_cache);
+name_err_out:
+ kmem_cache_destroy(ntfs_attr_ctx_cache);
+actx_err_out:
+ kmem_cache_destroy(ntfs_index_ctx_cache);
+ictx_err_out:
+ if (!err) {
+ pr_crit("Aborting ntfs+ filesystem driver registration...\n");
+ err = -ENOMEM;
+ }
+ return err;
+}
+
+static void __exit exit_ntfs_fs(void)
+{
+ ntfs_debug("Unregistering ntfs+ driver.");
+
+ unregister_filesystem(&ntfs_fs_type);
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
+ kmem_cache_destroy(ntfs_big_inode_cache);
+ kmem_cache_destroy(ntfs_inode_cache);
+ kmem_cache_destroy(ntfs_name_cache);
+ kmem_cache_destroy(ntfs_attr_ctx_cache);
+ kmem_cache_destroy(ntfs_index_ctx_cache);
+ ntfs_workqueue_destroy();
+ /* Unregister the ntfs sysctls. */
+ ntfs_sysctl(0);
+}
+
+module_init(init_ntfs_fs);
+module_exit(exit_ntfs_fs);
+
+MODULE_AUTHOR("Anton Altaparmakov <anton@tuxera.com>"); /* Original read-only NTFS driver */
+MODULE_AUTHOR("Namjae Jeon <linkinjeon@kernel.org>"); /* Add write, iomap and various features */
+MODULE_DESCRIPTION("NTFS+ read-write filesystem driver");
+MODULE_LICENSE("GPL");
+#ifdef DEBUG
+module_param(debug_msgs, bint, 0);
+MODULE_PARM_DESC(debug_msgs, "Enable debug messages.");
+#endif
--
2.34.1
* [PATCH 03/11] ntfsplus: add inode operations
2025-10-20 2:07 [PATCH 00/11] ntfsplus: ntfs filesystem remake Namjae Jeon
2025-10-20 2:07 ` [PATCH 01/11] ntfsplus: in-memory, on-disk structures and headers Namjae Jeon
2025-10-20 2:07 ` [PATCH 02/11] ntfsplus: add super block operations Namjae Jeon
@ 2025-10-20 2:07 ` Namjae Jeon
2025-10-20 2:07 ` [PATCH 04/11] ntfsplus: add directory operations Namjae Jeon
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Namjae Jeon @ 2025-10-20 2:07 UTC (permalink / raw)
To: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il
Cc: linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee, Namjae Jeon
This adds the implementation of inode operations for ntfsplus.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/ntfsplus/inode.c | 3705 +++++++++++++++++++++++++++++++++++++++++++
fs/ntfsplus/mft.c | 2630 ++++++++++++++++++++++++++++++
fs/ntfsplus/mst.c | 195 +++
fs/ntfsplus/namei.c | 1606 +++++++++++++++++++
4 files changed, 8136 insertions(+)
create mode 100644 fs/ntfsplus/inode.c
create mode 100644 fs/ntfsplus/mft.c
create mode 100644 fs/ntfsplus/mst.c
create mode 100644 fs/ntfsplus/namei.c
diff --git a/fs/ntfsplus/inode.c b/fs/ntfsplus/inode.c
new file mode 100644
index 000000000000..2b8e0d67cf62
--- /dev/null
+++ b/fs/ntfsplus/inode.c
@@ -0,0 +1,3705 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel inode handling.
+ *
+ * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/writeback.h>
+#include <linux/seq_file.h>
+
+#include "lcnalloc.h"
+#include "misc.h"
+#include "ntfs.h"
+#include "index.h"
+#include "attrlist.h"
+#include "reparse.h"
+#include "ea.h"
+#include "attrib.h"
+#include "ntfs_iomap.h"
+
+/**
+ * ntfs_test_inode - compare two (possibly fake) inodes for equality
+ * @vi: vfs inode which to test
+ * @data: data which is being tested with
+ *
+ * Compare the ntfs attribute embedded in the ntfs specific part of the vfs
+ * inode @vi for equality with the ntfs attribute @data.
+ *
+ * If searching for the normal file/directory inode, set @na->type to AT_UNUSED.
+ * @na->name and @na->name_len are then ignored.
+ *
+ * Return 1 if the attributes match and 0 if not.
+ *
+ * NOTE: This function runs with the inode_hash_lock spin lock held so it is not
+ * allowed to sleep.
+ */
+int ntfs_test_inode(struct inode *vi, void *data)
+{
+ struct ntfs_attr *na = (struct ntfs_attr *)data;
+ struct ntfs_inode *ni;
+
+ if (vi->i_ino != na->mft_no)
+ return 0;
+
+ ni = NTFS_I(vi);
+
+ /* If !NInoAttr(ni), @vi is a normal file or directory inode. */
+ if (likely(!NInoAttr(ni))) {
+ /* If not looking for a normal inode this is a mismatch. */
+ if (unlikely(na->type != AT_UNUSED))
+ return 0;
+ } else {
+ /* A fake inode describing an attribute. */
+ if (ni->type != na->type)
+ return 0;
+ if (ni->name_len != na->name_len)
+ return 0;
+ if (na->name_len && memcmp(ni->name, na->name,
+ na->name_len * sizeof(__le16)))
+ return 0;
+ if (!ni->ext.base_ntfs_ino)
+ return 0;
+ }
+
+ /* Match! */
+ return 1;
+}
+
+/**
+ * ntfs_init_locked_inode - initialize an inode
+ * @vi: vfs inode to initialize
+ * @data: data which to initialize @vi to
+ *
+ * Initialize the vfs inode @vi with the values from the ntfs attribute @data in
+ * order to enable ntfs_test_inode() to do its work.
+ *
+ * If initializing the normal file/directory inode, set @na->type to AT_UNUSED.
+ * In that case, @na->name and @na->name_len should be set to NULL and 0,
+ * respectively. Although that is not strictly necessary as
+ * ntfs_read_locked_inode() will fill them in later.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * NOTE: This function runs with the inode->i_lock spin lock held so it is not
+ * allowed to sleep. (Hence the GFP_ATOMIC allocation.)
+ */
+static int ntfs_init_locked_inode(struct inode *vi, void *data)
+{
+ struct ntfs_attr *na = (struct ntfs_attr *)data;
+ struct ntfs_inode *ni = NTFS_I(vi);
+
+ vi->i_ino = na->mft_no;
+
+ if (na->type == AT_INDEX_ALLOCATION)
+ NInoSetMstProtected(ni);
+ else
+ ni->type = na->type;
+
+ ni->name = na->name;
+ ni->name_len = na->name_len;
+ ni->folio = NULL;
+ atomic_set(&ni->count, 1);
+
+ /* If initializing a normal inode, we are done. */
+ if (likely(na->type == AT_UNUSED)) {
+ BUG_ON(na->name);
+ BUG_ON(na->name_len);
+ return 0;
+ }
+
+ /* It is a fake inode. */
+ NInoSetAttr(ni);
+
+ /*
+ * We have I30 global constant as an optimization as it is the name
+ * in >99.9% of named attributes! The other <0.1% incur a GFP_ATOMIC
+ * allocation but that is ok. And most attributes are unnamed anyway,
+ * thus the fraction of named attributes with name != I30 is actually
+ * absolutely tiny.
+ */
+ if (na->name_len && na->name != I30) {
+ unsigned int i;
+
+ BUG_ON(!na->name);
+ i = na->name_len * sizeof(__le16);
+ ni->name = kmalloc(i + sizeof(__le16), GFP_ATOMIC);
+ if (!ni->name)
+ return -ENOMEM;
+ memcpy(ni->name, na->name, i);
+ ni->name[na->name_len] = 0;
+ }
+ return 0;
+}
+
+static int ntfs_read_locked_inode(struct inode *vi);
+static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi);
+static int ntfs_read_locked_index_inode(struct inode *base_vi,
+ struct inode *vi);
+
+/**
+ * ntfs_iget - obtain a struct inode corresponding to a specific normal inode
+ * @sb: super block of mounted volume
+ * @mft_no: mft record number / inode number to obtain
+ *
+ * Obtain the struct inode corresponding to a specific normal inode (i.e. a
+ * file or directory).
+ *
+ * If the inode is in the cache, it is just returned with an increased
+ * reference count. Otherwise, a new struct inode is allocated and initialized,
+ * and finally ntfs_read_locked_inode() is called to read in the inode and
+ * fill in the remainder of the inode structure.
+ *
+ * Return the struct inode on success. Check the return value with IS_ERR() and
+ * if true, the function failed and the error code is obtained from PTR_ERR().
+ */
+struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no)
+{
+ struct inode *vi;
+ int err;
+ struct ntfs_attr na;
+
+ na.mft_no = mft_no;
+ na.type = AT_UNUSED;
+ na.name = NULL;
+ na.name_len = 0;
+
+ vi = iget5_locked(sb, mft_no, ntfs_test_inode,
+ ntfs_init_locked_inode, &na);
+ if (unlikely(!vi))
+ return ERR_PTR(-ENOMEM);
+
+ err = 0;
+
+ /* If this is a freshly allocated inode, need to read it now. */
+ if (vi->i_state & I_NEW) {
+ err = ntfs_read_locked_inode(vi);
+ unlock_new_inode(vi);
+ }
+ /*
+ * There is no point in keeping bad inodes around if the failure was
+ * due to ENOMEM. We want to be able to retry again later.
+ */
+ if (unlikely(err == -ENOMEM)) {
+ iput(vi);
+ vi = ERR_PTR(err);
+ }
+ return vi;
+}
+
+/**
+ * ntfs_attr_iget - obtain a struct inode corresponding to an attribute
+ * @base_vi: vfs base inode containing the attribute
+ * @type: attribute type
+ * @name: Unicode name of the attribute (NULL if unnamed)
+ * @name_len: length of @name in Unicode characters (0 if unnamed)
+ *
+ * Obtain the (fake) struct inode corresponding to the attribute specified by
+ * @type, @name, and @name_len, which is present in the base mft record
+ * specified by the vfs inode @base_vi.
+ *
+ * If the attribute inode is in the cache, it is just returned with an
+ * increased reference count. Otherwise, a new struct inode is allocated and
+ * initialized, and finally ntfs_read_locked_attr_inode() is called to read the
+ * attribute and fill in the inode structure.
+ *
+ * Note, for index allocation attributes, you need to use ntfs_index_iget()
+ * instead of ntfs_attr_iget() as working with indices is a lot more complex.
+ *
+ * Return the struct inode of the attribute inode on success. Check the return
+ * value with IS_ERR() and if true, the function failed and the error code is
+ * obtained from PTR_ERR().
+ */
+struct inode *ntfs_attr_iget(struct inode *base_vi, __le32 type,
+ __le16 *name, u32 name_len)
+{
+ struct inode *vi;
+ int err;
+ struct ntfs_attr na;
+
+ /* Make sure no one calls ntfs_attr_iget() for indices. */
+ BUG_ON(type == AT_INDEX_ALLOCATION);
+
+ na.mft_no = base_vi->i_ino;
+ na.type = type;
+ na.name = name;
+ na.name_len = name_len;
+
+ vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
+ ntfs_init_locked_inode, &na);
+ if (unlikely(!vi))
+ return ERR_PTR(-ENOMEM);
+ err = 0;
+
+ /* If this is a freshly allocated inode, need to read it now. */
+ if (vi->i_state & I_NEW) {
+ err = ntfs_read_locked_attr_inode(base_vi, vi);
+ unlock_new_inode(vi);
+ }
+ /*
+ * There is no point in keeping bad attribute inodes around. This also
+ * simplifies things in that we never need to check for bad attribute
+ * inodes elsewhere.
+ */
+ if (unlikely(err)) {
+ iput(vi);
+ vi = ERR_PTR(err);
+ }
+ return vi;
+}
+
+/**
+ * ntfs_index_iget - obtain a struct inode corresponding to an index
+ * @base_vi: vfs base inode containing the index related attributes
+ * @name: Unicode name of the index
+ * @name_len: length of @name in Unicode characters
+ *
+ * Obtain the (fake) struct inode corresponding to the index specified by @name
+ * and @name_len, which is present in the base mft record specified by the vfs
+ * inode @base_vi.
+ *
+ * If the index inode is in the cache, it is just returned with an increased
+ * reference count. Otherwise, a new struct inode is allocated and
+ * initialized, and finally ntfs_read_locked_index_inode() is called to read
+ * the index related attributes and fill in the inode structure.
+ *
+ * Return the struct inode of the index inode on success. Check the return
+ * value with IS_ERR() and if true, the function failed and the error code is
+ * obtained from PTR_ERR().
+ */
+struct inode *ntfs_index_iget(struct inode *base_vi, __le16 *name,
+ u32 name_len)
+{
+ struct inode *vi;
+ int err;
+ struct ntfs_attr na;
+
+ na.mft_no = base_vi->i_ino;
+ na.type = AT_INDEX_ALLOCATION;
+ na.name = name;
+ na.name_len = name_len;
+
+ vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
+ ntfs_init_locked_inode, &na);
+ if (unlikely(!vi))
+ return ERR_PTR(-ENOMEM);
+
+ err = 0;
+
+ /* If this is a freshly allocated inode, need to read it now. */
+ if (vi->i_state & I_NEW) {
+ err = ntfs_read_locked_index_inode(base_vi, vi);
+ unlock_new_inode(vi);
+ }
+ /*
+ * There is no point in keeping bad index inodes around. This also
+ * simplifies things in that we never need to check for bad index
+ * inodes elsewhere.
+ */
+ if (unlikely(err)) {
+ iput(vi);
+ vi = ERR_PTR(err);
+ }
+ return vi;
+}
+
+struct inode *ntfs_alloc_big_inode(struct super_block *sb)
+{
+ struct ntfs_inode *ni;
+
+ ntfs_debug("Entering.");
+ ni = alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
+ if (likely(ni != NULL)) {
+ ni->state = 0;
+ ni->type = 0;
+ ni->mft_no = 0;
+ return VFS_I(ni);
+ }
+ ntfs_error(sb, "Allocation of NTFS big inode structure failed.");
+ return NULL;
+}
+
+void ntfs_free_big_inode(struct inode *inode)
+{
+ kmem_cache_free(ntfs_big_inode_cache, NTFS_I(inode));
+}
+
+static int ntfs_non_resident_dealloc_clusters(struct ntfs_inode *ni)
+{
+ struct super_block *sb = ni->vol->sb;
+ struct ntfs_attr_search_ctx *actx;
+ int err = 0;
+
+ actx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!actx)
+ return -ENOMEM;
+ BUG_ON(actx->mrec->link_count != 0);
+
+ /*
+ * ntfs_truncate_vfs() cannot be called in evict() context
+ * because of several limitations, e.g. the @ni vfs inode
+ * being marked I_FREEING.
+ */
+ if (NInoRunlistDirty(ni)) {
+ err = ntfs_cluster_free_from_rl(ni->vol, ni->runlist.rl);
+ if (err)
+ ntfs_error(sb,
"Failed to free clusters. Leaving inconsistent metadata.");
+ }
+
+ while ((err = ntfs_attrs_walk(actx)) == 0) {
+ if (actx->attr->non_resident &&
+ (!NInoRunlistDirty(ni) || actx->attr->type != AT_DATA)) {
+ struct runlist_element *rl;
+ size_t new_rl_count;
+
+ rl = ntfs_mapping_pairs_decompress(ni->vol, actx->attr, NULL,
+ &new_rl_count);
+ if (IS_ERR(rl)) {
+ err = PTR_ERR(rl);
+ ntfs_error(sb,
"Failed to decompress runlist. Leaving inconsistent metadata.");
+ continue;
+ }
+
+ err = ntfs_cluster_free_from_rl(ni->vol, rl);
+ if (err)
+ ntfs_error(sb,
"Failed to free attribute clusters. Leaving inconsistent metadata.");
+ ntfs_free(rl);
+ }
+ }
+
+ ntfs_release_dirty_clusters(ni->vol, ni->i_dealloc_clusters);
+ ntfs_attr_put_search_ctx(actx);
+ return err;
+}
+
+int ntfs_drop_big_inode(struct inode *inode)
+{
+ struct ntfs_inode *ni = NTFS_I(inode);
+
+ if (!inode_unhashed(inode) && inode->i_state & I_SYNC) {
+ if (ni->type == AT_DATA || ni->type == AT_INDEX_ALLOCATION) {
+ if (!inode->i_nlink) {
+ if (ni->data_size == 0)
+ return 0;
+
+ /* Avoid a concurrent evict_inode() call. */
+ atomic_inc(&inode->i_count);
+ spin_unlock(&inode->i_lock);
+
+ truncate_setsize(VFS_I(ni), 0);
+ ntfs_truncate_vfs(VFS_I(ni), 0, 1);
+
+ sb_start_intwrite(inode->i_sb);
+ i_size_write(inode, 0);
+ ni->allocated_size = ni->initialized_size = ni->data_size = 0;
+
+ truncate_inode_pages_final(inode->i_mapping);
+ sb_end_intwrite(inode->i_sb);
+
+ spin_lock(&inode->i_lock);
+ atomic_dec(&inode->i_count);
+ }
+ return 0;
+ } else if (ni->type == AT_INDEX_ROOT) {
+ return 0;
+ }
+ }
+
+ return inode_generic_drop(inode);
+}
+
+static inline struct ntfs_inode *ntfs_alloc_extent_inode(void)
+{
+ struct ntfs_inode *ni;
+
+ ntfs_debug("Entering.");
+ ni = kmem_cache_alloc(ntfs_inode_cache, GFP_NOFS);
+ if (likely(ni != NULL)) {
+ ni->state = 0;
+ return ni;
+ }
+ ntfs_error(NULL, "Allocation of NTFS inode structure failed.");
+ return NULL;
+}
+
+static void ntfs_destroy_extent_inode(struct ntfs_inode *ni)
+{
+ ntfs_debug("Entering.");
+
+ BUG_ON(!atomic_dec_and_test(&ni->count));
+ if (ni->folio)
+ ntfs_unmap_folio(ni->folio, NULL);
+ kfree(ni->mrec);
+ kmem_cache_free(ntfs_inode_cache, ni);
+}
+
+static struct lock_class_key attr_inode_mrec_lock_class;
+static struct lock_class_key attr_list_inode_mrec_lock_class;
+
+/*
+ * The attribute runlist lock has separate locking rules from the
+ * normal runlist lock, so split the two lock-classes:
+ */
+static struct lock_class_key attr_list_rl_lock_class;
+
+/**
+ * __ntfs_init_inode - initialize ntfs specific part of an inode
+ * @sb: super block of mounted volume
+ * @ni: freshly allocated ntfs inode which to initialize
+ *
+ * Initialize an ntfs inode to defaults.
+ *
+ * NOTE: ni->mft_no, ni->state, ni->type, ni->name, and ni->name_len are left
+ * untouched. Make sure to initialize them elsewhere.
+ */
+void __ntfs_init_inode(struct super_block *sb, struct ntfs_inode *ni)
+{
+ ntfs_debug("Entering.");
+ rwlock_init(&ni->size_lock);
+ ni->initialized_size = ni->allocated_size = 0;
+ ni->seq_no = 0;
+ atomic_set(&ni->count, 1);
+ ni->vol = NTFS_SB(sb);
+ ntfs_init_runlist(&ni->runlist);
+ ni->lcn_seek_trunc = LCN_RL_NOT_MAPPED;
+ mutex_init(&ni->mrec_lock);
+ if (ni->type == AT_ATTRIBUTE_LIST) {
+ lockdep_set_class(&ni->mrec_lock,
+ &attr_list_inode_mrec_lock_class);
+ lockdep_set_class(&ni->runlist.lock,
+ &attr_list_rl_lock_class);
+ } else if (NInoAttr(ni)) {
+ lockdep_set_class(&ni->mrec_lock,
+ &attr_inode_mrec_lock_class);
+ }
+
+ ni->folio = NULL;
+ ni->folio_ofs = 0;
+ ni->mrec = NULL;
+ ni->attr_list_size = 0;
+ ni->attr_list = NULL;
+ ni->itype.index.block_size = 0;
+ ni->itype.index.vcn_size = 0;
+ ni->itype.index.collation_rule = 0;
+ ni->itype.index.block_size_bits = 0;
+ ni->itype.index.vcn_size_bits = 0;
+ mutex_init(&ni->extent_lock);
+ ni->nr_extents = 0;
+ ni->ext.base_ntfs_ino = NULL;
+ ni->flags = 0;
+ ni->mft_lcn[0] = LCN_RL_NOT_MAPPED;
+ ni->mft_lcn_count = 0;
+ ni->target = NULL;
+ ni->i_dealloc_clusters = 0;
+}
+
+/*
+ * Extent inodes get MFT-mapped in a nested way, while the base inode
+ * is still mapped. Teach this nesting to the lock validator by creating
+ * a separate class for nested inode's mrec_lock's:
+ */
+static struct lock_class_key extent_inode_mrec_lock_key;
+
+inline struct ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
+ unsigned long mft_no)
+{
+ struct ntfs_inode *ni = ntfs_alloc_extent_inode();
+
+ ntfs_debug("Entering.");
+ if (likely(ni != NULL)) {
+ __ntfs_init_inode(sb, ni);
+ lockdep_set_class(&ni->mrec_lock, &extent_inode_mrec_lock_key);
+ ni->mft_no = mft_no;
+ ni->type = AT_UNUSED;
+ ni->name = NULL;
+ ni->name_len = 0;
+ }
+ return ni;
+}
+
+/**
+ * ntfs_is_extended_system_file - check if a file is in the $Extend directory
+ * @ctx: initialized attribute search context
+ *
+ * Search all file name attributes in the inode described by the attribute
+ * search context @ctx and check if any of the names are in the $Extend system
+ * directory.
+ *
+ * Return values:
+ * 1: file is in $Extend directory
+ * 0: file is not in $Extend directory
+ * -errno: failed to determine if the file is in the $Extend directory
+ */
+static int ntfs_is_extended_system_file(struct ntfs_attr_search_ctx *ctx)
+{
+ int nr_links, err;
+
+ /* Restart search. */
+ ntfs_attr_reinit_search_ctx(ctx);
+
+ /* Get number of hard links. */
+ nr_links = le16_to_cpu(ctx->mrec->link_count);
+
+ /* Loop through all hard links. */
+ while (!(err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0,
+ ctx))) {
+ struct file_name_attr *file_name_attr;
+ struct attr_record *attr = ctx->attr;
+ u8 *p, *p2;
+
+ nr_links--;
+ /*
+ * Maximum sanity checking as we are called on an inode that
+ * we suspect might be corrupt.
+ */
+ p = (u8 *)attr + le32_to_cpu(attr->length);
+ if (p < (u8 *)ctx->mrec || (u8 *)p > (u8 *)ctx->mrec +
+ le32_to_cpu(ctx->mrec->bytes_in_use)) {
+err_corrupt_attr:
+ ntfs_error(ctx->ntfs_ino->vol->sb,
+ "Corrupt file name attribute. You should run chkdsk.");
+ return -EIO;
+ }
+ if (attr->non_resident) {
+ ntfs_error(ctx->ntfs_ino->vol->sb,
+ "Non-resident file name. You should run chkdsk.");
+ return -EIO;
+ }
+ if (attr->flags) {
+ ntfs_error(ctx->ntfs_ino->vol->sb,
+ "File name with invalid flags. You should run chkdsk.");
+ return -EIO;
+ }
+ if (!(attr->data.resident.flags & RESIDENT_ATTR_IS_INDEXED)) {
+ ntfs_error(ctx->ntfs_ino->vol->sb,
+ "Unindexed file name. You should run chkdsk.");
+ return -EIO;
+ }
+ file_name_attr = (struct file_name_attr *)((u8 *)attr +
+ le16_to_cpu(attr->data.resident.value_offset));
+ p2 = (u8 *)file_name_attr + le32_to_cpu(attr->data.resident.value_length);
+ if (p2 < (u8 *)attr || p2 > p)
+ goto err_corrupt_attr;
+ /* This attribute is ok, but is it in the $Extend directory? */
+ if (MREF_LE(file_name_attr->parent_directory) == FILE_Extend) {
+ unsigned char *s;
+
+ s = ntfs_attr_name_get(ctx->ntfs_ino->vol,
+ file_name_attr->file_name,
+ file_name_attr->file_name_length);
+ if (!s)
+ return 1;
+ if (!strcmp("$Reparse", s)) {
+ ntfs_attr_name_free(&s);
+ return 2; /* It is the $Reparse system file. */
+ }
+ ntfs_attr_name_free(&s);
+ return 1; /* YES, it's an extended system file. */
+ }
+ }
+ if (unlikely(err != -ENOENT))
+ return err;
+ if (unlikely(nr_links)) {
+ ntfs_error(ctx->ntfs_ino->vol->sb,
+ "Inode hard link count doesn't match number of name attributes. You should run chkdsk.");
+ return -EIO;
+ }
+ return 0; /* NO, it is not an extended system file. */
+}
+
+static struct lock_class_key ntfs_dir_inval_lock_key;
+
+void ntfs_set_vfs_operations(struct inode *inode, mode_t mode, dev_t dev)
+{
+ if (S_ISDIR(mode)) {
+ if (!NInoAttr(NTFS_I(inode))) {
+ inode->i_op = &ntfs_dir_inode_ops;
+ inode->i_fop = &ntfs_dir_ops;
+ }
+ if (NInoMstProtected(NTFS_I(inode)))
+ inode->i_mapping->a_ops = &ntfs_mst_aops;
+ else
+ inode->i_mapping->a_ops = &ntfs_normal_aops;
+ lockdep_set_class(&inode->i_mapping->invalidate_lock,
+ &ntfs_dir_inval_lock_key);
+ } else if (S_ISLNK(mode)) {
+ inode->i_op = &ntfs_symlink_inode_operations;
+ inode->i_mapping->a_ops = &ntfs_normal_aops;
+ } else if (S_ISCHR(mode) || S_ISBLK(mode) || S_ISFIFO(mode) || S_ISSOCK(mode)) {
+ inode->i_op = &ntfs_special_inode_operations;
+ init_special_inode(inode, inode->i_mode, dev);
+ } else {
+ if (!NInoAttr(NTFS_I(inode))) {
+ inode->i_op = &ntfs_file_inode_ops;
+ inode->i_fop = &ntfs_file_ops;
+ }
+ if (NInoMstProtected(NTFS_I(inode)))
+ inode->i_mapping->a_ops = &ntfs_mst_aops;
+ else if (NInoCompressed(NTFS_I(inode)))
+ inode->i_mapping->a_ops = &ntfs_compressed_aops;
+ else
+ inode->i_mapping->a_ops = &ntfs_normal_aops;
+ }
+}
+
+__le16 R[3] = { cpu_to_le16('$'), cpu_to_le16('R'), 0 };
+
+/**
+ * ntfs_read_locked_inode - read an inode from its device
+ * @vi: inode to read
+ *
+ * ntfs_read_locked_inode() is called from ntfs_iget() to read the inode
+ * described by @vi into memory from the device.
+ *
+ * The only fields in @vi that we need to/can look at when the function is
+ * called are i_sb, pointing to the mounted device's super block, and i_ino,
+ * the number of the inode to load.
+ *
+ * ntfs_read_locked_inode() maps, pins and locks the mft record number i_ino
+ * for reading and sets up the necessary @vi fields as well as initializing
+ * the ntfs inode.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ * i_count is set to 1, so it is not going to go away
+ * i_flags is set to 0 and we have no business touching it. Only an ioctl()
+ * is allowed to write to it. We should of course be honouring the flags
+ * using the IS_* macros defined in include/linux/fs.h.
+ * In any case ntfs_read_locked_inode() has nothing to do with i_flags.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_read_locked_inode(struct inode *vi)
+{
+ struct ntfs_volume *vol = NTFS_SB(vi->i_sb);
+ struct ntfs_inode *ni;
+ struct mft_record *m;
+ struct attr_record *a;
+ struct standard_information *si;
+ struct ntfs_attr_search_ctx *ctx;
+ int err = 0;
+ __le16 *name = I30;
+ unsigned int name_len = 4, flags = 0;
+ int extend_sys = 0;
+ dev_t dev = 0;
+ bool vol_err = true;
+
+ ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+
+ if (uid_valid(vol->uid)) {
+ vi->i_uid = vol->uid;
+ flags |= NTFS_VOL_UID;
+ } else {
+ vi->i_uid = GLOBAL_ROOT_UID;
+ }
+
+ if (gid_valid(vol->gid)) {
+ vi->i_gid = vol->gid;
+ flags |= NTFS_VOL_GID;
+ } else {
+ vi->i_gid = GLOBAL_ROOT_GID;
+ }
+
+ vi->i_mode = 0777;
+
+ /*
+ * Initialize the ntfs specific part of @vi special casing
+ * FILE_MFT which we need to do at mount time.
+ */
+ if (vi->i_ino != FILE_MFT)
+ ntfs_init_big_inode(vi);
+ ni = NTFS_I(vi);
+
+ m = map_mft_record(ni);
+ if (IS_ERR(m)) {
+ err = PTR_ERR(m);
+ goto err_out;
+ }
+
+ ctx = ntfs_attr_get_search_ctx(ni, m);
+ if (!ctx) {
+ err = -ENOMEM;
+ goto unm_err_out;
+ }
+
+ if (!(m->flags & MFT_RECORD_IN_USE)) {
+ err = -ENOENT;
+ vol_err = false;
+ goto unm_err_out;
+ }
+
+ if (m->base_mft_record) {
+ ntfs_error(vi->i_sb, "Inode is an extent inode!");
+ goto unm_err_out;
+ }
+
+ /* Transfer information from mft record into vfs and ntfs inodes. */
+ vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
+
+ if (le16_to_cpu(m->link_count) < 1) {
+ ntfs_error(vi->i_sb, "Inode link count is 0!");
+ goto unm_err_out;
+ }
+ set_nlink(vi, le16_to_cpu(m->link_count));
+
+ /* If read-only, no one gets write permissions. */
+ if (IS_RDONLY(vi))
+ vi->i_mode &= ~0222;
+
+ /*
+ * Find the standard information attribute in the mft record. At this
+ * stage we haven't setup the attribute list stuff yet, so this could
+ * in fact fail if the standard information is in an extent record, but
+ * I don't think this actually ever happens.
+ */
+ ntfs_attr_reinit_search_ctx(ctx);
+ err = ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0, 0, 0, NULL, 0,
+ ctx);
+ if (unlikely(err)) {
+ if (err == -ENOENT)
+ ntfs_error(vi->i_sb, "$STANDARD_INFORMATION attribute is missing.");
+ goto unm_err_out;
+ }
+ a = ctx->attr;
+ /* Get the standard information attribute value. */
+ if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
+ + le32_to_cpu(a->data.resident.value_length) >
+ (u8 *)ctx->mrec + vol->mft_record_size) {
+ ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
+ goto unm_err_out;
+ }
+ si = (struct standard_information *)((u8 *)a +
+ le16_to_cpu(a->data.resident.value_offset));
+
+ /* Transfer information from the standard information into vi. */
+ /*
+ * Note: The i_?times do not quite map perfectly onto the NTFS times,
+ * but they are close enough, and in the end it doesn't really matter
+ * that much...
+ */
+ /*
+ * mtime is the last change of the data within the file. Not changed
+ * when only metadata is changed, e.g. a rename doesn't affect mtime.
+ */
+ ni->i_crtime = ntfs2utc(si->creation_time);
+
+ inode_set_mtime_to_ts(vi, ntfs2utc(si->last_data_change_time));
+ /*
+ * ctime is the last change of the metadata of the file. This obviously
+ * always changes, when mtime is changed. ctime can be changed on its
+ * own, mtime is then not changed, e.g. when a file is renamed.
+ */
+ inode_set_ctime_to_ts(vi, ntfs2utc(si->last_mft_change_time));
+ /*
+ * Last access to the data within the file. Not changed during a rename
+ * for example but changed whenever the file is written to.
+ */
+ inode_set_atime_to_ts(vi, ntfs2utc(si->last_access_time));
+ ni->flags = si->file_attributes;
+
+ /* Find the attribute list attribute if present. */
+ ntfs_attr_reinit_search_ctx(ctx);
+ err = ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
+ if (err) {
+ if (unlikely(err != -ENOENT)) {
+ ntfs_error(vi->i_sb, "Failed to lookup attribute list attribute.");
+ goto unm_err_out;
+ }
+ } else {
+ if (vi->i_ino == FILE_MFT)
+ goto skip_attr_list_load;
+ ntfs_debug("Attribute list found in inode 0x%lx.", vi->i_ino);
+ NInoSetAttrList(ni);
+ a = ctx->attr;
+ if (a->flags & ATTR_COMPRESSION_MASK) {
+ ntfs_error(vi->i_sb,
+ "Attribute list attribute is compressed.");
+ goto unm_err_out;
+ }
+ if (a->flags & ATTR_IS_ENCRYPTED ||
+ a->flags & ATTR_IS_SPARSE) {
+ if (a->non_resident) {
+ ntfs_error(vi->i_sb,
+ "Non-resident attribute list attribute is encrypted/sparse.");
+ goto unm_err_out;
+ }
+ ntfs_warning(vi->i_sb,
+ "Resident attribute list attribute in inode 0x%lx is marked encrypted/sparse which is not true. However, Windows allows this and chkdsk does not detect or correct it so we will just ignore the invalid flags and pretend they are not set.",
+ vi->i_ino);
+ }
+ /* Now allocate memory for the attribute list. */
+ ni->attr_list_size = (u32)ntfs_attr_size(a);
+ if (!ni->attr_list_size) {
+ ntfs_error(vi->i_sb, "Attr_list_size is zero");
+ goto unm_err_out;
+ }
+ ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
+ if (!ni->attr_list) {
+ ntfs_error(vi->i_sb,
+ "Not enough memory to allocate buffer for attribute list.");
+ err = -ENOMEM;
+ goto unm_err_out;
+ }
+ if (a->non_resident) {
+ NInoSetAttrListNonResident(ni);
+ if (a->data.non_resident.lowest_vcn) {
+ ntfs_error(vi->i_sb, "Attribute list has non zero lowest_vcn.");
+ goto unm_err_out;
+ }
+
+ /* Now load the attribute list. */
+ err = load_attribute_list(ni, ni->attr_list, ni->attr_list_size);
+ if (err) {
+ ntfs_error(vi->i_sb, "Failed to load attribute list attribute.");
+ goto unm_err_out;
+ }
+ } else /* if (!a->non_resident) */ {
+ if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
+ + le32_to_cpu(
+ a->data.resident.value_length) >
+ (u8 *)ctx->mrec + vol->mft_record_size) {
+ ntfs_error(vi->i_sb, "Corrupt attribute list in inode.");
+ goto unm_err_out;
+ }
+ /* Now copy the attribute list. */
+ memcpy(ni->attr_list, (u8 *)a + le16_to_cpu(
+ a->data.resident.value_offset),
+ le32_to_cpu(
+ a->data.resident.value_length));
+ }
+ }
+skip_attr_list_load:
+ err = ntfs_attr_lookup(AT_EA_INFORMATION, NULL, 0, 0, 0, NULL, 0, ctx);
+ if (!err)
+ NInoSetHasEA(ni);
+
+ ntfs_ea_get_wsl_inode(vi, &dev, flags);
+
+ if (m->flags & MFT_RECORD_IS_DIRECTORY) {
+ vi->i_mode |= S_IFDIR;
+ /*
+ * Apply the directory permissions mask set in the mount
+ * options.
+ */
+ vi->i_mode &= ~vol->dmask;
+ /* Things break without this kludge! */
+ if (vi->i_nlink > 1)
+ set_nlink(vi, 1);
+ } else {
+ if (ni->flags & FILE_ATTR_REPARSE_POINT) {
+ unsigned int mode;
+
+ mode = ntfs_make_symlink(ni);
+ if (mode)
+ vi->i_mode |= mode;
+ else {
+ vi->i_mode &= ~S_IFLNK;
+ vi->i_mode |= S_IFREG;
+ }
+ } else
+ vi->i_mode |= S_IFREG;
+ /* Apply the file permissions mask set in the mount options. */
+ vi->i_mode &= ~vol->fmask;
+ }
+
+ /*
+ * If an attribute list is present we now have the attribute list value
+ * in ntfs_ino->attr_list and it is ntfs_ino->attr_list_size bytes.
+ */
+ if (S_ISDIR(vi->i_mode)) {
+ struct index_root *ir;
+ u8 *ir_end, *index_end;
+
+view_index_meta:
+ /* It is a directory, find index root attribute. */
+ ntfs_attr_reinit_search_ctx(ctx);
+ err = ntfs_attr_lookup(AT_INDEX_ROOT, name, name_len, CASE_SENSITIVE,
+ 0, NULL, 0, ctx);
+ if (unlikely(err)) {
+ if (err == -ENOENT)
+ ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is missing.");
+ goto unm_err_out;
+ }
+ a = ctx->attr;
+ /* Set up the state. */
+ if (unlikely(a->non_resident)) {
+ ntfs_error(vol->sb,
+ "$INDEX_ROOT attribute is not resident.");
+ goto unm_err_out;
+ }
+ /* Ensure the attribute name is placed before the value. */
+ if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+ le16_to_cpu(a->data.resident.value_offset)))) {
+ ntfs_error(vol->sb,
+ "$INDEX_ROOT attribute name is placed after the attribute value.");
+ goto unm_err_out;
+ }
+ /*
+ * Compressed/encrypted index root just means that the newly
+ * created files in that directory should be created compressed/
+ * encrypted. However index root cannot be both compressed and
+ * encrypted.
+ */
+ if (a->flags & ATTR_COMPRESSION_MASK) {
+ NInoSetCompressed(ni);
+ ni->flags |= FILE_ATTR_COMPRESSED;
+ }
+ if (a->flags & ATTR_IS_ENCRYPTED) {
+ if (a->flags & ATTR_COMPRESSION_MASK) {
+ ntfs_error(vi->i_sb, "Found encrypted and compressed attribute.");
+ goto unm_err_out;
+ }
+ NInoSetEncrypted(ni);
+ ni->flags |= FILE_ATTR_ENCRYPTED;
+ }
+ if (a->flags & ATTR_IS_SPARSE) {
+ NInoSetSparse(ni);
+ ni->flags |= FILE_ATTR_SPARSE_FILE;
+ }
+ ir = (struct index_root *)((u8 *)a +
+ le16_to_cpu(a->data.resident.value_offset));
+ ir_end = (u8 *)ir + le32_to_cpu(a->data.resident.value_length);
+ if (ir_end > (u8 *)ctx->mrec + vol->mft_record_size) {
+ ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt.");
+ goto unm_err_out;
+ }
+ index_end = (u8 *)&ir->index +
+ le32_to_cpu(ir->index.index_length);
+ if (index_end > ir_end) {
+ ntfs_error(vi->i_sb, "Directory index is corrupt.");
+ goto unm_err_out;
+ }
+
+ if (extend_sys) {
+ if (ir->type) {
+ ntfs_error(vi->i_sb, "Indexed attribute is not zero.");
+ goto unm_err_out;
+ }
+ } else {
+ if (ir->type != AT_FILE_NAME) {
+ ntfs_error(vi->i_sb, "Indexed attribute is not $FILE_NAME.");
+ goto unm_err_out;
+ }
+
+ if (ir->collation_rule != COLLATION_FILE_NAME) {
+ ntfs_error(vi->i_sb,
+ "Index collation rule is not COLLATION_FILE_NAME.");
+ goto unm_err_out;
+ }
+ }
+
+ ni->itype.index.collation_rule = ir->collation_rule;
+ ni->itype.index.block_size = le32_to_cpu(ir->index_block_size);
+		if (!is_power_of_2(ni->itype.index.block_size)) {
+			ntfs_error(vi->i_sb, "Index block size (%u) is not a power of two.",
+					ni->itype.index.block_size);
+			goto unm_err_out;
+		}
+ if (ni->itype.index.block_size > PAGE_SIZE) {
+ ntfs_error(vi->i_sb,
+ "Index block size (%u) > PAGE_SIZE (%ld) is not supported.",
+ ni->itype.index.block_size,
+ PAGE_SIZE);
+ err = -EOPNOTSUPP;
+ goto unm_err_out;
+ }
+ if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
+ ntfs_error(vi->i_sb,
+ "Index block size (%u) < NTFS_BLOCK_SIZE (%i) is not supported.",
+ ni->itype.index.block_size,
+ NTFS_BLOCK_SIZE);
+ err = -EOPNOTSUPP;
+ goto unm_err_out;
+ }
+ ni->itype.index.block_size_bits =
+ ffs(ni->itype.index.block_size) - 1;
+ /* Determine the size of a vcn in the directory index. */
+ if (vol->cluster_size <= ni->itype.index.block_size) {
+ ni->itype.index.vcn_size = vol->cluster_size;
+ ni->itype.index.vcn_size_bits = vol->cluster_size_bits;
+ } else {
+ ni->itype.index.vcn_size = vol->sector_size;
+ ni->itype.index.vcn_size_bits = vol->sector_size_bits;
+ }
+
+ /* Setup the index allocation attribute, even if not present. */
+ ni->type = AT_INDEX_ROOT;
+ ni->name = name;
+ ni->name_len = name_len;
+ vi->i_size = ni->initialized_size = ni->data_size =
+ le32_to_cpu(a->data.resident.value_length);
+ ni->allocated_size = (ni->data_size + 7) & ~7;
+		/*
+		 * Check for a large index while @ir still points into the
+		 * mapped mft record.
+		 */
+		if (ir->index.flags & LARGE_INDEX)
+			NInoSetIndexAllocPresent(ni);
+		/* We are done with the mft record, so we release it. */
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(ni);
+		m = NULL;
+		ctx = NULL;
+		/* Setup the operations for this inode. */
+		ntfs_set_vfs_operations(vi, S_IFDIR, 0);
+ } else {
+ /* It is a file. */
+ ntfs_attr_reinit_search_ctx(ctx);
+
+ /* Setup the data attribute, even if not present. */
+ ni->type = AT_DATA;
+ ni->name = AT_UNNAMED;
+ ni->name_len = 0;
+
+ /* Find first extent of the unnamed data attribute. */
+ err = ntfs_attr_lookup(AT_DATA, NULL, 0, 0, 0, NULL, 0, ctx);
+ if (unlikely(err)) {
+ vi->i_size = ni->initialized_size =
+ ni->allocated_size = 0;
+ if (err != -ENOENT) {
+ ntfs_error(vi->i_sb, "Failed to lookup $DATA attribute.");
+ goto unm_err_out;
+ }
+ /*
+ * FILE_Secure does not have an unnamed $DATA
+ * attribute, so we special case it here.
+ */
+ if (vi->i_ino == FILE_Secure)
+ goto no_data_attr_special_case;
+ /*
+ * Most if not all the system files in the $Extend
+ * system directory do not have unnamed data
+ * attributes so we need to check if the parent
+ * directory of the file is FILE_Extend and if it is
+ * ignore this error. To do this we need to get the
+ * name of this inode from the mft record as the name
+ * contains the back reference to the parent directory.
+ */
+ extend_sys = ntfs_is_extended_system_file(ctx);
+ if (extend_sys > 0) {
+ if (m->flags & MFT_RECORD_IS_VIEW_INDEX &&
+ extend_sys == 2) {
+ name = R;
+ name_len = 2;
+ goto view_index_meta;
+ }
+ goto no_data_attr_special_case;
+ }
+
+ err = extend_sys;
+			ntfs_error(vi->i_sb, "$DATA attribute is missing, err: %d", err);
+ goto unm_err_out;
+ }
+ a = ctx->attr;
+ /* Setup the state. */
+ if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
+ if (a->flags & ATTR_COMPRESSION_MASK) {
+ NInoSetCompressed(ni);
+ ni->flags |= FILE_ATTR_COMPRESSED;
+ if (vol->cluster_size > 4096) {
+ ntfs_error(vi->i_sb,
+ "Found compressed data but compression is disabled due to cluster size (%i) > 4kiB.",
+ vol->cluster_size);
+ goto unm_err_out;
+ }
+ if ((a->flags & ATTR_COMPRESSION_MASK)
+ != ATTR_IS_COMPRESSED) {
+ ntfs_error(vi->i_sb,
+ "Found unknown compression method or corrupt file.");
+ goto unm_err_out;
+ }
+ }
+ if (a->flags & ATTR_IS_SPARSE) {
+ NInoSetSparse(ni);
+ ni->flags |= FILE_ATTR_SPARSE_FILE;
+ }
+ }
+ if (a->flags & ATTR_IS_ENCRYPTED) {
+ if (NInoCompressed(ni)) {
+ ntfs_error(vi->i_sb, "Found encrypted and compressed data.");
+ goto unm_err_out;
+ }
+ NInoSetEncrypted(ni);
+ ni->flags |= FILE_ATTR_ENCRYPTED;
+ }
+ if (a->non_resident) {
+ NInoSetNonResident(ni);
+ if (NInoCompressed(ni) || NInoSparse(ni)) {
+ if (NInoCompressed(ni) &&
+ a->data.non_resident.compression_unit != 4) {
+ ntfs_error(vi->i_sb,
+ "Found non-standard compression unit (%u instead of 4). Cannot handle this.",
+ a->data.non_resident.compression_unit);
+ err = -EOPNOTSUPP;
+ goto unm_err_out;
+ }
+
+ if (NInoSparse(ni) &&
+ a->data.non_resident.compression_unit &&
+ a->data.non_resident.compression_unit !=
+ vol->sparse_compression_unit) {
+ ntfs_error(vi->i_sb,
+ "Found non-standard compression unit (%u instead of 0 or %d). Cannot handle this.",
+ a->data.non_resident.compression_unit,
+ vol->sparse_compression_unit);
+ err = -EOPNOTSUPP;
+ goto unm_err_out;
+ }
+
+ if (a->data.non_resident.compression_unit) {
+ ni->itype.compressed.block_size = 1U <<
+ (a->data.non_resident.compression_unit +
+ vol->cluster_size_bits);
+ ni->itype.compressed.block_size_bits =
+ ffs(ni->itype.compressed.block_size) - 1;
+ ni->itype.compressed.block_clusters =
+ 1U << a->data.non_resident.compression_unit;
+				} else {
+					ni->itype.compressed.block_size = 0;
+					ni->itype.compressed.block_size_bits = 0;
+					ni->itype.compressed.block_clusters = 0;
+				}
+ ni->itype.compressed.size = le64_to_cpu(
+ a->data.non_resident.compressed_size);
+ }
+ if (a->data.non_resident.lowest_vcn) {
+ ntfs_error(vi->i_sb,
+ "First extent of $DATA attribute has non zero lowest_vcn.");
+ goto unm_err_out;
+ }
+ vi->i_size = ni->data_size = le64_to_cpu(a->data.non_resident.data_size);
+ ni->initialized_size = le64_to_cpu(a->data.non_resident.initialized_size);
+ ni->allocated_size = le64_to_cpu(a->data.non_resident.allocated_size);
+ } else { /* Resident attribute. */
+ vi->i_size = ni->data_size = ni->initialized_size = le32_to_cpu(
+ a->data.resident.value_length);
+			ni->allocated_size = le32_to_cpu(a->length) -
+					le16_to_cpu(a->data.resident.value_offset);
+ if (vi->i_size > ni->allocated_size) {
+ ntfs_error(vi->i_sb,
+ "Resident data attribute is corrupt (size exceeds allocation).");
+ goto unm_err_out;
+ }
+ }
+no_data_attr_special_case:
+ /* We are done with the mft record, so we release it. */
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(ni);
+ m = NULL;
+ ctx = NULL;
+ /* Setup the operations for this inode. */
+ ntfs_set_vfs_operations(vi, vi->i_mode, dev);
+ }
+ /*
+ * The number of 512-byte blocks used on disk (for stat). This is in so
+ * far inaccurate as it doesn't account for any named streams or other
+ * special non-resident attributes, but that is how Windows works, too,
+ * so we are at least consistent with Windows, if not entirely
+ * consistent with the Linux Way. Doing it the Linux Way would cause a
+ * significant slowdown as it would involve iterating over all
+ * attributes in the mft record and adding the allocated/compressed
+ * sizes of all non-resident attributes present to give us the Linux
+ * correct size that should go into i_blocks (after division by 512).
+ */
+ if (S_ISREG(vi->i_mode) && (NInoCompressed(ni) || NInoSparse(ni)))
+ vi->i_blocks = ni->itype.compressed.size >> 9;
+ else
+ vi->i_blocks = ni->allocated_size >> 9;
+
+ ntfs_debug("Done.");
+ return 0;
+unm_err_out:
+ if (!err)
+ err = -EIO;
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+ if (m)
+ unmap_mft_record(ni);
+err_out:
+	if (err != -EOPNOTSUPP && err != -ENOMEM && vol_err) {
+ ntfs_error(vol->sb,
+ "Failed with error code %i. Marking corrupt inode 0x%lx as bad. Run chkdsk.",
+ err, vi->i_ino);
+ NVolSetErrors(vol);
+ }
+ return err;
+}
+
+/**
+ * ntfs_read_locked_attr_inode - read an attribute inode from its base inode
+ * @base_vi: base inode
+ * @vi: attribute inode to read
+ *
+ * ntfs_read_locked_attr_inode() is called from ntfs_attr_iget() to read the
+ * attribute inode described by @vi into memory from the base mft record
+ * described by @base_vi.
+ *
+ * ntfs_read_locked_attr_inode() maps, pins and locks the base inode for
+ * reading and looks up the attribute described by @vi before setting up the
+ * necessary fields in @vi as well as initializing the ntfs inode.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ * i_count is set to 1, so it is not going to go away
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Note this cannot be called for AT_INDEX_ALLOCATION.
+ */
+static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi)
+{
+ struct ntfs_volume *vol = NTFS_SB(vi->i_sb);
+ struct ntfs_inode *ni = NTFS_I(vi), *base_ni = NTFS_I(base_vi);
+ struct mft_record *m;
+ struct attr_record *a;
+ struct ntfs_attr_search_ctx *ctx;
+ int err = 0;
+
+ ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+
+ ntfs_init_big_inode(vi);
+
+ /* Just mirror the values from the base inode. */
+ vi->i_uid = base_vi->i_uid;
+ vi->i_gid = base_vi->i_gid;
+ set_nlink(vi, base_vi->i_nlink);
+ inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
+ inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
+ inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
+ vi->i_generation = ni->seq_no = base_ni->seq_no;
+
+ /* Set inode type to zero but preserve permissions. */
+ vi->i_mode = base_vi->i_mode & ~S_IFMT;
+
+ m = map_mft_record(base_ni);
+ if (IS_ERR(m)) {
+ err = PTR_ERR(m);
+ goto err_out;
+ }
+ ctx = ntfs_attr_get_search_ctx(base_ni, m);
+ if (!ctx) {
+ err = -ENOMEM;
+ goto unm_err_out;
+ }
+ /* Find the attribute. */
+ err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
+ CASE_SENSITIVE, 0, NULL, 0, ctx);
+ if (unlikely(err))
+ goto unm_err_out;
+ a = ctx->attr;
+ if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
+ if (a->flags & ATTR_COMPRESSION_MASK) {
+ NInoSetCompressed(ni);
+ ni->flags |= FILE_ATTR_COMPRESSED;
+			if (ni->type != AT_DATA || ni->name_len) {
+				ntfs_error(vi->i_sb,
+					"Found compressed non-data or named data attribute.");
+				goto unm_err_out;
+			}
+ if (vol->cluster_size > 4096) {
+ ntfs_error(vi->i_sb,
+ "Found compressed attribute but compression is disabled due to cluster size (%i) > 4kiB.",
+ vol->cluster_size);
+ goto unm_err_out;
+ }
+ if ((a->flags & ATTR_COMPRESSION_MASK) !=
+ ATTR_IS_COMPRESSED) {
+ ntfs_error(vi->i_sb, "Found unknown compression method.");
+ goto unm_err_out;
+ }
+ }
+ /*
+ * The compressed/sparse flag set in an index root just means
+ * to compress all files.
+ */
+ if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
+ ntfs_error(vi->i_sb,
+ "Found mst protected attribute but the attribute is %s.",
+ NInoCompressed(ni) ? "compressed" : "sparse");
+ goto unm_err_out;
+ }
+ if (a->flags & ATTR_IS_SPARSE) {
+ NInoSetSparse(ni);
+ ni->flags |= FILE_ATTR_SPARSE_FILE;
+ }
+ }
+ if (a->flags & ATTR_IS_ENCRYPTED) {
+ if (NInoCompressed(ni)) {
+ ntfs_error(vi->i_sb, "Found encrypted and compressed data.");
+ goto unm_err_out;
+ }
+ /*
+ * The encryption flag set in an index root just means to
+ * encrypt all files.
+ */
+ if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
+ ntfs_error(vi->i_sb,
+ "Found mst protected attribute but the attribute is encrypted.");
+ goto unm_err_out;
+ }
+ if (ni->type != AT_DATA) {
+ ntfs_error(vi->i_sb,
+ "Found encrypted non-data attribute.");
+ goto unm_err_out;
+ }
+ NInoSetEncrypted(ni);
+ ni->flags |= FILE_ATTR_ENCRYPTED;
+ }
+ if (!a->non_resident) {
+ /* Ensure the attribute name is placed before the value. */
+ if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+ le16_to_cpu(a->data.resident.value_offset)))) {
+ ntfs_error(vol->sb,
+ "Attribute name is placed after the attribute value.");
+ goto unm_err_out;
+ }
+ if (NInoMstProtected(ni)) {
+ ntfs_error(vi->i_sb,
+ "Found mst protected attribute but the attribute is resident.");
+ goto unm_err_out;
+ }
+ vi->i_size = ni->initialized_size = ni->data_size = le32_to_cpu(
+ a->data.resident.value_length);
+ ni->allocated_size = le32_to_cpu(a->length) -
+ le16_to_cpu(a->data.resident.value_offset);
+ if (vi->i_size > ni->allocated_size) {
+ ntfs_error(vi->i_sb,
+ "Resident attribute is corrupt (size exceeds allocation).");
+ goto unm_err_out;
+ }
+ } else {
+ NInoSetNonResident(ni);
+ /*
+ * Ensure the attribute name is placed before the mapping pairs
+ * array.
+ */
+ if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+ le16_to_cpu(
+ a->data.non_resident.mapping_pairs_offset)))) {
+ ntfs_error(vol->sb,
+ "Attribute name is placed after the mapping pairs array.");
+ goto unm_err_out;
+ }
+ if (NInoCompressed(ni) || NInoSparse(ni)) {
+ if (NInoCompressed(ni) && a->data.non_resident.compression_unit != 4) {
+ ntfs_error(vi->i_sb,
+ "Found non-standard compression unit (%u instead of 4). Cannot handle this.",
+ a->data.non_resident.compression_unit);
+ err = -EOPNOTSUPP;
+ goto unm_err_out;
+ }
+ if (a->data.non_resident.compression_unit) {
+ ni->itype.compressed.block_size = 1U <<
+ (a->data.non_resident.compression_unit +
+ vol->cluster_size_bits);
+ ni->itype.compressed.block_size_bits =
+ ffs(ni->itype.compressed.block_size) - 1;
+ ni->itype.compressed.block_clusters = 1U <<
+ a->data.non_resident.compression_unit;
+ } else {
+ ni->itype.compressed.block_size = 0;
+ ni->itype.compressed.block_size_bits = 0;
+ ni->itype.compressed.block_clusters = 0;
+ }
+ ni->itype.compressed.size = le64_to_cpu(
+ a->data.non_resident.compressed_size);
+ }
+ if (a->data.non_resident.lowest_vcn) {
+ ntfs_error(vi->i_sb, "First extent of attribute has non-zero lowest_vcn.");
+ goto unm_err_out;
+ }
+ vi->i_size = ni->data_size = le64_to_cpu(a->data.non_resident.data_size);
+ ni->initialized_size = le64_to_cpu(a->data.non_resident.initialized_size);
+ ni->allocated_size = le64_to_cpu(a->data.non_resident.allocated_size);
+ }
+ vi->i_mapping->a_ops = &ntfs_normal_aops;
+ if (NInoMstProtected(ni))
+ vi->i_mapping->a_ops = &ntfs_mst_aops;
+ else if (NInoCompressed(ni))
+ vi->i_mapping->a_ops = &ntfs_compressed_aops;
+ if ((NInoCompressed(ni) || NInoSparse(ni)) && ni->type != AT_INDEX_ROOT)
+ vi->i_blocks = ni->itype.compressed.size >> 9;
+ else
+ vi->i_blocks = ni->allocated_size >> 9;
+ /*
+ * Make sure the base inode does not go away and attach it to the
+ * attribute inode.
+ */
+ if (!igrab(base_vi)) {
+ err = -ENOENT;
+ goto unm_err_out;
+ }
+ ni->ext.base_ntfs_ino = base_ni;
+ ni->nr_extents = -1;
+
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(base_ni);
+
+ ntfs_debug("Done.");
+ return 0;
+
+unm_err_out:
+ if (!err)
+ err = -EIO;
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(base_ni);
+err_out:
+ if (err != -ENOENT)
+ ntfs_error(vol->sb,
+ "Failed with error code %i while reading attribute inode (mft_no 0x%lx, type 0x%x, name_len %i). Marking corrupt inode and base inode 0x%lx as bad. Run chkdsk.",
+ err, vi->i_ino, ni->type, ni->name_len,
+ base_vi->i_ino);
+ if (err != -ENOENT && err != -ENOMEM)
+ NVolSetErrors(vol);
+ return err;
+}
+
+/**
+ * ntfs_read_locked_index_inode - read an index inode from its base inode
+ * @base_vi: base inode
+ * @vi: index inode to read
+ *
+ * ntfs_read_locked_index_inode() is called from ntfs_index_iget() to read the
+ * index inode described by @vi into memory from the base mft record described
+ * by @base_ni.
+ *
+ * ntfs_read_locked_index_inode() maps, pins and locks the base inode for
+ * reading and looks up the attributes relating to the index described by @vi
+ * before setting up the necessary fields in @vi as well as initializing the
+ * ntfs inode.
+ *
+ * Note, index inodes are essentially attribute inodes (NInoAttr() is true)
+ * with the attribute type set to AT_INDEX_ALLOCATION. Apart from that, they
+ * are set up like directory inodes since directories are a special case of
+ * indices so they need to be treated in much the same way. Most importantly,
+ * for small indices the index allocation attribute might not actually exist.
+ * However, the index root attribute always exists but this does not need to
+ * have an inode associated with it and this is why we define a new inode type
+ * index. Also, like for directories, we need to have an attribute inode for
+ * the bitmap attribute corresponding to the index allocation attribute and we
+ * can store this in the appropriate field of the inode, just like we do for
+ * normal directory inodes.
+ *
+ * Q: What locks are held when the function is called?
+ * A: i_state has I_NEW set, hence the inode is locked, also
+ * i_count is set to 1, so it is not going to go away
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_read_locked_index_inode(struct inode *base_vi, struct inode *vi)
+{
+ loff_t bvi_size;
+ struct ntfs_volume *vol = NTFS_SB(vi->i_sb);
+ struct ntfs_inode *ni = NTFS_I(vi), *base_ni = NTFS_I(base_vi), *bni;
+ struct inode *bvi;
+ struct mft_record *m;
+ struct attr_record *a;
+ struct ntfs_attr_search_ctx *ctx;
+ struct index_root *ir;
+ u8 *ir_end, *index_end;
+ int err = 0;
+
+ ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
+ lockdep_assert_held(&base_ni->mrec_lock);
+
+ ntfs_init_big_inode(vi);
+ /* Just mirror the values from the base inode. */
+ vi->i_uid = base_vi->i_uid;
+ vi->i_gid = base_vi->i_gid;
+ set_nlink(vi, base_vi->i_nlink);
+ inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
+ inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
+ inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
+ vi->i_generation = ni->seq_no = base_ni->seq_no;
+ /* Set inode type to zero but preserve permissions. */
+ vi->i_mode = base_vi->i_mode & ~S_IFMT;
+ /* Map the mft record for the base inode. */
+ m = map_mft_record(base_ni);
+ if (IS_ERR(m)) {
+ err = PTR_ERR(m);
+ goto err_out;
+ }
+ ctx = ntfs_attr_get_search_ctx(base_ni, m);
+ if (!ctx) {
+ err = -ENOMEM;
+ goto unm_err_out;
+ }
+ /* Find the index root attribute. */
+ err = ntfs_attr_lookup(AT_INDEX_ROOT, ni->name, ni->name_len,
+ CASE_SENSITIVE, 0, NULL, 0, ctx);
+ if (unlikely(err)) {
+ if (err == -ENOENT)
+ ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is missing.");
+ goto unm_err_out;
+ }
+ a = ctx->attr;
+ /* Set up the state. */
+ if (unlikely(a->non_resident)) {
+ ntfs_error(vol->sb, "$INDEX_ROOT attribute is not resident.");
+ goto unm_err_out;
+ }
+ /* Ensure the attribute name is placed before the value. */
+ if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+ le16_to_cpu(a->data.resident.value_offset)))) {
+ ntfs_error(vol->sb,
+ "$INDEX_ROOT attribute name is placed after the attribute value.");
+ goto unm_err_out;
+ }
+
+ ir = (struct index_root *)((u8 *)a + le16_to_cpu(a->data.resident.value_offset));
+ ir_end = (u8 *)ir + le32_to_cpu(a->data.resident.value_length);
+ if (ir_end > (u8 *)ctx->mrec + vol->mft_record_size) {
+ ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt.");
+ goto unm_err_out;
+ }
+ index_end = (u8 *)&ir->index + le32_to_cpu(ir->index.index_length);
+ if (index_end > ir_end) {
+ ntfs_error(vi->i_sb, "Index is corrupt.");
+ goto unm_err_out;
+ }
+
+ ni->itype.index.collation_rule = ir->collation_rule;
+ ntfs_debug("Index collation rule is 0x%x.",
+ le32_to_cpu(ir->collation_rule));
+ ni->itype.index.block_size = le32_to_cpu(ir->index_block_size);
+ if (!is_power_of_2(ni->itype.index.block_size)) {
+ ntfs_error(vi->i_sb, "Index block size (%u) is not a power of two.",
+ ni->itype.index.block_size);
+ goto unm_err_out;
+ }
+ if (ni->itype.index.block_size > PAGE_SIZE) {
+ ntfs_error(vi->i_sb, "Index block size (%u) > PAGE_SIZE (%ld) is not supported.",
+ ni->itype.index.block_size, PAGE_SIZE);
+ err = -EOPNOTSUPP;
+ goto unm_err_out;
+ }
+ if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
+ ntfs_error(vi->i_sb,
+ "Index block size (%u) < NTFS_BLOCK_SIZE (%i) is not supported.",
+ ni->itype.index.block_size, NTFS_BLOCK_SIZE);
+ err = -EOPNOTSUPP;
+ goto unm_err_out;
+ }
+ ni->itype.index.block_size_bits = ffs(ni->itype.index.block_size) - 1;
+ /* Determine the size of a vcn in the index. */
+ if (vol->cluster_size <= ni->itype.index.block_size) {
+ ni->itype.index.vcn_size = vol->cluster_size;
+ ni->itype.index.vcn_size_bits = vol->cluster_size_bits;
+ } else {
+ ni->itype.index.vcn_size = vol->sector_size;
+ ni->itype.index.vcn_size_bits = vol->sector_size_bits;
+ }
+
+ /* Find index allocation attribute. */
+ ntfs_attr_reinit_search_ctx(ctx);
+ err = ntfs_attr_lookup(AT_INDEX_ALLOCATION, ni->name, ni->name_len,
+ CASE_SENSITIVE, 0, NULL, 0, ctx);
+ if (unlikely(err)) {
+ if (err == -ENOENT) {
+ /* No index allocation. */
+ vi->i_size = ni->initialized_size = ni->allocated_size = 0;
+ /* We are done with the mft record, so we release it. */
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(base_ni);
+ m = NULL;
+ ctx = NULL;
+ goto skip_large_index_stuff;
+ } else
+ ntfs_error(vi->i_sb, "Failed to lookup $INDEX_ALLOCATION attribute.");
+ goto unm_err_out;
+ }
+ NInoSetIndexAllocPresent(ni);
+ NInoSetNonResident(ni);
+ ni->type = AT_INDEX_ALLOCATION;
+
+ a = ctx->attr;
+ if (!a->non_resident) {
+ ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is resident.");
+ goto unm_err_out;
+ }
+ /*
+ * Ensure the attribute name is placed before the mapping pairs array.
+ */
+ if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
+ le16_to_cpu(a->data.non_resident.mapping_pairs_offset)))) {
+ ntfs_error(vol->sb,
+ "$INDEX_ALLOCATION attribute name is placed after the mapping pairs array.");
+ goto unm_err_out;
+ }
+ if (a->flags & ATTR_IS_ENCRYPTED) {
+ ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is encrypted.");
+ goto unm_err_out;
+ }
+ if (a->flags & ATTR_IS_SPARSE) {
+ ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is sparse.");
+ goto unm_err_out;
+ }
+ if (a->flags & ATTR_COMPRESSION_MASK) {
+ ntfs_error(vi->i_sb,
+ "$INDEX_ALLOCATION attribute is compressed.");
+ goto unm_err_out;
+ }
+ if (a->data.non_resident.lowest_vcn) {
+ ntfs_error(vi->i_sb,
+ "First extent of $INDEX_ALLOCATION attribute has non zero lowest_vcn.");
+ goto unm_err_out;
+ }
+ vi->i_size = ni->data_size = le64_to_cpu(a->data.non_resident.data_size);
+ ni->initialized_size = le64_to_cpu(a->data.non_resident.initialized_size);
+ ni->allocated_size = le64_to_cpu(a->data.non_resident.allocated_size);
+ /*
+ * We are done with the mft record, so we release it. Otherwise
+ * we would deadlock in ntfs_attr_iget().
+ */
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(base_ni);
+ m = NULL;
+ ctx = NULL;
+ /* Get the index bitmap attribute inode. */
+ bvi = ntfs_attr_iget(base_vi, AT_BITMAP, ni->name, ni->name_len);
+ if (IS_ERR(bvi)) {
+ ntfs_error(vi->i_sb, "Failed to get bitmap attribute.");
+ err = PTR_ERR(bvi);
+ goto unm_err_out;
+ }
+ bni = NTFS_I(bvi);
+ if (NInoCompressed(bni) || NInoEncrypted(bni) ||
+ NInoSparse(bni)) {
+ ntfs_error(vi->i_sb,
+ "$BITMAP attribute is compressed and/or encrypted and/or sparse.");
+ goto iput_unm_err_out;
+ }
+ /* Consistency check bitmap size vs. index allocation size. */
+ bvi_size = i_size_read(bvi);
+ if ((bvi_size << 3) < (vi->i_size >> ni->itype.index.block_size_bits)) {
+ ntfs_error(vi->i_sb,
+ "Index bitmap too small (0x%llx) for index allocation (0x%llx).",
+ bvi_size << 3, vi->i_size);
+ goto iput_unm_err_out;
+ }
+ iput(bvi);
+skip_large_index_stuff:
+ /* Setup the operations for this index inode. */
+ ntfs_set_vfs_operations(vi, S_IFDIR, 0);
+ vi->i_blocks = ni->allocated_size >> 9;
+ /*
+ * Make sure the base inode doesn't go away and attach it to the
+ * index inode.
+ */
+ if (!igrab(base_vi))
+ goto unm_err_out;
+ ni->ext.base_ntfs_ino = base_ni;
+ ni->nr_extents = -1;
+
+ ntfs_debug("Done.");
+ return 0;
+iput_unm_err_out:
+ iput(bvi);
+unm_err_out:
+ if (!err)
+ err = -EIO;
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+ if (m)
+ unmap_mft_record(base_ni);
+err_out:
+ ntfs_error(vi->i_sb,
+		"Failed with error code %i while reading index inode (mft_no 0x%lx, name_len %i).",
+ err, vi->i_ino, ni->name_len);
+ if (err != -EOPNOTSUPP && err != -ENOMEM)
+ NVolSetErrors(vol);
+ return err;
+}
+
+/**
+ * load_attribute_list_mount - load an attribute list into memory
+ * @vol: ntfs volume from which to read
+ * @rl: runlist of the attribute list
+ * @al_start: destination buffer
+ * @size: size of the destination buffer in bytes
+ * @initialized_size: initialized size of the attribute list
+ *
+ * Walk the runlist @rl and load all clusters from it, copying them into the
+ * linear buffer @al_start. At most @size bytes are copied. Note, @size does
+ * not need to be a multiple of the cluster size. If @initialized_size is
+ * less than @size, the region between @initialized_size and @size will be
+ * zeroed and not read from disk.
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int load_attribute_list_mount(struct ntfs_volume *vol,
+ struct runlist_element *rl, u8 *al_start, const s64 size,
+ const s64 initialized_size)
+{
+ s64 lcn;
+ u8 *al = al_start;
+ u8 *al_end = al + initialized_size;
+ struct super_block *sb;
+ int err = 0;
+ loff_t rl_byte_off, rl_byte_len;
+
+ ntfs_debug("Entering.");
+ if (!vol || !rl || !al || size <= 0 || initialized_size < 0 ||
+ initialized_size > size)
+ return -EINVAL;
+ if (!initialized_size) {
+ memset(al, 0, size);
+ return 0;
+ }
+ sb = vol->sb;
+
+ /* Read all clusters specified by the runlist one run at a time. */
+ while (rl->length) {
+ lcn = ntfs_rl_vcn_to_lcn(rl, rl->vcn);
+ ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
+ (unsigned long long)rl->vcn,
+ (unsigned long long)lcn);
+ /* The attribute list cannot be sparse. */
+ if (lcn < 0) {
+ ntfs_error(sb, "ntfs_rl_vcn_to_lcn() failed. Cannot read attribute list.");
+ goto err_out;
+ }
+
+ rl_byte_off = lcn << vol->cluster_size_bits;
+ rl_byte_len = rl->length << vol->cluster_size_bits;
+
+ if (al + rl_byte_len > al_end)
+ rl_byte_len = al_end - al;
+
+ err = ntfs_dev_read(sb, al, rl_byte_off, rl_byte_len);
+ if (err) {
+ ntfs_error(sb, "Cannot read attribute list.");
+ goto err_out;
+ }
+
+ if (al + rl_byte_len >= al_end) {
+ if (initialized_size < size)
+ goto initialize;
+ goto done;
+ }
+
+ al += rl_byte_len;
+ rl++;
+ }
+ if (initialized_size < size) {
+initialize:
+ memset(al_start + initialized_size, 0, size - initialized_size);
+ }
+done:
+ return err;
+err_out:
+ err = -EIO;
+ goto done;
+}
+
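The run-by-run copy with tail zeroing performed by load_attribute_list_mount() can be sketched in isolation. This is a minimal standalone model, not driver code: the device read is replaced by a memcpy from an in-memory image and all names are illustrative.

```c
#include <assert.h>
#include <string.h>

/* One run: byte offset in the source image and length in bytes;
 * len == 0 terminates the list (illustrative, not driver API). */
struct run {
	size_t src_off;
	size_t len;
};

/*
 * Copy runs into a linear buffer of @size bytes, clamping at the
 * initialized region, then zero the tail between @init_size and @size,
 * mirroring the initialized_size handling above.
 */
static void load_runs(const unsigned char *img, const struct run *rl,
		      unsigned char *buf, size_t size, size_t init_size)
{
	size_t pos = 0;

	for (; rl->len && pos < init_size; rl++) {
		size_t n = rl->len;

		if (pos + n > init_size)
			n = init_size - pos;	/* do not read past init data */
		memcpy(buf + pos, img + rl->src_off, n);
		pos += n;
	}
	if (init_size < size)
		memset(buf + init_size, 0, size - init_size);
}
```

With size 8 and init_size 6, the first six bytes come from the runs and the last two are zeroed, which is exactly the contract the kernel-doc above describes.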
+/*
+ * The MFT inode has special locking, so teach the lock validator
+ * about this by splitting off the locking rules of the MFT from
+ * the locking rules of other inodes. The MFT inode can never be
+ * accessed from the VFS side (or even internally), only by the
+ * map_mft functions.
+ */
+static struct lock_class_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
+
+/**
+ * ntfs_read_inode_mount - special read_inode for mount time use only
+ * @vi: inode to read
+ *
+ * Read inode FILE_MFT at mount time, only called with super_block lock
+ * held from within the read_super() code path.
+ *
+ * This function exists because when it is called the page cache for $MFT/$DATA
+ * is not initialized and hence we cannot get at the contents of mft records
+ * by calling map_mft_record*().
+ *
+ * Further it needs to cope with the circular references problem, i.e. cannot
+ * load any attributes other than $ATTRIBUTE_LIST until $DATA is loaded, because
+ * we do not know where the other extent mft records are yet and again, because
+ * we cannot call map_mft_record*() yet. Obviously this applies only when an
+ * attribute list is actually present in $MFT inode.
+ *
+ * We solve these problems by starting with the $DATA attribute before anything
+ * else and iterating using ntfs_attr_lookup($DATA) over all extents. As each
+ * extent is found, we ntfs_mapping_pairs_decompress() including the implied
+ * ntfs_runlists_merge(). Each step of the iteration necessarily provides
+ * sufficient information for the next step to complete.
+ *
+ * This should work but there are two possible pitfalls (see inline comments
+ * below), but only time will tell if they are real pits or just smoke...
+ */
+int ntfs_read_inode_mount(struct inode *vi)
+{
+ s64 next_vcn, last_vcn, highest_vcn;
+ struct super_block *sb = vi->i_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ struct ntfs_inode *ni;
+ struct mft_record *m = NULL;
+ struct attr_record *a;
+ struct ntfs_attr_search_ctx *ctx;
+ unsigned int i, nr_blocks;
+ int err;
+ size_t new_rl_count;
+
+ ntfs_debug("Entering.");
+
+ /* Initialize the ntfs specific part of @vi. */
+ ntfs_init_big_inode(vi);
+
+ ni = NTFS_I(vi);
+
+ /* Setup the data attribute. It is special as it is mst protected. */
+ NInoSetNonResident(ni);
+ NInoSetMstProtected(ni);
+ NInoSetSparseDisabled(ni);
+ ni->type = AT_DATA;
+ ni->name = AT_UNNAMED;
+ ni->name_len = 0;
+ /*
+ * This sets up our little cheat allowing us to reuse the async read io
+ * completion handler for directories.
+ */
+ ni->itype.index.block_size = vol->mft_record_size;
+ ni->itype.index.block_size_bits = vol->mft_record_size_bits;
+
+ /* Very important! Needed to be able to call map_mft_record*(). */
+ vol->mft_ino = vi;
+
+ /* Allocate enough memory to read the first mft record. */
+ if (vol->mft_record_size > 64 * 1024) {
+ ntfs_error(sb, "Unsupported mft record size %i (max 64kiB).",
+ vol->mft_record_size);
+ goto err_out;
+ }
+
+ i = vol->mft_record_size;
+ if (i < sb->s_blocksize)
+ i = sb->s_blocksize;
+
+ m = (struct mft_record *)ntfs_malloc_nofs(i);
+ if (!m) {
+ ntfs_error(sb, "Failed to allocate buffer for $MFT record 0.");
+ goto err_out;
+ }
+
+ /* Determine the first block of the $MFT/$DATA attribute. */
+ nr_blocks = vol->mft_record_size >> sb->s_blocksize_bits;
+ if (!nr_blocks)
+ nr_blocks = 1;
+
+ /* Load $MFT/$DATA's first mft record. */
+ err = ntfs_dev_read(sb, m, vol->mft_lcn << vol->cluster_size_bits, i);
+ if (err) {
+ ntfs_error(sb, "Device read failed.");
+ goto err_out;
+ }
+
+ if (le32_to_cpu(m->bytes_allocated) != vol->mft_record_size) {
+ ntfs_error(sb, "Incorrect mft record size %u in superblock, should be %u.",
+ le32_to_cpu(m->bytes_allocated), vol->mft_record_size);
+ goto err_out;
+ }
+
+ /* Apply the mst fixups. */
+ if (post_read_mst_fixup((struct ntfs_record *)m, vol->mft_record_size)) {
+ ntfs_error(sb, "MST fixup failed. $MFT is corrupt.");
+ goto err_out;
+ }
+
+ if (ntfs_mft_record_check(vol, m, FILE_MFT)) {
+ ntfs_error(sb, "ntfs_mft_record_check failed. $MFT is corrupt.");
+ goto err_out;
+ }
+
+ /* Need this to sanity check attribute list references to $MFT. */
+ vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
+
+ /* Provides read_folio() for map_mft_record(). */
+ vi->i_mapping->a_ops = &ntfs_mst_aops;
+
+ ctx = ntfs_attr_get_search_ctx(ni, m);
+ if (!ctx) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ /* Find the attribute list attribute if present. */
+ err = ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
+ if (err) {
+ if (unlikely(err != -ENOENT)) {
+ ntfs_error(sb,
+ "Failed to lookup attribute list attribute. You should run chkdsk.");
+ goto put_err_out;
+ }
+ } else /* if (!err) */ {
+ struct attr_list_entry *al_entry, *next_al_entry;
+ u8 *al_end;
+ static const char *es = " Not allowed. $MFT is corrupt. You should run chkdsk.";
+
+ ntfs_debug("Attribute list attribute found in $MFT.");
+ NInoSetAttrList(ni);
+ a = ctx->attr;
+ if (a->flags & ATTR_COMPRESSION_MASK) {
+ ntfs_error(sb,
+ "Attribute list attribute is compressed.%s",
+ es);
+ goto put_err_out;
+ }
+ if (a->flags & ATTR_IS_ENCRYPTED ||
+ a->flags & ATTR_IS_SPARSE) {
+ if (a->non_resident) {
+ ntfs_error(sb,
+ "Non-resident attribute list attribute is encrypted/sparse.%s",
+ es);
+ goto put_err_out;
+ }
+ ntfs_warning(sb,
+ "Resident attribute list attribute in $MFT system file is marked encrypted/sparse which is not true. However, Windows allows this and chkdsk does not detect or correct it so we will just ignore the invalid flags and pretend they are not set.");
+ }
+ /* Now allocate memory for the attribute list. */
+ ni->attr_list_size = (u32)ntfs_attr_size(a);
+ if (!ni->attr_list_size) {
+ ntfs_error(sb, "Attr_list_size is zero");
+ goto put_err_out;
+ }
+ ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
+ if (!ni->attr_list) {
+ ntfs_error(sb, "Not enough memory to allocate buffer for attribute list.");
+ goto put_err_out;
+ }
+ if (a->non_resident) {
+ struct runlist_element *rl;
+ size_t new_rl_count;
+
+ NInoSetAttrListNonResident(ni);
+ if (a->data.non_resident.lowest_vcn) {
+ ntfs_error(sb,
+ "Attribute list has non zero lowest_vcn. $MFT is corrupt. You should run chkdsk.");
+ goto put_err_out;
+ }
+
+ rl = ntfs_mapping_pairs_decompress(vol, a, NULL, &new_rl_count);
+ if (IS_ERR(rl)) {
+ err = PTR_ERR(rl);
+ ntfs_error(sb,
+ "Mapping pairs decompression failed with error code %i.",
+ -err);
+ goto put_err_out;
+ }
+
+ err = load_attribute_list_mount(vol, rl, ni->attr_list, ni->attr_list_size,
+ le64_to_cpu(a->data.non_resident.initialized_size));
+ ntfs_free(rl);
+ if (err) {
+ ntfs_error(sb,
+ "Failed to load attribute list with error code %i.",
+ -err);
+ goto put_err_out;
+ }
+ } else /* if (!ctx->attr->non_resident) */ {
+ if ((u8 *)a + le16_to_cpu(
+ a->data.resident.value_offset) +
+ le32_to_cpu(a->data.resident.value_length) >
+ (u8 *)ctx->mrec + vol->mft_record_size) {
+ ntfs_error(sb, "Corrupt attribute list attribute.");
+ goto put_err_out;
+ }
+ /* Now copy the attribute list. */
+ memcpy(ni->attr_list, (u8 *)a + le16_to_cpu(
+ a->data.resident.value_offset),
+ le32_to_cpu(a->data.resident.value_length));
+ }
+ /* The attribute list is now setup in memory. */
+ al_entry = (struct attr_list_entry *)ni->attr_list;
+ al_end = (u8 *)al_entry + ni->attr_list_size;
+ for (;; al_entry = next_al_entry) {
+ /* Out of bounds check. */
+ if ((u8 *)al_entry < ni->attr_list ||
+ (u8 *)al_entry > al_end)
+ goto em_put_err_out;
+ /* Catch the end of the attribute list. */
+ if ((u8 *)al_entry == al_end)
+ goto em_put_err_out;
+ if (!al_entry->length)
+ goto em_put_err_out;
+ if ((u8 *)al_entry + 6 > al_end ||
+ (u8 *)al_entry + le16_to_cpu(al_entry->length) > al_end)
+ goto em_put_err_out;
+ next_al_entry = (struct attr_list_entry *)((u8 *)al_entry +
+ le16_to_cpu(al_entry->length));
+ if (le32_to_cpu(al_entry->type) > le32_to_cpu(AT_DATA))
+ goto em_put_err_out;
+ if (al_entry->type != AT_DATA)
+ continue;
+ /* We want an unnamed attribute. */
+ if (al_entry->name_length)
+ goto em_put_err_out;
+ /* Want the first entry, i.e. lowest_vcn == 0. */
+ if (al_entry->lowest_vcn)
+ goto em_put_err_out;
+ /* First entry has to be in the base mft record. */
+ if (MREF_LE(al_entry->mft_reference) != vi->i_ino) {
+ /* MFT references do not match, logic fails. */
+ ntfs_error(sb,
+ "BUG: The first $DATA extent of $MFT is not in the base mft record.");
+ goto put_err_out;
+ } else {
+ /* Sequence numbers must match. */
+ if (MSEQNO_LE(al_entry->mft_reference) !=
+ ni->seq_no)
+ goto em_put_err_out;
+ /* Got it. All is ok. We can stop now. */
+ break;
+ }
+ }
+ }
+
+ ntfs_attr_reinit_search_ctx(ctx);
+
+ /* Now load all attribute extents. */
+ a = NULL;
+ next_vcn = last_vcn = highest_vcn = 0;
+ while (!(err = ntfs_attr_lookup(AT_DATA, NULL, 0, 0, next_vcn, NULL, 0,
+ ctx))) {
+ struct runlist_element *nrl;
+
+ /* Cache the current attribute. */
+ a = ctx->attr;
+ /* $MFT must be non-resident. */
+ if (!a->non_resident) {
+ ntfs_error(sb,
+ "$MFT must be non-resident but a resident extent was found. $MFT is corrupt. Run chkdsk.");
+ goto put_err_out;
+ }
+ /* $MFT must be uncompressed and unencrypted. */
+ if (a->flags & ATTR_COMPRESSION_MASK ||
+ a->flags & ATTR_IS_ENCRYPTED ||
+ a->flags & ATTR_IS_SPARSE) {
+ ntfs_error(sb,
+ "$MFT must be uncompressed, non-sparse, and unencrypted but a compressed/sparse/encrypted extent was found. $MFT is corrupt. Run chkdsk.");
+ goto put_err_out;
+ }
+ /*
+ * Decompress the mapping pairs array of this extent and merge
+ * the result into the existing runlist. No need for locking
+ * as we have exclusive access to the inode at this time and we
+ * are a mount in progress task, too.
+ */
+ nrl = ntfs_mapping_pairs_decompress(vol, a, &ni->runlist,
+ &new_rl_count);
+ if (IS_ERR(nrl)) {
+ ntfs_error(sb,
+ "ntfs_mapping_pairs_decompress() failed with error code %ld.",
+ PTR_ERR(nrl));
+ goto put_err_out;
+ }
+ ni->runlist.rl = nrl;
+ ni->runlist.count = new_rl_count;
+
+ /* Are we in the first extent? */
+ if (!next_vcn) {
+ if (a->data.non_resident.lowest_vcn) {
+ ntfs_error(sb,
+ "First extent of $DATA attribute has non zero lowest_vcn. $MFT is corrupt. You should run chkdsk.");
+ goto put_err_out;
+ }
+ /* Get the last vcn in the $DATA attribute. */
+ last_vcn = le64_to_cpu(a->data.non_resident.allocated_size) >>
+ vol->cluster_size_bits;
+ /* Fill in the inode size. */
+ vi->i_size = le64_to_cpu(a->data.non_resident.data_size);
+ ni->initialized_size = le64_to_cpu(a->data.non_resident.initialized_size);
+ ni->allocated_size = le64_to_cpu(a->data.non_resident.allocated_size);
+ /*
+ * Verify the number of mft records does not exceed
+ * 2^32 - 1.
+ */
+ if ((vi->i_size >> vol->mft_record_size_bits) >=
+ (1ULL << 32)) {
+ ntfs_error(sb, "$MFT is too big! Aborting.");
+ goto put_err_out;
+ }
+ /*
+ * We have got the first extent of the runlist for
+ * $MFT which means it is now relatively safe to call
+ * the normal ntfs_read_inode() function.
+ * Complete reading the inode, this will actually
+ * re-read the mft record for $MFT, this time entering
+ * it into the page cache with which we complete the
+ * kick start of the volume. It should be safe to do
+ * this now as the first extent of $MFT/$DATA is
+ * already known and we would hope that we don't need
+ * further extents in order to find the other
+ * attributes belonging to $MFT. Only time will tell if
+ * this is really the case. If not we will have to play
+ * magic at this point, possibly duplicating a lot of
+ * ntfs_read_inode() at this point. We will need to
+ * ensure we do enough of its work to be able to call
+ * ntfs_read_inode() on extents of $MFT/$DATA. But lets
+ * hope this never happens...
+ */
+ err = ntfs_read_locked_inode(vi);
+ if (err) {
+ ntfs_error(sb, "ntfs_read_inode() of $MFT failed.");
+ ntfs_attr_put_search_ctx(ctx);
+ /* Revert to the safe super operations. */
+ ntfs_free(m);
+ return -1;
+ }
+ /*
+ * Re-initialize some specifics about $MFT's inode as
+ * ntfs_read_inode() will have set up the default ones.
+ */
+ /* Set uid and gid to root. */
+ vi->i_uid = GLOBAL_ROOT_UID;
+ vi->i_gid = GLOBAL_ROOT_GID;
+ /* Regular file. No access for anyone. */
+ vi->i_mode = S_IFREG;
+ /* No VFS initiated operations allowed for $MFT. */
+ vi->i_op = &ntfs_empty_inode_ops;
+ vi->i_fop = &ntfs_empty_file_ops;
+ }
+
+ /* Get the lowest vcn for the next extent. */
+ highest_vcn = le64_to_cpu(a->data.non_resident.highest_vcn);
+ next_vcn = highest_vcn + 1;
+
+ /* Only one extent or error, which we catch below. */
+ if (next_vcn <= 0)
+ break;
+
+ /* Avoid endless loops due to corruption. */
+ if (next_vcn < le64_to_cpu(a->data.non_resident.lowest_vcn)) {
+ ntfs_error(sb, "$MFT has corrupt attribute list attribute. Run chkdsk.");
+ goto put_err_out;
+ }
+ }
+ if (err != -ENOENT) {
+ ntfs_error(sb, "Failed to lookup $MFT/$DATA attribute extent. Run chkdsk.");
+ goto put_err_out;
+ }
+ if (!a) {
+ ntfs_error(sb, "$MFT/$DATA attribute not found. $MFT is corrupt. Run chkdsk.");
+ goto put_err_out;
+ }
+ if (highest_vcn && highest_vcn != last_vcn - 1) {
+ ntfs_error(sb, "Failed to load the complete runlist for $MFT/$DATA. Run chkdsk.");
+ ntfs_debug("highest_vcn = 0x%llx, last_vcn - 1 = 0x%llx",
+ (unsigned long long)highest_vcn,
+ (unsigned long long)last_vcn - 1);
+ goto put_err_out;
+ }
+ ntfs_attr_put_search_ctx(ctx);
+ ntfs_debug("Done.");
+ ntfs_free(m);
+
+ /*
+ * Split the locking rules of the MFT inode from the
+ * locking rules of other inodes:
+ */
+ lockdep_set_class(&ni->runlist.lock, &mft_ni_runlist_lock_key);
+ lockdep_set_class(&ni->mrec_lock, &mft_ni_mrec_lock_key);
+
+ return 0;
+
+em_put_err_out:
+ ntfs_error(sb,
+ "Couldn't find first extent of $DATA attribute in attribute list. $MFT is corrupt. Run chkdsk.");
+put_err_out:
+ ntfs_attr_put_search_ctx(ctx);
+err_out:
+ ntfs_error(sb, "Failed. Marking inode as bad.");
+ ntfs_free(m);
+ return -1;
+}
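The MREF_LE()/MSEQNO_LE() sanity checks used in the attribute list walk above rely on the NTFS file-reference layout: a 64-bit value packing a 48-bit mft record number in the low bits and a 16-bit sequence number in the high bits. A host-endian sketch (the driver's _LE variants additionally perform the le64_to_cpu() conversion; the macro names below are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * An NTFS file reference packs a 48-bit mft record number (low bits)
 * and a 16-bit sequence number (high bits) into one 64-bit value.
 */
#define MK_MREF(m, s)	(((uint64_t)(s) << 48) | ((uint64_t)(m) & 0xffffffffffffULL))
#define MREF(x)		((uint64_t)((x) & 0xffffffffffffULL))
#define MSEQNO(x)	((uint16_t)(((x) >> 48) & 0xffff))
```

A sequence number of 0 in a reference conventionally means "do not check", which is why the extent-open path below only compares sequence numbers when MSEQNO_LE() is non-zero.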
+
+static void __ntfs_clear_inode(struct ntfs_inode *ni)
+{
+ /* Free all allocated memory. */
+ if (NInoNonResident(ni) && ni->runlist.rl) {
+ ntfs_free(ni->runlist.rl);
+ ni->runlist.rl = NULL;
+ }
+
+ if (ni->attr_list) {
+ ntfs_free(ni->attr_list);
+ ni->attr_list = NULL;
+ }
+
+ if (ni->name_len && ni->name != I30 &&
+ ni->name != reparse_index_name &&
+ ni->name != R) {
+ /* Catch bugs... */
+ BUG_ON(!ni->name);
+ kfree(ni->name);
+ }
+}
+
+void ntfs_clear_extent_inode(struct ntfs_inode *ni)
+{
+ ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+
+ BUG_ON(NInoAttr(ni));
+ BUG_ON(ni->nr_extents != -1);
+
+ __ntfs_clear_inode(ni);
+ ntfs_destroy_extent_inode(ni);
+}
+
+static int ntfs_delete_base_inode(struct ntfs_inode *ni)
+{
+ struct super_block *sb = ni->vol->sb;
+ int err;
+
+ if (NInoAttr(ni) || ni->nr_extents == -1)
+ return 0;
+
+ err = ntfs_non_resident_dealloc_clusters(ni);
+
+ /*
+ * Deallocate extent mft records and free extent inodes.
+ * No need to lock as no one else has a reference.
+ */
+ while (ni->nr_extents) {
+ err = ntfs_mft_record_free(ni->vol, *(ni->ext.extent_ntfs_inos));
+ if (err)
+ ntfs_error(sb,
+ "Failed to free extent MFT record. Leaving inconsistent metadata.");
+ ntfs_inode_close(*(ni->ext.extent_ntfs_inos));
+ }
+
+ /* Deallocate base mft record */
+ err = ntfs_mft_record_free(ni->vol, ni);
+ if (err)
+ ntfs_error(sb, "Failed to free base MFT record. Leaving inconsistent metadata.");
+ return err;
+}
+
+/**
+ * ntfs_evict_big_inode - clean up the ntfs specific part of an inode
+ * @vi: vfs inode pending annihilation
+ *
+ * When the VFS is going to remove an inode from memory, ntfs_evict_big_inode()
+ * is called, which deallocates all memory belonging to the NTFS specific part
+ * of the inode and returns.
+ *
+ * If the MFT record is dirty, we commit it before doing anything else.
+ */
+void ntfs_evict_big_inode(struct inode *vi)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+
+ truncate_inode_pages_final(&vi->i_data);
+
+ if (!vi->i_nlink) {
+ if (!NInoAttr(ni)) {
+ /* Never called with extent inodes */
+ BUG_ON(ni->nr_extents == -1);
+ ntfs_delete_base_inode(ni);
+ }
+ goto release;
+ }
+
+ if (NInoDirty(ni)) {
+ /* Committing the inode also commits all extent inodes. */
+ ntfs_commit_inode(vi);
+
+ if (NInoDirty(ni)) {
+ ntfs_debug("Failed to commit dirty inode 0x%lx. Losing data!",
+ vi->i_ino);
+ NInoClearAttrListDirty(ni);
+ NInoClearDirty(ni);
+ }
+ }
+
+ /* No need to lock at this stage as no one else has a reference. */
+ if (ni->nr_extents > 0) {
+ int i;
+
+ for (i = 0; i < ni->nr_extents; i++) {
+ if (ni->ext.extent_ntfs_inos[i])
+ ntfs_clear_extent_inode(ni->ext.extent_ntfs_inos[i]);
+ }
+ ni->nr_extents = 0;
+ ntfs_free(ni->ext.extent_ntfs_inos);
+ }
+
+release:
+ clear_inode(vi);
+ __ntfs_clear_inode(ni);
+
+ if (NInoAttr(ni)) {
+ /* Release the base inode if we are holding it. */
+ if (ni->nr_extents == -1) {
+ iput(VFS_I(ni->ext.base_ntfs_ino));
+ ni->nr_extents = 0;
+ ni->ext.base_ntfs_ino = NULL;
+ }
+ }
+
+ if (!atomic_dec_and_test(&ni->count))
+ BUG();
+ if (ni->folio)
+ ntfs_unmap_folio(ni->folio, NULL);
+ kfree(ni->mrec);
+ ntfs_free(ni->target);
+}
+
+/**
+ * ntfs_show_options - show mount options in /proc/mounts
+ * @sf: seq_file in which to write our mount options
+ * @root: root of the mounted tree whose mount options to display
+ *
+ * Called by the VFS once for each mounted ntfs volume when someone reads
+ * /proc/mounts in order to display the NTFS specific mount options of each
+ * mount. The mount options of the filesystem specified by @root are written to the seq file
+ * @sf and success is returned.
+ */
+int ntfs_show_options(struct seq_file *sf, struct dentry *root)
+{
+ struct ntfs_volume *vol = NTFS_SB(root->d_sb);
+ int i;
+
+ seq_printf(sf, ",uid=%i", from_kuid_munged(&init_user_ns, vol->uid));
+ seq_printf(sf, ",gid=%i", from_kgid_munged(&init_user_ns, vol->gid));
+ if (vol->fmask == vol->dmask)
+ seq_printf(sf, ",umask=0%o", vol->fmask);
+ else {
+ seq_printf(sf, ",fmask=0%o", vol->fmask);
+ seq_printf(sf, ",dmask=0%o", vol->dmask);
+ }
+ seq_printf(sf, ",nls=%s", vol->nls_map->charset);
+ if (NVolCaseSensitive(vol))
+ seq_puts(sf, ",case_sensitive");
+ if (NVolShowSystemFiles(vol))
+ seq_puts(sf, ",show_sys_files");
+ for (i = 0; on_errors_arr[i].val; i++) {
+ if (on_errors_arr[i].val == vol->on_errors)
+ seq_printf(sf, ",errors=%s", on_errors_arr[i].str);
+ }
+ seq_printf(sf, ",mft_zone_multiplier=%i", vol->mft_zone_multiplier);
+ return 0;
+}
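The umask versus fmask/dmask branch above collapses the two masks into a single umask= option only when they agree. A standalone sketch of that choice, writing into a plain buffer instead of a seq_file (function name and signature are illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Emit a single umask= option when the file and directory masks agree,
 * otherwise emit fmask= and dmask= separately, mirroring
 * ntfs_show_options(). Masks are printed in octal with a leading 0.
 */
static void show_masks(char *buf, size_t len, unsigned int fmask,
		       unsigned int dmask)
{
	if (fmask == dmask)
		snprintf(buf, len, ",umask=0%o", fmask);
	else
		snprintf(buf, len, ",fmask=0%o,dmask=0%o", fmask, dmask);
}
```

This keeps /proc/mounts output minimal in the common case where both masks were set via a single umask= mount option.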
+
+int ntfs_extend_initialized_size(struct inode *vi, const loff_t offset,
+ const loff_t new_size)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ loff_t old_init_size;
+ unsigned long flags;
+ int err;
+
+ read_lock_irqsave(&ni->size_lock, flags);
+ old_init_size = ni->initialized_size;
+ read_unlock_irqrestore(&ni->size_lock, flags);
+
+ if (!NInoNonResident(ni))
+ return -EINVAL;
+ if (old_init_size >= new_size)
+ return 0;
+
+ err = ntfs_attr_map_whole_runlist(ni);
+ if (err)
+ return err;
+
+ if (!NInoCompressed(ni) && old_init_size < offset) {
+ err = iomap_zero_range(vi, old_init_size,
+ offset - old_init_size,
+ NULL, &ntfs_read_iomap_ops,
+ &ntfs_iomap_folio_ops, NULL);
+ if (err)
+ return err;
+ }
+
+ mutex_lock(&ni->mrec_lock);
+ err = ntfs_attr_set_initialized_size(ni, new_size);
+ mutex_unlock(&ni->mrec_lock);
+ if (err)
+ truncate_setsize(vi, old_init_size);
+ return err;
+}
+
+int ntfs_truncate_vfs(struct inode *vi, loff_t new_size, loff_t i_size)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ int err;
+
+ mutex_lock(&ni->mrec_lock);
+ err = __ntfs_attr_truncate_vfs(ni, new_size, i_size);
+ mutex_unlock(&ni->mrec_lock);
+ if (err < 0)
+ return err;
+
+ inode_set_mtime_to_ts(vi, inode_set_ctime_current(vi));
+ return 0;
+}
+
+/**
+ * ntfs_inode_sync_standard_information - update standard information attribute
+ * @vi: inode to update standard information
+ * @m: mft record
+ *
+ * Return 0 on success or -errno on error.
+ */
+static int ntfs_inode_sync_standard_information(struct inode *vi, struct mft_record *m)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_attr_search_ctx *ctx;
+ struct standard_information *si;
+ __le64 nt;
+ int err = 0;
+ bool modified = false;
+
+ /* Update the access times in the standard information attribute. */
+ ctx = ntfs_attr_get_search_ctx(ni, m);
+ if (unlikely(!ctx))
+ return -ENOMEM;
+ err = ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0,
+ CASE_SENSITIVE, 0, NULL, 0, ctx);
+ if (unlikely(err)) {
+ ntfs_attr_put_search_ctx(ctx);
+ return err;
+ }
+ si = (struct standard_information *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+ si->file_attributes = ni->flags;
+
+ /* Update the creation times if they have changed. */
+ nt = utc2ntfs(ni->i_crtime);
+ if (si->creation_time != nt) {
+ ntfs_debug("Updating creation time for inode 0x%lx: old = 0x%llx, new = 0x%llx",
+ vi->i_ino, le64_to_cpu(si->creation_time),
+ le64_to_cpu(nt));
+ si->creation_time = nt;
+ modified = true;
+ }
+
+ /* Update the access times if they have changed. */
+ nt = utc2ntfs(inode_get_mtime(vi));
+ if (si->last_data_change_time != nt) {
+ ntfs_debug("Updating mtime for inode 0x%lx: old = 0x%llx, new = 0x%llx",
+ vi->i_ino, le64_to_cpu(si->last_data_change_time),
+ le64_to_cpu(nt));
+ si->last_data_change_time = nt;
+ modified = true;
+ }
+
+ nt = utc2ntfs(inode_get_ctime(vi));
+ if (si->last_mft_change_time != nt) {
+ ntfs_debug("Updating ctime for inode 0x%lx: old = 0x%llx, new = 0x%llx",
+ vi->i_ino, le64_to_cpu(si->last_mft_change_time),
+ le64_to_cpu(nt));
+ si->last_mft_change_time = nt;
+ modified = true;
+ }
+ nt = utc2ntfs(inode_get_atime(vi));
+ if (si->last_access_time != nt) {
+ ntfs_debug("Updating atime for inode 0x%lx: old = 0x%llx, new = 0x%llx",
+ vi->i_ino,
+ le64_to_cpu(si->last_access_time),
+ le64_to_cpu(nt));
+ si->last_access_time = nt;
+ modified = true;
+ }
+
+ /*
+ * If we just modified the standard information attribute we need to
+ * mark the mft record it is in dirty. We do this manually so that
+ * mark_inode_dirty() is not called which would redirty the inode and
+ * hence result in an infinite loop of trying to write the inode.
+ * There is no need to mark the base inode nor the base mft record
+ * dirty, since we are going to write this mft record below in any case
+ * and the base mft record may actually not have been modified so it
+ * might not need to be written out.
+ * NOTE: It is not a problem when the inode for $MFT itself is being
+ * written out as mark_ntfs_record_dirty() will only set I_DIRTY_PAGES
+ * on the $MFT inode and hence ntfs_write_inode() will not be
+ * re-invoked because of it which in turn is ok since the dirtied mft
+ * record will be cleaned and written out to disk below, i.e. before
+ * this function returns.
+ */
+ if (modified)
+ NInoSetDirty(ctx->ntfs_ino);
+ ntfs_attr_put_search_ctx(ctx);
+
+ return err;
+}
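The utc2ntfs() conversions above map kernel timestamps onto NTFS time, which counts 100 ns intervals since 1601-01-01 UTC, whereas Unix time counts seconds since 1970-01-01; the two epochs differ by 11644473600 seconds. A sketch of the epoch arithmetic (the in-kernel helper takes a struct timespec64 and returns a little-endian value; the name and signature here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between the NTFS epoch (1601-01-01) and the Unix epoch
 * (1970-01-01). */
#define NTFS_EPOCH_DIFF_SEC	11644473600LL

/* Convert a Unix second/nanosecond pair to an NTFS timestamp in
 * 100 ns units since 1601-01-01. */
static int64_t unix_to_ntfs_time(int64_t sec, int32_t nsec)
{
	return (sec + NTFS_EPOCH_DIFF_SEC) * 10000000LL + nsec / 100;
}
```

Because the on-disk unit is 100 ns, sub-100 ns precision of the kernel timestamp is truncated, which is why the comparisons above are done on the converted values rather than on the raw timespecs.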
+
+/**
+ * ntfs_inode_sync_filename - update FILE_NAME attributes
+ * @ni: ntfs inode to update FILE_NAME attributes
+ *
+ * Update all FILE_NAME attributes for inode @ni in the index.
+ *
+ * Return 0 on success or error.
+ */
+int ntfs_inode_sync_filename(struct ntfs_inode *ni)
+{
+ struct inode *index_vi;
+ struct super_block *sb = VFS_I(ni)->i_sb;
+ struct ntfs_attr_search_ctx *ctx = NULL;
+ struct ntfs_index_context *ictx;
+ struct ntfs_inode *index_ni;
+ struct file_name_attr *fn;
+ struct file_name_attr *fnx;
+ struct reparse_point *rpp;
+ __le32 reparse_tag;
+ int err = 0;
+ unsigned long flags;
+
+ ntfs_debug("Entering for inode %lld", (long long)ni->mft_no);
+
+ ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!ctx)
+ return -ENOMEM;
+
+ /* Collect the reparse tag, if any */
+ reparse_tag = cpu_to_le32(0);
+ if (ni->flags & FILE_ATTR_REPARSE_POINT) {
+ if (!ntfs_attr_lookup(AT_REPARSE_POINT, NULL,
+ 0, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+ rpp = (struct reparse_point *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+ reparse_tag = rpp->reparse_tag;
+ }
+ ntfs_attr_reinit_search_ctx(ctx);
+ }
+
+ /* Walk through all FILE_NAME attributes and update them. */
+ while (!(err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0, ctx))) {
+ fn = (struct file_name_attr *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+ if (MREF_LE(fn->parent_directory) == ni->mft_no)
+ continue;
+
+ index_vi = ntfs_iget(sb, MREF_LE(fn->parent_directory));
+ if (IS_ERR(index_vi)) {
+ ntfs_error(sb, "Failed to open inode %lld with index",
+ (long long)MREF_LE(fn->parent_directory));
+ continue;
+ }
+
+ index_ni = NTFS_I(index_vi);
+
+ mutex_lock_nested(&index_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+ if (NInoBeingDeleted(ni)) {
+ iput(index_vi);
+ mutex_unlock(&index_ni->mrec_lock);
+ continue;
+ }
+
+ ictx = ntfs_index_ctx_get(index_ni, I30, 4);
+ if (!ictx) {
+ ntfs_error(sb, "Failed to get index ctx, inode %lld",
+ (long long)index_ni->mft_no);
+ iput(index_vi);
+ mutex_unlock(&index_ni->mrec_lock);
+ continue;
+ }
+
+ err = ntfs_index_lookup(fn, sizeof(struct file_name_attr), ictx);
+ if (err) {
+ ntfs_debug("Index lookup failed, inode %lld",
+ (long long)index_ni->mft_no);
+ ntfs_index_ctx_put(ictx);
+ iput(index_vi);
+ mutex_unlock(&index_ni->mrec_lock);
+ continue;
+ }
+ /* Update flags and file size. */
+ fnx = (struct file_name_attr *)ictx->data;
+ fnx->file_attributes =
+ (fnx->file_attributes & ~FILE_ATTR_VALID_FLAGS) |
+ (ni->flags & FILE_ATTR_VALID_FLAGS);
+ if (ctx->mrec->flags & MFT_RECORD_IS_DIRECTORY)
+ fnx->data_size = fnx->allocated_size = 0;
+ else {
+ read_lock_irqsave(&ni->size_lock, flags);
+ if (NInoSparse(ni) || NInoCompressed(ni))
+ fnx->allocated_size = cpu_to_le64(ni->itype.compressed.size);
+ else
+ fnx->allocated_size = cpu_to_le64(ni->allocated_size);
+ fnx->data_size = cpu_to_le64(ni->data_size);
+
+ /*
+ * The file name record has also to be fixed if some
+ * attribute update implied the unnamed data to be
+ * made non-resident
+ */
+ fn->allocated_size = fnx->allocated_size;
+ fn->data_size = fnx->data_size;
+ read_unlock_irqrestore(&ni->size_lock, flags);
+ }
+
+ /* update or clear the reparse tag in the index */
+ fnx->type.rp.reparse_point_tag = reparse_tag;
+ fnx->creation_time = fn->creation_time;
+ fnx->last_data_change_time = fn->last_data_change_time;
+ fnx->last_mft_change_time = fn->last_mft_change_time;
+ fnx->last_access_time = fn->last_access_time;
+ ntfs_index_entry_mark_dirty(ictx);
+ ntfs_icx_ib_sync_write(ictx);
+ NInoSetDirty(ctx->ntfs_ino);
+ ntfs_index_ctx_put(ictx);
+ mutex_unlock(&index_ni->mrec_lock);
+ iput(index_vi);
+ }
+ /* Check for real error occurred. */
+ if (err != -ENOENT) {
+ ntfs_error(sb, "Attribute lookup failed, err : %d, inode %lld", err,
+ (long long)ni->mft_no);
+ } else
+ err = 0;
+
+ ntfs_attr_put_search_ctx(ctx);
+ return err;
+}
+
+/**
+ * __ntfs_write_inode - write out a dirty inode
+ * @vi: inode to write out
+ * @sync: if true, write out synchronously
+ *
+ * Write out a dirty inode to disk including any extent inodes if present.
+ *
+ * If @sync is true, commit the inode to disk and wait for io completion. This
+ * is done using write_mft_record().
+ *
+ * If @sync is false, just schedule the write to happen but do not wait for i/o
+ * completion.
+ *
+ * Return 0 on success and -errno on error.
+ */
+int __ntfs_write_inode(struct inode *vi, int sync)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct mft_record *m;
+ int err = 0;
+ bool need_iput = false;
+
+ ntfs_debug("Entering for %sinode 0x%lx.", NInoAttr(ni) ? "attr " : "",
+ vi->i_ino);
+
+ if (NVolShutdown(ni->vol))
+ return -EIO;
+
+ /*
+ * Dirty attribute inodes are written via their real inodes so just
+ * clean them here. Access time updates are taken care of when the
+ * real inode is written.
+ */
+ if (NInoAttr(ni) || ni->nr_extents == -1) {
+ NInoClearDirty(ni);
+ ntfs_debug("Done.");
+ return 0;
+ }
+
+ /* igrab prevents vi from being evicted while mrec_lock is held. */
+ if (igrab(vi) != NULL)
+ need_iput = true;
+
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ /* Map, pin, and lock the mft record belonging to the inode. */
+ m = map_mft_record(ni);
+ if (IS_ERR(m)) {
+ mutex_unlock(&ni->mrec_lock);
+ err = PTR_ERR(m);
+ goto err_out;
+ }
+
+ if (NInoNonResident(ni) && NInoRunlistDirty(ni)) {
+ BUG_ON(!NInoFullyMapped(ni));
+ down_write(&ni->runlist.lock);
+ err = ntfs_attr_update_mapping_pairs(ni, 0);
+ if (!err)
+ NInoClearRunlistDirty(ni);
+ up_write(&ni->runlist.lock);
+ }
+
+ err = ntfs_inode_sync_standard_information(vi, m);
+ if (err)
+ goto unm_err_out;
+
+ /*
+ * when being umounted and inodes are evicted, write_inode()
+ * is called with all inodes being marked with I_FREEING.
+ * then ntfs_inode_sync_filename() waits infinitly because
+ * of ntfs_iget. This situation happens only where sync_filesysem()
+ * from umount fails because of a disk unplug and etc.
+ * the absent of SB_ACTIVE means umounting.
+ */
+ if ((vi->i_sb->s_flags & SB_ACTIVE) && NInoTestClearFileNameDirty(ni))
+ ntfs_inode_sync_filename(ni);
+
+ /* Now the access times are updated, write the base mft record. */
+ if (NInoDirty(ni)) {
+ err = write_mft_record(ni, m, sync);
+ if (err)
+ ntfs_error(vi->i_sb, "write_mft_record failed, err = %d", err);
+ }
+ unmap_mft_record(ni);
+
+ /* Write all attached extent mft records. */
+ mutex_lock(&ni->extent_lock);
+ if (ni->nr_extents > 0) {
+ struct ntfs_inode **extent_nis = ni->ext.extent_ntfs_inos;
+ int i;
+
+ ntfs_debug("Writing %i extent inodes.", ni->nr_extents);
+ for (i = 0; i < ni->nr_extents; i++) {
+ struct ntfs_inode *tni = extent_nis[i];
+
+ if (NInoDirty(tni)) {
+ struct mft_record *tm;
+ int ret;
+
+ mutex_lock(&tni->mrec_lock);
+ tm = map_mft_record(tni);
+ if (IS_ERR(tm)) {
+ mutex_unlock(&tni->mrec_lock);
+ if (!err || err == -ENOMEM)
+ err = PTR_ERR(tm);
+ continue;
+ }
+
+ ret = write_mft_record(tni, tm, sync);
+ unmap_mft_record(tni);
+ mutex_unlock(&tni->mrec_lock);
+
+ if (unlikely(ret)) {
+ if (!err || err == -ENOMEM)
+ err = ret;
+ }
+ }
+ }
+ }
+ mutex_unlock(&ni->extent_lock);
+ mutex_unlock(&ni->mrec_lock);
+
+ if (unlikely(err))
+ goto err_out;
+ if (need_iput)
+ iput(vi);
+ ntfs_debug("Done.");
+ return 0;
+unm_err_out:
+ unmap_mft_record(ni);
+ mutex_unlock(&ni->mrec_lock);
+err_out:
+ if (err == -ENOMEM)
+ mark_inode_dirty(vi);
+ else {
+ ntfs_error(vi->i_sb, "Failed (error %i): Run chkdsk.", -err);
+ NVolSetErrors(ni->vol);
+ }
+ if (need_iput)
+ iput(vi);
+ return err;
+}
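The error bookkeeping in the extent write loop above keeps the first error seen, except that -ENOMEM may be displaced by a later, more specific error. That policy can be factored as a small helper; this is a sketch of the existing logic for illustration, not a proposed refactoring:

```c
#include <assert.h>
#include <errno.h>

/*
 * Combine a previously recorded error with a new one: the first error
 * wins, except that -ENOMEM may be replaced by a later, more specific
 * error, mirroring __ntfs_write_inode()'s extent write loop.
 */
static int merge_write_err(int err, int ret)
{
	if (ret && (!err || err == -ENOMEM))
		return ret;
	return err;
}
```

The rationale is that -ENOMEM is transient (the write is retried via mark_inode_dirty()), so a concrete I/O error from a later extent is more useful to report.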
+
+/**
+ * ntfs_extent_inode_open - load an extent inode and attach it to its base
+ * @base_ni: base ntfs inode
+ * @mref: mft reference of the extent inode to load (in little endian)
+ *
+ * First check if the extent inode @mref is already attached to the base ntfs
+ * inode @base_ni, and if so, return a pointer to the attached extent inode.
+ *
+ * If the extent inode is not already attached to the base inode, allocate an
+ * ntfs_inode structure and initialize it for the given inode @mref. @mref
+ * specifies the inode number / mft record to read, including the sequence
+ * number, which can be 0 if no sequence number checking is to be performed.
+ *
+ * Then, allocate a buffer for the mft record, read the mft record from the
+ * volume @base_ni->vol, and attach it to the ntfs_inode structure (->mrec).
+ * The mft record is mst deprotected and sanity checked for validity and we
+ * abort if deprotection or checks fail.
+ *
+ * Finally, attach the ntfs inode to its base inode @base_ni and return a
+ * pointer to the ntfs_inode structure on success or NULL on error.
+ *
+ * Note, extent inodes are never closed directly. They are automatically
+ * disposed of by the closing of the base inode.
+ */
+static struct ntfs_inode *ntfs_extent_inode_open(struct ntfs_inode *base_ni,
+ const __le64 mref)
+{
+ u64 mft_no = MREF_LE(mref);
+ struct ntfs_inode *ni = NULL;
+ struct ntfs_inode **extent_nis;
+ int i;
+ struct mft_record *ni_mrec;
+ struct super_block *sb;
+
+ if (!base_ni)
+ return NULL;
+
+ sb = base_ni->vol->sb;
+ ntfs_debug("Opening extent inode %lld (base mft record %lld).\n",
+ (unsigned long long)mft_no,
+ (unsigned long long)base_ni->mft_no);
+
+ /* Is the extent inode already open and attached to the base inode? */
+ if (base_ni->nr_extents > 0) {
+ extent_nis = base_ni->ext.extent_ntfs_inos;
+ for (i = 0; i < base_ni->nr_extents; i++) {
+ u16 seq_no;
+
+ ni = extent_nis[i];
+ if (mft_no != ni->mft_no)
+ continue;
+ ni_mrec = map_mft_record(ni);
+ if (IS_ERR(ni_mrec)) {
+ ntfs_error(sb, "failed to map mft record for %lu",
+ ni->mft_no);
+ goto out;
+ }
+ /* Verify the sequence number if given. */
+ seq_no = MSEQNO_LE(mref);
+ if (seq_no &&
+ seq_no != le16_to_cpu(ni_mrec->sequence_number)) {
+ ntfs_error(sb, "Found stale extent mft reference mft=%lld",
+ (long long)ni->mft_no);
+ unmap_mft_record(ni);
+ goto out;
+ }
+ unmap_mft_record(ni);
+ goto out;
+ }
+ }
+ /* Wasn't there, we need to load the extent inode. */
+ ni = ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
+ if (!ni)
+ goto out;
+
+ ni->seq_no = (u16)MSEQNO_LE(mref);
+ ni->nr_extents = -1;
+ ni->ext.base_ntfs_ino = base_ni;
+ /* Attach extent inode to base inode, reallocating memory if needed. */
+ if (!(base_ni->nr_extents & 3)) {
+ i = (base_ni->nr_extents + 4) * sizeof(struct ntfs_inode *);
+
+ extent_nis = ntfs_malloc_nofs(i);
+ if (!extent_nis)
+ goto err_out;
+ if (base_ni->nr_extents) {
+ memcpy(extent_nis, base_ni->ext.extent_ntfs_inos,
+ i - 4 * sizeof(struct ntfs_inode *));
+ ntfs_free(base_ni->ext.extent_ntfs_inos);
+ }
+ base_ni->ext.extent_ntfs_inos = extent_nis;
+ }
+ base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
+
+out:
+ ntfs_debug("\n");
+ return ni;
+err_out:
+ ntfs_destroy_ext_inode(ni);
+ ni = NULL;
+ goto out;
+}
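ntfs_extent_inode_open() above splits a 64-bit MFT reference into a record number (MREF_LE) and a sequence number (MSEQNO_LE). On disk, NTFS packs the mft record number into the low 48 bits of the reference and the sequence number into the high 16 bits. A minimal standalone sketch of that packing (the helper names below are illustrative, not the driver's macros):

```c
#include <assert.h>
#include <stdint.h>

/* Low 48 bits: mft record number. */
static inline uint64_t mref_mft_no(uint64_t mref)
{
	return mref & 0xFFFFFFFFFFFFULL;
}

/* High 16 bits: sequence number, bumped each time the record is reused. */
static inline uint16_t mref_seq_no(uint64_t mref)
{
	return (uint16_t)(mref >> 48);
}

static inline uint64_t make_mref(uint64_t mft_no, uint16_t seq_no)
{
	return ((uint64_t)seq_no << 48) | (mft_no & 0xFFFFFFFFFFFFULL);
}
```

A sequence number of 0 in a reference means "skip the staleness check", which is why the open path only compares sequence numbers when seq_no is non-zero.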
+
+/**
+ * ntfs_inode_attach_all_extents - attach all extent inodes of an inode
+ * @ni: opened ntfs inode whose extent inodes to attach
+ *
+ * Return 0 on success or a negative error code on failure.
+ */
+int ntfs_inode_attach_all_extents(struct ntfs_inode *ni)
+{
+ struct attr_list_entry *ale;
+ u64 prev_attached = 0;
+
+ if (!ni) {
+ ntfs_debug("Invalid arguments.\n");
+ return -EINVAL;
+ }
+
+ if (NInoAttr(ni))
+ ni = ni->ext.base_ntfs_ino;
+
+ ntfs_debug("Entering for inode 0x%llx.\n", (long long) ni->mft_no);
+
+ /* The inode has no attribute list, thus there is nothing to attach. */
+ if (!NInoAttrList(ni))
+ return 0;
+
+ if (!ni->attr_list) {
+ ntfs_debug("Corrupt in-memory struct.\n");
+ return -EINVAL;
+ }
+
+ /* Walk through attribute list and attach all extents. */
+ ale = (struct attr_list_entry *)ni->attr_list;
+ while ((u8 *)ale < ni->attr_list + ni->attr_list_size) {
+ if (ni->mft_no != MREF_LE(ale->mft_reference) &&
+ prev_attached != MREF_LE(ale->mft_reference)) {
+ if (!ntfs_extent_inode_open(ni, ale->mft_reference)) {
+ ntfs_debug("Couldn't attach extent inode.\n");
+ return -EIO;
+ }
+ prev_attached = MREF_LE(ale->mft_reference);
+ }
+ ale = (struct attr_list_entry *)((u8 *)ale + le16_to_cpu(ale->length));
+ }
+ return 0;
+}
+
+/**
+ * ntfs_inode_add_attrlist - add an attribute list to an inode and fill it
+ * @ni: opened ntfs inode to which to add the attribute list
+ *
+ * Return 0 on success or a negative error code on failure.
+ */
+int ntfs_inode_add_attrlist(struct ntfs_inode *ni)
+{
+ int err;
+ struct ntfs_attr_search_ctx *ctx;
+ u8 *al = NULL, *aln;
+ int al_len = 0;
+ struct attr_list_entry *ale = NULL;
+ struct mft_record *ni_mrec;
+ u32 attr_al_len;
+
+ if (!ni)
+ return -EINVAL;
+
+ ntfs_debug("inode %llu\n", (unsigned long long) ni->mft_no);
+
+ if (NInoAttrList(ni) || ni->nr_extents) {
+ ntfs_error(ni->vol->sb, "Inode already has attribute list");
+ return -EEXIST;
+ }
+
+ ni_mrec = map_mft_record(ni);
+ if (IS_ERR(ni_mrec))
+ return -EIO;
+
+ /* Form attribute list. */
+ ctx = ntfs_attr_get_search_ctx(ni, ni_mrec);
+ if (!ctx) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ /* Walk through all attributes. */
+ while (!(err = ntfs_attr_lookup(AT_UNUSED, NULL, 0, 0, 0, NULL, 0, ctx))) {
+ int ale_size;
+
+ if (ctx->attr->type == AT_ATTRIBUTE_LIST) {
+ err = -EIO;
+ ntfs_error(ni->vol->sb, "Attribute list already present");
+ goto put_err_out;
+ }
+
+ ale_size = (sizeof(struct attr_list_entry) + sizeof(__le16) *
+ ctx->attr->name_length + 7) & ~7;
+ al_len += ale_size;
+
+ aln = ntfs_realloc_nofs(al, al_len, al_len - ale_size);
+ if (!aln) {
+ err = -ENOMEM;
+ ntfs_error(ni->vol->sb, "Failed to realloc %d bytes", al_len);
+ goto put_err_out;
+ }
+ ale = (struct attr_list_entry *)(aln + ((u8 *)ale - al));
+ al = aln;
+
+ memset(ale, 0, ale_size);
+
+ /* Add attribute to attribute list. */
+ ale->type = ctx->attr->type;
+ ale->length = cpu_to_le16((sizeof(struct attr_list_entry) +
+ sizeof(__le16) * ctx->attr->name_length + 7) & ~7);
+ ale->name_length = ctx->attr->name_length;
+ ale->name_offset = (u8 *)ale->name - (u8 *)ale;
+ if (ctx->attr->non_resident)
+ ale->lowest_vcn =
+ ctx->attr->data.non_resident.lowest_vcn;
+ else
+ ale->lowest_vcn = 0;
+ ale->mft_reference = MK_LE_MREF(ni->mft_no,
+ le16_to_cpu(ni_mrec->sequence_number));
+ ale->instance = ctx->attr->instance;
+ memcpy(ale->name, (u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->name_offset),
+ ctx->attr->name_length * sizeof(__le16));
+ ale = (struct attr_list_entry *)(al + al_len);
+ }
+
+ /* Check whether a real error occurred. */
+ if (err != -ENOENT) {
+ ntfs_error(ni->vol->sb, "%s: Attribute lookup failed, inode %lld",
+ __func__, (long long)ni->mft_no);
+ goto put_err_out;
+ }
+
+ /* Set in-memory attribute list. */
+ ni->attr_list = al;
+ ni->attr_list_size = al_len;
+ NInoSetAttrList(ni);
+
+ attr_al_len = offsetof(struct attr_record, data.resident.reserved) + 1 +
+ ((al_len + 7) & ~7);
+ /* Free up space if there is not enough of it for $ATTRIBUTE_LIST. */
+ if (le32_to_cpu(ni_mrec->bytes_allocated) -
+ le32_to_cpu(ni_mrec->bytes_in_use) < attr_al_len) {
+ if (ntfs_inode_free_space(ni, (int)attr_al_len)) {
+ /* Failed to free space. */
+ err = -ENOSPC;
+ ntfs_error(ni->vol->sb, "Failed to free space for attrlist");
+ goto rollback;
+ }
+ }
+
+ /* Add $ATTRIBUTE_LIST to mft record. */
+ err = ntfs_resident_attr_record_add(ni, AT_ATTRIBUTE_LIST, AT_UNNAMED, 0,
+ NULL, al_len, 0);
+ if (err < 0) {
+ ntfs_error(ni->vol->sb, "Couldn't add $ATTRIBUTE_LIST to MFT");
+ goto rollback;
+ }
+
+ err = ntfs_attrlist_update(ni);
+ if (err < 0)
+ goto remove_attrlist_record;
+
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(ni);
+ return 0;
+
+remove_attrlist_record:
+ /* Prevent ntfs_attr_record_rm from freeing the attribute list. */
+ ni->attr_list = NULL;
+ NInoClearAttrList(ni);
+ /* Remove $ATTRIBUTE_LIST record. */
+ ntfs_attr_reinit_search_ctx(ctx);
+ if (!ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0,
+ CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+ if (ntfs_attr_record_rm(ctx))
+ ntfs_error(ni->vol->sb, "Rollback failed to remove attrlist");
+ } else {
+ ntfs_error(ni->vol->sb, "Rollback failed to find attrlist");
+ }
+
+ /* Restore the in-memory attribute list. */
+ ni->attr_list = al;
+ ni->attr_list_size = al_len;
+ NInoSetAttrList(ni);
+rollback:
+ /*
+ * Scan the attribute list for attributes that are not placed in the
+ * base MFT record and move them back to it.
+ */
+ ntfs_attr_reinit_search_ctx(ctx);
+ ale = (struct attr_list_entry *)al;
+ while ((u8 *)ale < al + al_len) {
+ if (MREF_LE(ale->mft_reference) != ni->mft_no) {
+ if (!ntfs_attr_lookup(ale->type, ale->name,
+ ale->name_length,
+ CASE_SENSITIVE,
+ le64_to_cpu(ale->lowest_vcn),
+ NULL, 0, ctx)) {
+ if (ntfs_attr_record_move_to(ctx, ni))
+ ntfs_error(ni->vol->sb,
+ "Rollback failed to move attribute");
+ } else {
+ ntfs_error(ni->vol->sb, "Rollback failed to find attr");
+ }
+ ntfs_attr_reinit_search_ctx(ctx);
+ }
+ ale = (struct attr_list_entry *)((u8 *)ale + le16_to_cpu(ale->length));
+ }
+
+ /* Remove in-memory attribute list. */
+ ni->attr_list = NULL;
+ ni->attr_list_size = 0;
+ NInoClearAttrList(ni);
+ NInoClearAttrListDirty(ni);
+put_err_out:
+ ntfs_attr_put_search_ctx(ctx);
+err_out:
+ ntfs_free(al);
+ unmap_mft_record(ni);
+ return err;
+}
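Each attribute list entry built above is sized as the fixed entry header plus the UTF-16 name, rounded up to an 8-byte boundary via (x + 7) & ~7. A standalone sketch of that sizing, assuming a 26-byte on-disk entry header (in the driver the real value is sizeof(struct attr_list_entry)):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed on-disk header size for this sketch; the driver uses
 * sizeof(struct attr_list_entry). */
#define ALE_HEADER_SIZE 26

/* Entry length = header + UTF-16 name (2 bytes per code unit),
 * rounded up to the next 8-byte boundary. */
static size_t ale_size(size_t name_length)
{
	return (ALE_HEADER_SIZE + 2 * name_length + 7) & ~(size_t)7;
}
```

The rounding keeps every following entry 8-byte aligned, which is why the same expression appears both when accumulating al_len and when filling ale->length.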
+
+/**
+ * ntfs_inode_close - close an ntfs inode and free all associated memory
+ * @ni: ntfs inode to close
+ *
+ * Make sure the ntfs inode @ni is clean.
+ *
+ * If the ntfs inode @ni is a base inode, close all associated extent inodes,
+ * then deallocate all memory attached to it, and finally free the ntfs inode
+ * structure itself.
+ *
+ * If it is an extent inode, we disconnect it from its base inode before we
+ * destroy it.
+ *
+ * It is OK to pass NULL to this function; it is just a no-op in this case.
+ *
+ * Return 0 on success or a negative error code on failure.
+ */
+int ntfs_inode_close(struct ntfs_inode *ni)
+{
+ int err = -1;
+ struct ntfs_inode **tmp_nis;
+ struct ntfs_inode *base_ni;
+ s32 i;
+
+ if (!ni)
+ return 0;
+
+ ntfs_debug("Entering for inode %lld\n", (long long)ni->mft_no);
+
+ /*
+ * If the inode is an extent inode, disconnect it from the
+ * base inode before destroying it.
+ */
+ base_ni = ni->ext.base_ntfs_ino;
+ for (i = 0; i < base_ni->nr_extents; ++i) {
+ tmp_nis = base_ni->ext.extent_ntfs_inos;
+ if (tmp_nis[i] != ni)
+ continue;
+ /* Found it. Disconnect. */
+ memmove(tmp_nis + i, tmp_nis + i + 1,
+ (base_ni->nr_extents - i - 1) *
+ sizeof(struct ntfs_inode *));
+ /* The buffer is sized in multiples of four extents. */
+ if ((--base_ni->nr_extents) & 3) {
+ i = -1;
+ break;
+ }
+ /*
+ * ElectricFence is unhappy with realloc(x, 0) acting as free(x),
+ * thus we explicitly separate these two cases.
+ */
+ if (base_ni->nr_extents) {
+ /* Resize the memory buffer. */
+ tmp_nis = ntfs_realloc_nofs(tmp_nis, base_ni->nr_extents *
+ sizeof(struct ntfs_inode *), base_ni->nr_extents *
+ sizeof(struct ntfs_inode *));
+ /* Ignore errors, they don't really matter. */
+ if (tmp_nis)
+ base_ni->ext.extent_ntfs_inos = tmp_nis;
+ } else if (tmp_nis) {
+ ntfs_free(tmp_nis);
+ base_ni->ext.extent_ntfs_inos = NULL;
+ }
+ /* Allow for error checking. */
+ i = -1;
+ break;
+ }
+
+ if (NInoDirty(ni))
+ ntfs_error(ni->vol->sb, "Releasing dirty inode %lld!\n",
+ (long long)ni->mft_no);
+ if (NInoAttrList(ni) && ni->attr_list)
+ ntfs_free(ni->attr_list);
+ ntfs_destroy_ext_inode(ni);
+ err = 0;
+ ntfs_debug("\n");
+ return err;
+}
+
+void ntfs_destroy_ext_inode(struct ntfs_inode *ni)
+{
+ ntfs_debug("Entering.");
+ if (ni == NULL)
+ return;
+
+ ntfs_attr_close(ni);
+
+ if (NInoDirty(ni))
+ ntfs_error(ni->vol->sb, "Releasing dirty ext inode %lld!\n",
+ (long long)ni->mft_no);
+ if (NInoAttrList(ni) && ni->attr_list)
+ ntfs_free(ni->attr_list);
+ kfree(ni->mrec);
+ kmem_cache_free(ntfs_inode_cache, ni);
+}
+
+static struct ntfs_inode *ntfs_inode_base(struct ntfs_inode *ni)
+{
+ if (ni->nr_extents == -1)
+ return ni->ext.base_ntfs_ino;
+ return ni;
+}
+
+static int ntfs_attr_position(__le32 type, struct ntfs_attr_search_ctx *ctx)
+{
+ int err;
+
+ err = ntfs_attr_lookup(type, NULL, 0, CASE_SENSITIVE, 0, NULL,
+ 0, ctx);
+ if (err) {
+ __le32 atype;
+
+ if (err != -ENOENT)
+ return err;
+
+ atype = ctx->attr->type;
+ if (atype == AT_END)
+ return -ENOSPC;
+
+ /*
+ * If ntfs_external_attr_lookup returns -ENOENT, ctx->al_entry
+ * could point to an attribute in an extent mft record, but
+ * ctx->attr and ctx->ntfs_ino always point to an attribute in
+ * a base mft record.
+ */
+ if (ctx->al_entry &&
+ MREF_LE(ctx->al_entry->mft_reference) != ctx->ntfs_ino->mft_no) {
+ ntfs_attr_reinit_search_ctx(ctx);
+ err = ntfs_attr_lookup(atype, NULL, 0, CASE_SENSITIVE, 0, NULL,
+ 0, ctx);
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
+/**
+ * ntfs_inode_free_space - free space in the MFT record of an inode
+ * @ni: ntfs inode in whose MFT record to free space
+ * @size: amount of space that needs to be freed
+ *
+ * Return 0 on success or a negative error code on failure.
+ */
+int ntfs_inode_free_space(struct ntfs_inode *ni, int size)
+{
+ struct ntfs_attr_search_ctx *ctx;
+ int freed, err;
+ struct mft_record *ni_mrec;
+ struct super_block *sb;
+
+ if (!ni || size < 0)
+ return -EINVAL;
+ ntfs_debug("Entering for inode %llu, size %d\n",
+ (unsigned long long)ni->mft_no, size);
+
+ sb = ni->vol->sb;
+ ni_mrec = map_mft_record(ni);
+ if (IS_ERR(ni_mrec))
+ return -EIO;
+
+ freed = (le32_to_cpu(ni_mrec->bytes_allocated) -
+ le32_to_cpu(ni_mrec->bytes_in_use));
+
+ unmap_mft_record(ni);
+
+ if (size <= freed)
+ return 0;
+
+ ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!ctx) {
+ ntfs_error(sb, "%s, Failed to get search context", __func__);
+ return -ENOMEM;
+ }
+
+ /*
+ * Chkdsk complains if $STANDARD_INFORMATION is not in the base MFT
+ * record.
+ *
+ * Also, we can't move $ATTRIBUTE_LIST out of the base MFT record, so
+ * position the search context on the first attribute after
+ * $STANDARD_INFORMATION and $ATTRIBUTE_LIST.
+ *
+ * Why do we reposition instead of simply skipping these attributes
+ * during enumeration? Because if we only have an in-memory attribute
+ * list, ntfs_attr_lookup will fail when it tries to find
+ * $ATTRIBUTE_LIST.
+ */
+ err = ntfs_attr_position(AT_FILE_NAME, ctx);
+ if (err)
+ goto put_err_out;
+
+ while (1) {
+ int record_size;
+
+ /*
+ * Check whether attribute is from different MFT record. If so,
+ * find next, because we don't need such.
+ */
+ while (ctx->ntfs_ino->mft_no != ni->mft_no) {
+retry:
+ err = ntfs_attr_lookup(AT_UNUSED, NULL, 0, CASE_SENSITIVE,
+ 0, NULL, 0, ctx);
+ if (err) {
+ if (err != -ENOENT)
+ ntfs_error(sb, "Attr lookup failed #2");
+ else
+ err = -ENOSPC;
+ goto put_err_out;
+ }
+ }
+
+ if (ntfs_inode_base(ctx->ntfs_ino)->mft_no == FILE_MFT &&
+ ctx->attr->type == AT_DATA)
+ goto retry;
+
+ if (ctx->attr->type == AT_INDEX_ROOT)
+ goto retry;
+
+ record_size = le32_to_cpu(ctx->attr->length);
+
+ /* Move away attribute. */
+ err = ntfs_attr_record_move_away(ctx, 0);
+ if (err) {
+ ntfs_error(sb, "Failed to move out attribute #2");
+ break;
+ }
+ freed += record_size;
+
+ /* Check whether we are done. */
+ if (size <= freed) {
+ ntfs_attr_put_search_ctx(ctx);
+ return 0;
+ }
+
+ /*
+ * Reposition to first attribute after $STANDARD_INFORMATION and
+ * $ATTRIBUTE_LIST (see comments upwards).
+ */
+ ntfs_attr_reinit_search_ctx(ctx);
+ err = ntfs_attr_position(AT_FILE_NAME, ctx);
+ if (err)
+ break;
+ }
+put_err_out:
+ ntfs_attr_put_search_ctx(ctx);
+ if (err == -ENOSPC)
+ ntfs_debug("No attributes left that can be moved out.\n");
+ return err;
+}
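The freed-space accounting above starts from bytes_allocated - bytes_in_use and keeps moving attribute records out of the base MFT record until the requested size fits. A small standalone sketch of that arithmetic (helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Free space currently available inside one MFT record. */
static uint32_t mft_free_space(uint32_t bytes_allocated, uint32_t bytes_in_use)
{
	return bytes_allocated - bytes_in_use;
}

/* How many more bytes must still be freed before @need bytes fit. */
static uint32_t still_to_free(uint32_t bytes_allocated, uint32_t bytes_in_use,
			      uint32_t need)
{
	uint32_t free_now = mft_free_space(bytes_allocated, bytes_in_use);

	return need > free_now ? need - free_now : 0;
}
```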
+
+s64 ntfs_inode_attr_pread(struct inode *vi, s64 pos, s64 count, u8 *buf)
+{
+ struct address_space *mapping = vi->i_mapping;
+ struct folio *folio;
+ struct ntfs_inode *ni = NTFS_I(vi);
+ s64 isize;
+ u32 attr_len, total = 0, offset, index;
+ int err = 0;
+
+ BUG_ON(!ni);
+ BUG_ON(!NInoAttr(ni));
+ if (!count)
+ return 0;
+
+ mutex_lock(&ni->mrec_lock);
+ isize = i_size_read(vi);
+ if (pos > isize) {
+ mutex_unlock(&ni->mrec_lock);
+ return -EINVAL;
+ }
+ if (pos + count > isize)
+ count = isize - pos;
+
+ if (!NInoNonResident(ni)) {
+ struct ntfs_attr_search_ctx *ctx;
+ u8 *attr;
+
+ ctx = ntfs_attr_get_search_ctx(ni->ext.base_ntfs_ino, NULL);
+ if (!ctx) {
+ ntfs_error(vi->i_sb, "Failed to get attr search ctx");
+ err = -ENOMEM;
+ mutex_unlock(&ni->mrec_lock);
+ goto out;
+ }
+
+ err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+ 0, NULL, 0, ctx);
+ if (err) {
+ ntfs_error(vi->i_sb, "Failed to look up attr %#x", ni->type);
+ ntfs_attr_put_search_ctx(ctx);
+ mutex_unlock(&ni->mrec_lock);
+ goto out;
+ }
+
+ attr = (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_offset);
+ memcpy(buf, (u8 *)attr + pos, count);
+ ntfs_attr_put_search_ctx(ctx);
+ mutex_unlock(&ni->mrec_lock);
+ return count;
+ }
+ mutex_unlock(&ni->mrec_lock);
+
+ index = (u32)(pos / PAGE_SIZE);
+ do {
+ /* Update @index and get the next folio. */
+ folio = ntfs_read_mapping_folio(mapping, index);
+ if (IS_ERR(folio))
+ break;
+
+ offset = offset_in_folio(folio, pos);
+ attr_len = min_t(size_t, (size_t)count, folio_size(folio) - offset);
+
+ folio_lock(folio);
+ memcpy_from_folio(buf, folio, offset, attr_len);
+ folio_unlock(folio);
+ folio_put(folio);
+
+ total += attr_len;
+ buf += attr_len;
+ pos += attr_len;
+ count -= attr_len;
+ index++;
+ } while (count);
+out:
+ return err ? (s64)err : total;
+}
+
+static inline int ntfs_enlarge_attribute(struct inode *vi, s64 pos, s64 count,
+ struct ntfs_attr_search_ctx *ctx)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct super_block *sb = vi->i_sb;
+ int ret;
+
+ if (pos + count <= ni->initialized_size)
+ return 0;
+
+ if (NInoEncrypted(ni) && NInoNonResident(ni))
+ return -EACCES;
+
+ if (NInoCompressed(ni))
+ return -EOPNOTSUPP;
+
+ if (pos + count > ni->data_size) {
+ if (ntfs_attr_truncate(ni, pos + count)) {
+ ntfs_debug("Failed to truncate attribute");
+ return -EIO;
+ }
+
+ ntfs_attr_reinit_search_ctx(ctx);
+ ret = ntfs_attr_lookup(ni->type,
+ ni->name, ni->name_len, CASE_SENSITIVE,
+ 0, NULL, 0, ctx);
+ if (ret) {
+ ntfs_error(sb, "Failed to look up attr %#x", ni->type);
+ return ret;
+ }
+ }
+
+ if (!NInoNonResident(ni)) {
+ if (likely(i_size_read(vi) < ni->data_size))
+ i_size_write(vi, ni->data_size);
+ return 0;
+ }
+
+ if (pos + count > ni->initialized_size) {
+ ctx->attr->data.non_resident.initialized_size = cpu_to_le64(pos + count);
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ni->initialized_size = pos + count;
+ if (i_size_read(vi) < ni->initialized_size)
+ i_size_write(vi, ni->initialized_size);
+ }
+ return 0;
+}
+
+static s64 __ntfs_inode_resident_attr_pwrite(struct inode *vi,
+ s64 pos, s64 count, u8 *buf,
+ struct ntfs_attr_search_ctx *ctx)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct folio *folio;
+ struct address_space *mapping = vi->i_mapping;
+ u8 *addr;
+ int err = 0;
+
+ BUG_ON(NInoNonResident(ni));
+ if (pos + count > PAGE_SIZE) {
+ ntfs_error(vi->i_sb, "Out-of-bounds write into resident attr %#x", ni->type);
+ return -EINVAL;
+ }
+
+ /* Copy to mft record page */
+ addr = (u8 *)ctx->attr + le16_to_cpu(ctx->attr->data.resident.value_offset);
+ memcpy(addr + pos, buf, count);
+ mark_mft_record_dirty(ctx->ntfs_ino);
+
+ /* Keep the first page clean and uptodate */
+ folio = __filemap_get_folio(mapping, 0, FGP_WRITEBEGIN | FGP_NOFS,
+ mapping_gfp_mask(mapping));
+ if (IS_ERR(folio)) {
+ err = PTR_ERR(folio);
+ ntfs_error(vi->i_sb, "Failed to read a page 0 for attr %#x: %d",
+ ni->type, err);
+ goto out;
+ }
+ if (!folio_test_uptodate(folio)) {
+ u32 len = le32_to_cpu(ctx->attr->data.resident.value_length);
+
+ memcpy_to_folio(folio, 0, addr, len);
+ folio_zero_segment(folio, offset_in_folio(folio, len),
+ folio_size(folio) - len);
+ } else {
+ memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, count);
+ }
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+out:
+ return err ? err : count;
+}
+
+static s64 __ntfs_inode_non_resident_attr_pwrite(struct inode *vi,
+ s64 pos, s64 count, u8 *buf,
+ struct ntfs_attr_search_ctx *ctx,
+ bool sync)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct address_space *mapping = vi->i_mapping;
+ struct folio *folio;
+ pgoff_t index;
+ unsigned long offset, length;
+ size_t attr_len;
+ s64 ret = 0, written = 0;
+
+ BUG_ON(!NInoNonResident(ni));
+
+ index = pos >> PAGE_SHIFT;
+ while (count) {
+ folio = ntfs_read_mapping_folio(mapping, index);
+ if (IS_ERR(folio)) {
+ ret = PTR_ERR(folio);
+ ntfs_error(vi->i_sb, "Failed to read a page %lu for attr %#x: %ld",
+ index, ni->type, PTR_ERR(folio));
+ break;
+ }
+
+ folio_lock(folio);
+ offset = offset_in_folio(folio, pos);
+ attr_len = min_t(size_t, (size_t)count, folio_size(folio) - offset);
+
+ memcpy_to_folio(folio, offset, buf, attr_len);
+
+ if (sync) {
+ struct ntfs_volume *vol = ni->vol;
+ s64 lcn, lcn_count;
+ unsigned int lcn_folio_off = 0;
+ struct bio *bio;
+ u64 rl_length = 0;
+ s64 vcn;
+ struct runlist_element *rl;
+
+ lcn_count = max_t(s64, 1, attr_len >> vol->cluster_size_bits);
+ vcn = (s64)folio->index << PAGE_SHIFT >> vol->cluster_size_bits;
+
+ do {
+ down_write(&ni->runlist.lock);
+ rl = ntfs_attr_vcn_to_rl(ni, vcn, &lcn);
+ if (IS_ERR(rl)) {
+ ret = PTR_ERR(rl);
+ up_write(&ni->runlist.lock);
+ goto err_unlock_folio;
+ }
+
+ rl_length = rl->length - (vcn - rl->vcn);
+ if (rl_length < lcn_count) {
+ lcn_count -= rl_length;
+ } else {
+ rl_length = lcn_count;
+ lcn_count = 0;
+ }
+ up_write(&ni->runlist.lock);
+
+ if (vol->cluster_size_bits > PAGE_SHIFT) {
+ lcn_folio_off = folio->index << PAGE_SHIFT;
+ lcn_folio_off &= vol->cluster_size_mask;
+ }
+
+ bio = ntfs_setup_bio(vol, REQ_OP_WRITE, lcn,
+ lcn_folio_off);
+ if (!bio) {
+ ret = -ENOMEM;
+ goto err_unlock_folio;
+ }
+
+ length = min_t(unsigned long,
+ rl_length << vol->cluster_size_bits,
+ folio_size(folio));
+ if (!bio_add_folio(bio, folio, length, offset)) {
+ ret = -EIO;
+ bio_put(bio);
+ goto err_unlock_folio;
+ }
+
+ submit_bio_wait(bio);
+ bio_put(bio);
+ vcn += rl_length;
+ offset += length;
+ } while (lcn_count != 0);
+
+ folio_mark_uptodate(folio);
+ } else {
+ folio_mark_dirty(folio);
+ }
+err_unlock_folio:
+ folio_unlock(folio);
+ folio_put(folio);
+
+ if (ret)
+ break;
+
+ written += attr_len;
+ buf += attr_len;
+ pos += attr_len;
+ count -= attr_len;
+ index++;
+
+ cond_resched();
+ }
+
+ return ret ? ret : written;
+}
+
+s64 ntfs_inode_attr_pwrite(struct inode *vi, s64 pos, s64 count, u8 *buf, bool sync)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_attr_search_ctx *ctx;
+ s64 ret;
+
+ BUG_ON(!NInoAttr(ni));
+
+ ctx = ntfs_attr_get_search_ctx(ni->ext.base_ntfs_ino, NULL);
+ if (!ctx) {
+ ntfs_error(vi->i_sb, "Failed to get attr search ctx");
+ return -ENOMEM;
+ }
+
+ ret = ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
+ 0, NULL, 0, ctx);
+ if (ret) {
+ ntfs_attr_put_search_ctx(ctx);
+ ntfs_error(vi->i_sb, "Failed to look up attr %#x", ni->type);
+ return ret;
+ }
+
+ mutex_lock(&ni->mrec_lock);
+ ret = ntfs_enlarge_attribute(vi, pos, count, ctx);
+ mutex_unlock(&ni->mrec_lock);
+ if (ret)
+ goto out;
+
+ if (NInoNonResident(ni))
+ ret = __ntfs_inode_non_resident_attr_pwrite(vi, pos, count, buf, ctx, sync);
+ else
+ ret = __ntfs_inode_resident_attr_pwrite(vi, pos, count, buf, ctx);
+out:
+ ntfs_attr_put_search_ctx(ctx);
+ return ret;
+}
diff --git a/fs/ntfsplus/mft.c b/fs/ntfsplus/mft.c
new file mode 100644
index 000000000000..166cdf9ec0da
--- /dev/null
+++ b/fs/ntfsplus/mft.c
@@ -0,0 +1,2630 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel mft record operations. Part of the Linux-NTFS project.
+ * Part of this file is based on code from the NTFS-3G project.
+ *
+ * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/bio.h>
+
+#include "aops.h"
+#include "bitmap.h"
+#include "lcnalloc.h"
+#include "misc.h"
+#include "mft.h"
+#include "ntfs.h"
+
+/*
+ * ntfs_mft_record_check - Check the consistency of an MFT record
+ *
+ * Make sure its general fields are safe, then examine all its
+ * attributes and apply generic checks to them.
+ *
+ * Returns 0 if the checks are successful. If not, return -EIO.
+ */
+int ntfs_mft_record_check(const struct ntfs_volume *vol, struct mft_record *m,
+ unsigned long mft_no)
+{
+ struct attr_record *a;
+ struct super_block *sb = vol->sb;
+
+ if (!ntfs_is_file_record(m->magic)) {
+ ntfs_error(sb, "Record %llu has no FILE magic (0x%x)\n",
+ (unsigned long long)mft_no, le32_to_cpu(*(__le32 *)m));
+ goto err_out;
+ }
+
+ if ((m->usa_ofs & 0x1) ||
+ (vol->mft_record_size >> NTFS_BLOCK_SIZE_BITS) + 1 != le16_to_cpu(m->usa_count) ||
+ le16_to_cpu(m->usa_ofs) + le16_to_cpu(m->usa_count) * 2 > vol->mft_record_size) {
+ ntfs_error(sb, "Record %llu has corrupt fix-up values fields\n",
+ (unsigned long long)mft_no);
+ goto err_out;
+ }
+
+ if (le32_to_cpu(m->bytes_allocated) != vol->mft_record_size) {
+ ntfs_error(sb, "Record %llu has corrupt allocation size (%u <> %u)\n",
+ (unsigned long long)mft_no,
+ vol->mft_record_size,
+ le32_to_cpu(m->bytes_allocated));
+ goto err_out;
+ }
+
+ if (le32_to_cpu(m->bytes_in_use) > vol->mft_record_size) {
+ ntfs_error(sb, "Record %llu has corrupt in-use size (%u > %u)\n",
+ (unsigned long long)mft_no,
+ le32_to_cpu(m->bytes_in_use),
+ vol->mft_record_size);
+ goto err_out;
+ }
+
+ if (le16_to_cpu(m->attrs_offset) & 7) {
+ ntfs_error(sb, "Attributes badly aligned in record %llu\n",
+ (unsigned long long)mft_no);
+ goto err_out;
+ }
+
+ a = (struct attr_record *)((char *)m + le16_to_cpu(m->attrs_offset));
+ if ((char *)a < (char *)m || (char *)a > (char *)m + vol->mft_record_size) {
+ ntfs_error(sb, "Record %llu is corrupt\n",
+ (unsigned long long)mft_no);
+ goto err_out;
+ }
+
+ return 0;
+
+err_out:
+ return -EIO;
+}
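The fix-up check above expects one update sequence array slot per 512-byte block of the record, plus one slot for the update sequence number itself. A standalone sketch of the expected count (names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define NTFS_BLOCK_SIZE_BITS 9	/* 512-byte fixup blocks */

/* One fixup slot per 512-byte block of the record, plus one slot
 * holding the update sequence number itself. */
static uint16_t expected_usa_count(uint32_t mft_record_size)
{
	return (uint16_t)((mft_record_size >> NTFS_BLOCK_SIZE_BITS) + 1);
}
```

For the common 1 KiB mft record this gives 3 slots, matching the usa_count comparison in ntfs_mft_record_check().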
+
+/**
+ * map_mft_record_folio - map the folio in which a specific mft record resides
+ * @ni: ntfs inode whose mft record folio to map
+ *
+ * This maps the folio in which the mft record of the ntfs inode @ni is
+ * situated and returns a pointer to the mft record within the mapped folio.
+ *
+ * The return value needs to be checked with IS_ERR() and if that is true,
+ * PTR_ERR() contains the negative error code.
+ */
+static inline struct mft_record *map_mft_record_folio(struct ntfs_inode *ni)
+{
+ loff_t i_size;
+ struct ntfs_volume *vol = ni->vol;
+ struct inode *mft_vi = vol->mft_ino;
+ struct folio *folio;
+ unsigned long index, end_index;
+ unsigned int ofs;
+
+ BUG_ON(ni->folio);
+ /*
+ * The index into the page cache and the offset within the page cache
+ * page of the wanted mft record.
+ */
+ index = (u64)ni->mft_no << vol->mft_record_size_bits >>
+ PAGE_SHIFT;
+ ofs = (ni->mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+
+ i_size = i_size_read(mft_vi);
+ /* The maximum valid index into the page cache for $MFT's data. */
+ end_index = i_size >> PAGE_SHIFT;
+
+ /* If the wanted index is out of bounds the mft record doesn't exist. */
+ if (unlikely(index >= end_index)) {
+ if (index > end_index || (i_size & ~PAGE_MASK) < ofs +
+ vol->mft_record_size) {
+ folio = ERR_PTR(-ENOENT);
+ ntfs_error(vol->sb,
+ "Attempt to read mft record 0x%lx, which is beyond the end of the mft. This is probably a bug in the ntfs driver.",
+ ni->mft_no);
+ goto err_out;
+ }
+ }
+
+ /* Read, map, and pin the folio. */
+ folio = ntfs_read_mapping_folio(mft_vi->i_mapping, index);
+ if (!IS_ERR(folio)) {
+ u8 *addr;
+
+ ni->mrec = kmalloc(vol->mft_record_size, GFP_NOFS);
+ if (!ni->mrec) {
+ ntfs_unmap_folio(folio, NULL);
+ folio = ERR_PTR(-ENOMEM);
+ goto err_out;
+ }
+
+ addr = kmap_local_folio(folio, 0);
+ memcpy(ni->mrec, addr + ofs, vol->mft_record_size);
+ post_read_mst_fixup((struct ntfs_record *)ni->mrec, vol->mft_record_size);
+
+ /* Catch multi sector transfer fixup errors. */
+ if (!ntfs_mft_record_check(vol, (struct mft_record *)ni->mrec, ni->mft_no)) {
+ kunmap_local(addr);
+ ni->folio = folio;
+ ni->folio_ofs = ofs;
+ return ni->mrec;
+ }
+ ntfs_unmap_folio(folio, addr);
+ kfree(ni->mrec);
+ ni->mrec = NULL;
+ folio = ERR_PTR(-EIO);
+ NVolSetErrors(vol);
+ }
+err_out:
+ ni->folio = NULL;
+ ni->folio_ofs = 0;
+ return (void *)folio;
+}
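The index/ofs computation at the top of map_mft_record_folio() locates an mft record inside the $MFT page cache: the record's byte offset is split into a page index and an offset within that page. A userspace sketch of the same arithmetic, assuming 4 KiB pages (names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB page size for this sketch. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Page-cache index and in-page offset of mft record @mft_no, mirroring
 * the index/ofs computation in map_mft_record_folio(). */
static void mft_record_pos(uint64_t mft_no, unsigned int record_size_bits,
			   uint64_t *index, unsigned int *ofs)
{
	uint64_t byte_off = mft_no << record_size_bits;

	*index = byte_off >> SKETCH_PAGE_SHIFT;
	*ofs = (unsigned int)(byte_off & (SKETCH_PAGE_SIZE - 1));
}
```

With 1 KiB records (record_size_bits = 10), four records share each page, so record 5 lands at offset 1024 of page 1.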
+
+/**
+ * map_mft_record - map, pin and lock an mft record
+ * @ni: ntfs inode whose MFT record to map
+ *
+ * First, take the mrec_lock mutex. We might now be sleeping, while waiting
+ * for the mutex if it was already locked by someone else.
+ *
+ * The page of the record is mapped using map_mft_record_folio() before being
+ * returned to the caller.
+ *
+ * This in turn uses ntfs_read_mapping_folio() to get the page containing the wanted mft
+ * record (it in turn calls read_cache_page() which reads it in from disk if
+ * necessary, increments the use count on the page so that it cannot disappear
+ * under us and returns a reference to the page cache page).
+ *
+ * If read_cache_page() invokes ntfs_readpage() to load the page from disk, it
+ * sets PG_locked and clears PG_uptodate on the page. Once I/O has completed
+ * and the post-read mst fixups on each mft record in the page have been
+ * performed, the page gets PG_uptodate set and PG_locked cleared (this is done
+ * in our asynchronous I/O completion handler end_buffer_read_mft_async()).
+ * ntfs_read_mapping_folio() waits for PG_locked to become clear and checks if
+ * PG_uptodate is set and returns an error code if not. This provides
+ * sufficient protection against races when reading/using the page.
+ *
+ * However there is the write mapping to think about. Doing the above described
+ * checking here will be fine, because when initiating the write we will set
+ * PG_locked and clear PG_uptodate making sure nobody is touching the page
+ * contents. Doing the locking this way means that the commit to disk code in
+ * the page cache code paths is automatically sufficiently locked with us as
+ * we will not touch a page that has been locked or is not uptodate. The only
+ * locking problem then is them locking the page while we are accessing it.
+ *
+ * So that code will end up having to own the mrec_lock of all mft
+ * records/inodes present in the page before I/O can proceed. In that case we
+ * wouldn't need to bother with PG_locked and PG_uptodate as nobody will be
+ * accessing anything without owning the mrec_lock mutex. But we do need to
+ * use them because of the read_cache_page() invocation and the code becomes so
+ * much simpler this way that it is well worth it.
+ *
+ * The mft record is now ours and we return a pointer to it. You need to check
+ * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will return
+ * the error code.
+ *
+ * NOTE: Caller is responsible for setting the mft record dirty before calling
+ * unmap_mft_record(). This is obviously only necessary if the caller really
+ * modified the mft record...
+ * Q: Do we want to recycle one of the VFS inode state bits instead?
+ * A: No, the inode ones mean we want to change the mft record, not we want to
+ * write it out.
+ */
+struct mft_record *map_mft_record(struct ntfs_inode *ni)
+{
+ struct mft_record *m;
+
+ if (!ni)
+ return ERR_PTR(-EINVAL);
+
+ ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
+
+ /* Make sure the ntfs inode doesn't go away. */
+ atomic_inc(&ni->count);
+
+ if (ni->folio)
+ return (struct mft_record *)ni->mrec;
+
+ m = map_mft_record_folio(ni);
+ if (!IS_ERR(m))
+ return m;
+
+ atomic_dec(&ni->count);
+ ntfs_error(ni->vol->sb, "Failed with error code %ld.", -PTR_ERR(m));
+ return m;
+}
+
+/**
+ * unmap_mft_record - release a mapped mft record
+ * @ni: ntfs inode whose MFT record to unmap
+ *
+ * We release the page mapping and the mrec_lock mutex which unmaps the mft
+ * record and releases it for others to get hold of. We also release the ntfs
+ * inode by decrementing the ntfs inode reference count.
+ *
+ * NOTE: If caller has modified the mft record, it is imperative to set the mft
+ * record dirty BEFORE calling unmap_mft_record().
+ */
+void unmap_mft_record(struct ntfs_inode *ni)
+{
+ struct folio *folio;
+
+ if (!ni)
+ return;
+
+ ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
+
+ folio = ni->folio;
+ if (atomic_dec_return(&ni->count) > 1)
+ return;
+ BUG_ON(!folio);
+}
+
+/**
+ * map_extent_mft_record - load an extent inode and attach it to its base
+ * @base_ni: base ntfs inode
+ * @mref: mft reference of the extent inode to load
+ * @ntfs_ino: on successful return, pointer to the struct ntfs_inode structure
+ *
+ * Load the extent mft record @mref and attach it to its base inode @base_ni.
+ * Return the mapped extent mft record if IS_ERR(result) is false. Otherwise
+ * PTR_ERR(result) gives the negative error code.
+ *
+ * On successful return, @ntfs_ino contains a pointer to the ntfs_inode
+ * structure of the mapped extent inode.
+ */
+struct mft_record *map_extent_mft_record(struct ntfs_inode *base_ni, u64 mref,
+ struct ntfs_inode **ntfs_ino)
+{
+ struct mft_record *m;
+ struct ntfs_inode *ni = NULL;
+ struct ntfs_inode **extent_nis = NULL;
+ int i;
+ unsigned long mft_no = MREF(mref);
+ u16 seq_no = MSEQNO(mref);
+ bool destroy_ni = false;
+
+ ntfs_debug("Mapping extent mft record 0x%lx (base mft record 0x%lx).",
+ mft_no, base_ni->mft_no);
+ /* Make sure the base ntfs inode doesn't go away. */
+ atomic_inc(&base_ni->count);
+ /*
+ * Check if this extent inode has already been added to the base inode,
+ * in which case just return it. If not found, add it to the base
+ * inode before returning it.
+ */
+ mutex_lock(&base_ni->extent_lock);
+ if (base_ni->nr_extents > 0) {
+ extent_nis = base_ni->ext.extent_ntfs_inos;
+ for (i = 0; i < base_ni->nr_extents; i++) {
+ if (mft_no != extent_nis[i]->mft_no)
+ continue;
+ ni = extent_nis[i];
+ /* Make sure the ntfs inode doesn't go away. */
+ atomic_inc(&ni->count);
+ break;
+ }
+ }
+ if (likely(ni != NULL)) {
+ mutex_unlock(&base_ni->extent_lock);
+ atomic_dec(&base_ni->count);
+ /* We found the record; just have to map and return it. */
+ m = map_mft_record(ni);
+ /* map_mft_record() has incremented this on success. */
+ atomic_dec(&ni->count);
+ if (!IS_ERR(m)) {
+ /* Verify the sequence number. */
+ if (likely(le16_to_cpu(m->sequence_number) == seq_no)) {
+ ntfs_debug("Done 1.");
+ *ntfs_ino = ni;
+ return m;
+ }
+ unmap_mft_record(ni);
+ ntfs_error(base_ni->vol->sb,
+ "Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
+ return ERR_PTR(-EIO);
+ }
+map_err_out:
+ ntfs_error(base_ni->vol->sb,
+ "Failed to map extent mft record, error code %ld.",
+ -PTR_ERR(m));
+ return m;
+ }
+ /* Record wasn't there. Get a new ntfs inode and initialize it. */
+ ni = ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
+ if (unlikely(!ni)) {
+ mutex_unlock(&base_ni->extent_lock);
+ atomic_dec(&base_ni->count);
+ return ERR_PTR(-ENOMEM);
+ }
+ ni->vol = base_ni->vol;
+ ni->seq_no = seq_no;
+ ni->nr_extents = -1;
+ ni->ext.base_ntfs_ino = base_ni;
+ /* Now map the record. */
+ m = map_mft_record(ni);
+ if (IS_ERR(m)) {
+ mutex_unlock(&base_ni->extent_lock);
+ atomic_dec(&base_ni->count);
+ ntfs_clear_extent_inode(ni);
+ goto map_err_out;
+ }
+ /* Verify the sequence number if it is present. */
+ if (seq_no && (le16_to_cpu(m->sequence_number) != seq_no)) {
+ ntfs_error(base_ni->vol->sb,
+ "Found stale extent mft reference! Corrupt filesystem. Run chkdsk.");
+ destroy_ni = true;
+ m = ERR_PTR(-EIO);
+ goto unm_err_out;
+ }
+ /* Attach extent inode to base inode, reallocating memory if needed. */
+ if (!(base_ni->nr_extents & 3)) {
+ struct ntfs_inode **tmp;
+ int new_size = (base_ni->nr_extents + 4) * sizeof(struct ntfs_inode *);
+
+ tmp = ntfs_malloc_nofs(new_size);
+ if (unlikely(!tmp)) {
+ ntfs_error(base_ni->vol->sb, "Failed to allocate internal buffer.");
+ destroy_ni = true;
+ m = ERR_PTR(-ENOMEM);
+ goto unm_err_out;
+ }
+ if (base_ni->nr_extents) {
+ BUG_ON(!base_ni->ext.extent_ntfs_inos);
+ memcpy(tmp, base_ni->ext.extent_ntfs_inos, new_size -
+ 4 * sizeof(struct ntfs_inode *));
+ ntfs_free(base_ni->ext.extent_ntfs_inos);
+ }
+ base_ni->ext.extent_ntfs_inos = tmp;
+ }
+ base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
+ mutex_unlock(&base_ni->extent_lock);
+ atomic_dec(&base_ni->count);
+ ntfs_debug("Done 2.");
+ *ntfs_ino = ni;
+ return m;
+unm_err_out:
+ unmap_mft_record(ni);
+ mutex_unlock(&base_ni->extent_lock);
+ atomic_dec(&base_ni->count);
+ /*
+ * If the extent inode was not attached to the base inode we need to
+ * release it or we will leak memory.
+ */
+ if (destroy_ni)
+ ntfs_clear_extent_inode(ni);
+ return m;
+}
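+
+/*
+ * Illustrative usage (a sketch; unmap_extent_mft_record() is assumed to be
+ * the counterpart that releases the extent record):
+ *
+ *	struct ntfs_inode *ext_ni;
+ *
+ *	m = map_extent_mft_record(base_ni, mref, &ext_ni);
+ *	if (IS_ERR(m))
+ *		return PTR_ERR(m);
+ *	... modify the extent mft record through m ...
+ *	mark_mft_record_dirty(ext_ni);
+ *	unmap_extent_mft_record(ext_ni);
+ */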
+
+/**
+ * __mark_mft_record_dirty - set the mft record and the page containing it dirty
+ * @ni: ntfs inode describing the mapped mft record
+ *
+ * Internal function. Users should call mark_mft_record_dirty() instead.
+ *
+ * Mark the mapped (extent) mft record of the (base or extent) ntfs inode @ni
+ * dirty by marking the base vfs inode dirty. This ensures that any changes
+ * to the mft record are written out to disk.
+ *
+ * NOTE: We only set I_DIRTY_DATASYNC (and not I_DIRTY_PAGES)
+ * on the base vfs inode, because even though file data may have been modified,
+ * it is dirty in the inode metadata rather than the data page cache of the
+ * inode, and thus there are no data pages that need writing out. Therefore, a
+ * full mark_inode_dirty() is overkill. A mark_inode_dirty_sync(), on the
+ * other hand, is not sufficient, because ->write_inode needs to be called even
+ * in case of fdatasync. This needs to happen or the file data would not
+ * necessarily hit the device synchronously, even though the vfs inode has the
+ * O_SYNC flag set. Also, I_DIRTY_DATASYNC simply "feels" better than just
+ * I_DIRTY_SYNC, since the file data has not actually hit the block device yet,
+ * which is not what I_DIRTY_SYNC on its own would suggest.
+ */
+void __mark_mft_record_dirty(struct ntfs_inode *ni)
+{
+ struct ntfs_inode *base_ni;
+
+ ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+ BUG_ON(NInoAttr(ni));
+ /* Determine the base vfs inode and mark it dirty, too. */
+ if (likely(ni->nr_extents >= 0))
+ base_ni = ni;
+ else
+ base_ni = ni->ext.base_ntfs_ino;
+ __mark_inode_dirty(VFS_I(base_ni), I_DIRTY_DATASYNC);
+}
+
+/**
+ * ntfs_sync_mft_mirror - synchronize an mft record to the mft mirror
+ * @vol: ntfs volume on which the mft record to synchronize resides
+ * @mft_no: mft record number of mft record to synchronize
+ * @m: mapped, mst protected (extent) mft record to synchronize
+ *
+ * Write the mapped, mst protected (extent) mft record @m with mft record
+ * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol.
+ *
+ * On success return 0. On error return -errno and set the volume errors flag
+ * in the ntfs volume @vol.
+ *
+ * NOTE: We always perform synchronous i/o when writing the mft mirror.
+ */
+int ntfs_sync_mft_mirror(struct ntfs_volume *vol, const unsigned long mft_no,
+ struct mft_record *m)
+{
+ u8 *kmirr = NULL;
+ struct folio *folio;
+ unsigned int folio_ofs, lcn_folio_off = 0;
+ int err = 0;
+ struct bio *bio;
+
+ ntfs_debug("Entering for inode 0x%lx.", mft_no);
+
+ if (unlikely(!vol->mftmirr_ino)) {
+ /* This could happen during umount... */
+ err = -EIO;
+ goto err_out;
+ }
+ /* Get the page containing the mirror copy of the mft record @m. */
+ folio = ntfs_read_mapping_folio(vol->mftmirr_ino->i_mapping, mft_no >>
+ (PAGE_SHIFT - vol->mft_record_size_bits));
+ if (IS_ERR(folio)) {
+ ntfs_error(vol->sb, "Failed to map mft mirror page.");
+ err = PTR_ERR(folio);
+ goto err_out;
+ }
+
+ folio_lock(folio);
+ BUG_ON(!folio_test_uptodate(folio));
+ folio_clear_uptodate(folio);
+ /* Offset of the mft mirror record inside the page. */
+ folio_ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+ /* The address in the page of the mirror copy of the mft record @m. */
+ kmirr = kmap_local_folio(folio, 0) + folio_ofs;
+ /* Copy the mst protected mft record to the mirror. */
+ memcpy(kmirr, m, vol->mft_record_size);
+
+ if (vol->cluster_size_bits > PAGE_SHIFT) {
+ lcn_folio_off = folio->index << PAGE_SHIFT;
+ lcn_folio_off &= vol->cluster_size_mask;
+ }
+
+ bio = ntfs_setup_bio(vol, REQ_OP_WRITE, vol->mftmirr_lcn,
+ lcn_folio_off + folio_ofs);
+ if (!bio) {
+ err = -ENOMEM;
+ goto unlock_folio;
+ }
+
+ if (!bio_add_folio(bio, folio, vol->mft_record_size, folio_ofs)) {
+ err = -EIO;
+ bio_put(bio);
+ goto unlock_folio;
+ }
+
+ submit_bio_wait(bio);
+ bio_put(bio);
+	/* The write has completed, so the mirror folio is clean and uptodate. */
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+
+unlock_folio:
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, kmirr);
+ if (likely(!err)) {
+ ntfs_debug("Done.");
+ } else {
+ ntfs_error(vol->sb, "I/O error while writing mft mirror record 0x%lx!", mft_no);
+err_out:
+ ntfs_error(vol->sb,
+ "Failed to synchronize $MFTMirr (error code %i). Volume will be left marked dirty on umount. Run chkdsk on the partition after umounting to correct this.",
+ err);
+ NVolSetErrors(vol);
+ }
+ return err;
+}
+
+/**
+ * write_mft_record_nolock - write out a mapped (extent) mft record
+ * @ni: ntfs inode describing the mapped (extent) mft record
+ * @m: mapped (extent) mft record to write
+ * @sync: if true, wait for i/o completion
+ *
+ * Write the mapped (extent) mft record @m described by the (regular or extent)
+ * ntfs inode @ni to backing store. If the mft record @m has a counterpart in
+ * the mft mirror, that is also updated.
+ *
+ * We only write the mft record if the ntfs inode @ni is dirty. The dirty
+ * flag is cleared before the write so that the record can be redirtied
+ * while the write is in progress.
+ *
+ * On success, clean the mft record and return 0. On error, leave the mft
+ * record dirty and return -errno.
+ *
+ * NOTE: We always perform synchronous i/o on the mft record itself; @sync
+ * only controls the ordering with respect to the mft mirror. If the mft
+ * record has a counterpart in the mft mirror and @sync is true, we write the
+ * mft record, wait for i/o completion, and only then write the mft mirror
+ * copy. This ensures that if the system crashes either the mft or the mft
+ * mirror will contain a self-consistent copy of the mft record @m. If @sync
+ * is false, the mirror copy is written before the mft record write
+ * completes. This provides a speedup but no longer guarantees a
+ * self-consistent copy in the case of a crash, but if you asked for
+ * asynchronous writing you probably do not care about that anyway.
+ */
+int write_mft_record_nolock(struct ntfs_inode *ni, struct mft_record *m, int sync)
+{
+ struct ntfs_volume *vol = ni->vol;
+ struct folio *folio = ni->folio;
+ unsigned int lcn_folio_off = 0;
+ int err = 0, i = 0;
+ u8 *kaddr;
+ struct mft_record *fixup_m;
+ struct bio *bio;
+ unsigned int offset = 0, folio_size;
+
+ ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+
+ BUG_ON(NInoAttr(ni));
+ BUG_ON(!folio_test_locked(folio));
+
+ /*
+	 * If the ntfs inode is clean, there is no need to do anything. If it is
+	 * dirty, mark it as clean now so that it can be redirtied later on if
+	 * needed.
+ * There is no danger of races since the caller is holding the locks
+ * for the mft record @m and the page it is in.
+ */
+ if (!NInoTestClearDirty(ni))
+ goto done;
+
+ if (ni->mft_lcn[0] == LCN_RL_NOT_MAPPED) {
+ sector_t iblock;
+ s64 vcn;
+ unsigned char blocksize_bits = vol->sb->s_blocksize_bits;
+ struct runlist_element *rl;
+
+ iblock = (s64)folio->index << (PAGE_SHIFT - blocksize_bits);
+ vcn = ((s64)iblock << blocksize_bits) >> vol->cluster_size_bits;
+
+ down_read(&NTFS_I(vol->mft_ino)->runlist.lock);
+ rl = NTFS_I(vol->mft_ino)->runlist.rl;
+ BUG_ON(!rl);
+
+ /* Seek to element containing target vcn. */
+ while (rl->length && rl[1].vcn <= vcn)
+ rl++;
+ ni->mft_lcn[0] = ntfs_rl_vcn_to_lcn(rl, vcn);
+ ni->mft_lcn_count++;
+
+ if (vol->cluster_size < vol->mft_record_size &&
+ (rl->length - (vcn - rl->vcn)) <= 1) {
+ rl++;
+ ni->mft_lcn[1] = ntfs_rl_vcn_to_lcn(rl, vcn + 1);
+ ni->mft_lcn_count++;
+ }
+ up_read(&NTFS_I(vol->mft_ino)->runlist.lock);
+ }
+
+ kaddr = kmap_local_folio(folio, 0);
+ fixup_m = (struct mft_record *)(kaddr + ni->folio_ofs);
+ memcpy(fixup_m, m, vol->mft_record_size);
+
+ /* Apply the mst protection fixups. */
+ err = pre_write_mst_fixup((struct ntfs_record *)fixup_m, vol->mft_record_size);
+ if (err) {
+ ntfs_error(vol->sb, "Failed to apply mst fixups!");
+ goto err_out;
+ }
+
+ while (i < ni->mft_lcn_count) {
+ folio_size = vol->mft_record_size / ni->mft_lcn_count;
+
+ flush_dcache_folio(folio);
+
+ if (vol->cluster_size_bits > PAGE_SHIFT) {
+ lcn_folio_off = folio->index << PAGE_SHIFT;
+ lcn_folio_off &= vol->cluster_size_mask;
+ }
+
+ bio = ntfs_setup_bio(vol, REQ_OP_WRITE, ni->mft_lcn[i],
+ lcn_folio_off + ni->folio_ofs);
+ if (!bio) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ if (!bio_add_folio(bio, folio, folio_size,
+ ni->folio_ofs + offset)) {
+ err = -EIO;
+ goto put_bio_out;
+ }
+
+ /* Synchronize the mft mirror now if not @sync. */
+ if (!sync && ni->mft_no < vol->mftmirr_size)
+ ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+
+ submit_bio_wait(bio);
+ bio_put(bio);
+ offset += vol->cluster_size;
+ i++;
+ }
+
+ /* If @sync, now synchronize the mft mirror. */
+ if (sync && ni->mft_no < vol->mftmirr_size)
+ ntfs_sync_mft_mirror(vol, ni->mft_no, fixup_m);
+ kunmap_local(kaddr);
+ if (unlikely(err)) {
+ /* I/O error during writing. This is really bad! */
+ ntfs_error(vol->sb,
+ "I/O error while writing mft record 0x%lx! Marking base inode as bad. You should unmount the volume and run chkdsk.",
+ ni->mft_no);
+ goto err_out;
+ }
+done:
+ ntfs_debug("Done.");
+ return 0;
+put_bio_out:
+ bio_put(bio);
+err_out:
+	/*
+	 * The mft record was not written out. The caller should mark the base
+	 * inode as bad so that no more i/o happens. ->clear_inode() will
+	 * still be invoked so all extent inodes and other allocated memory
+	 * will be freed.
+	 */
+ if (err == -ENOMEM) {
+ ntfs_error(vol->sb,
+ "Not enough memory to write mft record. Redirtying so the write is retried later.");
+ mark_mft_record_dirty(ni);
+ err = 0;
+ } else
+ NVolSetErrors(vol);
+ return err;
+}
+
+static int ntfs_test_inode_wb(struct inode *vi, unsigned long ino, void *data)
+{
+ struct ntfs_attr *na = (struct ntfs_attr *)data;
+
+ if (!ntfs_test_inode(vi, na))
+ return 0;
+
+	/*
+	 * Without this, ntfs_write_mst_block() could call iput_final(),
+	 * ntfs_evict_big_inode() could try to unlink this inode, and the
+	 * context could block infinitely in map_mft_record().
+	 */
+ if (NInoBeingDeleted(NTFS_I(vi))) {
+ na->state = NI_BeingDeleted;
+ return -1;
+ }
+
+	/*
+	 * This check prevents ntfs_write_mst_block() from applying/undoing
+	 * fixups while ntfs_create() is being called.
+	 */
+ spin_lock(&vi->i_lock);
+ if (vi->i_state & I_CREATING) {
+ spin_unlock(&vi->i_lock);
+ na->state = NI_BeingCreated;
+ return -1;
+ }
+ spin_unlock(&vi->i_lock);
+
+ return igrab(vi) ? 1 : -1;
+}
+
+/**
+ * ntfs_may_write_mft_record - check if an mft record may be written out
+ * @vol: [IN] ntfs volume on which the mft record to check resides
+ * @mft_no: [IN] mft record number of the mft record to check
+ * @m: [IN] mapped mft record to check
+ * @locked_ni: [OUT] caller has to unlock this ntfs inode if one is returned
+ *
+ * Check if the mapped (base or extent) mft record @m with mft record number
+ * @mft_no belonging to the ntfs volume @vol may be written out. If necessary
+ * and possible the ntfs inode of the mft record is locked and the base vfs
+ * inode is pinned. The locked ntfs inode is then returned in @locked_ni. The
+ * caller is responsible for unlocking the ntfs inode and unpinning the base
+ * vfs inode.
+ *
+ * Return 'true' if the mft record may be written out and 'false' if not.
+ *
+ * The caller has locked the page and cleared the uptodate flag on it which
+ * means that we can safely write out any dirty mft records that do not have
+ * their inodes in icache as determined by ilookup5() as anyone
+ * opening/creating such an inode would block when attempting to map the mft
+ * record in read_cache_page() until we are finished with the write out.
+ *
+ * Here is a description of the tests we perform:
+ *
+ * If the inode is found in icache we know the mft record must be a base mft
+ * record. If it is dirty, we do not write it and return 'false' as the vfs
+ * inode write paths will result in the access times being updated which would
+ * cause the base mft record to be redirtied and written out again. (We know
+ * the access time update will modify the base mft record because Windows
+ * chkdsk complains if the standard information attribute is not in the base
+ * mft record.)
+ *
+ * If the inode is in icache and not dirty, we attempt to lock the mft record
+ * and if we find the lock was already taken, it is not safe to write the mft
+ * record and we return 'false'.
+ *
+ * If we manage to obtain the lock we have exclusive access to the mft record,
+ * which also allows us safe writeout of the mft record. We then set
+ * @locked_ni to the locked ntfs inode and return 'true'.
+ *
+ * Note we cannot just lock the mft record and sleep while waiting for the lock
+ * because this would deadlock due to lock reversal (normally the mft record is
+ * locked before the page is locked but we already have the page locked here
+ * when we try to lock the mft record).
+ *
+ * If the inode is not in icache we need to perform further checks.
+ *
+ * If the mft record is not a FILE record or it is a base mft record, we can
+ * safely write it and return 'true'.
+ *
+ * We now know the mft record is an extent mft record. We check if the inode
+ * corresponding to its base mft record is in icache and obtain a reference to
+ * it if it is. If it is not, we can safely write it and return 'true'.
+ *
+ * We now have the base inode for the extent mft record. We check if it has an
+ * ntfs inode for the extent mft record attached and if not it is safe to write
+ * the extent mft record and we return 'true'.
+ *
+ * The ntfs inode for the extent mft record is attached to the base inode so we
+ * attempt to lock the extent mft record and if we find the lock was already
+ * taken, it is not safe to write the extent mft record and we return 'false'.
+ *
+ * If we manage to obtain the lock we have exclusive access to the extent mft
+ * record, which also allows us safe writeout of the extent mft record. We
+ * set the ntfs inode of the extent mft record clean and then set @locked_ni to
+ * the now locked ntfs inode and return 'true'.
+ *
+ * Note, the reason for actually writing dirty mft records here and not just
+ * relying on the vfs inode dirty code paths is that we can have mft records
+ * modified without them ever having actual inodes in memory. Also we can have
+ * dirty mft records with clean ntfs inodes in memory. None of the described
+ * cases would result in the dirty mft records being written out if we only
+ * relied on the vfs inode dirty code paths. And these cases can really occur
+ * during allocation of new mft records and in particular when the
+ * initialized_size of the $MFT/$DATA attribute is extended and the new space
+ * is initialized using ntfs_mft_record_format(). The clean inode can then
+ * appear if the mft record is reused for a new inode before it got written
+ * out.
+ */
+bool ntfs_may_write_mft_record(struct ntfs_volume *vol, const unsigned long mft_no,
+ const struct mft_record *m, struct ntfs_inode **locked_ni)
+{
+ struct super_block *sb = vol->sb;
+ struct inode *mft_vi = vol->mft_ino;
+ struct inode *vi;
+ struct ntfs_inode *ni, *eni, **extent_nis;
+ int i;
+ struct ntfs_attr na = {0};
+
+ ntfs_debug("Entering for inode 0x%lx.", mft_no);
+ /*
+ * Normally we do not return a locked inode so set @locked_ni to NULL.
+ */
+ BUG_ON(!locked_ni);
+ *locked_ni = NULL;
+ /*
+ * Check if the inode corresponding to this mft record is in the VFS
+ * inode cache and obtain a reference to it if it is.
+ */
+ ntfs_debug("Looking for inode 0x%lx in icache.", mft_no);
+ na.mft_no = mft_no;
+ na.type = AT_UNUSED;
+ /*
+ * Optimize inode 0, i.e. $MFT itself, since we have it in memory and
+ * we get here for it rather often.
+ */
+ if (!mft_no) {
+ /* Balance the below iput(). */
+ vi = igrab(mft_vi);
+ BUG_ON(vi != mft_vi);
+ } else {
+		/*
+		 * Have to use find_inode_nowait() since ilookup5_nowait()
+		 * waits for inodes with I_FREEING set, which causes ntfs to
+		 * deadlock when inodes are unlinked concurrently.
+		 */
+ vi = find_inode_nowait(sb, mft_no, ntfs_test_inode_wb, &na);
+ if (na.state == NI_BeingDeleted || na.state == NI_BeingCreated)
+ return false;
+ }
+ if (vi) {
+ ntfs_debug("Base inode 0x%lx is in icache.", mft_no);
+ /* The inode is in icache. */
+ ni = NTFS_I(vi);
+ /* Take a reference to the ntfs inode. */
+ atomic_inc(&ni->count);
+ /* If the inode is dirty, do not write this record. */
+ if (NInoDirty(ni)) {
+ ntfs_debug("Inode 0x%lx is dirty, do not write it.",
+ mft_no);
+ atomic_dec(&ni->count);
+ iput(vi);
+ return false;
+ }
+ ntfs_debug("Inode 0x%lx is not dirty.", mft_no);
+ /* The inode is not dirty, try to take the mft record lock. */
+ if (unlikely(!mutex_trylock(&ni->mrec_lock))) {
+ ntfs_debug("Mft record 0x%lx is already locked, do not write it.", mft_no);
+ atomic_dec(&ni->count);
+ iput(vi);
+ return false;
+ }
+ ntfs_debug("Managed to lock mft record 0x%lx, write it.",
+ mft_no);
+ /*
+ * The write has to occur while we hold the mft record lock so
+ * return the locked ntfs inode.
+ */
+ *locked_ni = ni;
+ return true;
+ }
+ ntfs_debug("Inode 0x%lx is not in icache.", mft_no);
+ /* The inode is not in icache. */
+ /* Write the record if it is not a mft record (type "FILE"). */
+ if (!ntfs_is_mft_record(m->magic)) {
+ ntfs_debug("Mft record 0x%lx is not a FILE record, write it.",
+ mft_no);
+ return true;
+ }
+ /* Write the mft record if it is a base inode. */
+ if (!m->base_mft_record) {
+ ntfs_debug("Mft record 0x%lx is a base record, write it.",
+ mft_no);
+ return true;
+ }
+ /*
+ * This is an extent mft record. Check if the inode corresponding to
+ * its base mft record is in icache and obtain a reference to it if it
+ * is.
+ */
+ na.mft_no = MREF_LE(m->base_mft_record);
+ na.state = 0;
+ ntfs_debug("Mft record 0x%lx is an extent record. Looking for base inode 0x%lx in icache.",
+ mft_no, na.mft_no);
+ if (!na.mft_no) {
+ /* Balance the below iput(). */
+ vi = igrab(mft_vi);
+ BUG_ON(vi != mft_vi);
+ } else {
+		vi = find_inode_nowait(sb, na.mft_no, ntfs_test_inode_wb, &na);
+ if (na.state == NI_BeingDeleted || na.state == NI_BeingCreated)
+ return false;
+ }
+
+ if (!vi)
+ return false;
+ ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no);
+ /*
+ * The base inode is in icache. Check if it has the extent inode
+ * corresponding to this extent mft record attached.
+ */
+ ni = NTFS_I(vi);
+ mutex_lock(&ni->extent_lock);
+ if (ni->nr_extents <= 0) {
+ /*
+ * The base inode has no attached extent inodes, write this
+ * extent mft record.
+ */
+ mutex_unlock(&ni->extent_lock);
+ iput(vi);
+ ntfs_debug("Base inode 0x%lx has no attached extent inodes, write the extent record.",
+ na.mft_no);
+ return true;
+ }
+ /* Iterate over the attached extent inodes. */
+ extent_nis = ni->ext.extent_ntfs_inos;
+ for (eni = NULL, i = 0; i < ni->nr_extents; ++i) {
+ if (mft_no == extent_nis[i]->mft_no) {
+ /*
+ * Found the extent inode corresponding to this extent
+ * mft record.
+ */
+ eni = extent_nis[i];
+ break;
+ }
+ }
+ /*
+ * If the extent inode was not attached to the base inode, write this
+ * extent mft record.
+ */
+ if (!eni) {
+ mutex_unlock(&ni->extent_lock);
+ iput(vi);
+ ntfs_debug("Extent inode 0x%lx is not attached to its base inode 0x%lx, write the extent record.",
+ mft_no, na.mft_no);
+ return true;
+ }
+ ntfs_debug("Extent inode 0x%lx is attached to its base inode 0x%lx.",
+ mft_no, na.mft_no);
+ /* Take a reference to the extent ntfs inode. */
+ atomic_inc(&eni->count);
+ mutex_unlock(&ni->extent_lock);
+
+ /* if extent inode is dirty, write_inode will write it */
+ if (NInoDirty(eni)) {
+ atomic_dec(&eni->count);
+ iput(vi);
+ return false;
+ }
+
+ /*
+	 * Found the extent inode corresponding to this extent mft record.
+ * Try to take the mft record lock.
+ */
+ if (unlikely(!mutex_trylock(&eni->mrec_lock))) {
+ atomic_dec(&eni->count);
+ iput(vi);
+ ntfs_debug("Extent mft record 0x%lx is already locked, do not write it.",
+ mft_no);
+ return false;
+ }
+ ntfs_debug("Managed to lock extent mft record 0x%lx, write it.",
+ mft_no);
+ /*
+ * The write has to occur while we hold the mft record lock so return
+ * the locked extent ntfs inode.
+ */
+ *locked_ni = eni;
+ return true;
+}
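+
+/*
+ * Illustrative caller pattern for the writeback path (a sketch; the cleanup
+ * shown is an assumption based on the references taken above):
+ *
+ *	if (ntfs_may_write_mft_record(vol, mft_no, m, &locked_ni)) {
+ *		... write out the mft record ...
+ *		if (locked_ni) {
+ *			mutex_unlock(&locked_ni->mrec_lock);
+ *			atomic_dec(&locked_ni->count);
+ *			iput(VFS_I(locked_ni));
+ *		}
+ *	}
+ */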
+
+static const char *es = " Leaving inconsistent metadata. Unmount and run chkdsk.";
+
+/**
+ * ntfs_mft_bitmap_find_and_alloc_free_rec_nolock - find and allocate a free mft record
+ * @vol: volume on which to search for a free mft record
+ * @base_ni: open base inode if allocating an extent mft record or NULL
+ *
+ * Search for a free mft record in the mft bitmap attribute on the ntfs volume
+ * @vol.
+ *
+ * If @base_ni is NULL start the search at the default allocator position.
+ *
+ * If @base_ni is not NULL start the search at the mft record after the base
+ * mft record @base_ni.
+ *
+ * Return the allocated mft record number on success and -errno on error. An
+ * error code of -ENOSPC means that there are no free mft records in the
+ * currently initialized mft bitmap.
+ *
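+ * For illustration: if a bitmap byte holds 0xb7 (binary 10110111), bits 0-2
+ * are already allocated, ffz() yields 3, and bit 3 of that byte identifies
+ * the mft record that is allocated and returned.
+ *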
+ * Locking: Caller must hold vol->mftbmp_lock for writing.
+ */
+static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(struct ntfs_volume *vol,
+ struct ntfs_inode *base_ni)
+{
+ s64 pass_end, ll, data_pos, pass_start, ofs, bit;
+ unsigned long flags;
+ struct address_space *mftbmp_mapping;
+ u8 *buf = NULL, *byte;
+ struct folio *folio;
+ unsigned int folio_ofs, size;
+ u8 pass, b;
+
+ ntfs_debug("Searching for free mft record in the currently initialized mft bitmap.");
+ mftbmp_mapping = vol->mftbmp_ino->i_mapping;
+ /*
+ * Set the end of the pass making sure we do not overflow the mft
+ * bitmap.
+ */
+ read_lock_irqsave(&NTFS_I(vol->mft_ino)->size_lock, flags);
+ pass_end = NTFS_I(vol->mft_ino)->allocated_size >>
+ vol->mft_record_size_bits;
+ read_unlock_irqrestore(&NTFS_I(vol->mft_ino)->size_lock, flags);
+ read_lock_irqsave(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
+ ll = NTFS_I(vol->mftbmp_ino)->initialized_size << 3;
+ read_unlock_irqrestore(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
+ if (pass_end > ll)
+ pass_end = ll;
+ pass = 1;
+ if (!base_ni)
+ data_pos = vol->mft_data_pos;
+ else
+ data_pos = base_ni->mft_no + 1;
+ if (data_pos < 24)
+ data_pos = 24;
+ if (data_pos >= pass_end) {
+ data_pos = 24;
+ pass = 2;
+ /* This happens on a freshly formatted volume. */
+ if (data_pos >= pass_end)
+ return -ENOSPC;
+ }
+ pass_start = data_pos;
+ ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, pass_end 0x%llx, data_pos 0x%llx.",
+ pass, pass_start, pass_end, data_pos);
+ /* Loop until a free mft record is found. */
+ for (; pass <= 2;) {
+ /* Cap size to pass_end. */
+ ofs = data_pos >> 3;
+ folio_ofs = ofs & ~PAGE_MASK;
+ size = PAGE_SIZE - folio_ofs;
+ ll = ((pass_end + 7) >> 3) - ofs;
+ if (size > ll)
+ size = ll;
+ size <<= 3;
+ /*
+ * If we are still within the active pass, search the next page
+ * for a zero bit.
+ */
+ if (size) {
+ folio = ntfs_read_mapping_folio(mftbmp_mapping,
+ ofs >> PAGE_SHIFT);
+ if (IS_ERR(folio)) {
+ ntfs_error(vol->sb, "Failed to read mft bitmap, aborting.");
+ return PTR_ERR(folio);
+ }
+ folio_lock(folio);
+ buf = (u8 *)kmap_local_folio(folio, 0) + folio_ofs;
+ bit = data_pos & 7;
+ data_pos &= ~7ull;
+ ntfs_debug("Before inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%llx",
+ size, data_pos, bit);
+ for (; bit < size && data_pos + bit < pass_end;
+ bit &= ~7ull, bit += 8) {
+ byte = buf + (bit >> 3);
+ if (*byte == 0xff)
+ continue;
+ b = ffz((unsigned long)*byte);
+ if (b < 8 && b >= (bit & 7)) {
+ ll = data_pos + (bit & ~7ull) + b;
+ if (unlikely(ll > (1ll << 32))) {
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, buf);
+ return -ENOSPC;
+ }
+ *byte |= 1 << b;
+ flush_dcache_folio(folio);
+ folio_mark_dirty(folio);
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, buf);
+ ntfs_debug("Done. (Found and allocated mft record 0x%llx.)",
+ ll);
+ return ll;
+ }
+ }
+ ntfs_debug("After inner for loop: size 0x%x, data_pos 0x%llx, bit 0x%llx",
+ size, data_pos, bit);
+ data_pos += size;
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, buf);
+ /*
+ * If the end of the pass has not been reached yet,
+ * continue searching the mft bitmap for a zero bit.
+ */
+ if (data_pos < pass_end)
+ continue;
+ }
+ /* Do the next pass. */
+ if (++pass == 2) {
+ /*
+ * Starting the second pass, in which we scan the first
+ * part of the zone which we omitted earlier.
+ */
+ pass_end = pass_start;
+ data_pos = pass_start = 24;
+ ntfs_debug("pass %i, pass_start 0x%llx, pass_end 0x%llx.",
+ pass, pass_start, pass_end);
+ if (data_pos >= pass_end)
+ break;
+ }
+ }
+ /* No free mft records in currently initialized mft bitmap. */
+ ntfs_debug("Done. (No free mft records left in currently initialized mft bitmap.)");
+ return -ENOSPC;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_allocation_nolock - extend mft bitmap by a cluster
+ * @vol: volume on which to extend the mft bitmap attribute
+ *
+ * Extend the mft bitmap attribute on the ntfs volume @vol by one cluster.
+ *
+ * Note: Only changes allocated_size, i.e. does not touch initialized_size or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ * - This function takes NTFS_I(vol->mftbmp_ino)->runlist.lock for
+ * writing and releases it before returning.
+ * - This function takes vol->lcnbmp_lock for writing and releases it
+ * before returning.
+ */
+static int ntfs_mft_bitmap_extend_allocation_nolock(struct ntfs_volume *vol)
+{
+ s64 lcn;
+ s64 ll;
+ unsigned long flags;
+ struct folio *folio;
+ struct ntfs_inode *mft_ni, *mftbmp_ni;
+ struct runlist_element *rl, *rl2 = NULL;
+ struct ntfs_attr_search_ctx *ctx = NULL;
+ struct mft_record *mrec;
+ struct attr_record *a = NULL;
+ int ret, mp_size;
+ u32 old_alen = 0;
+ u8 *b, tb;
+ struct {
+ u8 added_cluster:1;
+ u8 added_run:1;
+ u8 mp_rebuilt:1;
+ } status = { 0, 0, 0 };
+ size_t new_rl_count;
+
+ ntfs_debug("Extending mft bitmap allocation.");
+ mft_ni = NTFS_I(vol->mft_ino);
+ mftbmp_ni = NTFS_I(vol->mftbmp_ino);
+ /*
+ * Determine the last lcn of the mft bitmap. The allocated size of the
+ * mft bitmap cannot be zero so we are ok to do this.
+ */
+ down_write(&mftbmp_ni->runlist.lock);
+ read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ ll = mftbmp_ni->allocated_size;
+ read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+ rl = ntfs_attr_find_vcn_nolock(mftbmp_ni,
+ (ll - 1) >> vol->cluster_size_bits, NULL);
+ if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
+ up_write(&mftbmp_ni->runlist.lock);
+ ntfs_error(vol->sb,
+ "Failed to determine last allocated cluster of mft bitmap attribute.");
+ if (!IS_ERR(rl))
+ ret = -EIO;
+ else
+ ret = PTR_ERR(rl);
+ return ret;
+ }
+ lcn = rl->lcn + rl->length;
+ ntfs_debug("Last lcn of mft bitmap attribute is 0x%llx.",
+ (long long)lcn);
+ /*
+ * Attempt to get the cluster following the last allocated cluster by
+ * hand as it may be in the MFT zone so the allocator would not give it
+ * to us.
+ */
+ ll = lcn >> 3;
+ folio = ntfs_read_mapping_folio(vol->lcnbmp_ino->i_mapping,
+ ll >> PAGE_SHIFT);
+ if (IS_ERR(folio)) {
+ up_write(&mftbmp_ni->runlist.lock);
+ ntfs_error(vol->sb, "Failed to read from lcn bitmap.");
+ return PTR_ERR(folio);
+ }
+
+ down_write(&vol->lcnbmp_lock);
+ folio_lock(folio);
+ b = (u8 *)kmap_local_folio(folio, 0) + (ll & ~PAGE_MASK);
+ tb = 1 << (lcn & 7ull);
+ if (*b != 0xff && !(*b & tb)) {
+ /* Next cluster is free, allocate it. */
+ *b |= tb;
+ flush_dcache_folio(folio);
+ folio_mark_dirty(folio);
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, b);
+ up_write(&vol->lcnbmp_lock);
+ /* Update the mft bitmap runlist. */
+ rl->length++;
+ rl[1].vcn++;
+ status.added_cluster = 1;
+ ntfs_debug("Appending one cluster to mft bitmap.");
+ } else {
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, b);
+ up_write(&vol->lcnbmp_lock);
+ /* Allocate a cluster from the DATA_ZONE. */
+ rl2 = ntfs_cluster_alloc(vol, rl[1].vcn, 1, lcn, DATA_ZONE,
+ true, false, false);
+ if (IS_ERR(rl2)) {
+ up_write(&mftbmp_ni->runlist.lock);
+ ntfs_error(vol->sb,
+ "Failed to allocate a cluster for the mft bitmap.");
+ return PTR_ERR(rl2);
+ }
+ rl = ntfs_runlists_merge(&mftbmp_ni->runlist, rl2, 0, &new_rl_count);
+ if (IS_ERR(rl)) {
+ up_write(&mftbmp_ni->runlist.lock);
+ ntfs_error(vol->sb, "Failed to merge runlists for mft bitmap.");
+ if (ntfs_cluster_free_from_rl(vol, rl2)) {
+ ntfs_error(vol->sb, "Failed to deallocate allocated cluster.%s",
+ es);
+ NVolSetErrors(vol);
+ }
+ ntfs_free(rl2);
+ return PTR_ERR(rl);
+ }
+ mftbmp_ni->runlist.rl = rl;
+ mftbmp_ni->runlist.count = new_rl_count;
+ status.added_run = 1;
+ ntfs_debug("Adding one run to mft bitmap.");
+ /* Find the last run in the new runlist. */
+ for (; rl[1].length; rl++)
+ ;
+ }
+ /*
+ * Update the attribute record as well. Note: @rl is the last
+ * (non-terminator) runlist element of mft bitmap.
+ */
+ mrec = map_mft_record(mft_ni);
+ if (IS_ERR(mrec)) {
+ ntfs_error(vol->sb, "Failed to map mft record.");
+ ret = PTR_ERR(mrec);
+ goto undo_alloc;
+ }
+ ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+ if (unlikely(!ctx)) {
+ ntfs_error(vol->sb, "Failed to get search context.");
+ ret = -ENOMEM;
+ goto undo_alloc;
+ }
+ ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+ mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+ 0, ctx);
+ if (unlikely(ret)) {
+ ntfs_error(vol->sb,
+ "Failed to find last attribute extent of mft bitmap attribute.");
+ if (ret == -ENOENT)
+ ret = -EIO;
+ goto undo_alloc;
+ }
+ a = ctx->attr;
+ ll = le64_to_cpu(a->data.non_resident.lowest_vcn);
+ /* Search back for the previous last allocated cluster of mft bitmap. */
+ for (rl2 = rl; rl2 > mftbmp_ni->runlist.rl; rl2--) {
+ if (ll >= rl2->vcn)
+ break;
+ }
+ BUG_ON(ll < rl2->vcn);
+ BUG_ON(ll >= rl2->vcn + rl2->length);
+ /* Get the size for the new mapping pairs array for this extent. */
+ mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1);
+ if (unlikely(mp_size <= 0)) {
+ ntfs_error(vol->sb,
+ "Get size for mapping pairs failed for mft bitmap attribute extent.");
+ ret = mp_size;
+ if (!ret)
+ ret = -EIO;
+ goto undo_alloc;
+ }
+ /* Expand the attribute record if necessary. */
+ old_alen = le32_to_cpu(a->length);
+ ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
+ le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
+ if (unlikely(ret)) {
+ if (ret != -ENOSPC) {
+ ntfs_error(vol->sb,
+ "Failed to resize attribute record for mft bitmap attribute.");
+ goto undo_alloc;
+ }
+ /*
+ * Note: It will need to be a special mft record and if none of
+ * those are available it gets rather complicated...
+ */
+ ntfs_error(vol->sb,
+ "Not enough space in this mft record to accommodate extended mft bitmap attribute extent. Cannot handle this yet.");
+ ret = -EOPNOTSUPP;
+ goto undo_alloc;
+ }
+ status.mp_rebuilt = 1;
+ /* Generate the mapping pairs array directly into the attr record. */
+ ret = ntfs_mapping_pairs_build(vol, (u8 *)a +
+ le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+ mp_size, rl2, ll, -1, NULL, NULL, NULL);
+ if (unlikely(ret)) {
+ ntfs_error(vol->sb,
+ "Failed to build mapping pairs array for mft bitmap attribute.");
+ goto undo_alloc;
+ }
+ /* Update the highest_vcn. */
+ a->data.non_resident.highest_vcn = cpu_to_le64(rl[1].vcn - 1);
+ /*
+ * We now have extended the mft bitmap allocated_size by one cluster.
+	 * Reflect this in the struct ntfs_inode and the attribute record.
+ */
+ if (a->data.non_resident.lowest_vcn) {
+ /*
+ * We are not in the first attribute extent, switch to it, but
+ * first ensure the changes will make it to disk later.
+ */
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_reinit_search_ctx(ctx);
+ ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+ mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL,
+ 0, ctx);
+ if (unlikely(ret)) {
+ ntfs_error(vol->sb,
+ "Failed to find first attribute extent of mft bitmap attribute.");
+ goto restore_undo_alloc;
+ }
+ a = ctx->attr;
+ }
+ write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ mftbmp_ni->allocated_size += vol->cluster_size;
+ a->data.non_resident.allocated_size =
+ cpu_to_le64(mftbmp_ni->allocated_size);
+ write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+ /* Ensure the changes make it to disk. */
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+ up_write(&mftbmp_ni->runlist.lock);
+ ntfs_debug("Done.");
+ return 0;
+
+restore_undo_alloc:
+ ntfs_attr_reinit_search_ctx(ctx);
+ if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+ mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+ 0, ctx)) {
+ ntfs_error(vol->sb,
+ "Failed to find last attribute extent of mft bitmap attribute.%s", es);
+ write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ mftbmp_ni->allocated_size += vol->cluster_size;
+ write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+ up_write(&mftbmp_ni->runlist.lock);
+ /*
+ * The only thing that is now wrong is ->allocated_size of the
+ * base attribute extent which chkdsk should be able to fix.
+ */
+ NVolSetErrors(vol);
+ return ret;
+ }
+ a = ctx->attr;
+ a->data.non_resident.highest_vcn = cpu_to_le64(rl[1].vcn - 2);
+undo_alloc:
+ if (status.added_cluster) {
+ /* Truncate the last run in the runlist by one cluster. */
+ rl->length--;
+ rl[1].vcn--;
+ } else if (status.added_run) {
+ lcn = rl->lcn;
+ /* Remove the last run from the runlist. */
+ rl->lcn = rl[1].lcn;
+ rl->length = 0;
+ }
+ /* Deallocate the cluster. */
+ down_write(&vol->lcnbmp_lock);
+ if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
+ ntfs_error(vol->sb, "Failed to free allocated cluster.%s", es);
+ NVolSetErrors(vol);
+ } else
+ ntfs_inc_free_clusters(vol, 1);
+ up_write(&vol->lcnbmp_lock);
+ if (status.mp_rebuilt) {
+ if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
+ a->data.non_resident.mapping_pairs_offset),
+ old_alen - le16_to_cpu(
+ a->data.non_resident.mapping_pairs_offset),
+ rl2, ll, -1, NULL, NULL, NULL)) {
+ ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es);
+ NVolSetErrors(vol);
+ }
+ if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+ ntfs_error(vol->sb, "Failed to restore attribute record.%s", es);
+ NVolSetErrors(vol);
+ }
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ }
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+ if (!IS_ERR(mrec))
+ unmap_mft_record(mft_ni);
+ up_write(&mftbmp_ni->runlist.lock);
+ return ret;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_initialized_nolock - extend mftbmp initialized data
+ * @vol: volume on which to extend the mft bitmap attribute
+ *
+ * Extend the initialized portion of the mft bitmap attribute on the ntfs
+ * volume @vol by 8 bytes.
+ *
+ * Note: Only changes initialized_size and data_size, i.e. requires that
+ * allocated_size is big enough to fit the new initialized_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: Caller must hold vol->mftbmp_lock for writing.
+ */
+static int ntfs_mft_bitmap_extend_initialized_nolock(struct ntfs_volume *vol)
+{
+ s64 old_data_size, old_initialized_size;
+ unsigned long flags;
+ struct inode *mftbmp_vi;
+ struct ntfs_inode *mft_ni, *mftbmp_ni;
+ struct ntfs_attr_search_ctx *ctx;
+ struct mft_record *mrec;
+ struct attr_record *a;
+ int ret;
+
+	ntfs_debug("Extending mft bitmap initialized (and data) size.");
+ mft_ni = NTFS_I(vol->mft_ino);
+ mftbmp_vi = vol->mftbmp_ino;
+ mftbmp_ni = NTFS_I(mftbmp_vi);
+ /* Get the attribute record. */
+ mrec = map_mft_record(mft_ni);
+ if (IS_ERR(mrec)) {
+ ntfs_error(vol->sb, "Failed to map mft record.");
+ return PTR_ERR(mrec);
+ }
+ ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+ if (unlikely(!ctx)) {
+ ntfs_error(vol->sb, "Failed to get search context.");
+ ret = -ENOMEM;
+ goto unm_err_out;
+ }
+ ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+ mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx);
+ if (unlikely(ret)) {
+ ntfs_error(vol->sb,
+ "Failed to find first attribute extent of mft bitmap attribute.");
+ if (ret == -ENOENT)
+ ret = -EIO;
+ goto put_err_out;
+ }
+ a = ctx->attr;
+ write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ old_data_size = i_size_read(mftbmp_vi);
+ old_initialized_size = mftbmp_ni->initialized_size;
+ /*
+ * We can simply update the initialized_size before filling the space
+ * with zeroes because the caller is holding the mft bitmap lock for
+ * writing which ensures that no one else is trying to access the data.
+ */
+ mftbmp_ni->initialized_size += 8;
+ a->data.non_resident.initialized_size =
+ cpu_to_le64(mftbmp_ni->initialized_size);
+ if (mftbmp_ni->initialized_size > old_data_size) {
+ i_size_write(mftbmp_vi, mftbmp_ni->initialized_size);
+ a->data.non_resident.data_size =
+ cpu_to_le64(mftbmp_ni->initialized_size);
+ }
+ write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+ /* Ensure the changes make it to disk. */
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+ /* Initialize the mft bitmap attribute value with zeroes. */
+ ret = ntfs_attr_set(mftbmp_ni, old_initialized_size, 8, 0);
+ if (likely(!ret)) {
+		ntfs_debug("Done. (Wrote eight initialized bytes to mft bitmap.)");
+ ntfs_inc_free_mft_records(vol, 8 * 8);
+ return 0;
+ }
+ ntfs_error(vol->sb, "Failed to write to mft bitmap.");
+ /* Try to recover from the error. */
+ mrec = map_mft_record(mft_ni);
+ if (IS_ERR(mrec)) {
+ ntfs_error(vol->sb, "Failed to map mft record.%s", es);
+ NVolSetErrors(vol);
+ return ret;
+ }
+ ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+ if (unlikely(!ctx)) {
+ ntfs_error(vol->sb, "Failed to get search context.%s", es);
+ NVolSetErrors(vol);
+ goto unm_err_out;
+ }
+ if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+ mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+ ntfs_error(vol->sb,
+ "Failed to find first attribute extent of mft bitmap attribute.%s", es);
+ NVolSetErrors(vol);
+put_err_out:
+ ntfs_attr_put_search_ctx(ctx);
+unm_err_out:
+ unmap_mft_record(mft_ni);
+ goto err_out;
+ }
+ a = ctx->attr;
+ write_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ mftbmp_ni->initialized_size = old_initialized_size;
+ a->data.non_resident.initialized_size =
+ cpu_to_le64(old_initialized_size);
+ if (i_size_read(mftbmp_vi) != old_data_size) {
+ i_size_write(mftbmp_vi, old_data_size);
+ a->data.non_resident.data_size = cpu_to_le64(old_data_size);
+ }
+ write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+#ifdef DEBUG
+ read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+ mftbmp_ni->allocated_size, i_size_read(mftbmp_vi),
+ mftbmp_ni->initialized_size);
+ read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+err_out:
+ return ret;
+}
+
+/**
+ * ntfs_mft_data_extend_allocation_nolock - extend mft data attribute
+ * @vol: volume on which to extend the mft data attribute
+ *
+ * Extend the mft data attribute on the ntfs volume @vol by 16 mft records
+ * worth of clusters or if not enough space for this by one mft record worth
+ * of clusters.
+ *
+ * Note: Only changes allocated_size, i.e. does not touch initialized_size or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ * - This function takes NTFS_I(vol->mft_ino)->runlist.lock for
+ * writing and releases it before returning.
+ * - This function calls functions which take vol->lcnbmp_lock for
+ * writing and release it before returning.
+ */
+static int ntfs_mft_data_extend_allocation_nolock(struct ntfs_volume *vol)
+{
+ s64 lcn;
+ s64 old_last_vcn;
+ s64 min_nr, nr, ll;
+ unsigned long flags;
+ struct ntfs_inode *mft_ni;
+ struct runlist_element *rl, *rl2;
+ struct ntfs_attr_search_ctx *ctx = NULL;
+ struct mft_record *mrec;
+ struct attr_record *a = NULL;
+ int ret, mp_size;
+ u32 old_alen = 0;
+ bool mp_rebuilt = false;
+ size_t new_rl_count;
+
+ ntfs_debug("Extending mft data allocation.");
+ mft_ni = NTFS_I(vol->mft_ino);
+ /*
+ * Determine the preferred allocation location, i.e. the last lcn of
+ * the mft data attribute. The allocated size of the mft data
+ * attribute cannot be zero so we are ok to do this.
+ */
+ down_write(&mft_ni->runlist.lock);
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ ll = mft_ni->allocated_size;
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+ rl = ntfs_attr_find_vcn_nolock(mft_ni,
+ (ll - 1) >> vol->cluster_size_bits, NULL);
+ if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
+ up_write(&mft_ni->runlist.lock);
+ ntfs_error(vol->sb,
+ "Failed to determine last allocated cluster of mft data attribute.");
+ if (!IS_ERR(rl))
+ ret = -EIO;
+ else
+ ret = PTR_ERR(rl);
+ return ret;
+ }
+ lcn = rl->lcn + rl->length;
+ ntfs_debug("Last lcn of mft data attribute is 0x%llx.", lcn);
+ /* Minimum allocation is one mft record worth of clusters. */
+ min_nr = vol->mft_record_size >> vol->cluster_size_bits;
+ if (!min_nr)
+ min_nr = 1;
+ /* Want to allocate 16 mft records worth of clusters. */
+ nr = vol->mft_record_size << 4 >> vol->cluster_size_bits;
+ if (!nr)
+ nr = min_nr;
+ /* Ensure we do not go above 2^32-1 mft records. */
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ ll = mft_ni->allocated_size;
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+ if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
+ vol->mft_record_size_bits >= (1ll << 32))) {
+ nr = min_nr;
+ if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
+ vol->mft_record_size_bits >= (1ll << 32))) {
+ ntfs_warning(vol->sb,
+ "Cannot allocate mft record because the maximum number of inodes (2^32) has already been reached.");
+ up_write(&mft_ni->runlist.lock);
+ return -ENOSPC;
+ }
+ }
+ ntfs_debug("Trying mft data allocation with %s cluster count %lli.",
+ nr > min_nr ? "default" : "minimal", (long long)nr);
+ old_last_vcn = rl[1].vcn;
+	/*
+	 * We can release the mft_ni runlist lock because this function is
+	 * the only one that extends the $MFT data attribute and it is called
+	 * with mft_ni->mrec_lock held. This is required to preserve the lock
+	 * order: vol->lcnbmp_lock => mft_ni->runlist.lock.
+	 */
+ up_write(&mft_ni->runlist.lock);
+
+ do {
+ rl2 = ntfs_cluster_alloc(vol, old_last_vcn, nr, lcn, MFT_ZONE,
+ true, false, false);
+ if (!IS_ERR(rl2))
+ break;
+ if (PTR_ERR(rl2) != -ENOSPC || nr == min_nr) {
+ ntfs_error(vol->sb,
+ "Failed to allocate the minimal number of clusters (%lli) for the mft data attribute.",
+ nr);
+ return PTR_ERR(rl2);
+ }
+ /*
+ * There is not enough space to do the allocation, but there
+ * might be enough space to do a minimal allocation so try that
+ * before failing.
+ */
+ nr = min_nr;
+ ntfs_debug("Retrying mft data allocation with minimal cluster count %lli.", nr);
+ } while (1);
+
+ down_write(&mft_ni->runlist.lock);
+ rl = ntfs_runlists_merge(&mft_ni->runlist, rl2, 0, &new_rl_count);
+ if (IS_ERR(rl)) {
+ up_write(&mft_ni->runlist.lock);
+ ntfs_error(vol->sb, "Failed to merge runlists for mft data attribute.");
+ if (ntfs_cluster_free_from_rl(vol, rl2)) {
+ ntfs_error(vol->sb,
+ "Failed to deallocate clusters from the mft data attribute.%s", es);
+ NVolSetErrors(vol);
+ }
+ ntfs_free(rl2);
+ return PTR_ERR(rl);
+ }
+ mft_ni->runlist.rl = rl;
+ mft_ni->runlist.count = new_rl_count;
+ ntfs_debug("Allocated %lli clusters.", (long long)nr);
+ /* Find the last run in the new runlist. */
+ for (; rl[1].length; rl++)
+ ;
+ up_write(&mft_ni->runlist.lock);
+
+ /* Update the attribute record as well. */
+ mrec = map_mft_record(mft_ni);
+ if (IS_ERR(mrec)) {
+ ntfs_error(vol->sb, "Failed to map mft record.");
+ ret = PTR_ERR(mrec);
+ down_write(&mft_ni->runlist.lock);
+ goto undo_alloc;
+ }
+ ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+ if (unlikely(!ctx)) {
+ ntfs_error(vol->sb, "Failed to get search context.");
+ ret = -ENOMEM;
+ down_write(&mft_ni->runlist.lock);
+ goto undo_alloc;
+ }
+ ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+ CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx);
+ if (unlikely(ret)) {
+ ntfs_error(vol->sb, "Failed to find last attribute extent of mft data attribute.");
+ if (ret == -ENOENT)
+ ret = -EIO;
+ down_write(&mft_ni->runlist.lock);
+ goto undo_alloc;
+ }
+ a = ctx->attr;
+ ll = le64_to_cpu(a->data.non_resident.lowest_vcn);
+
+ down_write(&mft_ni->runlist.lock);
+	/* Search back for the previous last allocated cluster of mft data. */
+ for (rl2 = rl; rl2 > mft_ni->runlist.rl; rl2--) {
+ if (ll >= rl2->vcn)
+ break;
+ }
+ BUG_ON(ll < rl2->vcn);
+ BUG_ON(ll >= rl2->vcn + rl2->length);
+ /* Get the size for the new mapping pairs array for this extent. */
+ mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1, -1);
+ if (unlikely(mp_size <= 0)) {
+ ntfs_error(vol->sb,
+ "Get size for mapping pairs failed for mft data attribute extent.");
+ ret = mp_size;
+ if (!ret)
+ ret = -EIO;
+ goto undo_alloc;
+ }
+ /* Expand the attribute record if necessary. */
+ old_alen = le32_to_cpu(a->length);
+ ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
+ le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
+ if (unlikely(ret)) {
+ if (ret != -ENOSPC) {
+ ntfs_error(vol->sb,
+ "Failed to resize attribute record for mft data attribute.");
+ goto undo_alloc;
+ }
+ /*
+ * Note: Use the special reserved mft records and ensure that
+ * this extent is not required to find the mft record in
+ * question. If no free special records left we would need to
+ * move an existing record away, insert ours in its place, and
+ * then place the moved record into the newly allocated space
+ * and we would then need to update all references to this mft
+ * record appropriately. This is rather complicated...
+ */
+ ntfs_error(vol->sb,
+ "Not enough space in this mft record to accommodate extended mft data attribute extent. Cannot handle this yet.");
+ ret = -EOPNOTSUPP;
+ goto undo_alloc;
+ }
+ mp_rebuilt = true;
+ /* Generate the mapping pairs array directly into the attr record. */
+ ret = ntfs_mapping_pairs_build(vol, (u8 *)a +
+ le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+ mp_size, rl2, ll, -1, NULL, NULL, NULL);
+ if (unlikely(ret)) {
+ ntfs_error(vol->sb, "Failed to build mapping pairs array of mft data attribute.");
+ goto undo_alloc;
+ }
+ /* Update the highest_vcn. */
+ a->data.non_resident.highest_vcn = cpu_to_le64(rl[1].vcn - 1);
+ /*
+ * We now have extended the mft data allocated_size by nr clusters.
+	 * Reflect this in the struct ntfs_inode and the attribute record.
+ * @rl is the last (non-terminator) runlist element of mft data
+ * attribute.
+ */
+ up_write(&mft_ni->runlist.lock);
+ if (a->data.non_resident.lowest_vcn) {
+ /*
+ * We are not in the first attribute extent, switch to it, but
+ * first ensure the changes will make it to disk later.
+ */
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_reinit_search_ctx(ctx);
+ ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name,
+ mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0,
+ ctx);
+ if (unlikely(ret)) {
+ ntfs_error(vol->sb,
+ "Failed to find first attribute extent of mft data attribute.");
+ goto restore_undo_alloc;
+ }
+ a = ctx->attr;
+ }
+ write_lock_irqsave(&mft_ni->size_lock, flags);
+ mft_ni->allocated_size += nr << vol->cluster_size_bits;
+ a->data.non_resident.allocated_size =
+ cpu_to_le64(mft_ni->allocated_size);
+ write_unlock_irqrestore(&mft_ni->size_lock, flags);
+ /* Ensure the changes make it to disk. */
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+ ntfs_debug("Done.");
+ return 0;
+restore_undo_alloc:
+ ntfs_attr_reinit_search_ctx(ctx);
+ if (ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+ CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) {
+ ntfs_error(vol->sb,
+ "Failed to find last attribute extent of mft data attribute.%s", es);
+ write_lock_irqsave(&mft_ni->size_lock, flags);
+ mft_ni->allocated_size += nr << vol->cluster_size_bits;
+ write_unlock_irqrestore(&mft_ni->size_lock, flags);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+ up_write(&mft_ni->runlist.lock);
+ /*
+ * The only thing that is now wrong is ->allocated_size of the
+ * base attribute extent which chkdsk should be able to fix.
+ */
+ NVolSetErrors(vol);
+ return ret;
+ }
+ ctx->attr->data.non_resident.highest_vcn =
+ cpu_to_le64(old_last_vcn - 1);
+undo_alloc:
+ if (ntfs_cluster_free(mft_ni, old_last_vcn, -1, ctx) < 0) {
+ ntfs_error(vol->sb, "Failed to free clusters from mft data attribute.%s", es);
+ NVolSetErrors(vol);
+ }
+
+ if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) {
+ ntfs_error(vol->sb, "Failed to truncate mft data attribute runlist.%s", es);
+ NVolSetErrors(vol);
+ }
+ up_write(&mft_ni->runlist.lock);
+ if (ctx) {
+ a = ctx->attr;
+ if (mp_rebuilt && !IS_ERR(ctx->mrec)) {
+ if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
+ a->data.non_resident.mapping_pairs_offset),
+ old_alen - le16_to_cpu(
+ a->data.non_resident.mapping_pairs_offset),
+ rl2, ll, -1, NULL, NULL, NULL)) {
+ ntfs_error(vol->sb, "Failed to restore mapping pairs array.%s", es);
+ NVolSetErrors(vol);
+ }
+ if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+ ntfs_error(vol->sb, "Failed to restore attribute record.%s", es);
+ NVolSetErrors(vol);
+ }
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ } else if (IS_ERR(ctx->mrec)) {
+ ntfs_error(vol->sb, "Failed to restore attribute search context.%s", es);
+ NVolSetErrors(vol);
+ }
+ ntfs_attr_put_search_ctx(ctx);
+ }
+ if (!IS_ERR(mrec))
+ unmap_mft_record(mft_ni);
+ return ret;
+}
+
+/**
+ * ntfs_mft_record_layout - layout an mft record into a memory buffer
+ * @vol: volume to which the mft record will belong
+ * @mft_no: mft reference specifying the mft record number
+ * @m: destination buffer of size >= @vol->mft_record_size bytes
+ *
+ * Layout an empty, unused mft record with the mft record number @mft_no into
+ * the buffer @m. The volume @vol is needed because the mft record structure
+ * was modified in NTFS 3.1 so we need to know which volume version this mft
+ * record will be used on.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_layout(const struct ntfs_volume *vol, const s64 mft_no,
+ struct mft_record *m)
+{
+ struct attr_record *a;
+
+ ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+ if (mft_no >= (1ll << 32)) {
+ ntfs_error(vol->sb, "Mft record number 0x%llx exceeds maximum of 2^32.",
+ (long long)mft_no);
+ return -ERANGE;
+ }
+	/* Start by clearing the whole mft record to give us a clean slate. */
+ memset(m, 0, vol->mft_record_size);
+ /* Aligned to 2-byte boundary. */
+ if (vol->major_ver < 3 || (vol->major_ver == 3 && !vol->minor_ver))
+ m->usa_ofs = cpu_to_le16((sizeof(struct mft_record_old) + 1) & ~1);
+ else {
+ m->usa_ofs = cpu_to_le16((sizeof(struct mft_record) + 1) & ~1);
+ /*
+ * Set the NTFS 3.1+ specific fields while we know that the
+ * volume version is 3.1+.
+ */
+ m->reserved = 0;
+ m->mft_record_number = cpu_to_le32((u32)mft_no);
+ }
+ m->magic = magic_FILE;
+ if (vol->mft_record_size >= NTFS_BLOCK_SIZE)
+ m->usa_count = cpu_to_le16(vol->mft_record_size /
+ NTFS_BLOCK_SIZE + 1);
+ else {
+ m->usa_count = cpu_to_le16(1);
+		ntfs_warning(vol->sb,
+				"Sector size is bigger than mft record size. Setting usa_count to 1. chkdsk may report this as corruption.");
+ }
+ /* Set the update sequence number to 1. */
+ *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) = cpu_to_le16(1);
+ m->lsn = 0;
+ m->sequence_number = cpu_to_le16(1);
+ m->link_count = 0;
+ /*
+ * Place the attributes straight after the update sequence array,
+ * aligned to 8-byte boundary.
+ */
+ m->attrs_offset = cpu_to_le16((le16_to_cpu(m->usa_ofs) +
+ (le16_to_cpu(m->usa_count) << 1) + 7) & ~7);
+ m->flags = 0;
+ /*
+ * Using attrs_offset plus eight bytes (for the termination attribute).
+ * attrs_offset is already aligned to 8-byte boundary, so no need to
+ * align again.
+ */
+ m->bytes_in_use = cpu_to_le32(le16_to_cpu(m->attrs_offset) + 8);
+ m->bytes_allocated = cpu_to_le32(vol->mft_record_size);
+ m->base_mft_record = 0;
+ m->next_attr_instance = 0;
+ /* Add the termination attribute. */
+ a = (struct attr_record *)((u8 *)m + le16_to_cpu(m->attrs_offset));
+ a->type = AT_END;
+ a->length = 0;
+ ntfs_debug("Done.");
+ return 0;
+}
+
+/**
+ * ntfs_mft_record_format - format an mft record on an ntfs volume
+ * @vol: volume on which to format the mft record
+ * @mft_no: mft record number to format
+ *
+ * Format the mft record @mft_no in $MFT/$DATA, i.e. lay out an empty, unused
+ * mft record into the appropriate place of the mft data attribute. This is
+ * used when extending the mft data attribute.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_format(const struct ntfs_volume *vol, const s64 mft_no)
+{
+ loff_t i_size;
+ struct inode *mft_vi = vol->mft_ino;
+ struct folio *folio;
+ struct mft_record *m;
+ pgoff_t index, end_index;
+ unsigned int ofs;
+ int err;
+
+ ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+ /*
+ * The index into the page cache and the offset within the page cache
+ * page of the wanted mft record.
+ */
+ index = mft_no << vol->mft_record_size_bits >> PAGE_SHIFT;
+ ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
+ /* The maximum valid index into the page cache for $MFT's data. */
+ i_size = i_size_read(mft_vi);
+ end_index = i_size >> PAGE_SHIFT;
+ if (unlikely(index >= end_index)) {
+ if (unlikely(index > end_index ||
+ ofs + vol->mft_record_size > (i_size & ~PAGE_MASK))) {
+ ntfs_error(vol->sb, "Tried to format non-existing mft record 0x%llx.",
+ (long long)mft_no);
+ return -ENOENT;
+ }
+ }
+
+ /* Read, map, and pin the folio containing the mft record. */
+ folio = ntfs_read_mapping_folio(mft_vi->i_mapping, index);
+ if (IS_ERR(folio)) {
+		ntfs_error(vol->sb, "Failed to read folio containing mft record to format 0x%llx.",
+				(long long)mft_no);
+ return PTR_ERR(folio);
+ }
+ folio_lock(folio);
+ BUG_ON(!folio_test_uptodate(folio));
+ folio_clear_uptodate(folio);
+ m = (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs);
+ err = ntfs_mft_record_layout(vol, mft_no, m);
+ if (unlikely(err)) {
+ ntfs_error(vol->sb, "Failed to layout mft record 0x%llx.",
+ (long long)mft_no);
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, m);
+ return err;
+ }
+ pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size);
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ /*
+ * Make sure the mft record is written out to disk. We could use
+ * ilookup5() to check if an inode is in icache and so on but this is
+ * unnecessary as ntfs_writepage() will write the dirty record anyway.
+ */
+ mark_ntfs_record_dirty(folio);
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, m);
+ ntfs_debug("Done.");
+ return 0;
+}
+
+/**
+ * ntfs_mft_record_alloc - allocate an mft record on an ntfs volume
+ * @vol:	[IN] volume on which to allocate the mft record
+ * @mode:	[IN] mode if want a file or directory, i.e. base inode or 0
+ * @ni:	[OUT] on successful return this is the opened ntfs inode
+ * @base_ni:	[IN] open base inode if allocating an extent mft record or NULL
+ * @ni_mrec:	[OUT] on successful return this is the mapped mft record
+ *
+ * Allocate an mft record in $MFT/$DATA of an open ntfs volume @vol.
+ *
+ * If @base_ni is NULL make the mft record a base mft record, i.e. a file or
+ * directory inode, and allocate it at the default allocator position. In
+ * this case @mode is the file mode as given to us by the caller. We in
+ * particular use @mode to distinguish whether a file or a directory is being
+ * created (S_ISDIR(mode) and S_ISREG(mode), respectively).
+ *
+ * If @base_ni is not NULL make the allocated mft record an extent record,
+ * allocate it starting at the mft record after the base mft record and attach
+ * the allocated and opened ntfs inode to the base inode @base_ni. In this
+ * case @mode must be 0 as it is meaningless for extent inodes.
+ *
+ * Return 0 on success, in which case *@ni is set to the now opened ntfs
+ * inode of the allocated mft record and *@ni_mrec is set to the allocated,
+ * mapped, pinned, and locked mft record. On error, return the negative
+ * error code; *@ni and *@ni_mrec are undefined in this case.
+ *
+ * Allocation strategy:
+ *
+ * To find a free mft record, we scan the mft bitmap for a zero bit. To
+ * optimize this we start scanning at the place specified by @base_ni or if
+ * @base_ni is NULL we start where we last stopped and we perform wrap around
+ * when we reach the end. Note, we do not try to allocate mft records below
+ * number 24 because numbers 0 to 15 are the defined system files anyway and 16
+ * to 23 are special in that they are used for storing extension mft records
+ * for the $DATA attribute of $MFT. This is required to avoid the possibility
+ * of creating a runlist with a circular dependency which once written to disk
+ * can never be read in again. Windows will only use records 16 to 23 for
+ * normal files if the volume is completely out of space. We never use them
+ * which means that when the volume is really out of space we cannot create any
+ * more files while Windows can still create up to 8 small files. We can start
+ * doing this at some later time; it does not matter much for now.
+ *
+ * When scanning the mft bitmap, we only search up to the last allocated mft
+ * record. If there are no free records left in the range 24 to number of
+ * allocated mft records, then we extend the $MFT/$DATA attribute in order to
+ * create free mft records. We extend the allocated size of $MFT/$DATA by 16
+ * records at a time or one cluster, if cluster size is above 16kiB. If there
+ * is not sufficient space to do this, we try to extend by a single mft record
+ * or one cluster, if cluster size is above the mft record size.
+ *
+ * No matter how many mft records we allocate, we initialize only the first
+ * allocated mft record, incrementing mft data size and initialized size
+ * accordingly, open a struct ntfs_inode for it and return it to the caller, unless
+ * there are fewer than 24 mft records, in which case we allocate and initialize
+ * mft records until we reach record 24 which we consider as the first free mft
+ * record for use by normal files.
+ *
+ * If during any stage we overflow the initialized data in the mft bitmap, we
+ * extend the initialized size (and data size) by 8 bytes, allocating another
+ * cluster if required. The bitmap data size has to be at least equal to the
+ * number of mft records in the mft, but it can be bigger, in which case the
+ * superfluous bits are padded with zeroes.
+ *
+ * Thus, when we return successfully (return value 0), we will have:
+ * - initialized / extended the mft bitmap if necessary,
+ * - initialized / extended the mft data if necessary,
+ * - set the bit corresponding to the mft record being allocated in the
+ * mft bitmap,
+ * - opened a struct ntfs_inode for the allocated mft record, and we will have
+ * - returned the struct ntfs_inode as well as the allocated, mapped, pinned, and
+ * locked mft record.
+ *
+ * On error, the volume will be left in a consistent state and no record will
+ * be allocated. If rolling back a partial operation fails, we may leave some
+ * inconsistent metadata, in which case we call NVolSetErrors() so the volume is
+ * left dirty when unmounted.
+ *
+ * Note, this function cannot make use of most of the normal functions, like
+ * for example for attribute resizing, etc, because when the run list overflows
+ * the base mft record and an attribute list is used, it is very important that
+ * the extension mft records used to store the $DATA attribute of $MFT can be
+ * reached without having to read the information contained inside them, as
+ * this would make it impossible to find them in the first place after the
+ * volume is unmounted. $MFT/$BITMAP probably does not need to follow this
+ * rule because the bitmap is not essential for finding the mft records, but on
+ * the other hand, handling the bitmap in this special way would make life
+ * easier because otherwise there might be circular invocations of functions
+ * when reading the bitmap.
+ */
+int ntfs_mft_record_alloc(struct ntfs_volume *vol, const int mode,
+ struct ntfs_inode **ni, struct ntfs_inode *base_ni,
+ struct mft_record **ni_mrec)
+{
+ s64 ll, bit, old_data_initialized, old_data_size;
+ unsigned long flags;
+ struct folio *folio;
+ struct ntfs_inode *mft_ni, *mftbmp_ni;
+ struct ntfs_attr_search_ctx *ctx;
+ struct mft_record *m = NULL;
+ struct attr_record *a;
+ pgoff_t index;
+ unsigned int ofs;
+ int err;
+ __le16 seq_no, usn;
+ bool record_formatted = false;
+ unsigned int memalloc_flags;
+
+ if (base_ni && *ni)
+ return -EINVAL;
+
+ if (base_ni) {
+ ntfs_debug("Entering (allocating an extent mft record for base mft record 0x%llx).",
+ (long long)base_ni->mft_no);
+ /* @mode and @base_ni are mutually exclusive. */
+ BUG_ON(mode);
+ } else
+ ntfs_debug("Entering (allocating a base mft record).");
+ if (mode) {
+ /* @mode and @base_ni are mutually exclusive. */
+ BUG_ON(base_ni);
+ }
+
+ memalloc_flags = memalloc_nofs_save();
+
+ mft_ni = NTFS_I(vol->mft_ino);
+ mutex_lock(&mft_ni->mrec_lock);
+ mftbmp_ni = NTFS_I(vol->mftbmp_ino);
+search_free_rec:
+ down_write(&vol->mftbmp_lock);
+ bit = ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(vol, base_ni);
+ if (bit >= 0) {
+ ntfs_debug("Found and allocated free record (#1), bit 0x%llx.",
+ (long long)bit);
+ goto have_alloc_rec;
+ }
+ if (bit != -ENOSPC) {
+ up_write(&vol->mftbmp_lock);
+ mutex_unlock(&mft_ni->mrec_lock);
+ memalloc_nofs_restore(memalloc_flags);
+ return bit;
+ }
+
+ /*
+ * No free mft records left. If the mft bitmap already covers more
+ * than the currently used mft records, the next records are all free,
+ * so we can simply allocate the first unused mft record.
+ * Note: We also have to make sure that the mft bitmap at least covers
+ * the first 24 mft records as they are special and whilst they may not
+ * be in use, we do not allocate from them.
+ */
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ ll = mft_ni->initialized_size >> vol->mft_record_size_bits;
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+ read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ old_data_initialized = mftbmp_ni->initialized_size;
+ read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+ if (old_data_initialized << 3 > ll && old_data_initialized > 3) {
+ bit = ll;
+ if (bit < 24)
+ bit = 24;
+ if (unlikely(bit >= (1ll << 32)))
+ goto max_err_out;
+ ntfs_debug("Found free record (#2), bit 0x%llx.",
+ (long long)bit);
+ goto found_free_rec;
+ }
+ /*
+ * The mft bitmap needs to be expanded until it covers the first unused
+ * mft record that we can allocate.
+ * Note: The smallest mft record we allocate is mft record 24.
+ */
+ bit = old_data_initialized << 3;
+ if (unlikely(bit >= (1ll << 32)))
+ goto max_err_out;
+ read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ old_data_size = mftbmp_ni->allocated_size;
+ ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+ old_data_size, i_size_read(vol->mftbmp_ino),
+ old_data_initialized);
+ read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+ if (old_data_initialized + 8 > old_data_size) {
+ /* Need to extend bitmap by one more cluster. */
+ ntfs_debug("mftbmp: initialized_size + 8 > allocated_size.");
+ err = ntfs_mft_bitmap_extend_allocation_nolock(vol);
+ if (unlikely(err)) {
+ up_write(&vol->mftbmp_lock);
+ goto err_out;
+ }
+#ifdef DEBUG
+ read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ ntfs_debug("Status of mftbmp after allocation extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+ mftbmp_ni->allocated_size,
+ i_size_read(vol->mftbmp_ino),
+ mftbmp_ni->initialized_size);
+ read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+ }
+ /*
+ * We now have sufficient allocated space, extend the initialized_size
+ * as well as the data_size if necessary and fill the new space with
+ * zeroes.
+ */
+ err = ntfs_mft_bitmap_extend_initialized_nolock(vol);
+ if (unlikely(err)) {
+ up_write(&vol->mftbmp_lock);
+ goto err_out;
+ }
+#ifdef DEBUG
+ read_lock_irqsave(&mftbmp_ni->size_lock, flags);
+ ntfs_debug("Status of mftbmp after initialized extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+ mftbmp_ni->allocated_size,
+ i_size_read(vol->mftbmp_ino),
+ mftbmp_ni->initialized_size);
+ read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
+#endif /* DEBUG */
+ ntfs_debug("Found free record (#3), bit 0x%llx.", (long long)bit);
+found_free_rec:
+ /* @bit is the found free mft record, allocate it in the mft bitmap. */
+ ntfs_debug("At found_free_rec.");
+ err = ntfs_bitmap_set_bit(vol->mftbmp_ino, bit);
+ if (unlikely(err)) {
+ ntfs_error(vol->sb, "Failed to allocate bit in mft bitmap.");
+ up_write(&vol->mftbmp_lock);
+ goto err_out;
+ }
+ ntfs_debug("Set bit 0x%llx in mft bitmap.", (long long)bit);
+have_alloc_rec:
+ /*
+ * The mft bitmap is now uptodate. Deal with mft data attribute now.
+ * Note, we keep hold of the mft bitmap lock for writing until all
+ * modifications to the mft data attribute are complete, too, as they
+ * will impact decisions for mft bitmap and mft record allocation done
+ * by a parallel allocation and if the lock is not maintained a
+ * parallel allocation could allocate the same mft record as this one.
+ */
+ ll = (bit + 1) << vol->mft_record_size_bits;
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ old_data_initialized = mft_ni->initialized_size;
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+ if (ll <= old_data_initialized) {
+ ntfs_debug("Allocated mft record already initialized.");
+ goto mft_rec_already_initialized;
+ }
+ ntfs_debug("Initializing allocated mft record.");
+ /*
+ * The mft record is outside the initialized data. Extend the mft data
+ * attribute until it covers the allocated record. The loop is only
+ * actually traversed more than once when a freshly formatted volume is
+ * first written to so it optimizes away nicely in the common case.
+ */
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ ntfs_debug("Status of mft data before extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+ mft_ni->allocated_size, i_size_read(vol->mft_ino),
+ mft_ni->initialized_size);
+ while (ll > mft_ni->allocated_size) {
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+ err = ntfs_mft_data_extend_allocation_nolock(vol);
+ if (unlikely(err)) {
+ ntfs_error(vol->sb, "Failed to extend mft data allocation.");
+ goto undo_mftbmp_alloc_nolock;
+ }
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ ntfs_debug("Status of mft data after allocation extension: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+ mft_ni->allocated_size, i_size_read(vol->mft_ino),
+ mft_ni->initialized_size);
+ }
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+ /*
+ * Extend mft data initialized size (and data size of course) to reach
+ * the allocated mft record, formatting the mft records along the way.
+ * Note: We only modify the struct ntfs_inode as that is all that is
+ * needed by ntfs_mft_record_format(). We will update the attribute
+ * record itself in one fell swoop later on.
+ */
+ write_lock_irqsave(&mft_ni->size_lock, flags);
+ old_data_initialized = mft_ni->initialized_size;
+ old_data_size = vol->mft_ino->i_size;
+ while (ll > mft_ni->initialized_size) {
+ s64 new_initialized_size, mft_no;
+
+ new_initialized_size = mft_ni->initialized_size +
+ vol->mft_record_size;
+ mft_no = mft_ni->initialized_size >> vol->mft_record_size_bits;
+ if (new_initialized_size > i_size_read(vol->mft_ino))
+ i_size_write(vol->mft_ino, new_initialized_size);
+ write_unlock_irqrestore(&mft_ni->size_lock, flags);
+ ntfs_debug("Initializing mft record 0x%llx.",
+ (long long)mft_no);
+ err = ntfs_mft_record_format(vol, mft_no);
+ if (unlikely(err)) {
+ ntfs_error(vol->sb, "Failed to format mft record.");
+ goto undo_data_init;
+ }
+ write_lock_irqsave(&mft_ni->size_lock, flags);
+ mft_ni->initialized_size = new_initialized_size;
+ }
+ write_unlock_irqrestore(&mft_ni->size_lock, flags);
+ record_formatted = true;
+ /* Update the mft data attribute record to reflect the new sizes. */
+ m = map_mft_record(mft_ni);
+ if (IS_ERR(m)) {
+ ntfs_error(vol->sb, "Failed to map mft record.");
+ err = PTR_ERR(m);
+ goto undo_data_init;
+ }
+ ctx = ntfs_attr_get_search_ctx(mft_ni, m);
+ if (unlikely(!ctx)) {
+ ntfs_error(vol->sb, "Failed to get search context.");
+ err = -ENOMEM;
+ unmap_mft_record(mft_ni);
+ goto undo_data_init;
+ }
+ err = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+ CASE_SENSITIVE, 0, NULL, 0, ctx);
+ if (unlikely(err)) {
+ ntfs_error(vol->sb, "Failed to find first attribute extent of mft data attribute.");
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+ goto undo_data_init;
+ }
+ a = ctx->attr;
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ a->data.non_resident.initialized_size =
+ cpu_to_le64(mft_ni->initialized_size);
+ a->data.non_resident.data_size =
+ cpu_to_le64(i_size_read(vol->mft_ino));
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+ /* Ensure the changes make it to disk. */
+ mark_mft_record_dirty(ctx->ntfs_ino);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(mft_ni);
+ read_lock_irqsave(&mft_ni->size_lock, flags);
+ ntfs_debug("Status of mft data after mft record initialization: allocated_size 0x%llx, data_size 0x%llx, initialized_size 0x%llx.",
+ mft_ni->allocated_size, i_size_read(vol->mft_ino),
+ mft_ni->initialized_size);
+ BUG_ON(i_size_read(vol->mft_ino) > mft_ni->allocated_size);
+ BUG_ON(mft_ni->initialized_size > i_size_read(vol->mft_ino));
+ read_unlock_irqrestore(&mft_ni->size_lock, flags);
+mft_rec_already_initialized:
+ /*
+ * We can finally drop the mft bitmap lock as the mft data attribute
+ * has been fully updated. The only disparity left is that the
+ * allocated mft record still needs to be marked as in use to match the
+ * set bit in the mft bitmap but this is actually not a problem since
+ * this mft record is not referenced from anywhere yet and the fact
+ * that it is allocated in the mft bitmap means that no-one will try to
+ * allocate it either.
+ */
+ up_write(&vol->mftbmp_lock);
+ /*
+ * We now have allocated and initialized the mft record. Calculate the
+ * index of and the offset within the page cache page the record is in.
+ */
+ index = bit << vol->mft_record_size_bits >> PAGE_SHIFT;
+ ofs = (bit << vol->mft_record_size_bits) & ~PAGE_MASK;
+ /* Read, map, and pin the folio containing the mft record. */
+ folio = ntfs_read_mapping_folio(vol->mft_ino->i_mapping, index);
+ if (IS_ERR(folio)) {
+ ntfs_error(vol->sb, "Failed to map page containing allocated mft record 0x%llx.",
+ bit);
+ err = PTR_ERR(folio);
+ goto undo_mftbmp_alloc;
+ }
+ folio_lock(folio);
+ BUG_ON(!folio_test_uptodate(folio));
+ folio_clear_uptodate(folio);
+ m = (struct mft_record *)((u8 *)kmap_local_folio(folio, 0) + ofs);
+ /* If we just formatted the mft record no need to do it again. */
+ if (!record_formatted) {
+ /* Sanity check that the mft record is really not in use. */
+ if (ntfs_is_file_record(m->magic) &&
+ (m->flags & MFT_RECORD_IN_USE)) {
+ ntfs_warning(vol->sb,
+ "Mft record 0x%llx was marked free in mft bitmap but is marked used itself. Unmount and run chkdsk.",
+ bit);
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, m);
+ NVolSetErrors(vol);
+ goto search_free_rec;
+ }
+ /*
+ * We need to (re-)format the mft record, preserving the
+ * sequence number if it is not zero as well as the update
+ * sequence number if it is not zero or -1 (0xffff). This
+ * means we do not need to care whether or not something went
+ * wrong with the previous mft record.
+ */
+ seq_no = m->sequence_number;
+ usn = *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs));
+ err = ntfs_mft_record_layout(vol, bit, m);
+ if (unlikely(err)) {
+ ntfs_error(vol->sb, "Failed to layout allocated mft record 0x%llx.",
+ bit);
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, m);
+ goto undo_mftbmp_alloc;
+ }
+ if (seq_no)
+ m->sequence_number = seq_no;
+ if (usn && le16_to_cpu(usn) != 0xffff)
+ *(__le16 *)((u8 *)m + le16_to_cpu(m->usa_ofs)) = usn;
+ pre_write_mst_fixup((struct ntfs_record *)m, vol->mft_record_size);
+ }
+ /* Set the mft record itself in use. */
+ m->flags |= MFT_RECORD_IN_USE;
+ if (S_ISDIR(mode))
+ m->flags |= MFT_RECORD_IS_DIRECTORY;
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ if (base_ni) {
+ struct mft_record *m_tmp;
+
+ /*
+ * Setup the base mft record in the extent mft record. This
+ * completes initialization of the allocated extent mft record
+ * and we can simply use it with map_extent_mft_record().
+ */
+ m->base_mft_record = MK_LE_MREF(base_ni->mft_no,
+ base_ni->seq_no);
+ /*
+ * Allocate an extent inode structure for the new mft record,
+	 * attach it to the base inode @base_ni, and map, pin, and
+	 * lock its (i.e. the allocated) mft record.
+ */
+ m_tmp = map_extent_mft_record(base_ni,
+ MK_MREF(bit, le16_to_cpu(m->sequence_number)),
+ ni);
+ if (IS_ERR(m_tmp)) {
+ ntfs_error(vol->sb, "Failed to map allocated extent mft record 0x%llx.",
+ bit);
+ err = PTR_ERR(m_tmp);
+ /* Set the mft record itself not in use. */
+ m->flags &= cpu_to_le16(
+ ~le16_to_cpu(MFT_RECORD_IN_USE));
+ flush_dcache_folio(folio);
+ /* Make sure the mft record is written out to disk. */
+ mark_ntfs_record_dirty(folio);
+ folio_unlock(folio);
+ ntfs_unmap_folio(folio, m);
+ goto undo_mftbmp_alloc;
+ }
+
+ /*
+ * Make sure the allocated mft record is written out to disk.
+ * No need to set the inode dirty because the caller is going
+ * to do that anyway after finishing with the new extent mft
+ * record (e.g. at a minimum a new attribute will be added to
+	 * the mft record).
+ */
+ mark_ntfs_record_dirty(folio);
+ folio_unlock(folio);
+ /*
+ * Need to unmap the page since map_extent_mft_record() mapped
+ * it as well so we have it mapped twice at the moment.
+ */
+ ntfs_unmap_folio(folio, m);
+ } else {
+ /*
+ * Manually map, pin, and lock the mft record as we already
+ * have its page mapped and it is very easy to do.
+ */
+ (*ni)->seq_no = le16_to_cpu(m->sequence_number);
+ /*
+ * Make sure the allocated mft record is written out to disk.
+ * NOTE: We do not set the ntfs inode dirty because this would
+ * fail in ntfs_write_inode() because the inode does not have a
+ * standard information attribute yet. Also, there is no need
+ * to set the inode dirty because the caller is going to do
+ * that anyway after finishing with the new mft record (e.g. at
+ * a minimum some new attributes will be added to the mft
+	 * record).
+ */
+
+ (*ni)->mrec = kmalloc(vol->mft_record_size, GFP_NOFS);
+		if (!(*ni)->mrec) {
+			err = -ENOMEM;
+			folio_unlock(folio);
+			ntfs_unmap_folio(folio, m);
+			goto undo_mftbmp_alloc;
+		}
+
+ memcpy((*ni)->mrec, m, vol->mft_record_size);
+ post_read_mst_fixup((struct ntfs_record *)(*ni)->mrec, vol->mft_record_size);
+ mark_ntfs_record_dirty(folio);
+ folio_unlock(folio);
+ (*ni)->folio = folio;
+ (*ni)->folio_ofs = ofs;
+ atomic_inc(&(*ni)->count);
+ /* Update the default mft allocation position. */
+ vol->mft_data_pos = bit + 1;
+ }
+ mutex_unlock(&NTFS_I(vol->mft_ino)->mrec_lock);
+ memalloc_nofs_restore(memalloc_flags);
+
+ /*
+ * Return the opened, allocated inode of the allocated mft record as
+ * well as the mapped, pinned, and locked mft record.
+ */
+ ntfs_debug("Returning opened, allocated %sinode 0x%llx.",
+ base_ni ? "extent " : "", bit);
+ (*ni)->mft_no = bit;
+ if (ni_mrec)
+ *ni_mrec = (*ni)->mrec;
+ ntfs_dec_free_mft_records(vol, 1);
+ return 0;
+undo_data_init:
+ write_lock_irqsave(&mft_ni->size_lock, flags);
+ mft_ni->initialized_size = old_data_initialized;
+ i_size_write(vol->mft_ino, old_data_size);
+ write_unlock_irqrestore(&mft_ni->size_lock, flags);
+ goto undo_mftbmp_alloc_nolock;
+undo_mftbmp_alloc:
+ down_write(&vol->mftbmp_lock);
+undo_mftbmp_alloc_nolock:
+ if (ntfs_bitmap_clear_bit(vol->mftbmp_ino, bit)) {
+ ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
+ NVolSetErrors(vol);
+ }
+ up_write(&vol->mftbmp_lock);
+err_out:
+ mutex_unlock(&mft_ni->mrec_lock);
+ memalloc_nofs_restore(memalloc_flags);
+ return err;
+max_err_out:
+ ntfs_warning(vol->sb,
+ "Cannot allocate mft record because the maximum number of inodes (2^32) has already been reached.");
+ up_write(&vol->mftbmp_lock);
+ mutex_unlock(&NTFS_I(vol->mft_ino)->mrec_lock);
+ memalloc_nofs_restore(memalloc_flags);
+ return -ENOSPC;
+}
+
+/**
+ * ntfs_mft_record_free - free an mft record on an ntfs volume
+ * @vol: volume on which to free the mft record
+ * @ni: open ntfs inode of the mft record to free
+ *
+ * Free the mft record of the open inode @ni on the mounted ntfs volume @vol.
+ * Note that this function calls ntfs_inode_close() internally and hence you
+ * cannot use the pointer @ni any more after this function returns success.
+ *
+ * Return 0 on success and a negative error code on error.
+ */
+int ntfs_mft_record_free(struct ntfs_volume *vol, struct ntfs_inode *ni)
+{
+ u64 mft_no;
+ int err;
+ u16 seq_no;
+ __le16 old_seq_no;
+ struct mft_record *ni_mrec;
+ unsigned int memalloc_flags;
+
+	if (!vol || !ni)
+		return -EINVAL;
+
+	ntfs_debug("Entering for inode 0x%llx.", (long long)ni->mft_no);
+
+ ni_mrec = map_mft_record(ni);
+	if (IS_ERR(ni_mrec))
+		return PTR_ERR(ni_mrec);
+
+ /* Cache the mft reference for later. */
+ mft_no = ni->mft_no;
+
+ /* Mark the mft record as not in use. */
+ ni_mrec->flags &= ~MFT_RECORD_IN_USE;
+
+ /* Increment the sequence number, skipping zero, if it is not zero. */
+ old_seq_no = ni_mrec->sequence_number;
+ seq_no = le16_to_cpu(old_seq_no);
+ if (seq_no == 0xffff)
+ seq_no = 1;
+ else if (seq_no)
+ seq_no++;
+ ni_mrec->sequence_number = cpu_to_le16(seq_no);
+
+ /*
+ * Set the ntfs inode dirty and write it out. We do not need to worry
+ * about the base inode here since whatever caused the extent mft
+ * record to be freed is guaranteed to do it already.
+ */
+ NInoSetDirty(ni);
+ err = write_mft_record(ni, ni_mrec, 0);
+ if (err)
+ goto sync_rollback;
+
+ /* Clear the bit in the $MFT/$BITMAP corresponding to this record. */
+ memalloc_flags = memalloc_nofs_save();
+ down_write(&vol->mftbmp_lock);
+ err = ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no);
+ up_write(&vol->mftbmp_lock);
+ memalloc_nofs_restore(memalloc_flags);
+ if (err)
+ goto bitmap_rollback;
+
+ unmap_mft_record(ni);
+ ntfs_inc_free_mft_records(vol, 1);
+ return 0;
+
+ /* Rollback what we did... */
+bitmap_rollback:
+ memalloc_flags = memalloc_nofs_save();
+ down_write(&vol->mftbmp_lock);
+ if (ntfs_bitmap_set_bit(vol->mftbmp_ino, mft_no))
+		ntfs_error(vol->sb, "ntfs_bitmap_set_bit failed in bitmap_rollback");
+ up_write(&vol->mftbmp_lock);
+ memalloc_nofs_restore(memalloc_flags);
+sync_rollback:
+	ntfs_error(vol->sb,
+		"Eeek! Rollback failed in %s. Leaving inconsistent metadata!", __func__);
+ ni_mrec->flags |= MFT_RECORD_IN_USE;
+ ni_mrec->sequence_number = old_seq_no;
+ NInoSetDirty(ni);
+ write_mft_record(ni, ni_mrec, 0);
+ unmap_mft_record(ni);
+ return err;
+}
diff --git a/fs/ntfsplus/mst.c b/fs/ntfsplus/mst.c
new file mode 100644
index 000000000000..e88f52831cb8
--- /dev/null
+++ b/fs/ntfsplus/mst.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS multi sector transfer protection handling code.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ */
+
+#include <linux/ratelimit.h>
+
+#include "ntfs.h"
+
+/**
+ * post_read_mst_fixup - deprotect multi sector transfer protected data
+ * @b: pointer to the data to deprotect
+ * @size: size in bytes of @b
+ *
+ * Perform the necessary post read multi sector transfer fixup and detect the
+ * presence of incomplete multi sector transfers. - In that case, overwrite the
+ * magic of the ntfs record header being processed with "BAAD" (in memory only!)
+ * and abort processing.
+ *
+ * Return 0 on success and -EINVAL on error ("BAAD" magic will be present).
+ *
+ * NOTE: We consider the absence / invalidity of an update sequence array to
+ * mean that the structure is not protected at all and hence doesn't need to
+ * be fixed up. Thus, we return success and not failure in this case. This is
+ * in contrast to pre_write_mst_fixup(), see below.
+ */
+int post_read_mst_fixup(struct ntfs_record *b, const u32 size)
+{
+ u16 usa_ofs, usa_count, usn;
+ u16 *usa_pos, *data_pos;
+
+ /* Setup the variables. */
+ usa_ofs = le16_to_cpu(b->usa_ofs);
+ /* Decrement usa_count to get number of fixups. */
+ usa_count = le16_to_cpu(b->usa_count) - 1;
+ /* Size and alignment checks. */
+ if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 ||
+ usa_ofs + (usa_count * 2) > size ||
+ (size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
+ return 0;
+ /* Position of usn in update sequence array. */
+ usa_pos = (u16 *)b + usa_ofs/sizeof(u16);
+ /*
+ * The update sequence number which has to be equal to each of the
+ * u16 values before they are fixed up. Note no need to care for
+ * endianness since we are comparing and moving data for on disk
+ * structures which means the data is consistent. - If it is
+	 * consistently the wrong endianness it doesn't make any difference.
+ */
+ usn = *usa_pos;
+ /*
+ * Position in protected data of first u16 that needs fixing up.
+ */
+ data_pos = (u16 *)b + NTFS_BLOCK_SIZE / sizeof(u16) - 1;
+ /*
+ * Check for incomplete multi sector transfer(s).
+ */
+ while (usa_count--) {
+ if (*data_pos != usn) {
+ struct mft_record *m = (struct mft_record *)b;
+
+ pr_err_ratelimited("ntfs: Incomplete multi sector transfer detected! (Record magic : 0x%x, mft number : 0x%x, base mft number : 0x%lx, mft in use : %d, data : 0x%x, usn 0x%x)\n",
+ le32_to_cpu(m->magic), le32_to_cpu(m->mft_record_number),
+ MREF_LE(m->base_mft_record), m->flags & MFT_RECORD_IN_USE,
+ *data_pos, usn);
+ /*
+ * Incomplete multi sector transfer detected! )-:
+ * Set the magic to "BAAD" and return failure.
+ * Note that magic_BAAD is already converted to le32.
+ */
+ b->magic = magic_BAAD;
+ return -EINVAL;
+ }
+ data_pos += NTFS_BLOCK_SIZE / sizeof(u16);
+ }
+ /* Re-setup the variables. */
+ usa_count = le16_to_cpu(b->usa_count) - 1;
+ data_pos = (u16 *)b + NTFS_BLOCK_SIZE / sizeof(u16) - 1;
+ /* Fixup all sectors. */
+ while (usa_count--) {
+ /*
+ * Increment position in usa and restore original data from
+ * the usa into the data buffer.
+ */
+ *data_pos = *(++usa_pos);
+ /* Increment position in data as well. */
+ data_pos += NTFS_BLOCK_SIZE/sizeof(u16);
+ }
+ return 0;
+}
+
+/**
+ * pre_write_mst_fixup - apply multi sector transfer protection
+ * @b: pointer to the data to protect
+ * @size: size in bytes of @b
+ *
+ * Perform the necessary pre write multi sector transfer fixup on the data
+ * pointed to by @b of size @size.
+ *
+ * Return 0 if fixup applied (success) or -EINVAL if no fixup was performed
+ * (assumed not needed). This is in contrast to post_read_mst_fixup() above.
+ *
+ * NOTE: We consider the absence / invalidity of an update sequence array to
+ * mean that the structure is not subject to protection and hence doesn't need
+ * to be fixed up. This means that you have to create a valid update sequence
+ * array header in the ntfs record before calling this function, otherwise it
+ * will fail (the header needs to contain the position of the update sequence
+ * array together with the number of elements in the array). You also need to
+ * initialise the update sequence number before calling this function
+ * otherwise a random word will be used (whatever was in the record at that
+ * position at that time).
+ */
+int pre_write_mst_fixup(struct ntfs_record *b, const u32 size)
+{
+ __le16 *usa_pos, *data_pos;
+ u16 usa_ofs, usa_count, usn;
+ __le16 le_usn;
+
+ /* Sanity check + only fixup if it makes sense. */
+ if (!b || ntfs_is_baad_record(b->magic) ||
+ ntfs_is_hole_record(b->magic))
+ return -EINVAL;
+ /* Setup the variables. */
+ usa_ofs = le16_to_cpu(b->usa_ofs);
+ /* Decrement usa_count to get number of fixups. */
+ usa_count = le16_to_cpu(b->usa_count) - 1;
+ /* Size and alignment checks. */
+ if (size & (NTFS_BLOCK_SIZE - 1) || usa_ofs & 1 ||
+ usa_ofs + (usa_count * 2) > size ||
+ (size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
+ return -EINVAL;
+ /* Position of usn in update sequence array. */
+ usa_pos = (__le16 *)((u8 *)b + usa_ofs);
+ /*
+ * Cyclically increment the update sequence number
+ * (skipping 0 and -1, i.e. 0xffff).
+ */
+ usn = le16_to_cpup(usa_pos) + 1;
+ if (usn == 0xffff || !usn)
+ usn = 1;
+ le_usn = cpu_to_le16(usn);
+ *usa_pos = le_usn;
+ /* Position in data of first u16 that needs fixing up. */
+ data_pos = (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1;
+ /* Fixup all sectors. */
+ while (usa_count--) {
+ /*
+ * Increment the position in the usa and save the
+ * original data from the data buffer into the usa.
+ */
+ *(++usa_pos) = *data_pos;
+ /* Apply fixup to data. */
+ *data_pos = le_usn;
+ /* Increment position in data as well. */
+ data_pos += NTFS_BLOCK_SIZE / sizeof(__le16);
+ }
+ return 0;
+}
+
+/**
+ * post_write_mst_fixup - fast deprotect multi sector transfer protected data
+ * @b: pointer to the data to deprotect
+ *
+ * Perform the necessary post write multi sector transfer fixup, not checking
+ * for any errors, because we assume we have just used pre_write_mst_fixup(),
+ * thus the data will be fine or we would never have gotten here.
+ */
+void post_write_mst_fixup(struct ntfs_record *b)
+{
+ __le16 *usa_pos, *data_pos;
+
+ u16 usa_ofs = le16_to_cpu(b->usa_ofs);
+ u16 usa_count = le16_to_cpu(b->usa_count) - 1;
+
+ /* Position of usn in update sequence array. */
+ usa_pos = (__le16 *)b + usa_ofs/sizeof(__le16);
+
+ /* Position in protected data of first u16 that needs fixing up. */
+ data_pos = (__le16 *)b + NTFS_BLOCK_SIZE/sizeof(__le16) - 1;
+
+ /* Fixup all sectors. */
+ while (usa_count--) {
+ /*
+ * Increment position in usa and restore original data from
+ * the usa into the data buffer.
+ */
+ *data_pos = *(++usa_pos);
+
+ /* Increment position in data as well. */
+ data_pos += NTFS_BLOCK_SIZE/sizeof(__le16);
+ }
+}
diff --git a/fs/ntfsplus/namei.c b/fs/ntfsplus/namei.c
new file mode 100644
index 000000000000..d3f9dc629563
--- /dev/null
+++ b/fs/ntfsplus/namei.c
@@ -0,0 +1,1606 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel directory inode operations.
+ * Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2006 Anton Altaparmakov
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/exportfs.h>
+#include <linux/iversion.h>
+
+#include "ntfs.h"
+#include "misc.h"
+#include "index.h"
+#include "reparse.h"
+#include "ea.h"
+
+static inline int ntfs_check_bad_char(const unsigned short *wc,
+ unsigned int wc_len)
+{
+ int i;
+
+ for (i = 0; i < wc_len; i++) {
+ if ((wc[i] < 0x0020) ||
+ (wc[i] == 0x0022) || (wc[i] == 0x002A) || (wc[i] == 0x002F) ||
+ (wc[i] == 0x003A) || (wc[i] == 0x003C) || (wc[i] == 0x003E) ||
+ (wc[i] == 0x003F) || (wc[i] == 0x005C) || (wc[i] == 0x007C))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * ntfs_lookup - find the inode represented by a dentry in a directory inode
+ * @dir_ino: directory inode in which to look for the inode
+ * @dent: dentry representing the inode to look for
+ * @flags: lookup flags
+ *
+ * In short, ntfs_lookup() looks for the inode represented by the dentry @dent
+ * in the directory inode @dir_ino and if found attaches the inode to the
+ * dentry @dent.
+ *
+ * In more detail, the dentry @dent specifies which inode to look for by
+ * supplying the name of the inode in @dent->d_name.name. ntfs_lookup()
+ * converts the name to Unicode and walks the contents of the directory inode
+ * @dir_ino looking for the converted Unicode name. If the name is found in the
+ * directory, the corresponding inode is loaded by calling ntfs_iget() on its
+ * inode number and the inode is associated with the dentry @dent via a call to
+ * d_splice_alias().
+ *
+ * If the name is not found in the directory, a NULL inode is inserted into the
+ * dentry @dent via a call to d_add(). The dentry is then termed a negative
+ * dentry.
+ *
+ * Only if an actual error occurs, do we return an error via ERR_PTR().
+ *
+ * In order to handle the case insensitivity issues of NTFS with regards to the
+ * dcache and the dcache requiring only one dentry per directory, we deal with
+ * dentry aliases that only differ in case in ->ntfs_lookup() while maintaining
+ * a case sensitive dcache. This means that we get the full benefit of dcache
+ * speed when the file/directory is looked up with the same case as returned by
+ * ->ntfs_readdir() but that a lookup for any other case (or for the short file
+ * name) will not find anything in dcache and will enter ->ntfs_lookup()
+ * instead, where we search the directory for a fully matching file name
+ * (including case) and if that is not found, we search for a file name that
+ * matches with different case and if that has non-POSIX semantics we return
+ * that. We actually do only one search (case sensitive) and keep tabs on
+ * whether we have found a case insensitive match in the process.
+ *
+ * To simplify matters for us, we do not treat the short vs long filenames as
+ * two hard links; instead, if the lookup matches a short filename, we
+ * return the dentry for the corresponding long filename.
+ *
+ * There are three cases we need to distinguish here:
+ *
+ * 1) @dent perfectly matches (i.e. including case) a directory entry with a
+ * file name in the WIN32 or POSIX namespaces. In this case
+ * ntfs_lookup_inode_by_name() will return with name set to NULL and we
+ * just d_splice_alias() @dent.
+ * 2) @dent matches (not including case) a directory entry with a file name in
+ * the WIN32 namespace. In this case ntfs_lookup_inode_by_name() will return
+ * with name set to point to a kmalloc()ed ntfs_name structure containing
+ * the properly cased little endian Unicode name. We convert the name to the
+ * current NLS code page, search if a dentry with this name already exists
+ * and if so return that instead of @dent. At this point things are
+ * complicated by the possibility of 'disconnected' dentries due to NFS
+ * which we deal with appropriately (see the code comments). The VFS will
+ * then destroy the old @dent and use the one we returned. If a dentry is
+ * not found, we allocate a new one, d_splice_alias() it, and return it as
+ * above.
+ * 3) @dent matches either perfectly or not (i.e. we don't care about case) a
+ * directory entry with a file name in the DOS namespace. In this case
+ * ntfs_lookup_inode_by_name() will return with name set to point to a
+ * kmalloc()ed ntfs_name structure containing the mft reference (cpu endian)
+ * of the inode. We use the mft reference to read the inode and to find the
+ * file name in the WIN32 namespace corresponding to the matched short file
+ * name. We then convert the name to the current NLS code page, and proceed
+ * searching for a dentry with this name, etc, as in case 2), above.
+ *
+ * Locking: Caller must hold i_mutex on the directory.
+ */
+static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
+ unsigned int flags)
+{
+ struct ntfs_volume *vol = NTFS_SB(dir_ino->i_sb);
+ struct inode *dent_inode;
+ __le16 *uname;
+ struct ntfs_name *name = NULL;
+ u64 mref;
+ unsigned long dent_ino;
+ int uname_len;
+
+ ntfs_debug("Looking up %pd in directory inode 0x%lx.",
+ dent, dir_ino->i_ino);
+ /* Convert the name of the dentry to Unicode. */
+ uname_len = ntfs_nlstoucs(vol, dent->d_name.name, dent->d_name.len,
+ &uname, NTFS_MAX_NAME_LEN);
+ if (uname_len < 0) {
+ if (uname_len != -ENAMETOOLONG)
+ ntfs_debug("Failed to convert name to Unicode.");
+ return ERR_PTR(uname_len);
+ }
+ mutex_lock(&NTFS_I(dir_ino)->mrec_lock);
+ mref = ntfs_lookup_inode_by_name(NTFS_I(dir_ino), uname, uname_len,
+ &name);
+ mutex_unlock(&NTFS_I(dir_ino)->mrec_lock);
+ kmem_cache_free(ntfs_name_cache, uname);
+ if (!IS_ERR_MREF(mref)) {
+ dent_ino = MREF(mref);
+ ntfs_debug("Found inode 0x%lx. Calling ntfs_iget.", dent_ino);
+ dent_inode = ntfs_iget(vol->sb, dent_ino);
+ if (!IS_ERR(dent_inode)) {
+ /* Consistency check. */
+ if (MSEQNO(mref) == NTFS_I(dent_inode)->seq_no ||
+ dent_ino == FILE_MFT) {
+ /* Perfect WIN32/POSIX match. -- Case 1. */
+ if (!name) {
+ ntfs_debug("Done. (Case 1.)");
+ return d_splice_alias(dent_inode, dent);
+ }
+ /*
+ * We are too indented. Handle imperfect
+ * matches and short file names further below.
+ */
+ goto handle_name;
+ }
+ ntfs_error(vol->sb,
+ "Found stale reference to inode 0x%lx (reference sequence number = 0x%x, inode sequence number = 0x%x), returning -EIO. Run chkdsk.",
+ dent_ino, MSEQNO(mref),
+ NTFS_I(dent_inode)->seq_no);
+ iput(dent_inode);
+ dent_inode = ERR_PTR(-EIO);
+ } else
+ ntfs_error(vol->sb, "ntfs_iget(0x%lx) failed with error code %li.",
+ dent_ino, PTR_ERR(dent_inode));
+ kfree(name);
+ /* Return the error code. */
+ return ERR_CAST(dent_inode);
+ }
+ kfree(name);
+ /* It is guaranteed that @name is no longer allocated at this point. */
+ if (MREF_ERR(mref) == -ENOENT) {
+ ntfs_debug("Entry was not found, adding negative dentry.");
+ /* The dcache will handle negative entries. */
+ d_add(dent, NULL);
+ ntfs_debug("Done.");
+ return NULL;
+ }
+	ntfs_error(vol->sb, "ntfs_lookup_inode_by_name() failed with error code %i.",
+			-MREF_ERR(mref));
+ return ERR_PTR(MREF_ERR(mref));
+handle_name:
+ {
+ struct mft_record *m;
+ struct ntfs_attr_search_ctx *ctx;
+ struct ntfs_inode *ni = NTFS_I(dent_inode);
+ int err;
+ struct qstr nls_name;
+
+ nls_name.name = NULL;
+ if (name->type != FILE_NAME_DOS) { /* Case 2. */
+ ntfs_debug("Case 2.");
+ nls_name.len = (unsigned int)ntfs_ucstonls(vol,
+ (__le16 *)&name->name, name->len,
+ (unsigned char **)&nls_name.name, 0);
+ kfree(name);
+ } else /* if (name->type == FILE_NAME_DOS) */ { /* Case 3. */
+ struct file_name_attr *fn;
+
+ ntfs_debug("Case 3.");
+ kfree(name);
+
+ /* Find the WIN32 name corresponding to the matched DOS name. */
+ ni = NTFS_I(dent_inode);
+ m = map_mft_record(ni);
+ if (IS_ERR(m)) {
+ err = PTR_ERR(m);
+ m = NULL;
+ ctx = NULL;
+ goto err_out;
+ }
+ ctx = ntfs_attr_get_search_ctx(ni, m);
+ if (unlikely(!ctx)) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ do {
+ struct attr_record *a;
+ u32 val_len;
+
+ err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0,
+ NULL, 0, ctx);
+ if (unlikely(err)) {
+ ntfs_error(vol->sb,
+ "Inode corrupt: No WIN32 namespace counterpart to DOS file name. Run chkdsk.");
+ if (err == -ENOENT)
+ err = -EIO;
+ goto err_out;
+ }
+ /* Consistency checks. */
+ a = ctx->attr;
+ if (a->non_resident || a->flags)
+ goto eio_err_out;
+ val_len = le32_to_cpu(a->data.resident.value_length);
+ if (le16_to_cpu(a->data.resident.value_offset) +
+ val_len > le32_to_cpu(a->length))
+ goto eio_err_out;
+ fn = (struct file_name_attr *)((u8 *)ctx->attr + le16_to_cpu(
+ ctx->attr->data.resident.value_offset));
+ if ((u32)(fn->file_name_length * sizeof(__le16) +
+ sizeof(struct file_name_attr)) > val_len)
+ goto eio_err_out;
+ } while (fn->file_name_type != FILE_NAME_WIN32);
+
+ /* Convert the found WIN32 name to current NLS code page. */
+ nls_name.len = (unsigned int)ntfs_ucstonls(vol,
+ (__le16 *)&fn->file_name, fn->file_name_length,
+ (unsigned char **)&nls_name.name, 0);
+
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(ni);
+ }
+ m = NULL;
+ ctx = NULL;
+
+ /* Check if a conversion error occurred. */
+ if ((int)nls_name.len < 0) {
+ err = (int)nls_name.len;
+ goto err_out;
+ }
+ nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
+
+ dent = d_add_ci(dent, dent_inode, &nls_name);
+ kfree(nls_name.name);
+ return dent;
+
+eio_err_out:
+ ntfs_error(vol->sb, "Illegal file name attribute. Run chkdsk.");
+ err = -EIO;
+err_out:
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+ if (m)
+ unmap_mft_record(ni);
+ iput(dent_inode);
+ ntfs_error(vol->sb, "Failed, returning error code %i.", err);
+ return ERR_PTR(err);
+ }
+}
+
+static int ntfs_sd_add_everyone(struct ntfs_inode *ni)
+{
+ struct security_descriptor_relative *sd;
+ struct ntfs_acl *acl;
+ struct ntfs_ace *ace;
+ struct ntfs_sid *sid;
+ int ret, sd_len;
+
+ /* Create SECURITY_DESCRIPTOR attribute (everyone has full access). */
+ /*
+	 * Calculate the security descriptor length. We have 2 sub-authorities
+	 * in the owner and group SIDs, so add 8 bytes to each SID.
+ */
+ sd_len = sizeof(struct security_descriptor_relative) + 2 *
+ (sizeof(struct ntfs_sid) + 8) + sizeof(struct ntfs_acl) +
+ sizeof(struct ntfs_ace) + 4;
+ sd = ntfs_malloc_nofs(sd_len);
+ if (!sd)
+		return -ENOMEM;
+
+ sd->revision = 1;
+ sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE;
+
+ sid = (struct ntfs_sid *)((u8 *)sd + sizeof(struct security_descriptor_relative));
+ sid->revision = 1;
+ sid->sub_authority_count = 2;
+ sid->sub_authority[0] = cpu_to_le32(SECURITY_BUILTIN_DOMAIN_RID);
+ sid->sub_authority[1] = cpu_to_le32(DOMAIN_ALIAS_RID_ADMINS);
+ sid->identifier_authority.value[5] = 5;
+ sd->owner = cpu_to_le32((u8 *)sid - (u8 *)sd);
+
+ sid = (struct ntfs_sid *)((u8 *)sid + sizeof(struct ntfs_sid) + 8);
+ sid->revision = 1;
+ sid->sub_authority_count = 2;
+ sid->sub_authority[0] = cpu_to_le32(SECURITY_BUILTIN_DOMAIN_RID);
+ sid->sub_authority[1] = cpu_to_le32(DOMAIN_ALIAS_RID_ADMINS);
+ sid->identifier_authority.value[5] = 5;
+ sd->group = cpu_to_le32((u8 *)sid - (u8 *)sd);
+
+ acl = (struct ntfs_acl *)((u8 *)sid + sizeof(struct ntfs_sid) + 8);
+ acl->revision = 2;
+ acl->size = cpu_to_le16(sizeof(struct ntfs_acl) + sizeof(struct ntfs_ace) + 4);
+ acl->ace_count = cpu_to_le16(1);
+ sd->dacl = cpu_to_le32((u8 *)acl - (u8 *)sd);
+
+ ace = (struct ntfs_ace *)((u8 *)acl + sizeof(struct ntfs_acl));
+ ace->type = ACCESS_ALLOWED_ACE_TYPE;
+ ace->flags = OBJECT_INHERIT_ACE | CONTAINER_INHERIT_ACE;
+ ace->size = cpu_to_le16(sizeof(struct ntfs_ace) + 4);
+ ace->mask = cpu_to_le32(0x1f01ff);
+ ace->sid.revision = 1;
+ ace->sid.sub_authority_count = 1;
+ ace->sid.sub_authority[0] = 0;
+ ace->sid.identifier_authority.value[5] = 1;
+
+ ret = ntfs_attr_add(ni, AT_SECURITY_DESCRIPTOR, AT_UNNAMED, 0, (u8 *)sd,
+ sd_len);
+ if (ret)
+ ntfs_error(ni->vol->sb, "Failed to add SECURITY_DESCRIPTOR\n");
+
+ ntfs_free(sd);
+ return ret;
+}
+
+static struct ntfs_inode *__ntfs_create(struct mnt_idmap *idmap, struct inode *dir,
+ __le16 *name, u8 name_len, mode_t mode, dev_t dev,
+ __le16 *target, int target_len)
+{
+ struct ntfs_inode *dir_ni = NTFS_I(dir);
+ struct ntfs_volume *vol = dir_ni->vol;
+ struct ntfs_inode *ni;
+ bool rollback_data = false, rollback_sd = false, rollback_reparse = false;
+ struct file_name_attr *fn = NULL;
+ struct standard_information *si = NULL;
+ int err = 0, fn_len, si_len;
+ struct inode *vi;
+ struct mft_record *ni_mrec, *dni_mrec;
+ struct super_block *sb = dir_ni->vol->sb;
+ __le64 parent_mft_ref;
+ u64 child_mft_ref;
+ __le16 ea_size;
+
+ vi = new_inode(vol->sb);
+ if (!vi)
+ return ERR_PTR(-ENOMEM);
+
+ ntfs_init_big_inode(vi);
+ ni = NTFS_I(vi);
+ ni->vol = dir_ni->vol;
+ ni->name_len = 0;
+ ni->name = NULL;
+
+ /*
+ * Set the appropriate mode, attribute type, and name. For
+ * directories, also setup the index values to the defaults.
+ */
+ if (S_ISDIR(mode)) {
+ mode &= ~vol->dmask;
+
+ NInoSetMstProtected(ni);
+ ni->itype.index.block_size = 4096;
+ ni->itype.index.block_size_bits = ntfs_ffs(4096) - 1;
+ ni->itype.index.collation_rule = COLLATION_FILE_NAME;
+ if (vol->cluster_size <= ni->itype.index.block_size) {
+ ni->itype.index.vcn_size = vol->cluster_size;
+ ni->itype.index.vcn_size_bits =
+ vol->cluster_size_bits;
+ } else {
+ ni->itype.index.vcn_size = vol->sector_size;
+ ni->itype.index.vcn_size_bits =
+ vol->sector_size_bits;
+ }
+ } else {
+ mode &= ~vol->fmask;
+ }
+
+ if (IS_RDONLY(vi))
+ mode &= ~0222;
+
+ inode_init_owner(idmap, vi, dir, mode);
+
+ if (uid_valid(vol->uid))
+ vi->i_uid = vol->uid;
+
+ if (gid_valid(vol->gid))
+ vi->i_gid = vol->gid;
+
+ /*
+	 * Set the file size to 0; the ntfs inode sizes were already set to 0
+	 * by the call to ntfs_init_big_inode() above.
+ */
+ vi->i_size = 0;
+ vi->i_blocks = 0;
+
+ inode_inc_iversion(vi);
+
+ simple_inode_init_ts(vi);
+ ni->i_crtime = inode_get_ctime(vi);
+
+ inode_set_mtime_to_ts(dir, ni->i_crtime);
+ inode_set_ctime_to_ts(dir, ni->i_crtime);
+ mark_inode_dirty(dir);
+
+ err = ntfs_mft_record_alloc(dir_ni->vol, mode, &ni, NULL,
+ &ni_mrec);
+ if (err) {
+ iput(vi);
+ return ERR_PTR(err);
+ }
+
+ /*
+ * Prevent iget and writeback from finding this inode.
+ * Caller must call d_instantiate_new instead of d_instantiate.
+ */
+ spin_lock(&vi->i_lock);
+ vi->i_state = I_NEW | I_CREATING;
+ spin_unlock(&vi->i_lock);
+
+ /* Add the inode to the inode hash for the superblock. */
+ vi->i_ino = ni->mft_no;
+ inode_set_iversion(vi, 1);
+ insert_inode_hash(vi);
+
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+ if (NInoBeingDeleted(dir_ni)) {
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ dni_mrec = map_mft_record(dir_ni);
+ if (IS_ERR(dni_mrec)) {
+ ntfs_error(dir_ni->vol->sb, "failed to map mft record for file %ld.\n",
+ dir_ni->mft_no);
+ err = -EIO;
+ goto err_out;
+ }
+ parent_mft_ref = MK_LE_MREF(dir_ni->mft_no,
+ le16_to_cpu(dni_mrec->sequence_number));
+ unmap_mft_record(dir_ni);
+
+ /*
+	 * Create the STANDARD_INFORMATION attribute. Write STANDARD_INFORMATION
+	 * version 1.2; Windows will upgrade it to version 3 if needed.
+ */
+ si_len = offsetof(struct standard_information, file_attributes) +
+ sizeof(__le32) + 12;
+ si = ntfs_malloc_nofs(si_len);
+ if (!si) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ si->creation_time = si->last_data_change_time = utc2ntfs(ni->i_crtime);
+ si->last_mft_change_time = si->last_access_time = si->creation_time;
+
+ if (!S_ISREG(mode) && !S_ISDIR(mode))
+ si->file_attributes = FILE_ATTR_SYSTEM;
+
+ /* Add STANDARD_INFORMATION to inode. */
+ err = ntfs_attr_add(ni, AT_STANDARD_INFORMATION, AT_UNNAMED, 0, (u8 *)si,
+ si_len);
+ if (err) {
+ ntfs_error(sb, "Failed to add STANDARD_INFORMATION attribute.\n");
+ goto err_out;
+ }
+
+ err = ntfs_sd_add_everyone(ni);
+ if (err)
+ goto err_out;
+ rollback_sd = true;
+
+ if (S_ISDIR(mode)) {
+ struct index_root *ir = NULL;
+ struct index_entry *ie;
+ int ir_len, index_len;
+
+ /* Create struct index_root attribute. */
+ index_len = sizeof(struct index_header) + sizeof(struct index_entry_header);
+ ir_len = offsetof(struct index_root, index) + index_len;
+ ir = ntfs_malloc_nofs(ir_len);
+ if (!ir) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ ir->type = AT_FILE_NAME;
+ ir->collation_rule = COLLATION_FILE_NAME;
+ ir->index_block_size = cpu_to_le32(ni->vol->index_record_size);
+ if (ni->vol->cluster_size <= ni->vol->index_record_size)
+ ir->clusters_per_index_block =
+ ni->vol->index_record_size >> ni->vol->cluster_size_bits;
+ else
+ ir->clusters_per_index_block =
+ ni->vol->index_record_size >> ni->vol->sector_size_bits;
+ ir->index.entries_offset = cpu_to_le32(sizeof(struct index_header));
+ ir->index.index_length = cpu_to_le32(index_len);
+ ir->index.allocated_size = cpu_to_le32(index_len);
+ ie = (struct index_entry *)((u8 *)ir + sizeof(struct index_root));
+ ie->length = cpu_to_le16(sizeof(struct index_entry_header));
+ ie->key_length = 0;
+ ie->flags = INDEX_ENTRY_END;
+
+ /* Add struct index_root attribute to inode. */
+ err = ntfs_attr_add(ni, AT_INDEX_ROOT, I30, 4, (u8 *)ir, ir_len);
+ if (err) {
+ ntfs_free(ir);
+ ntfs_error(vi->i_sb, "Failed to add struct index_root attribute.\n");
+ goto err_out;
+ }
+ ntfs_free(ir);
+ err = ntfs_attr_open(ni, AT_INDEX_ROOT, I30, 4);
+ if (err)
+ goto err_out;
+ } else {
+ /* Add DATA attribute to inode. */
+ err = ntfs_attr_add(ni, AT_DATA, AT_UNNAMED, 0, NULL, 0);
+ if (err) {
+ ntfs_error(dir_ni->vol->sb, "Failed to add DATA attribute.\n");
+ goto err_out;
+ }
+ rollback_data = true;
+
+ err = ntfs_attr_open(ni, AT_DATA, AT_UNNAMED, 0);
+ if (err)
+ goto err_out;
+
+ if (S_ISLNK(mode)) {
+ err = ntfs_reparse_set_wsl_symlink(ni, target, target_len);
+ if (!err)
+ rollback_reparse = true;
+ } else if (S_ISBLK(mode) || S_ISCHR(mode) || S_ISSOCK(mode) ||
+ S_ISFIFO(mode)) {
+ si->file_attributes = FILE_ATTRIBUTE_RECALL_ON_OPEN;
+ ni->flags = FILE_ATTRIBUTE_RECALL_ON_OPEN;
+ err = ntfs_reparse_set_wsl_not_symlink(ni, mode);
+ if (!err)
+ rollback_reparse = true;
+ }
+ if (err)
+ goto err_out;
+ }
+
+ err = ntfs_ea_set_wsl_inode(vi, dev, &ea_size,
+ NTFS_EA_UID | NTFS_EA_GID | NTFS_EA_MODE);
+ if (err)
+ goto err_out;
+
+ /* Create FILE_NAME attribute. */
+ fn_len = sizeof(struct file_name_attr) + name_len * sizeof(__le16);
+ fn = ntfs_malloc_nofs(fn_len);
+ if (!fn) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ fn->parent_directory = parent_mft_ref;
+ fn->file_name_length = name_len;
+ fn->file_name_type = FILE_NAME_POSIX;
+ fn->type.ea.packed_ea_size = ea_size;
+ if (S_ISDIR(mode)) {
+ fn->file_attributes = FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT;
+ fn->allocated_size = fn->data_size = 0;
+ } else {
+ fn->data_size = cpu_to_le64(ni->data_size);
+ fn->allocated_size = cpu_to_le64(ni->allocated_size);
+ }
+ if (!S_ISREG(mode) && !S_ISDIR(mode))
+ fn->file_attributes = FILE_ATTR_SYSTEM;
+ fn->creation_time = fn->last_data_change_time = utc2ntfs(ni->i_crtime);
+ fn->last_mft_change_time = fn->last_access_time = fn->creation_time;
+ memcpy(fn->file_name, name, name_len * sizeof(__le16));
+
+ /* Add FILE_NAME attribute to inode. */
+ err = ntfs_attr_add(ni, AT_FILE_NAME, AT_UNNAMED, 0, (u8 *)fn, fn_len);
+ if (err) {
+ ntfs_error(sb, "Failed to add FILE_NAME attribute.\n");
+ goto err_out;
+ }
+
+ child_mft_ref = MK_MREF(ni->mft_no,
+ le16_to_cpu(ni_mrec->sequence_number));
+ /* Set hard links count and directory flag. */
+ ni_mrec->link_count = cpu_to_le16(1);
+ mark_mft_record_dirty(ni);
+
+ /* Add FILE_NAME attribute to index. */
+ err = ntfs_index_add_filename(dir_ni, fn, child_mft_ref);
+ if (err) {
+ ntfs_debug("Failed to add entry to the index");
+ goto err_out;
+ }
+
+ unmap_mft_record(ni);
+ mutex_unlock(&dir_ni->mrec_lock);
+ mutex_unlock(&ni->mrec_lock);
+
+ /* Set the sequence number. */
+ vi->i_generation = ni->seq_no;
+ set_nlink(vi, 1);
+ ntfs_set_vfs_operations(vi, mode, dev);
+
+#ifdef CONFIG_NTFSPLUS_FS_POSIX_ACL
+ if (!S_ISLNK(mode) && (sb->s_flags & SB_POSIXACL)) {
+ err = ntfs_init_acl(idmap, vi, dir);
+ if (err)
+ goto err_out;
+ } else
+#endif
+ {
+ vi->i_flags |= S_NOSEC;
+ }
+
+ /* Done! */
+ ntfs_free(fn);
+ ntfs_free(si);
+ ntfs_debug("Done.\n");
+ return ni;
+
+err_out:
+ if (rollback_sd)
+ ntfs_attr_remove(ni, AT_SECURITY_DESCRIPTOR, AT_UNNAMED, 0);
+
+ if (rollback_data)
+ ntfs_attr_remove(ni, AT_DATA, AT_UNNAMED, 0);
+
+ if (rollback_reparse)
+ ntfs_delete_reparse_index(ni);
+ /*
+	 * Free extent MFT records (none should exist with the current
+	 * ntfs_create implementation, but handle them just in case this
+	 * changes in the future).
+ */
+ while (ni->nr_extents != 0) {
+ int err2;
+
+ err2 = ntfs_mft_record_free(ni->vol, *(ni->ext.extent_ntfs_inos));
+ if (err2)
+ ntfs_error(sb,
+ "Failed to free extent MFT record. Leaving inconsistent metadata.\n");
+ ntfs_inode_close(*(ni->ext.extent_ntfs_inos));
+ }
+ if (ntfs_mft_record_free(ni->vol, ni))
+ ntfs_error(sb,
+ "Failed to free MFT record. Leaving inconsistent metadata. Run chkdsk.\n");
+ unmap_mft_record(ni);
+ ntfs_free(fn);
+ ntfs_free(si);
+
+ mutex_unlock(&dir_ni->mrec_lock);
+ mutex_unlock(&ni->mrec_lock);
+
+ remove_inode_hash(vi);
+ discard_new_inode(vi);
+ return ERR_PTR(err);
+}
+
+static int ntfs_create(struct mnt_idmap *idmap, struct inode *dir,
+ struct dentry *dentry, umode_t mode, bool excl)
+{
+ struct ntfs_volume *vol = NTFS_SB(dir->i_sb);
+ struct ntfs_inode *ni;
+ __le16 *uname;
+ int uname_len, err;
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ uname_len = ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+ &uname, NTFS_MAX_NAME_LEN);
+ if (uname_len < 0) {
+ if (uname_len != -ENAMETOOLONG)
+			ntfs_error(vol->sb, "Failed to convert name to Unicode.");
+ return uname_len;
+ }
+
+ err = ntfs_check_bad_char(uname, uname_len);
+ if (err) {
+ kmem_cache_free(ntfs_name_cache, uname);
+ return err;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ ni = __ntfs_create(idmap, dir, uname, uname_len, S_IFREG | mode, 0, NULL, 0);
+ kmem_cache_free(ntfs_name_cache, uname);
+ if (IS_ERR(ni))
+ return PTR_ERR(ni);
+
+ d_instantiate_new(dentry, VFS_I(ni));
+
+ return 0;
+}
+
+static int ntfs_check_unlinkable_dir(struct ntfs_attr_search_ctx *ctx, struct file_name_attr *fn)
+{
+ int link_count;
+ int ret;
+ struct ntfs_inode *ni = ctx->base_ntfs_ino ? ctx->base_ntfs_ino : ctx->ntfs_ino;
+ struct mft_record *ni_mrec = ctx->base_mrec ? ctx->base_mrec : ctx->mrec;
+
+ ret = ntfs_check_empty_dir(ni, ni_mrec);
+ if (!ret || ret != -ENOTEMPTY)
+ return ret;
+
+ link_count = le16_to_cpu(ni_mrec->link_count);
+ /*
+	 * The directory is non-empty, so we can unlink only if there is more
+	 * than one "real" hard link, i.e. the links are not just the DOS and
+	 * WIN32 names of the same entry.
+ */
+ if ((link_count == 1) ||
+ (link_count == 2 && fn->file_name_type == FILE_NAME_DOS)) {
+ ret = -ENOTEMPTY;
+ ntfs_debug("Non-empty directory without hard links\n");
+ goto no_hardlink;
+ }
+
+ ret = 0;
+no_hardlink:
+ return ret;
+}
+
+static int ntfs_test_inode_attr(struct inode *vi, void *data)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ unsigned long mft_no = (unsigned long)data;
+
+	if (ni->mft_no != mft_no)
+		return 0;
+	return NInoAttr(ni) || ni->nr_extents == -1;
+}
+
+/**
+ * ntfs_delete - delete a file or directory from an ntfs volume
+ * @ni:		ntfs inode of the object to delete
+ * @dir_ni:	ntfs inode of the directory from which to delete the object
+ * @name:	unicode name of the object to delete
+ * @name_len:	length of the name in unicode characters
+ * @need_lock:	whether the mrec locks need to be taken or not
+ *
+ * @ni is always closed after the call to this function (even if it failed);
+ * the caller does not need to call ntfs_inode_close() itself.
+ */
+static int ntfs_delete(struct ntfs_inode *ni, struct ntfs_inode *dir_ni,
+ __le16 *name, u8 name_len, bool need_lock)
+{
+ struct ntfs_attr_search_ctx *actx = NULL;
+ struct file_name_attr *fn = NULL;
+ bool looking_for_dos_name = false, looking_for_win32_name = false;
+ bool case_sensitive_match = true;
+ int err = 0;
+ struct mft_record *ni_mrec;
+ struct super_block *sb;
+ bool link_count_zero = false;
+
+ ntfs_debug("Entering.\n");
+
+	if (need_lock) {
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+ }
+
+ sb = dir_ni->vol->sb;
+
+ if (ni->nr_extents == -1)
+ ni = ni->ext.base_ntfs_ino;
+ if (dir_ni->nr_extents == -1)
+ dir_ni = dir_ni->ext.base_ntfs_ino;
+ /*
+	 * Search for a FILE_NAME attribute with this name. If it is in the
+	 * POSIX or WIN32_AND_DOS namespace, simply remove it from the index
+	 * and the inode. If the file name is in the DOS or WIN32 namespace,
+	 * remove the DOS name first and only then the WIN32 name.
+ */
+ actx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!actx) {
+		ntfs_error(sb, "%s: Failed to get search context", __func__);
+		if (need_lock) {
+			mutex_unlock(&dir_ni->mrec_lock);
+			mutex_unlock(&ni->mrec_lock);
+		}
+		return -ENOMEM;
+ }
+search:
+ while ((err = ntfs_attr_lookup(AT_FILE_NAME, AT_UNNAMED, 0, CASE_SENSITIVE,
+ 0, NULL, 0, actx)) == 0) {
+#ifdef DEBUG
+ unsigned char *s;
+#endif
+ bool case_sensitive = IGNORE_CASE;
+
+ fn = (struct file_name_attr *)((u8 *)actx->attr +
+ le16_to_cpu(actx->attr->data.resident.value_offset));
+#ifdef DEBUG
+ s = ntfs_attr_name_get(ni->vol, fn->file_name, fn->file_name_length);
+ ntfs_debug("name: '%s' type: %d dos: %d win32: %d case: %d\n",
+ s, fn->file_name_type,
+ looking_for_dos_name, looking_for_win32_name,
+ case_sensitive_match);
+ ntfs_attr_name_free(&s);
+#endif
+ if (looking_for_dos_name) {
+ if (fn->file_name_type == FILE_NAME_DOS)
+ break;
+ continue;
+ }
+ if (looking_for_win32_name) {
+ if (fn->file_name_type == FILE_NAME_WIN32)
+ break;
+ continue;
+ }
+
+ /* Ignore hard links from other directories */
+ if (dir_ni->mft_no != MREF_LE(fn->parent_directory)) {
+ ntfs_debug("MFT record numbers don't match (%lu != %lu)\n",
+ dir_ni->mft_no,
+ MREF_LE(fn->parent_directory));
+ continue;
+ }
+
+ if (fn->file_name_type == FILE_NAME_POSIX || case_sensitive_match)
+ case_sensitive = CASE_SENSITIVE;
+
+ if (ntfs_names_are_equal(fn->file_name, fn->file_name_length,
+ name, name_len, case_sensitive,
+ ni->vol->upcase, ni->vol->upcase_len)) {
+ if (fn->file_name_type == FILE_NAME_WIN32) {
+ looking_for_dos_name = true;
+ ntfs_attr_reinit_search_ctx(actx);
+ continue;
+ }
+ if (fn->file_name_type == FILE_NAME_DOS)
+ looking_for_dos_name = true;
+ break;
+ }
+ }
+ if (err) {
+ /*
+ * If case sensitive search failed, then try once again
+ * ignoring case.
+ */
+ if (err == -ENOENT && case_sensitive_match) {
+ case_sensitive_match = false;
+ ntfs_attr_reinit_search_ctx(actx);
+ goto search;
+ }
+ goto err_out;
+ }
+
+ err = ntfs_check_unlinkable_dir(actx, fn);
+ if (err)
+ goto err_out;
+
+ err = ntfs_index_remove(dir_ni, fn, le32_to_cpu(actx->attr->data.resident.value_length));
+ if (err)
+ goto err_out;
+
+ err = ntfs_attr_record_rm(actx);
+ if (err)
+ goto err_out;
+
+ ni_mrec = actx->base_mrec ? actx->base_mrec : actx->mrec;
+ ni_mrec->link_count = cpu_to_le16(le16_to_cpu(ni_mrec->link_count) - 1);
+ drop_nlink(VFS_I(ni));
+
+ mark_mft_record_dirty(ni);
+ if (looking_for_dos_name) {
+ looking_for_dos_name = false;
+ looking_for_win32_name = true;
+ ntfs_attr_reinit_search_ctx(actx);
+ goto search;
+ }
+
+ /*
+	 * If the hard link count is not zero then we are done. Otherwise there
+	 * are no references left to this inode, so we should free all
+	 * non-resident attributes and mark all MFT records as not in use.
+ */
+ if (ni_mrec->link_count == 0) {
+ NInoSetBeingDeleted(ni);
+ ntfs_delete_reparse_index(ni);
+ link_count_zero = true;
+ }
+
+ ntfs_attr_put_search_ctx(actx);
+	if (need_lock) {
+ mutex_unlock(&dir_ni->mrec_lock);
+ mutex_unlock(&ni->mrec_lock);
+ }
+
+ /*
+ * If hard link count is not equal to zero then we are done. In other
+ * case there are no reference to this inode left, so we should free all
+ * non-resident attributes and mark all MFT record as not in use.
+ */
+	if (link_count_zero) {
+ struct inode *attr_vi;
+
+ while ((attr_vi = ilookup5(sb, ni->mft_no, ntfs_test_inode_attr,
+ (void *)ni->mft_no)) != NULL) {
+ clear_nlink(attr_vi);
+ iput(attr_vi);
+ }
+ }
+ ntfs_debug("Done.\n");
+ return 0;
+err_out:
+	ntfs_attr_put_search_ctx(actx);
+	if (need_lock) {
+		mutex_unlock(&dir_ni->mrec_lock);
+		mutex_unlock(&ni->mrec_lock);
+	}
+	return err;
+}
+
+static int ntfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+ struct inode *vi = dentry->d_inode;
+ struct super_block *sb = dir->i_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ int err = 0;
+ struct ntfs_inode *ni = NTFS_I(vi);
+ __le16 *uname = NULL;
+ int uname_len;
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ uname_len = ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+ &uname, NTFS_MAX_NAME_LEN);
+ if (uname_len < 0) {
+ if (uname_len != -ENAMETOOLONG)
+ ntfs_error(sb, "Failed to convert name to Unicode.");
+		return uname_len;
+ }
+
+ err = ntfs_check_bad_char(uname, uname_len);
+ if (err) {
+ kmem_cache_free(ntfs_name_cache, uname);
+ return err;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ err = ntfs_delete(ni, NTFS_I(dir), uname, uname_len, true);
+ if (err)
+ goto out;
+
+ inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
+ mark_inode_dirty(dir);
+ inode_set_ctime_to_ts(vi, inode_get_ctime(dir));
+ if (vi->i_nlink)
+ mark_inode_dirty(vi);
+out:
+ kmem_cache_free(ntfs_name_cache, uname);
+ return err;
+}
+
+static struct dentry *ntfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
+ struct dentry *dentry, umode_t mode)
+{
+ struct super_block *sb = dir->i_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ int err = 0;
+ struct ntfs_inode *ni;
+ __le16 *uname;
+ int uname_len;
+
+ if (NVolShutdown(vol))
+ return ERR_PTR(-EIO);
+
+ uname_len = ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+ &uname, NTFS_MAX_NAME_LEN);
+ if (uname_len < 0) {
+ if (uname_len != -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to Unicode.");
+		return ERR_PTR(uname_len);
+ }
+
+ err = ntfs_check_bad_char(uname, uname_len);
+ if (err) {
+ kmem_cache_free(ntfs_name_cache, uname);
+ return ERR_PTR(err);
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ ni = __ntfs_create(idmap, dir, uname, uname_len, S_IFDIR | mode, 0, NULL, 0);
+ kmem_cache_free(ntfs_name_cache, uname);
+	if (IS_ERR(ni))
+		return ERR_CAST(ni);
+
+	d_instantiate_new(dentry, VFS_I(ni));
+	return NULL;
+}
+
+static int ntfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ struct inode *vi = dentry->d_inode;
+ struct super_block *sb = dir->i_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ int err = 0;
+ struct ntfs_inode *ni;
+ __le16 *uname = NULL;
+ int uname_len;
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ ni = NTFS_I(vi);
+ uname_len = ntfs_nlstoucs(vol, dentry->d_name.name, dentry->d_name.len,
+ &uname, NTFS_MAX_NAME_LEN);
+ if (uname_len < 0) {
+ if (uname_len != -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to Unicode.");
+		return uname_len;
+ }
+
+ err = ntfs_check_bad_char(uname, uname_len);
+ if (err) {
+ kmem_cache_free(ntfs_name_cache, uname);
+ return err;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ err = ntfs_delete(ni, NTFS_I(dir), uname, uname_len, true);
+ if (err)
+ goto out;
+
+ inode_set_mtime_to_ts(vi, inode_set_atime_to_ts(vi, current_time(vi)));
+out:
+ kmem_cache_free(ntfs_name_cache, uname);
+ return err;
+}
+
+/**
+ * __ntfs_link - create hard link for file or directory
+ * @ni: ntfs inode for object to create hard link
+ * @dir_ni: ntfs inode for directory in which new link should be placed
+ * @name: unicode name of the new link
+ * @name_len: length of the name in unicode characters
+ *
+ * NOTE: At present we allow creating hard links to directories; we use them
+ * in a temporary state during rename, but it is definitely a bad idea to
+ * leave hard links to directories behind once the operation completes.
+ */
+static int __ntfs_link(struct ntfs_inode *ni, struct ntfs_inode *dir_ni,
+ __le16 *name, u8 name_len)
+{
+ struct super_block *sb;
+ struct inode *vi = VFS_I(ni);
+ struct file_name_attr *fn = NULL;
+ int fn_len, err = 0;
+ struct mft_record *dir_mrec = NULL, *ni_mrec = NULL;
+
+ ntfs_debug("Entering.\n");
+
+ sb = dir_ni->vol->sb;
+ if (NInoBeingDeleted(dir_ni) || NInoBeingDeleted(ni))
+ return -ENOENT;
+
+ ni_mrec = map_mft_record(ni);
+ if (IS_ERR(ni_mrec)) {
+ err = -EIO;
+ goto err_out;
+ }
+
+ if (le16_to_cpu(ni_mrec->link_count) == 0) {
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ /* Create FILE_NAME attribute. */
+ fn_len = sizeof(struct file_name_attr) + name_len * sizeof(__le16);
+ fn = ntfs_malloc_nofs(fn_len);
+ if (!fn) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ dir_mrec = map_mft_record(dir_ni);
+ if (IS_ERR(dir_mrec)) {
+ err = -EIO;
+ goto err_out;
+ }
+
+ fn->parent_directory = MK_LE_MREF(dir_ni->mft_no,
+ le16_to_cpu(dir_mrec->sequence_number));
+ unmap_mft_record(dir_ni);
+ fn->file_name_length = name_len;
+ fn->file_name_type = FILE_NAME_POSIX;
+ fn->file_attributes = ni->flags;
+ if (ni_mrec->flags & MFT_RECORD_IS_DIRECTORY) {
+ fn->file_attributes |= FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT;
+ fn->allocated_size = fn->data_size = 0;
+ } else {
+ if (NInoSparse(ni) || NInoCompressed(ni))
+ fn->allocated_size =
+ cpu_to_le64(ni->itype.compressed.size);
+ else
+ fn->allocated_size = cpu_to_le64(ni->allocated_size);
+ fn->data_size = cpu_to_le64(ni->data_size);
+ }
+
+ fn->creation_time = utc2ntfs(ni->i_crtime);
+ fn->last_data_change_time = utc2ntfs(inode_get_mtime(vi));
+ fn->last_mft_change_time = utc2ntfs(inode_get_ctime(vi));
+ fn->last_access_time = utc2ntfs(inode_get_atime(vi));
+ memcpy(fn->file_name, name, name_len * sizeof(__le16));
+
+ /* Add FILE_NAME attribute to index. */
+ err = ntfs_index_add_filename(dir_ni, fn, MK_MREF(ni->mft_no,
+ le16_to_cpu(ni_mrec->sequence_number)));
+ if (err) {
+ ntfs_error(sb, "Failed to add filename to the index");
+ goto err_out;
+ }
+ /* Add FILE_NAME attribute to inode. */
+ err = ntfs_attr_add(ni, AT_FILE_NAME, AT_UNNAMED, 0, (u8 *)fn, fn_len);
+ if (err) {
+ ntfs_error(sb, "Failed to add FILE_NAME attribute.\n");
+ /* Try to remove just added attribute from index. */
+ if (ntfs_index_remove(dir_ni, fn, fn_len))
+ goto rollback_failed;
+ goto err_out;
+ }
+ /* Increment hard links count. */
+ ni_mrec->link_count = cpu_to_le16(le16_to_cpu(ni_mrec->link_count) + 1);
+ inc_nlink(VFS_I(ni));
+
+ /* Done! */
+ mark_mft_record_dirty(ni);
+ ntfs_free(fn);
+ unmap_mft_record(ni);
+
+ ntfs_debug("Done.\n");
+
+ return 0;
+rollback_failed:
+ ntfs_error(sb, "Rollback failed. Leaving inconsistent metadata.\n");
+err_out:
+ ntfs_free(fn);
+ if (!IS_ERR_OR_NULL(ni_mrec))
+ unmap_mft_record(ni);
+ return err;
+}
+
+static int ntfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
+ struct dentry *old_dentry, struct inode *new_dir,
+ struct dentry *new_dentry, unsigned int flags)
+{
+ struct inode *old_inode, *new_inode = NULL;
+ int err = 0;
+ int is_dir;
+ struct super_block *sb = old_dir->i_sb;
+ __le16 *uname_new = NULL;
+ __le16 *uname_old = NULL;
+ int new_name_len;
+ int old_name_len;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ struct ntfs_inode *old_ni, *new_ni = NULL;
+ struct ntfs_inode *old_dir_ni = NTFS_I(old_dir), *new_dir_ni = NTFS_I(new_dir);
+
+ if (NVolShutdown(old_dir_ni->vol))
+ return -EIO;
+
+ if (flags & (RENAME_EXCHANGE | RENAME_WHITEOUT))
+ return -EINVAL;
+
+ new_name_len = ntfs_nlstoucs(NTFS_I(new_dir)->vol, new_dentry->d_name.name,
+ new_dentry->d_name.len, &uname_new,
+ NTFS_MAX_NAME_LEN);
+ if (new_name_len < 0) {
+ if (new_name_len != -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to Unicode.");
+		return new_name_len;
+ }
+
+ err = ntfs_check_bad_char(uname_new, new_name_len);
+ if (err) {
+ kmem_cache_free(ntfs_name_cache, uname_new);
+ return err;
+ }
+
+ old_name_len = ntfs_nlstoucs(NTFS_I(old_dir)->vol, old_dentry->d_name.name,
+ old_dentry->d_name.len, &uname_old,
+ NTFS_MAX_NAME_LEN);
+ if (old_name_len < 0) {
+ kmem_cache_free(ntfs_name_cache, uname_new);
+ if (old_name_len != -ENAMETOOLONG)
+			ntfs_error(sb, "Failed to convert name to Unicode.");
+		return old_name_len;
+ }
+
+ old_inode = old_dentry->d_inode;
+ new_inode = new_dentry->d_inode;
+ old_ni = NTFS_I(old_inode);
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ mutex_lock_nested(&old_ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ mutex_lock_nested(&old_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+
+ if (NInoBeingDeleted(old_ni) || NInoBeingDeleted(old_dir_ni)) {
+ err = -ENOENT;
+ goto unlock_old;
+ }
+
+ is_dir = S_ISDIR(old_inode->i_mode);
+
+ if (new_inode) {
+ new_ni = NTFS_I(new_inode);
+ mutex_lock_nested(&new_ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL_2);
+ if (old_dir != new_dir) {
+ mutex_lock_nested(&new_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT_2);
+ if (NInoBeingDeleted(new_dir_ni)) {
+ err = -ENOENT;
+ goto err_out;
+ }
+ }
+
+ if (NInoBeingDeleted(new_ni)) {
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ if (is_dir) {
+ struct mft_record *ni_mrec;
+
+ ni_mrec = map_mft_record(NTFS_I(new_inode));
+ if (IS_ERR(ni_mrec)) {
+ err = -EIO;
+ goto err_out;
+ }
+ err = ntfs_check_empty_dir(NTFS_I(new_inode), ni_mrec);
+ unmap_mft_record(NTFS_I(new_inode));
+ if (err)
+ goto err_out;
+ }
+
+ err = ntfs_delete(new_ni, new_dir_ni, uname_new, new_name_len, false);
+ if (err)
+ goto err_out;
+ } else {
+ if (old_dir != new_dir) {
+ mutex_lock_nested(&new_dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT_2);
+ if (NInoBeingDeleted(new_dir_ni)) {
+ err = -ENOENT;
+ goto err_out;
+ }
+ }
+ }
+
+ err = __ntfs_link(old_ni, new_dir_ni, uname_new, new_name_len);
+ if (err)
+ goto err_out;
+
+ err = ntfs_delete(old_ni, old_dir_ni, uname_old, old_name_len, false);
+ if (err) {
+ int err2;
+
+ ntfs_error(sb, "Failed to delete old ntfs inode(%lu) in old dir, err: %d",
+ old_ni->mft_no, err);
+ err2 = ntfs_delete(old_ni, new_dir_ni, uname_new, new_name_len, false);
+ if (err2)
+ ntfs_error(sb, "Failed to delete old ntfs inode in new dir, err: %d",
+ err2);
+ goto err_out;
+ }
+
+ simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry);
+ mark_inode_dirty(old_inode);
+ mark_inode_dirty(old_dir);
+ if (old_dir != new_dir)
+ mark_inode_dirty(new_dir);
+ if (new_inode)
+ mark_inode_dirty(new_inode);
+
+ inode_inc_iversion(new_dir);
+
+err_out:
+ if (old_dir != new_dir)
+ mutex_unlock(&new_dir_ni->mrec_lock);
+ if (new_inode)
+ mutex_unlock(&new_ni->mrec_lock);
+
+unlock_old:
+ mutex_unlock(&old_dir_ni->mrec_lock);
+ mutex_unlock(&old_ni->mrec_lock);
+ if (uname_new)
+ kmem_cache_free(ntfs_name_cache, uname_new);
+ if (uname_old)
+ kmem_cache_free(ntfs_name_cache, uname_old);
+
+ return err;
+}
+
+static int ntfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
+ struct dentry *dentry, const char *symname)
+{
+ struct super_block *sb = dir->i_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ struct inode *vi;
+ int err = 0;
+ struct ntfs_inode *ni;
+ __le16 *usrc;
+ __le16 *utarget;
+ int usrc_len;
+ int utarget_len;
+ int symlen = strlen(symname);
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ usrc_len = ntfs_nlstoucs(vol, dentry->d_name.name,
+ dentry->d_name.len, &usrc, NTFS_MAX_NAME_LEN);
+ if (usrc_len < 0) {
+ if (usrc_len != -ENAMETOOLONG)
+ ntfs_error(sb, "Failed to convert name to Unicode.");
+ err = usrc_len;
+ goto out;
+ }
+
+ err = ntfs_check_bad_char(usrc, usrc_len);
+ if (err) {
+ kmem_cache_free(ntfs_name_cache, usrc);
+ goto out;
+ }
+
+ utarget_len = ntfs_nlstoucs(vol, symname, symlen, &utarget,
+ PATH_MAX);
+ if (utarget_len < 0) {
+ if (utarget_len != -ENAMETOOLONG)
+ ntfs_error(sb, "Failed to convert target name to Unicode.");
+ err = utarget_len;
+ kmem_cache_free(ntfs_name_cache, usrc);
+ goto out;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ ni = __ntfs_create(idmap, dir, usrc, usrc_len, S_IFLNK | 0777, 0,
+ utarget, utarget_len);
+ kmem_cache_free(ntfs_name_cache, usrc);
+ kvfree(utarget);
+ if (IS_ERR(ni)) {
+ err = PTR_ERR(ni);
+ goto out;
+ }
+
+ vi = VFS_I(ni);
+ vi->i_size = symlen;
+ d_instantiate_new(dentry, vi);
+out:
+ return err;
+}
+
+static int ntfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
+ struct dentry *dentry, umode_t mode, dev_t rdev)
+{
+ struct super_block *sb = dir->i_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ int err = 0;
+ struct ntfs_inode *ni;
+ __le16 *uname = NULL;
+ int uname_len;
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ uname_len = ntfs_nlstoucs(vol, dentry->d_name.name,
+ dentry->d_name.len, &uname, NTFS_MAX_NAME_LEN);
+ if (uname_len < 0) {
+ if (uname_len != -ENAMETOOLONG)
+ ntfs_error(sb, "Failed to convert name to Unicode.");
+ return uname_len;
+ }
+
+ err = ntfs_check_bad_char(uname, uname_len);
+ if (err) {
+ kmem_cache_free(ntfs_name_cache, uname);
+ return err;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ case S_IFBLK:
+ ni = __ntfs_create(idmap, dir, uname, uname_len,
+ mode, rdev, NULL, 0);
+ break;
+ default:
+ ni = __ntfs_create(idmap, dir, uname, uname_len,
+ mode, 0, NULL, 0);
+ }
+
+ kmem_cache_free(ntfs_name_cache, uname);
+ if (IS_ERR(ni)) {
+ err = PTR_ERR(ni);
+ goto out;
+ }
+
+ d_instantiate_new(dentry, VFS_I(ni));
+out:
+ return err;
+}
+
+static int ntfs_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *dentry)
+{
+ struct inode *vi = old_dentry->d_inode;
+ struct super_block *sb = vi->i_sb;
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ __le16 *uname = NULL;
+ int uname_len;
+ int err;
+ struct ntfs_inode *ni = NTFS_I(vi), *dir_ni = NTFS_I(dir);
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ uname_len = ntfs_nlstoucs(vol, dentry->d_name.name,
+ dentry->d_name.len, &uname, NTFS_MAX_NAME_LEN);
+ if (uname_len < 0) {
+ if (uname_len != -ENAMETOOLONG)
+ ntfs_error(sb, "Failed to convert name to Unicode.");
+ err = uname_len;
+ goto out;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ ihold(vi);
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ mutex_lock_nested(&dir_ni->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+ err = __ntfs_link(NTFS_I(vi), NTFS_I(dir), uname, uname_len);
+ if (err) {
+ mutex_unlock(&dir_ni->mrec_lock);
+ mutex_unlock(&ni->mrec_lock);
+ iput(vi);
+ ntfs_error(sb, "Failed to create link, err = %d.", err);
+ goto out;
+ }
+
+ inode_inc_iversion(dir);
+ simple_inode_init_ts(dir);
+
+ inode_inc_iversion(vi);
+ simple_inode_init_ts(vi);
+
+ /* timestamp is already written, so mark_inode_dirty() is unneeded. */
+ d_instantiate(dentry, vi);
+ mutex_unlock(&dir_ni->mrec_lock);
+ mutex_unlock(&ni->mrec_lock);
+
+out:
+ if (uname)
+ kmem_cache_free(ntfs_name_cache, uname);
+ return err;
+}
+
+/*
+ * Inode operations for directories.
+ */
+const struct inode_operations ntfs_dir_inode_ops = {
+ .lookup = ntfs_lookup, /* VFS: Lookup directory. */
+ .create = ntfs_create,
+ .unlink = ntfs_unlink,
+ .mkdir = ntfs_mkdir,
+ .rmdir = ntfs_rmdir,
+ .rename = ntfs_rename,
+ .get_acl = ntfs_get_acl,
+ .set_acl = ntfs_set_acl,
+ .listxattr = ntfs_listxattr,
+ .setattr = ntfs_setattr,
+ .getattr = ntfs_getattr,
+ .symlink = ntfs_symlink,
+ .mknod = ntfs_mknod,
+ .link = ntfs_link,
+};
+
+/**
+ * ntfs_get_parent - find the dentry of the parent of a given directory dentry
+ * @child_dent: dentry of the directory whose parent directory to find
+ *
+ * Find the dentry for the parent directory of the directory specified by the
+ * dentry @child_dent. This function is called from the exportfs code in
+ * fs/exportfs/expfs.c when it needs to reconnect a disconnected dentry
+ * obtained from a file handle.
+ *
+ * Return the dentry of the parent directory on success or the error code on
+ * error (IS_ERR() is true).
+ */
+static struct dentry *ntfs_get_parent(struct dentry *child_dent)
+{
+ struct inode *vi = d_inode(child_dent);
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct mft_record *mrec;
+ struct ntfs_attr_search_ctx *ctx;
+ struct attr_record *attr;
+ struct file_name_attr *fn;
+ unsigned long parent_ino;
+ int err;
+
+ ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
+ /* Get the mft record of the inode belonging to the child dentry. */
+ mrec = map_mft_record(ni);
+ if (IS_ERR(mrec))
+ return ERR_CAST(mrec);
+ /* Find the first file name attribute in the mft record. */
+ ctx = ntfs_attr_get_search_ctx(ni, mrec);
+ if (unlikely(!ctx)) {
+ unmap_mft_record(ni);
+ return ERR_PTR(-ENOMEM);
+ }
+try_next:
+ err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, CASE_SENSITIVE, 0, NULL,
+ 0, ctx);
+ if (unlikely(err)) {
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(ni);
+ if (err == -ENOENT)
+ ntfs_error(vi->i_sb,
+ "Inode 0x%lx does not have a file name attribute. Run chkdsk.",
+ vi->i_ino);
+ return ERR_PTR(err);
+ }
+ attr = ctx->attr;
+ if (unlikely(attr->non_resident))
+ goto try_next;
+ fn = (struct file_name_attr *)((u8 *)attr +
+ le16_to_cpu(attr->data.resident.value_offset));
+ if (unlikely((u8 *)fn + le32_to_cpu(attr->data.resident.value_length) >
+ (u8 *)attr + le32_to_cpu(attr->length)))
+ goto try_next;
+ /* Get the inode number of the parent directory. */
+ parent_ino = MREF_LE(fn->parent_directory);
+ /* Release the search context and the mft record of the child. */
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(ni);
+
+ return d_obtain_alias(ntfs_iget(vi->i_sb, parent_ino));
+}
+
+static struct inode *ntfs_nfs_get_inode(struct super_block *sb,
+ u64 ino, u32 generation)
+{
+ struct inode *inode;
+
+ inode = ntfs_iget(sb, ino);
+ if (!IS_ERR(inode)) {
+ if (inode->i_generation != generation) {
+ iput(inode);
+ inode = ERR_PTR(-ESTALE);
+ }
+ }
+
+ return inode;
+}
+
+static struct dentry *ntfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
+ int fh_len, int fh_type)
+{
+ return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+ ntfs_nfs_get_inode);
+}
+
+static struct dentry *ntfs_fh_to_parent(struct super_block *sb, struct fid *fid,
+ int fh_len, int fh_type)
+{
+ return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+ ntfs_nfs_get_inode);
+}
+
+/*
+ * Export operations allowing NFS exporting of mounted NTFS partitions.
+ */
+const struct export_operations ntfs_export_ops = {
+ .encode_fh = generic_encode_ino32_fh,
+ .get_parent = ntfs_get_parent, /* Find the parent of a given directory. */
+ .fh_to_dentry = ntfs_fh_to_dentry,
+ .fh_to_parent = ntfs_fh_to_parent,
+};
--
2.34.1
* [PATCH 04/11] ntfsplus: add directory operations
From: Namjae Jeon @ 2025-10-20 2:07 UTC
This adds the implementation of directory operations for ntfsplus.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/ntfsplus/dir.c | 1226 +++++++++++++++++++++++++
fs/ntfsplus/index.c | 2114 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 3340 insertions(+)
create mode 100644 fs/ntfsplus/dir.c
create mode 100644 fs/ntfsplus/index.c
diff --git a/fs/ntfsplus/dir.c b/fs/ntfsplus/dir.c
new file mode 100644
index 000000000000..9a97eeaf8a4c
--- /dev/null
+++ b/fs/ntfsplus/dir.c
@@ -0,0 +1,1226 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel directory operations. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2007 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/blkdev.h>
+
+#include "dir.h"
+#include "mft.h"
+#include "ntfs.h"
+#include "index.h"
+#include "reparse.h"
+
+/*
+ * The little endian Unicode string $I30 as a global constant.
+ */
+__le16 I30[5] = { cpu_to_le16('$'), cpu_to_le16('I'),
+ cpu_to_le16('3'), cpu_to_le16('0'), 0 };
+
+/**
+ * ntfs_lookup_inode_by_name - find an inode in a directory given its name
+ * @dir_ni: ntfs inode of the directory in which to search for the name
+ * @uname: Unicode name for which to search in the directory
+ * @uname_len: length of the name @uname in Unicode characters
+ * @res: return the found file name if necessary (see below)
+ *
+ * Look for an inode with name @uname in the directory with inode @dir_ni.
+ * ntfs_lookup_inode_by_name() walks the contents of the directory looking for
+ * the Unicode name. If the name is found in the directory, the corresponding
+ * inode number (>= 0) is returned as a mft reference in cpu format, i.e. it
+ * is a 64-bit number containing the sequence number.
+ *
+ * On error, a negative value is returned corresponding to the error code. In
+ * particular if the inode is not found -ENOENT is returned. Note that you
+ * can't just check the return value for being negative, you have to check the
+ * inode number for being negative which you can extract using MREF(return
+ * value).
+ *
+ * Note, @uname_len does not include the (optional) terminating NULL character.
+ *
+ * Note, we look for a case sensitive match first but we also look for a case
+ * insensitive match at the same time. If we find a case insensitive match, we
+ * save that for the case that we don't find an exact match, where we return
+ * the case insensitive match and setup @res (which we allocate!) with the mft
+ * reference, the file name type, length and with a copy of the little endian
+ * Unicode file name itself. If we match a file name which is in the DOS name
+ * space, we only return the mft reference and file name type in @res.
+ * ntfs_lookup() then uses this to find the long file name in the inode itself.
+ * This is to avoid polluting the dcache with short file names. We want them to
+ * work but we don't care for how quickly one can access them. This also fixes
+ * the dcache aliasing issues.
+ *
+ * Locking: - Caller must hold i_mutex on the directory.
+ * - Each page cache page in the index allocation mapping must be
+ * locked whilst being accessed otherwise we may find a corrupt
+ * page due to it being under ->writepage at the moment which
+ * applies the mst protection fixups before writing out and then
+ * removes them again after the write is complete after which it
+ * unlocks the page.
+ */
+u64 ntfs_lookup_inode_by_name(struct ntfs_inode *dir_ni, const __le16 *uname,
+ const int uname_len, struct ntfs_name **res)
+{
+ struct ntfs_volume *vol = dir_ni->vol;
+ struct super_block *sb = vol->sb;
+ struct inode *ia_vi = NULL;
+ struct mft_record *m;
+ struct index_root *ir;
+ struct index_entry *ie;
+ struct index_block *ia;
+ u8 *index_end;
+ u64 mref;
+ struct ntfs_attr_search_ctx *ctx;
+ int err = 0, rc;
+ s64 vcn, old_vcn;
+ struct address_space *ia_mapping;
+ struct folio *folio;
+ u8 *kaddr = NULL;
+ struct ntfs_name *name = NULL;
+
+ BUG_ON(!S_ISDIR(VFS_I(dir_ni)->i_mode));
+ BUG_ON(NInoAttr(dir_ni));
+ /* Get hold of the mft record for the directory. */
+ m = map_mft_record(dir_ni);
+ if (IS_ERR(m)) {
+ ntfs_error(sb, "map_mft_record() failed with error code %ld.",
+ -PTR_ERR(m));
+ return ERR_MREF(PTR_ERR(m));
+ }
+ ctx = ntfs_attr_get_search_ctx(dir_ni, m);
+ if (unlikely(!ctx)) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ /* Find the index root attribute in the mft record. */
+ err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
+ 0, ctx);
+ if (unlikely(err)) {
+ if (err == -ENOENT) {
+ ntfs_error(sb,
+ "Index root attribute missing in directory inode 0x%lx.",
+ dir_ni->mft_no);
+ err = -EIO;
+ }
+ goto err_out;
+ }
+ /* Get to the index root value (it's been verified in read_inode). */
+ ir = (struct index_root *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+ index_end = (u8 *)&ir->index + le32_to_cpu(ir->index.index_length);
+ /* The first index entry. */
+ ie = (struct index_entry *)((u8 *)&ir->index +
+ le32_to_cpu(ir->index.entries_offset));
+ /*
+ * Loop until we exceed valid memory (corruption case) or until we
+ * reach the last entry.
+ */
+ for (;; ie = (struct index_entry *)((u8 *)ie + le16_to_cpu(ie->length))) {
+ /* Bounds checks. */
+ if ((u8 *)ie < (u8 *)ctx->mrec ||
+ (u8 *)ie + sizeof(struct index_entry_header) > index_end ||
+ (u8 *)ie + sizeof(struct index_entry_header) + le16_to_cpu(ie->key_length) >
+ index_end || (u8 *)ie + le16_to_cpu(ie->length) > index_end)
+ goto dir_err_out;
+ /*
+ * The last entry cannot contain a name. It can however contain
+ * a pointer to a child node in the B+tree so we just break out.
+ */
+ if (ie->flags & INDEX_ENTRY_END)
+ break;
+ /* Key length should not be zero if it is not last entry. */
+ if (!ie->key_length)
+ goto dir_err_out;
+ /* Check the consistency of an index entry */
+ if (ntfs_index_entry_inconsistent(NULL, vol, ie, COLLATION_FILE_NAME,
+ dir_ni->mft_no))
+ goto dir_err_out;
+ /*
+ * We perform a case sensitive comparison and if that matches
+ * we are done and return the mft reference of the inode (i.e.
+ * the inode number together with the sequence number for
+ * consistency checking). We convert it to cpu format before
+ * returning.
+ */
+ if (ntfs_are_names_equal(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length,
+ CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
+found_it:
+ /*
+ * We have a perfect match, so we don't need to care
+ * about having matched imperfectly before, so we can
+ * free name and set *res to NULL.
+ * However, if the perfect match is a short file name,
+ * we need to signal this through *res, so that
+ * ntfs_lookup() can fix dcache aliasing issues.
+ * As an optimization we just reuse an existing
+ * allocation of *res.
+ */
+ if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
+ if (!name) {
+ name = kmalloc(sizeof(struct ntfs_name),
+ GFP_NOFS);
+ if (!name) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ }
+ name->mref = le64_to_cpu(
+ ie->data.dir.indexed_file);
+ name->type = FILE_NAME_DOS;
+ name->len = 0;
+ *res = name;
+ } else {
+ kfree(name);
+ *res = NULL;
+ }
+ mref = le64_to_cpu(ie->data.dir.indexed_file);
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(dir_ni);
+ return mref;
+ }
+ /*
+ * For a case insensitive mount, we also perform a case
+ * insensitive comparison (provided the file name is not in the
+ * POSIX namespace). If the comparison matches, and the name is
+ * in the WIN32 namespace, we cache the filename in *res so
+ * that the caller, ntfs_lookup(), can work on it. If the
+ * comparison matches, and the name is in the DOS namespace, we
+ * only cache the mft reference and the file name type (we set
+ * the name length to zero for simplicity).
+ */
+ if (!NVolCaseSensitive(vol) &&
+ ie->key.file_name.file_name_type &&
+ ntfs_are_names_equal(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length,
+ IGNORE_CASE, vol->upcase, vol->upcase_len)) {
+ int name_size = sizeof(struct ntfs_name);
+ u8 type = ie->key.file_name.file_name_type;
+ u8 len = ie->key.file_name.file_name_length;
+
+ /* Only one case insensitive matching name allowed. */
+ if (name) {
+ ntfs_error(sb,
+ "Found already allocated name in phase 1. Please run chkdsk");
+ goto dir_err_out;
+ }
+
+ if (type != FILE_NAME_DOS)
+ name_size += len * sizeof(__le16);
+ name = kmalloc(name_size, GFP_NOFS);
+ if (!name) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ name->mref = le64_to_cpu(ie->data.dir.indexed_file);
+ name->type = type;
+ if (type != FILE_NAME_DOS) {
+ name->len = len;
+ memcpy(name->name, ie->key.file_name.file_name,
+ len * sizeof(__le16));
+ } else
+ name->len = 0;
+ *res = name;
+ }
+ /*
+ * Not a perfect match, need to do full blown collation so we
+ * know which way in the B+tree we have to go.
+ */
+ rc = ntfs_collate_names(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length, 1,
+ IGNORE_CASE, vol->upcase, vol->upcase_len);
+ /*
+ * If uname collates before the name of the current entry, there
+ * is definitely no such name in this index but we might need to
+ * descend into the B+tree so we just break out of the loop.
+ */
+ if (rc == -1)
+ break;
+ /* The names are not equal, continue the search. */
+ if (rc)
+ continue;
+ /*
+ * Names match with case insensitive comparison, now try the
+ * case sensitive comparison, which is required for proper
+ * collation.
+ */
+ rc = ntfs_collate_names(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length, 1,
+ CASE_SENSITIVE, vol->upcase, vol->upcase_len);
+ if (rc == -1)
+ break;
+ if (rc)
+ continue;
+ /*
+ * Perfect match, this will never happen as the
+ * ntfs_are_names_equal() call will have gotten a match but we
+ * still treat it correctly.
+ */
+ goto found_it;
+ }
+ /*
+ * We have finished with this index without success. Check for the
+ * presence of a child node and if not present return -ENOENT, unless
+ * we have got a matching name cached in name in which case return the
+ * mft reference associated with it.
+ */
+ if (!(ie->flags & INDEX_ENTRY_NODE)) {
+ if (name) {
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(dir_ni);
+ return name->mref;
+ }
+ ntfs_debug("Entry not found.");
+ err = -ENOENT;
+ goto err_out;
+ } /* Child node present, descend into it. */
+
+ /* Get the starting vcn of the index_block holding the child node. */
+ vcn = le64_to_cpup((__le64 *)((u8 *)ie + le16_to_cpu(ie->length) - 8));
+
+ /*
+ * We are done with the index root and the mft record. Release them,
+ * otherwise we deadlock with ntfs_read_mapping_folio().
+ */
+ ntfs_attr_put_search_ctx(ctx);
+ unmap_mft_record(dir_ni);
+ m = NULL;
+ ctx = NULL;
+
+ ia_vi = ntfs_index_iget(VFS_I(dir_ni), I30, 4);
+ if (IS_ERR(ia_vi)) {
+ err = PTR_ERR(ia_vi);
+ goto err_out;
+ }
+
+ ia_mapping = ia_vi->i_mapping;
+descend_into_child_node:
+ /*
+ * Convert vcn to index into the index allocation attribute in units
+ * of PAGE_SIZE and map the page cache page, reading it from
+ * disk if necessary.
+ */
+ folio = ntfs_read_mapping_folio(ia_mapping, vcn <<
+ dir_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
+ if (IS_ERR(folio)) {
+ ntfs_error(sb, "Failed to map directory index page, error %ld.",
+ -PTR_ERR(folio));
+ err = PTR_ERR(folio);
+ goto err_out;
+ }
+
+ folio_lock(folio);
+ kaddr = kmalloc(PAGE_SIZE, GFP_NOFS);
+ if (!kaddr) {
+ err = -ENOMEM;
+ folio_unlock(folio);
+ folio_put(folio);
+ goto unm_err_out;
+ }
+
+ memcpy_from_folio(kaddr, folio, 0, PAGE_SIZE);
+ post_read_mst_fixup((struct ntfs_record *)kaddr, PAGE_SIZE);
+ folio_unlock(folio);
+ folio_put(folio);
+fast_descend_into_child_node:
+ /* Get to the index allocation block. */
+ ia = (struct index_block *)(kaddr + ((vcn <<
+ dir_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
+ /* Bounds checks. */
+ if ((u8 *)ia < kaddr || (u8 *)ia > kaddr + PAGE_SIZE) {
+ ntfs_error(sb,
+ "Out of bounds check failed. Corrupt directory inode 0x%lx or driver bug.",
+ dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ /* Catch multi sector transfer fixup errors. */
+ if (unlikely(!ntfs_is_indx_record(ia->magic))) {
+ ntfs_error(sb,
+ "Directory index record with vcn 0x%llx is corrupt. Corrupt inode 0x%lx. Run chkdsk.",
+ (unsigned long long)vcn, dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ if (le64_to_cpu(ia->index_block_vcn) != vcn) {
+ ntfs_error(sb,
+ "Actual VCN (0x%llx) of index buffer is different from expected VCN (0x%llx). Directory inode 0x%lx is corrupt or driver bug.",
+ (unsigned long long)le64_to_cpu(ia->index_block_vcn),
+ (unsigned long long)vcn, dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
+ dir_ni->itype.index.block_size) {
+ ntfs_error(sb,
+ "Index buffer (VCN 0x%llx) of directory inode 0x%lx has a size (%u) differing from the directory specified size (%u). Directory inode is corrupt or driver bug.",
+ (unsigned long long)vcn, dir_ni->mft_no,
+ le32_to_cpu(ia->index.allocated_size) + 0x18,
+ dir_ni->itype.index.block_size);
+ goto unm_err_out;
+ }
+ index_end = (u8 *)ia + dir_ni->itype.index.block_size;
+ if (index_end > kaddr + PAGE_SIZE) {
+ ntfs_error(sb,
+ "Index buffer (VCN 0x%llx) of directory inode 0x%lx crosses page boundary. Impossible! Cannot access! This is probably a bug in the driver.",
+ (unsigned long long)vcn, dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ index_end = (u8 *)&ia->index + le32_to_cpu(ia->index.index_length);
+ if (index_end > (u8 *)ia + dir_ni->itype.index.block_size) {
+ ntfs_error(sb,
+ "Size of index buffer (VCN 0x%llx) of directory inode 0x%lx exceeds maximum size.",
+ (unsigned long long)vcn, dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ /* The first index entry. */
+ ie = (struct index_entry *)((u8 *)&ia->index +
+ le32_to_cpu(ia->index.entries_offset));
+ /*
+ * Iterate similar to above big loop but applied to index buffer, thus
+ * loop until we exceed valid memory (corruption case) or until we
+ * reach the last entry.
+ */
+ for (;; ie = (struct index_entry *)((u8 *)ie + le16_to_cpu(ie->length))) {
+ /* Bounds checks. */
+ if ((u8 *)ie < (u8 *)ia ||
+ (u8 *)ie + sizeof(struct index_entry_header) > index_end ||
+ (u8 *)ie + sizeof(struct index_entry_header) + le16_to_cpu(ie->key_length) >
+ index_end || (u8 *)ie + le16_to_cpu(ie->length) > index_end) {
+ ntfs_error(sb, "Index entry out of bounds in directory inode 0x%lx.",
+ dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ /*
+ * The last entry cannot contain a name. It can however contain
+ * a pointer to a child node in the B+tree so we just break out.
+ */
+ if (ie->flags & INDEX_ENTRY_END)
+ break;
+ /* Key length should not be zero if it is not last entry. */
+ if (!ie->key_length)
+ goto unm_err_out;
+ /* Check the consistency of an index entry */
+ if (ntfs_index_entry_inconsistent(NULL, vol, ie, COLLATION_FILE_NAME,
+ dir_ni->mft_no))
+ goto unm_err_out;
+ /*
+ * We perform a case sensitive comparison and if that matches
+ * we are done and return the mft reference of the inode (i.e.
+ * the inode number together with the sequence number for
+ * consistency checking). We convert it to cpu format before
+ * returning.
+ */
+ if (ntfs_are_names_equal(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length,
+ CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
+found_it2:
+ /*
+ * We have a perfect match, so we don't need to care
+ * about having matched imperfectly before, so we can
+ * free name and set *res to NULL.
+ * However, if the perfect match is a short file name,
+ * we need to signal this through *res, so that
+ * ntfs_lookup() can fix dcache aliasing issues.
+ * As an optimization we just reuse an existing
+ * allocation of *res.
+ */
+ if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
+ if (!name) {
+ name = kmalloc(sizeof(struct ntfs_name),
+ GFP_NOFS);
+ if (!name) {
+ err = -ENOMEM;
+ goto unm_err_out;
+ }
+ }
+ name->mref = le64_to_cpu(
+ ie->data.dir.indexed_file);
+ name->type = FILE_NAME_DOS;
+ name->len = 0;
+ *res = name;
+ } else {
+ kfree(name);
+ *res = NULL;
+ }
+ mref = le64_to_cpu(ie->data.dir.indexed_file);
+ kfree(kaddr);
+ iput(ia_vi);
+ return mref;
+ }
+ /*
+ * For a case insensitive mount, we also perform a case
+ * insensitive comparison (provided the file name is not in the
+ * POSIX namespace). If the comparison matches, and the name is
+ * in the WIN32 namespace, we cache the filename in *res so
+ * that the caller, ntfs_lookup(), can work on it. If the
+ * comparison matches, and the name is in the DOS namespace, we
+ * only cache the mft reference and the file name type (we set
+ * the name length to zero for simplicity).
+ */
+ if (!NVolCaseSensitive(vol) &&
+ ie->key.file_name.file_name_type &&
+ ntfs_are_names_equal(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length,
+ IGNORE_CASE, vol->upcase, vol->upcase_len)) {
+ int name_size = sizeof(struct ntfs_name);
+ u8 type = ie->key.file_name.file_name_type;
+ u8 len = ie->key.file_name.file_name_length;
+
+ /* Only one case insensitive matching name allowed. */
+ if (name) {
+ ntfs_error(sb,
+ "Found already allocated name in phase 2. Please run chkdsk");
+ kfree(kaddr);
+ goto dir_err_out;
+ }
+
+ if (type != FILE_NAME_DOS)
+ name_size += len * sizeof(__le16);
+ name = kmalloc(name_size, GFP_NOFS);
+ if (!name) {
+ err = -ENOMEM;
+ goto unm_err_out;
+ }
+ name->mref = le64_to_cpu(ie->data.dir.indexed_file);
+ name->type = type;
+ if (type != FILE_NAME_DOS) {
+ name->len = len;
+ memcpy(name->name, ie->key.file_name.file_name,
+ len * sizeof(__le16));
+ } else
+ name->len = 0;
+ *res = name;
+ }
+ /*
+ * Not a perfect match, need to do full blown collation so we
+ * know which way in the B+tree we have to go.
+ */
+ rc = ntfs_collate_names(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length, 1,
+ IGNORE_CASE, vol->upcase, vol->upcase_len);
+ /*
+ * If uname collates before the name of the current entry, there
+ * is definitely no such name in this index but we might need to
+ * descend into the B+tree so we just break out of the loop.
+ */
+ if (rc == -1)
+ break;
+ /* The names are not equal, continue the search. */
+ if (rc)
+ continue;
+ /*
+ * Names match with case insensitive comparison, now try the
+ * case sensitive comparison, which is required for proper
+ * collation.
+ */
+ rc = ntfs_collate_names(uname, uname_len,
+ (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length, 1,
+ CASE_SENSITIVE, vol->upcase, vol->upcase_len);
+ if (rc == -1)
+ break;
+ if (rc)
+ continue;
+ /*
+ * Perfect match, this will never happen as the
+ * ntfs_are_names_equal() call will have gotten a match but we
+ * still treat it correctly.
+ */
+ goto found_it2;
+ }
+ /*
+ * We have finished with this index buffer without success. Check for
+ * the presence of a child node.
+ */
+ if (ie->flags & INDEX_ENTRY_NODE) {
+ if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
+ ntfs_error(sb,
+ "Index entry with child node found in a leaf node in directory inode 0x%lx.",
+ dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ /* Child node present, descend into it. */
+ old_vcn = vcn;
+ vcn = le64_to_cpup((__le64 *)((u8 *)ie +
+ le16_to_cpu(ie->length) - 8));
+ if (vcn >= 0) {
+ /*
+ * If vcn is in the same page cache page as old_vcn we
+ * recycle the mapped page.
+ */
+ if ((old_vcn << vol->cluster_size_bits >> PAGE_SHIFT) ==
+ (vcn << vol->cluster_size_bits >> PAGE_SHIFT))
+ goto fast_descend_into_child_node;
+ kfree(kaddr);
+ kaddr = NULL;
+ goto descend_into_child_node;
+ }
+ ntfs_error(sb, "Negative child node vcn in directory inode 0x%lx.",
+ dir_ni->mft_no);
+ goto unm_err_out;
+ }
+ /*
+ * No child node present, return -ENOENT, unless we have got a matching
+ * name cached in name in which case return the mft reference
+ * associated with it.
+ */
+ if (name) {
+ kfree(kaddr);
+ iput(ia_vi);
+ return name->mref;
+ }
+ ntfs_debug("Entry not found.");
+ err = -ENOENT;
+unm_err_out:
+ kfree(kaddr);
+err_out:
+ if (!err)
+ err = -EIO;
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+ if (m)
+ unmap_mft_record(dir_ni);
+ kfree(name);
+ *res = NULL;
+ if (ia_vi && !IS_ERR(ia_vi))
+ iput(ia_vi);
+ return ERR_MREF(err);
+dir_err_out:
+ ntfs_error(sb, "Corrupt directory. Aborting lookup.");
+ goto err_out;
+}
+
+/**
+ * ntfs_filldir - ntfs specific filldir method
+ * @vol: current ntfs volume
+ * @ndir: ntfs inode of current directory
+ * @ia_page: page in which the index allocation block containing @ie resides
+ * @ie: current index entry
+ * @name: buffer to use for the converted name
+ * @actor: what to feed the entries to
+ *
+ * Convert the Unicode @name to the loaded NLS and feed it to @actor via
+ * dir_emit().
+ *
+ * If @ia_page is not NULL it is the locked page containing the index
+ * allocation block containing the index entry @ie.
+ *
+ * Note, we drop (and then reacquire) the page lock on @ia_page across the
+ * dir_emit() call otherwise we would deadlock with NFSd when it calls ->lookup
+ * since ntfs_lookup() will lock the same page. As an optimization, we do not
+ * retake the lock if we are returning a non-zero value as ntfs_readdir()
+ * would need to drop the lock immediately anyway.
+ */
+static inline int ntfs_filldir(struct ntfs_volume *vol,
+ struct ntfs_inode *ndir, struct page *ia_page, struct index_entry *ie,
+ u8 *name, struct dir_context *actor)
+{
+ unsigned long mref;
+ int name_len;
+ unsigned int dt_type;
+ u8 name_type;
+
+ name_type = ie->key.file_name.file_name_type;
+ if (name_type == FILE_NAME_DOS) {
+ ntfs_debug("Skipping DOS name space entry.");
+ return 0;
+ }
+ if (MREF_LE(ie->data.dir.indexed_file) == FILE_root) {
+ ntfs_debug("Skipping root directory self reference entry.");
+ return 0;
+ }
+ if (MREF_LE(ie->data.dir.indexed_file) < FILE_first_user &&
+ !NVolShowSystemFiles(vol)) {
+ ntfs_debug("Skipping system file.");
+ return 0;
+ }
+
+ name_len = ntfs_ucstonls(vol, (__le16 *)&ie->key.file_name.file_name,
+ ie->key.file_name.file_name_length, &name,
+ NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1);
+ if (name_len <= 0) {
+ ntfs_warning(vol->sb, "Skipping unrepresentable inode 0x%llx.",
+ (long long)MREF_LE(ie->data.dir.indexed_file));
+ return 0;
+ }
+
+ mref = MREF_LE(ie->data.dir.indexed_file);
+ if (ie->key.file_name.file_attributes &
+ FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT)
+ dt_type = DT_DIR;
+ else if (ie->key.file_name.file_attributes & FILE_ATTR_REPARSE_POINT)
+ dt_type = ntfs_reparse_tag_dt_types(vol, mref);
+ else
+ dt_type = DT_REG;
+
+ /*
+ * Drop the page lock otherwise we deadlock with NFS when it calls
+ * ->lookup since ntfs_lookup() will lock the same page.
+ */
+ if (ia_page)
+ unlock_page(ia_page);
+ ntfs_debug("Calling filldir for %s with len %i, fpos 0x%llx, inode 0x%lx, DT_%s.",
+ name, name_len, actor->pos, mref, dt_type == DT_DIR ? "DIR" : "REG");
+ if (!dir_emit(actor, name, name_len, mref, dt_type))
+ return 1;
+ /* Relock the page but not if we are aborting ->readdir. */
+ if (ia_page)
+ lock_page(ia_page);
+ return 0;
+}
+
+struct ntfs_file_private {
+ void *key;
+ __le16 key_length;
+ bool end_in_iterate;
+ loff_t curr_pos;
+};
+
+struct ntfs_index_ra {
+ unsigned long start_index;
+ unsigned int count;
+ struct rb_node rb_node;
+};
+
+static void ntfs_insert_rb(struct ntfs_index_ra *nir, struct rb_root *root)
+{
+ struct rb_node **new = &root->rb_node, *parent = NULL;
+ struct ntfs_index_ra *cnir;
+
+ while (*new) {
+ parent = *new;
+ cnir = rb_entry(parent, struct ntfs_index_ra, rb_node);
+ if (nir->start_index < cnir->start_index)
+ new = &parent->rb_left;
+ else if (nir->start_index >= cnir->start_index + cnir->count)
+ new = &parent->rb_right;
+ else {
+			pr_err("nir start_index: %lu, count: %u, cnir start_index: %lu, count: %u\n",
+			       nir->start_index, nir->count, cnir->start_index, cnir->count);
+			BUG();
+ }
+ }
+
+ rb_link_node(&nir->rb_node, parent, new);
+ rb_insert_color(&nir->rb_node, root);
+}
+
+static int ntfs_ia_blocks_readahead(struct ntfs_inode *ia_ni, loff_t pos)
+{
+ unsigned long dir_start_index, dir_end_index;
+ struct inode *ia_vi = VFS_I(ia_ni);
+ struct file_ra_state *dir_ra;
+
+ dir_end_index = (i_size_read(ia_vi) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ dir_start_index = (pos + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+ if (dir_start_index >= dir_end_index)
+ return 0;
+
+ dir_ra = kzalloc(sizeof(*dir_ra), GFP_NOFS);
+ if (!dir_ra)
+ return -ENOMEM;
+
+	file_ra_state_init(dir_ra, ia_vi->i_mapping);
+	dir_ra->ra_pages = dir_end_index - dir_start_index;
+ page_cache_sync_readahead(ia_vi->i_mapping, dir_ra, NULL,
+ dir_start_index, dir_end_index - dir_start_index);
+ kfree(dir_ra);
+
+ return 0;
+}
+
+static int ntfs_readdir(struct file *file, struct dir_context *actor)
+{
+ struct inode *vdir = file_inode(file);
+ struct super_block *sb = vdir->i_sb;
+ struct ntfs_inode *ndir = NTFS_I(vdir);
+ struct ntfs_volume *vol = NTFS_SB(sb);
+ struct ntfs_attr_search_ctx *ctx = NULL;
+ struct ntfs_index_context *ictx = NULL;
+ u8 *name;
+ struct index_root *ir;
+ struct index_entry *next = NULL;
+ struct ntfs_file_private *private = NULL;
+ int err = 0;
+ loff_t ie_pos = 2; /* initialize it with dot and dotdot size */
+ struct ntfs_index_ra *nir = NULL;
+ unsigned long index;
+ struct rb_root ra_root = RB_ROOT;
+ struct file_ra_state *ra;
+
+ ntfs_debug("Entering for inode 0x%lx, fpos 0x%llx.",
+ vdir->i_ino, actor->pos);
+
+ if (file->private_data) {
+ private = file->private_data;
+
+ if (actor->pos != private->curr_pos) {
+ /*
+			 * If actor->pos differs from the position saved on the
+			 * previous pass, discard private->key and fill the
+			 * dirent buffer using a linear lookup instead.
+ */
+ kfree(private->key);
+ private->key = NULL;
+ private->end_in_iterate = false;
+ } else if (private->end_in_iterate) {
+ kfree(private->key);
+ kfree(file->private_data);
+ file->private_data = NULL;
+ return 0;
+ }
+ }
+
+ /* Emulate . and .. for all directories. */
+ if (!dir_emit_dots(file, actor))
+ return 0;
+
+ /*
+ * Allocate a buffer to store the current name being processed
+ * converted to format determined by current NLS.
+ */
+ name = kmalloc(NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1, GFP_NOFS);
+ if (unlikely(!name))
+ return -ENOMEM;
+
+ mutex_lock_nested(&ndir->mrec_lock, NTFS_INODE_MUTEX_PARENT);
+ ictx = ntfs_index_ctx_get(ndir, I30, 4);
+ if (!ictx) {
+ kfree(name);
+ mutex_unlock(&ndir->mrec_lock);
+ return -ENOMEM;
+ }
+
+ ra = kzalloc(sizeof(struct file_ra_state), GFP_NOFS);
+ if (!ra) {
+ kfree(name);
+ ntfs_index_ctx_put(ictx);
+ mutex_unlock(&ndir->mrec_lock);
+ return -ENOMEM;
+ }
+ file_ra_state_init(ra, vol->mft_ino->i_mapping);
+
+ if (private && private->key) {
+		/*
+		 * Look up the index entry matching private->key with
+		 * ntfs_index_lookup() instead of a linear index walk.
+		 */
+ err = ntfs_index_lookup(private->key,
+ le16_to_cpu(private->key_length),
+ ictx);
+ if (!err) {
+ next = ictx->entry;
+ /*
+ * Update ie_pos with private->curr_pos
+ * to make next d_off of dirent correct.
+ */
+ ie_pos = private->curr_pos;
+
+ if (actor->pos > vol->mft_record_size && ictx->ia_ni) {
+ err = ntfs_ia_blocks_readahead(ictx->ia_ni, actor->pos);
+ if (err)
+ goto out;
+ }
+
+ goto nextdir;
+ } else {
+ goto out;
+ }
+ } else if (!private) {
+ private = kzalloc(sizeof(struct ntfs_file_private), GFP_KERNEL);
+ if (!private) {
+ err = -ENOMEM;
+ goto out;
+ }
+ file->private_data = private;
+ }
+
+ ctx = ntfs_attr_get_search_ctx(ndir, NULL);
+ if (!ctx) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ /* Find the index root attribute in the mft record. */
+	if (ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL, 0,
+			ctx)) {
+		ntfs_error(sb, "Index root attribute missing in directory inode %lu",
+			ndir->mft_no);
+		ntfs_attr_put_search_ctx(ctx);
+		err = -EIO;
+		goto out;
+ }
+
+ /* Get to the index root value. */
+ ir = (struct index_root *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+
+ ictx->ir = ir;
+ ictx->actx = ctx;
+ ictx->parent_vcn[ictx->pindex] = VCN_INDEX_ROOT_PARENT;
+ ictx->is_in_root = true;
+ ictx->parent_pos[ictx->pindex] = 0;
+
+ ictx->block_size = le32_to_cpu(ir->index_block_size);
+ if (ictx->block_size < NTFS_BLOCK_SIZE) {
+ ntfs_error(sb, "Index block size (%d) is smaller than the sector size (%d)",
+ ictx->block_size, NTFS_BLOCK_SIZE);
+ err = -EIO;
+ goto out;
+ }
+
+ if (vol->cluster_size <= ictx->block_size)
+ ictx->vcn_size_bits = vol->cluster_size_bits;
+ else
+ ictx->vcn_size_bits = NTFS_BLOCK_SIZE_BITS;
+
+ /* The first index entry. */
+ next = (struct index_entry *)((u8 *)&ir->index +
+ le32_to_cpu(ir->index.entries_offset));
+
+ if (next->flags & INDEX_ENTRY_NODE) {
+ ictx->ia_ni = ntfs_ia_open(ictx, ictx->idx_ni);
+ if (!ictx->ia_ni) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ err = ntfs_ia_blocks_readahead(ictx->ia_ni, actor->pos);
+ if (err)
+ goto out;
+ }
+
+ if (next->flags & INDEX_ENTRY_NODE) {
+ next = ntfs_index_walk_down(next, ictx);
+ if (!next) {
+ err = -EIO;
+ goto out;
+ }
+ }
+
+ if (next && !(next->flags & INDEX_ENTRY_END))
+ goto nextdir;
+
+ while ((next = ntfs_index_next(next, ictx)) != NULL) {
+nextdir:
+ /* Check the consistency of an index entry */
+ if (ntfs_index_entry_inconsistent(ictx, vol, next, COLLATION_FILE_NAME,
+ ndir->mft_no)) {
+ err = -EIO;
+ goto out;
+ }
+
+ if (ie_pos < actor->pos) {
+ ie_pos += next->length;
+ continue;
+ }
+
+ actor->pos = ie_pos;
+
+ index = (MREF_LE(next->data.dir.indexed_file) <<
+ vol->mft_record_size_bits) >> PAGE_SHIFT;
+ if (nir) {
+ struct ntfs_index_ra *cnir;
+ struct rb_node *node = ra_root.rb_node;
+
+		if (nir->start_index <= index &&
+		    index < nir->start_index + nir->count) {
+			/* Already covered by the current readahead run. */
+			goto filldir;
+		}
+
+ while (node) {
+ cnir = rb_entry(node, struct ntfs_index_ra, rb_node);
+ if (cnir->start_index <= index &&
+ index < cnir->start_index + cnir->count) {
+ goto filldir;
+ } else if (cnir->start_index + cnir->count == index) {
+ cnir->count++;
+ goto filldir;
+				} else if (cnir->start_index && cnir->start_index - 1 == index) {
+ cnir->start_index = index;
+ goto filldir;
+ }
+
+ if (index < cnir->start_index)
+ node = node->rb_left;
+ else if (index >= cnir->start_index + cnir->count)
+ node = node->rb_right;
+ }
+
+ if (nir->start_index + nir->count == index) {
+ nir->count++;
+			} else if (nir->start_index && nir->start_index - 1 == index) {
+ nir->start_index = index;
+ } else if (nir->count > 2) {
+ ntfs_insert_rb(nir, &ra_root);
+ nir = NULL;
+ } else {
+ nir->start_index = index;
+ nir->count = 1;
+ }
+ }
+
+ if (!nir) {
+ nir = kzalloc(sizeof(struct ntfs_index_ra), GFP_KERNEL);
+ if (nir) {
+ nir->start_index = index;
+ nir->count = 1;
+ }
+ }
+
+filldir:
+ /* Submit the name to the filldir callback. */
+ err = ntfs_filldir(vol, ndir, NULL, next, name, actor);
+ if (err) {
+ /*
+ * Store index key value to file private_data to start
+ * from current index offset on next round.
+ */
+ private = file->private_data;
+ kfree(private->key);
+ private->key = kmalloc(le16_to_cpu(next->key_length), GFP_KERNEL);
+ if (!private->key) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ memcpy(private->key, &next->key.file_name, le16_to_cpu(next->key_length));
+ private->key_length = next->key_length;
+ break;
+ }
+ ie_pos += next->length;
+ }
+
+ if (!err)
+ private->end_in_iterate = true;
+ else
+ err = 0;
+
+ private->curr_pos = actor->pos = ie_pos;
+out:
+ while (!RB_EMPTY_ROOT(&ra_root)) {
+ struct ntfs_index_ra *cnir;
+ struct rb_node *node;
+
+ node = rb_first(&ra_root);
+ cnir = rb_entry(node, struct ntfs_index_ra, rb_node);
+ ra->ra_pages = cnir->count;
+ page_cache_sync_readahead(vol->mft_ino->i_mapping, ra, NULL,
+ cnir->start_index, cnir->count);
+ rb_erase(node, &ra_root);
+ kfree(cnir);
+ }
+
+	if (err && private) {
+ private->curr_pos = actor->pos;
+ private->end_in_iterate = true;
+ err = 0;
+ }
+ ntfs_index_ctx_put(ictx);
+ kfree(name);
+ kfree(nir);
+ kfree(ra);
+ mutex_unlock(&ndir->mrec_lock);
+ return err;
+}
+
+int ntfs_check_empty_dir(struct ntfs_inode *ni, struct mft_record *ni_mrec)
+{
+ struct ntfs_attr_search_ctx *ctx;
+ int ret = 0;
+
+ if (!(ni_mrec->flags & MFT_RECORD_IS_DIRECTORY))
+ return 0;
+
+ ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!ctx) {
+ ntfs_error(ni->vol->sb, "Failed to get search context");
+ return -ENOMEM;
+ }
+
+ /* Find the index root attribute in the mft record. */
+ ret = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
+ 0, ctx);
+ if (ret) {
+		ntfs_error(ni->vol->sb, "Index root attribute missing in directory inode %llu",
+			(unsigned long long)ni->mft_no);
+ ntfs_attr_put_search_ctx(ctx);
+ return ret;
+ }
+
+ /* Non-empty directory? */
+ if (ctx->attr->data.resident.value_length !=
+ sizeof(struct index_root) + sizeof(struct index_entry_header)) {
+ /* Both ENOTEMPTY and EEXIST are ok. We use the more common. */
+ ret = -ENOTEMPTY;
+ ntfs_debug("Directory is not empty\n");
+ }
+
+ ntfs_attr_put_search_ctx(ctx);
+
+ return ret;
+}
+
+/**
+ * ntfs_dir_open - called when an inode is about to be opened
+ * @vi: inode to be opened
+ * @filp: file structure describing the inode
+ *
+ * Limit directory size to the page cache limit on architectures where unsigned
+ * long is 32-bits. This is the most we can do for now without overflowing the
+ * page cache page index. Doing it this way means we don't run into problems
+ * because of existing too large directories. It would be better to allow the
+ * user to read the accessible part of the directory but I doubt very much
+ * anyone is going to hit this check on a 32-bit architecture, so there is no
+ * point in adding the extra complexity required to support this.
+ *
+ * On 64-bit architectures, the check is hopefully optimized away by the
+ * compiler.
+ */
+static int ntfs_dir_open(struct inode *vi, struct file *filp)
+{
+ if (sizeof(unsigned long) < 8) {
+ if (i_size_read(vi) > MAX_LFS_FILESIZE)
+ return -EFBIG;
+ }
+ return 0;
+}
+
+static int ntfs_dir_release(struct inode *vi, struct file *filp)
+{
+ if (filp->private_data) {
+ kfree(((struct ntfs_file_private *)filp->private_data)->key);
+ kfree(filp->private_data);
+ filp->private_data = NULL;
+ }
+ return 0;
+}
+
+/**
+ * ntfs_dir_fsync - sync a directory to disk
+ * @filp: file describing the directory to be synced
+ * @start: start offset to be synced
+ * @end: end offset to be synced
+ * @datasync: if non-zero only flush user data and not metadata
+ *
+ * Data integrity sync of a directory to disk. Used for fsync, fdatasync, and
+ * msync system calls. This function is based on file.c::ntfs_file_fsync().
+ *
+ * Write the mft record and all associated extent mft records as well as the
+ * $INDEX_ALLOCATION and $BITMAP attributes and then sync the block device.
+ *
+ * If @datasync is true, we do not wait on the inode(s) to be written out
+ * but we always wait on the page cache pages to be written out.
+ *
+ * Note: In the past @filp could be NULL so we ignore it as we don't need it
+ * anyway.
+ *
+ * Locking: Caller must hold i_mutex on the inode.
+ */
+static int ntfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
+ int datasync)
+{
+ struct inode *bmp_vi, *vi = filp->f_mapping->host;
+ struct ntfs_volume *vol = NTFS_I(vi)->vol;
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_attr_search_ctx *ctx;
+ struct inode *parent_vi, *ia_vi;
+ int err, ret;
+ struct ntfs_attr na;
+
+ ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!ctx)
+ return -ENOMEM;
+
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL_2);
+ while (!(err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0, ctx))) {
+ struct file_name_attr *fn = (struct file_name_attr *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+
+ parent_vi = ntfs_iget(vi->i_sb, MREF_LE(fn->parent_directory));
+ if (IS_ERR(parent_vi))
+ continue;
+ mutex_lock_nested(&NTFS_I(parent_vi)->mrec_lock, NTFS_INODE_MUTEX_PARENT_2);
+ ia_vi = ntfs_index_iget(parent_vi, I30, 4);
+ mutex_unlock(&NTFS_I(parent_vi)->mrec_lock);
+ if (IS_ERR(ia_vi)) {
+ iput(parent_vi);
+ continue;
+ }
+ write_inode_now(ia_vi, 1);
+ iput(ia_vi);
+ write_inode_now(parent_vi, 1);
+ iput(parent_vi);
+ }
+ mutex_unlock(&ni->mrec_lock);
+ ntfs_attr_put_search_ctx(ctx);
+
+ err = file_write_and_wait_range(filp, start, end);
+ if (err)
+ return err;
+ inode_lock(vi);
+
+ BUG_ON(!S_ISDIR(vi->i_mode));
+ /* If the bitmap attribute inode is in memory sync it, too. */
+ na.mft_no = vi->i_ino;
+ na.type = AT_BITMAP;
+ na.name = I30;
+ na.name_len = 4;
+ bmp_vi = ilookup5(vi->i_sb, vi->i_ino, ntfs_test_inode, &na);
+ if (bmp_vi) {
+ write_inode_now(bmp_vi, !datasync);
+ iput(bmp_vi);
+ }
+ ret = __ntfs_write_inode(vi, 1);
+
+ write_inode_now(vi, !datasync);
+
+ write_inode_now(vol->mftbmp_ino, 1);
+ down_write(&vol->lcnbmp_lock);
+ write_inode_now(vol->lcnbmp_ino, 1);
+ up_write(&vol->lcnbmp_lock);
+ write_inode_now(vol->mft_ino, 1);
+
+ err = sync_blockdev(vi->i_sb->s_bdev);
+ if (unlikely(err && !ret))
+ ret = err;
+ if (likely(!ret))
+ ntfs_debug("Done.");
+ else
+ ntfs_warning(vi->i_sb,
+ "Failed to f%ssync inode 0x%lx. Error %u.",
+ datasync ? "data" : "", vi->i_ino, -ret);
+ inode_unlock(vi);
+ return ret;
+}
+
+const struct file_operations ntfs_dir_ops = {
+ .llseek = generic_file_llseek, /* Seek inside directory. */
+ .read = generic_read_dir, /* Return -EISDIR. */
+ .iterate_shared = ntfs_readdir, /* Read directory contents. */
+ .fsync = ntfs_dir_fsync, /* Sync a directory to disk. */
+ .open = ntfs_dir_open, /* Open directory. */
+ .release = ntfs_dir_release,
+ .unlocked_ioctl = ntfs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ntfs_compat_ioctl,
+#endif
+};
diff --git a/fs/ntfsplus/index.c b/fs/ntfsplus/index.c
new file mode 100644
index 000000000000..dce7602ccb03
--- /dev/null
+++ b/fs/ntfsplus/index.c
@@ -0,0 +1,2114 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel index handling. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ *
+ * Part of this file is based on code from the NTFS-3G project
+ * and is copyrighted by the respective authors below:
+ * Copyright (c) 2004-2005 Anton Altaparmakov
+ * Copyright (c) 2004-2005 Richard Russon
+ * Copyright (c) 2005-2006 Yura Pakhuchiy
+ * Copyright (c) 2005-2008 Szabolcs Szakacsits
+ * Copyright (c) 2007-2021 Jean-Pierre Andre
+ */
+
+#include "collate.h"
+#include "index.h"
+#include "ntfs.h"
+#include "misc.h"
+#include "attrlist.h"
+
+/*
+ * ntfs_index_entry_inconsistent - Check the consistency of an index entry
+ *
+ * Make sure data and key do not overflow from entry.
+ * As a side effect, an entry with zero length is rejected.
+ * This entry must be a full one (no INDEX_ENTRY_END flag), and its
+ * length must have been checked beforehand to not overflow from the
+ * index record.
+ */
+int ntfs_index_entry_inconsistent(struct ntfs_index_context *icx,
+ struct ntfs_volume *vol, const struct index_entry *ie,
+ __le32 collation_rule, u64 inum)
+{
+ if (icx) {
+ struct index_header *ih;
+ u8 *ie_start, *ie_end;
+
+ if (icx->is_in_root)
+ ih = &icx->ir->index;
+ else
+ ih = &icx->ib->index;
+
+		if ((le32_to_cpu(ih->index_length) > le32_to_cpu(ih->allocated_size)) ||
+		    (le32_to_cpu(ih->index_length) > icx->block_size)) {
+			ntfs_error(vol->sb, "%s index length is too big (entry 0x%p).",
+				icx->is_in_root ? "Index root" : "Index block",
+				(u8 *)icx->entry);
+			return -EINVAL;
+		}
+
+ ie_start = (u8 *)ih + le32_to_cpu(ih->entries_offset);
+ ie_end = (u8 *)ih + le32_to_cpu(ih->index_length);
+
+ if (ie_start > (u8 *)ie ||
+ ie_end <= ((u8 *)ie + ie->length) ||
+ ie->length > le32_to_cpu(ih->allocated_size) ||
+ ie->length > icx->block_size) {
+ ntfs_error(vol->sb, "Index entry(0x%p) is out of range from %s",
+ (u8 *)icx->entry,
+ icx->is_in_root ? "index root" : "index block");
+ return -EIO;
+ }
+ }
+
+ if (ie->key_length &&
+ ((le16_to_cpu(ie->key_length) + offsetof(struct index_entry, key)) >
+ le16_to_cpu(ie->length))) {
+ ntfs_error(vol->sb, "Overflow from index entry in inode %lld\n",
+ (long long)inum);
+ return -EIO;
+
+ } else {
+ if (collation_rule == COLLATION_FILE_NAME) {
+ if ((offsetof(struct index_entry, key.file_name.file_name) +
+ ie->key.file_name.file_name_length * sizeof(__le16)) >
+ le16_to_cpu(ie->length)) {
+ ntfs_error(vol->sb,
+ "File name overflow from index entry in inode %lld\n",
+ (long long)inum);
+ return -EIO;
+ }
+ } else {
+ if (ie->data.vi.data_length &&
+ ((le16_to_cpu(ie->data.vi.data_offset) +
+ le16_to_cpu(ie->data.vi.data_length)) >
+ le16_to_cpu(ie->length))) {
+ ntfs_error(vol->sb,
+ "Data overflow from index entry in inode %lld\n",
+ (long long)inum);
+ return -EIO;
+ }
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * ntfs_index_entry_mark_dirty - mark an index entry dirty
+ * @ictx: ntfs index context describing the index entry
+ *
+ * Mark the index entry described by the index entry context @ictx dirty.
+ *
+ * If the index entry is in the index root attribute, simply mark the inode
+ * containing the index root attribute dirty. This ensures the mftrecord, and
+ * hence the index root attribute, will be written out to disk later.
+ *
+ * If the index entry is in an index block belonging to the index allocation
+ * attribute, set ib_dirty to true, thus index block will be updated during
+ * ntfs_index_ctx_put.
+ */
+void ntfs_index_entry_mark_dirty(struct ntfs_index_context *ictx)
+{
+ if (ictx->is_in_root)
+ mark_mft_record_dirty(ictx->actx->ntfs_ino);
+ else if (ictx->ib)
+ ictx->ib_dirty = true;
+}
+
+static s64 ntfs_ib_vcn_to_pos(struct ntfs_index_context *icx, s64 vcn)
+{
+ return vcn << icx->vcn_size_bits;
+}
+
+static s64 ntfs_ib_pos_to_vcn(struct ntfs_index_context *icx, s64 pos)
+{
+ return pos >> icx->vcn_size_bits;
+}
+
+static int ntfs_ib_write(struct ntfs_index_context *icx, struct index_block *ib)
+{
+ s64 ret, vcn = le64_to_cpu(ib->index_block_vcn);
+
+ ntfs_debug("vcn: %lld\n", vcn);
+
+ ret = pre_write_mst_fixup((struct ntfs_record *)ib, icx->block_size);
+ if (ret)
+ return -EIO;
+
+ ret = ntfs_inode_attr_pwrite(VFS_I(icx->ia_ni),
+ ntfs_ib_vcn_to_pos(icx, vcn), icx->block_size,
+ (u8 *)ib, icx->sync_write);
+ if (ret != icx->block_size) {
+ ntfs_debug("Failed to write index block %lld, inode %llu",
+ vcn, (unsigned long long)icx->idx_ni->mft_no);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int ntfs_icx_ib_write(struct ntfs_index_context *icx)
+{
+ int err;
+
+ err = ntfs_ib_write(icx, icx->ib);
+ if (err)
+ return err;
+
+ icx->ib_dirty = false;
+
+ return 0;
+}
+
+int ntfs_icx_ib_sync_write(struct ntfs_index_context *icx)
+{
+ int ret;
+
+	if (!icx->ib_dirty)
+		return 0;
+
+ icx->sync_write = true;
+
+ ret = ntfs_ib_write(icx, icx->ib);
+ if (!ret) {
+ ntfs_free(icx->ib);
+ icx->ib = NULL;
+ icx->ib_dirty = false;
+ } else {
+ post_write_mst_fixup((struct ntfs_record *)icx->ib);
+ icx->sync_write = false;
+ }
+
+ return ret;
+}
+
+/**
+ * ntfs_index_ctx_get - allocate and initialize a new index context
+ * @ni: ntfs inode with which to initialize the context
+ * @name: name of the which context describes
+ * @name_len: length of the index name
+ *
+ * Allocate a new index context, initialize it with @ni and return it.
+ * Return NULL if allocation failed.
+ */
+struct ntfs_index_context *ntfs_index_ctx_get(struct ntfs_inode *ni,
+ __le16 *name, u32 name_len)
+{
+ struct ntfs_index_context *icx;
+
+ ntfs_debug("Entering\n");
+
+ if (!ni)
+ return NULL;
+
+ if (ni->nr_extents == -1)
+ ni = ni->ext.base_ntfs_ino;
+
+ icx = kmem_cache_alloc(ntfs_index_ctx_cache, GFP_NOFS);
+ if (icx)
+ *icx = (struct ntfs_index_context) {
+ .idx_ni = ni,
+ .name = name,
+ .name_len = name_len,
+ };
+ return icx;
+}
+
+static void ntfs_index_ctx_free(struct ntfs_index_context *icx)
+{
+ ntfs_debug("Entering\n");
+
+ if (icx->actx) {
+ ntfs_attr_put_search_ctx(icx->actx);
+ icx->actx = NULL;
+ }
+
+ if (!icx->is_in_root) {
+ if (icx->ib_dirty)
+ ntfs_ib_write(icx, icx->ib);
+ ntfs_free(icx->ib);
+ icx->ib = NULL;
+ }
+
+ if (icx->ia_ni) {
+ iput(VFS_I(icx->ia_ni));
+ icx->ia_ni = NULL;
+ }
+}
+
+/**
+ * ntfs_index_ctx_put - release an index context
+ * @icx: index context to free
+ *
+ * Release the index context @icx, releasing all associated resources.
+ */
+void ntfs_index_ctx_put(struct ntfs_index_context *icx)
+{
+ ntfs_index_ctx_free(icx);
+ kmem_cache_free(ntfs_index_ctx_cache, icx);
+}
+
+/**
+ * ntfs_index_ctx_reinit - reinitialize an index context
+ * @icx: index context to reinitialize
+ *
+ * Reinitialize the index context @icx so it can be used for ntfs_index_lookup.
+ */
+void ntfs_index_ctx_reinit(struct ntfs_index_context *icx)
+{
+ ntfs_debug("Entering\n");
+
+ ntfs_index_ctx_free(icx);
+
+ *icx = (struct ntfs_index_context) {
+ .idx_ni = icx->idx_ni,
+ .name = icx->name,
+ .name_len = icx->name_len,
+ };
+}
+
+static __le64 *ntfs_ie_get_vcn_addr(struct index_entry *ie)
+{
+ return (__le64 *)((u8 *)ie + le16_to_cpu(ie->length) - sizeof(s64));
+}
+
+/**
+ * Get the subnode vcn to which the index entry refers.
+ */
+static s64 ntfs_ie_get_vcn(struct index_entry *ie)
+{
+ return le64_to_cpup(ntfs_ie_get_vcn_addr(ie));
+}
+
+static struct index_entry *ntfs_ie_get_first(struct index_header *ih)
+{
+ return (struct index_entry *)((u8 *)ih + le32_to_cpu(ih->entries_offset));
+}
+
+static struct index_entry *ntfs_ie_get_next(struct index_entry *ie)
+{
+ return (struct index_entry *)((char *)ie + le16_to_cpu(ie->length));
+}
+
+static u8 *ntfs_ie_get_end(struct index_header *ih)
+{
+ return (u8 *)ih + le32_to_cpu(ih->index_length);
+}
+
+static int ntfs_ie_end(struct index_entry *ie)
+{
+ return ie->flags & INDEX_ENTRY_END || !ie->length;
+}
+
+/**
+ * Find the last entry in the index block
+ */
+static struct index_entry *ntfs_ie_get_last(struct index_entry *ie, char *ies_end)
+{
+ ntfs_debug("Entering\n");
+
+ while ((char *)ie < ies_end && !ntfs_ie_end(ie))
+ ie = ntfs_ie_get_next(ie);
+
+ return ie;
+}
+
+static struct index_entry *ntfs_ie_get_by_pos(struct index_header *ih, int pos)
+{
+ struct index_entry *ie;
+
+ ntfs_debug("pos: %d\n", pos);
+
+ ie = ntfs_ie_get_first(ih);
+
+ while (pos-- > 0)
+ ie = ntfs_ie_get_next(ie);
+
+ return ie;
+}
+
+static struct index_entry *ntfs_ie_prev(struct index_header *ih, struct index_entry *ie)
+{
+ struct index_entry *ie_prev = NULL;
+ struct index_entry *tmp;
+
+ ntfs_debug("Entering\n");
+
+ tmp = ntfs_ie_get_first(ih);
+
+ while (tmp != ie) {
+ ie_prev = tmp;
+ tmp = ntfs_ie_get_next(tmp);
+ }
+
+ return ie_prev;
+}
+
+static int ntfs_ih_numof_entries(struct index_header *ih)
+{
+ int n;
+ struct index_entry *ie;
+ u8 *end;
+
+ ntfs_debug("Entering\n");
+
+ end = ntfs_ie_get_end(ih);
+ ie = ntfs_ie_get_first(ih);
+ for (n = 0; !ntfs_ie_end(ie) && (u8 *)ie < end; n++)
+ ie = ntfs_ie_get_next(ie);
+ return n;
+}
+
+static int ntfs_ih_one_entry(struct index_header *ih)
+{
+ return (ntfs_ih_numof_entries(ih) == 1);
+}
+
+static int ntfs_ih_zero_entry(struct index_header *ih)
+{
+ return (ntfs_ih_numof_entries(ih) == 0);
+}
+
+static void ntfs_ie_delete(struct index_header *ih, struct index_entry *ie)
+{
+ u32 new_size;
+
+ ntfs_debug("Entering\n");
+
+ new_size = le32_to_cpu(ih->index_length) - le16_to_cpu(ie->length);
+ ih->index_length = cpu_to_le32(new_size);
+ memmove(ie, (u8 *)ie + le16_to_cpu(ie->length),
+ new_size - ((u8 *)ie - (u8 *)ih));
+}
+
+static void ntfs_ie_set_vcn(struct index_entry *ie, s64 vcn)
+{
+ *ntfs_ie_get_vcn_addr(ie) = cpu_to_le64(vcn);
+}
+
+/**
+ * Insert index entry @ie at the position of entry @pos. The fields of @ih
+ * must already be consistent.
+ */
+static void ntfs_ie_insert(struct index_header *ih, struct index_entry *ie,
+ struct index_entry *pos)
+{
+ int ie_size = le16_to_cpu(ie->length);
+
+ ntfs_debug("Entering\n");
+
+ ih->index_length = cpu_to_le32(le32_to_cpu(ih->index_length) + ie_size);
+ memmove((u8 *)pos + ie_size, pos,
+ le32_to_cpu(ih->index_length) - ((u8 *)pos - (u8 *)ih) - ie_size);
+ memcpy(pos, ie, ie_size);
+}
+
+static struct index_entry *ntfs_ie_dup(struct index_entry *ie)
+{
+ struct index_entry *dup;
+
+ ntfs_debug("Entering\n");
+
+ dup = ntfs_malloc_nofs(le16_to_cpu(ie->length));
+ if (dup)
+ memcpy(dup, ie, le16_to_cpu(ie->length));
+
+ return dup;
+}
+
+static struct index_entry *ntfs_ie_dup_novcn(struct index_entry *ie)
+{
+ struct index_entry *dup;
+ int size = le16_to_cpu(ie->length);
+
+ ntfs_debug("Entering\n");
+
+ if (ie->flags & INDEX_ENTRY_NODE)
+ size -= sizeof(s64);
+
+ dup = ntfs_malloc_nofs(size);
+ if (dup) {
+ memcpy(dup, ie, size);
+ dup->flags &= ~INDEX_ENTRY_NODE;
+ dup->length = cpu_to_le16(size);
+ }
+ return dup;
+}
+
+/*
+ * Check the consistency of an index block
+ *
+ * Make sure the index block does not overflow from the index record.
+ * The size of block is assumed to have been checked to be what is
+ * defined in the index root.
+ *
+ * Returns 0 if no error was found, -1 otherwise.
+ *
+ * |<--->| offsetof(struct index_block, index)
+ * | |<--->| sizeof(struct index_header)
+ * | | |
+ * | | | seq index entries unused
+ * |=====|=====|=====|===========================|==============|
+ * | | | | |
+ * | |<--------->| entries_offset | |
+ * | |<---------------- index_length ------->| |
+ * | |<--------------------- allocated_size --------------->|
+ * |<--------------------------- block_size ------------------->|
+ *
+ * size(struct index_header) <= ent_offset < ind_length <= alloc_size < bk_size
+ */
+static int ntfs_index_block_inconsistent(struct ntfs_index_context *icx,
+ struct index_block *ib, s64 vcn)
+{
+ u32 ib_size = (unsigned int)le32_to_cpu(ib->index.allocated_size) +
+ offsetof(struct index_block, index);
+ struct super_block *sb = icx->idx_ni->vol->sb;
+ unsigned long long inum = icx->idx_ni->mft_no;
+
+ ntfs_debug("Entering\n");
+
+	if (!ntfs_is_indx_record(ib->magic)) {
+		ntfs_error(sb, "Corrupt index block signature: vcn %lld inode %llu\n",
+			vcn, inum);
+		return -1;
+	}
+
+	if (le64_to_cpu(ib->index_block_vcn) != vcn) {
+		ntfs_error(sb,
+			"Corrupt index block: vcn (%lld) is different from expected vcn (%lld) in inode %llu\n",
+			(long long)le64_to_cpu(ib->index_block_vcn),
+			vcn, inum);
+		return -1;
+	}
+
+	if (ib_size != icx->block_size) {
+		ntfs_error(sb,
+			"Corrupt index block: vcn (%lld) of inode %llu has a size (%u) differing from the index specified size (%u)\n",
+			vcn, inum, ib_size, icx->block_size);
+		return -1;
+	}
+
+ if (le32_to_cpu(ib->index.entries_offset) < sizeof(struct index_header)) {
+		ntfs_error(sb, "Invalid index entry offset in inode %llu\n", inum);
+ return -1;
+ }
+ if (le32_to_cpu(ib->index.index_length) <=
+ le32_to_cpu(ib->index.entries_offset)) {
+		ntfs_error(sb, "No space for index entries in inode %llu\n", inum);
+ return -1;
+ }
+ if (le32_to_cpu(ib->index.allocated_size) <
+ le32_to_cpu(ib->index.index_length)) {
+		ntfs_error(sb, "Index entries overflow in inode %llu\n", inum);
+ return -1;
+ }
+
+ return 0;
+}
+
+static struct index_root *ntfs_ir_lookup(struct ntfs_inode *ni, __le16 *name,
+ u32 name_len, struct ntfs_attr_search_ctx **ctx)
+{
+ struct attr_record *a;
+ struct index_root *ir = NULL;
+
+ ntfs_debug("Entering\n");
+ *ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!*ctx) {
+ ntfs_error(ni->vol->sb, "%s, Failed to get search context", __func__);
+ return NULL;
+ }
+
+ if (ntfs_attr_lookup(AT_INDEX_ROOT, name, name_len, CASE_SENSITIVE,
+ 0, NULL, 0, *ctx)) {
+ ntfs_error(ni->vol->sb, "Failed to lookup $INDEX_ROOT");
+ goto err_out;
+ }
+
+ a = (*ctx)->attr;
+ if (a->non_resident) {
+ ntfs_error(ni->vol->sb, "Non-resident $INDEX_ROOT detected");
+ goto err_out;
+ }
+
+ ir = (struct index_root *)((char *)a + le16_to_cpu(a->data.resident.value_offset));
+err_out:
+ if (!ir) {
+ ntfs_attr_put_search_ctx(*ctx);
+ *ctx = NULL;
+ }
+ return ir;
+}
+
+static struct index_root *ntfs_ir_lookup2(struct ntfs_inode *ni, __le16 *name, u32 len)
+{
+ struct ntfs_attr_search_ctx *ctx;
+ struct index_root *ir;
+
+ ir = ntfs_ir_lookup(ni, name, len, &ctx);
+ if (ir)
+ ntfs_attr_put_search_ctx(ctx);
+ return ir;
+}
+
+/**
+ * Find a key in the index block.
+ */
+static int ntfs_ie_lookup(const void *key, const int key_len,
+ struct ntfs_index_context *icx, struct index_header *ih,
+ s64 *vcn, struct index_entry **ie_out)
+{
+ struct index_entry *ie;
+ u8 *index_end;
+ int rc, item = 0;
+
+ ntfs_debug("Entering\n");
+
+ index_end = ntfs_ie_get_end(ih);
+
+ /*
+ * Loop until we exceed valid memory (corruption case) or until we
+ * reach the last entry.
+ */
+ for (ie = ntfs_ie_get_first(ih); ; ie = ntfs_ie_get_next(ie)) {
+ /* Bounds checks. */
+ if ((u8 *)ie + sizeof(struct index_entry_header) > index_end ||
+ (u8 *)ie + le16_to_cpu(ie->length) > index_end) {
+ ntfs_error(icx->idx_ni->vol->sb,
+ "Index entry out of bounds in inode %llu.\n",
+ (unsigned long long)icx->idx_ni->mft_no);
+ return -ERANGE;
+ }
+
+ /*
+ * The last entry cannot contain a key. It can however contain
+ * a pointer to a child node in the B+tree so we just break out.
+ */
+ if (ntfs_ie_end(ie))
+ break;
+
+ /*
+ * Not a perfect match, need to do full blown collation so we
+ * know which way in the B+tree we have to go.
+ */
+ rc = ntfs_collate(icx->idx_ni->vol, icx->cr, key, key_len, &ie->key,
+ le16_to_cpu(ie->key_length));
+ if (rc == -2) {
+ ntfs_error(icx->idx_ni->vol->sb,
+ "Collation error. Perhaps a filename contains invalid characters?\n");
+ return -ERANGE;
+ }
+ /*
+ * If @key collates before the key of the current entry, there
+ * is definitely no such key in this index but we might need to
+ * descend into the B+tree so we just break out of the loop.
+ */
+ if (rc == -1)
+ break;
+
+ if (!rc) {
+ *ie_out = ie;
+ icx->parent_pos[icx->pindex] = item;
+ return 0;
+ }
+
+ item++;
+ }
+ /*
+ * We have finished with this index block without success. Check for the
+ * presence of a child node and if not present return with errno ENOENT,
+ * otherwise we will keep searching in another index block.
+ */
+ if (!(ie->flags & INDEX_ENTRY_NODE)) {
+ ntfs_debug("Index entry wasn't found.\n");
+ *ie_out = ie;
+ return -ENOENT;
+ }
+
+ /* Get the starting vcn of the index_block holding the child node. */
+ *vcn = ntfs_ie_get_vcn(ie);
+ if (*vcn < 0) {
+ ntfs_error(icx->idx_ni->vol->sb, "Negative vcn in inode %llu\n",
+ (unsigned long long)icx->idx_ni->mft_no);
+ return -EINVAL;
+ }
+
+ ntfs_debug("Parent entry number %d\n", item);
+ icx->parent_pos[icx->pindex] = item;
+
+ return -EAGAIN;
+}
+
+struct ntfs_inode *ntfs_ia_open(struct ntfs_index_context *icx, struct ntfs_inode *ni)
+{
+ struct inode *ia_vi;
+
+ ia_vi = ntfs_index_iget(VFS_I(ni), icx->name, icx->name_len);
+ if (IS_ERR(ia_vi)) {
+ ntfs_error(icx->idx_ni->vol->sb,
+ "Failed to open index allocation of inode %llu",
+ (unsigned long long)ni->mft_no);
+ return NULL;
+ }
+
+ return NTFS_I(ia_vi);
+}
+
+static int ntfs_ib_read(struct ntfs_index_context *icx, s64 vcn, struct index_block *dst)
+{
+ s64 pos, ret;
+
+ ntfs_debug("vcn: %lld\n", vcn);
+
+ pos = ntfs_ib_vcn_to_pos(icx, vcn);
+
+ ret = ntfs_inode_attr_pread(VFS_I(icx->ia_ni), pos, icx->block_size, (u8 *)dst);
+ if (ret != icx->block_size) {
+		if (ret < 0)
+			ntfs_error(icx->idx_ni->vol->sb, "Failed to read index block");
+		else
+			ntfs_error(icx->idx_ni->vol->sb,
+				"Failed to read full index block at %lld", pos);
+		return -EIO;
+	}
+
+	post_read_mst_fixup((struct ntfs_record *)dst, icx->block_size);
+	if (ntfs_index_block_inconsistent(icx, dst, vcn))
+		return -EIO;
+
+ return 0;
+}
+
+static int ntfs_icx_parent_inc(struct ntfs_index_context *icx)
+{
+ icx->pindex++;
+ if (icx->pindex >= MAX_PARENT_VCN) {
+ ntfs_error(icx->idx_ni->vol->sb, "Index is over %d level deep", MAX_PARENT_VCN);
+ return -EOPNOTSUPP;
+ }
+ return 0;
+}
+
+static int ntfs_icx_parent_dec(struct ntfs_index_context *icx)
+{
+ icx->pindex--;
+ if (icx->pindex < 0) {
+ ntfs_error(icx->idx_ni->vol->sb, "Corrupt index pointer (%d)", icx->pindex);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/**
+ * ntfs_index_lookup - find a key in an index and return its index entry
+ * @key: key for which to search in the index
+ * @key_len: length of @key in bytes
+ * @icx: context describing the index and the returned entry
+ *
+ * Before calling ntfs_index_lookup(), @icx must have been obtained from a
+ * call to ntfs_index_ctx_get().
+ *
+ * Look for the @key in the index specified by the index lookup context @icx.
+ * ntfs_index_lookup() walks the contents of the index looking for the @key.
+ *
+ * If the @key is found in the index, 0 is returned and @icx is setup to
+ * describe the index entry containing the matching @key. @icx->entry is the
+ * index entry and @icx->data and @icx->data_len are the index entry data and
+ * its length in bytes, respectively.
+ *
+ * If the @key is not found in the index, -ENOENT is returned and
+ * @icx is setup to describe the index entry whose key collates immediately
+ * after the search @key, i.e. this is the position in the index at which
+ * an index entry with a key of @key would need to be inserted.
+ *
+ * When finished with the entry and its data, call ntfs_index_ctx_put() to free
+ * the context and other associated resources.
+ *
+ * If the index entry was modified, call ntfs_index_entry_mark_dirty() before
+ * the call to ntfs_index_ctx_put() to ensure that the changes are written
+ * to disk.
+ */
+int ntfs_index_lookup(const void *key, const int key_len, struct ntfs_index_context *icx)
+{
+ s64 old_vcn, vcn;
+ struct ntfs_inode *ni = icx->idx_ni;
+ struct super_block *sb = ni->vol->sb;
+ struct index_root *ir;
+ struct index_entry *ie;
+ struct index_block *ib = NULL;
+ int err = 0;
+
+ ntfs_debug("Entering\n");
+
+ if (!key || key_len <= 0) {
+ ntfs_error(sb, "key: %p key_len: %d", key, key_len);
+ return -EINVAL;
+ }
+
+ ir = ntfs_ir_lookup(ni, icx->name, icx->name_len, &icx->actx);
+ if (!ir)
+ return -EIO;
+
+ icx->block_size = le32_to_cpu(ir->index_block_size);
+ if (icx->block_size < NTFS_BLOCK_SIZE) {
+ err = -EINVAL;
+ ntfs_error(sb,
+ "Index block size (%d) is smaller than the sector size (%d)",
+ icx->block_size, NTFS_BLOCK_SIZE);
+ goto err_out;
+ }
+
+ if (ni->vol->cluster_size <= icx->block_size)
+ icx->vcn_size_bits = ni->vol->cluster_size_bits;
+ else
+ icx->vcn_size_bits = ni->vol->sector_size_bits;
+
+ icx->cr = ir->collation_rule;
+ if (!ntfs_is_collation_rule_supported(icx->cr)) {
+ err = -EOPNOTSUPP;
+ ntfs_error(sb, "Unknown collation rule 0x%x",
+ (unsigned int)le32_to_cpu(icx->cr));
+ goto err_out;
+ }
+
+ old_vcn = VCN_INDEX_ROOT_PARENT;
+ err = ntfs_ie_lookup(key, key_len, icx, &ir->index, &vcn, &ie);
+ if (err == -ERANGE || err == -EINVAL)
+ goto err_out;
+
+ icx->ir = ir;
+ if (err != -EAGAIN) {
+ icx->is_in_root = true;
+ icx->parent_vcn[icx->pindex] = old_vcn;
+ goto done;
+ }
+
+ /* Child node present, descend into it. */
+ icx->ia_ni = ntfs_ia_open(icx, ni);
+ if (!icx->ia_ni) {
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ ib = ntfs_malloc_nofs(icx->block_size);
+ if (!ib) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+descend_into_child_node:
+ icx->parent_vcn[icx->pindex] = old_vcn;
+ if (ntfs_icx_parent_inc(icx)) {
+ err = -EIO;
+ goto err_out;
+ }
+ old_vcn = vcn;
+
+	ntfs_debug("Descend into node with VCN %lld.\n", vcn);
+
+ if (ntfs_ib_read(icx, vcn, ib)) {
+ err = -EIO;
+ goto err_out;
+ }
+ err = ntfs_ie_lookup(key, key_len, icx, &ib->index, &vcn, &ie);
+ if (err != -EAGAIN) {
+ if (err == -EINVAL || err == -ERANGE)
+ goto err_out;
+
+ icx->is_in_root = false;
+ icx->ib = ib;
+ icx->parent_vcn[icx->pindex] = vcn;
+ goto done;
+ }
+
+	if ((ib->index.flags & NODE_MASK) == LEAF_NODE) {
+		ntfs_error(icx->idx_ni->vol->sb,
+			"Index entry with child node found in a leaf node in inode 0x%llx.",
+			(unsigned long long)ni->mft_no);
+		err = -EIO;
+		goto err_out;
+ }
+
+ goto descend_into_child_node;
+err_out:
+ if (icx->actx) {
+ ntfs_attr_put_search_ctx(icx->actx);
+ icx->actx = NULL;
+ }
+ ntfs_free(ib);
+ if (!err)
+ err = -EIO;
+ return err;
+done:
+ icx->entry = ie;
+ icx->data = (u8 *)ie + offsetof(struct index_entry, key);
+ icx->data_len = le16_to_cpu(ie->key_length);
+	ntfs_debug("Done.\n");
+	return err;
+}
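+
+/*
+ * Typical use of the lookup API (illustrative sketch only, with error
+ * handling abbreviated), mirroring the pattern of ntfs_index_remove()
+ * further below:
+ *
+ *	icx = ntfs_index_ctx_get(dir_ni, I30, 4);
+ *	if (!icx)
+ *		return -EINVAL;
+ *	err = ntfs_index_lookup(key, keylen, icx);
+ *	if (!err) {
+ *		... use icx->entry, icx->data and icx->data_len,
+ *		    calling ntfs_index_entry_mark_dirty(icx) if the
+ *		    entry was modified ...
+ *	}
+ *	ntfs_index_ctx_put(icx);
+ */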
+
+static struct index_block *ntfs_ib_alloc(s64 ib_vcn, u32 ib_size,
+ u8 node_type)
+{
+ struct index_block *ib;
+ int ih_size = sizeof(struct index_header);
+
+ ntfs_debug("Entering ib_vcn = %lld ib_size = %u\n", ib_vcn, ib_size);
+
+ ib = ntfs_malloc_nofs(ib_size);
+ if (!ib)
+ return NULL;
+
+ ib->magic = magic_INDX;
+ ib->usa_ofs = cpu_to_le16(sizeof(struct index_block));
+ ib->usa_count = cpu_to_le16(ib_size / NTFS_BLOCK_SIZE + 1);
+ /* Set USN to 1 */
+ *(__le16 *)((char *)ib + le16_to_cpu(ib->usa_ofs)) = cpu_to_le16(1);
+ ib->lsn = 0;
+ ib->index_block_vcn = cpu_to_le64(ib_vcn);
+ ib->index.entries_offset = cpu_to_le32((ih_size +
+ le16_to_cpu(ib->usa_count) * 2 + 7) & ~7);
+ ib->index.index_length = 0;
+ ib->index.allocated_size = cpu_to_le32(ib_size -
+ (sizeof(struct index_block) - ih_size));
+ ib->index.flags = node_type;
+
+ return ib;
+}
+
+/*
+ * Find the median by going through all the entries.
+ */
+static struct index_entry *ntfs_ie_get_median(struct index_header *ih)
+{
+ struct index_entry *ie, *ie_start;
+ u8 *ie_end;
+ int i = 0, median;
+
+ ntfs_debug("Entering\n");
+
+ ie = ie_start = ntfs_ie_get_first(ih);
+ ie_end = (u8 *)ntfs_ie_get_end(ih);
+
+ while ((u8 *)ie < ie_end && !ntfs_ie_end(ie)) {
+ ie = ntfs_ie_get_next(ie);
+ i++;
+ }
+	/*
+	 * NOTE: this could also be the entry halfway into the index block.
+	 */
+ median = i / 2 - 1;
+
+ ntfs_debug("Entries: %d median: %d\n", i, median);
+
+ for (i = 0, ie = ie_start; i <= median; i++)
+ ie = ntfs_ie_get_next(ie);
+
+ return ie;
+}
+
+static s64 ntfs_ibm_vcn_to_pos(struct ntfs_index_context *icx, s64 vcn)
+{
+ return ntfs_ib_vcn_to_pos(icx, vcn) / icx->block_size;
+}
+
+static s64 ntfs_ibm_pos_to_vcn(struct ntfs_index_context *icx, s64 pos)
+{
+ return ntfs_ib_pos_to_vcn(icx, pos * icx->block_size);
+}
+
+static int ntfs_ibm_add(struct ntfs_index_context *icx)
+{
+ u8 bmp[8];
+
+ ntfs_debug("Entering\n");
+
+ if (ntfs_attr_exist(icx->idx_ni, AT_BITMAP, icx->name, icx->name_len))
+ return 0;
+ /*
+ * AT_BITMAP must be at least 8 bytes.
+ */
+ memset(bmp, 0, sizeof(bmp));
+ if (ntfs_attr_add(icx->idx_ni, AT_BITMAP, icx->name, icx->name_len,
+ bmp, sizeof(bmp))) {
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to add AT_BITMAP");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ntfs_ibm_modify(struct ntfs_index_context *icx, s64 vcn, int set)
+{
+ u8 byte;
+ u64 pos = (u64)ntfs_ibm_vcn_to_pos(icx, vcn);
+ u32 bpos = pos / 8;
+ u32 bit = 1 << (pos % 8);
+ struct ntfs_inode *bmp_ni;
+ struct inode *bmp_vi;
+ int ret = 0;
+
+ ntfs_debug("%s vcn: %lld\n", set ? "set" : "clear", vcn);
+
+ bmp_vi = ntfs_attr_iget(VFS_I(icx->idx_ni), AT_BITMAP, icx->name, icx->name_len);
+ if (IS_ERR(bmp_vi)) {
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to open $BITMAP attribute");
+ return PTR_ERR(bmp_vi);
+ }
+
+ bmp_ni = NTFS_I(bmp_vi);
+
+ if (set) {
+ if (bmp_ni->data_size < bpos + 1) {
+ ret = ntfs_attr_truncate(bmp_ni, (bmp_ni->data_size + 8) & ~7);
+ if (ret) {
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to truncate AT_BITMAP");
+ goto err;
+ }
+ i_size_write(bmp_vi, (loff_t)bmp_ni->data_size);
+ }
+ }
+
+ if (ntfs_inode_attr_pread(bmp_vi, bpos, 1, &byte) != 1) {
+ ret = -EIO;
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to read $BITMAP");
+ goto err;
+ }
+
+ if (set)
+ byte |= bit;
+ else
+ byte &= ~bit;
+
+ if (ntfs_inode_attr_pwrite(bmp_vi, bpos, 1, &byte, false) != 1) {
+ ret = -EIO;
+		ntfs_error(icx->idx_ni->vol->sb, "Failed to write $BITMAP");
+ goto err;
+ }
+
+err:
+ iput(bmp_vi);
+ return ret;
+}
+
+static int ntfs_ibm_set(struct ntfs_index_context *icx, s64 vcn)
+{
+ return ntfs_ibm_modify(icx, vcn, 1);
+}
+
+static int ntfs_ibm_clear(struct ntfs_index_context *icx, s64 vcn)
+{
+ return ntfs_ibm_modify(icx, vcn, 0);
+}
+
+static s64 ntfs_ibm_get_free(struct ntfs_index_context *icx)
+{
+ u8 *bm;
+ int bit;
+ s64 vcn, byte, size;
+
+ ntfs_debug("Entering\n");
+
+ bm = ntfs_attr_readall(icx->idx_ni, AT_BITMAP, icx->name, icx->name_len,
+ &size);
+ if (!bm)
+ return (s64)-1;
+
+ for (byte = 0; byte < size; byte++) {
+ if (bm[byte] == 255)
+ continue;
+
+ for (bit = 0; bit < 8; bit++) {
+ if (!(bm[byte] & (1 << bit))) {
+ vcn = ntfs_ibm_pos_to_vcn(icx, byte * 8 + bit);
+ goto out;
+ }
+ }
+ }
+
+ vcn = ntfs_ibm_pos_to_vcn(icx, size * 8);
+out:
+ ntfs_debug("allocated vcn: %lld\n", vcn);
+
+ if (ntfs_ibm_set(icx, vcn))
+ vcn = (s64)-1;
+
+ ntfs_free(bm);
+ return vcn;
+}
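+
+/*
+ * Worked example (assuming a 4096-byte cluster and a 4096-byte index
+ * block, so that vcn_size_bits equals cluster_size_bits): bit N of
+ * AT_BITMAP then covers the index block at VCN N, and the scan above
+ * returns the VCN of the first clear bit after marking it in use.
+ */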
+
+static struct index_block *ntfs_ir_to_ib(struct index_root *ir, s64 ib_vcn)
+{
+ struct index_block *ib;
+ struct index_entry *ie_last;
+ char *ies_start, *ies_end;
+ int i;
+
+ ntfs_debug("Entering\n");
+
+ ib = ntfs_ib_alloc(ib_vcn, le32_to_cpu(ir->index_block_size), LEAF_NODE);
+ if (!ib)
+ return NULL;
+
+ ies_start = (char *)ntfs_ie_get_first(&ir->index);
+ ies_end = (char *)ntfs_ie_get_end(&ir->index);
+ ie_last = ntfs_ie_get_last((struct index_entry *)ies_start, ies_end);
+ /*
+ * Copy all entries, including the termination entry
+ * as well, which can never have any data.
+ */
+ i = (char *)ie_last - ies_start + le16_to_cpu(ie_last->length);
+ memcpy(ntfs_ie_get_first(&ib->index), ies_start, i);
+
+ ib->index.flags = ir->index.flags;
+ ib->index.index_length = cpu_to_le32(i +
+ le32_to_cpu(ib->index.entries_offset));
+ return ib;
+}
+
+static void ntfs_ir_nill(struct index_root *ir)
+{
+ struct index_entry *ie_last;
+ char *ies_start, *ies_end;
+
+ ntfs_debug("Entering\n");
+
+ ies_start = (char *)ntfs_ie_get_first(&ir->index);
+ ies_end = (char *)ntfs_ie_get_end(&ir->index);
+ ie_last = ntfs_ie_get_last((struct index_entry *)ies_start, ies_end);
+	/*
+	 * Move the index root termination entry forward.
+	 */
+	if ((char *)ie_last > ies_start)
+		memmove((char *)ntfs_ie_get_first(&ir->index),
+			(char *)ie_last, le16_to_cpu(ie_last->length));
+}
+
+static int ntfs_ib_copy_tail(struct ntfs_index_context *icx, struct index_block *src,
+ struct index_entry *median, s64 new_vcn)
+{
+ u8 *ies_end;
+ struct index_entry *ie_head; /* first entry after the median */
+ int tail_size, ret;
+ struct index_block *dst;
+
+ ntfs_debug("Entering\n");
+
+ dst = ntfs_ib_alloc(new_vcn, icx->block_size,
+ src->index.flags & NODE_MASK);
+ if (!dst)
+ return -ENOMEM;
+
+ ie_head = ntfs_ie_get_next(median);
+
+ ies_end = (u8 *)ntfs_ie_get_end(&src->index);
+ tail_size = ies_end - (u8 *)ie_head;
+ memcpy(ntfs_ie_get_first(&dst->index), ie_head, tail_size);
+
+ dst->index.index_length = cpu_to_le32(tail_size +
+ le32_to_cpu(dst->index.entries_offset));
+ ret = ntfs_ib_write(icx, dst);
+
+ ntfs_free(dst);
+ return ret;
+}
+
+static int ntfs_ib_cut_tail(struct ntfs_index_context *icx, struct index_block *ib,
+ struct index_entry *ie)
+{
+ char *ies_start, *ies_end;
+ struct index_entry *ie_last;
+ int ret;
+
+ ntfs_debug("Entering\n");
+
+ ies_start = (char *)ntfs_ie_get_first(&ib->index);
+ ies_end = (char *)ntfs_ie_get_end(&ib->index);
+
+ ie_last = ntfs_ie_get_last((struct index_entry *)ies_start, ies_end);
+ if (ie_last->flags & INDEX_ENTRY_NODE)
+ ntfs_ie_set_vcn(ie_last, ntfs_ie_get_vcn(ie));
+
+ unsafe_memcpy(ie, ie_last, le16_to_cpu(ie_last->length),
+ /* alloc is larger than ie_last->length, see ntfs_ie_get_last() */);
+
+ ib->index.index_length = cpu_to_le32(((char *)ie - ies_start) +
+ le16_to_cpu(ie->length) + le32_to_cpu(ib->index.entries_offset));
+
+ ret = ntfs_ib_write(icx, ib);
+ return ret;
+}
+
+static int ntfs_ia_add(struct ntfs_index_context *icx)
+{
+ int ret;
+
+ ntfs_debug("Entering\n");
+
+ ret = ntfs_ibm_add(icx);
+ if (ret)
+ return ret;
+
+ if (!ntfs_attr_exist(icx->idx_ni, AT_INDEX_ALLOCATION, icx->name, icx->name_len)) {
+ ret = ntfs_attr_add(icx->idx_ni, AT_INDEX_ALLOCATION, icx->name,
+ icx->name_len, NULL, 0);
+ if (ret) {
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to add AT_INDEX_ALLOCATION");
+ return ret;
+ }
+ }
+
+ icx->ia_ni = ntfs_ia_open(icx, icx->idx_ni);
+ if (!icx->ia_ni)
+ return -ENOENT;
+
+ return 0;
+}
+
+static int ntfs_ir_reparent(struct ntfs_index_context *icx)
+{
+ struct ntfs_attr_search_ctx *ctx = NULL;
+ struct index_root *ir;
+ struct index_entry *ie;
+ struct index_block *ib = NULL;
+ s64 new_ib_vcn;
+ int ix_root_size;
+ int ret = 0;
+
+ ntfs_debug("Entering\n");
+
+ ir = ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len);
+ if (!ir) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ if ((ir->index.flags & NODE_MASK) == SMALL_INDEX) {
+ ret = ntfs_ia_add(icx);
+ if (ret)
+ goto out;
+ }
+
+ new_ib_vcn = ntfs_ibm_get_free(icx);
+ if (new_ib_vcn < 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ir = ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len);
+ if (!ir) {
+ ret = -ENOENT;
+ goto clear_bmp;
+ }
+
+ ib = ntfs_ir_to_ib(ir, new_ib_vcn);
+ if (ib == NULL) {
+ ret = -EIO;
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to move index root to index block");
+ goto clear_bmp;
+ }
+
+ ret = ntfs_ib_write(icx, ib);
+ if (ret)
+ goto clear_bmp;
+
+retry:
+ ir = ntfs_ir_lookup(icx->idx_ni, icx->name, icx->name_len, &ctx);
+ if (!ir) {
+ ret = -ENOENT;
+ goto clear_bmp;
+ }
+
+ ntfs_ir_nill(ir);
+
+ ie = ntfs_ie_get_first(&ir->index);
+ ie->flags |= INDEX_ENTRY_NODE;
+ ie->length = cpu_to_le16(sizeof(struct index_entry_header) + sizeof(s64));
+
+ ir->index.flags = LARGE_INDEX;
+ NInoSetIndexAllocPresent(icx->idx_ni);
+ ir->index.index_length = cpu_to_le32(le32_to_cpu(ir->index.entries_offset) +
+ le16_to_cpu(ie->length));
+ ir->index.allocated_size = ir->index.index_length;
+
+ ix_root_size = sizeof(struct index_root) - sizeof(struct index_header) +
+ le32_to_cpu(ir->index.allocated_size);
+ ret = ntfs_resident_attr_value_resize(ctx->mrec, ctx->attr, ix_root_size);
+ if (ret) {
+ /*
+ * When there is no space to build a non-resident
+ * index, we may have to move the root to an extent
+ */
+ if ((ret == -ENOSPC) && (ctx->al_entry || !ntfs_inode_add_attrlist(icx->idx_ni))) {
+ ntfs_attr_put_search_ctx(ctx);
+ ctx = NULL;
+ ir = ntfs_ir_lookup(icx->idx_ni, icx->name, icx->name_len, &ctx);
+ if (ir && !ntfs_attr_record_move_away(ctx, ix_root_size -
+ le32_to_cpu(ctx->attr->data.resident.value_length))) {
+ if (ntfs_attrlist_update(ctx->base_ntfs_ino ?
+ ctx->base_ntfs_ino : ctx->ntfs_ino))
+ goto clear_bmp;
+ ntfs_attr_put_search_ctx(ctx);
+ ctx = NULL;
+ goto retry;
+ }
+ }
+ goto clear_bmp;
+ } else {
+ icx->idx_ni->data_size = icx->idx_ni->initialized_size = ix_root_size;
+ icx->idx_ni->allocated_size = (ix_root_size + 7) & ~7;
+ }
+ ntfs_ie_set_vcn(ie, new_ib_vcn);
+
+err_out:
+ ntfs_free(ib);
+ if (ctx)
+ ntfs_attr_put_search_ctx(ctx);
+out:
+ return ret;
+clear_bmp:
+ ntfs_ibm_clear(icx, new_ib_vcn);
+ goto err_out;
+}
+
+/**
+ * ntfs_ir_truncate - Truncate index root attribute
+ */
+static int ntfs_ir_truncate(struct ntfs_index_context *icx, int data_size)
+{
+ int ret;
+
+ ntfs_debug("Entering\n");
+
+ /*
+ * INDEX_ROOT must be resident and its entries can be moved to
+ * struct index_block, so ENOSPC isn't a real error.
+ */
+ ret = ntfs_attr_truncate(icx->idx_ni, data_size + offsetof(struct index_root, index));
+ if (!ret) {
+ i_size_write(VFS_I(icx->idx_ni), icx->idx_ni->initialized_size);
+ icx->ir = ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len);
+ if (!icx->ir)
+ return -ENOENT;
+
+ icx->ir->index.allocated_size = cpu_to_le32(data_size);
+ } else if (ret != -ENOSPC)
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to truncate INDEX_ROOT");
+
+ return ret;
+}
+
+/**
+ * ntfs_ir_make_space - Make more space for the index root attribute
+ */
+static int ntfs_ir_make_space(struct ntfs_index_context *icx, int data_size)
+{
+ int ret;
+
+ ntfs_debug("Entering\n");
+
+ ret = ntfs_ir_truncate(icx, data_size);
+ if (ret == -ENOSPC) {
+ ret = ntfs_ir_reparent(icx);
+ if (!ret)
+ ret = -EAGAIN;
+ else
+ ntfs_error(icx->idx_ni->vol->sb, "Failed to modify INDEX_ROOT");
+ }
+
+ return ret;
+}
+
+/*
+ * NOTE: 'ie' must be a copy of a real index entry.
+ */
+static int ntfs_ie_add_vcn(struct index_entry **ie)
+{
+ struct index_entry *p, *old = *ie;
+
+ old->length = cpu_to_le16(le16_to_cpu(old->length) + sizeof(s64));
+ p = ntfs_realloc_nofs(old, le16_to_cpu(old->length),
+ le16_to_cpu(old->length) - sizeof(s64));
+ if (!p)
+ return -ENOMEM;
+
+ p->flags |= INDEX_ENTRY_NODE;
+ *ie = p;
+ return 0;
+}
+
+static int ntfs_ih_insert(struct index_header *ih, struct index_entry *orig_ie, s64 new_vcn,
+ int pos)
+{
+ struct index_entry *ie_node, *ie;
+ int ret = 0;
+ s64 old_vcn;
+
+ ntfs_debug("Entering\n");
+ ie = ntfs_ie_dup(orig_ie);
+ if (!ie)
+ return -ENOMEM;
+
+ if (!(ie->flags & INDEX_ENTRY_NODE)) {
+ ret = ntfs_ie_add_vcn(&ie);
+ if (ret)
+ goto out;
+ }
+
+ ie_node = ntfs_ie_get_by_pos(ih, pos);
+ old_vcn = ntfs_ie_get_vcn(ie_node);
+ ntfs_ie_set_vcn(ie_node, new_vcn);
+
+ ntfs_ie_insert(ih, ie, ie_node);
+ ntfs_ie_set_vcn(ie_node, old_vcn);
+out:
+ ntfs_free(ie);
+ return ret;
+}
+
+static s64 ntfs_icx_parent_vcn(struct ntfs_index_context *icx)
+{
+ return icx->parent_vcn[icx->pindex];
+}
+
+static s64 ntfs_icx_parent_pos(struct ntfs_index_context *icx)
+{
+ return icx->parent_pos[icx->pindex];
+}
+
+static int ntfs_ir_insert_median(struct ntfs_index_context *icx, struct index_entry *median,
+ s64 new_vcn)
+{
+ u32 new_size;
+ int ret;
+
+ ntfs_debug("Entering\n");
+
+ icx->ir = ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len);
+ if (!icx->ir)
+ return -ENOENT;
+
+ new_size = le32_to_cpu(icx->ir->index.index_length) +
+ le16_to_cpu(median->length);
+ if (!(median->flags & INDEX_ENTRY_NODE))
+ new_size += sizeof(s64);
+
+ ret = ntfs_ir_make_space(icx, new_size);
+ if (ret)
+ return ret;
+
+ icx->ir = ntfs_ir_lookup2(icx->idx_ni, icx->name, icx->name_len);
+ if (!icx->ir)
+ return -ENOENT;
+
+ return ntfs_ih_insert(&icx->ir->index, median, new_vcn,
+ ntfs_icx_parent_pos(icx));
+}
+
+static int ntfs_ib_split(struct ntfs_index_context *icx, struct index_block *ib);
+
+struct split_info {
+ struct list_head entry;
+ s64 new_vcn;
+ struct index_block *ib;
+};
+
+static int ntfs_ib_insert(struct ntfs_index_context *icx, struct index_entry *ie, s64 new_vcn,
+ struct split_info *si)
+{
+ struct index_block *ib;
+ u32 idx_size, allocated_size;
+ int err;
+ s64 old_vcn;
+
+ ntfs_debug("Entering\n");
+
+ ib = ntfs_malloc_nofs(icx->block_size);
+ if (!ib)
+ return -ENOMEM;
+
+ old_vcn = ntfs_icx_parent_vcn(icx);
+
+ err = ntfs_ib_read(icx, old_vcn, ib);
+ if (err)
+ goto err_out;
+
+ idx_size = le32_to_cpu(ib->index.index_length);
+ allocated_size = le32_to_cpu(ib->index.allocated_size);
+ if (idx_size + le16_to_cpu(ie->length) + sizeof(s64) > allocated_size) {
+ si->ib = ib;
+ si->new_vcn = new_vcn;
+ return -EAGAIN;
+ }
+
+ err = ntfs_ih_insert(&ib->index, ie, new_vcn, ntfs_icx_parent_pos(icx));
+ if (err)
+ goto err_out;
+
+ err = ntfs_ib_write(icx, ib);
+
+err_out:
+ ntfs_free(ib);
+ return err;
+}
+
+/**
+ * ntfs_ib_split - Split an index block
+ */
+static int ntfs_ib_split(struct ntfs_index_context *icx, struct index_block *ib)
+{
+ struct index_entry *median;
+ s64 new_vcn;
+ int ret;
+ struct split_info *si;
+ LIST_HEAD(ntfs_cut_tail_list);
+
+ ntfs_debug("Entering\n");
+
+resplit:
+ ret = ntfs_icx_parent_dec(icx);
+ if (ret)
+ goto out;
+
+ median = ntfs_ie_get_median(&ib->index);
+ new_vcn = ntfs_ibm_get_free(icx);
+ if (new_vcn < 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = ntfs_ib_copy_tail(icx, ib, median, new_vcn);
+ if (ret) {
+ ntfs_ibm_clear(icx, new_vcn);
+ goto out;
+ }
+
+ if (ntfs_icx_parent_vcn(icx) == VCN_INDEX_ROOT_PARENT) {
+ ret = ntfs_ir_insert_median(icx, median, new_vcn);
+ if (ret) {
+ ntfs_ibm_clear(icx, new_vcn);
+ goto out;
+ }
+ } else {
+ si = kzalloc(sizeof(struct split_info), GFP_NOFS);
+ if (!si) {
+ ntfs_ibm_clear(icx, new_vcn);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = ntfs_ib_insert(icx, median, new_vcn, si);
+ if (ret == -EAGAIN) {
+ list_add_tail(&si->entry, &ntfs_cut_tail_list);
+ ib = si->ib;
+ goto resplit;
+ } else if (ret) {
+ ntfs_free(si->ib);
+ kfree(si);
+ ntfs_ibm_clear(icx, new_vcn);
+ goto out;
+ }
+ kfree(si);
+ }
+
+ ret = ntfs_ib_cut_tail(icx, ib, median);
+
+out:
+ while (!list_empty(&ntfs_cut_tail_list)) {
+ si = list_last_entry(&ntfs_cut_tail_list, struct split_info, entry);
+ ntfs_ibm_clear(icx, si->new_vcn);
+ ntfs_free(si->ib);
+ list_del(&si->entry);
+ kfree(si);
+ if (!ret)
+ ret = -EAGAIN;
+ }
+
+ return ret;
+}
+
+int ntfs_ie_add(struct ntfs_index_context *icx, struct index_entry *ie)
+{
+ struct index_header *ih;
+ int allocated_size, new_size;
+ int ret;
+
+ while (1) {
+ ret = ntfs_index_lookup(&ie->key, le16_to_cpu(ie->key_length), icx);
+ if (!ret) {
+ ret = -EEXIST;
+			ntfs_error(icx->idx_ni->vol->sb, "Index already has such an entry");
+ goto err_out;
+ }
+ if (ret != -ENOENT) {
+			ntfs_error(icx->idx_ni->vol->sb, "Failed to find a place for the new entry");
+ goto err_out;
+ }
+ ret = 0;
+
+ if (icx->is_in_root)
+ ih = &icx->ir->index;
+ else
+ ih = &icx->ib->index;
+
+ allocated_size = le32_to_cpu(ih->allocated_size);
+ new_size = le32_to_cpu(ih->index_length) + le16_to_cpu(ie->length);
+
+ if (new_size <= allocated_size)
+ break;
+
+ ntfs_debug("index block sizes: allocated: %d needed: %d\n",
+ allocated_size, new_size);
+
+ if (icx->is_in_root)
+ ret = ntfs_ir_make_space(icx, new_size);
+ else
+ ret = ntfs_ib_split(icx, icx->ib);
+ if (ret && ret != -EAGAIN)
+ goto err_out;
+
+ mark_mft_record_dirty(icx->actx->ntfs_ino);
+ ntfs_index_ctx_reinit(icx);
+ }
+
+ ntfs_ie_insert(ih, ie, icx->entry);
+ ntfs_index_entry_mark_dirty(icx);
+
+err_out:
+ ntfs_debug("%s\n", ret ? "Failed" : "Done");
+ return ret;
+}
+
+/**
+ * ntfs_index_add_filename - add a filename to a directory index
+ * @ni: ntfs inode of the directory to whose index @fn is added
+ * @fn: FILE_NAME attribute to add
+ * @mref: reference of the inode which @fn describes
+ */
+int ntfs_index_add_filename(struct ntfs_inode *ni, struct file_name_attr *fn, u64 mref)
+{
+ struct index_entry *ie;
+ struct ntfs_index_context *icx;
+ int fn_size, ie_size, err;
+
+ ntfs_debug("Entering\n");
+
+ if (!ni || !fn)
+ return -EINVAL;
+
+ fn_size = (fn->file_name_length * sizeof(__le16)) +
+ sizeof(struct file_name_attr);
+ ie_size = (sizeof(struct index_entry_header) + fn_size + 7) & ~7;
+
+ ie = ntfs_malloc_nofs(ie_size);
+ if (!ie)
+ return -ENOMEM;
+
+ ie->data.dir.indexed_file = cpu_to_le64(mref);
+ ie->length = cpu_to_le16(ie_size);
+ ie->key_length = cpu_to_le16(fn_size);
+
+ unsafe_memcpy(&ie->key, fn, fn_size,
+ /* "fn_size" was correctly calculated above */);
+
+ icx = ntfs_index_ctx_get(ni, I30, 4);
+ if (!icx) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ err = ntfs_ie_add(icx, ie);
+ ntfs_index_ctx_put(icx);
+out:
+ ntfs_free(ie);
+ return err;
+}
+
+static int ntfs_ih_takeout(struct ntfs_index_context *icx, struct index_header *ih,
+ struct index_entry *ie, struct index_block *ib)
+{
+ struct index_entry *ie_roam;
+ int freed_space;
+ bool full;
+ int ret = 0;
+
+ ntfs_debug("Entering\n");
+
+ full = ih->index_length == ih->allocated_size;
+ ie_roam = ntfs_ie_dup_novcn(ie);
+ if (!ie_roam)
+ return -ENOMEM;
+
+ ntfs_ie_delete(ih, ie);
+
+ if (ntfs_icx_parent_vcn(icx) == VCN_INDEX_ROOT_PARENT) {
+ /*
+ * Recover the space which may have been freed
+ * while deleting an entry from root index
+ */
+ freed_space = le32_to_cpu(ih->allocated_size) -
+ le32_to_cpu(ih->index_length);
+ if (full && (freed_space > 0) && !(freed_space & 7)) {
+ ntfs_ir_truncate(icx, le32_to_cpu(ih->index_length));
+ /* do nothing if truncation fails */
+ }
+
+ mark_mft_record_dirty(icx->actx->ntfs_ino);
+ } else {
+ ret = ntfs_ib_write(icx, ib);
+ if (ret)
+ goto out;
+ }
+
+ ntfs_index_ctx_reinit(icx);
+
+ ret = ntfs_ie_add(icx, ie_roam);
+out:
+ ntfs_free(ie_roam);
+ return ret;
+}
+
+/*
+ * Used when an empty index block to be deleted has the END entry as its
+ * parent in the INDEX_ROOT and that entry is the only one there.
+ */
+static void ntfs_ir_leafify(struct ntfs_index_context *icx, struct index_header *ih)
+{
+ struct index_entry *ie;
+
+ ntfs_debug("Entering\n");
+
+ ie = ntfs_ie_get_first(ih);
+ ie->flags &= ~INDEX_ENTRY_NODE;
+ ie->length = cpu_to_le16(le16_to_cpu(ie->length) - sizeof(s64));
+
+ ih->index_length = cpu_to_le32(le32_to_cpu(ih->index_length) - sizeof(s64));
+ ih->flags &= ~LARGE_INDEX;
+ NInoClearIndexAllocPresent(icx->idx_ni);
+
+ /* Not fatal error */
+ ntfs_ir_truncate(icx, le32_to_cpu(ih->index_length));
+}
+
+/*
+ * Used when an empty index block to be deleted has the END entry as its
+ * parent in the INDEX_ROOT and that entry is not the only one there.
+ */
+static int ntfs_ih_reparent_end(struct ntfs_index_context *icx, struct index_header *ih,
+ struct index_block *ib)
+{
+ struct index_entry *ie, *ie_prev;
+
+ ntfs_debug("Entering\n");
+
+ ie = ntfs_ie_get_by_pos(ih, ntfs_icx_parent_pos(icx));
+ ie_prev = ntfs_ie_prev(ih, ie);
+ if (!ie_prev)
+ return -EIO;
+ ntfs_ie_set_vcn(ie, ntfs_ie_get_vcn(ie_prev));
+
+ return ntfs_ih_takeout(icx, ih, ie_prev, ib);
+}
+
+static int ntfs_index_rm_leaf(struct ntfs_index_context *icx)
+{
+ struct index_block *ib = NULL;
+ struct index_header *parent_ih;
+ struct index_entry *ie;
+ int ret;
+
+ ntfs_debug("pindex: %d\n", icx->pindex);
+
+ ret = ntfs_icx_parent_dec(icx);
+ if (ret)
+ return ret;
+
+ ret = ntfs_ibm_clear(icx, icx->parent_vcn[icx->pindex + 1]);
+ if (ret)
+ return ret;
+
+ if (ntfs_icx_parent_vcn(icx) == VCN_INDEX_ROOT_PARENT)
+ parent_ih = &icx->ir->index;
+ else {
+ ib = ntfs_malloc_nofs(icx->block_size);
+ if (!ib)
+ return -ENOMEM;
+
+ ret = ntfs_ib_read(icx, ntfs_icx_parent_vcn(icx), ib);
+ if (ret)
+ goto out;
+
+ parent_ih = &ib->index;
+ }
+
+ ie = ntfs_ie_get_by_pos(parent_ih, ntfs_icx_parent_pos(icx));
+ if (!ntfs_ie_end(ie)) {
+ ret = ntfs_ih_takeout(icx, parent_ih, ie, ib);
+ goto out;
+ }
+
+ if (ntfs_ih_zero_entry(parent_ih)) {
+ if (ntfs_icx_parent_vcn(icx) == VCN_INDEX_ROOT_PARENT) {
+ ntfs_ir_leafify(icx, parent_ih);
+ goto out;
+ }
+
+ ret = ntfs_index_rm_leaf(icx);
+ goto out;
+ }
+
+ ret = ntfs_ih_reparent_end(icx, parent_ih, ib);
+out:
+ ntfs_free(ib);
+ return ret;
+}
+
+static int ntfs_index_rm_node(struct ntfs_index_context *icx)
+{
+ int entry_pos, pindex;
+ s64 vcn;
+ struct index_block *ib = NULL;
+ struct index_entry *ie_succ, *ie, *entry = icx->entry;
+ struct index_header *ih;
+ u32 new_size;
+ int delta, ret;
+
+ ntfs_debug("Entering\n");
+
+ if (!icx->ia_ni) {
+ icx->ia_ni = ntfs_ia_open(icx, icx->idx_ni);
+ if (!icx->ia_ni)
+ return -EINVAL;
+ }
+
+ ib = ntfs_malloc_nofs(icx->block_size);
+ if (!ib)
+ return -ENOMEM;
+
+ ie_succ = ntfs_ie_get_next(icx->entry);
+ entry_pos = icx->parent_pos[icx->pindex]++;
+ pindex = icx->pindex;
+descend:
+ vcn = ntfs_ie_get_vcn(ie_succ);
+ ret = ntfs_ib_read(icx, vcn, ib);
+ if (ret)
+ goto out;
+
+ ie_succ = ntfs_ie_get_first(&ib->index);
+
+ ret = ntfs_icx_parent_inc(icx);
+ if (ret)
+ goto out;
+
+ icx->parent_vcn[icx->pindex] = vcn;
+ icx->parent_pos[icx->pindex] = 0;
+
+ if ((ib->index.flags & NODE_MASK) == INDEX_NODE)
+ goto descend;
+
+ if (ntfs_ih_zero_entry(&ib->index)) {
+ ret = -EIO;
+ ntfs_error(icx->idx_ni->vol->sb, "Empty index block");
+ goto out;
+ }
+
+ ie = ntfs_ie_dup(ie_succ);
+ if (!ie) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = ntfs_ie_add_vcn(&ie);
+ if (ret)
+ goto out2;
+
+ ntfs_ie_set_vcn(ie, ntfs_ie_get_vcn(icx->entry));
+
+ if (icx->is_in_root)
+ ih = &icx->ir->index;
+ else
+ ih = &icx->ib->index;
+
+ delta = le16_to_cpu(ie->length) - le16_to_cpu(icx->entry->length);
+ new_size = le32_to_cpu(ih->index_length) + delta;
+ if (delta > 0) {
+ if (icx->is_in_root) {
+ ret = ntfs_ir_make_space(icx, new_size);
+ if (ret != 0)
+ goto out2;
+
+ ih = &icx->ir->index;
+ entry = ntfs_ie_get_by_pos(ih, entry_pos);
+
+ } else if (new_size > le32_to_cpu(ih->allocated_size)) {
+ icx->pindex = pindex;
+ ret = ntfs_ib_split(icx, icx->ib);
+ if (!ret)
+ ret = -EAGAIN;
+ goto out2;
+ }
+ }
+
+ ntfs_ie_delete(ih, entry);
+ ntfs_ie_insert(ih, ie, entry);
+
+ if (icx->is_in_root)
+ ret = ntfs_ir_truncate(icx, new_size);
+ else
+ ret = ntfs_icx_ib_write(icx);
+ if (ret)
+ goto out2;
+
+ ntfs_ie_delete(&ib->index, ie_succ);
+
+ if (ntfs_ih_zero_entry(&ib->index))
+ ret = ntfs_index_rm_leaf(icx);
+ else
+ ret = ntfs_ib_write(icx, ib);
+
+out2:
+ ntfs_free(ie);
+out:
+ ntfs_free(ib);
+ return ret;
+}
+
+/**
+ * ntfs_index_rm - remove entry from the index
+ * @icx: index context describing entry to delete
+ *
+ * Delete entry described by @icx from the index. Index context is always
+ * reinitialized after use of this function, so it can be used for index
+ * lookup once again.
+ */
+int ntfs_index_rm(struct ntfs_index_context *icx)
+{
+ struct index_header *ih;
+ int ret = 0;
+
+ ntfs_debug("Entering\n");
+
+ if (!icx || (!icx->ib && !icx->ir) || ntfs_ie_end(icx->entry)) {
+ ret = -EINVAL;
+ goto err_out;
+ }
+ if (icx->is_in_root)
+ ih = &icx->ir->index;
+ else
+ ih = &icx->ib->index;
+
+ if (icx->entry->flags & INDEX_ENTRY_NODE) {
+ ret = ntfs_index_rm_node(icx);
+ if (ret)
+ goto err_out;
+ } else if (icx->is_in_root || !ntfs_ih_one_entry(ih)) {
+ ntfs_ie_delete(ih, icx->entry);
+
+ if (icx->is_in_root)
+ ret = ntfs_ir_truncate(icx, le32_to_cpu(ih->index_length));
+ else
+ ret = ntfs_icx_ib_write(icx);
+ if (ret)
+ goto err_out;
+ } else {
+ ret = ntfs_index_rm_leaf(icx);
+ if (ret)
+ goto err_out;
+ }
+
+ return 0;
+err_out:
+ return ret;
+}
+
+int ntfs_index_remove(struct ntfs_inode *dir_ni, const void *key, const int keylen)
+{
+ int ret = 0;
+ struct ntfs_index_context *icx;
+
+ icx = ntfs_index_ctx_get(dir_ni, I30, 4);
+ if (!icx)
+ return -EINVAL;
+
+ while (1) {
+ ret = ntfs_index_lookup(key, keylen, icx);
+ if (ret)
+ goto err_out;
+
+ ret = ntfs_index_rm(icx);
+ if (ret && ret != -EAGAIN)
+ goto err_out;
+ else if (!ret)
+ break;
+
+ mark_mft_record_dirty(icx->actx->ntfs_ino);
+ ntfs_index_ctx_reinit(icx);
+ }
+
+ mark_mft_record_dirty(icx->actx->ntfs_ino);
+
+ ntfs_index_ctx_put(icx);
+ return 0;
+err_out:
+ ntfs_index_ctx_put(icx);
+ ntfs_error(dir_ni->vol->sb, "Delete failed");
+ return ret;
+}
+
+/*
+ * ntfs_index_walk_down - walk down the index tree (towards the leaves)
+ *
+ * Descend for as long as the first index entry of each subnode itself has
+ * a subnode, and return the bottom-left entry of the lowest subnode reached.
+ */
+struct index_entry *ntfs_index_walk_down(struct index_entry *ie, struct ntfs_index_context *ictx)
+{
+ struct index_entry *entry;
+ s64 vcn;
+
+ entry = ie;
+ do {
+ vcn = ntfs_ie_get_vcn(entry);
+ if (ictx->is_in_root) {
+ /* down from level zero */
+ ictx->ir = NULL;
+ ictx->ib = (struct index_block *)ntfs_malloc_nofs(ictx->block_size);
+ ictx->pindex = 1;
+ ictx->is_in_root = false;
+ } else {
+ /* down from non-zero level */
+ ictx->pindex++;
+ }
+
+ ictx->parent_pos[ictx->pindex] = 0;
+ ictx->parent_vcn[ictx->pindex] = vcn;
+ if (!ntfs_ib_read(ictx, vcn, ictx->ib)) {
+ ictx->entry = ntfs_ie_get_first(&ictx->ib->index);
+ entry = ictx->entry;
+ } else
+ entry = NULL;
+ } while (entry && (entry->flags & INDEX_ENTRY_NODE));
+
+ return entry;
+}
+
+/**
+ * ntfs_index_walk_up - walk up the index tree (towards the root)
+ *
+ * Ascend until a valid data entry is found in a parent node. Returns the
+ * parent entry, or NULL if there is no parent left.
+ */
+static struct index_entry *ntfs_index_walk_up(struct index_entry *ie,
+ struct ntfs_index_context *ictx)
+{
+ struct index_entry *entry;
+ s64 vcn;
+
+ entry = ie;
+ if (ictx->pindex > 0) {
+ do {
+ ictx->pindex--;
+ if (!ictx->pindex) {
+ /* we have reached the root */
+ kfree(ictx->ib);
+ ictx->ib = NULL;
+ ictx->is_in_root = true;
+ /* a new search context is to be allocated */
+ if (ictx->actx)
+ ntfs_attr_put_search_ctx(ictx->actx);
+ ictx->ir = ntfs_ir_lookup(ictx->idx_ni, ictx->name,
+ ictx->name_len, &ictx->actx);
+ if (ictx->ir)
+ entry = ntfs_ie_get_by_pos(&ictx->ir->index,
+ ictx->parent_pos[ictx->pindex]);
+ else
+ entry = NULL;
+ } else {
+ /* up into non-root node */
+ vcn = ictx->parent_vcn[ictx->pindex];
+ if (!ntfs_ib_read(ictx, vcn, ictx->ib)) {
+ entry = ntfs_ie_get_by_pos(&ictx->ib->index,
+ ictx->parent_pos[ictx->pindex]);
+ } else
+ entry = NULL;
+ }
+ ictx->entry = entry;
+ } while (entry && (ictx->pindex > 0) &&
+ (entry->flags & INDEX_ENTRY_END));
+ } else
+ entry = NULL;
+
+ return entry;
+}
+
+/**
+ * ntfs_index_next - get next entry in an index according to collating sequence.
+ * Returns next entry or NULL if none.
+ *
+ * Sample layout:
+ *
+ * +---+---+---+---+---+---+---+---+ n ptrs to subnodes
+ * | | | 10| 25| 33| | | | n-1 keys in between
+ * +---+---+---+---+---+---+---+---+ no key in last entry
+ * | A | A
+ * | | | +-------------------------------+
+ * +--------------------------+ | +-----+ |
+ * | +--+ | |
+ * V | V |
+ * +---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+
+ * | 11| 12| 13| 14| 15| 16| 17| | | | 26| 27| 28| 29| 30| 31| 32| |
+ * +---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+
+ * | |
+ * +-----------------------+ |
+ * | |
+ * +---+---+---+---+---+---+---+---+
+ * | 18| 19| 20| 21| 22| 23| 24| |
+ * +---+---+---+---+---+---+---+---+
+ */
+struct index_entry *ntfs_index_next(struct index_entry *ie, struct ntfs_index_context *ictx)
+{
+ struct index_entry *next;
+ __le16 flags;
+
+ /*
+ * lookup() may have returned an invalid node when searching
+ * for a partial key. If this happens, walk up.
+ */
+ if (ie->flags & INDEX_ENTRY_END)
+ next = ntfs_index_walk_up(ie, ictx);
+ else {
+ /*
+ * get next entry in same node
+ * there is always one after any entry with data
+ */
+ next = (struct index_entry *)((char *)ie + le16_to_cpu(ie->length));
+ ++ictx->parent_pos[ictx->pindex];
+ flags = next->flags;
+
+ /* walk down if it has a subnode */
+ if (flags & INDEX_ENTRY_NODE) {
+ if (!ictx->ia_ni) {
+ ictx->ia_ni = ntfs_ia_open(ictx, ictx->idx_ni);
+ BUG_ON(!ictx->ia_ni);
+ }
+
+ next = ntfs_index_walk_down(next, ictx);
+ } else {
+
+ /* walk up if it has neither a subnode nor data */
+ if (flags & INDEX_ENTRY_END)
+ next = ntfs_index_walk_up(next, ictx);
+ }
+ }
+
+ /* return NULL if stuck at end of a block */
+ if (next && (next->flags & INDEX_ENTRY_END))
+ next = NULL;
+
+ return next;
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 05/11] ntfsplus: add file operations
2025-10-20 2:07 [PATCH 00/11] ntfsplus: ntfs filesystem remake Namjae Jeon
` (3 preceding siblings ...)
2025-10-20 2:07 ` [PATCH 04/11] ntfsplus: add directory operations Namjae Jeon
@ 2025-10-20 2:07 ` Namjae Jeon
2025-10-20 18:33 ` [PATCH 00/11] ntfsplus: ntfs filesystem remake Pali Rohár
` (2 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Namjae Jeon @ 2025-10-20 2:07 UTC (permalink / raw)
To: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il
Cc: linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee, Namjae Jeon
This adds the implementation of file operations for ntfsplus.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/ntfsplus/file.c | 1056 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 1056 insertions(+)
create mode 100644 fs/ntfsplus/file.c
diff --git a/fs/ntfsplus/file.c b/fs/ntfsplus/file.c
new file mode 100644
index 000000000000..b4114017b128
--- /dev/null
+++ b/fs/ntfsplus/file.c
@@ -0,0 +1,1056 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * NTFS kernel file operations. Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2015 Anton Altaparmakov and Tuxera Inc.
+ * Copyright (c) 2025 LG Electronics Co., Ltd.
+ */
+
+#include <linux/writeback.h>
+#include <linux/blkdev.h>
+#include <linux/fs.h>
+#include <linux/iomap.h>
+#include <linux/uio.h>
+#include <linux/posix_acl.h>
+#include <linux/posix_acl_xattr.h>
+#include <linux/compat.h>
+#include <linux/falloc.h>
+#include <uapi/linux/ntfs.h>
+
+#include "lcnalloc.h"
+#include "ntfs.h"
+#include "aops.h"
+#include "reparse.h"
+#include "ea.h"
+#include "ntfs_iomap.h"
+#include "misc.h"
+
+/**
+ * ntfs_file_open - called when an inode is about to be opened
+ * @vi: inode to be opened
+ * @filp: file structure describing the inode
+ *
+ * Limit file size to the page cache limit on architectures where unsigned long
+ * is 32-bits. This is the most we can do for now without overflowing the page
+ * cache page index. Doing it this way means we don't run into problems with
+ * already existing files that are too large. It would be better to allow the
+ * user to read the beginning of the file, but I doubt anyone is going to hit this
+ * check on a 32-bit architecture, so there is no point in adding the extra
+ * complexity required to support this.
+ *
+ * On 64-bit architectures, the check is hopefully optimized away by the
+ * compiler.
+ *
+ * After the check passes, just call generic_file_open() to do its work.
+ */
+static int ntfs_file_open(struct inode *vi, struct file *filp)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+
+ if (NVolShutdown(ni->vol))
+ return -EIO;
+
+ if (sizeof(unsigned long) < 8) {
+ if (i_size_read(vi) > MAX_LFS_FILESIZE)
+ return -EOVERFLOW;
+ }
+
+ if (filp->f_flags & O_TRUNC && NInoNonResident(ni)) {
+ int err;
+
+ mutex_lock(&ni->mrec_lock);
+ down_read(&ni->runlist.lock);
+ if (!ni->runlist.rl) {
+ err = ntfs_attr_map_whole_runlist(ni);
+ if (err) {
+ up_read(&ni->runlist.lock);
+ mutex_unlock(&ni->mrec_lock);
+ return err;
+ }
+ }
+ ni->lcn_seek_trunc = ni->runlist.rl->lcn;
+ up_read(&ni->runlist.lock);
+ mutex_unlock(&ni->mrec_lock);
+ }
+
+ filp->f_mode |= FMODE_NOWAIT;
+
+ return generic_file_open(vi, filp);
+}
+
+static int ntfs_file_release(struct inode *vi, struct file *filp)
+{
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_volume *vol = ni->vol;
+ s64 aligned_data_size = round_up(ni->data_size, vol->cluster_size);
+
+ if (NInoCompressed(ni))
+ return 0;
+
+ inode_lock(vi);
+ mutex_lock(&ni->mrec_lock);
+ down_write(&ni->runlist.lock);
+ if (aligned_data_size < ni->allocated_size) {
+ int err;
+ s64 vcn_ds = aligned_data_size >> vol->cluster_size_bits;
+ s64 vcn_tr = -1;
+ struct runlist_element *rl = ni->runlist.rl;
+ ssize_t rc = ni->runlist.count - 2;
+
+ while (rc >= 0 && rl[rc].lcn == LCN_HOLE && vcn_ds <= rl[rc].vcn) {
+ vcn_tr = rl[rc].vcn;
+ rc--;
+ }
+
+ if (vcn_tr >= 0) {
+ err = ntfs_rl_truncate_nolock(vol, &ni->runlist, vcn_tr);
+ if (err) {
+ ntfs_free(ni->runlist.rl);
+ ni->runlist.rl = NULL;
+ ntfs_error(vol->sb, "Preallocated block rollback failed");
+ } else {
+ ni->allocated_size = vcn_tr << vol->cluster_size_bits;
+ err = ntfs_attr_update_mapping_pairs(ni, 0);
+ if (err)
+ ntfs_error(vol->sb,
+ "Failed to rollback mapping pairs for prealloc");
+ }
+ }
+ }
+ up_write(&ni->runlist.lock);
+ mutex_unlock(&ni->mrec_lock);
+ inode_unlock(vi);
+
+ return 0;
+}
+
+/**
+ * ntfs_file_fsync - sync a file to disk
+ * @filp: file to be synced
+ * @start: start offset to be synced
+ * @end: end offset to be synced
+ * @datasync: if non-zero only flush user data and not metadata
+ *
+ * Data integrity sync of a file to disk. Used for fsync, fdatasync, and msync
+ * system calls. This function is inspired by fs/buffer.c::file_fsync().
+ *
+ * If @datasync is false, write the mft record and all associated extent mft
+ * records as well as the $DATA attribute and then sync the block device.
+ *
+ * If @datasync is true and the attribute is non-resident, we skip the writing
+ * of the mft record and all associated extent mft records (this might still
+ * happen due to the write_inode_now() call).
+ *
+ * Also, if @datasync is true, we do not wait on the inode to be written out
+ * but we always wait on the page cache pages to be written out.
+ */
+static int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end,
+ int datasync)
+{
+ struct inode *vi = filp->f_mapping->host;
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_volume *vol = ni->vol;
+ int err, ret = 0;
+ struct inode *parent_vi, *ia_vi;
+ struct ntfs_attr_search_ctx *ctx;
+
+ ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ err = file_write_and_wait_range(filp, start, end);
+ if (err)
+ return err;
+
+ BUG_ON(S_ISDIR(vi->i_mode));
+
+ if (!datasync || !NInoNonResident(NTFS_I(vi)))
+ ret = __ntfs_write_inode(vi, 1);
+ write_inode_now(vi, !datasync);
+
+ ctx = ntfs_attr_get_search_ctx(ni, NULL);
+ if (!ctx)
+ return -ENOMEM;
+
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL_2);
+ while (!(err = ntfs_attr_lookup(AT_UNUSED, NULL, 0, 0, 0, NULL, 0, ctx))) {
+ if (ctx->attr->type == AT_FILE_NAME) {
+ struct file_name_attr *fn = (struct file_name_attr *)((u8 *)ctx->attr +
+ le16_to_cpu(ctx->attr->data.resident.value_offset));
+
+ parent_vi = ntfs_iget(vi->i_sb, MREF_LE(fn->parent_directory));
+ if (IS_ERR(parent_vi))
+ continue;
+ mutex_lock_nested(&NTFS_I(parent_vi)->mrec_lock, NTFS_INODE_MUTEX_PARENT_2);
+ ia_vi = ntfs_index_iget(parent_vi, I30, 4);
+ mutex_unlock(&NTFS_I(parent_vi)->mrec_lock);
+ if (IS_ERR(ia_vi)) {
+ iput(parent_vi);
+ continue;
+ }
+ write_inode_now(ia_vi, 1);
+ iput(ia_vi);
+ write_inode_now(parent_vi, 1);
+ iput(parent_vi);
+ } else if (ctx->attr->non_resident) {
+ struct inode *attr_vi;
+ __le16 *name;
+
+ name = (__le16 *)((u8 *)ctx->attr + le16_to_cpu(ctx->attr->name_offset));
+ if (ctx->attr->type == AT_DATA && ctx->attr->name_length == 0)
+ continue;
+
+ attr_vi = ntfs_attr_iget(vi, ctx->attr->type,
+ name, ctx->attr->name_length);
+ if (IS_ERR(attr_vi))
+ continue;
+ spin_lock(&attr_vi->i_lock);
+ if (attr_vi->i_state & I_DIRTY_PAGES) {
+ spin_unlock(&attr_vi->i_lock);
+ filemap_write_and_wait(attr_vi->i_mapping);
+ } else
+ spin_unlock(&attr_vi->i_lock);
+ iput(attr_vi);
+ }
+ }
+ mutex_unlock(&ni->mrec_lock);
+ ntfs_attr_put_search_ctx(ctx);
+
+ write_inode_now(vol->mftbmp_ino, 1);
+ down_write(&vol->lcnbmp_lock);
+ write_inode_now(vol->lcnbmp_ino, 1);
+ up_write(&vol->lcnbmp_lock);
+ write_inode_now(vol->mft_ino, 1);
+
+ /*
+ * NOTE: If we were to use mapping->private_list (see ext2 and
+ * fs/buffer.c) for dirty blocks then we could optimize the below to be
+ * sync_mapping_buffers(vi->i_mapping).
+ */
+ err = sync_blockdev(vi->i_sb->s_bdev);
+ if (unlikely(err && !ret))
+ ret = err;
+ if (likely(!ret))
+ ntfs_debug("Done.");
+ else
+ ntfs_warning(vi->i_sb,
+ "Failed to f%ssync inode 0x%lx. Error %u.",
+ datasync ? "data" : "", vi->i_ino, -ret);
+ if (!ret)
+ blkdev_issue_flush(vi->i_sb->s_bdev);
+ return ret;
+}
+
+/**
+ * ntfs_setattr - called from notify_change() when an attribute is being changed
+ * @idmap: idmap of the mount the inode was found from
+ * @dentry: dentry whose attributes to change
+ * @attr: structure describing the attributes and the changes
+ *
+ * Size changes are trapped here and carried out via ntfs_truncate_vfs(),
+ * after zeroing or truncating the tail of the page cache as needed.
+ *
+ * Changes of user, group, and mode are stored as WSL-style extended
+ * attributes, as NTFS ACLs are not implemented yet.
+ */
+int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+ struct iattr *attr)
+{
+ struct inode *vi = d_inode(dentry);
+ int err;
+ unsigned int ia_valid = attr->ia_valid;
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_volume *vol = ni->vol;
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ err = setattr_prepare(idmap, dentry, attr);
+ if (err)
+ goto out;
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ if (ia_valid & ATTR_SIZE) {
+ if (NInoCompressed(ni) || NInoEncrypted(ni)) {
+ ntfs_warning(vi->i_sb,
+ "Changes in inode size are not supported yet for %s files, ignoring.",
+ NInoCompressed(ni) ? "compressed" : "encrypted");
+ err = -EOPNOTSUPP;
+ } else {
+ loff_t old_size = vi->i_size;
+
+ err = inode_newsize_ok(vi, attr->ia_size);
+ if (err)
+ goto out;
+
+ inode_dio_wait(vi);
+ /* Serialize against page faults */
+ if (NInoNonResident(NTFS_I(vi)) &&
+ attr->ia_size < old_size) {
+ err = iomap_truncate_page(vi, attr->ia_size, NULL,
+ &ntfs_read_iomap_ops,
+ &ntfs_iomap_folio_ops, NULL);
+ if (err)
+ goto out;
+ }
+
+ truncate_setsize(vi, attr->ia_size);
+ err = ntfs_truncate_vfs(vi, attr->ia_size, old_size);
+ if (err) {
+ i_size_write(vi, old_size);
+ goto out;
+ }
+
+ if (NInoNonResident(ni) && attr->ia_size > old_size &&
+ old_size % PAGE_SIZE != 0) {
+ loff_t len = min_t(loff_t,
+ round_up(old_size, PAGE_SIZE) - old_size,
+ attr->ia_size - old_size);
+ err = iomap_zero_range(vi, old_size, len,
+ NULL, &ntfs_read_iomap_ops,
+ &ntfs_iomap_folio_ops, NULL);
+ }
+ }
+ if (ia_valid == ATTR_SIZE)
+ goto out;
+ ia_valid |= ATTR_MTIME | ATTR_CTIME;
+ }
+
+ setattr_copy(idmap, vi, attr);
+
+ if (vol->sb->s_flags & SB_POSIXACL && !S_ISLNK(vi->i_mode)) {
+ err = posix_acl_chmod(idmap, dentry, vi->i_mode);
+ if (err)
+ goto out;
+ }
+
+ if (0222 & vi->i_mode)
+ ni->flags &= ~FILE_ATTR_READONLY;
+ else
+ ni->flags |= FILE_ATTR_READONLY;
+
+ if (ia_valid & (ATTR_UID | ATTR_GID | ATTR_MODE)) {
+ unsigned int flags = 0;
+
+ if (ia_valid & ATTR_UID)
+ flags |= NTFS_EA_UID;
+ if (ia_valid & ATTR_GID)
+ flags |= NTFS_EA_GID;
+ if (ia_valid & ATTR_MODE)
+ flags |= NTFS_EA_MODE;
+
+ if (S_ISDIR(vi->i_mode))
+ vi->i_mode &= ~vol->dmask;
+ else
+ vi->i_mode &= ~vol->fmask;
+
+ mutex_lock(&ni->mrec_lock);
+ ntfs_ea_set_wsl_inode(vi, 0, NULL, flags);
+ mutex_unlock(&ni->mrec_lock);
+ }
+
+ mark_inode_dirty(vi);
+out:
+ return err;
+}
+
+int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
+ struct kstat *stat, unsigned int request_mask,
+ unsigned int query_flags)
+{
+ struct inode *inode = d_backing_inode(path->dentry);
+
+ generic_fillattr(idmap, request_mask, inode, stat);
+
+ stat->blksize = NTFS_SB(inode->i_sb)->cluster_size;
+ stat->blocks = (((u64)NTFS_I(inode)->i_dealloc_clusters <<
+ NTFS_SB(inode->i_sb)->cluster_size_bits) >> 9) + inode->i_blocks;
+ stat->result_mask |= STATX_BTIME;
+ stat->btime = NTFS_I(inode)->i_crtime;
+
+ return 0;
+}
+
+static loff_t ntfs_file_llseek(struct file *file, loff_t offset, int whence)
+{
+ struct inode *vi = file->f_mapping->host;
+
+ if (whence == SEEK_DATA || whence == SEEK_HOLE) {
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_volume *vol = ni->vol;
+ struct runlist_element *rl;
+ s64 vcn;
+ unsigned int vcn_off;
+ loff_t end_off;
+ unsigned long flags;
+ int i;
+
+ inode_lock_shared(vi);
+
+ if (NInoCompressed(ni) || NInoEncrypted(ni))
+ goto error;
+
+ read_lock_irqsave(&ni->size_lock, flags);
+ end_off = ni->data_size;
+ read_unlock_irqrestore(&ni->size_lock, flags);
+
+ if (offset < 0 || offset >= end_off)
+ goto error;
+
+ if (!NInoNonResident(ni)) {
+ if (whence == SEEK_HOLE)
+ offset = end_off;
+ goto found_no_runlist_lock;
+ }
+
+ vcn = offset >> vol->cluster_size_bits;
+ vcn_off = offset & vol->cluster_size_mask;
+
+ down_read(&ni->runlist.lock);
+ rl = ni->runlist.rl;
+ i = 0;
+
+ BUG_ON(rl && rl[0].vcn > vcn);
+#ifdef DEBUG
+ ntfs_debug("init:");
+ ntfs_debug_dump_runlist(rl);
+#endif
+ while (1) {
+ if (!rl || !NInoFullyMapped(ni) || rl[i].lcn == LCN_RL_NOT_MAPPED) {
+ int ret;
+
+ up_read(&ni->runlist.lock);
+ ret = ntfs_map_runlist(ni, rl ? rl[i].vcn : 0);
+ if (ret)
+ goto error;
+ down_read(&ni->runlist.lock);
+ rl = ni->runlist.rl;
+#ifdef DEBUG
+ ntfs_debug("mapped:");
+ ntfs_debug_dump_runlist(ni->runlist.rl);
+#endif
+ continue;
+ } else if (rl[i].lcn == LCN_ENOENT) {
+ if (whence == SEEK_DATA) {
+ up_read(&ni->runlist.lock);
+ goto error;
+ } else {
+ offset = end_off;
+ goto found;
+ }
+ } else if (rl[i + 1].vcn > vcn) {
+ if ((whence == SEEK_DATA && (rl[i].lcn >= 0 ||
+ rl[i].lcn == LCN_DELALLOC)) ||
+ (whence == SEEK_HOLE && rl[i].lcn == LCN_HOLE)) {
+ offset = (vcn << vol->cluster_size_bits) + vcn_off;
+ if (offset < ni->data_size)
+ goto found;
+ }
+ vcn = rl[i + 1].vcn;
+ vcn_off = 0;
+ }
+ i++;
+ }
+ up_read(&ni->runlist.lock);
+ inode_unlock_shared(vi);
+ BUG();
+found:
+ up_read(&ni->runlist.lock);
+found_no_runlist_lock:
+ inode_unlock_shared(vi);
+ return vfs_setpos(file, offset, vi->i_sb->s_maxbytes);
+error:
+ inode_unlock_shared(vi);
+ return -ENXIO;
+ } else {
+ return generic_file_llseek_size(file, offset, whence,
+ vi->i_sb->s_maxbytes,
+ i_size_read(vi));
+ }
+}
+
+static ssize_t ntfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct inode *vi = file_inode(iocb->ki_filp);
+ struct super_block *sb = vi->i_sb;
+ ssize_t ret;
+
+ if (NVolShutdown(NTFS_SB(sb)))
+ return -EIO;
+
+ if (NInoCompressed(NTFS_I(vi)) && iocb->ki_flags & IOCB_DIRECT)
+ return -EOPNOTSUPP;
+
+ inode_lock_shared(vi);
+
+ if (iocb->ki_flags & IOCB_DIRECT) {
+ size_t count = iov_iter_count(to);
+
+ if ((iocb->ki_pos | count) & (sb->s_blocksize - 1)) {
+ ret = -EINVAL;
+ goto inode_unlock;
+ }
+
+ file_accessed(iocb->ki_filp);
+ ret = iomap_dio_rw(iocb, to, &ntfs_read_iomap_ops, NULL, IOMAP_DIO_PARTIAL,
+ NULL, 0);
+ } else {
+ ret = generic_file_read_iter(iocb, to);
+ }
+
+inode_unlock:
+ inode_unlock_shared(vi);
+
+ return ret;
+}
+
+static int ntfs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
+ int error, unsigned int flags)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ if (error)
+ return error;
+
+ if (size) {
+ if (i_size_read(inode) < iocb->ki_pos + size) {
+ i_size_write(inode, iocb->ki_pos + size);
+ mark_inode_dirty(inode);
+ }
+ }
+
+ return 0;
+}
+
+static const struct iomap_dio_ops ntfs_write_dio_ops = {
+ .end_io = ntfs_file_write_dio_end_io,
+};
+
+static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *vi = file->f_mapping->host;
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_volume *vol = ni->vol;
+ ssize_t ret;
+ ssize_t count;
+ loff_t pos;
+ int err;
+ loff_t old_data_size, old_init_size;
+
+ if (NVolShutdown(vol))
+ return -EIO;
+
+ if (NInoEncrypted(ni)) {
+ ntfs_error(vi->i_sb, "Writing to encrypted files is not supported yet");
+ return -EOPNOTSUPP;
+ }
+
+ if (NInoCompressed(ni) && iocb->ki_flags & IOCB_DIRECT)
+ return -EOPNOTSUPP;
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (!inode_trylock(vi))
+ return -EAGAIN;
+ } else
+ inode_lock(vi);
+
+ ret = generic_write_checks(iocb, from);
+ if (ret <= 0)
+ goto out_lock;
+
+ if (NInoNonResident(ni) && (iocb->ki_flags & IOCB_DIRECT) &&
+ ((iocb->ki_pos | ret) & (vi->i_sb->s_blocksize - 1))) {
+ ret = -EINVAL;
+ goto out_lock;
+ }
+
+ err = file_modified(iocb->ki_filp);
+ if (err) {
+ ret = err;
+ goto out_lock;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY))
+ ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+
+ pos = iocb->ki_pos;
+ count = ret;
+
+ old_data_size = ni->data_size;
+ old_init_size = ni->initialized_size;
+ if (iocb->ki_pos + ret > old_data_size) {
+ mutex_lock(&ni->mrec_lock);
+ if (!NInoCompressed(ni) && iocb->ki_pos + ret > ni->allocated_size &&
+ iocb->ki_pos + ret < ni->allocated_size + vol->preallocated_size)
+ ret = ntfs_attr_expand(ni, iocb->ki_pos + ret,
+ ni->allocated_size + vol->preallocated_size);
+ else if (NInoCompressed(ni) && iocb->ki_pos + ret > ni->allocated_size)
+ ret = ntfs_attr_expand(ni, iocb->ki_pos + ret,
+ round_up(iocb->ki_pos + ret, ni->itype.compressed.block_size));
+ else
+ ret = ntfs_attr_expand(ni, iocb->ki_pos + ret, 0);
+ mutex_unlock(&ni->mrec_lock);
+ if (ret < 0)
+ goto out;
+ }
+
+ if (NInoNonResident(ni) && iocb->ki_pos + count > old_init_size) {
+ ret = ntfs_extend_initialized_size(vi, iocb->ki_pos,
+ iocb->ki_pos + count);
+ if (ret < 0)
+ goto out;
+ }
+
+ if (NInoNonResident(ni) && NInoCompressed(ni)) {
+ ret = ntfs_compress_write(ni, pos, count, from);
+ if (ret > 0)
+ iocb->ki_pos += ret;
+ goto out;
+ }
+
+ if (NInoNonResident(ni) && iocb->ki_flags & IOCB_DIRECT) {
+ ret = iomap_dio_rw(iocb, from, &ntfs_dio_iomap_ops,
+ &ntfs_write_dio_ops, 0, NULL, 0);
+ if (ret == -ENOTBLK)
+ ret = 0;
+ else if (ret < 0)
+ goto out;
+
+ if (iov_iter_count(from)) {
+ loff_t offset, end;
+ ssize_t written;
+ int ret2;
+
+ offset = iocb->ki_pos;
+ iocb->ki_flags &= ~IOCB_DIRECT;
+ written = iomap_file_buffered_write(iocb, from,
+ &ntfs_write_iomap_ops, &ntfs_iomap_folio_ops,
+ NULL);
+ if (written < 0) {
+ if (!ret)
+ ret = written;
+ goto out;
+ }
+
+ ret += written;
+ end = iocb->ki_pos + written - 1;
+ ret2 = filemap_write_and_wait_range(iocb->ki_filp->f_mapping,
+ offset, end);
+ if (ret2)
+ goto out_err;
+ invalidate_mapping_pages(iocb->ki_filp->f_mapping,
+ offset >> PAGE_SHIFT,
+ end >> PAGE_SHIFT);
+ }
+ } else {
+ ret = iomap_file_buffered_write(iocb, from, &ntfs_write_iomap_ops,
+ &ntfs_iomap_folio_ops, NULL);
+ }
+out:
+ if (ret < 0 && ret != -EIOCBQUEUED) {
+out_err:
+ if (ni->initialized_size != old_init_size) {
+ mutex_lock(&ni->mrec_lock);
+ ntfs_attr_set_initialized_size(ni, old_init_size);
+ mutex_unlock(&ni->mrec_lock);
+ }
+ if (ni->data_size != old_data_size) {
+ truncate_setsize(vi, old_data_size);
+ ntfs_attr_truncate(ni, old_data_size);
+ }
+ }
+out_lock:
+ inode_unlock(vi);
+ if (ret > 0)
+ ret = generic_write_sync(iocb, ret);
+ return ret;
+}
+
+static vm_fault_t ntfs_filemap_page_mkwrite(struct vm_fault *vmf)
+{
+ struct inode *inode = file_inode(vmf->vma->vm_file);
+ vm_fault_t ret;
+
+ if (unlikely(IS_IMMUTABLE(inode)))
+ return VM_FAULT_SIGBUS;
+
+ sb_start_pagefault(inode->i_sb);
+ file_update_time(vmf->vma->vm_file);
+
+ ret = iomap_page_mkwrite(vmf, &ntfs_page_mkwrite_iomap_ops, NULL);
+ sb_end_pagefault(inode->i_sb);
+ return ret;
+}
+
+static const struct vm_operations_struct ntfs_file_vm_ops = {
+ .fault = filemap_fault,
+ .map_pages = filemap_map_pages,
+ .page_mkwrite = ntfs_filemap_page_mkwrite,
+};
+
+static int ntfs_file_mmap_prepare(struct vm_area_desc *desc)
+{
+ struct file *file = desc->file;
+ struct inode *inode = file_inode(file);
+
+ if (NVolShutdown(NTFS_SB(inode->i_sb)))
+ return -EIO;
+
+ if (NInoCompressed(NTFS_I(inode)))
+ return -EOPNOTSUPP;
+
+ if (desc->vm_flags & VM_WRITE) {
+ loff_t from, to;
+ int err;
+
+ from = ((loff_t)desc->pgoff << PAGE_SHIFT);
+ to = min_t(loff_t, i_size_read(inode),
+ from + desc->end - desc->start);
+
+ if (NTFS_I(inode)->initialized_size < to) {
+ err = ntfs_extend_initialized_size(inode, to, to);
+ if (err)
+ return err;
+ }
+ }
+
+ file_accessed(file);
+ desc->vm_ops = &ntfs_file_vm_ops;
+ return 0;
+}
+
+static int ntfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+ u64 start, u64 len)
+{
+ return iomap_fiemap(inode, fieinfo, start, len, &ntfs_read_iomap_ops);
+}
+
+static const char *ntfs_get_link(struct dentry *dentry, struct inode *inode,
+ struct delayed_call *done)
+{
+ if (!NTFS_I(inode)->target)
+ return ERR_PTR(-EINVAL);
+
+ return NTFS_I(inode)->target;
+}
+
+static ssize_t ntfs_file_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len, unsigned int flags)
+{
+ if (NVolShutdown(NTFS_SB(in->f_mapping->host->i_sb)))
+ return -EIO;
+
+ return filemap_splice_read(in, ppos, pipe, len, flags);
+}
+
+static int ntfs_ioctl_shutdown(struct super_block *sb, unsigned long arg)
+{
+ u32 flags;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (get_user(flags, (__u32 __user *)arg))
+ return -EFAULT;
+
+ return ntfs_force_shutdown(sb, flags);
+}
+
+long ntfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+
+ switch (cmd) {
+ case NTFS_IOC_SHUTDOWN:
+ return ntfs_ioctl_shutdown(inode->i_sb, arg);
+ default:
+ return -ENOTTY;
+ }
+}
+
+#ifdef CONFIG_COMPAT
+long ntfs_compat_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ return ntfs_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
+}
+#endif
+
+static long ntfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
+{
+ struct inode *vi = file_inode(file);
+ struct ntfs_inode *ni = NTFS_I(vi);
+ struct ntfs_volume *vol = ni->vol;
+ int err = 0;
+ loff_t end_offset = offset + len;
+ loff_t old_size, new_size;
+ s64 start_vcn, end_vcn;
+ bool map_locked = false;
+
+ if (!S_ISREG(vi->i_mode))
+ return -EOPNOTSUPP;
+
+ if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_INSERT_RANGE |
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE))
+ return -EOPNOTSUPP;
+
+ if (!NVolFreeClusterKnown(vol))
+ wait_event(vol->free_waitq, NVolFreeClusterKnown(vol));
+
+ if ((ni->vol->mft_zone_end - ni->vol->mft_zone_start) == 0)
+ return -ENOSPC;
+
+ if (NInoNonResident(ni) && !NInoFullyMapped(ni)) {
+ down_write(&ni->runlist.lock);
+ err = ntfs_attr_map_whole_runlist(ni);
+ up_write(&ni->runlist.lock);
+ if (err)
+ return err;
+ }
+
+ if (!(vol->vol_flags & VOLUME_IS_DIRTY)) {
+ err = ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY);
+ if (err)
+ return err;
+ }
+
+ old_size = i_size_read(vi);
+ new_size = max_t(loff_t, old_size, end_offset);
+ start_vcn = offset >> vol->cluster_size_bits;
+ end_vcn = ((end_offset - 1) >> vol->cluster_size_bits) + 1;
+
+ inode_lock(vi);
+ if (NInoCompressed(ni) || NInoEncrypted(ni)) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
+
+ inode_dio_wait(vi);
+ if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE |
+ FALLOC_FL_INSERT_RANGE)) {
+ filemap_invalidate_lock(vi->i_mapping);
+ map_locked = true;
+ }
+
+ if (mode & FALLOC_FL_INSERT_RANGE) {
+ loff_t offset_down = round_down(offset,
+ max_t(unsigned long, vol->cluster_size, PAGE_SIZE));
+ loff_t alloc_size;
+
+ if ((offset & vol->cluster_size_mask) ||
+ (len & vol->cluster_size_mask) ||
+ offset >= ni->allocated_size) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ new_size = old_size +
+ ((end_vcn - start_vcn) << vol->cluster_size_bits);
+ alloc_size = ni->allocated_size +
+ ((end_vcn - start_vcn) << vol->cluster_size_bits);
+ if (alloc_size < 0) {
+ err = -EFBIG;
+ goto out;
+ }
+ err = inode_newsize_ok(vi, alloc_size);
+ if (err)
+ goto out;
+
+ err = filemap_write_and_wait_range(vi->i_mapping,
+ offset_down, LLONG_MAX);
+ if (err)
+ goto out;
+
+ truncate_pagecache(vi, offset_down);
+
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ err = ntfs_non_resident_attr_insert_range(ni, start_vcn,
+ end_vcn - start_vcn);
+ mutex_unlock(&ni->mrec_lock);
+ if (err)
+ goto out;
+ } else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
+ loff_t offset_down = round_down(offset,
+ max_t(unsigned long, vol->cluster_size, PAGE_SIZE));
+
+ if ((offset & vol->cluster_size_mask) ||
+ (len & vol->cluster_size_mask) ||
+ offset >= ni->allocated_size) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if ((end_vcn << vol->cluster_size_bits) > ni->allocated_size)
+ end_vcn = DIV_ROUND_UP(ni->allocated_size - 1,
+ vol->cluster_size) + 1;
+ new_size = old_size -
+ ((end_vcn - start_vcn) << vol->cluster_size_bits);
+ if (new_size < 0)
+ new_size = 0;
+ err = filemap_write_and_wait_range(vi->i_mapping,
+ offset_down, LLONG_MAX);
+ if (err)
+ goto out;
+
+ truncate_pagecache(vi, offset_down);
+
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ err = ntfs_non_resident_attr_collapse_range(ni, start_vcn,
+ end_vcn - start_vcn);
+ mutex_unlock(&ni->mrec_lock);
+ if (err)
+ goto out;
+ } else if (mode & FALLOC_FL_PUNCH_HOLE) {
+ loff_t offset_down = round_down(offset, max_t(unsigned int,
+ vol->cluster_size, PAGE_SIZE));
+
+ if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (offset >= ni->data_size)
+ goto out;
+
+ if (offset + len > ni->data_size) {
+ end_offset = ni->data_size;
+ end_vcn = ((end_offset - 1) >> vol->cluster_size_bits) + 1;
+ }
+
+ err = filemap_write_and_wait_range(vi->i_mapping, offset_down, LLONG_MAX);
+ if (err)
+ goto out;
+ truncate_pagecache(vi, offset_down);
+
+ if (offset & vol->cluster_size_mask) {
+ loff_t to;
+
+ to = min_t(loff_t, (start_vcn + 1) << vol->cluster_size_bits,
+ end_offset);
+ err = iomap_zero_range(vi, offset, to - offset, NULL,
+ &ntfs_read_iomap_ops,
+ &ntfs_iomap_folio_ops, NULL);
+ if (err < 0 || (end_vcn - start_vcn) == 1)
+ goto out;
+ start_vcn++;
+ }
+ if (end_offset & vol->cluster_size_mask) {
+ loff_t from;
+
+ from = (end_vcn - 1) << vol->cluster_size_bits;
+ err = iomap_zero_range(vi, from, end_offset - from, NULL,
+ &ntfs_read_iomap_ops,
+ &ntfs_iomap_folio_ops, NULL);
+ if (err < 0 || (end_vcn - start_vcn) == 1)
+ goto out;
+ end_vcn--;
+ }
+
+ mutex_lock_nested(&ni->mrec_lock, NTFS_INODE_MUTEX_NORMAL);
+ err = ntfs_non_resident_attr_punch_hole(ni, start_vcn,
+ end_vcn - start_vcn);
+ mutex_unlock(&ni->mrec_lock);
+ if (err)
+ goto out;
+ } else if (mode == 0 || mode == FALLOC_FL_KEEP_SIZE) {
+ s64 need_space;
+
+ err = inode_newsize_ok(vi, new_size);
+ if (err)
+ goto out;
+
+ need_space = ni->allocated_size >> vol->cluster_size_bits;
+ if (need_space > start_vcn)
+ need_space = end_vcn - need_space;
+ else
+ need_space = end_vcn - start_vcn;
+ if (need_space > 0 &&
+ need_space > (atomic64_read(&vol->free_clusters) -
+ atomic64_read(&vol->dirty_clusters))) {
+ err = -ENOSPC;
+ goto out;
+ }
+
+ err = ntfs_attr_fallocate(ni, offset, len,
+ mode & FALLOC_FL_KEEP_SIZE ? true : false);
+ if (err)
+ goto out;
+ }
+
+ /* inode->i_blocks is already updated in ntfs_attr_update_mapping_pairs */
+ if (!(mode & FALLOC_FL_KEEP_SIZE) && new_size != old_size)
+ i_size_write(vi, ni->data_size);
+
+out:
+ if (map_locked)
+ filemap_invalidate_unlock(vi->i_mapping);
+ if (!err) {
+ if (mode == 0 && NInoNonResident(ni) &&
+ offset > old_size && old_size % PAGE_SIZE != 0) {
+ loff_t len = min_t(loff_t,
+ round_up(old_size, PAGE_SIZE) - old_size,
+ offset - old_size);
+ err = iomap_zero_range(vi, old_size, len, NULL,
+ &ntfs_read_iomap_ops,
+ &ntfs_iomap_folio_ops, NULL);
+ }
+ NInoSetFileNameDirty(ni);
+ inode_set_mtime_to_ts(vi, inode_set_ctime_current(vi));
+ mark_inode_dirty(vi);
+ }
+
+ inode_unlock(vi);
+ return err;
+}
+
+const struct file_operations ntfs_file_ops = {
+ .llseek = ntfs_file_llseek,
+ .read_iter = ntfs_file_read_iter,
+ .write_iter = ntfs_file_write_iter,
+ .fsync = ntfs_file_fsync,
+ .mmap_prepare = ntfs_file_mmap_prepare,
+ .open = ntfs_file_open,
+ .release = ntfs_file_release,
+ .splice_read = ntfs_file_splice_read,
+ .splice_write = iter_file_splice_write,
+ .unlocked_ioctl = ntfs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ntfs_compat_ioctl,
+#endif
+ .fallocate = ntfs_fallocate,
+};
+
+const struct inode_operations ntfs_file_inode_ops = {
+ .setattr = ntfs_setattr,
+ .getattr = ntfs_getattr,
+ .listxattr = ntfs_listxattr,
+ .get_acl = ntfs_get_acl,
+ .set_acl = ntfs_set_acl,
+ .fiemap = ntfs_fiemap,
+};
+
+const struct inode_operations ntfs_symlink_inode_operations = {
+ .get_link = ntfs_get_link,
+ .setattr = ntfs_setattr,
+ .listxattr = ntfs_listxattr,
+};
+
+const struct inode_operations ntfs_special_inode_operations = {
+ .setattr = ntfs_setattr,
+ .getattr = ntfs_getattr,
+ .listxattr = ntfs_listxattr,
+ .get_acl = ntfs_get_acl,
+ .set_acl = ntfs_set_acl,
+};
+
+const struct file_operations ntfs_empty_file_ops = {};
+
+const struct inode_operations ntfs_empty_inode_ops = {};
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-20 2:07 [PATCH 00/11] ntfsplus: ntfs filesystem remake Namjae Jeon
` (4 preceding siblings ...)
2025-10-20 2:07 ` [PATCH 05/11] ntfsplus: add file operations Namjae Jeon
@ 2025-10-20 18:33 ` Pali Rohár
2025-10-21 1:49 ` Namjae Jeon
2025-10-21 0:17 ` Bagas Sanjaya
2025-10-22 6:30 ` David Sterba
7 siblings, 1 reply; 18+ messages in thread
From: Pali Rohár @ 2025-10-20 18:33 UTC (permalink / raw)
To: Namjae Jeon
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
Hello,
Do you have a plan for what the future of NTFS support in Linux should
be? This is basically the third NTFS driver in recent years, and I do
not think it is a good idea to replace the NTFS driver every decade with
a new, different implementation.
Is this new driver going to replace the existing ntfs3 driver? Or should
it live side-by-side with ntfs3?
If this new driver is going to replace ntfs3, then it should provide the
same API/ABI to userspace: at least the same or compatible mount
options, ioctl interface, and/or attribute features (I am not sure what
is already supported).
You wrote that ntfsplus is based on the old ntfs driver. How big is the
diff between the old ntfs and the new ntfsplus driver? If the code is
still largely the same, maybe it would be better to call it ntfs as
before and construct the commits so that they first revert the removal
of the old ntfs driver and then apply your changes on top of it (the
write feature, etc.)?
For mount options: for example, I see that the new driver does not use
the de-facto standard iocharset= mount option like other fs drivers do,
but instead has an nls= mount option. This should be fixed.
Pali
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-20 2:07 [PATCH 00/11] ntfsplus: ntfs filesystem remake Namjae Jeon
` (5 preceding siblings ...)
2025-10-20 18:33 ` [PATCH 00/11] ntfsplus: ntfs filesystem remake Pali Rohár
@ 2025-10-21 0:17 ` Bagas Sanjaya
2025-10-21 1:55 ` Namjae Jeon
2025-10-22 6:30 ` David Sterba
7 siblings, 1 reply; 18+ messages in thread
From: Bagas Sanjaya @ 2025-10-21 0:17 UTC (permalink / raw)
To: Namjae Jeon, viro, brauner, hch, hch, tytso, willy, jack, djwong,
josef, sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil,
amir73il
Cc: linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Mon, Oct 20, 2025 at 11:07:38AM +0900, Namjae Jeon wrote:
> Introduction
> ============
Can you write the documentation at least in
Documentation/filesystems/ntfsplus.rst?
> - Journaling support:
> ntfs3 does not provide full journaling support. It only implement journal
> replay[4], which in our testing did not function correctly. My next task
> after upstreaming will be to add full journal support to ntfsplus.
What's the plan for journaling? Mirroring the Windows implementation,
perhaps?
For the timeline: I guess you plan to submit journaling patches right
after ntfsplus is merged (at least applied to the filesystem tree, or a
direct PR to Linus), or would it be done in a subsequent release cycle
(6.n+1)?
Regarding stability: as it is a new filesystem, shouldn't it be marked
experimental (and stabilized for a few cycles) first?
Thanks.
--
An old man doll... just what I always wanted! - Clara
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-20 18:33 ` [PATCH 00/11] ntfsplus: ntfs filesystem remake Pali Rohár
@ 2025-10-21 1:49 ` Namjae Jeon
2025-10-21 22:19 ` Pali Rohár
0 siblings, 1 reply; 18+ messages in thread
From: Namjae Jeon @ 2025-10-21 1:49 UTC (permalink / raw)
To: Pali Rohár
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Tue, Oct 21, 2025 at 3:33 AM Pali Rohár <pali@kernel.org> wrote:
>
> Hello,
Hi Pali,
>
> Do you have a plan, what should be the future of the NTFS support in
> Linux? Because basically this is a third NTFS driver in recent years
> and I think it is not a good idea to replace NTFS driver every decade by
> a new different implementation.
Our product currently uses ntfsplus without any issues, and we plan to
provide support for the various issues reported by users and developers
once it is merged into the mainline kernel.
This is very basic, but the current ntfs3 has not provided this support
for the last four years.
After ntfsplus is merged, our next step will be to implement full
journal support. Our ultimate goal is to provide stable NTFS support in
Linux, including utility support with fsck (ntfsprogs-plus) and
journaling.
>
> Is this new driver going to replace existing ntfs3 driver? Or should it
> live side-by-side together with ntfs3?
Currently, it is the latter. I think the two drivers should compete.
The ntfs driver that users can reliably use in their products is the
one that should remain.
Four years ago, ntfs3 promised to soon release the full journaling and
public utilities support that were in their commercial version.
That promise hasn't been kept yet. It would probably not be easy for a
company that sells an ntfs driver commercially to open some or all of
its sources. That's why I think we need at least competition.
>
> If this new driver is going to replace ntfs3 then it should provide same
> API/ABI to userspace. For this case at least same/compatible mount
> options, ioctl interface and/or attribute features (not sure what is
> already supported).
Sure. If ntfsplus replaces ntfs3, it will support them.
>
> You wrote that ntfsplus is based on the old ntfs driver. How big is the
> diff between old ntfs and new ntfsplus driver? If the code is still
> same, maybe it would be better to call it ntfs as before and construct
> commits in a way which will first "revert the old ntfs driver" and then
> apply your changes on top of it (like write feature, etc..)?
I thought this patch set was better because a lot of code clean-up was
done, resulting in a large diff, and the old ntfs driver has already
been removed. I would like to proceed with the current set of patches
rather than restructuring the patchset again.
>
> For mount options, for example I see that new driver does not use
> de-facto standard iocharset= mount option like all other fs driver but
> instead has nls= mount option. This should be fixed.
Okay, I will fix it on the next version.
>
> Pali
Thank you for your review:)
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-21 0:17 ` Bagas Sanjaya
@ 2025-10-21 1:55 ` Namjae Jeon
2025-10-26 5:37 ` Bagas Sanjaya
0 siblings, 1 reply; 18+ messages in thread
From: Namjae Jeon @ 2025-10-21 1:55 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Tue, Oct 21, 2025 at 9:17 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 11:07:38AM +0900, Namjae Jeon wrote:
> > Introduction
> > ============
>
> Can you write the documentation at least in
> Documentation/filesystems/ntfsplus.rst?
Okay, I will add it on the next version.
>
>
> > - Journaling support:
> > ntfs3 does not provide full journaling support. It only implement journal
> > replay[4], which in our testing did not function correctly. My next task
> > after upstreaming will be to add full journal support to ntfsplus.
>
> What's the plan for journaling? Mirroring the Windows implementation AFAIK?
Yes. It would be best to first obtain the NTFS journal specification,
and I'll try that.
>
> For the timeline: I guess you plan to submit journaling patches right after
> ntfsplus is merged (at least applied to the filesystem tree or direct PR to
> Linus), or would it be done for the subsequent release cycle (6.n+1)?
It will probably take about a year to implement and stabilize it.
>
> Regarding stability: As it is a new filesystem, shouldn't it be marked
> experimental (and be stabilized for a few cycles) first?
I heard from Christian's email that he was considering adding fs/staging
trees. In my opinion, it would be a good idea to promote ntfsplus after
it has been tested there for a few cycles. An experimental mark is also
possible.
>
> Thanks.
Thanks!
>
> --
> An old man doll... just what I always wanted! - Clara
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-21 1:49 ` Namjae Jeon
@ 2025-10-21 22:19 ` Pali Rohár
2025-10-22 2:13 ` Namjae Jeon
0 siblings, 1 reply; 18+ messages in thread
From: Pali Rohár @ 2025-10-21 22:19 UTC (permalink / raw)
To: Namjae Jeon
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Tuesday 21 October 2025 10:49:48 Namjae Jeon wrote:
> On Tue, Oct 21, 2025 at 3:33 AM Pali Rohár <pali@kernel.org> wrote:
> >
> > Hello,
> Hi Pali,
> >
> > Do you have a plan, what should be the future of the NTFS support in
> > Linux? Because basically this is a third NTFS driver in recent years
> > and I think it is not a good idea to replace NTFS driver every decade by
> > a new different implementation.
> Our product is currently using ntfsplus without any issues, but we plan to
> provide support for the various issues that are reported from users or
> developers once it is merged into the mainline kernel.
> This is very basic, but the current ntfs3 has not provided this support
> for the last four years.
> After ntfsplus was merged, our next step will be to implement full journal
> support. Our ultimate goal is to provide stable NTFS support in Linux,
> utilities support included fsck(ntfsprogs-plus) and journaling.
One important thing here is that all of those drivers implement support
for the same filesystem, so theoretically they should be equivalent
(modulo bugs and missing features).
So basically the userspace ntfs fs utils should work with any of those
drivers, should also be compatible with the Windows ntfs.sys driver,
and should therefore be independent of the kernel driver used.
It would be really nice to have a working fsck utility for ntfs. I hope
we will not end up with 3 ntfs mkfs/fsck tools from 3 different
projects, each with a different set of bugs or limitations.
> >
> > Is this new driver going to replace existing ntfs3 driver? Or should it
> > live side-by-side together with ntfs3?
> Currently, it is the latter. I think the two drivers should compete.
> A ntfs driver that users can reliably use for ntfs in their
> products is what should be the one that remains.
> Four years ago, ntfs3 promised to soon release the full journal and
> public utilities support that were in their commercial version.
> That promise hasn't been kept yet, Probably, It would not be easy for
> a company that sells a ntfs driver commercially to open some or all sources.
> That's why I think we need at least competition.
I understand. It is not really easy.
Also, the same thing can happen with your new ntfsplus. Nobody knows
what will happen in the next one or two years.
> >
> > If this new driver is going to replace ntfs3 then it should provide same
> > API/ABI to userspace. For this case at least same/compatible mount
> > options, ioctl interface and/or attribute features (not sure what is
> > already supported).
> Sure, If ntfsplus replace ntfs3, it will support them.
> >
> > You wrote that ntfsplus is based on the old ntfs driver. How big is the
> > diff between old ntfs and new ntfsplus driver? If the code is still
> > same, maybe it would be better to call it ntfs as before and construct
> > commits in a way which will first "revert the old ntfs driver" and then
> > apply your changes on top of it (like write feature, etc..)?
> I thought this patch-set was better because a lot of code clean-up
> was done, resulting in a large diff, and the old ntfs was removed.
> I would like to proceed with the current set of patches rather than
> restructuring the patchset again.
Sure. In its current form it looks more readable and easier to review.
But I think more developers could be curious how similar the new
ntfsplus is to the old removed ntfs. In the form of revert + changes it
is easier to see what was changed, what was fixed, and what was newly
developed.
I'm just thinking: if the code really has a lot of common parts, maybe
it would make sense to have it in git in that "big revert + new
changes" form?
> >
> > For mount options, for example I see that new driver does not use
> > de-facto standard iocharset= mount option like all other fs driver but
> > instead has nls= mount option. This should be fixed.
> Okay, I will fix it on the next version.
> >
> > Pali
> Thank you for your review:)
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-21 22:19 ` Pali Rohár
@ 2025-10-22 2:13 ` Namjae Jeon
2025-10-22 18:52 ` Pali Rohár
0 siblings, 1 reply; 18+ messages in thread
From: Namjae Jeon @ 2025-10-22 2:13 UTC (permalink / raw)
To: Pali Rohár
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Wed, Oct 22, 2025 at 7:19 AM Pali Rohár <pali@kernel.org> wrote:
>
> On Tuesday 21 October 2025 10:49:48 Namjae Jeon wrote:
> > On Tue, Oct 21, 2025 at 3:33 AM Pali Rohár <pali@kernel.org> wrote:
> > >
> > > Hello,
> > Hi Pali,
> > >
> > > Do you have a plan, what should be the future of the NTFS support in
> > > Linux? Because basically this is a third NTFS driver in recent years
> > > and I think it is not a good idea to replace NTFS driver every decade by
> > > a new different implementation.
> > Our product is currently using ntfsplus without any issues, but we plan to
> > provide support for the various issues that are reported from users or
> > developers once it is merged into the mainline kernel.
> > This is very basic, but the current ntfs3 has not provided this support
> > for the last four years.
> > After ntfsplus was merged, our next step will be to implement full journal
> > support. Our ultimate goal is to provide stable NTFS support in Linux,
> > utilities support included fsck(ntfsprogs-plus) and journaling.
>
> One important thing here is that all those drivers are implementing
> support for same filesystem. So theoretically they should be equivalent
> (modulo bugs and missing features).
>
> So basically the userspace ntfs fs utils should work with any of those
> drivers and also should be compatible with Windows ntfs.sys driver.
> And therefore independent of the used kernel driver.
>
> It would be really nice to have working fsck utility for ntfs. I hope
> that we would not have 3 ntfs mkfs/fsck tools from 3 different project
> and every one would have different set of bugs or limitations.
>
> > >
> > > Is this new driver going to replace existing ntfs3 driver? Or should it
> > > live side-by-side together with ntfs3?
> > Currently, it is the latter. I think the two drivers should compete.
> > A ntfs driver that users can reliably use for ntfs in their
> > products is what should be the one that remains.
> > Four years ago, ntfs3 promised to soon release the full journal and
> > public utilities support that were in their commercial version.
> > That promise hasn't been kept yet, Probably, It would not be easy for
> > a company that sells a ntfs driver commercially to open some or all sources.
> > That's why I think we need at least competition.
>
> I understand it. It is not really easy.
>
> Also same thing can happen with your new ntfsplus. Nobody knows what
> would happen in next one or two years.
Since I publicly mentioned adding write support to the ntfs driver, I
have devoted a great deal of time and effort to fulfilling that goal
while working on other tasks in parallel. Your comment seems to
undermine all the effort I have put in over the years.
>
> > >
> > > If this new driver is going to replace ntfs3 then it should provide same
> > > API/ABI to userspace. For this case at least same/compatible mount
> > > options, ioctl interface and/or attribute features (not sure what is
> > > already supported).
> > Sure, If ntfsplus replace ntfs3, it will support them.
> > >
> > > You wrote that ntfsplus is based on the old ntfs driver. How big is the
> > > diff between old ntfs and new ntfsplus driver? If the code is still
> > > same, maybe it would be better to call it ntfs as before and construct
> > > commits in a way which will first "revert the old ntfs driver" and then
> > > apply your changes on top of it (like write feature, etc..)?
> > I thought this patch-set was better because a lot of code clean-up
> > was done, resulting in a large diff, and the old ntfs was removed.
> > I would like to proceed with the current set of patches rather than
> > restructuring the patchset again.
>
> Sure. In the current form it looks to be more readable and easier for
> review.
>
> But I think that more developers could be curious how similar is the new
> ntfsplus to the old removed ntfs. And in the form of revert + changes it
> is easier to see what was changed, what was fixed and what new developed.
>
> I'm just thinking, if the code has really lot of common parts, maybe it
> would make sense to have it in git in that "big revert + new changes"
> form?
>
> > >
> > > For mount options, for example I see that new driver does not use
> > > de-facto standard iocharset= mount option like all other fs driver but
> > > instead has nls= mount option. This should be fixed.
> > Okay, I will fix it on the next version.
> > >
> > > Pali
> > Thank you for your review:)
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-20 2:07 [PATCH 00/11] ntfsplus: ntfs filesystem remake Namjae Jeon
` (6 preceding siblings ...)
2025-10-21 0:17 ` Bagas Sanjaya
@ 2025-10-22 6:30 ` David Sterba
2025-10-22 8:33 ` Namjae Jeon
2025-10-22 18:57 ` Pali Rohár
7 siblings, 2 replies; 18+ messages in thread
From: David Sterba @ 2025-10-22 6:30 UTC (permalink / raw)
To: Namjae Jeon
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Mon, Oct 20, 2025 at 11:07:38AM +0900, Namjae Jeon wrote:
> The feature comparison summary
> ==============================
>
> Feature ntfsplus ntfs3
> =================================== ======== ===========
> Write support Yes Yes
> iomap support Yes No
> No buffer head Yes No
> Public utilities(mkfs, fsck, etc.) Yes No
> xfstests passed 287 218
> Idmapped mount Yes No
> Delayed allocation Yes No
> Bonnie++ Pass Fail
> Journaling Planned Inoperative
> =================================== ======== ===========
Having two implementations of the same filesystem is problematic, but I
think what votes for ntfs+ is that it uses the current internal
interfaces, like iomap, and no buffer heads. I'm not familiar with
recent ntfs3 development, but it would be good to know if the API
conversions are planned at all.
There are many filesystems using the old interfaces, and I think most
of them will stay like that. The config options BUFFER_HEAD and
FS_IOMAP make the distinction about what people care about most. In the
case of ntfs it's clearly interoperability.
As a user I'd be interested in feature parity with ntfs3, e.g. I don't
see the label ioctls supported, but that's a minor thing. Ideally there
would be one full-featured implementation, but I take it that it may
not be feasible to update ntfs3 so that it's equivalent to ntfs+. As
this is not a native linux filesystem, swapping the implementation can
be fairly transparent, depending only on the config options. The
drawback is losing the history of fixed bugs that may show up again.
We could do the same as when ntfs3 appeared, but back then it had an
arguably better position, as it brought full write support. Right now I
understand it more as a maintenance problem.
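
The BUFFER_HEAD / FS_IOMAP distinction is visible directly in Kconfig: a driver built on the new interfaces selects FS_IOMAP and never depends on BUFFER_HEAD. A hypothetical sketch of what the ntfsplus entry could look like (symbol name and help text are illustrative, not taken from the patch set):

```kconfig
config NTFSPLUS_FS
	tristate "NTFS Plus file system support"
	select FS_IOMAP
	select NLS
	help
	  Read/write NTFS driver built on the iomap infrastructure,
	  without buffer heads.
```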
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-22 6:30 ` David Sterba
@ 2025-10-22 8:33 ` Namjae Jeon
2025-10-22 18:57 ` Pali Rohár
1 sibling, 0 replies; 18+ messages in thread
From: Namjae Jeon @ 2025-10-22 8:33 UTC (permalink / raw)
To: dsterba
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Wed, Oct 22, 2025 at 3:31 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Oct 20, 2025 at 11:07:38AM +0900, Namjae Jeon wrote:
> > The feature comparison summary
> > ==============================
> >
> > Feature ntfsplus ntfs3
> > =================================== ======== ===========
> > Write support Yes Yes
> > iomap support Yes No
> > No buffer head Yes No
> > Public utilities(mkfs, fsck, etc.) Yes No
> > xfstests passed 287 218
> > Idmapped mount Yes No
> > Delayed allocation Yes No
> > Bonnie++ Pass Fail
> > Journaling Planned Inoperative
> > =================================== ======== ===========
>
> Having two implementations of the same is problematic but I think what
> votes for ntfs+ is that it's using the current internal interfaces like
> iomap and no buffer heads. I'm not familiar with recent ntfs3
> development but it would be good to know if the API conversions are
> planned at all.
>
> There are many filesystems using the old interfaces and I think most of
> them will stay like that. The config options BUFFER_HEAD and FS_IOMAP
> make the distinction what people care about most. In case of ntfs it's
> clearly for interoperability.
>
> As a user I'd be interested in feature parity with ntfs3, eg. I don't
> see the label ioctls supported but it's a minor thing.
I can confirm that we will achieve full feature parity with ntfs3,
including label ioctl support, in the next version.
Thanks for your feedback!
> Ideally there's
> one full featured implementation but I take it that it may not be
> feasible to update ntfs3 so it's equivalent to ntfs+. As this is not a
> native linux filesystem swapping the implementation can be fairly
> transparent, depending only on the config options. The drawback is
> losing the history of fixed bugs that may show up again.
>
> We could do the same as when ntfs3 appeared, but back then it had
> arguably better position as it brought full write support. Right now I
> understand it more of as maintenance problem.
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-22 2:13 ` Namjae Jeon
@ 2025-10-22 18:52 ` Pali Rohár
2025-10-22 22:32 ` Namjae Jeon
0 siblings, 1 reply; 18+ messages in thread
From: Pali Rohár @ 2025-10-22 18:52 UTC (permalink / raw)
To: Namjae Jeon
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Wednesday 22 October 2025 11:13:50 Namjae Jeon wrote:
> On Wed, Oct 22, 2025 at 7:19 AM Pali Rohár <pali@kernel.org> wrote:
> >
> > On Tuesday 21 October 2025 10:49:48 Namjae Jeon wrote:
> > > On Tue, Oct 21, 2025 at 3:33 AM Pali Rohár <pali@kernel.org> wrote:
> > > >
> > > > Hello,
> > > Hi Pali,
> > > >
> > > > Do you have a plan, what should be the future of the NTFS support in
> > > > Linux? Because basically this is a third NTFS driver in recent years
> > > > and I think it is not a good idea to replace NTFS driver every decade by
> > > > a new different implementation.
> > > Our product is currently using ntfsplus without any issues, but we plan to
> > > provide support for the various issues that are reported from users or
> > > developers once it is merged into the mainline kernel.
> > > This is very basic, but the current ntfs3 has not provided this support
> > > for the last four years.
> > > After ntfsplus was merged, our next step will be to implement full journal
> > > support. Our ultimate goal is to provide stable NTFS support in Linux,
> > > utilities support included fsck(ntfsprogs-plus) and journaling.
> >
> > One important thing here is that all those drivers are implementing
> > support for same filesystem. So theoretically they should be equivalent
> > (modulo bugs and missing features).
> >
> > So basically the userspace ntfs fs utils should work with any of those
> > drivers and also should be compatible with Windows ntfs.sys driver.
> > And therefore independent of the used kernel driver.
> >
> > It would be really nice to have working fsck utility for ntfs. I hope
> > that we would not have 3 ntfs mkfs/fsck tools from 3 different project
> > and every one would have different set of bugs or limitations.
> >
> > > >
> > > > Is this new driver going to replace existing ntfs3 driver? Or should it
> > > > live side-by-side together with ntfs3?
> > > Currently, it is the latter. I think the two drivers should compete.
> > > A ntfs driver that users can reliably use for ntfs in their
> > > products is what should be the one that remains.
> > > Four years ago, ntfs3 promised to soon release the full journal and
> > > public utilities support that were in their commercial version.
> > > That promise hasn't been kept yet, Probably, It would not be easy for
> > > a company that sells a ntfs driver commercially to open some or all sources.
> > > That's why I think we need at least competition.
> >
> > I understand it. It is not really easy.
> >
> > Also same thing can happen with your new ntfsplus. Nobody knows what
> > would happen in next one or two years.
> Since I publicly mentioned adding write support to ntfs driver, I have devoted
> a great deal of time and effort to fulfilling that while working on other tasks
> in parallel. Your comment seems to undermine all the effort I have done
> over the years.
I'm really sorry, I did not mean it that way. I just wanted to point
out that a year is a very long period and unexpected things could
happen. Nothing against your or anyone else's effort.
> >
> > > >
> > > > If this new driver is going to replace ntfs3 then it should provide same
> > > > API/ABI to userspace. For this case at least same/compatible mount
> > > > options, ioctl interface and/or attribute features (not sure what is
> > > > already supported).
> > > Sure, If ntfsplus replace ntfs3, it will support them.
> > > >
> > > > You wrote that ntfsplus is based on the old ntfs driver. How big is the
> > > > diff between old ntfs and new ntfsplus driver? If the code is still
> > > > same, maybe it would be better to call it ntfs as before and construct
> > > > commits in a way which will first "revert the old ntfs driver" and then
> > > > apply your changes on top of it (like write feature, etc..)?
> > > I thought this patch-set was better because a lot of code clean-up
> > > was done, resulting in a large diff, and the old ntfs was removed.
> > > I would like to proceed with the current set of patches rather than
> > > restructuring the patchset again.
> >
> > Sure. In the current form it looks to be more readable and easier for
> > review.
> >
> > But I think that more developers could be curious how similar is the new
> > ntfsplus to the old removed ntfs. And in the form of revert + changes it
> > is easier to see what was changed, what was fixed and what new developed.
> >
> > I'm just thinking, if the code has really lot of common parts, maybe it
> > would make sense to have it in git in that "big revert + new changes"
> > form?
> >
> > > >
> > > > For mount options, for example I see that new driver does not use
> > > > de-facto standard iocharset= mount option like all other fs driver but
> > > > instead has nls= mount option. This should be fixed.
> > > Okay, I will fix it on the next version.
> > > >
> > > > Pali
> > > Thank you for your review:)
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-22 6:30 ` David Sterba
2025-10-22 8:33 ` Namjae Jeon
@ 2025-10-22 18:57 ` Pali Rohár
1 sibling, 0 replies; 18+ messages in thread
From: Pali Rohár @ 2025-10-22 18:57 UTC (permalink / raw)
To: David Sterba
Cc: Namjae Jeon, viro, brauner, hch, hch, tytso, willy, jack, djwong,
josef, sandeen, rgoldwyn, xiang, dsterba, ebiggers, neil,
amir73il, linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee,
jay.sim, gunho.lee
On Wednesday 22 October 2025 08:30:56 David Sterba wrote:
> On Mon, Oct 20, 2025 at 11:07:38AM +0900, Namjae Jeon wrote:
> > The feature comparison summary
> > ==============================
> >
> > Feature ntfsplus ntfs3
> > =================================== ======== ===========
> > Write support Yes Yes
> > iomap support Yes No
> > No buffer head Yes No
> > Public utilities(mkfs, fsck, etc.) Yes No
> > xfstests passed 287 218
> > Idmapped mount Yes No
> > Delayed allocation Yes No
> > Bonnie++ Pass Fail
> > Journaling Planned Inoperative
> > =================================== ======== ===========
>
> Having two implementations of the same is problematic but I think what
> votes for ntfs+ is that it's using the current internal interfaces like
> iomap and no buffer heads. I'm not familiar with recent ntfs3
> development but it would be good to know if the API conversions are
> planned at all.
>
> There are many filesystems using the old interfaces and I think most of
> them will stay like that. The config options BUFFER_HEAD and FS_IOMAP
> make the distinction what people care about most. In case of ntfs it's
> clearly for interoperability.
>
> As a user I'd be interested in feature parity with ntfs3, eg. I don't
> see the label ioctls supported but it's a minor thing. Ideally there's
> one full featured implementation but I take it that it may not be
> feasible to update ntfs3 so it's equivalent to ntfs+. As this is not a
> native linux filesystem swapping the implementation can be fairly
> transparent, depending only on the config options. The drawback is
> losing the history of fixed bugs that may show up again.
This drawback already happened at the time of the switch from the old
ntfs driver to ntfs3, so I think this is not a problem.
> We could do the same as when ntfs3 appeared, but back then it had
> arguably better position as it brought full write support. Right now I
> understand it more as a maintenance problem.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-22 18:52 ` Pali Rohár
@ 2025-10-22 22:32 ` Namjae Jeon
0 siblings, 0 replies; 18+ messages in thread
From: Namjae Jeon @ 2025-10-22 22:32 UTC (permalink / raw)
To: Pali Rohár
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Thu, Oct 23, 2025 at 3:52 AM Pali Rohár <pali@kernel.org> wrote:
>
> On Wednesday 22 October 2025 11:13:50 Namjae Jeon wrote:
> > On Wed, Oct 22, 2025 at 7:19 AM Pali Rohár <pali@kernel.org> wrote:
> > >
> > > On Tuesday 21 October 2025 10:49:48 Namjae Jeon wrote:
> > > > On Tue, Oct 21, 2025 at 3:33 AM Pali Rohár <pali@kernel.org> wrote:
> > > > >
> > > > > Hello,
> > > > Hi Pali,
> > > > >
> > > > > Do you have a plan, what should be the future of the NTFS support in
> > > > > Linux? Because basically this is a third NTFS driver in recent years
> > > > > and I think it is not a good idea to replace NTFS driver every decade by
> > > > > a new different implementation.
> > > > Our product is currently using ntfsplus without any issues, but we
> > > > plan to provide support for the various issues reported by users or
> > > > developers once it is merged into the mainline kernel. This is very
> > > > basic, but the current ntfs3 has not provided this support for the
> > > > last four years.
> > > > After ntfsplus is merged, our next step will be to implement full
> > > > journal support. Our ultimate goal is to provide stable NTFS support
> > > > in Linux, including utility support with fsck (ntfsprogs-plus) and
> > > > journaling.
> > >
> > > One important thing here is that all those drivers implement support
> > > for the same filesystem. So theoretically they should be equivalent
> > > (modulo bugs and missing features).
> > >
> > > So basically the userspace ntfs fs utils should work with any of those
> > > drivers and also should be compatible with Windows ntfs.sys driver.
> > > And therefore independent of the used kernel driver.
> > >
> > > It would be really nice to have a working fsck utility for ntfs. I
> > > hope we will not end up with 3 ntfs mkfs/fsck tools from 3 different
> > > projects, each with a different set of bugs or limitations.
> > >
> > > > >
> > > > > Is this new driver going to replace existing ntfs3 driver? Or should it
> > > > > live side-by-side together with ntfs3?
> > > > Currently, it is the latter. I think the two drivers should compete.
> > > > The ntfs driver that users can reliably use for ntfs in their
> > > > products should be the one that remains.
> > > > Four years ago, ntfs3 promised to soon release the full journal and
> > > > public utilities support that were in their commercial version.
> > > > That promise hasn't been kept yet. It would probably not be easy for
> > > > a company that sells an ntfs driver commercially to open some or all
> > > > of its sources. That's why I think we need at least competition.
> > >
> > > I understand it. It is not really easy.
> > >
> > > Also same thing can happen with your new ntfsplus. Nobody knows what
> > > would happen in next one or two years.
> > Since I publicly mentioned adding write support to the ntfs driver, I
> > have devoted a great deal of time and effort to fulfilling that while
> > working on other tasks in parallel. Your comment seems to undermine all
> > the effort I have put in over the years.
>
> I'm really sorry, I did not mean it that way. I just wanted to point out
> that a year is a very long period and unexpected things can happen.
> Nothing against your or anyone else's effort.
I apologize for the misunderstanding. Thank you for clarifying that for me.
>
> > >
> > > > >
> > > > > If this new driver is going to replace ntfs3 then it should provide same
> > > > > API/ABI to userspace. For this case at least same/compatible mount
> > > > > options, ioctl interface and/or attribute features (not sure what is
> > > > > already supported).
> > > > Sure. If ntfsplus replaces ntfs3, it will support them.
> > > > >
> > > > > You wrote that ntfsplus is based on the old ntfs driver. How big is
> > > > > the diff between the old ntfs and the new ntfsplus driver? If the
> > > > > code is still largely the same, maybe it would be better to call it
> > > > > ntfs as before and construct the commits in a way that first reverts
> > > > > the removal of the old ntfs driver and then applies your changes on
> > > > > top of it (like the write feature, etc.)?
> > > > I thought this patch set was better because a lot of code clean-up
> > > > was done, resulting in a large diff, and the old ntfs was already
> > > > removed. I would like to proceed with the current set of patches
> > > > rather than restructuring the patchset again.
> > >
> > > Sure. In the current form it looks to be more readable and easier for
> > > review.
> > >
> > > But I think more developers could be curious how similar the new
> > > ntfsplus is to the old removed ntfs. And in the form of revert +
> > > changes it is easier to see what was changed, what was fixed and what
> > > was newly developed.
> > >
> > > I'm just thinking, if the code really has a lot of common parts, maybe
> > > it would make sense to have it in git in that "big revert + new
> > > changes" form?
> > >
> > > > >
> > > > > For mount options, for example I see that the new driver does not
> > > > > use the de-facto standard iocharset= mount option like all other fs
> > > > > drivers, but instead has an nls= mount option. This should be fixed.
> > > > Okay, I will fix it on the next version.
> > > > >
> > > > > Pali
> > > > Thank you for your review:)
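[For context on the convention discussed above: iocharset= is the option name
other NLS-aware filesystems (fat, exfat, ntfs3) use to select the character
set for on-disk names. A purely illustrative sketch of the two spellings;
the device path and mount point are placeholders, and the iocharset= form
reflects the rename agreed above, not the posted patches:]

```shell
# De-facto standard spelling shared with fat/exfat/ntfs3:
mount -t ntfsplus -o iocharset=utf8 /dev/sdb1 /mnt/win

# Spelling in the posted patch set, to be renamed in the next version:
mount -t ntfsplus -o nls=utf8 /dev/sdb1 /mnt/win
```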
* Re: [PATCH 00/11] ntfsplus: ntfs filesystem remake
2025-10-21 1:55 ` Namjae Jeon
@ 2025-10-26 5:37 ` Bagas Sanjaya
0 siblings, 0 replies; 18+ messages in thread
From: Bagas Sanjaya @ 2025-10-26 5:37 UTC (permalink / raw)
To: Namjae Jeon
Cc: viro, brauner, hch, hch, tytso, willy, jack, djwong, josef,
sandeen, rgoldwyn, xiang, dsterba, pali, ebiggers, neil, amir73il,
linux-fsdevel, linux-kernel, iamjoonsoo.kim, cheol.lee, jay.sim,
gunho.lee
On Tue, Oct 21, 2025 at 10:55:15AM +0900, Namjae Jeon wrote:
> On Tue, Oct 21, 2025 at 9:17 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
> >
> > On Mon, Oct 20, 2025 at 11:07:38AM +0900, Namjae Jeon wrote:
> > > Introduction
> > > ============
> >
> > Can you write the documentation at least in
> > Documentation/filesystems/ntfsplus.rst?
> Okay, I will add it on the next version.
> >
> >
> > > - Journaling support:
> > > ntfs3 does not provide full journaling support. It only implements journal
> > > replay[4], which in our testing did not function correctly. My next task
> > > after upstreaming will be to add full journal support to ntfsplus.
> >
> > What's the plan for journaling? Mirroring the Windows implementation AFAIK?
> Yes. It would be best to first obtain the NTFS journal specification,
> and I'll try that.
> >
> > For the timeline: I guess you plan to submit journaling patches right after
> > ntfsplus is merged (at least applied to the filesystem tree or direct PR to
> > Linus), or would it be done for the subsequent release cycle (6.n+1)?
> It will probably take about a year to implement and stabilize it.
I didn't understand. You mean ntfsplus will be a non-journaling fs for a
while after upstreaming, right?
>
> >
> > Regarding stability: As it is a new filesystem, shouldn't it be marked
> > experimental (and be stabilized for a few cycles) first?
> I heard from Christian's email that he was considering adding fs/staging
> trees. In my opinion, it would be a good idea to promote ntfsplus after
> it's been tested there for a few cycles. And an experimental mark is also
> possible.
Ack.
Thanks.
--
An old man doll... just what I always wanted! - Clara
end of thread, other threads:[~2025-10-26 5:37 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-20 2:07 [PATCH 00/11] ntfsplus: ntfs filesystem remake Namjae Jeon
2025-10-20 2:07 ` [PATCH 01/11] ntfsplus: in-memory, on-disk structures and headers Namjae Jeon
2025-10-20 2:07 ` [PATCH 02/11] ntfsplus: add super block operations Namjae Jeon
2025-10-20 2:07 ` [PATCH 03/11] ntfsplus: add inode operations Namjae Jeon
2025-10-20 2:07 ` [PATCH 04/11] ntfsplus: add directory operations Namjae Jeon
2025-10-20 2:07 ` [PATCH 05/11] ntfsplus: add file operations Namjae Jeon
2025-10-20 18:33 ` [PATCH 00/11] ntfsplus: ntfs filesystem remake Pali Rohár
2025-10-21 1:49 ` Namjae Jeon
2025-10-21 22:19 ` Pali Rohár
2025-10-22 2:13 ` Namjae Jeon
2025-10-22 18:52 ` Pali Rohár
2025-10-22 22:32 ` Namjae Jeon
2025-10-21 0:17 ` Bagas Sanjaya
2025-10-21 1:55 ` Namjae Jeon
2025-10-26 5:37 ` Bagas Sanjaya
2025-10-22 6:30 ` David Sterba
2025-10-22 8:33 ` Namjae Jeon
2025-10-22 18:57 ` Pali Rohár
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).