From: ezemtsov@google.com
To: linux-fsdevel@vger.kernel.org
Cc: tytso@mit.edu, Eugene Zemtsov <ezemtsov@google.com>
Subject: [PATCH 1/6] incfs: Add first files of incrementalfs
Date: Wed, 1 May 2019 21:03:26 -0700 [thread overview]
Message-ID: <20190502040331.81196-2-ezemtsov@google.com> (raw)
In-Reply-To: <20190502040331.81196-1-ezemtsov@google.com>
From: Eugene Zemtsov <ezemtsov@google.com>
- fs/incfs dir
- Kconfig (CONFIG_INCREMENTAL_FS)
- Makefile
- Module and file system initialization and clean up code
- New MAINTAINERS entry
- Add incrementalfs.h UAPI header
- Register ioctl range in ioctl-numbers.txt
- Documentation
Signed-off-by: Eugene Zemtsov <ezemtsov@google.com>
---
Documentation/filesystems/incrementalfs.rst | 452 ++++++++++++++++++++
Documentation/ioctl/ioctl-number.txt | 1 +
MAINTAINERS | 7 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/incfs/Kconfig | 10 +
fs/incfs/Makefile | 4 +
fs/incfs/main.c | 85 ++++
fs/incfs/vfs.c | 37 ++
include/uapi/linux/incrementalfs.h | 189 ++++++++
10 files changed, 787 insertions(+)
create mode 100644 Documentation/filesystems/incrementalfs.rst
create mode 100644 fs/incfs/Kconfig
create mode 100644 fs/incfs/Makefile
create mode 100644 fs/incfs/main.c
create mode 100644 fs/incfs/vfs.c
create mode 100644 include/uapi/linux/incrementalfs.h
diff --git a/Documentation/filesystems/incrementalfs.rst b/Documentation/filesystems/incrementalfs.rst
new file mode 100644
index 000000000000..682e3dcb6b5a
--- /dev/null
+++ b/Documentation/filesystems/incrementalfs.rst
@@ -0,0 +1,452 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Incremental File System
+=======================
+
+Overview
+========
+Incremental FS is special-purpose Linux virtual file system that allows
+execution of a program while its binary and resource files are still being
+lazily downloaded over the network, USB etc. It is focused on incremental
+delivery for a small number (under 100) of big files (more than 10 megabytes).
+Incremental FS doesn’t allow direct writes into files and, once loaded, file
+content never changes. Incremental FS doesn’t use a block device, instead it
+saves data into a backing file located on a regular file-system.
+
+But why?
+--------
+To allow running **big** Android apps before their binaries and resources are
+fully downloaded to an Android device. If an app reads something not loaded yet,
+it needs to wait for the data block to be fetched, but in most cases hot blocks
+can be loaded in advance.
+
+Workflow
+--------
+A userspace process, called a data loader, mounts an instance of incremental-fs
+giving it a file descriptor on an underlying file system (like ext4 or f2fs).
+Incremental-fs reads content (if any) of this backing file and interprets it as
+a file system image with files, directories and data blocks. At this point
+the data loader can declare new files to be shown by incremental-fs.
+
+A process is started from a binary located on incremental-fs.
+All reads are served directly from the backing file
+without roundtrips into userspace. If the process accesses a data block that was
+not originally present in the backing file, the read operation waits.
+
+Meanwhile the data loader can feed new data blocks to incremental-fs by calling
+write() on a special .cmd pseudo-file. The data loader can request information
+about pending reads by calling poll() and read() on the .cmd pseudo-file.
+This mechanism allows the data loader to serve most urgently needed data first.
+Once a data block is given to incremental-fs, it saves it to the backing file
+and unblocks all the reads waiting for this block.
+
+Eventually all data for all files is uploaded by the data loader, and saved by
+incremental-fs into the backing file. At that moment the data loader is not
+needed any longer. The backing file will play the role of a complete
+filesystem image for all future runs of the program.
+
+Non-goals
+---------
+* Allowing direct writes by the executing processes into files on incremental-fs
+* Allowing the data loader change file size or content after it was loaded.
+* Having more than a couple hundred files and directories.
+
+
+Features
+========
+
+Read-only, but not unchanging
+-----------------------------
+On the surface a mount directory of incremental-fs would look similar to
+a read-only instance of network file system: files and directories can be
+listed and read, but can’t be directly created or modified via creat() or
+write(). At the same time the data loader can make changes to a directory
+structure via external ioctl-s. i.e. link and unlink files and directories
+(if they empty). Data can't be changed this way, once a file block is loaded
+there is no way to change it.
+
+Filesystem image in a backing file
+----------------------------------
+Instead of using a block device, all data and metadata is stored in a
+backing file provided as a mount parameter. The backing file is located on
+an underlying file system (like ext4 or f2fs). Such approach is very similar
+to what might be achieved by using loopback device with a traditional file
+system, but it avoids extra set-up steps and indirections. It also allows
+incremental-fs image to dynamically grow as new files and data come without
+having to do any extra steps for resizing.
+
+If the backing file contains data at the moment when incremental-fs is mounted,
+content of the backing file is being interpreted as filesystem image.
+New files and data can still be added through the external interface,
+and they will be saved to the backing file.
+
+Data compression
+----------------
+Incremental-fs can store compressed data. In this case each 4KB data block is
+compressed separately. Data blocks can be provided to incremental-fs by
+the data loader in a compressed form. Incremental-fs uncompresses blocks
+each time a executing process reads it (modulo page cache). Compression also
+takes care of blocks composed of all zero bytes removing necessity to handle
+this case separately.
+
+Partially present files
+-----------------------
+Data in the files consists of 4KB blocks, each block can be present or absent.
+Unlike in sparse files, reading an absent block doesn’t return all zeros.
+It waits for the data block to be loaded via the ioctl interface
+(respecting a timeout). Once a data block is loaded it never disappears
+and can’t be changed or erased from a file. This ability to frictionlessly
+wait for temporary missing data is the main feature of incremental-fs.
+
+Hard links. Multiple names for the same file
+--------------------------------------------
+Like all traditional UNIX file systems, incremental-fs supports hard links,
+i.e. different file names in different directories can refer to the same file.
+As mentioned above new hard links can be created and removed via
+the ioctl interface, but actual data files are immutable, modulo partial
+data loading. Each directory can only have at most one name referencing it.
+
+Inspection of incremental-fs internal state
+-------------------------------------------
+poll() and read() on the .cmd pseudo-file allow data loaders to get a list of
+read operations stalled due to lack of a data block (pending reads).
+
+
+Application Programming Interface
+=================================
+
+Regular file system interface
+-----------------------------
+Executing process access files and directories via regular Linux file interface:
+open, read, close etc. All the intricacies of data loading a file representation
+are hidden from them.
+
+External .cmd file interface
+----------------------------
+When incremental-fs is mounted, a mount directory contains a pseudo-file
+called '.cmd'. The data loader will open this file and call read(), write(),
+poll() and ioctl() on it inspect and change state of incremental-fs.
+
+poll() and read() are used by the data loader to wait for pending reads to
+appear and obtain an array of ``struct incfs_pending_read_info``.
+
+write() is used by the data loader to feed new data blocks to incremental-fs.
+A data buffer given to write() is interpreted as an array of
+``struct incfs_new_data_block``. Structs in the array describe locations and
+properties of data blocks loaded with this write() call.
+
+``ioctl(INCFS_IOC_PROCESS_INSTRUCTION)`` is used to change structure of
+incremental-fs. It receives an pointer to ``struct incfs_instruction``
+where type field can have be one of the following values.
+
+**INCFS_INSTRUCTION_NEW_FILE**
+Creates an inode (a file or a directory) without a name.
+It assumes ``incfs_new_file_instruction.file`` is populated with details.
+
+**INCFS_INSTRUCTION_ADD_DIR_ENTRY**
+Creates a name (aka hardlink) for an inode in a directory.
+A directory can't have more than one hardlink pointing to it, but files can be
+linked from different directories.
+It assumes ``incfs_new_file_instruction.dir_entry`` is populated with details.
+
+**INCFS_INSTRUCTION_REMOVE_DIR_ENTRY**
+Remove a name (aka hardlink) for a file from a directory.
+Only empty directories can be unlinked.
+It assumes ``incfs_new_file_instruction.dir_entry`` is populated with details.
+
+For more details see in uapi/linux/incrementalfs.h and samples below.
+
+Supported mount options
+-----------------------
+See ``fs/incfs/options.c`` for more details.
+
+ * ``backing_fd=<unsigned int>``
+ Required. A file descriptor of a backing file opened by the process
+ calling mount(2). This descriptor can be closed after mount returns.
+
+ * ``read_timeout_msc=<unsigned int>``
+ Default: 1000. Timeout in milliseconds before a read operation fails
+ if no data found in the backing file or provided by the data loader.
+
+Sysfs files
+-----------
+``/sys/fs/incremental-fs/version`` - a current version of the filesystem.
+One ASCII encoded positive integer number with a new line at the end.
+
+
+Examples
+--------
+See ``sample_data_loader.c`` for a complete implementation of a data loader.
+
+Mount incremental-fs
+~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ int mount_fs(char *mount_dir, char *backing_file, int timeout_msc)
+ {
+ static const char fs_name[] = INCFS_NAME;
+ char mount_options[512];
+ int backing_fd;
+ int result;
+
+ backing_fd = open(backing_file, O_RDWR);
+ if (backing_fd == -1) {
+ perror("Error in opening backing file");
+ return 1;
+ }
+
+ snprintf(mount_options, ARRAY_SIZE(mount_options),
+ "backing_fd=%u,read_timeout_msc=%u", backing_fd, timeout_msc);
+
+ result = mount(fs_name, mount_dir, fs_name, 0, mount_options);
+ if (result != 0)
+ perror("Error mounting fs.");
+ return result;
+ }
+
+Open .cmd file
+~~~~~~~~~~~~~~
+
+::
+
+ int open_commands_file(char *mount_dir)
+ {
+ char cmd_file[255];
+ int cmd_fd;
+
+ snprintf(cmd_file, ARRAY_SIZE(cmd_file), "%s/.cmd", mount_dir);
+ cmd_fd = open(cmd_file, O_RDWR);
+ if (cmd_fd < 0)
+ perror("Can't open commands file");
+ return cmd_fd;
+ }
+
+Add a file to the file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ int create_file(int cmd_fd, char *filename, int *ino_out, size_t size)
+ {
+ int ret = 0;
+ __u16 ino = 0;
+ struct incfs_instruction inst = {
+ .version = INCFS_HEADER_VER,
+ .type = INCFS_INSTRUCTION_NEW_FILE,
+ .file = {
+ .size = size,
+ .mode = S_IFREG | 0555,
+ }
+ };
+
+ ret = ioctl(cmd_fd, INCFS_IOC_PROCESS_INSTRUCTION, &inst);
+ if (ret)
+ return -errno;
+
+ ino = inst.file.ino_out;
+ inst = (struct incfs_instruction){
+ .version = INCFS_HEADER_VER,
+ .type = INCFS_INSTRUCTION_ADD_DIR_ENTRY,
+ .dir_entry = {
+ .dir_ino = INCFS_ROOT_INODE,
+ .child_ino = ino,
+ .name = ptr_to_u64(filename),
+ .name_len = strlen(filename)
+ }
+ };
+ ret = ioctl(cmd_fd, INCFS_IOC_PROCESS_INSTRUCTION, &inst);
+ if (ret)
+ return -errno;
+ *ino_out = ino;
+ return 0;
+ }
+
+Load data into a file
+~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ int cmd_fd = open_commands_file(path_to_mount_dir);
+ char *data = get_some_data();
+ struct incfs_new_data_block block;
+ int err;
+
+ block.file_ino = file_ino;
+ block.block_index = 0;
+ block.compression = COMPRESSION_NONE;
+ block.data = (__u64)data;
+ block.data_len = INCFS_DATA_FILE_BLOCK_SIZE;
+
+ err = write(cmd_fd, &block, sizeof(block));
+
+
+Get an array of pending reads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ int poll_res = 0;
+ struct incfs_pending_read_info reads[10];
+ int cmd_fd = open_commands_file(path_to_mount_dir);
+ struct pollfd pollfd = {
+ .fd = cmd_fd,
+ .events = POLLIN
+ };
+
+ poll_res = poll(&pollfd, 1, timeout);
+ if (poll_res > 0 && (pollfd.revents | POLLIN)) {
+ ssize_t read_res = read(cmd_fd, reads, sizeof(reads));
+ if (read_res > 0)
+ printf("Waiting reads %ld\n", read_res / sizeof(reads[0]));
+ }
+
+
+
+Ondisk format
+=============
+
+General principles
+------------------
+* The backbone of the incremental-fs ondisk format is an append only linked
+ list of metadata blocks. Each metadata block contains an offset of the next
+ one. These blocks describe files and directories on the
+ file system. They also represent actions of adding and removing file names
+ (hard links).
+ Every time incremental-fs instance is mounted, it reads through this list
+ to recreate filesystem's state in memory. An offset of the first record in the
+ metadata list is stored in the superblock at the beginning of the backing
+ file.
+
+* Most of the backing file is taken by data areas and blockmaps.
+ Since data blocks can be compressed and have different sizes,
+ single per-file data area can't be pre-allocated. That's why blockmaps are
+ needed in order to find a location and size of each data block in
+ the backing file. Each time a file is created, a corresponding block map is
+ allocated to store future offsets of data blocks.
+
+ Whenever a data block is given by data loader to incremental-fs:
+ - A data area with the given block is appended to the end of
+ the backing file.
+ - A record in the blockmap for the given block index is updated to reflect
+ its location, size, and compression algorithm.
+
+Important format details
+------------------------
+Ondisk structures are defined in the ``format.h`` file. They are all packed
+and use little-endian order.
+A backing file must start with ``incfs_super_block`` with ``s_magic`` field
+equal to 0x5346434e49 "INCFS".
+
+Metadata records:
+
+* ``incfs_inode`` - metadata record to declare a file or a directory.
+ ``incfs_inode.i_mode`` determents if it is a file
+ or a directory.
+* ``incfs_blockmap_entry`` - metadata record that specifies size and location
+ of a blockmap area for a given file. This area
+ contains an array of ``incfs_blockmap_entry``-s.
+* ``incfs_dir_action`` - metadata record that specifies changes made to a
+ to a directory structure, e.g. add or remove a hardlink.
+* ``incfs_md_header`` - header of a metadata record. It's always a part
+ of other structures and served purpose of metadata
+ bookkeeping.
+
+Other ondisk structures:
+
+* ``incfs_super_block`` - backing file header
+* ``incfs_blockmap_entry`` - a record in a blockmap area that describes size
+ and location of a data block.
+* Data blocks dont have any particular structure, they are written to the backing
+ file in a raw form as they come from a data loader.
+
+
+Backing file layout
+-------------------
+::
+
+ +-------------------------------------------+
+ | incfs_super_block |]---+
+ +-------------------------------------------+ |
+ | metadata |<---+
+ | incfs_inode |]---+
+ +-------------------------------------------+ |
+ ......................... |
+ +-------------------------------------------+ | metadata
+ +------->| blockmap area | | list links
+ | | [incfs_blockmap_entry] | |
+ | | [incfs_blockmap_entry] | |
+ | | [incfs_blockmap_entry] | |
+ | +--[| [incfs_blockmap_entry] | |
+ | | | [incfs_blockmap_entry] | |
+ | | | [incfs_blockmap_entry] | |
+ | | +-------------------------------------------+ |
+ | | ......................... |
+ | | +-------------------------------------------+ |
+ | | | metadata |<---+
+ +----|--[| incfs_blockmap |]---+
+ | +-------------------------------------------+ |
+ | ......................... |
+ | +-------------------------------------------+ |
+ +-->| data block | |
+ +-------------------------------------------+ |
+ ......................... |
+ +-------------------------------------------+ |
+ | metadata |<---+
+ | incfs_dir_action |
+ +-------------------------------------------+
+
+Unreferenced files and absence of garbage collection
+----------------------------------------------------
+Described file format can produce files that don't have any names for them in
+any directories. Incremental-fs takes no steps to prevent such situations or
+reclaim space occupied by such files in the backing file. If garbage collection
+is needed it has to be implemented as a separate userspace tool.
+
+
+Design alternatives
+===================
+
+Why isn't incremental-fs implemented via FUSE?
+----------------------------------------------
+TLDR: FUSE-based filesystems add 20-80% of performance overhead for target
+scenarios, and increase power use on mobile beyond acceptable limit
+for widespread deployment. A custom kernel filesystem is the way to overcome
+these limitations.
+
+From the theoretical side of things, FUSE filesystem adds some overhead to
+each filesystem operation that’s not handled by OS page cache:
+
+ * When an IO request arrives to FUSE driver (D), it puts it into a queue
+ that runs on a separate kernel thread
+ * Then another separate user-mode handler process (H) has to run,
+ potentially after a context switch, to read the request from the queue.
+ Reading the request adds a kernel-user mode transition to the handling.
+ * (H) sends the IO request to kernel to handle it on some underlying storage
+ filesystem. This adds a user-kernel and kernel-user mode transition
+ pair to the handling.
+ * (H) then responds to the FUSE request via a write(2) call.
+ Writing the response is another user-kernel mode transition.
+ * (D) needs to read the response from (H) when its kernel thread runs
+ and forward it to the user
+
+Together, the scenario adds 2 extra user-kernel-user mode transition pairs,
+and potentially has up to 3 additional context switches for the FUSE kernel
+thread and the user-mode handler to start running for each IO request on the
+filesystem.
+This overhead can vary from unnoticeable to unmanageable, depending on the
+target scenario. But it will always burn extra power via CPU staying longer
+in non-idle state, handling context switches and mode transitions.
+One important goal for the new filesystem is to be able to handle each page
+read separately on demand, because we don't want to wait and download more data
+than absolutely necessary. Thus readahead would need to be disabled completely.
+This increases the number of separate IO requests and the FUSE related overhead
+by almost 32x (128KB readahead limit vs 4KB individual block operations)
+
+For more info see a 2017 USENIX research paper:
+To FUSE or Not to FUSE: Performance of User-Space File Systems
+Bharath Kumar Reddy Vangoor, Stony Brook University;
+Vasily Tarasov, IBM Research-Almaden;
+Erez Zadok, Stony Brook University
+https://www.usenix.org/system/files/conference/fast17/fast17-vangoor.pdf
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index c9558146ac58..a5f8e0eaff91 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -227,6 +227,7 @@ Code Seq#(hex) Include File Comments
'f' 00-0F fs/ocfs2/ocfs2_fs.h conflict!
'g' 00-0F linux/usb/gadgetfs.h
'g' 20-2F linux/usb/g_printer.h
+'g' 30-3F include/uapi/linux/incrementalfs.h
'h' 00-7F conflict! Charon filesystem
<mailto:zapman@interlan.net>
'h' 00-1F linux/hpet.h conflict!
diff --git a/MAINTAINERS b/MAINTAINERS
index 5c38f21aee78..c92ad89ee5e5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7630,6 +7630,13 @@ F: Documentation/hwmon/ina2xx
F: drivers/hwmon/ina2xx.c
F: include/linux/platform_data/ina2xx.h
+INCREMENTAL FILESYSTEM
+M: Eugene Zemtsov <ezemtsov@google.com>
+S: Supported
+F: fs/incfs/
+F: include/uapi/linux/incrementalfs.h
+F: Documentation/filesystems/incrementalfs.rst
+
INDUSTRY PACK SUBSYSTEM (IPACK)
M: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
M: Jens Taprogge <jens.taprogge@taprogge.org>
diff --git a/fs/Kconfig b/fs/Kconfig
index 3e6d3101f3ff..19f89c936209 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -119,6 +119,7 @@ source "fs/quota/Kconfig"
source "fs/autofs/Kconfig"
source "fs/fuse/Kconfig"
source "fs/overlayfs/Kconfig"
+source "fs/incfs/Kconfig"
menu "Caches"
diff --git a/fs/Makefile b/fs/Makefile
index 427fec226fae..08c6b827df1a 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_AUTOFS_FS) += autofs/
obj-$(CONFIG_ADFS_FS) += adfs/
obj-$(CONFIG_FUSE_FS) += fuse/
obj-$(CONFIG_OVERLAY_FS) += overlayfs/
+obj-$(CONFIG_INCREMENTAL_FS) += incfs/
obj-$(CONFIG_ORANGEFS_FS) += orangefs/
obj-$(CONFIG_UDF_FS) += udf/
obj-$(CONFIG_SUN_OPENPROMFS) += openpromfs/
diff --git a/fs/incfs/Kconfig b/fs/incfs/Kconfig
new file mode 100644
index 000000000000..a810131deed0
--- /dev/null
+++ b/fs/incfs/Kconfig
@@ -0,0 +1,10 @@
+config INCREMENTAL_FS
+ tristate "Incremental file system support"
+ depends on BLOCK && CRC32
+ help
+ Incremental FS is a read-only virtual file system that facilitates execution
+ of programs while their binaries are still being lazily downloaded over the
+ network, USB or pigeon post.
+
+ To compile this file system support as a module, choose M here: the
+ module will be called incrementalfs.
\ No newline at end of file
diff --git a/fs/incfs/Makefile b/fs/incfs/Makefile
new file mode 100644
index 000000000000..7892196c634f
--- /dev/null
+++ b/fs/incfs/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_INCREMENTAL_FS) += incrementalfs.o
+
+incrementalfs-y := main.o vfs.o
\ No newline at end of file
diff --git a/fs/incfs/main.c b/fs/incfs/main.c
new file mode 100644
index 000000000000..07e1952ede9e
--- /dev/null
+++ b/fs/incfs/main.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#define INCFS_CORE_VERSION 1
+
+extern struct file_system_type incfs_fs_type;
+
+static struct kobject *sysfs_root;
+
+static ssize_t version_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buff)
+{
+ return snprintf(buff, PAGE_SIZE, "%d\n", INCFS_CORE_VERSION);
+}
+
+static struct kobj_attribute version_attr = __ATTR_RO(version);
+
+static struct attribute *attributes[] = {
+ &version_attr.attr,
+ NULL,
+};
+
+static const struct attribute_group attr_group = {
+ .attrs = attributes,
+};
+
+static int __init init_sysfs(void)
+{
+ int res = 0;
+
+ sysfs_root = kobject_create_and_add(INCFS_NAME, fs_kobj);
+ if (!sysfs_root)
+ return -ENOMEM;
+
+ res = sysfs_create_group(sysfs_root, &attr_group);
+ if (res) {
+ kobject_put(sysfs_root);
+ sysfs_root = NULL;
+ }
+ return res;
+}
+
+static void cleanup_sysfs(void)
+{
+ if (sysfs_root) {
+ sysfs_remove_group(sysfs_root, &attr_group);
+ kobject_put(sysfs_root);
+ sysfs_root = NULL;
+ }
+}
+
+static int __init init_incfs_module(void)
+{
+ int err = 0;
+
+ err = init_sysfs();
+ if (err)
+ return err;
+
+ err = register_filesystem(&incfs_fs_type);
+ if (err)
+ cleanup_sysfs();
+
+ return err;
+}
+
+static void __exit cleanup_incfs_module(void)
+{
+ cleanup_sysfs();
+ unregister_filesystem(&incfs_fs_type);
+}
+
+module_init(init_incfs_module);
+module_exit(cleanup_incfs_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Eugene Zemtsov <ezemtsov@google.com>");
+MODULE_DESCRIPTION("Incremental File System");
diff --git a/fs/incfs/vfs.c b/fs/incfs/vfs.c
new file mode 100644
index 000000000000..2e71f0edf8a1
--- /dev/null
+++ b/fs/incfs/vfs.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/blkdev.h>
+#include <linux/fs.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+static struct dentry *mount_fs(struct file_system_type *type, int flags,
+ const char *dev_name, void *data);
+static void kill_sb(struct super_block *sb);
+
+struct file_system_type incfs_fs_type = {
+ .owner = THIS_MODULE,
+ .name = INCFS_NAME,
+ .mount = mount_fs,
+ .kill_sb = kill_sb,
+ .fs_flags = 0
+};
+
+static int fill_super_block(struct super_block *sb, void *data, int silent)
+{
+ return 0;
+}
+
+static struct dentry *mount_fs(struct file_system_type *type, int flags,
+ const char *dev_name, void *data)
+{
+ return mount_nodev(type, flags, data, fill_super_block);
+}
+
+static void kill_sb(struct super_block *sb)
+{
+ generic_shutdown_super(sb);
+}
+
diff --git a/include/uapi/linux/incrementalfs.h b/include/uapi/linux/incrementalfs.h
new file mode 100644
index 000000000000..5bcf66ac852b
--- /dev/null
+++ b/include/uapi/linux/incrementalfs.h
@@ -0,0 +1,189 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Userspace interface for Incremental FS.
+ *
+ * Incremental FS is special-purpose Linux virtual file system that allows
+ * execution of a program while its binary and resource files are still being
+ * lazily downloaded over the network, USB etc.
+ *
+ * Copyright 2019 Google LLC
+ */
+#ifndef _UAPI_LINUX_INCREMENTALFS_H
+#define _UAPI_LINUX_INCREMENTALFS_H
+
+#include <linux/limits.h>
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* ===== constants ===== */
+#define INCFS_NAME "incremental-fs"
+#define INCFS_MAGIC_NUMBER (0x5346434e49ul)
+#define INCFS_DATA_FILE_BLOCK_SIZE 4096
+#define INCFS_HEADER_VER 1
+
+#define INCFS_MAX_FILES 1000
+#define INCFS_COMMAND_INODE 1
+#define INCFS_ROOT_INODE 2
+
+#define INCFS_IOCTL_BASE_CODE 'g'
+
+/* ===== ioctl requests on command file ===== */
+
+/* Make changes to the file system via incfs instructions. */
+#define INCFS_IOC_PROCESS_INSTRUCTION \
+ _IOWR(INCFS_IOCTL_BASE_CODE, 30, struct incfs_instruction)
+
+enum incfs_compression_alg { COMPRESSION_NONE = 0, COMPRESSION_LZ4 = 1 };
+
+/*
+ * Description of a pending read. A pending read - a read call by
+ * a userspace program for which the filesystem currently doesn't have data.
+ *
+ * This structs can be read from .cmd file to obtain a set of reads which
+ * are currently pending.
+ */
+struct incfs_pending_read_info {
+ /* Inode number of a file that is being read from. */
+ __aligned_u64 file_ino;
+
+ /* Index of a file block that is being read. */
+ __u32 block_index;
+
+ /* A serial number of this pending read. */
+ __u32 serial_number;
+};
+
+/*
+ * A struct to be written into a .cmd file to provide a data block for a file.
+ */
+struct incfs_new_data_block {
+ /* Inode number of a file this block belongs to. */
+ __aligned_u64 file_ino;
+
+ /* Index of a data block. */
+ __u32 block_index;
+
+ /* Length of data */
+ __u32 data_len;
+
+ /*
+ * A pointer ot an actual data for the block.
+ *
+ * Equivalent to: __u8 *data;
+ */
+ __aligned_u64 data;
+
+ /*
+ * Compression algorithm used to compress the data block.
+ * Values from enum incfs_compression_alg.
+ */
+ __u32 compression;
+
+ __u32 reserved1;
+
+ __aligned_u64 reserved2;
+};
+
+enum incfs_instruction_type {
+ INCFS_INSTRUCTION_NOOP = 0,
+ INCFS_INSTRUCTION_NEW_FILE = 1,
+ INCFS_INSTRUCTION_ADD_DIR_ENTRY = 3,
+ INCFS_INSTRUCTION_REMOVE_DIR_ENTRY = 4,
+};
+
+/*
+ * Create a new file or directory.
+ * Corresponds to INCFS_INSTRUCTION_NEW_FILE
+ */
+struct incfs_new_file_instruction {
+ /*
+ * [Out param. Populated by the kernel after ioctl.]
+ * Inode number of a newly created file.
+ */
+ __aligned_u64 ino_out;
+
+ /*
+ * Total size of the new file. Ignored if S_ISDIR(mode).
+ */
+ __aligned_u64 size;
+
+ /*
+ * File mode. Permissions and dir flag.
+ */
+ __u16 mode;
+
+ __u16 reserved1;
+
+ __u32 reserved2;
+
+ __aligned_u64 reserved3;
+
+ __aligned_u64 reserved4;
+
+ __aligned_u64 reserved5;
+
+ __aligned_u64 reserved6;
+
+ __aligned_u64 reserved7;
+};
+
+/*
+ * Create or remove a name (aka hardlink) for a file in a directory.
+ * Corresponds to
+ * INCFS_INSTRUCTION_ADD_DIR_ENTRY,
+ * INCFS_INSTRUCTION_REMOVE_DIR_ENTRY
+ */
+struct incfs_dir_entry_instruction {
+ /* Inode number of a directory to add/remove a file to/from. */
+ __aligned_u64 dir_ino;
+
+ /* File to add/remove. */
+ __aligned_u64 child_ino;
+
+ /* Length of name field */
+ __u32 name_len;
+
+ __u32 reserved1;
+
+ /*
+ * A pointer to the name characters of a file to add/remove
+ *
+ * Equivalent to: char *name;
+ */
+ __aligned_u64 name;
+
+ __aligned_u64 reserved2;
+
+ __aligned_u64 reserved3;
+
+ __aligned_u64 reserved4;
+
+ __aligned_u64 reserved5;
+};
+
+/*
+ * An Incremental FS instruction is the way for userspace
+ * to
+ * - create files and directories
+ * - show and hide files in the directory structure
+ */
+struct incfs_instruction {
+ /* Populate with INCFS_HEADER_VER */
+ __u32 version;
+
+ /*
+ * Type - what this instruction actually does.
+ * Values from enum incfs_instruction_type.
+ */
+ __u32 type;
+
+ union {
+ struct incfs_new_file_instruction file;
+ struct incfs_dir_entry_instruction dir_entry;
+
+ /* Hard limit on the instruction body size in the future. */
+ __u8 reserved[64];
+ };
+};
+
+#endif /* _UAPI_LINUX_INCREMENTALFS_H */
--
2.21.0.593.g511ec345e18-goog
next prev parent reply other threads:[~2019-05-02 4:03 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-02 4:03 Initial patches for Incremental FS ezemtsov
2019-05-02 4:03 ` ezemtsov [this message]
2019-05-02 19:06 ` [PATCH 1/6] incfs: Add first files of incrementalfs Miklos Szeredi
2019-05-02 20:41 ` Randy Dunlap
2019-05-07 15:57 ` Jann Horn
2019-05-07 17:13 ` Greg KH
2019-05-07 17:18 ` Greg KH
2019-05-02 4:03 ` [PATCH 2/6] incfs: Backing file format ezemtsov
2019-05-02 4:03 ` [PATCH 3/6] incfs: Management of in-memory FS data structures ezemtsov
2019-05-02 4:03 ` [PATCH 4/6] incfs: Integration with VFS layer ezemtsov
2019-05-02 4:03 ` [PATCH 6/6] incfs: Integration tests for incremental-fs ezemtsov
2019-05-02 11:19 ` Initial patches for Incremental FS Amir Goldstein
2019-05-02 13:10 ` Theodore Ts'o
2019-05-02 13:26 ` Al Viro
2019-05-03 4:23 ` Eugene Zemtsov
2019-05-03 5:19 ` Amir Goldstein
2019-05-08 20:09 ` Eugene Zemtsov
2019-05-09 8:15 ` Amir Goldstein
[not found] ` <CAK8JDrEQnXTcCtAPkb+S4r4hORiKh_yX=0A0A=LYSVKUo_n4OA@mail.gmail.com>
2019-05-21 1:32 ` Yurii Zubrytskyi
2019-05-22 8:32 ` Miklos Szeredi
2019-05-22 17:25 ` Yurii Zubrytskyi
2019-05-23 4:25 ` Miklos Szeredi
2019-05-29 21:06 ` Yurii Zubrytskyi
2019-05-30 9:22 ` Miklos Szeredi
2019-05-30 22:45 ` Yurii Zubrytskyi
2019-05-31 9:02 ` Miklos Szeredi
2019-05-22 10:54 ` Amir Goldstein
2019-05-03 7:23 ` Richard Weinberger
2019-05-03 10:22 ` Miklos Szeredi
2019-05-02 13:46 ` Amir Goldstein
2019-05-02 18:16 ` Richard Weinberger
2019-05-02 18:33 ` Richard Weinberger
2019-05-02 13:47 ` J. R. Okajima
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190502040331.81196-2-ezemtsov@google.com \
--to=ezemtsov@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).