* [PATCH/RFC] s390: Hypervisor File System
@ 2006-04-21 11:35 Michael Holzheu
2006-04-21 11:53 ` Pekka Enberg
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Michael Holzheu @ 2006-04-21 11:35 UTC (permalink / raw)
To: linux-kernel; +Cc: schwidefsky, holzheu
From: holzheu@de.ibm.com
On zSeries machines there exists an interface which allows the operating
system to retrieve LPAR hypervisor accounting data. For example, it is
possible to get usage data for physical and virtual cpus. In order to
provide this information to user space programs, I implemented a new
virtual Linux file system named 'hypfs' using the Linux 2.6 libfs
framework. The name 'hypfs' stands for 'Hypervisor Filesystem'. All the
accounting information is put into different virtual files which can be
accessed from user space. All data is represented as ASCII strings.
When the file system is mounted the accounting information is retrieved
and a file system tree is created with the attribute files containing
the cpu information. The content of the files remains unchanged until a
new update is made. An update can be triggered from user space through
writing 'something' into a special purpose update file.
Regarding the mount point I am not sure what to use. Currently I would
suggest /sys/hypervisor. Any better ideas for the mount point?
Currently this is a zSeries specific solution but if other
virtualization platforms like e.g. Xen are interested in exporting
hypervisor data to user space, we can think of a more generic approach.
We create the following directory structure:
<mount-point>/
update
cpus/
<cpu-id>
type
mgmtime
<cpu-id>
...
hyp/
type
systems/
<lpar-name>
cpus/
<cpu-id>
type
mgmtime
cputime
onlinetime
<cpu-id>
...
<lpar-name>
cpus/
...
- update: Time of last update in seconds since 1970.
- cpus/: Directory for all physical cpus
- cpus/<cpu-id>/: Directory for one physical cpu. <cpu-id> is a logical
(decimal) cpu number
- cpus/<cpu-id>/type: Type name of physical zSeries cpu.
- cpus/<cpu-id>/mgmtime: Physical-LPAR-management time in microseconds.
- hyp/: Directory for hypervisor information
- hyp/type: Type of hypervisor (currently only 'LPAR Hypervisor')
- systems/: Directory for all LPARs
- systems/<lpar-name>/: Directory for one LPAR.
- systems/<lpar-name>/cpus/<cpu-id>/: Directory for the virtual cpus for
one LPAR. <cpu-id> is a logical (decimal) cpu number.
- systems/<lpar-name>/cpus/<cpu-id>/type: Type name of the
logical zSeries cpu.
- systems/<lpar-name>/cpus/<cpu-id>/mgmtime: LPAR-management time.
Accumulated number of microseconds during which a physical
CPU was assigned to the logical cpu and the cpu time was
consumed by the hypervisor and was not provided to
the LPAR (LPAR overhead).
- systems/<lpar-name>/cpus/<cpu-id>/cputime:
Accumulated number of microseconds during which a physical CPU
was assigned to the logical cpu and the cpu time was consumed
by the LPAR.
- systems/<lpar-name>/cpus/<cpu-id>/onlinetime: Accumulated number of
microseconds during which the logical CPU has been online.
The update process is triggered when writing 'something' into the
'update' file at the top level hypfs directory. You can do this e.g.
with 'echo 1 > update'. During the update the whole directory structure
is deleted and built up again. The update file contains the time of the
last update in seconds since 1970.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
---
fs/Kconfig | 11
fs/Makefile | 1
fs/hypfs/Makefile | 7
fs/hypfs/hypfs.h | 43 +++
fs/hypfs/hypfs_diag.c | 628 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/hypfs/hypfs_diag.h | 16 +
fs/hypfs/inode.c | 469 +++++++++++++++++++++++++++++++++++++
7 files changed, 1175 insertions(+)
diff -urpN linux-2.6.16/fs/Kconfig linux-2.6.16-hypfs/fs/Kconfig
--- linux-2.6.16/fs/Kconfig 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-hypfs/fs/Kconfig 2006-04-21 12:56:58.000000000 +0200
@@ -802,6 +802,17 @@ config PROC_VMCORE
help
Exports the dump image of crashed kernel in ELF format.
+config HYPFS_FS
+ bool "zseries hypervisor file system support"
+ depends on S390
+ default y
+ help
+ This is a virtual file system intended to provide accounting
+ information in a zSeries hypervisor environment.
+
+ If you don't know whether you need it, then you don't need it:
+ answer N.
+
config SYSFS
bool "sysfs file system support" if EMBEDDED
default y
diff -urpN linux-2.6.16/fs/Makefile linux-2.6.16-hypfs/fs/Makefile
--- linux-2.6.16/fs/Makefile 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-hypfs/fs/Makefile 2006-04-21 12:56:58.000000000 +0200
@@ -54,6 +54,7 @@ obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_EXT3_FS) += ext3/ # Before ext2 so root fs can be ext3
obj-$(CONFIG_JBD) += jbd/
obj-$(CONFIG_EXT2_FS) += ext2/
+obj-$(CONFIG_HYPFS_FS) += hypfs/
obj-$(CONFIG_CRAMFS) += cramfs/
obj-$(CONFIG_RAMFS) += ramfs/
obj-$(CONFIG_HUGETLBFS) += hugetlbfs/
diff -urpN linux-2.6.16/fs/hypfs/Makefile linux-2.6.16-hypfs/fs/hypfs/Makefile
--- linux-2.6.16/fs/hypfs/Makefile 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/fs/hypfs/Makefile 2006-04-21 12:56:58.000000000 +0200
@@ -0,0 +1,7 @@
+#
+# Makefile for the linux hypfs filesystem routines.
+#
+
+obj-$(CONFIG_HYPFS_FS) += hypfs.o
+
+hypfs-objs := inode.o hypfs_diag.o
diff -urpN linux-2.6.16/fs/hypfs/hypfs.h linux-2.6.16-hypfs/fs/hypfs/hypfs.h
--- linux-2.6.16/fs/hypfs/hypfs.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/fs/hypfs/hypfs.h 2006-04-21 12:56:58.000000000 +0200
@@ -0,0 +1,43 @@
+/*
+ * fs/hypfs/hypfs.h
+ * Hypervisor filesystem for Linux on zSeries.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#ifndef _HYPFS_H_
+#define _HYPFS_H_
+
+#include <linux/fs.h>
+#include <linux/types.h>
+
+#define REG_FILE_MODE 0440
+#define UPDATE_FILE_MODE 0660
+#define DIR_MODE 0550
+
+extern struct dentry *hypfs_mkdir(struct super_block *sb, struct dentry *parent,
+ const char *name);
+
+extern struct dentry *hypfs_create_u64(struct super_block *sb,
+ struct dentry *dir, const char *name,
+ __u64 value);
+
+extern struct dentry *hypfs_create_str(struct super_block *sb,
+ struct dentry *dir, const char *name,
+ char *string);
+
+static void inline remove_trailing_blanks(char *string)
+{
+ char *ptr;
+ for (ptr = string + strlen(string) - 1; ptr > string; ptr--) {
+ if (*ptr == ' ')
+ *ptr = 0;
+ else
+ break;
+ }
+}
+
+extern struct super_block *hypfs_sblk;
+
+#endif /* _HYPFS_H_ */
diff -urpN linux-2.6.16/fs/hypfs/hypfs_diag.c linux-2.6.16-hypfs/fs/hypfs/hypfs_diag.c
--- linux-2.6.16/fs/hypfs/hypfs_diag.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/fs/hypfs/hypfs_diag.c 2006-04-21 12:56:58.000000000 +0200
@@ -0,0 +1,628 @@
+/*
+ * fs/hypfs/hypfs_diag.c
+ * Hypervisor filesystem for Linux on zSeries. Diag 204 and 224
+ * implementation.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <asm/ebcdic.h>
+#include "hypfs.h"
+
+#define LPAR_NAME_LEN 8 /* lpar name len in diag 204 data */
+#define TMP_SIZE 1024 /* size of temorary buffers */
+
+/* diag 204 subcodes */
+typedef enum {
+ SUBC_STIB4 = 4,
+ SUBC_RSI = 5,
+ SUBC_STIB6 = 6,
+ SUBC_STIB7 = 7
+} diag204_subc_t;
+
+/* The two available diag 204 data formats */
+typedef enum {
+ INFO_SIMPLE = 0,
+ INFO_EXT = 0x00010000
+} diag204_info_t;
+
+/* bit is set in flags, when physical cpu info is included in diag 204 data */
+#define LPAR_PHYS_FLG 0x80
+
+static char *diag224_cpu_names; /* diag 224 name table */
+static diag204_subc_t diag204_store_sc; /* used subcode for store */
+static diag204_info_t diag204_info_type; /* used diag 204 data format */
+
+/*
+ * DIAG 204 data structures and member access functions.
+ *
+ * Since we have two different diag 204 data formats for old and new zSeries
+ * machines, we do not access the structs directly, but use getter functions for
+ * each struct member instead. This should make the code more readable.
+ */
+
+/* Time information block */
+
+struct info_blk_hdr {
+ __u8 npar;
+ __u8 flags;
+ __u16 tslice;
+ __u16 phys_cpus;
+ __u16 this_part;
+ __u64 curtod;
+} __attribute__ ((packed));
+
+struct x_info_blk_hdr {
+ __u8 npar;
+ __u8 flags;
+ __u16 tslice;
+ __u16 phys_cpus;
+ __u16 this_part;
+ __u64 curtod1;
+ __u64 curtod2;
+ char reserved[40];
+} __attribute__ ((packed));
+
+static inline int info_blk_hdr__size(diag204_info_t type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct info_blk_hdr);
+ else /* INFO_EXT */
+ return sizeof(struct x_info_blk_hdr);
+}
+
+static inline __u8 info_blk_hdr__npar(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct info_blk_hdr *)hdr)->npar;
+ else /* INFO_EXT */
+ return ((struct x_info_blk_hdr *)hdr)->npar;
+}
+
+static inline __u8 info_blk_hdr__flags(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct info_blk_hdr *)hdr)->flags;
+ else /* INFO_EXT */
+ return ((struct x_info_blk_hdr *)hdr)->flags;
+}
+
+static inline __u16 info_blk_hdr__phys_cpus(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct info_blk_hdr *)hdr)->phys_cpus;
+ else /* INFO_EXT */
+ return ((struct x_info_blk_hdr *)hdr)->phys_cpus;
+}
+
+/* Partition header */
+
+struct part_hdr {
+ __u8 pn;
+ __u8 cpus;
+ char reserved[6];
+ char part_name[LPAR_NAME_LEN];
+} __attribute__ ((packed));
+
+struct x_part_hdr {
+ __u8 pn;
+ __u8 cpus;
+ __u8 rcpus;
+ __u8 pflag;
+ __u32 mlu;
+ char part_name[LPAR_NAME_LEN];
+ char lpc_name[8];
+ char os_name[8];
+ __u64 online_cs;
+ __u64 online_es;
+ __u8 upid;
+ char reserved1[3];
+ __u32 group_mlu;
+ char group_name[8];
+ char reserved2[32];
+} __attribute__ ((packed));
+
+static inline int part_hdr__size(diag204_info_t type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct part_hdr);
+ else /* INFO_EXT */
+ return sizeof(struct x_part_hdr);
+}
+
+static inline __u8 part_hdr__cpus(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct part_hdr *)hdr)->cpus;
+ else /* INFO_EXT */
+ return ((struct x_part_hdr *)hdr)->cpus;
+}
+
+static inline void part_hdr__part_name(diag204_info_t type, void *hdr,
+ char *name)
+{
+ if (type == INFO_SIMPLE)
+ memcpy(name, ((struct part_hdr *)hdr)->part_name,
+ LPAR_NAME_LEN);
+ else /* INFO_EXT */
+ memcpy(name, ((struct x_part_hdr *)hdr)->part_name,
+ LPAR_NAME_LEN);
+ EBCASC(name, LPAR_NAME_LEN);
+ name[LPAR_NAME_LEN] = 0;
+ remove_trailing_blanks(name);
+}
+
+struct cpu_info {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ __u8 cflag;
+ __u16 weight;
+ __u64 acc_time;
+ __u64 lp_time;
+} __attribute__ ((packed));
+
+struct x_cpu_info {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ __u8 cflag;
+ __u16 weight;
+ __u64 acc_time;
+ __u64 lp_time;
+ __u16 min_weight;
+ __u16 cur_weight;
+ __u16 max_weight;
+ char reseved2[2];
+ __u64 online_time;
+ __u64 wait_time;
+ __u32 pma_weight;
+ __u32 polar_weight;
+ char reserved3[40];
+} __attribute__ ((packed));
+
+/* CPU info block */
+
+static inline int cpu_info__size(diag204_info_t type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct cpu_info);
+ else /* INFO_EXT */
+ return sizeof(struct x_cpu_info);
+}
+
+static inline __u8 cpu_info__ctidx(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->ctidx;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->ctidx;
+}
+
+static inline __u16 cpu_info__cpu_addr(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->cpu_addr;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->cpu_addr;
+}
+
+static inline __u64 cpu_info__acc_time(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->acc_time;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->acc_time;
+}
+
+static inline __u64 cpu_info__lp_time(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->lp_time;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->lp_time;
+}
+
+static inline __u64 cpu_info__online_time(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return 0; /* online_time not available in simple info */
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->online_time;
+}
+
+/* Physical header */
+
+struct phys_hdr {
+ char reserved1[1];
+ __u8 cpus;
+ char reserved2[6];
+ char mgm_name[8];
+} __attribute__ ((packed));
+
+struct x_phys_hdr {
+ char reserved1[1];
+ __u8 cpus;
+ char reserved2[6];
+ char mgm_name[8];
+ char reserved3[80];
+} __attribute__ ((packed));
+
+static inline int phys_hdr__size(diag204_info_t type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct phys_hdr);
+ else /* INFO_EXT */
+ return sizeof(struct x_phys_hdr);
+}
+
+static inline __u8 phys_hdr__cpus(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_hdr *)hdr)->cpus;
+ else /* INFO_EXT */
+ return ((struct x_phys_hdr *)hdr)->cpus;
+}
+
+/* Physical CPU info block */
+
+struct phys_cpu {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ char reserved2[3];
+ __u64 mgm_time;
+ char reserved3[8];
+} __attribute__ ((packed));
+
+struct x_phys_cpu {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ char reserved2[3];
+ __u64 mgm_time;
+ char reserved3[80];
+} __attribute__ ((packed));
+
+static inline int phys_cpu__size(diag204_info_t type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct phys_cpu);
+ else /* INFO_EXT */
+ return sizeof(struct x_phys_cpu);
+}
+
+static inline __u16 phys_cpu__cpu_addr(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_cpu *)hdr)->cpu_addr;
+ else /* INFO_EXT */
+ return ((struct x_phys_cpu *)hdr)->cpu_addr;
+}
+
+static inline __u64 phys_cpu__mgm_time(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_cpu *)hdr)->mgm_time;
+ else /* INFO_EXT */
+ return ((struct x_phys_cpu *)hdr)->mgm_time;
+}
+
+static inline __u64 phys_cpu__ctidx(diag204_info_t type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_cpu *)hdr)->ctidx;
+ else /* INFO_EXT */
+ return ((struct x_phys_cpu *)hdr)->ctidx;
+}
+
+/* Diagnose 204 functions */
+
+static inline int diag204(unsigned long subcode, unsigned long size, void *addr)
+{
+ register unsigned long _subcode asm("0") = subcode;
+ register unsigned long _size asm("1") = size;
+
+ asm volatile (" diag %2,%0,0x204\n"
+ "0: \n" ".section __ex_table,\"a\"\n"
+#ifndef __s390x__
+ " .align 4\n"
+ " .long 0b,0b\n"
+#else
+ " .align 8\n"
+ " .quad 0b,0b\n"
+#endif
+ ".previous":"+d" (_subcode), "+d"(_size)
+ :"d"(addr)
+ :"memory");
+ if (_subcode)
+ return -1;
+ else
+ return _size;
+}
+
+/*
+ * diag204_probe() has to find out, which type of diagnose 204 implementation
+ * we have on our machine. Currently there are three possible scanarios:
+ * - subcode 4 + simple data format (only one page)
+ * - subcode 4-6 + extended data format
+ * - subcode 4-7 + extended data format
+ *
+ * Subcode 5 is used to retrieve the size of the data, provided by subcodes
+ * 6 and 7. Subcode 7 basically has the same function as subcode 6. In addition
+ * to subcode 6 it provides also information about secondary cpus.
+ * In order to get as much information as possible, we first try
+ * subcode 7, then 6 and if both fail, we use subcode 4.
+ */
+
+/* XXX What get we back from subcode 5? Length for subcode 6 or subcode 7 ? */
+/* XXX Use GFP_DMA für diag204 ? */
+
+static int diag204_probe(void)
+{
+ void *buf;
+ int pages, rc;
+
+ pages = diag204(SUBC_RSI | INFO_EXT, 0, 0);
+ if (pages > 0) {
+ if (!(buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL))) {
+ rc = -ENOMEM;
+ goto err_out;
+ }
+ if (diag204(SUBC_STIB7 | INFO_EXT, pages, buf) >= 0) {
+ diag204_store_sc = SUBC_STIB7;
+ diag204_info_type = INFO_EXT;
+ goto out;
+ }
+ if (diag204(SUBC_STIB6 | INFO_EXT, pages, buf) >= 0) {
+ diag204_store_sc = SUBC_STIB7;
+ diag204_info_type = INFO_EXT;
+ goto out;
+ }
+ kfree(buf);
+ }
+ if (!(buf = kmalloc(PAGE_SIZE, GFP_KERNEL))) {
+ rc = -ENOMEM;
+ goto err_out;
+ }
+ if (diag204(SUBC_STIB4 | INFO_SIMPLE, pages, buf) >= 0) {
+ diag204_store_sc = SUBC_STIB4;
+ diag204_info_type = INFO_SIMPLE;
+ goto out;
+ } else {
+ rc = -ENOSYS;
+ goto err_out;
+ }
+ out:
+ kfree(buf);
+ return 0;
+ err_out:
+ kfree(buf);
+ return rc;
+}
+
+static void *diag204_store(void)
+{
+ void *buf;
+ int pages;
+
+ if (diag204_store_sc == SUBC_STIB4)
+ pages = 1;
+ else
+ pages = diag204(SUBC_RSI | diag204_info_type, 0, 0);
+
+ if (pages < 0)
+ return ERR_PTR(-ENOSYS);
+
+ if (!(buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL)))
+ return ERR_PTR(-ENOMEM);
+
+ if (diag204(diag204_store_sc | diag204_info_type, pages, buf) < 0) {
+ kfree(buf);
+ return ERR_PTR(-ENOSYS);
+ }
+ return buf;
+}
+
+/* Diagnose 224 functions */
+
+static inline void diag224(void *ptr)
+{
+ asm volatile(" diag %0,%1,0x224\n"
+ : :"d" (0), "d"(ptr) : "memory");
+}
+
+static inline int diag224_get_name_table(void)
+{
+ /* memory must be below 2GB */
+ diag224_cpu_names = kmalloc(PAGE_SIZE, GFP_KERNEL | GFP_DMA);
+ if (!diag224_cpu_names)
+ return -ENOMEM;
+ diag224(diag224_cpu_names);
+ EBCASC(diag224_cpu_names + 16, (*diag224_cpu_names + 1) * 16);
+ return 0;
+}
+
+static inline void diag224_delete_name_table(void)
+{
+ kfree(diag224_cpu_names);
+}
+
+static int diag224_idx2name(int index, char *name)
+{
+ memcpy(name, diag224_cpu_names + ((index + 1) * 16), 16);
+ name[16] = 0;
+ remove_trailing_blanks(name);
+ return 0;
+}
+
+int diag_init(void)
+{
+ int rc;
+
+ if (diag204_probe()) {
+ printk(KERN_ERR "hypfs: diag 204 not working.");
+ return -ENODATA;
+ }
+ rc = diag224_get_name_table();
+ if (rc) {
+ diag224_delete_name_table();
+ printk(KERN_ERR "hypfs: could not get name table.\n");
+ }
+ return rc;
+}
+
+void diag_exit(void)
+{
+ diag224_delete_name_table();
+}
+
+/*
+ * Functions to create the directory structure
+ * *******************************************
+ */
+
+static int hypfs_create_cpu_files(struct super_block *sb,
+ struct dentry *cpus_dir, void *cpu_info)
+{
+ struct dentry *cpu_dir;
+ char buffer[TMP_SIZE];
+ void *rc;
+
+ snprintf(buffer, TMP_SIZE, "%d", cpu_info__cpu_addr(diag204_info_type,
+ cpu_info));
+ cpu_dir = hypfs_mkdir(sb, cpus_dir, buffer);
+ rc = hypfs_create_u64(sb, cpu_dir, "mgmtime",
+ cpu_info__acc_time(diag204_info_type, cpu_info) -
+ cpu_info__lp_time(diag204_info_type, cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ rc = hypfs_create_u64(sb, cpu_dir, "cputime",
+ cpu_info__lp_time(diag204_info_type, cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ if (diag204_info_type == INFO_EXT) {
+ rc = hypfs_create_u64(sb, cpu_dir, "onlinetime",
+ cpu_info__online_time(diag204_info_type,
+ cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ }
+ diag224_idx2name(cpu_info__ctidx(diag204_info_type, cpu_info), buffer);
+ rc = hypfs_create_str(sb, cpu_dir, "type", buffer);
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ return 0;
+}
+
+static void *hypfs_create_lpar_files(struct super_block *sb,
+ struct dentry *systems_dir, void *part_hdr)
+{
+ struct dentry *cpus_dir;
+ struct dentry *lpar_dir;
+ char lpar_name[LPAR_NAME_LEN + 1];
+ void *cpu_info;
+ int i;
+
+ part_hdr__part_name(diag204_info_type, part_hdr, lpar_name);
+ lpar_name[LPAR_NAME_LEN] = 0;
+ lpar_dir = hypfs_mkdir(sb, systems_dir, lpar_name);
+ if (IS_ERR(lpar_dir))
+ return lpar_dir;
+ cpus_dir = hypfs_mkdir(sb, lpar_dir, "cpus");
+ if (IS_ERR(cpus_dir))
+ return cpus_dir;
+ cpu_info = part_hdr + part_hdr__size(diag204_info_type);
+ for (i = 0; i < part_hdr__cpus(diag204_info_type, part_hdr); i++) {
+ int rc;
+ rc = hypfs_create_cpu_files(sb, cpus_dir, cpu_info);
+ if (rc)
+ return ERR_PTR(rc);
+ cpu_info += cpu_info__size(diag204_info_type);
+ }
+ return cpu_info;
+}
+
+static int hypfs_create_phys_cpu_files(struct super_block *sb,
+ struct dentry *cpus_dir, void *cpu_info)
+{
+ struct dentry *cpu_dir;
+ char buffer[TMP_SIZE];
+ void *rc;
+
+ snprintf(buffer, TMP_SIZE, "%i", phys_cpu__cpu_addr(diag204_info_type,
+ cpu_info));
+ cpu_dir = hypfs_mkdir(sb, cpus_dir, buffer);
+ if (IS_ERR(cpu_dir))
+ return PTR_ERR(cpu_dir);
+ rc = hypfs_create_u64(sb, cpu_dir, "mgmtime",
+ phys_cpu__mgm_time(diag204_info_type, cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ diag224_idx2name(phys_cpu__ctidx(diag204_info_type, cpu_info), buffer);
+ rc = hypfs_create_str(sb, cpu_dir, "type", buffer);
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ return 0;
+}
+
+static void *hypfs_create_phys_files(struct super_block *sb,
+ struct dentry *parent_dir, void *phys_hdr)
+{
+ int i;
+ void *cpu_info;
+ struct dentry *cpus_dir;
+
+ cpus_dir = hypfs_mkdir(sb, parent_dir, "cpus");
+ if (IS_ERR(cpus_dir))
+ return cpus_dir;
+ cpu_info = phys_hdr + phys_hdr__size(diag204_info_type);
+ for (i = 0; i < phys_hdr__cpus(diag204_info_type, phys_hdr); i++) {
+ int rc;
+ rc = hypfs_create_phys_cpu_files(sb, cpus_dir, cpu_info);
+ if (rc)
+ return ERR_PTR(rc);
+ cpu_info += phys_cpu__size(diag204_info_type);
+ }
+ return cpu_info;
+}
+
+int diag_create_files(struct super_block *sb, struct dentry *root)
+{
+ struct dentry *systems_dir, *hyp_dir;
+ void *time_hdr, *part_hdr;
+ int i;
+ void *buffer, *rc;
+
+ buffer = diag204_store();
+ if (IS_ERR(buffer))
+ return PTR_ERR(buffer);
+
+ systems_dir = hypfs_mkdir(sb, root, "systems");
+ if (IS_ERR(systems_dir))
+ return PTR_ERR(systems_dir);
+ time_hdr = (struct x_info_blk_hdr *)buffer;
+ part_hdr = time_hdr + info_blk_hdr__size(diag204_info_type);
+ for (i = 0; i < info_blk_hdr__npar(diag204_info_type, time_hdr); i++) {
+ part_hdr = hypfs_create_lpar_files(sb, systems_dir, part_hdr);
+ if (IS_ERR(part_hdr))
+ return PTR_ERR(part_hdr);
+ }
+ if (info_blk_hdr__flags(diag204_info_type, time_hdr) & LPAR_PHYS_FLG) {
+ void *rc;
+ rc = hypfs_create_phys_files(sb, root, part_hdr);
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ }
+ hyp_dir = hypfs_mkdir(sb, root, "hyp");
+ if (IS_ERR(hyp_dir))
+ return PTR_ERR(hyp_dir);
+ rc = hypfs_create_str(sb, hyp_dir, "type", "LPAR Hypervisor");
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ kfree(buffer);
+ return 0;
+}
diff -urpN linux-2.6.16/fs/hypfs/hypfs_diag.h linux-2.6.16-hypfs/fs/hypfs/hypfs_diag.h
--- linux-2.6.16/fs/hypfs/hypfs_diag.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/fs/hypfs/hypfs_diag.h 2006-04-21 12:56:58.000000000 +0200
@@ -0,0 +1,16 @@
+/*
+ * fs/hypfs/hypfs_diag.h
+ * Hypervisor filesystem for Linux on zSeries.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#ifndef _HYPFS_DIAG_H_
+#define _HYPFS_DIAG_H_
+
+extern int diag_init(void);
+extern void diag_exit(void);
+extern int diag_create_files(struct super_block *sb, struct dentry *root);
+
+#endif /* _HYPFS_DIAG_H_ */
diff -urpN linux-2.6.16/fs/hypfs/inode.c linux-2.6.16-hypfs/fs/hypfs/inode.c
--- linux-2.6.16/fs/hypfs/inode.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/fs/hypfs/inode.c 2006-04-21 12:56:58.000000000 +0200
@@ -0,0 +1,469 @@
+/*
+ * fs/hypfs/inode.c
+ * Hypervisor filesystem for Linux on zSeries.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/vfs.h>
+#include <linux/pagemap.h>
+#include <linux/gfp.h>
+#include <linux/time.h>
+#include <linux/parser.h>
+#include <linux/sysfs.h>
+#include <linux/module.h>
+#include <asm/ebcdic.h>
+#include "hypfs.h"
+#include "hypfs_diag.h"
+
+#define HYPFS_MAGIC 0x687970 /* ASCII 'hyp' */
+#define TMP_SIZE 1024 /* size of temorary buffers */
+#define UPDATE_DATA_SIZE 32 /* size of update file inode buffer */
+
+static struct dentry *hypfs_create_update_file(struct super_block *sb,
+ struct dentry *dir);
+
+struct hypfs_sb_info {
+ int uid; /* uid used for files and dirs */
+ int gid; /* gid used for files and dirs */
+};
+
+struct super_block *hypfs_sblk;
+static struct dentry *update_file_dentry;
+static struct file_operations hypfs_file_ops;
+static struct file_system_type hypfs_type;
+static struct super_operations hypfs_s_ops;
+static time_t last_update_time = 0; /* update time in seconds since 1970 */
+static DECLARE_MUTEX(hypfs_lock); // XXX DEFINE_MUTEX in 2.6.16 !
+
+/* start of list of all dentries, which have to be deleted on update */
+static struct dentry *hypfs_last_dentry;
+
+static void hypfs_update_update(void)
+{
+ last_update_time = get_seconds();
+ snprintf((char *)update_file_dentry->d_inode->u.generic_ip,
+ UPDATE_DATA_SIZE, "%ld\n", last_update_time);
+ update_file_dentry->d_inode->i_size =
+ strlen((char *)update_file_dentry->d_inode->u.generic_ip);
+ update_file_dentry->d_inode->i_atime =
+ update_file_dentry->d_inode->i_mtime =
+ update_file_dentry->d_inode->i_ctime = CURRENT_TIME;
+}
+
+/* directory tree removal functions */
+
+static void hypfs_add_dentry(struct dentry *dentry)
+{
+ dentry->d_fsdata = hypfs_last_dentry;
+ hypfs_last_dentry = dentry;
+}
+
+static void hypfs_delete_tree(struct dentry *root)
+{
+ while (hypfs_last_dentry) {
+ struct dentry *parent, *next_dentry;
+
+ parent = hypfs_last_dentry->d_parent;
+ if (S_ISDIR(hypfs_last_dentry->d_inode->i_mode))
+ simple_rmdir(parent->d_inode, hypfs_last_dentry);
+ else
+ simple_unlink(parent->d_inode, hypfs_last_dentry);
+ d_delete(hypfs_last_dentry);
+ next_dentry = hypfs_last_dentry->d_fsdata;
+ dput(hypfs_last_dentry);
+ hypfs_last_dentry = next_dentry;
+ }
+}
+
+static struct inode *hypfs_make_inode(struct super_block *sb, int mode)
+{
+ struct inode *ret = new_inode(sb);
+
+ if (ret) {
+ ret->i_mode = mode;
+ ret->i_uid = ((struct hypfs_sb_info *)sb->s_fs_info)->uid;
+ ret->i_gid = ((struct hypfs_sb_info *)sb->s_fs_info)->gid;
+ ret->i_blksize = PAGE_CACHE_SIZE;
+ ret->i_blocks = 0;
+ ret->i_atime = ret->i_mtime = ret->i_ctime = CURRENT_TIME;
+ if (mode & S_IFDIR)
+ ret->i_nlink = 2;
+ else
+ ret->i_nlink = 1;
+ }
+ return ret;
+}
+
+static void hypfs_drop_inode(struct inode *inode)
+{
+ kfree((void *)inode->u.generic_ip);
+ generic_delete_inode(inode);
+}
+
+static int hypfs_open(struct inode *inode, struct file *filp)
+{
+ char *data = (char *)filp->f_dentry->d_inode->u.generic_ip;
+
+ if ((filp->f_mode & FMODE_WRITE) &&
+ (filp->f_dentry != update_file_dentry))
+ return -EPERM;
+ down(&hypfs_lock);
+ filp->private_data = kmalloc(strlen(data) + 1, GFP_KERNEL);
+ if (!filp->private_data) {
+ up(&hypfs_lock);
+ return -ENOMEM;
+ }
+ strcpy(filp->private_data, data);
+ up(&hypfs_lock);
+ return 0;
+}
+
+static ssize_t hypfs_aio_read(struct kiocb *iocb, __user char *buf,
+ size_t count, loff_t offset)
+{
+ char *data;
+ size_t len;
+ struct file *filp = iocb->ki_filp;
+
+ data = (char *)filp->private_data;
+ len = strlen(data);
+ if (offset > len) {
+ count = 0;
+ goto out;
+ }
+ if (count > len - offset)
+ count = len - offset;
+ if (copy_to_user(buf, data + offset, count)) {
+ count = -EFAULT;
+ goto out;
+ }
+ iocb->ki_pos += count;
+ file_accessed(filp);
+ out:
+ return count;
+}
+
+static ssize_t hypfs_aio_write(struct kiocb *iocb, const char __user *buf,
+ size_t count, loff_t pos)
+{
+ int rc;
+
+ down(&hypfs_lock);
+ if (last_update_time == get_seconds()) {
+ rc = -EBUSY;
+ goto out;
+ }
+ hypfs_delete_tree(hypfs_sblk->s_root);
+ rc = diag_create_files(hypfs_sblk, hypfs_sblk->s_root);
+ if (rc) {
+ printk(KERN_ERR "hypfs: Update failed\n");
+ hypfs_delete_tree(hypfs_sblk->s_root);
+ goto out;
+ }
+ hypfs_update_update();
+ rc = count;
+ out:
+ up(&hypfs_lock);
+ return rc;
+}
+
+static int hypfs_release(struct inode *inode, struct file *filp)
+{
+ kfree(filp->private_data);
+ return 0;
+}
+
+enum { opt_uid, opt_gid, opt_err };
+
+static match_table_t hypfs_tokens = {
+ {opt_uid, "uid=%u"},
+ {opt_gid, "gid=%u"},
+ {opt_err, NULL}
+};
+
+static int hypfs_parse_options(char *options)
+{
+ char *str;
+ substring_t args[MAX_OPT_ARGS];
+
+ if (!options)
+ return 0;
+ while ((str = strsep(&options, ",")) != NULL) {
+ int token, option;
+ if (!*str)
+ continue;
+ token = match_token(str, hypfs_tokens, args);
+ switch (token) {
+ case opt_uid:
+ if (match_int(&args[0], &option))
+ return -EINVAL;
+ ((struct hypfs_sb_info *)hypfs_sblk->s_fs_info)->uid
+ = option;
+ break;
+ case opt_gid:
+ if (match_int(&args[0], &option))
+ return -EINVAL;
+ ((struct hypfs_sb_info *)hypfs_sblk->s_fs_info)->gid
+ = option;
+ break;
+ case opt_err:
+ default:
+ printk(KERN_ERR "hypfs: Unrecognized mount option "
+ "\"%s\" or missing value\n", str);
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
+static int hypfs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct inode *root_inode;
+ int rc = 0;
+ struct hypfs_sb_info *sbi;
+
+ sbi = kmalloc(sizeof(struct hypfs_sb_info), GFP_KERNEL);
+ if (!sbi)
+ return -ENOMEM;
+ sbi->uid = current->uid;
+ sbi->gid = current->gid;
+ sb->s_fs_info = sbi;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = HYPFS_MAGIC;
+ sb->s_op = &hypfs_s_ops;
+ if (hypfs_parse_options(data)) {
+ rc = -EINVAL;
+ goto err_alloc;
+ }
+ root_inode = hypfs_make_inode(sb, S_IFDIR | 0755);
+ if (!root_inode) {
+ rc = -ENOMEM;
+ goto err_alloc;
+ }
+ root_inode->i_op = &simple_dir_inode_operations;
+ root_inode->i_fop = &simple_dir_operations;
+ sb->s_root = d_alloc_root(root_inode);
+ if (!sb->s_root) {
+ rc = -ENOMEM;
+ goto err_inode;
+ }
+ hypfs_sblk = sb;
+ rc = diag_create_files(hypfs_sblk, hypfs_sblk->s_root);
+ if (rc)
+ goto err_tree;
+ update_file_dentry = hypfs_create_update_file(hypfs_sblk,
+ hypfs_sblk->s_root);
+ if (IS_ERR(update_file_dentry)) {
+ rc = PTR_ERR(update_file_dentry);
+ goto err_tree;
+ }
+ hypfs_update_update();
+ return 0;
+
+ err_tree:
+ hypfs_delete_tree(hypfs_sblk->s_root);
+ dput(hypfs_sblk->s_root);
+ err_inode:
+ hypfs_drop_inode(root_inode);
+ err_alloc:
+ kfree(sbi);
+ return rc;
+}
+static void hypfs_kill_super(struct super_block *sb)
+{
+ kfree(sb->s_fs_info);
+ kill_litter_super(sb);
+}
+
+static struct super_block *hypfs_get_super(struct file_system_type *fst,
+ int flags, const char *devname,
+ void *data)
+{
+ return get_sb_single(fst, flags, data, hypfs_fill_super);
+}
+
+static struct dentry *hypfs_create_file(struct super_block *sb,
+ struct dentry *parent, const char *name,
+ char *data, mode_t mode)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+ struct qstr qname;
+
+ qname.name = name;
+ qname.len = strlen(name);
+ qname.hash = full_name_hash(name, qname.len);
+ dentry = lookup_one_len(name, parent, strlen(name));
+ if (IS_ERR(dentry))
+ return ERR_PTR(-ENOMEM);
+ inode = hypfs_make_inode(sb, mode);
+ if (!inode) {
+ dput(dentry);
+ return ERR_PTR(-ENOMEM);
+ }
+ if (mode & S_IFREG) {
+ inode->i_fop = &hypfs_file_ops;
+ inode->i_size = strlen(data);
+ } else if (mode & S_IFDIR) {
+ inode->i_op = &simple_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+ parent->d_inode->i_nlink++;
+ } else
+ BUG();
+ inode->u.generic_ip = data;
+ d_instantiate(dentry, inode);
+ return dentry;
+}
+
+struct dentry *hypfs_mkdir(struct super_block *sb, struct dentry *parent,
+ const char *name)
+{
+ struct dentry *dentry;
+
+ dentry = hypfs_create_file(sb, parent, name, NULL, S_IFDIR | DIR_MODE);
+ if (IS_ERR(dentry))
+ return dentry;
+ hypfs_add_dentry(dentry);
+ parent->d_inode->i_nlink++;
+ return dentry;
+}
+
+static struct dentry *hypfs_create_update_file(struct super_block *sb,
+ struct dentry *dir)
+{
+ char *buffer;
+ struct dentry *dentry;
+
+ buffer = kmalloc(UPDATE_DATA_SIZE, GFP_KERNEL);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+ snprintf(buffer, UPDATE_DATA_SIZE, "%d\n", 0);
+ dentry = hypfs_create_file(sb, dir, "update", buffer,
+ S_IFREG | UPDATE_FILE_MODE);
+ if (!dentry) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+ /*
+ * We do not put the update file on the 'delete' list with
+ * hypfs_add_dentry(), since it should not be removed when the tree
+ * is updated.
+ */
+ return dentry;
+}
+
+struct dentry *hypfs_create_u64(struct super_block *sb, struct dentry *dir,
+ const char *name, __u64 value)
+{
+ char *buffer;
+ char tmp[TMP_SIZE];
+ struct dentry *dentry;
+
+ snprintf(tmp, TMP_SIZE, "%lld\n", (unsigned long long int)value);
+ buffer = kmalloc(strlen(tmp) + 1, GFP_KERNEL);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+ strcpy(buffer, tmp);
+ dentry =
+ hypfs_create_file(sb, dir, name, buffer, S_IFREG | REG_FILE_MODE);
+ if (!dentry) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+ hypfs_add_dentry(dentry);
+ return dentry;
+}
+
+struct dentry *hypfs_create_str(struct super_block *sb, struct dentry *dir,
+ const char *name, char *string)
+{
+ char *buffer;
+ struct dentry *dentry;
+
+ buffer = kmalloc(strlen(string) + 2, GFP_KERNEL);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+ sprintf(buffer, "%s\n", string);
+ dentry =
+ hypfs_create_file(sb, dir, name, buffer, S_IFREG | REG_FILE_MODE);
+ if (!dentry) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+ hypfs_add_dentry(dentry);
+ return dentry;
+}
+
+static struct file_operations hypfs_file_ops = {
+ .open = hypfs_open,
+ .release = hypfs_release,
+ .read = do_sync_read,
+ .write = do_sync_write,
+ .aio_read = hypfs_aio_read,
+ .aio_write = hypfs_aio_write,
+};
+
+static struct file_system_type hypfs_type = {
+ .owner = THIS_MODULE,
+ .name = "hypfs",
+ .get_sb = hypfs_get_super,
+ .kill_sb = hypfs_kill_super,
+};
+
+static struct super_operations hypfs_s_ops = {
+ .statfs = simple_statfs,
+ .drop_inode = hypfs_drop_inode,
+};
+
+/*
+ * init and exit
+ * *************
+ */
+
+static decl_subsys(hypervisor, NULL, NULL);
+
+static int __init hypfs_init(void)
+{
+ int rc;
+
+ if (MACHINE_IS_VM) {
+ return -ENODATA;
+ }
+ if (diag_init()) {
+ printk(KERN_ERR "hypfs: diag init failed.\n");
+ return -ENODATA;
+ }
+ rc = subsystem_register(&hypervisor_subsys);
+ if (rc) {
+ diag_exit();
+ return rc;
+ }
+ rc = register_filesystem(&hypfs_type);
+ if (rc) {
+ subsystem_unregister(&hypervisor_subsys);
+ diag_exit();
+ return rc;
+ }
+ return 0;
+}
+
+static void __exit hypfs_exit(void)
+{
+ diag_exit();
+ unregister_filesystem(&hypfs_type);
+ subsystem_unregister(&hypervisor_subsys);
+}
+
+module_init(hypfs_init)
+module_exit(hypfs_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Michael Holzheu <holzheu@de.ibm.com>");
+MODULE_DESCRIPTION("zSeries Hypervisor Filesystem");
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 11:35 [PATCH/RFC] s390: Hypervisor File System Michael Holzheu
@ 2006-04-21 11:53 ` Pekka Enberg
2006-04-21 13:56 ` Michael Holzheu
2006-04-21 13:32 ` Pekka Enberg
2006-04-21 14:42 ` Pekka Enberg
2 siblings, 1 reply; 32+ messages in thread
From: Pekka Enberg @ 2006-04-21 11:53 UTC (permalink / raw)
To: Michael Holzheu; +Cc: linux-kernel, schwidefsky
Hi Michael,
On 4/21/06, Michael Holzheu <holzheu@de.ibm.com> wrote:
> On zSeries machines there exists an interface which allows the operating
> system to retrieve LPAR hypervisor accounting data.
[snip]
> fs/Kconfig | 11
> fs/Makefile | 1
> fs/hypfs/Makefile | 7
> fs/hypfs/hypfs.h | 43 +++
> fs/hypfs/hypfs_diag.c | 628 ++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/hypfs/hypfs_diag.h | 16 +
> fs/hypfs/inode.c | 469 +++++++++++++++++++++++++++++++++++++
> 7 files changed, 1175 insertions(+)
This filesystem makes no sense for anything but s390 so please put it
under arch/s390/ (following the convention set by cell specific
spufs). Thanks.
Pekka
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 11:53 ` Pekka Enberg
@ 2006-04-21 13:56 ` Michael Holzheu
2006-04-21 14:55 ` Arnd Bergmann
0 siblings, 1 reply; 32+ messages in thread
From: Michael Holzheu @ 2006-04-21 13:56 UTC (permalink / raw)
To: Pekka Enberg; +Cc: linux-kernel, mschwid2, penberg
Hi Pekka,
> This filesystem makes no sense for anything but s390 so please put it
> under arch/s390/ (following the convention set by cell specific
> spufs). Thanks.
Agreed! As long as the filesystem is s390 specifc, we probably should put
it put it under arch/s390/hypfs. But in general one could imagine, that
also other hypervisor platforms want to have such a filesystem in the
future. In that case, we could make the filesystem more generic. E.g. we
could split it into the filesystem part and an architecture specific
backend which provides the hypervisor data. But you are right, until no
other platform supports it, we should keep it simple, leave it s390
specific and move it to arch/s390.
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 13:56 ` Michael Holzheu
@ 2006-04-21 14:55 ` Arnd Bergmann
2006-04-21 15:31 ` Michael Holzheu
0 siblings, 1 reply; 32+ messages in thread
From: Arnd Bergmann @ 2006-04-21 14:55 UTC (permalink / raw)
To: Michael Holzheu; +Cc: Pekka Enberg, linux-kernel, mschwid2, penberg
On Friday 21 April 2006 15:56, Michael Holzheu wrote:
> > This filesystem makes no sense for anything but s390 so please put it
> > under arch/s390/ (following the convention set by cell specific
> > spufs). Thanks.
>
> Agreed! As long as the filesystem is s390 specifc, we probably should put
> it put it under arch/s390/hypfs. But in general one could imagine, that
> also other hypervisor platforms want to have such a filesystem in the
> future. In that case, we could make the filesystem more generic. E.g. we
> could split it into the filesystem part and an architecture specific
> backend which provides the hypervisor data. But you are right, until no
> other platform supports it, we should keep it simple, leave it s390
> specific and move it to arch/s390.
>
There was some discussion about a sysfs hierarchy for hypervisor data
some time ago, see also http://lwn.net/Articles/176365/.
The idea was rather similar, just for other attributes. Maybe this
can be consolidated in some way.
Is there a strong reason why you made your own file system instead of
using subsystem_register to add /sys/hypervisor?
Arnd <><
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 14:55 ` Arnd Bergmann
@ 2006-04-21 15:31 ` Michael Holzheu
0 siblings, 0 replies; 32+ messages in thread
From: Michael Holzheu @ 2006-04-21 15:31 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linux-kernel, mschwid2, penberg, Pekka Enberg
Hi Arnd,
Arnd Bergmann <arnd@arndb.de> wrote on 04/21/2006 04:55:46 PM:
> There was some discussion about a sysfs hierarchy for hypervisor data
> some time ago, see also http://lwn.net/Articles/176365/.
> The idea was rather similar, just for other attributes. Maybe this
> can be consolidated in some way.
Thank you for the hint! Maybe we can find someone from the xen team
to discuss this.
> Is there a strong reason why you made your own file system instead of
> using subsystem_register to add /sys/hypervisor?
The main argument was that in sysfs normally operating system local
data is provided like data for busses or devices. The hypervisor data
does not belong to one machine. It is data for several machines within
a hypervisor environment.
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 11:35 [PATCH/RFC] s390: Hypervisor File System Michael Holzheu
2006-04-21 11:53 ` Pekka Enberg
@ 2006-04-21 13:32 ` Pekka Enberg
2006-04-21 14:08 ` Michael Holzheu
2006-04-21 14:42 ` Pekka Enberg
2 siblings, 1 reply; 32+ messages in thread
From: Pekka Enberg @ 2006-04-21 13:32 UTC (permalink / raw)
To: Michael Holzheu; +Cc: linux-kernel, schwidefsky
On 4/21/06, Michael Holzheu <holzheu@de.ibm.com> wrote:
> When the file system is mounted the accounting information is retrieved
> and a file system tree is created with the attribute files containing
> the cpu information. The content of the files remains unchanged until a
> new update is made. An update can be triggered from user space through
> writing 'something' into a special purpose update file.
Why are you caching the data?
Pekka
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 13:32 ` Pekka Enberg
@ 2006-04-21 14:08 ` Michael Holzheu
2006-04-21 15:38 ` Pekka Enberg
0 siblings, 1 reply; 32+ messages in thread
From: Michael Holzheu @ 2006-04-21 14:08 UTC (permalink / raw)
To: Pekka Enberg; +Cc: linux-kernel, mschwid2, penberg
> > When the file system is mounted the accounting information is retrieved
> > and a file system tree is created with the attribute files containing
> > the cpu information. The content of the files remains unchanged until a
> > new update is made. An update can be triggered from user space through
> > writing 'something' into a special purpose update file.
>
> Why are you caching the data?
We had two problems:
The first one was, that the hardware interface for getting the data is very
expensive. We always get back the data for all LPARs and all cpus.
Therefore
we do not want to get the data every time an attribute file is read.
The second problem was, that we want to provide a consistent snapshot
of the hypervisor data for the user space application. The mechanism to
achieve that is that the application first reads the update file. If
the data is too old, it triggers an update. Then we get the new
hypervisor data and update the whole filesystem tree. Then the
application can read file for file all data it needs. After all data
is read and the application wants to have a consistent snapshot,
it has to read the update attribute again. If the time value has changed,
another process has triggered another update in the meantime.
In this case the application has repeat the whole process.
Currently we allow only one update per second.
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 14:08 ` Michael Holzheu
@ 2006-04-21 15:38 ` Pekka Enberg
2006-04-21 16:40 ` Martin Schwidefsky
0 siblings, 1 reply; 32+ messages in thread
From: Pekka Enberg @ 2006-04-21 15:38 UTC (permalink / raw)
To: Michael Holzheu; +Cc: linux-kernel, mschwid2, penberg
Hi Michael,
On Fri, 2006-04-21 at 16:08 +0200, Michael Holzheu wrote:
> The first one was, that the hardware interface for getting the data is
> very expensive. We always get back the data for all LPARs and all
> cpus. Therefore we do not want to get the data every time an attribute
> file is read.
You can cache the results in userspace. So I don't see this one as an
argument for making the kernel more complex.
On Fri, 2006-04-21 at 16:08 +0200, Michael Holzheu wrote:
> The second problem was, that we want to provide a consistent snapshot
> of the hypervisor data for the user space application.
How do you ensure consistency now? And how is that different from an
userspace process reading the whole directory hierarchy into cache in
one go?
The update-on-write to special file thing seems bit strange to me. What
if two processes ask for it at the same time?
Pekka
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 15:38 ` Pekka Enberg
@ 2006-04-21 16:40 ` Martin Schwidefsky
2006-04-25 14:04 ` Pekka Enberg
0 siblings, 1 reply; 32+ messages in thread
From: Martin Schwidefsky @ 2006-04-21 16:40 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Michael Holzheu, linux-kernel, mschwid2, penberg
On Fri, 2006-04-21 at 18:38 +0300, Pekka Enberg wrote:
> Hi Michael,
>
> On Fri, 2006-04-21 at 16:08 +0200, Michael Holzheu wrote:
> > The first one was, that the hardware interface for getting the data is
> > very expensive. We always get back the data for all LPARs and all
> > cpus. Therefore we do not want to get the data every time an attribute
> > file is read.
>
> You can cache the results in userspace. So I don't see this one as an
> argument for making the kernel more complex.
Unfortunately we can't cache them in user-space. To read a file you do
open/read/close an the attributes you are interested in. If we do not
cache the content of the attributes in the kernel we have to issue a
diag204 for each open/read/close sequence. But the data that is deliverd
by diag204 will have changed between the two calls. In addition the
number of active lpars and the lpar names might have changed.
> On Fri, 2006-04-21 at 16:08 +0200, Michael Holzheu wrote:
> > The second problem was, that we want to provide a consistent snapshot
> > of the hypervisor data for the user space application.
>
> How do you ensure consistency now? And how is that different from an
> userspace process reading the whole directory hierarchy into cache in
> one go?
We ensure the consistency by the following sequence:
1) read the update attribute to get the timestamp of the last update
2) decide if the data is new enough, if it is to old write something to
the update attribute and restart with 1)
3) read the information you are interested in
4) read the update attribute again and compare it with the one read in
step 1). If the timestamp has changed restart the whole thing with 1)
> The update-on-write to special file thing seems bit strange to me. What
> if two processes ask for it at the same time?
Yes, we had that discussion as well. Any sensible suggestion how to do
it in a more clever way would be greatly appreciated. We haven't found
something better so far.
--
blue skies,
Martin.
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 16:40 ` Martin Schwidefsky
@ 2006-04-25 14:04 ` Pekka Enberg
0 siblings, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-25 14:04 UTC (permalink / raw)
To: schwidefsky; +Cc: Michael Holzheu, linux-kernel, mschwid2, linux-fsdevel
Hi Martin,
On Fri, 2006-04-21 at 18:38 +0300, Pekka Enberg wrote:
> > The update-on-write to special file thing seems bit strange to me. What
> > if two processes ask for it at the same time?
On Fri, 2006-04-21 at 18:40 +0200, Martin Schwidefsky wrote:
> Yes, we had that discussion as well. Any sensible suggestion how to do
> it in a more clever way would be greatly appreciated. We haven't found
> something better so far.
The only way I can think of is via s_ops->remount_fs. That's bit safer
anyway, as sb->s_lock protects us from concurrent updates. I am cc'ing
fsdevel to see if they can come up with a better suggestion.
Pekka
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 11:35 [PATCH/RFC] s390: Hypervisor File System Michael Holzheu
2006-04-21 11:53 ` Pekka Enberg
2006-04-21 13:32 ` Pekka Enberg
@ 2006-04-21 14:42 ` Pekka Enberg
2006-04-21 14:59 ` Michael Holzheu
` (3 more replies)
2 siblings, 4 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-21 14:42 UTC (permalink / raw)
To: Michael Holzheu; +Cc: linux-kernel, schwidefsky
Hi Michael,
I have included some review comments below.
Pekka
On 4/21/06, Michael Holzheu <holzheu@de.ibm.com> wrote:
> diff -urpN linux-2.6.16/fs/hypfs/hypfs.h linux-2.6.16-hypfs/fs/hypfs/hypfs.h
> --- linux-2.6.16/fs/hypfs/hypfs.h 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.16-hypfs/fs/hypfs/hypfs.h 2006-04-21 12:56:58.000000000 +0200
> +static void inline remove_trailing_blanks(char *string)
> +{
> + char *ptr;
> + for (ptr = string + strlen(string) - 1; ptr > string; ptr--) {
> + if (*ptr == ' ')
> + *ptr = 0;
> + else
> + break;
> + }
> +}
Please consider moving this to lib/string.c and perhaps renaming it to
strstrip().
> diff -urpN linux-2.6.16/fs/hypfs/hypfs_diag.c linux-2.6.16-hypfs/fs/hypfs/hypfs_diag.c
> --- linux-2.6.16/fs/hypfs/hypfs_diag.c 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.16-hypfs/fs/hypfs/hypfs_diag.c 2006-04-21 12:56:58.000000000 +0200
> +/* diag 204 subcodes */
> +typedef enum {
> + SUBC_STIB4 = 4,
> + SUBC_RSI = 5,
> + SUBC_STIB6 = 6,
> + SUBC_STIB7 = 7
> +} diag204_subc_t;
> +
> +/* The two available diag 204 data formats */
> +typedef enum {
> + INFO_SIMPLE = 0,
> + INFO_EXT = 0x00010000
> +} diag204_info_t;
Please kill the typedefs.
> +/* Time information block */
> +
> +struct info_blk_hdr {
> + __u8 npar;
> + __u8 flags;
> + __u16 tslice;
> + __u16 phys_cpus;
> + __u16 this_part;
> + __u64 curtod;
> +} __attribute__ ((packed));
> +
> +struct x_info_blk_hdr {
> + __u8 npar;
> + __u8 flags;
> + __u16 tslice;
> + __u16 phys_cpus;
> + __u16 this_part;
> + __u64 curtod1;
> + __u64 curtod2;
> + char reserved[40];
> +} __attribute__ ((packed));
Couldn't you use endianess annotated types for these?
> +/* Partition header */
> +
> +struct part_hdr {
> + __u8 pn;
> + __u8 cpus;
> + char reserved[6];
> + char part_name[LPAR_NAME_LEN];
> +} __attribute__ ((packed));
> +
> +struct x_part_hdr {
> + __u8 pn;
> + __u8 cpus;
> + __u8 rcpus;
> + __u8 pflag;
> + __u32 mlu;
> + char part_name[LPAR_NAME_LEN];
> + char lpc_name[8];
> + char os_name[8];
> + __u64 online_cs;
> + __u64 online_es;
> + __u8 upid;
> + char reserved1[3];
> + __u32 group_mlu;
> + char group_name[8];
> + char reserved2[32];
> +} __attribute__ ((packed));
Same here.
> +
> +static inline int part_hdr__size(diag204_info_t type)
Please drop the extra underscore from function name.
> +static inline __u8 part_hdr__cpus(diag204_info_t type, void *hdr)
Ditto. Appears in various other places as well.
> +struct cpu_info {
> + __u16 cpu_addr;
> + char reserved1[2];
> + __u8 ctidx;
> + __u8 cflag;
> + __u16 weight;
> + __u64 acc_time;
> + __u64 lp_time;
> +} __attribute__ ((packed));
> +
> +struct x_cpu_info {
> + __u16 cpu_addr;
> + char reserved1[2];
> + __u8 ctidx;
> + __u8 cflag;
> + __u16 weight;
> + __u64 acc_time;
> + __u64 lp_time;
> + __u16 min_weight;
> + __u16 cur_weight;
> + __u16 max_weight;
> + char reseved2[2];
> + __u64 online_time;
> + __u64 wait_time;
> + __u32 pma_weight;
> + __u32 polar_weight;
> + char reserved3[40];
> +} __attribute__ ((packed));
Endianess annotated types?
> +/* Physical header */
> +
> +struct phys_hdr {
> + char reserved1[1];
> + __u8 cpus;
> + char reserved2[6];
> + char mgm_name[8];
> +} __attribute__ ((packed));
> +
> +struct x_phys_hdr {
> + char reserved1[1];
> + __u8 cpus;
> + char reserved2[6];
> + char mgm_name[8];
> + char reserved3[80];
> +} __attribute__ ((packed));
Here.
> +/* Physical CPU info block */
> +
> +struct phys_cpu {
> + __u16 cpu_addr;
> + char reserved1[2];
> + __u8 ctidx;
> + char reserved2[3];
> + __u64 mgm_time;
> + char reserved3[8];
> +} __attribute__ ((packed));
> +
> +struct x_phys_cpu {
> + __u16 cpu_addr;
> + char reserved1[2];
> + __u8 ctidx;
> + char reserved2[3];
> + __u64 mgm_time;
> + char reserved3[80];
> +} __attribute__ ((packed));
Here.
> +static inline int diag204(unsigned long subcode, unsigned long size, void *addr)
> +{
> + register unsigned long _subcode asm("0") = subcode;
> + register unsigned long _size asm("1") = size;
> +
> + asm volatile (" diag %2,%0,0x204\n"
> + "0: \n" ".section __ex_table,\"a\"\n"
> +#ifndef __s390x__
> + " .align 4\n"
> + " .long 0b,0b\n"
> +#else
> + " .align 8\n"
> + " .quad 0b,0b\n"
> +#endif
> + ".previous":"+d" (_subcode), "+d"(_size)
> + :"d"(addr)
> + :"memory");
Please note that the above is a big clue for this fs to go into arch/s390/.
> +static int diag204_probe(void)
> +{
> + void *buf;
> + int pages, rc;
> +
> + pages = diag204(SUBC_RSI | INFO_EXT, 0, 0);
> + if (pages > 0) {
> + if (!(buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL))) {
Please move the assignment out of the if expression for readability.
> + rc = -ENOMEM;
> + goto err_out;
> + }
> + if (diag204(SUBC_STIB7 | INFO_EXT, pages, buf) >= 0) {
> + diag204_store_sc = SUBC_STIB7;
> + diag204_info_type = INFO_EXT;
> + goto out;
> + }
> + if (diag204(SUBC_STIB6 | INFO_EXT, pages, buf) >= 0) {
> + diag204_store_sc = SUBC_STIB7;
> + diag204_info_type = INFO_EXT;
> + goto out;
> + }
> + kfree(buf);
> + }
> + if (!(buf = kmalloc(PAGE_SIZE, GFP_KERNEL))) {
Same here.
> + rc = -ENOMEM;
> + goto err_out;
> + }
> + if (diag204(SUBC_STIB4 | INFO_SIMPLE, pages, buf) >= 0) {
> + diag204_store_sc = SUBC_STIB4;
> + diag204_info_type = INFO_SIMPLE;
> + goto out;
> + } else {
> + rc = -ENOSYS;
> + goto err_out;
> + }
> + out:
> + kfree(buf);
> + return 0;
> + err_out:
> + kfree(buf);
> + return rc;
Please drop the duplicate error path.
> +static void *diag204_store(void)
> +{
> + void *buf;
> + int pages;
> +
> + if (diag204_store_sc == SUBC_STIB4)
> + pages = 1;
> + else
> + pages = diag204(SUBC_RSI | diag204_info_type, 0, 0);
> +
> + if (pages < 0)
> + return ERR_PTR(-ENOSYS);
> +
> + if (!(buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL)))
And here.
> diff -urpN linux-2.6.16/fs/hypfs/inode.c linux-2.6.16-hypfs/fs/hypfs/inode.c
> --- linux-2.6.16/fs/hypfs/inode.c 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.16-hypfs/fs/hypfs/inode.c 2006-04-21 12:56:58.000000000 +0200
> +static DECLARE_MUTEX(hypfs_lock); // XXX DEFINE_MUTEX in 2.6.16 !
Yes, please!
> +
> +/* start of list of all dentries, which have to be deleted on update */
> +static struct dentry *hypfs_last_dentry;
> +
> +static void hypfs_update_update(void)
> +{
> + last_update_time = get_seconds();
> + snprintf((char *)update_file_dentry->d_inode->u.generic_ip,
> + UPDATE_DATA_SIZE, "%ld\n", last_update_time);
> + update_file_dentry->d_inode->i_size =
> + strlen((char *)update_file_dentry->d_inode->u.generic_ip);
> + update_file_dentry->d_inode->i_atime =
> + update_file_dentry->d_inode->i_mtime =
> + update_file_dentry->d_inode->i_ctime = CURRENT_TIME;
Please introduce local variable for update_file_dentry->d_inode for
readability. Perhaps for generic_ip as well.
> +}
> +
> +/* directory tree removal functions */
> +
> +static void hypfs_add_dentry(struct dentry *dentry)
> +{
> + dentry->d_fsdata = hypfs_last_dentry;
> + hypfs_last_dentry = dentry;
> +}
> +
> +static void hypfs_delete_tree(struct dentry *root)
> +{
> + while (hypfs_last_dentry) {
> + struct dentry *parent, *next_dentry;
> +
> + parent = hypfs_last_dentry->d_parent;
> + if (S_ISDIR(hypfs_last_dentry->d_inode->i_mode))
> + simple_rmdir(parent->d_inode, hypfs_last_dentry);
> + else
> + simple_unlink(parent->d_inode, hypfs_last_dentry);
> + d_delete(hypfs_last_dentry);
> + next_dentry = hypfs_last_dentry->d_fsdata;
> + dput(hypfs_last_dentry);
> + hypfs_last_dentry = next_dentry;
> + }
> +}
> +
> +static struct inode *hypfs_make_inode(struct super_block *sb, int mode)
> +{
> + struct inode *ret = new_inode(sb);
> +
> + if (ret) {
> + ret->i_mode = mode;
> + ret->i_uid = ((struct hypfs_sb_info *)sb->s_fs_info)->uid;
> + ret->i_gid = ((struct hypfs_sb_info *)sb->s_fs_info)->gid;
Reduce casting by introducing a local variable for hypfs_sb_info.
> + ret->i_blksize = PAGE_CACHE_SIZE;
> + ret->i_blocks = 0;
> + ret->i_atime = ret->i_mtime = ret->i_ctime = CURRENT_TIME;
> + if (mode & S_IFDIR)
> + ret->i_nlink = 2;
> + else
> + ret->i_nlink = 1;
> + }
> + return ret;
> +}
> +
> +static void hypfs_drop_inode(struct inode *inode)
> +{
> + kfree((void *)inode->u.generic_ip);
Please drop the redundant cast. Consider moving this to inode_ops->clear_inode.
> + generic_delete_inode(inode);
> +}
> +static int hypfs_parse_options(char *options)
> +{
> + char *str;
> + substring_t args[MAX_OPT_ARGS];
> +
> + if (!options)
> + return 0;
> + while ((str = strsep(&options, ",")) != NULL) {
> + int token, option;
> + if (!*str)
> + continue;
> + token = match_token(str, hypfs_tokens, args);
> + switch (token) {
> + case opt_uid:
> + if (match_int(&args[0], &option))
> + return -EINVAL;
> + ((struct hypfs_sb_info *)hypfs_sblk->s_fs_info)->uid
> + = option;
> + break;
> + case opt_gid:
> + if (match_int(&args[0], &option))
> + return -EINVAL;
> + ((struct hypfs_sb_info *)hypfs_sblk->s_fs_info)->gid
> + = option;
Please reduce casting by introducing a local variable for hypfs_sb_info.
> +static int hypfs_fill_super(struct super_block *sb, void *data, int silent)
> +{
> + struct inode *root_inode;
> + int rc = 0;
> + struct hypfs_sb_info *sbi;
> +
> + sbi = kmalloc(sizeof(struct hypfs_sb_info), GFP_KERNEL);
> + if (!sbi)
> + return -ENOMEM;
> + sbi->uid = current->uid;
> + sbi->gid = current->gid;
> + sb->s_fs_info = sbi;
> + sb->s_blocksize = PAGE_CACHE_SIZE;
> + sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
> + sb->s_magic = HYPFS_MAGIC;
> + sb->s_op = &hypfs_s_ops;
> + if (hypfs_parse_options(data)) {
> + rc = -EINVAL;
> + goto err_alloc;
> + }
> + root_inode = hypfs_make_inode(sb, S_IFDIR | 0755);
> + if (!root_inode) {
> + rc = -ENOMEM;
> + goto err_alloc;
> + }
> + root_inode->i_op = &simple_dir_inode_operations;
> + root_inode->i_fop = &simple_dir_operations;
> + sb->s_root = d_alloc_root(root_inode);
> + if (!sb->s_root) {
> + rc = -ENOMEM;
> + goto err_inode;
> + }
> + hypfs_sblk = sb;
> + rc = diag_create_files(hypfs_sblk, hypfs_sblk->s_root);
> + if (rc)
> + goto err_tree;
> + update_file_dentry = hypfs_create_update_file(hypfs_sblk,
> + hypfs_sblk->s_root);
> + if (IS_ERR(update_file_dentry)) {
> + rc = PTR_ERR(update_file_dentry);
> + goto err_tree;
> + }
> + hypfs_update_update();
> + return 0;
> +
> + err_tree:
> + hypfs_delete_tree(hypfs_sblk->s_root);
> + dput(hypfs_sblk->s_root);
> + err_inode:
> + hypfs_drop_inode(root_inode);
You should use iput() here.
> + err_alloc:
> + kfree(sbi);
> + return rc;
> +}
> +static void hypfs_kill_super(struct super_block *sb)
> +{
> + kfree(sb->s_fs_info);
Please use super_operations->put_super instead for freeing s_fs_info.
> + kill_litter_super(sb);
> +/*
> + * init and exit
> + * *************
> + */
Consider dropping the above useless comment.
> +
> +static decl_subsys(hypervisor, NULL, NULL);
> +
> +static int __init hypfs_init(void)
> +{
> + int rc;
> +
> + if (MACHINE_IS_VM) {
> + return -ENODATA;
> + }
> + if (diag_init()) {
> + printk(KERN_ERR "hypfs: diag init failed.\n");
> + return -ENODATA;
> + }
> + rc = subsystem_register(&hypervisor_subsys);
> + if (rc) {
> + diag_exit();
> + return rc;
> + }
> + rc = register_filesystem(&hypfs_type);
> + if (rc) {
> + subsystem_unregister(&hypervisor_subsys);
> + diag_exit();
> + return rc;
Please use gotos for error handling here to reduce duplication.
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 14:42 ` Pekka Enberg
@ 2006-04-21 14:59 ` Michael Holzheu
2006-04-21 15:41 ` Pekka Enberg
2006-04-21 15:18 ` Jörn Engel
` (2 subsequent siblings)
3 siblings, 1 reply; 32+ messages in thread
From: Michael Holzheu @ 2006-04-21 14:59 UTC (permalink / raw)
To: Pekka Enberg; +Cc: linux-kernel, mschwid2, penberg
Hi Pekka,
> I have included some review comments below.
First of all thank you very much for the code review! You suggestions
sound reasonable! I will post an updated patch on Monday!
[snip]
> > +struct x_info_blk_hdr {
> > + __u8 npar;
> > + __u8 flags;
> > + __u16 tslice;
> > + __u16 phys_cpus;
> > + __u16 this_part;
> > + __u64 curtod1;
> > + __u64 curtod2;
> > + char reserved[40];
> > +} __attribute__ ((packed));
>
> Couldn't you use endianess annotated types for these?
>
What are "endianess annotated types" ?
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 14:59 ` Michael Holzheu
@ 2006-04-21 15:41 ` Pekka Enberg
0 siblings, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-21 15:41 UTC (permalink / raw)
To: Michael Holzheu; +Cc: linux-kernel, mschwid2, penberg
On Fri, 2006-04-21 at 16:59 +0200, Michael Holzheu wrote:
> > > +struct x_info_blk_hdr {
> > > + __u8 npar;
> > > + __u8 flags;
> > > + __u16 tslice;
> > > + __u16 phys_cpus;
> > > + __u16 this_part;
> > > + __u64 curtod1;
> > > + __u64 curtod2;
> > > + char reserved[40];
> > > +} __attribute__ ((packed));
> >
> > Couldn't you use endianess annotated types for these?
>
> What are "endianess annotated types" ?
For example, __le16 and __be16 (search for __bitwise in
<linux/types.h>). You can use sparse to check for endianess erros in
your code if you us them.
Pekka
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 14:42 ` Pekka Enberg
2006-04-21 14:59 ` Michael Holzheu
@ 2006-04-21 15:18 ` Jörn Engel
2006-04-21 15:36 ` Michael Holzheu
2006-04-21 22:30 ` Ingo Oeser
2006-04-24 17:19 ` Michael Holzheu
3 siblings, 1 reply; 32+ messages in thread
From: Jörn Engel @ 2006-04-21 15:18 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Michael Holzheu, linux-kernel, schwidefsky
On Fri, 21 April 2006 17:42:28 +0300, Pekka Enberg wrote:
> On 4/21/06, Michael Holzheu <holzheu@de.ibm.com> wrote:
> > diff -urpN linux-2.6.16/fs/hypfs/hypfs.h linux-2.6.16-hypfs/fs/hypfs/hypfs.h
> > --- linux-2.6.16/fs/hypfs/hypfs.h 1970-01-01 01:00:00.000000000 +0100
> > +++ linux-2.6.16-hypfs/fs/hypfs/hypfs.h 2006-04-21 12:56:58.000000000 +0200
> > +static void inline remove_trailing_blanks(char *string)
> > +{
> > + char *ptr;
> > + for (ptr = string + strlen(string) - 1; ptr > string; ptr--) {
> > + if (*ptr == ' ')
> > + *ptr = 0;
> > + else
> > + break;
> > + }
> > +}
>
> Please consider moving this to lib/string.c and perhaps renaming it to
> strstrip().
If you do that, could you strip all whitespace? I have a special
function to kill a single newline, if present. Looks to me like those
two could be combined.
Jörn
--
The wise man seeks everything in himself; the ignorant man tries to get
everything from somebody else.
-- unknown
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 15:18 ` Jörn Engel
@ 2006-04-21 15:36 ` Michael Holzheu
2006-04-21 15:46 ` Jörn Engel
0 siblings, 1 reply; 32+ messages in thread
From: Michael Holzheu @ 2006-04-21 15:36 UTC (permalink / raw)
To: Jörn Engel; +Cc: linux-kernel, mschwid2, Pekka Enberg
Hi Jörn
Jörn Engel <joern@wohnheim.fh-wedel.de> wrote on 04/21/2006 05:18:00 PM:
> On Fri, 21 April 2006 17:42:28 +0300, Pekka Enberg wrote:
> > On 4/21/06, Michael Holzheu <holzheu@de.ibm.com> wrote:
> > > diff -urpN linux-2.6.16/fs/hypfs/hypfs.h linux-2.6.16-
> hypfs/fs/hypfs/hypfs.h
> > > --- linux-2.6.16/fs/hypfs/hypfs.h 1970-01-01 01:00:00.000000000
+0100
> > > +++ linux-2.6.16-hypfs/fs/hypfs/hypfs.h 2006-04-21 12:56:58.
> 000000000 +0200
> > > +static void inline remove_trailing_blanks(char *string)
> > > +{
> > > + char *ptr;
> > > + for (ptr = string + strlen(string) - 1; ptr > string; ptr--) {
> > > + if (*ptr == ' ')
> > > + *ptr = 0;
> > > + else
> > > + break;
> > > + }
> > > +}
> >
> > Please consider moving this to lib/string.c and perhaps renaming it to
> > strstrip().
>
> If you do that, could you strip all whitespace? I have a special
> function to kill a single newline, if present. Looks to me like those
> two could be combined.
That would be ok for us, since we do not have any newlines
in our strings. I will include this in the next patch!
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 15:36 ` Michael Holzheu
@ 2006-04-21 15:46 ` Jörn Engel
0 siblings, 0 replies; 32+ messages in thread
From: Jörn Engel @ 2006-04-21 15:46 UTC (permalink / raw)
To: Michael Holzheu; +Cc: linux-kernel, mschwid2, Pekka Enberg
On Fri, 21 April 2006 17:36:53 +0200, Michael Holzheu wrote:
>
> That would be ok for us, since we do not have any newlines
> in our strings. I will include this in the next patch!
Thanks!
It would be nice to seperate this patch out from the rest. It is
useful independently of when and whether hypfs gets merged.
Jörn
--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 14:42 ` Pekka Enberg
2006-04-21 14:59 ` Michael Holzheu
2006-04-21 15:18 ` Jörn Engel
@ 2006-04-21 22:30 ` Ingo Oeser
2006-04-24 17:17 ` Michael Holzheu
2006-04-24 17:19 ` Michael Holzheu
3 siblings, 1 reply; 32+ messages in thread
From: Ingo Oeser @ 2006-04-21 22:30 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Michael Holzheu, linux-kernel, schwidefsky
Hi Pekka,
On Friday, 21. April 2006 16:42, Pekka Enberg wrote:
> On 4/21/06, Michael Holzheu <holzheu@de.ibm.com> wrote:
> > diff -urpN linux-2.6.16/fs/hypfs/hypfs.h linux-2.6.16-hypfs/fs/hypfs/hypfs.h
> > --- linux-2.6.16/fs/hypfs/hypfs.h 1970-01-01 01:00:00.000000000 +0100
> > +++ linux-2.6.16-hypfs/fs/hypfs/hypfs.h 2006-04-21 12:56:58.000000000 +0200
> > +static void inline remove_trailing_blanks(char *string)
> > +{
> > + char *ptr;
> > + for (ptr = string + strlen(string) - 1; ptr > string; ptr--) {
> > + if (*ptr == ' ')
> > + *ptr = 0;
> > + else
> > + break;
> > + }
> > +}
>
> Please consider moving this to lib/string.c and perhaps renaming it to
> strstrip().
Nearly.
- Just return the *ptr and let the caller modify the string.
- Take a string with characters to reject
Reasons:
- string might be read only
- caller wants to copy it anyway
- string might be a substring or sth. we like to parse further
- Symmetry with strchr()
Otherwise it is a very good idea implemented in a patch similiar to this
untested one below against Linus' current tree.
use case would be:
char *s = strltrim(string, " \t");
char *e = strrtrim(s, " \t\n\r");
*e = '\0';
Regards
Ingo Oeser
----
diff --git a/include/linux/string.h b/include/linux/string.h
index c61306d..fc90a70 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -68,6 +68,12 @@ extern __kernel_size_t strnlen(const cha
#ifndef __HAVE_ARCH_STRPBRK
extern char * strpbrk(const char *,const char *);
#endif
+#ifndef __HAVE_ARCH_STRLTRIM
+extern char * strltrim(const char *,const char *);
+#endif
+#ifndef __HAVE_ARCH_STRRTRIM
+extern char * strrtrim(const char *,const char *);
+#endif
#ifndef __HAVE_ARCH_STRSEP
extern char * strsep(char **,const char *);
#endif
diff --git a/lib/string.c b/lib/string.c
index 064f631..4bdcd5e 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -408,6 +408,55 @@ char *strpbrk(const char *cs, const char
EXPORT_SYMBOL(strpbrk);
#endif
+#ifndef __HAVE_ARCH_STRLTRIM
+/**
+ * strltrim - Get pointer to first character of @s which is
+ * not contained in letters in @reject
+ * @s: The string to be searched
+ * @reject: The string of letters to avoid
+ */
+char *strltrim(const char *s, const char *reject)
+{
+ const char *p;
+ const char *r;
+
+ for (p = s; p != '\0'; ++p) {
+ for (r = reject; *r != '\0'; ++r) {
+ if (*p == *r)
+ continue;
+ }
+ return (char *)p;
+ }
+ return (char *)s;
+}
+EXPORT_SYMBOL(strltrim);
+#endif
+
+#ifndef __HAVE_ARCH_STRRTRIM
+/**
+ * strrtrim - Get pointer to last character of @s which is
+ * not contained in letters in @reject
+ * @s: The string to be searched
+ * @reject: The string of letters to avoid
+ */
+char *strrtrim(const char *s, const char *reject)
+{
+ const char *end = s + strlen(s);
+ const char *p;
+ const char *r;
+
+ for (p = end - 1; s <= p; --p) {
+ for (r = reject; *r != '\0'; ++r) {
+ if (*p == *r)
+ continue;
+ }
+ return (char *)p;
+ }
+ return (char *)end;
+}
+EXPORT_SYMBOL(strrtrim);
+#endif
+
#ifndef __HAVE_ARCH_STRSEP
/**
* strsep - Split a string into tokens
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 22:30 ` Ingo Oeser
@ 2006-04-24 17:17 ` Michael Holzheu
2006-04-24 19:57 ` Ingo Oeser
2006-04-25 6:27 ` Pekka Enberg
0 siblings, 2 replies; 32+ messages in thread
From: Michael Holzheu @ 2006-04-24 17:17 UTC (permalink / raw)
To: Ingo Oeser; +Cc: linux-kernel, mschwid2, Pekka Enberg
Hi Ingo,
Ingo Oeser <ioe-lkml@rameria.de> wrote on 04/22/2006 12:30:28 AM:
>
> Nearly.
>
> - Just return the *ptr and let the caller modify the string.
> - Take a string with characters to reject
>
> Reasons:
> - string might be read only
> - caller wants to copy it anyway
> - string might be a substring or sth. we like to parse further
> - Symmetry with strchr()
>
> Otherwise it is a very good idea implemented in a patch similiar to this
> untested one below against Linus' current tree.
>
> use case would be:
> char *s = strltrim(string, " \t");
> char *e = strrtrim(s, " \t\n\r");
> *e = '\0';
I agree that it is a good idea to specify the characters to
reject, but I would like to use the function without having an
additional local pointer variable. In my opinion this
functionality is enough for most cases.
What about something like that:
/**
* strrtrim - Remove trailing characters specified in @reject
* @s: The string to be searched
* @reject: The string of letters to avoid
*/
static inline void strrtrim(char *s, const char *reject)
{
char *p;
const char *r;
for (p = s + strlen(s) - 1; s <= p; p--) {
for (r = reject; (*r != '\0') && (*p != *r); r++)
/* nothing */;
if (*r == '\0')
break;
}
*(p + 1) = '\0';
}
Regards
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-24 17:17 ` Michael Holzheu
@ 2006-04-24 19:57 ` Ingo Oeser
2006-04-25 6:27 ` Pekka Enberg
1 sibling, 0 replies; 32+ messages in thread
From: Ingo Oeser @ 2006-04-24 19:57 UTC (permalink / raw)
To: Michael Holzheu; +Cc: linux-kernel, mschwid2, Pekka Enberg
Hi Michael,
On Monday, 24. April 2006 19:17, you wrote:
> I agree that it is a good idea to specify the characters to
> reject, but I would like to use the function without having an
> additional local pointer variable. In my opinion this
> functionality is enough for most cases.
AFAIK there is no Standard for this kind of functions.
Yes, it might be ok, if you like to destroy the string.
But I prefer non-destructive string handling :-)
Might be a matter of taste. So go wild, if nobody else objects.
Regards
Ingo Oeser
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-24 17:17 ` Michael Holzheu
2006-04-24 19:57 ` Ingo Oeser
@ 2006-04-25 6:27 ` Pekka Enberg
1 sibling, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-25 6:27 UTC (permalink / raw)
To: Michael Holzheu; +Cc: Ingo Oeser, linux-kernel, mschwid2
On Mon, 2006-04-24 at 19:17 +0200, Michael Holzheu wrote:
> What about something like that:
>
> /**
> * strrtrim - Remove trailing characters specified in @reject
> * @s: The string to be searched
> * @reject: The string of letters to avoid
> */
> static inline void strrtrim(char *s, const char *reject)
Better to make it out-of-line to save kernel text.
> {
> char *p;
> const char *r;
>
> for (p = s + strlen(s) - 1; s <= p; p--) {
> for (r = reject; (*r != '\0') && (*p != *r); r++)
> /* nothing */;
> if (*r == '\0')
> break;
> }
> *(p + 1) = '\0';
> }
>
> Regards
> Michael
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-21 14:42 ` Pekka Enberg
` (2 preceding siblings ...)
2006-04-21 22:30 ` Ingo Oeser
@ 2006-04-24 17:19 ` Michael Holzheu
3 siblings, 0 replies; 32+ messages in thread
From: Michael Holzheu @ 2006-04-24 17:19 UTC (permalink / raw)
To: penberg, ioe-lkml; +Cc: linux-kernel, mschwid2, joern
Hi Pekka, Hi Ingo,
I included your comments in my new patch
(which comes with the next posting):
penberg@gmail.com wrote on 04/21/2006 04:42:28 PM:
> I have included some review comments below.
>
> On 4/21/06, Michael Holzheu <holzheu@de.ibm.com> wrote:
> > diff -urpN linux-2.6.16/fs/hypfs/hypfs.h
linux-2.6.16-hypfs/fs/hypfs/hypfs.h
> > --- linux-2.6.16/fs/hypfs/hypfs.h 1970-01-01 01:00:00.000000000 +0100
> > +++ linux-2.6.16-hypfs/fs/hypfs/hypfs.h 2006-04-21 12:56:58.
> 000000000 +0200
> > +static void inline remove_trailing_blanks(char *string)
> > +{
> > + char *ptr;
> > + for (ptr = string + strlen(string) - 1; ptr > string; ptr--) {
> > + if (*ptr == ' ')
> > + *ptr = 0;
> > + else
> > + break;
> > + }
> > +}
>
> Please consider moving this to lib/string.c and perhaps renaming it to
> strstrip().
I Implemented a strrtrim() function in string.c
> > diff -urpN linux-2.6.16/fs/hypfs/hypfs_diag.c linux-2.6.16-
> hypfs/fs/hypfs/hypfs_diag.c
> > --- linux-2.6.16/fs/hypfs/hypfs_diag.c 1970-01-01 01:00:00.000000000
+0100
> > +++ linux-2.6.16-hypfs/fs/hypfs/hypfs_diag.c 2006-04-21 12:56:
> 58.000000000 +0200
> > +/* diag 204 subcodes */
> > +typedef enum {
> > + SUBC_STIB4 = 4,
> > + SUBC_RSI = 5,
> > + SUBC_STIB6 = 6,
> > + SUBC_STIB7 = 7
> > +} diag204_subc_t;
> > +
> > +/* The two available diag 204 data formats */
> > +typedef enum {
> > + INFO_SIMPLE = 0,
> > + INFO_EXT = 0x00010000
> > +} diag204_info_t;
>
> Please kill the typedefs.
Done.
> > +/* Time information block */
> > +
> > +struct info_blk_hdr {
> > + __u8 npar;
> > + __u8 flags;
> > + __u16 tslice;
> > + __u16 phys_cpus;
> > + __u16 this_part;
> > + __u64 curtod;
> > +} __attribute__ ((packed));
> > +
> > +struct x_info_blk_hdr {
> > + __u8 npar;
> > + __u8 flags;
> > + __u16 tslice;
> > + __u16 phys_cpus;
> > + __u16 this_part;
> > + __u64 curtod1;
> > + __u64 curtod2;
> > + char reserved[40];
> > +} __attribute__ ((packed));
>
> Couldn't you use endianess annotated types for these?
Since we have a s390 only implementation, I think this is not
necessary.
> > +static inline __u8 part_hdr__cpus(diag204_info_t type, void *hdr)
>
> Ditto. Appears in various other places as well.
I used the extra underscore for the "getter" functions to
separate the member part from the rest of the function name.
E.g. "part_hdr" is the structure name and "cpus" is the
member name. I really would like to keep that.
> > +static inline int diag204(unsigned long subcode, unsigned long
> size, void *addr)
> > +{
> > + register unsigned long _subcode asm("0") = subcode;
> > + register unsigned long _size asm("1") = size;
> > +
> > + asm volatile (" diag %2,%0,0x204\n"
> > + "0: \n" ".section __ex_table,\"a\"\n"
> > +#ifndef __s390x__
> > + " .align 4\n"
> > + " .long 0b,0b\n"
> > +#else
> > + " .align 8\n"
> > + " .quad 0b,0b\n"
> > +#endif
> > + ".previous":"+d" (_subcode), "+d"(_size)
> > + :"d"(addr)
> > + :"memory");
>
> Please note that the above is a big clue for this fs to go into
arch/s390/.
Right! I moved the code to arch/s390/hypfs.
>
> > +static int diag204_probe(void)
> > +{
> > + void *buf;
> > + int pages, rc;
> > +
> > + pages = diag204(SUBC_RSI | INFO_EXT, 0, 0);
> > + if (pages > 0) {
> > + if (!(buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL))) {
>
> Please move the assignment out of the if expression for readability.
Done.
> > + rc = -ENOMEM;
> > + goto err_out;
> > + }
> > + if (diag204(SUBC_STIB7 | INFO_EXT, pages, buf) >= 0) {
> > + diag204_store_sc = SUBC_STIB7;
> > + diag204_info_type = INFO_EXT;
> > + goto out;
> > + }
> > + if (diag204(SUBC_STIB6 | INFO_EXT, pages, buf) >= 0) {
> > + diag204_store_sc = SUBC_STIB7;
> > + diag204_info_type = INFO_EXT;
> > + goto out;
> > + }
> > + kfree(buf);
> > + }
> > + if (!(buf = kmalloc(PAGE_SIZE, GFP_KERNEL))) {
>
> Same here.
Done.
> > + rc = -ENOMEM;
> > + goto err_out;
> > + }
> > + if (diag204(SUBC_STIB4 | INFO_SIMPLE, pages, buf) >= 0) {
> > + diag204_store_sc = SUBC_STIB4;
> > + diag204_info_type = INFO_SIMPLE;
> > + goto out;
> > + } else {
> > + rc = -ENOSYS;
> > + goto err_out;
> > + }
> > + out:
> > + kfree(buf);
> > + return 0;
> > + err_out:
> > + kfree(buf);
> > + return rc;
>
> Please drop the duplicate error path.
Done. I use now "rc = 0" for the good path.
> > +static void *diag204_store(void)
> > +{
> > + void *buf;
> > + int pages;
> > +
> > + if (diag204_store_sc == SUBC_STIB4)
> > + pages = 1;
> > + else
> > + pages = diag204(SUBC_RSI | diag204_info_type, 0, 0);
> > +
> > + if (pages < 0)
> > + return ERR_PTR(-ENOSYS);
> > +
> > + if (!(buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL)))
>
> And here.
Done.
> > diff -urpN linux-2.6.16/fs/hypfs/inode.c
linux-2.6.16-hypfs/fs/hypfs/inode.c
> > --- linux-2.6.16/fs/hypfs/inode.c 1970-01-01 01:00:00.000000000 +0100
> > +++ linux-2.6.16-hypfs/fs/hypfs/inode.c 2006-04-21 12:56:58.
> 000000000 +0200
>
> > +static DECLARE_MUTEX(hypfs_lock); // XXX DEFINE_MUTEX in 2.6.16 !
>
> Yes, please!
Done.
> > +
> > +/* start of list of all dentries, which have to be deleted on update
*/
> > +static struct dentry *hypfs_last_dentry;
> > +
> > +static void hypfs_update_update(void)
> > +{
> > + last_update_time = get_seconds();
> > + snprintf((char *)update_file_dentry->d_inode->u.generic_ip,
> > + UPDATE_DATA_SIZE, "%ld\n", last_update_time);
> > + update_file_dentry->d_inode->i_size =
> > + strlen((char *)update_file_dentry->d_inode->u.generic_ip);
> > + update_file_dentry->d_inode->i_atime =
> > + update_file_dentry->d_inode->i_mtime =
> > + update_file_dentry->d_inode->i_ctime = CURRENT_TIME;
>
> Please introduce local variable for update_file_dentry->d_inode for
> readability. Perhaps for generic_ip as well.
Done. Much better now!
> > +}
> > +
> > +/* directory tree removal functions */
> > +
> > +static void hypfs_add_dentry(struct dentry *dentry)
> > +{
> > + dentry->d_fsdata = hypfs_last_dentry;
> > + hypfs_last_dentry = dentry;
> > +}
> > +
> > +static void hypfs_delete_tree(struct dentry *root)
> > +{
> > + while (hypfs_last_dentry) {
> > + struct dentry *parent, *next_dentry;
> > +
> > + parent = hypfs_last_dentry->d_parent;
> > + if (S_ISDIR(hypfs_last_dentry->d_inode->i_mode))
> > + simple_rmdir(parent->d_inode, hypfs_last_dentry);
> > + else
> > + simple_unlink(parent->d_inode, hypfs_last_dentry);
> > + d_delete(hypfs_last_dentry);
> > + next_dentry = hypfs_last_dentry->d_fsdata;
> > + dput(hypfs_last_dentry);
> > + hypfs_last_dentry = next_dentry;
> > + }
> > +}
> > +
> > +static struct inode *hypfs_make_inode(struct super_block *sb, int
mode)
> > +{
> > + struct inode *ret = new_inode(sb);
> > +
> > + if (ret) {
> > + ret->i_mode = mode;
> > + ret->i_uid = ((struct hypfs_sb_info *)sb->s_fs_info)->uid;
> > + ret->i_gid = ((struct hypfs_sb_info *)sb->s_fs_info)->gid;
>
> Reduce casting by introducing a local variable for hypfs_sb_info.
Done.
> > + ret->i_blksize = PAGE_CACHE_SIZE;
> > + ret->i_blocks = 0;
> > + ret->i_atime = ret->i_mtime = ret->i_ctime = CURRENT_TIME;
> > + if (mode & S_IFDIR)
> > + ret->i_nlink = 2;
> > + else
> > + ret->i_nlink = 1;
> > + }
> > + return ret;
> > +}
> > +
> > +static void hypfs_drop_inode(struct inode *inode)
> > +{
> > + kfree((void *)inode->u.generic_ip);
>
> Please drop the redundant cast. Consider moving this to
> inode_ops->clear_inode.
I removed the cast, but I keep the code in drop_inode() until
someone tells me that this terribly wrong.
> > +static int hypfs_parse_options(char *options)
> > +{
> > + char *str;
> > + substring_t args[MAX_OPT_ARGS];
> > +
> > + if (!options)
> > + return 0;
> > + while ((str = strsep(&options, ",")) != NULL) {
> > + int token, option;
> > + if (!*str)
> > + continue;
> > + token = match_token(str, hypfs_tokens, args);
> > + switch (token) {
> > + case opt_uid:
> > + if (match_int(&args[0], &option))
> > + return -EINVAL;
> > + ((struct hypfs_sb_info *)hypfs_sblk->s_fs_info)->uid
> > + = option;
> > + break;
> > + case opt_gid:
> > + if (match_int(&args[0], &option))
> > + return -EINVAL;
> > + ((struct hypfs_sb_info *)hypfs_sblk->s_fs_info)->gid
> > + = option;
>
> Please reduce casting by introducing a local variable for hypfs_sb_info.
Done.
> > +static int hypfs_fill_super(struct super_block *sb, void *data, int
silent)
> > +{
> > + struct inode *root_inode;
> > + int rc = 0;
> > + struct hypfs_sb_info *sbi;
> > +
> > + sbi = kmalloc(sizeof(struct hypfs_sb_info), GFP_KERNEL);
> > + if (!sbi)
> > + return -ENOMEM;
> > + sbi->uid = current->uid;
> > + sbi->gid = current->gid;
> > + sb->s_fs_info = sbi;
> > + sb->s_blocksize = PAGE_CACHE_SIZE;
> > + sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
> > + sb->s_magic = HYPFS_MAGIC;
> > + sb->s_op = &hypfs_s_ops;
> > + if (hypfs_parse_options(data)) {
> > + rc = -EINVAL;
> > + goto err_alloc;
> > + }
> > + root_inode = hypfs_make_inode(sb, S_IFDIR | 0755);
> > + if (!root_inode) {
> > + rc = -ENOMEM;
> > + goto err_alloc;
> > + }
> > + root_inode->i_op = &simple_dir_inode_operations;
> > + root_inode->i_fop = &simple_dir_operations;
> > + sb->s_root = d_alloc_root(root_inode);
> > + if (!sb->s_root) {
> > + rc = -ENOMEM;
> > + goto err_inode;
> > + }
> > + hypfs_sblk = sb;
> > + rc = diag_create_files(hypfs_sblk, hypfs_sblk->s_root);
> > + if (rc)
> > + goto err_tree;
> > + update_file_dentry = hypfs_create_update_file(hypfs_sblk,
> > + hypfs_sblk->s_root);
> > + if (IS_ERR(update_file_dentry)) {
> > + rc = PTR_ERR(update_file_dentry);
> > + goto err_tree;
> > + }
> > + hypfs_update_update();
> > + return 0;
> > +
> > + err_tree:
> > + hypfs_delete_tree(hypfs_sblk->s_root);
> > + dput(hypfs_sblk->s_root);
> > + err_inode:
> > + hypfs_drop_inode(root_inode);
>
> You should use iput() here.
Done.
> > + err_alloc:
> > + kfree(sbi);
> > + return rc;
> > +}
> > +static void hypfs_kill_super(struct super_block *sb)
> > +{
> > + kfree(sb->s_fs_info);
>
> Please use super_operations->put_super instead for freeing s_fs_info.
Done.
> > +/*
> > + * init and exit
> > + * *************
> > + */
>
> Consider dropping the above useless comment.
Done.
> > +
> > +static decl_subsys(hypervisor, NULL, NULL);
> > +
> > +static int __init hypfs_init(void)
> > +{
> > + int rc;
> > +
> > + if (MACHINE_IS_VM) {
> > + return -ENODATA;
> > + }
> > + if (diag_init()) {
> > + printk(KERN_ERR "hypfs: diag init failed.\n");
> > + return -ENODATA;
> > + }
> > + rc = subsystem_register(&hypervisor_subsys);
> > + if (rc) {
> > + diag_exit();
> > + return rc;
> > + }
> > + rc = register_filesystem(&hypfs_type);
> > + if (rc) {
> > + subsystem_unregister(&hypervisor_subsys);
> > + diag_exit();
> > + return rc;
>
> Please use gotos for error handling here to reduce duplication.
Done.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
@ 2006-04-24 17:19 Michael Holzheu
2006-04-25 6:58 ` Pekka Enberg
2006-04-25 14:33 ` Pekka Enberg
0 siblings, 2 replies; 32+ messages in thread
From: Michael Holzheu @ 2006-04-24 17:19 UTC (permalink / raw)
To: penberg, ioe-lkml; +Cc: linux-kernel, mschwid2, joern
Hi all,
penberg@gmail.com wrote on 04/21/2006 04:42:28 PM:
> Hi Michael,
>
> I have included some review comments below.
>
Here the updated version of my patch!
Michael
----
arch/s390/Kconfig | 7
arch/s390/Makefile | 2
arch/s390/hypfs/Makefile | 7
arch/s390/hypfs/hypfs.h | 32 ++
arch/s390/hypfs/hypfs_diag.c | 628 +++++++++++++++++++++++++++++++++++++++++++
arch/s390/hypfs/hypfs_diag.h | 16 +
arch/s390/hypfs/inode.c | 467 +++++++++++++++++++++++++++++++
include/linux/string.h | 3
lib/string.c | 22 +
9 files changed, 1183 insertions(+), 1 deletion(-)
diff -urpN linux-2.6.16/arch/s390/Kconfig linux-2.6.16-hypfs/arch/s390/Kconfig
--- linux-2.6.16/arch/s390/Kconfig 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-hypfs/arch/s390/Kconfig 2006-04-24 18:30:54.000000000 +0200
@@ -442,6 +442,13 @@ config NO_IDLE_HZ_INIT
The HZ timer is switched off in idle by default. That means the
HZ timer is already disabled at boot time.
+config HYPFS_FS
+ bool "zseries hypervisor file system support"
+ default y
+ help
+ This is a virtual file system intended to provide accounting
+ information in a zSeries hypervisor environment.
+
config KEXEC
bool "kexec system call (EXPERIMENTAL)"
depends on EXPERIMENTAL
diff -urpN linux-2.6.16/arch/s390/Makefile linux-2.6.16-hypfs/arch/s390/Makefile
--- linux-2.6.16/arch/s390/Makefile 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-hypfs/arch/s390/Makefile 2006-04-24 18:29:50.000000000 +0200
@@ -77,7 +77,7 @@ LDFLAGS_vmlinux := -e start
head-y := arch/$(ARCH)/kernel/head.o arch/$(ARCH)/kernel/init_task.o
core-y += arch/$(ARCH)/mm/ arch/$(ARCH)/kernel/ arch/$(ARCH)/crypto/ \
- arch/$(ARCH)/appldata/
+ arch/$(ARCH)/appldata/ arch/$(ARCH)/hypfs/
libs-y += arch/$(ARCH)/lib/
drivers-y += drivers/s390/
drivers-$(CONFIG_MATHEMU) += arch/$(ARCH)/math-emu/
diff -urpN linux-2.6.16/arch/s390/hypfs/Makefile linux-2.6.16-hypfs/arch/s390/hypfs/Makefile
--- linux-2.6.16/arch/s390/hypfs/Makefile 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/arch/s390/hypfs/Makefile 2006-04-24 18:25:35.000000000 +0200
@@ -0,0 +1,7 @@
+#
+# Makefile for the linux hypfs filesystem routines.
+#
+
+obj-$(CONFIG_HYPFS_FS) += hypfs.o
+
+hypfs-objs := inode.o hypfs_diag.o
diff -urpN linux-2.6.16/arch/s390/hypfs/hypfs.h linux-2.6.16-hypfs/arch/s390/hypfs/hypfs.h
--- linux-2.6.16/arch/s390/hypfs/hypfs.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/arch/s390/hypfs/hypfs.h 2006-04-24 18:25:35.000000000 +0200
@@ -0,0 +1,32 @@
+/*
+ * fs/hypfs/hypfs.h
+ * Hypervisor filesystem for Linux on zSeries.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#ifndef _HYPFS_H_
+#define _HYPFS_H_
+
+#include <linux/fs.h>
+#include <linux/types.h>
+
+#define REG_FILE_MODE 0440
+#define UPDATE_FILE_MODE 0660
+#define DIR_MODE 0550
+
+extern struct dentry *hypfs_mkdir(struct super_block *sb, struct dentry *parent,
+ const char *name);
+
+extern struct dentry *hypfs_create_u64(struct super_block *sb,
+ struct dentry *dir, const char *name,
+ __u64 value);
+
+extern struct dentry *hypfs_create_str(struct super_block *sb,
+ struct dentry *dir, const char *name,
+ char *string);
+
+extern struct super_block *hypfs_sblk;
+
+#endif /* _HYPFS_H_ */
diff -urpN linux-2.6.16/arch/s390/hypfs/hypfs_diag.c linux-2.6.16-hypfs/arch/s390/hypfs/hypfs_diag.c
--- linux-2.6.16/arch/s390/hypfs/hypfs_diag.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/arch/s390/hypfs/hypfs_diag.c 2006-04-24 18:25:35.000000000 +0200
@@ -0,0 +1,628 @@
+/*
+ * fs/hypfs/hypfs_diag.c
+ * Hypervisor filesystem for Linux on zSeries. Diag 204 and 224
+ * implementation.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <asm/ebcdic.h>
+#include "hypfs.h"
+
+#define LPAR_NAME_LEN 8 /* lpar name len in diag 204 data */
+#define TMP_SIZE 1024 /* size of temorary buffers */
+
+/* diag 204 subcodes */
+enum diag204_sc {
+ SUBC_STIB4 = 4,
+ SUBC_RSI = 5,
+ SUBC_STIB6 = 6,
+ SUBC_STIB7 = 7
+};
+
+/* The two available diag 204 data formats */
+enum diag204_format {
+ INFO_SIMPLE = 0,
+ INFO_EXT = 0x00010000
+};
+
+/* bit is set in flags, when physical cpu info is included in diag 204 data */
+#define LPAR_PHYS_FLG 0x80
+
+static char *diag224_cpu_names; /* diag 224 name table */
+static enum diag204_sc diag204_store_sc; /* used subcode for store */
+static enum diag204_format diag204_info_type; /* used diag 204 data format */
+
+/*
+ * DIAG 204 data structures and member access functions.
+ *
+ * Since we have two different diag 204 data formats for old and new zSeries
+ * machines, we do not access the structs directly, but use getter functions for
+ * each struct member instead. This should make the code more readable.
+ */
+
+/* Time information block */
+
+struct info_blk_hdr {
+ __u8 npar;
+ __u8 flags;
+ __u16 tslice;
+ __u16 phys_cpus;
+ __u16 this_part;
+ __u64 curtod;
+} __attribute__ ((packed));
+
+struct x_info_blk_hdr {
+ __u8 npar;
+ __u8 flags;
+ __u16 tslice;
+ __u16 phys_cpus;
+ __u16 this_part;
+ __u64 curtod1;
+ __u64 curtod2;
+ char reserved[40];
+} __attribute__ ((packed));
+
+static inline int info_blk_hdr__size(enum diag204_format type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct info_blk_hdr);
+ else /* INFO_EXT */
+ return sizeof(struct x_info_blk_hdr);
+}
+
+static inline __u8 info_blk_hdr__npar(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct info_blk_hdr *)hdr)->npar;
+ else /* INFO_EXT */
+ return ((struct x_info_blk_hdr *)hdr)->npar;
+}
+
+static inline __u8 info_blk_hdr__flags(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct info_blk_hdr *)hdr)->flags;
+ else /* INFO_EXT */
+ return ((struct x_info_blk_hdr *)hdr)->flags;
+}
+
+static inline __u16 info_blk_hdr__pcpus(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct info_blk_hdr *)hdr)->phys_cpus;
+ else /* INFO_EXT */
+ return ((struct x_info_blk_hdr *)hdr)->phys_cpus;
+}
+
+/* Partition header */
+
+struct part_hdr {
+ __u8 pn;
+ __u8 cpus;
+ char reserved[6];
+ char part_name[LPAR_NAME_LEN];
+} __attribute__ ((packed));
+
+struct x_part_hdr {
+ __u8 pn;
+ __u8 cpus;
+ __u8 rcpus;
+ __u8 pflag;
+ __u32 mlu;
+ char part_name[LPAR_NAME_LEN];
+ char lpc_name[8];
+ char os_name[8];
+ __u64 online_cs;
+ __u64 online_es;
+ __u8 upid;
+ char reserved1[3];
+ __u32 group_mlu;
+ char group_name[8];
+ char reserved2[32];
+} __attribute__ ((packed));
+
+static inline int part_hdr__size(enum diag204_format type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct part_hdr);
+ else /* INFO_EXT */
+ return sizeof(struct x_part_hdr);
+}
+
+static inline __u8 part_hdr__cpus(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct part_hdr *)hdr)->cpus;
+ else /* INFO_EXT */
+ return ((struct x_part_hdr *)hdr)->cpus;
+}
+
+static inline void part_hdr__part_name(enum diag204_format type, void *hdr,
+ char *name)
+{
+ if (type == INFO_SIMPLE)
+ memcpy(name, ((struct part_hdr *)hdr)->part_name,
+ LPAR_NAME_LEN);
+ else /* INFO_EXT */
+ memcpy(name, ((struct x_part_hdr *)hdr)->part_name,
+ LPAR_NAME_LEN);
+ EBCASC(name, LPAR_NAME_LEN);
+ name[LPAR_NAME_LEN] = 0;
+ strrtrim(name, " ");
+}
+
+struct cpu_info {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ __u8 cflag;
+ __u16 weight;
+ __u64 acc_time;
+ __u64 lp_time;
+} __attribute__ ((packed));
+
+struct x_cpu_info {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ __u8 cflag;
+ __u16 weight;
+ __u64 acc_time;
+ __u64 lp_time;
+ __u16 min_weight;
+ __u16 cur_weight;
+ __u16 max_weight;
+ char reseved2[2];
+ __u64 online_time;
+ __u64 wait_time;
+ __u32 pma_weight;
+ __u32 polar_weight;
+ char reserved3[40];
+} __attribute__ ((packed));
+
+/* CPU info block */
+
+static inline int cpu_info__size(enum diag204_format type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct cpu_info);
+ else /* INFO_EXT */
+ return sizeof(struct x_cpu_info);
+}
+
+static inline __u8 cpu_info__ctidx(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->ctidx;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->ctidx;
+}
+
+static inline __u16 cpu_info__cpu_addr(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->cpu_addr;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->cpu_addr;
+}
+
+static inline __u64 cpu_info__acc_time(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->acc_time;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->acc_time;
+}
+
+static inline __u64 cpu_info__lp_time(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct cpu_info *)hdr)->lp_time;
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->lp_time;
+}
+
+static inline __u64 cpu_info__online_time(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return 0; /* online_time not available in simple info */
+ else /* INFO_EXT */
+ return ((struct x_cpu_info *)hdr)->online_time;
+}
+
+/* Physical header */
+
+struct phys_hdr {
+ char reserved1[1];
+ __u8 cpus;
+ char reserved2[6];
+ char mgm_name[8];
+} __attribute__ ((packed));
+
+struct x_phys_hdr {
+ char reserved1[1];
+ __u8 cpus;
+ char reserved2[6];
+ char mgm_name[8];
+ char reserved3[80];
+} __attribute__ ((packed));
+
+static inline int phys_hdr__size(enum diag204_format type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct phys_hdr);
+ else /* INFO_EXT */
+ return sizeof(struct x_phys_hdr);
+}
+
+static inline __u8 phys_hdr__cpus(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_hdr *)hdr)->cpus;
+ else /* INFO_EXT */
+ return ((struct x_phys_hdr *)hdr)->cpus;
+}
+
+/* Physical CPU info block */
+
+struct phys_cpu {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ char reserved2[3];
+ __u64 mgm_time;
+ char reserved3[8];
+} __attribute__ ((packed));
+
+struct x_phys_cpu {
+ __u16 cpu_addr;
+ char reserved1[2];
+ __u8 ctidx;
+ char reserved2[3];
+ __u64 mgm_time;
+ char reserved3[80];
+} __attribute__ ((packed));
+
+static inline int phys_cpu__size(enum diag204_format type)
+{
+ if (type == INFO_SIMPLE)
+ return sizeof(struct phys_cpu);
+ else /* INFO_EXT */
+ return sizeof(struct x_phys_cpu);
+}
+
+static inline __u16 phys_cpu__cpu_addr(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_cpu *)hdr)->cpu_addr;
+ else /* INFO_EXT */
+ return ((struct x_phys_cpu *)hdr)->cpu_addr;
+}
+
+static inline __u64 phys_cpu__mgm_time(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_cpu *)hdr)->mgm_time;
+ else /* INFO_EXT */
+ return ((struct x_phys_cpu *)hdr)->mgm_time;
+}
+
+static inline __u64 phys_cpu__ctidx(enum diag204_format type, void *hdr)
+{
+ if (type == INFO_SIMPLE)
+ return ((struct phys_cpu *)hdr)->ctidx;
+ else /* INFO_EXT */
+ return ((struct x_phys_cpu *)hdr)->ctidx;
+}
+
+/* Diagnose 204 functions */
+
+static inline int diag204(unsigned long subcode, unsigned long size, void *addr)
+{
+ register unsigned long _subcode asm("0") = subcode;
+ register unsigned long _size asm("1") = size;
+
+ asm volatile (" diag %2,%0,0x204\n"
+ "0: \n" ".section __ex_table,\"a\"\n"
+#ifndef __s390x__
+ " .align 4\n"
+ " .long 0b,0b\n"
+#else
+ " .align 8\n"
+ " .quad 0b,0b\n"
+#endif
+ ".previous":"+d" (_subcode), "+d"(_size)
+ :"d"(addr)
+ :"memory");
+ if (_subcode)
+ return -1;
+ else
+ return _size;
+}
+
+/*
+ * diag204_probe() has to find out, which type of diagnose 204 implementation
+ * we have on our machine. Currently there are three possible scanarios:
+ * - subcode 4 + simple data format (only one page)
+ * - subcode 4-6 + extended data format
+ * - subcode 4-7 + extended data format
+ *
+ * Subcode 5 is used to retrieve the size of the data, provided by subcodes
+ * 6 and 7. Subcode 7 basically has the same function as subcode 6. In addition
+ * to subcode 6 it provides also information about secondary cpus.
+ * In order to get as much information as possible, we first try
+ * subcode 7, then 6 and if both fail, we use subcode 4.
+ */
+
+static int diag204_probe(void)
+{
+ void *buf;
+ int pages, rc;
+
+ pages = diag204(SUBC_RSI | INFO_EXT, 0, 0);
+ if (pages > 0) {
+ buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL);
+ if (!buf) {
+ rc = -ENOMEM;
+ goto err_out;
+ }
+ if (diag204(SUBC_STIB7 | INFO_EXT, pages, buf) >= 0) {
+ diag204_store_sc = SUBC_STIB7;
+ diag204_info_type = INFO_EXT;
+ goto out;
+ }
+ if (diag204(SUBC_STIB6 | INFO_EXT, pages, buf) >= 0) {
+ diag204_store_sc = SUBC_STIB7;
+ diag204_info_type = INFO_EXT;
+ goto out;
+ }
+ kfree(buf);
+ }
+ buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!buf) {
+ rc = -ENOMEM;
+ goto err_out;
+ }
+ if (diag204(SUBC_STIB4 | INFO_SIMPLE, pages, buf) >= 0) {
+ diag204_store_sc = SUBC_STIB4;
+ diag204_info_type = INFO_SIMPLE;
+ goto out;
+ } else {
+ rc = -ENOSYS;
+ goto err_out;
+ }
+ out:
+ rc = 0;
+ err_out:
+ kfree(buf);
+ return rc;
+}
+
+static void *diag204_store(void)
+{
+ void *buf;
+ int pages;
+
+ if (diag204_store_sc == SUBC_STIB4)
+ pages = 1;
+ else
+ pages = diag204(SUBC_RSI | diag204_info_type, 0, 0);
+
+ if (pages < 0)
+ return ERR_PTR(-ENOSYS);
+
+ buf = kmalloc(pages * PAGE_SIZE, GFP_KERNEL);
+ if (!buf)
+ return ERR_PTR(-ENOMEM);
+
+ if (diag204(diag204_store_sc | diag204_info_type, pages, buf) < 0) {
+ kfree(buf);
+ return ERR_PTR(-ENOSYS);
+ }
+ return buf;
+}
+
+/* Diagnose 224 functions */
+
+static inline void diag224(void *ptr)
+{
+ asm volatile(" diag %0,%1,0x224\n"
+ : :"d" (0), "d"(ptr) : "memory");
+}
+
+static inline int diag224_get_name_table(void)
+{
+ /* memory must be below 2GB */
+ diag224_cpu_names = kmalloc(PAGE_SIZE, GFP_KERNEL | GFP_DMA);
+ if (!diag224_cpu_names)
+ return -ENOMEM;
+ diag224(diag224_cpu_names);
+ EBCASC(diag224_cpu_names + 16, (*diag224_cpu_names + 1) * 16);
+ return 0;
+}
+
+static inline void diag224_delete_name_table(void)
+{
+ kfree(diag224_cpu_names);
+}
+
+static int diag224_idx2name(int index, char *name)
+{
+ memcpy(name, diag224_cpu_names + ((index + 1) * 16), 16);
+ name[16] = 0;
+ strrtrim(name, " ");
+ return 0;
+}
+
+int diag_init(void)
+{
+ int rc;
+
+ if (diag204_probe()) {
+ printk(KERN_ERR "hypfs: diag 204 not working.");
+ return -ENODATA;
+ }
+ rc = diag224_get_name_table();
+ if (rc) {
+ diag224_delete_name_table();
+ printk(KERN_ERR "hypfs: could not get name table.\n");
+ }
+ return rc;
+}
+
+void diag_exit(void)
+{
+ diag224_delete_name_table();
+}
+
+/*
+ * Functions to create the directory structure
+ * *******************************************
+ */
+
+static int hypfs_create_cpu_files(struct super_block *sb,
+ struct dentry *cpus_dir, void *cpu_info)
+{
+ struct dentry *cpu_dir;
+ char buffer[TMP_SIZE];
+ void *rc;
+
+ snprintf(buffer, TMP_SIZE, "%d", cpu_info__cpu_addr(diag204_info_type,
+ cpu_info));
+ cpu_dir = hypfs_mkdir(sb, cpus_dir, buffer);
+ rc = hypfs_create_u64(sb, cpu_dir, "mgmtime",
+ cpu_info__acc_time(diag204_info_type, cpu_info) -
+ cpu_info__lp_time(diag204_info_type, cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ rc = hypfs_create_u64(sb, cpu_dir, "cputime",
+ cpu_info__lp_time(diag204_info_type, cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ if (diag204_info_type == INFO_EXT) {
+ rc = hypfs_create_u64(sb, cpu_dir, "onlinetime",
+ cpu_info__online_time(diag204_info_type,
+ cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ }
+ diag224_idx2name(cpu_info__ctidx(diag204_info_type, cpu_info), buffer);
+ rc = hypfs_create_str(sb, cpu_dir, "type", buffer);
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ return 0;
+}
+
+static void *hypfs_create_lpar_files(struct super_block *sb,
+ struct dentry *systems_dir, void *part_hdr)
+{
+ struct dentry *cpus_dir;
+ struct dentry *lpar_dir;
+ char lpar_name[LPAR_NAME_LEN + 1];
+ void *cpu_info;
+ int i;
+
+ part_hdr__part_name(diag204_info_type, part_hdr, lpar_name);
+ lpar_name[LPAR_NAME_LEN] = 0;
+ lpar_dir = hypfs_mkdir(sb, systems_dir, lpar_name);
+ if (IS_ERR(lpar_dir))
+ return lpar_dir;
+ cpus_dir = hypfs_mkdir(sb, lpar_dir, "cpus");
+ if (IS_ERR(cpus_dir))
+ return cpus_dir;
+ cpu_info = part_hdr + part_hdr__size(diag204_info_type);
+ for (i = 0; i < part_hdr__cpus(diag204_info_type, part_hdr); i++) {
+ int rc;
+ rc = hypfs_create_cpu_files(sb, cpus_dir, cpu_info);
+ if (rc)
+ return ERR_PTR(rc);
+ cpu_info += cpu_info__size(diag204_info_type);
+ }
+ return cpu_info;
+}
+
+static int hypfs_create_phys_cpu_files(struct super_block *sb,
+ struct dentry *cpus_dir, void *cpu_info)
+{
+ struct dentry *cpu_dir;
+ char buffer[TMP_SIZE];
+ void *rc;
+
+ snprintf(buffer, TMP_SIZE, "%i", phys_cpu__cpu_addr(diag204_info_type,
+ cpu_info));
+ cpu_dir = hypfs_mkdir(sb, cpus_dir, buffer);
+ if (IS_ERR(cpu_dir))
+ return PTR_ERR(cpu_dir);
+ rc = hypfs_create_u64(sb, cpu_dir, "mgmtime",
+ phys_cpu__mgm_time(diag204_info_type, cpu_info));
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ diag224_idx2name(phys_cpu__ctidx(diag204_info_type, cpu_info), buffer);
+ rc = hypfs_create_str(sb, cpu_dir, "type", buffer);
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ return 0;
+}
+
+static void *hypfs_create_phys_files(struct super_block *sb,
+ struct dentry *parent_dir, void *phys_hdr)
+{
+ int i;
+ void *cpu_info;
+ struct dentry *cpus_dir;
+
+ cpus_dir = hypfs_mkdir(sb, parent_dir, "cpus");
+ if (IS_ERR(cpus_dir))
+ return cpus_dir;
+ cpu_info = phys_hdr + phys_hdr__size(diag204_info_type);
+ for (i = 0; i < phys_hdr__cpus(diag204_info_type, phys_hdr); i++) {
+ int rc;
+ rc = hypfs_create_phys_cpu_files(sb, cpus_dir, cpu_info);
+ if (rc)
+ return ERR_PTR(rc);
+ cpu_info += phys_cpu__size(diag204_info_type);
+ }
+ return cpu_info;
+}
+
+int diag_create_files(struct super_block *sb, struct dentry *root)
+{
+ struct dentry *systems_dir, *hyp_dir;
+ void *time_hdr, *part_hdr;
+ int i;
+ void *buffer, *rc;
+
+ buffer = diag204_store();
+ if (IS_ERR(buffer))
+ return PTR_ERR(buffer);
+
+ systems_dir = hypfs_mkdir(sb, root, "systems");
+ if (IS_ERR(systems_dir))
+ return PTR_ERR(systems_dir);
+ time_hdr = (struct x_info_blk_hdr *)buffer;
+ part_hdr = time_hdr + info_blk_hdr__size(diag204_info_type);
+ for (i = 0; i < info_blk_hdr__npar(diag204_info_type, time_hdr); i++) {
+ part_hdr = hypfs_create_lpar_files(sb, systems_dir, part_hdr);
+ if (IS_ERR(part_hdr))
+ return PTR_ERR(part_hdr);
+ }
+ if (info_blk_hdr__flags(diag204_info_type, time_hdr) & LPAR_PHYS_FLG) {
+ void *rc;
+ rc = hypfs_create_phys_files(sb, root, part_hdr);
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ }
+ hyp_dir = hypfs_mkdir(sb, root, "hyp");
+ if (IS_ERR(hyp_dir))
+ return PTR_ERR(hyp_dir);
+ rc = hypfs_create_str(sb, hyp_dir, "type", "LPAR Hypervisor");
+ if (IS_ERR(rc))
+ return PTR_ERR(rc);
+ kfree(buffer);
+ return 0;
+}
diff -urpN linux-2.6.16/arch/s390/hypfs/hypfs_diag.h linux-2.6.16-hypfs/arch/s390/hypfs/hypfs_diag.h
--- linux-2.6.16/arch/s390/hypfs/hypfs_diag.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/arch/s390/hypfs/hypfs_diag.h 2006-04-24 18:25:35.000000000 +0200
@@ -0,0 +1,16 @@
+/*
+ * fs/hypfs/hypfs_diag.h
+ * Hypervisor filesystem for Linux on zSeries.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#ifndef _HYPFS_DIAG_H_
+#define _HYPFS_DIAG_H_
+
+extern int diag_init(void);
+extern void diag_exit(void);
+extern int diag_create_files(struct super_block *sb, struct dentry *root);
+
+#endif /* _HYPFS_DIAG_H_ */
diff -urpN linux-2.6.16/arch/s390/hypfs/inode.c linux-2.6.16-hypfs/arch/s390/hypfs/inode.c
--- linux-2.6.16/arch/s390/hypfs/inode.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.16-hypfs/arch/s390/hypfs/inode.c 2006-04-24 18:25:35.000000000 +0200
@@ -0,0 +1,467 @@
+/*
+ * fs/hypfs/inode.c
+ * Hypervisor filesystem for Linux on zSeries.
+ *
+ * Copyright (C) IBM Corp. 2006
+ * Author(s): Michael Holzheu <holzheu@de.ibm.com>
+ */
+
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/vfs.h>
+#include <linux/pagemap.h>
+#include <linux/gfp.h>
+#include <linux/time.h>
+#include <linux/parser.h>
+#include <linux/sysfs.h>
+#include <linux/module.h>
+#include <asm/ebcdic.h>
+#include "hypfs.h"
+#include "hypfs_diag.h"
+
+#define HYPFS_MAGIC 0x687970 /* ASCII 'hyp' */
+#define TMP_SIZE 1024 /* size of temorary buffers */
+#define UPDATE_DATA_SIZE 32 /* size of update file inode buffer */
+
+static struct dentry *hypfs_create_update_file(struct super_block *sb,
+ struct dentry *dir);
+
+struct hypfs_sb_info {
+ int uid; /* uid used for files and dirs */
+ int gid; /* gid used for files and dirs */
+};
+
+struct super_block *hypfs_sblk;
+static struct dentry *update_file_dentry;
+static struct file_operations hypfs_file_ops;
+static struct file_system_type hypfs_type;
+static struct super_operations hypfs_s_ops;
+static time_t last_update_time = 0; /* update time in seconds since 1970 */
+static DEFINE_MUTEX(hypfs_lock);
+
+/* start of list of all dentries, which have to be deleted on update */
+static struct dentry *hypfs_last_dentry;
+
+static void hypfs_update_update(void)
+{
+ struct inode* inode = update_file_dentry->d_inode;
+
+ last_update_time = get_seconds();
+ snprintf((char *)inode->u.generic_ip,
+ UPDATE_DATA_SIZE, "%ld\n", last_update_time);
+ inode->i_size = strlen((char *)inode->u.generic_ip);
+ inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+}
+
+/* directory tree removal functions */
+
+static void hypfs_add_dentry(struct dentry *dentry)
+{
+ dentry->d_fsdata = hypfs_last_dentry;
+ hypfs_last_dentry = dentry;
+}
+
+static void hypfs_delete_tree(struct dentry *root)
+{
+ while (hypfs_last_dentry) {
+ struct dentry *parent, *next_dentry;
+
+ parent = hypfs_last_dentry->d_parent;
+ if (S_ISDIR(hypfs_last_dentry->d_inode->i_mode))
+ simple_rmdir(parent->d_inode, hypfs_last_dentry);
+ else
+ simple_unlink(parent->d_inode, hypfs_last_dentry);
+ d_delete(hypfs_last_dentry);
+ next_dentry = hypfs_last_dentry->d_fsdata;
+ dput(hypfs_last_dentry);
+ hypfs_last_dentry = next_dentry;
+ }
+}
+
+static struct inode *hypfs_make_inode(struct super_block *sb, int mode)
+{
+ struct inode *ret = new_inode(sb);
+
+ if (ret) {
+ struct hypfs_sb_info *hypfs_info = sb->s_fs_info;
+ ret->i_mode = mode;
+ ret->i_uid = hypfs_info->uid;
+ ret->i_gid = hypfs_info->gid;
+ ret->i_blksize = PAGE_CACHE_SIZE;
+ ret->i_blocks = 0;
+ ret->i_atime = ret->i_mtime = ret->i_ctime = CURRENT_TIME;
+ if (mode & S_IFDIR)
+ ret->i_nlink = 2;
+ else
+ ret->i_nlink = 1;
+ }
+ return ret;
+}
+
+static void hypfs_drop_inode(struct inode *inode)
+{
+ kfree(inode->u.generic_ip);
+ generic_delete_inode(inode);
+}
+
+static int hypfs_open(struct inode *inode, struct file *filp)
+{
+ char *data = (char *)filp->f_dentry->d_inode->u.generic_ip;
+
+ if ((filp->f_mode & FMODE_WRITE) &&
+ (filp->f_dentry != update_file_dentry))
+ return -EPERM;
+ mutex_lock(&hypfs_lock);
+ filp->private_data = kmalloc(strlen(data) + 1, GFP_KERNEL);
+ if (!filp->private_data) {
+ mutex_unlock(&hypfs_lock);
+ return -ENOMEM;
+ }
+ strcpy(filp->private_data, data);
+ mutex_unlock(&hypfs_lock);
+ return 0;
+}
+
+static ssize_t hypfs_aio_read(struct kiocb *iocb, __user char *buf,
+ size_t count, loff_t offset)
+{
+ char *data;
+ size_t len;
+ struct file *filp = iocb->ki_filp;
+
+ data = (char *)filp->private_data;
+ len = strlen(data);
+ if (offset > len) {
+ count = 0;
+ goto out;
+ }
+ if (count > len - offset)
+ count = len - offset;
+ if (copy_to_user(buf, data + offset, count)) {
+ count = -EFAULT;
+ goto out;
+ }
+ iocb->ki_pos += count;
+ file_accessed(filp);
+ out:
+ return count;
+}
+
+static ssize_t hypfs_aio_write(struct kiocb *iocb, const char __user *buf,
+ size_t count, loff_t pos)
+{
+ int rc;
+
+ mutex_lock(&hypfs_lock);
+ if (last_update_time == get_seconds()) {
+ rc = -EBUSY;
+ goto out;
+ }
+ hypfs_delete_tree(hypfs_sblk->s_root);
+ rc = diag_create_files(hypfs_sblk, hypfs_sblk->s_root);
+ if (rc) {
+ printk(KERN_ERR "hypfs: Update failed\n");
+ hypfs_delete_tree(hypfs_sblk->s_root);
+ goto out;
+ }
+ hypfs_update_update();
+ rc = count;
+ out:
+ mutex_unlock(&hypfs_lock);
+ return rc;
+}
+
+static int hypfs_release(struct inode *inode, struct file *filp)
+{
+ kfree(filp->private_data);
+ return 0;
+}
+
+enum { opt_uid, opt_gid, opt_err };
+
+static match_table_t hypfs_tokens = {
+ {opt_uid, "uid=%u"},
+ {opt_gid, "gid=%u"},
+ {opt_err, NULL}
+};
+
+static int hypfs_parse_options(char *options)
+{
+ char *str;
+ substring_t args[MAX_OPT_ARGS];
+
+ if (!options)
+ return 0;
+ while ((str = strsep(&options, ",")) != NULL) {
+ int token, option;
+ struct hypfs_sb_info *hypfs_info = hypfs_sblk->s_fs_info;
+
+ if (!*str)
+ continue;
+ token = match_token(str, hypfs_tokens, args);
+ switch (token) {
+ case opt_uid:
+ if (match_int(&args[0], &option))
+ return -EINVAL;
+ hypfs_info->uid = option;
+ break;
+ case opt_gid:
+ if (match_int(&args[0], &option))
+ return -EINVAL;
+ hypfs_info->gid = option;
+ break;
+ case opt_err:
+ default:
+ printk(KERN_ERR "hypfs: Unrecognized mount option "
+ "\"%s\" or missing value\n", str);
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
+static int hypfs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct inode *root_inode;
+ int rc = 0;
+ struct hypfs_sb_info *sbi;
+
+ sbi = kmalloc(sizeof(struct hypfs_sb_info), GFP_KERNEL);
+ if (!sbi)
+ return -ENOMEM;
+ sbi->uid = current->uid;
+ sbi->gid = current->gid;
+ sb->s_fs_info = sbi;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = HYPFS_MAGIC;
+ sb->s_op = &hypfs_s_ops;
+ if (hypfs_parse_options(data)) {
+ rc = -EINVAL;
+ goto err_alloc;
+ }
+ root_inode = hypfs_make_inode(sb, S_IFDIR | 0755);
+ if (!root_inode) {
+ rc = -ENOMEM;
+ goto err_alloc;
+ }
+ root_inode->i_op = &simple_dir_inode_operations;
+ root_inode->i_fop = &simple_dir_operations;
+ sb->s_root = d_alloc_root(root_inode);
+ if (!sb->s_root) {
+ rc = -ENOMEM;
+ goto err_inode;
+ }
+ hypfs_sblk = sb;
+ rc = diag_create_files(hypfs_sblk, hypfs_sblk->s_root);
+ if (rc)
+ goto err_tree;
+ update_file_dentry = hypfs_create_update_file(hypfs_sblk,
+ hypfs_sblk->s_root);
+ if (IS_ERR(update_file_dentry)) {
+ rc = PTR_ERR(update_file_dentry);
+ goto err_tree;
+ }
+ hypfs_update_update();
+ return 0;
+
+ err_tree:
+ hypfs_delete_tree(hypfs_sblk->s_root);
+ dput(hypfs_sblk->s_root);
+ err_inode:
+ iput(root_inode);
+ err_alloc:
+ kfree(sbi);
+ return rc;
+}
+
+static void hypfs_put_super(struct super_block *sb)
+{
+ kfree(sb->s_fs_info);
+}
+
+static struct super_block *hypfs_get_super(struct file_system_type *fst,
+ int flags, const char *devname,
+ void *data)
+{
+ return get_sb_single(fst, flags, data, hypfs_fill_super);
+}
+
+static struct dentry *hypfs_create_file(struct super_block *sb,
+ struct dentry *parent, const char *name,
+ char *data, mode_t mode)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+ struct qstr qname;
+
+ qname.name = name;
+ qname.len = strlen(name);
+ qname.hash = full_name_hash(name, qname.len);
+ dentry = lookup_one_len(name, parent, strlen(name));
+ if (IS_ERR(dentry))
+ return ERR_PTR(-ENOMEM);
+ inode = hypfs_make_inode(sb, mode);
+ if (!inode) {
+ dput(dentry);
+ return ERR_PTR(-ENOMEM);
+ }
+ if (mode & S_IFREG) {
+ inode->i_fop = &hypfs_file_ops;
+ inode->i_size = strlen(data);
+ } else if (mode & S_IFDIR) {
+ inode->i_op = &simple_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+ parent->d_inode->i_nlink++;
+ } else
+ BUG();
+ inode->u.generic_ip = data;
+ d_instantiate(dentry, inode);
+ return dentry;
+}
+
+struct dentry *hypfs_mkdir(struct super_block *sb, struct dentry *parent,
+ const char *name)
+{
+ struct dentry *dentry;
+
+ dentry = hypfs_create_file(sb, parent, name, NULL, S_IFDIR | DIR_MODE);
+ if (IS_ERR(dentry))
+ return dentry;
+ hypfs_add_dentry(dentry);
+ parent->d_inode->i_nlink++;
+ return dentry;
+}
+
+static struct dentry *hypfs_create_update_file(struct super_block *sb,
+ struct dentry *dir)
+{
+ char *buffer;
+ struct dentry *dentry;
+
+ buffer = kmalloc(UPDATE_DATA_SIZE, GFP_KERNEL);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+ snprintf(buffer, UPDATE_DATA_SIZE, "%d\n", 0);
+ dentry = hypfs_create_file(sb, dir, "update", buffer,
+ S_IFREG | UPDATE_FILE_MODE);
+ if (!dentry) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+ /*
+ * We do not put the update file on the 'delete' list with
+ * hypfs_add_dentry(), since it should not be removed when the tree
+ * is updated.
+ */
+ return dentry;
+}
+
+struct dentry *hypfs_create_u64(struct super_block *sb, struct dentry *dir,
+ const char *name, __u64 value)
+{
+ char *buffer;
+ char tmp[TMP_SIZE];
+ struct dentry *dentry;
+
+ snprintf(tmp, TMP_SIZE, "%lld\n", (unsigned long long int)value);
+ buffer = kmalloc(strlen(tmp) + 1, GFP_KERNEL);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+ strcpy(buffer, tmp);
+ dentry =
+ hypfs_create_file(sb, dir, name, buffer, S_IFREG | REG_FILE_MODE);
+ if (!dentry) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+ hypfs_add_dentry(dentry);
+ return dentry;
+}
+
+struct dentry *hypfs_create_str(struct super_block *sb, struct dentry *dir,
+ const char *name, char *string)
+{
+ char *buffer;
+ struct dentry *dentry;
+
+ buffer = kmalloc(strlen(string) + 2, GFP_KERNEL);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+ sprintf(buffer, "%s\n", string);
+ dentry =
+ hypfs_create_file(sb, dir, name, buffer, S_IFREG | REG_FILE_MODE);
+ if (!dentry) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+ hypfs_add_dentry(dentry);
+ return dentry;
+}
+
+static struct file_operations hypfs_file_ops = {
+ .open = hypfs_open,
+ .release = hypfs_release,
+ .read = do_sync_read,
+ .write = do_sync_write,
+ .aio_read = hypfs_aio_read,
+ .aio_write = hypfs_aio_write,
+};
+
+static struct file_system_type hypfs_type = {
+ .owner = THIS_MODULE,
+ .name = "hypfs",
+ .get_sb = hypfs_get_super,
+ .kill_sb = kill_litter_super
+};
+
+static struct super_operations hypfs_s_ops = {
+ .statfs = simple_statfs,
+ .drop_inode = hypfs_drop_inode,
+ .put_super = hypfs_put_super
+};
+
+static decl_subsys(hypervisor, NULL, NULL);
+
+static int __init hypfs_init(void)
+{
+ int rc;
+
+ if (MACHINE_IS_VM)
+ return -ENODATA;
+ if (diag_init()) {
+ rc = -ENODATA;
+ goto err_msg;
+ }
+ rc = subsystem_register(&hypervisor_subsys);
+ if (rc)
+ goto err_diag;
+ rc = register_filesystem(&hypfs_type);
+ if (rc)
+ goto err_sysfs;
+ return 0;
+
+ err_sysfs:
+ subsystem_unregister(&hypervisor_subsys);
+ err_diag:
+ diag_exit();
+ err_msg:
+ printk(KERN_ERR "hypfs: init failed (rc = %i).\n", rc);
+ return rc;
+}
+
+static void __exit hypfs_exit(void)
+{
+ diag_exit();
+ unregister_filesystem(&hypfs_type);
+ subsystem_unregister(&hypervisor_subsys);
+}
+
+module_init(hypfs_init)
+module_exit(hypfs_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Michael Holzheu <holzheu@de.ibm.com>");
+MODULE_DESCRIPTION("zSeries Hypervisor Filesystem");
diff -urpN linux-2.6.16/include/linux/string.h linux-2.6.16-hypfs/include/linux/string.h
--- linux-2.6.16/include/linux/string.h 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-hypfs/include/linux/string.h 2006-04-24 18:27:07.000000000 +0200
@@ -59,6 +59,9 @@ extern char * strnchr(const char *, size
#ifndef __HAVE_ARCH_STRRCHR
extern char * strrchr(const char *,int);
#endif
+#ifndef __HAVE_ARCH_STRRTRIM
+extern void strrtrim(char *, const char *);
+#endif
#ifndef __HAVE_ARCH_STRSTR
extern char * strstr(const char *,const char *);
#endif
diff -urpN linux-2.6.16/lib/string.c linux-2.6.16-hypfs/lib/string.c
--- linux-2.6.16/lib/string.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-hypfs/lib/string.c 2006-04-24 18:26:46.000000000 +0200
@@ -284,6 +284,28 @@ char *strrchr(const char *s, int c)
EXPORT_SYMBOL(strrchr);
#endif
+#ifndef __HAVE_ARCH_STRRTRIM
+/**
+ * strrtrim - Remove trailing characters specified in @reject
+ * @s: The string to be searched
+ * @reject: The string of letters to avoid
+ */
+void strrtrim(char *s, const char *reject)
+{
+ char *p;
+ const char *r;
+
+ for (p = s + strlen(s) - 1; s <= p; p--) {
+ for (r = reject; (*r != '\0') && (*p != *r); r++)
+ /* nothing */;
+ if (*r == '\0')
+ break;
+ }
+ *(p + 1) = '\0';
+}
+EXPORT_SYMBOL(strrtrim);
+#endif
+
#ifndef __HAVE_ARCH_STRNCHR
/**
* strnchr - Find a character in a length limited string
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-24 17:19 Michael Holzheu
@ 2006-04-25 6:58 ` Pekka Enberg
2006-04-25 7:32 ` Pekka Enberg
2006-04-25 14:33 ` Pekka Enberg
1 sibling, 1 reply; 32+ messages in thread
From: Pekka Enberg @ 2006-04-25 6:58 UTC (permalink / raw)
To: Michael Holzheu; +Cc: ioe-lkml, linux-kernel, mschwid2, joern
On Mon, 2006-04-24 at 19:19 +0200, Michael Holzheu wrote:
> +#ifndef __HAVE_ARCH_STRRTRIM
> +/**
> + * strrtrim - Remove trailing characters specified in @reject
> + * @s: The string to be searched
> + * @reject: The string of letters to avoid
> + */
> +void strrtrim(char *s, const char *reject)
I think this should return s to be consistent with other string API
functions.
> +{
> + char *p;
> + const char *r;
You need to exit here if strlen(s) is zero.
> +
> + for (p = s + strlen(s) - 1; s <= p; p--) {
> + for (r = reject; (*r != '\0') && (*p != *r); r++)
> + /* nothing */;
Please use strchr for the above.
> + if (*r == '\0')
> + break;
What are you testing here?
> + }
> + *(p + 1) = '\0';
> +}
> +EXPORT_SYMBOL(strrtrim);
> +#endif
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 6:58 ` Pekka Enberg
@ 2006-04-25 7:32 ` Pekka Enberg
2006-04-25 7:47 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-25 7:32 UTC (permalink / raw)
To: Michael Holzheu; +Cc: ioe-lkml, linux-kernel, mschwid2, joern
On Mon, 2006-04-24 at 19:19 +0200, Michael Holzheu wrote:
> > +#ifndef __HAVE_ARCH_STRRTRIM
> > +/**
> > + * strrtrim - Remove trailing characters specified in @reject
> > + * @s: The string to be searched
> > + * @reject: The string of letters to avoid
> > + */
> > +void strrtrim(char *s, const char *reject)
On Tue, 2006-04-25 at 09:58 +0300, Pekka Enberg wrote:
> I think this should return s to be consistent with other string API
> functions.
Hmm, thinking about this, I think a better API would be to not have that
reject parameter at all. Would something like this be accetable for your
use?
Pekka
diff --git a/include/linux/string.h b/include/linux/string.h
index c61306d..3d66cae 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -56,6 +56,9 @@ #endif
#ifndef __HAVE_ARCH_STRRCHR
extern char * strrchr(const char *,int);
#endif
+#ifndef __HAVE_ARCH_STRSTRIP
+extern char * strstrip(char *);
+#endif
#ifndef __HAVE_ARCH_STRSTR
extern char * strstr(const char *,const char *);
#endif
diff --git a/lib/string.c b/lib/string.c
index 064f631..5b4257d 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -301,6 +301,38 @@ char *strnchr(const char *s, size_t coun
EXPORT_SYMBOL(strnchr);
#endif
+#ifndef __HAVE_ARCH_STRSTRIP
+/**
+ * strstrip - Removes leading and trailing whitespace from @s.
+ * @s: The string to be stripped.
+ *
+ * Note that the first trailing whitespace is replaced with a %NUL-terminator
+ * in the given string @s. Returns a pointer to the first non-whitespace
+ * character in @s.
+ */
+char *strstrip(char *s)
+{
+ size_t size;
+ char *end;
+
+ size = strlen(s);
+
+ if (!size)
+ return s;
+
+ end = s + size - 1;
+ while (end != s && isspace(*end))
+ end--;
+ *(end + 1) = '\0';
+
+ while (*s && isspace(*s))
+ s++;
+
+ return s;
+}
+EXPORT_SYMBOL(strstrip);
+#endif
+
#ifndef __HAVE_ARCH_STRLEN
/**
* strlen - Find the length of a string
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 7:32 ` Pekka Enberg
@ 2006-04-25 7:47 ` Andrew Morton
2006-04-25 8:01 ` Pekka Enberg
2006-04-25 8:52 ` Jörn Engel
2006-04-25 12:22 ` Michael Holzheu
2 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2006-04-25 7:47 UTC (permalink / raw)
To: Pekka Enberg; +Cc: holzheu, ioe-lkml, linux-kernel, mschwid2, joern
Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>
> +#ifndef __HAVE_ARCH_STRSTRIP
> +extern char * strstrip(char *);
> +#endif
Do we really need this gunk? It's not as if strstrip() is so super
performance-sensitive that anyone would go and write a hand-tuned assembly
version?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 7:47 ` Andrew Morton
@ 2006-04-25 8:01 ` Pekka Enberg
0 siblings, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-25 8:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: holzheu, ioe-lkml, linux-kernel, mschwid2, joern
Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> > +#ifndef __HAVE_ARCH_STRSTRIP
> > +extern char * strstrip(char *);
> > +#endif
On Tue, 2006-04-25 at 00:47 -0700, Andrew Morton wrote:
> Do we really need this gunk? It's not as if strstrip() is so super
> performance-sensitive that anyone would go and write a hand-tuned assembly
> version?
I guess not. I added it for consistency, but whatever makes you happy
Andrew :)
Pekka
diff --git a/include/linux/string.h b/include/linux/string.h
index c61306d..e4c7558 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -56,6 +56,7 @@ #endif
#ifndef __HAVE_ARCH_STRRCHR
extern char * strrchr(const char *,int);
#endif
+extern char * strstrip(char *);
#ifndef __HAVE_ARCH_STRSTR
extern char * strstr(const char *,const char *);
#endif
diff --git a/lib/string.c b/lib/string.c
index 064f631..6307726 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -301,6 +301,36 @@ char *strnchr(const char *s, size_t coun
EXPORT_SYMBOL(strnchr);
#endif
+/**
+ * strstrip - Removes leading and trailing whitespace from @s.
+ * @s: The string to be stripped.
+ *
+ * Note that the first trailing whitespace is replaced with a %NUL-terminator
+ * in the given string @s. Returns a pointer to the first non-whitespace
+ * character in @s.
+ */
+char *strstrip(char *s)
+{
+ size_t size;
+ char *end;
+
+ size = strlen(s);
+
+ if (!size)
+ return s;
+
+ end = s + size - 1;
+ while (end != s && isspace(*end))
+ end--;
+ *(end + 1) = '\0';
+
+ while (*s && isspace(*s))
+ s++;
+
+ return s;
+}
+EXPORT_SYMBOL(strstrip);
+
#ifndef __HAVE_ARCH_STRLEN
/**
* strlen - Find the length of a string
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 7:32 ` Pekka Enberg
2006-04-25 7:47 ` Andrew Morton
@ 2006-04-25 8:52 ` Jörn Engel
2006-04-25 9:00 ` Pekka Enberg
2006-04-25 12:22 ` Michael Holzheu
2 siblings, 1 reply; 32+ messages in thread
From: Jörn Engel @ 2006-04-25 8:52 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Michael Holzheu, ioe-lkml, linux-kernel, mschwid2
On Tue, 25 April 2006 10:32:16 +0300, Pekka Enberg wrote:
>
> Hmm, thinking about this, I think a better API would be to not have that
> reject parameter at all. Would something like this be accetable for your
> use?
[...]
> + while (*s && isspace(*s))
> + s++;
Just checking. This does remove a newline, right?
Jörn
--
When in doubt, punt. When somebody actually complains, go back and fix it...
The 90% solution is a good thing.
-- Rob Landley
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 8:52 ` Jörn Engel
@ 2006-04-25 9:00 ` Pekka Enberg
0 siblings, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-25 9:00 UTC (permalink / raw)
To: Jörn Engel; +Cc: Michael Holzheu, ioe-lkml, linux-kernel, mschwid2
On Tue, 25 April 2006 10:32:16 +0300, Pekka Enberg wrote:
> > Hmm, thinking about this, I think a better API would be to not have that
> > reject parameter at all. Would something like this be accetable for your
> > use?
On Tue, 2006-04-25 at 10:52 +0200, Jörn Engel wrote:
> [...]
>
> > + while (*s && isspace(*s))
> > + s++;
>
> Just checking. This does remove a newline, right?
Yes, it does.
Pekka
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 7:32 ` Pekka Enberg
2006-04-25 7:47 ` Andrew Morton
2006-04-25 8:52 ` Jörn Engel
@ 2006-04-25 12:22 ` Michael Holzheu
2 siblings, 0 replies; 32+ messages in thread
From: Michael Holzheu @ 2006-04-25 12:22 UTC (permalink / raw)
To: Pekka Enberg; +Cc: ioe-lkml, joern, linux-kernel, mschwid2, akpm
Pekka Enberg <penberg@cs.helsinki.fi> wrote on 04/25/2006 09:32:16 AM:
> On Mon, 2006-04-24 at 19:19 +0200, Michael Holzheu wrote:
> > > +#ifndef __HAVE_ARCH_STRRTRIM
> > > +/**
> > > + * strrtrim - Remove trailing characters specified in @reject
> > > + * @s: The string to be searched
> > > + * @reject: The string of letters to avoid
> > > + */
> > > +void strrtrim(char *s, const char *reject)
>
> On Tue, 2006-04-25 at 09:58 +0300, Pekka Enberg wrote:
> > I think this should return s to be consistent with other string API
> > functions.
>
> Hmm, thinking about this, I think a better API would be to not have that
> reject parameter at all. Would something like this be accetable for your
> use?
Yes, strtrip() will work for us! If Andrew takes it, I will use it in my
patch!
Thanks
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-24 17:19 Michael Holzheu
2006-04-25 6:58 ` Pekka Enberg
@ 2006-04-25 14:33 ` Pekka Enberg
2006-04-25 14:45 ` Jörn Engel
2006-04-26 9:30 ` Michael Holzheu
1 sibling, 2 replies; 32+ messages in thread
From: Pekka Enberg @ 2006-04-25 14:33 UTC (permalink / raw)
To: Michael Holzheu; +Cc: ioe-lkml, linux-kernel, mschwid2, joern
On Mon, 2006-04-24 at 19:19 +0200, Michael Holzheu wrote:
> +static int hypfs_create_cpu_files(struct super_block *sb,
> + struct dentry *cpus_dir, void *cpu_info)
> +{
> + struct dentry *cpu_dir;
> + char buffer[TMP_SIZE];
Holy cow! That's 1 KB allocated on the stack! Please use kmalloc()
instead.
> +static int hypfs_create_phys_cpu_files(struct super_block *sb,
> + struct dentry *cpus_dir, void *cpu_info)
> +{
> + struct dentry *cpu_dir;
> + char buffer[TMP_SIZE];
Ditto.
> +static ssize_t hypfs_aio_write(struct kiocb *iocb, const char __user *buf,
> + size_t count, loff_t pos)
> +{
> + int rc;
> +
> + mutex_lock(&hypfs_lock);
> + if (last_update_time == get_seconds()) {
> + rc = -EBUSY;
> + goto out;
> + }
> + hypfs_delete_tree(hypfs_sblk->s_root);
To state what I said earlier: the use of a global hypfs_sblk is
problematic because now we can only have the filesystem mounted once. So
I would really like to see some other way of updating. How do you feel
about the s_ops->fs_remount thing?
Pekka
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 14:33 ` Pekka Enberg
@ 2006-04-25 14:45 ` Jörn Engel
2006-04-26 9:30 ` Michael Holzheu
1 sibling, 0 replies; 32+ messages in thread
From: Jörn Engel @ 2006-04-25 14:45 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Michael Holzheu, ioe-lkml, linux-kernel, mschwid2
On Tue, 25 April 2006 17:33:01 +0300, Pekka Enberg wrote:
>
> To state what I said earlier: the use of a global hypfs_sblk is
> problematic because now we can only have the filesystem mounted once. So
> I would really like to see some other way of updating. How do you feel
> about the s_ops->fs_remount thing?
It sounds as if part of hypfs is a global resource. Maybe the code
could be split between a global get-and-store-data part and a
per-filesystem representation part?
Jörn
--
Invincibility is in oneself, vulnerability is in the opponent.
-- Sun Tzu
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH/RFC] s390: Hypervisor File System
2006-04-25 14:33 ` Pekka Enberg
2006-04-25 14:45 ` Jörn Engel
@ 2006-04-26 9:30 ` Michael Holzheu
1 sibling, 0 replies; 32+ messages in thread
From: Michael Holzheu @ 2006-04-26 9:30 UTC (permalink / raw)
To: Pekka Enberg; +Cc: ioe-lkml, joern, linux-kernel, mschwid2
Hi Pekka,
Pekka Enberg <penberg@cs.helsinki.fi> wrote on 04/25/2006 04:33:01 PM:
> On Mon, 2006-04-24 at 19:19 +0200, Michael Holzheu wrote:
> > +static int hypfs_create_cpu_files(struct super_block *sb,
> > + struct dentry *cpus_dir, void *cpu_info)
> > +{
> > + struct dentry *cpu_dir;
> > + char buffer[TMP_SIZE];
>
> Holy cow! That's 1 KB allocated on the stack! Please use kmalloc()
> instead.
Thanks! We also have found that already! Shame on me ...
> > +static ssize_t hypfs_aio_write(struct kiocb *iocb, const char __user
*buf,
> > + size_t count, loff_t pos)
> > +{
> > + int rc;
> > +
> > + mutex_lock(&hypfs_lock);
> > + if (last_update_time == get_seconds()) {
> > + rc = -EBUSY;
> > + goto out;
> > + }
> > + hypfs_delete_tree(hypfs_sblk->s_root);
>
> To state what I said earlier: the use of a global hypfs_sblk is
> problematic because now we can only have the filesystem mounted once. So
> I would really like to see some other way of updating. How do you feel
> about the s_ops->fs_remount thing?
>
I will eliminate the global variable anyway, since I can
get the superblock also from the inode:
+static ssize_t hypfs_aio_write(struct kiocb *iocb, const char __user *buf,
+ size_t count, loff_t pos)
+{
+ int rc;
+ struct super_block *sb;
+
+ mutex_lock(&hypfs_lock);
+ sb = iocb->ki_filp->f_dentry->d_inode->i_sb;
+ if (last_update_time == get_seconds()) {
+ rc = -EBUSY;
+ goto out;
+ }
+ hypfs_delete_tree(sb->s_root);
+ rc = diag_create_files(sb, sb->s_root);
+ if (rc) {
+ printk(KERN_ERR "hypfs: Update failed\n");
+ hypfs_delete_tree(sb->s_root);
+ goto out;
+ }
+ hypfs_update_update();
+ rc = count;
+ out:
+ mutex_unlock(&hypfs_lock);
+ return rc;
+}
Hmmm... sure, you could use remount to trigger the update.
One advantage of remount is that we do not need the hypfs_lock,
right. But having the hypfs lock needs only three lines
of code and therefore this is not really a strong argument
in my eyes.
I think one disadvantage of the remount mechanism is that it
is not as intuitive as the update attribute. If you have
an update file, it is clear that writing to that file
triggers an update. That a remount triggers an update
seems to be less intuitive for me.
But the bigger disadvantage is that only root (or any user
if you specify 'user' in /etc/fstab) can trigger
the update with remount. We want to allow also other users
to trigger the update and read the attributes. The current
implementation for that is to specify uid and gid as mount
option to define the owner of the files. For example you
can run a user space program with uid xxx and specify
uid=xxx as mount option to allow the program to update
and read the data.
Michael
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2006-04-26 9:30 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-21 11:35 [PATCH/RFC] s390: Hypervisor File System Michael Holzheu
2006-04-21 11:53 ` Pekka Enberg
2006-04-21 13:56 ` Michael Holzheu
2006-04-21 14:55 ` Arnd Bergmann
2006-04-21 15:31 ` Michael Holzheu
2006-04-21 13:32 ` Pekka Enberg
2006-04-21 14:08 ` Michael Holzheu
2006-04-21 15:38 ` Pekka Enberg
2006-04-21 16:40 ` Martin Schwidefsky
2006-04-25 14:04 ` Pekka Enberg
2006-04-21 14:42 ` Pekka Enberg
2006-04-21 14:59 ` Michael Holzheu
2006-04-21 15:41 ` Pekka Enberg
2006-04-21 15:18 ` Jörn Engel
2006-04-21 15:36 ` Michael Holzheu
2006-04-21 15:46 ` Jörn Engel
2006-04-21 22:30 ` Ingo Oeser
2006-04-24 17:17 ` Michael Holzheu
2006-04-24 19:57 ` Ingo Oeser
2006-04-25 6:27 ` Pekka Enberg
2006-04-24 17:19 ` Michael Holzheu
-- strict thread matches above, loose matches on Subject: below --
2006-04-24 17:19 Michael Holzheu
2006-04-25 6:58 ` Pekka Enberg
2006-04-25 7:32 ` Pekka Enberg
2006-04-25 7:47 ` Andrew Morton
2006-04-25 8:01 ` Pekka Enberg
2006-04-25 8:52 ` Jörn Engel
2006-04-25 9:00 ` Pekka Enberg
2006-04-25 12:22 ` Michael Holzheu
2006-04-25 14:33 ` Pekka Enberg
2006-04-25 14:45 ` Jörn Engel
2006-04-26 9:30 ` Michael Holzheu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox