* [PATCH 0/5] Allow more than PAGESIZE data read in configfs
@ 2006-10-10 18:20 Chandra Seetharaman
2006-10-10 18:20 ` [PATCH 1/5] Fix a module count leak Chandra Seetharaman
` (5 more replies)
0 siblings, 6 replies; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-10 18:20 UTC (permalink / raw)
To: akpm, Joel.Becker; +Cc: ckrm-tech, Chandra Seetharaman, linux-kernel
Currently, maximum amount of data that can be read from a configfs
attribute file is limited to PAGESIZE bytes. This is a limitation for
some of the usages of configfs.
This patchset removes that limitaion by using seq_file for read operations
instead of using locally allocated buffer.
Tested the interface change with configfs_example.c in the Documentation
directory.
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 1/5] Fix a module count leak.
2006-10-10 18:20 [PATCH 0/5] Allow more than PAGESIZE data read in configfs Chandra Seetharaman
@ 2006-10-10 18:20 ` Chandra Seetharaman
2006-10-10 22:17 ` Joel Becker
2006-10-10 18:20 ` [PATCH 2/5] Use seq_file for read side of operations Chandra Seetharaman
` (4 subsequent siblings)
5 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-10 18:20 UTC (permalink / raw)
To: akpm, Joel.Becker; +Cc: linux-kernel, Chandra Seetharaman, ckrm-tech
check_perm() does not drop the reference to the module when kmalloc()
failure occurs. This patch fixes the problem.
Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
--
fs/configfs/file.c | 16 +++++++++-------
1 files changed, 9 insertions(+), 7 deletions(-)
Index: linux-2.6.18/fs/configfs/file.c
===================================================================
--- linux-2.6.18.orig/fs/configfs/file.c
+++ linux-2.6.18/fs/configfs/file.c
@@ -275,14 +275,15 @@ static int check_perm(struct inode * ino
* it in file->private_data for easy access.
*/
buffer = kmalloc(sizeof(struct configfs_buffer),GFP_KERNEL);
- if (buffer) {
- memset(buffer,0,sizeof(struct configfs_buffer));
- init_MUTEX(&buffer->sem);
- buffer->needs_read_fill = 1;
- buffer->ops = ops;
- file->private_data = buffer;
- } else
+ if (!buffer) {
error = -ENOMEM;
+ goto Enomem;
+ }
+ memset(buffer,0,sizeof(struct configfs_buffer));
+ init_MUTEX(&buffer->sem);
+ buffer->needs_read_fill = 1;
+ buffer->ops = ops;
+ file->private_data = buffer;
goto Done;
Einval:
@@ -290,6 +291,7 @@ static int check_perm(struct inode * ino
goto Done;
Eaccess:
error = -EACCES;
+ Enomem:
module_put(attr->ca_owner);
Done:
if (error && item)
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 2/5] Use seq_file for read side of operations
2006-10-10 18:20 [PATCH 0/5] Allow more than PAGESIZE data read in configfs Chandra Seetharaman
2006-10-10 18:20 ` [PATCH 1/5] Fix a module count leak Chandra Seetharaman
@ 2006-10-10 18:20 ` Chandra Seetharaman
2006-10-11 9:12 ` Joel Becker
2006-10-10 18:21 ` [PATCH 3/5] Change configfs_example.c to use the new interface Chandra Seetharaman
` (3 subsequent siblings)
5 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-10 18:20 UTC (permalink / raw)
To: akpm, Joel.Becker; +Cc: ckrm-tech, linux-kernel, Chandra Seetharaman
configfs currently has a limitation that the kernel can send only
PAGESIZE of data through a attribute.
This patch removes that limitation by using seq_file in the read side of
operations.
Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
--
fs/configfs/file.c | 130 ++++++++++-------------------------------------
include/linux/configfs.h | 3 -
2 files changed, 31 insertions(+), 102 deletions(-)
Index: linux-2.6.18/fs/configfs/file.c
===================================================================
--- linux-2.6.18.orig/fs/configfs/file.c
+++ linux-2.6.18/fs/configfs/file.c
@@ -40,112 +40,33 @@ struct configfs_buffer {
char * page;
struct configfs_item_operations * ops;
struct semaphore sem;
- int needs_read_fill;
+ struct dentry * dentry;
};
/**
- * fill_read_buffer - allocate and fill buffer from item.
- * @dentry: dentry pointer.
- * @buffer: data buffer for file.
- *
- * Allocate @buffer->page, if it hasn't been already, then call the
- * config_item's show() method to fill the buffer with this attribute's
- * data.
- * This is called only once, on the file's first read.
- */
-static int fill_read_buffer(struct dentry * dentry, struct configfs_buffer * buffer)
-{
- struct configfs_attribute * attr = to_attr(dentry);
- struct config_item * item = to_item(dentry->d_parent);
- struct configfs_item_operations * ops = buffer->ops;
- int ret = 0;
- ssize_t count;
-
- if (!buffer->page)
- buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
- if (!buffer->page)
- return -ENOMEM;
-
- count = ops->show_attribute(item,attr,buffer->page);
- buffer->needs_read_fill = 0;
- BUG_ON(count > (ssize_t)PAGE_SIZE);
- if (count >= 0)
- buffer->count = count;
- else
- ret = count;
- return ret;
-}
-
-
-/**
- * flush_read_buffer - push buffer to userspace.
- * @buffer: data buffer for file.
- * @userbuf: user-passed buffer.
- * @count: number of bytes requested.
- * @ppos: file position.
- *
- * Copy the buffer we filled in fill_read_buffer() to userspace.
- * This is done at the reader's leisure, copying and advancing
- * the amount they specify each time.
- * This may be called continuously until the buffer is empty.
- */
-static int flush_read_buffer(struct configfs_buffer * buffer, char __user * buf,
- size_t count, loff_t * ppos)
-{
- int error;
-
- if (*ppos > buffer->count)
- return 0;
-
- if (count > (buffer->count - *ppos))
- count = buffer->count - *ppos;
-
- error = copy_to_user(buf,buffer->page + *ppos,count);
- if (!error)
- *ppos += count;
- return error ? -EFAULT : count;
-}
-
-/**
* configfs_read_file - read an attribute.
- * @file: file pointer.
- * @buf: buffer to fill.
- * @count: number of bytes to read.
- * @ppos: starting offset in file.
+ * @s: seq_file pointer.
+ * @v: unused void pointer
*
* Userspace wants to read an attribute file. The attribute descriptor
- * is in the file's ->d_fsdata. The target item is in the directory's
- * ->d_fsdata.
+ * is available through dentry, which is stored in the seq_file.
+ * The target item is in the dentry's parent.
+ *
+ * We just call the show() method directly.
*
- * We call fill_read_buffer() to allocate and fill the buffer from the
- * item's show() method exactly once (if the read is happening from
- * the beginning of the file). That should fill the entire buffer with
- * all the data the item has to offer for that attribute.
- * We then call flush_read_buffer() to copy the buffer to userspace
- * in the increments specified.
*/
-
-static ssize_t
-configfs_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+static int
+configfs_read_file(struct seq_file *s, void *v)
{
- struct configfs_buffer * buffer = file->private_data;
- ssize_t retval = 0;
+ struct configfs_buffer * buffer = s->private;
+ struct configfs_item_operations * ops = buffer->ops;
+ struct configfs_attribute * attr = to_attr(buffer->dentry);
+ struct config_item * item = to_item(buffer->dentry->d_parent);
- down(&buffer->sem);
- if (buffer->needs_read_fill) {
- if ((retval = fill_read_buffer(file->f_dentry,buffer)))
- goto out;
- }
- pr_debug("%s: count = %d, ppos = %lld, buf = %s\n",
- __FUNCTION__,count,*ppos,buffer->page);
- retval = flush_read_buffer(buffer,buf,count,ppos);
-out:
- up(&buffer->sem);
- return retval;
+ return ops->show_attribute(item, attr, s);
}
-
/**
* fill_write_buffer - copy buffer from userspace.
* @buffer: data buffer for file.
@@ -169,7 +90,6 @@ fill_write_buffer(struct configfs_buffer
if (count > PAGE_SIZE)
count = PAGE_SIZE;
error = copy_from_user(buffer->page,buf,count);
- buffer->needs_read_fill = 1;
return error ? -EFAULT : count;
}
@@ -216,7 +136,8 @@ flush_write_buffer(struct dentry * dentr
static ssize_t
configfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
- struct configfs_buffer * buffer = file->private_data;
+ struct configfs_buffer * buffer =
+ ((struct seq_file *)file->private_data)->private;
ssize_t len;
down(&buffer->sem);
@@ -281,9 +202,14 @@ static int check_perm(struct inode * ino
}
memset(buffer,0,sizeof(struct configfs_buffer));
init_MUTEX(&buffer->sem);
- buffer->needs_read_fill = 1;
buffer->ops = ops;
- file->private_data = buffer;
+ buffer->dentry = file->f_dentry;
+
+ error = single_open(file, configfs_read_file, buffer);
+ if (error) {
+ kfree(buffer);
+ goto Enomem;
+ }
goto Done;
Einval:
@@ -309,7 +235,8 @@ static int configfs_release(struct inode
struct config_item * item = to_item(filp->f_dentry->d_parent);
struct configfs_attribute * attr = to_attr(filp->f_dentry);
struct module * owner = attr->ca_owner;
- struct configfs_buffer * buffer = filp->private_data;
+ struct configfs_buffer * buffer =
+ ((struct seq_file *)filp->private_data)->private;
if (item)
config_item_put(item);
@@ -321,13 +248,14 @@ static int configfs_release(struct inode
free_page((unsigned long)buffer->page);
kfree(buffer);
}
- return 0;
+
+ return single_release(inode, filp);
}
const struct file_operations configfs_file_operations = {
- .read = configfs_read_file,
+ .read = seq_read,
.write = configfs_write_file,
- .llseek = generic_file_llseek,
+ .llseek = seq_lseek,
.open = configfs_open_file,
.release = configfs_release,
};
Index: linux-2.6.18/include/linux/configfs.h
===================================================================
--- linux-2.6.18.orig/include/linux/configfs.h
+++ linux-2.6.18/include/linux/configfs.h
@@ -40,6 +40,7 @@
#include <linux/types.h>
#include <linux/list.h>
#include <linux/kref.h>
+#include <linux/seq_file.h>
#include <asm/atomic.h>
#include <asm/semaphore.h>
@@ -147,7 +148,7 @@ struct configfs_attribute {
*/
struct configfs_item_operations {
void (*release)(struct config_item *);
- ssize_t (*show_attribute)(struct config_item *, struct configfs_attribute *,char *);
+ int (*show_attribute)(struct config_item *, struct configfs_attribute *,struct seq_file *);
ssize_t (*store_attribute)(struct config_item *,struct configfs_attribute *,const char *, size_t);
int (*allow_link)(struct config_item *src, struct config_item *target);
int (*drop_link)(struct config_item *src, struct config_item *target);
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 3/5] Change configfs_example.c to use the new interface
2006-10-10 18:20 [PATCH 0/5] Allow more than PAGESIZE data read in configfs Chandra Seetharaman
2006-10-10 18:20 ` [PATCH 1/5] Fix a module count leak Chandra Seetharaman
2006-10-10 18:20 ` [PATCH 2/5] Use seq_file for read side of operations Chandra Seetharaman
@ 2006-10-10 18:21 ` Chandra Seetharaman
2006-10-10 18:21 ` [PATCH 4/5] Change Documentation to reflect " Chandra Seetharaman
` (2 subsequent siblings)
5 siblings, 0 replies; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-10 18:21 UTC (permalink / raw)
To: akpm, Joel.Becker; +Cc: Chandra Seetharaman, linux-kernel, ckrm-tech
Change the example to suit the modified interface to use seq_file.
Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
--
Documentation/filesystems/configfs/configfs_example.c | 30 +++++++++---------
1 files changed, 15 insertions(+), 15 deletions(-)
Index: linux-2.6.18/Documentation/filesystems/configfs/configfs_example.c
===================================================================
--- linux-2.6.18.orig/Documentation/filesystems/configfs/configfs_example.c
+++ linux-2.6.18/Documentation/filesystems/configfs/configfs_example.c
@@ -53,7 +53,7 @@ struct childless {
struct childless_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct childless *, char *);
+ ssize_t (*show)(struct childless *, struct seq_file *);
ssize_t (*store)(struct childless *, const char *, size_t);
};
@@ -63,20 +63,20 @@ static inline struct childless *to_child
}
static ssize_t childless_showme_read(struct childless *childless,
- char *page)
+ struct seq_file *s)
{
ssize_t pos;
- pos = sprintf(page, "%d\n", childless->showme);
+ pos = seq_printf(s, "%d\n", childless->showme);
childless->showme++;
return pos;
}
static ssize_t childless_storeme_read(struct childless *childless,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%d\n", childless->storeme);
+ return seq_printf(s, "%d\n", childless->storeme);
}
static ssize_t childless_storeme_write(struct childless *childless,
@@ -99,9 +99,9 @@ static ssize_t childless_storeme_write(s
}
static ssize_t childless_description_read(struct childless *childless,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page,
+ return seq_printf(s,
"[01-childless]\n"
"\n"
"The childless subsystem is the simplest possible subsystem in\n"
@@ -133,7 +133,7 @@ static struct configfs_attribute *childl
static ssize_t childless_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct childless *childless = to_childless(item);
struct childless_attribute *childless_attr =
@@ -141,7 +141,7 @@ static ssize_t childless_attr_show(struc
ssize_t ret = 0;
if (childless_attr->show)
- ret = childless_attr->show(childless, page);
+ ret = childless_attr->show(childless, s);
return ret;
}
@@ -216,12 +216,12 @@ static struct configfs_attribute *simple
static ssize_t simple_child_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
ssize_t count;
struct simple_child *simple_child = to_simple_child(item);
- count = sprintf(page, "%d\n", simple_child->storeme);
+ count = seq_printf(s, "%d\n", simple_child->storeme);
return count;
}
@@ -304,9 +304,9 @@ static struct configfs_attribute *simple
static ssize_t simple_children_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page,
+ return seq_printf(s,
"[02-simple-children]\n"
"\n"
"This subsystem allows the creation of child config_items. These\n"
@@ -390,9 +390,9 @@ static struct configfs_attribute *group_
static ssize_t group_children_attr_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page,
+ return seq_printf(s,
"[03-group-children]\n"
"\n"
"This subsystem allows the creation of child config_groups. These\n"
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 4/5] Change Documentation to reflect the new interface
2006-10-10 18:20 [PATCH 0/5] Allow more than PAGESIZE data read in configfs Chandra Seetharaman
` (2 preceding siblings ...)
2006-10-10 18:21 ` [PATCH 3/5] Change configfs_example.c to use the new interface Chandra Seetharaman
@ 2006-10-10 18:21 ` Chandra Seetharaman
2006-10-10 18:21 ` [PATCH 5/5] Change the existing code to use " Chandra Seetharaman
2006-10-10 20:35 ` [PATCH 0/5] Allow more than PAGESIZE data read in configfs Joel Becker
5 siblings, 0 replies; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-10 18:21 UTC (permalink / raw)
To: akpm, Joel.Becker; +Cc: ckrm-tech, Chandra Seetharaman, linux-kernel
Change the interface definition of show_attribute() to use seq_file.
Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
--
Documentation/filesystems/configfs/configfs.txt | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6.18/Documentation/filesystems/configfs/configfs.txt
===================================================================
--- linux-2.6.18.orig/Documentation/filesystems/configfs/configfs.txt
+++ linux-2.6.18/Documentation/filesystems/configfs/configfs.txt
@@ -162,7 +162,7 @@ among other things. For that, it needs
void (*release)(struct config_item *);
ssize_t (*show_attribute)(struct config_item *,
struct configfs_attribute *,
- char *);
+ struct seq_file *);
ssize_t (*store_attribute)(struct config_item *,
struct configfs_attribute *,
const char *, size_t);
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 5/5] Change the existing code to use the new interface
2006-10-10 18:20 [PATCH 0/5] Allow more than PAGESIZE data read in configfs Chandra Seetharaman
` (3 preceding siblings ...)
2006-10-10 18:21 ` [PATCH 4/5] Change Documentation to reflect " Chandra Seetharaman
@ 2006-10-10 18:21 ` Chandra Seetharaman
2006-10-10 20:35 ` [PATCH 0/5] Allow more than PAGESIZE data read in configfs Joel Becker
5 siblings, 0 replies; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-10 18:21 UTC (permalink / raw)
To: akpm, Joel.Becker; +Cc: linux-kernel, Chandra Seetharaman, ckrm-tech
Changes to the current user of configfs, OCFS2, to use the new
interface of show_attribute().
Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
--
fs/ocfs2/cluster/heartbeat.c | 32 ++++++++++++++++----------------
fs/ocfs2/cluster/nodemanager.c | 25 ++++++++++++++-----------
2 files changed, 30 insertions(+), 27 deletions(-)
Index: linux-2.6.18/fs/ocfs2/cluster/heartbeat.c
===================================================================
--- linux-2.6.18.orig/fs/ocfs2/cluster/heartbeat.c
+++ linux-2.6.18/fs/ocfs2/cluster/heartbeat.c
@@ -1111,9 +1111,9 @@ static int o2hb_read_block_input(struct
}
static ssize_t o2hb_region_block_bytes_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%u\n", reg->hr_block_bytes);
+ return seq_printf(s, "%u\n", reg->hr_block_bytes);
}
static ssize_t o2hb_region_block_bytes_write(struct o2hb_region *reg,
@@ -1139,9 +1139,9 @@ static ssize_t o2hb_region_block_bytes_w
}
static ssize_t o2hb_region_start_block_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%llu\n", reg->hr_start_block);
+ return seq_printf(s, "%llu\n", reg->hr_start_block);
}
static ssize_t o2hb_region_start_block_write(struct o2hb_region *reg,
@@ -1164,9 +1164,9 @@ static ssize_t o2hb_region_start_block_w
}
static ssize_t o2hb_region_blocks_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%d\n", reg->hr_blocks);
+ return seq_printf(s, "%d\n", reg->hr_blocks);
}
static ssize_t o2hb_region_blocks_write(struct o2hb_region *reg,
@@ -1192,12 +1192,12 @@ static ssize_t o2hb_region_blocks_write(
}
static ssize_t o2hb_region_dev_read(struct o2hb_region *reg,
- char *page)
+ struct seq_file *s)
{
unsigned int ret = 0;
if (reg->hr_bdev)
- ret = sprintf(page, "%s\n", reg->hr_dev_name);
+ ret = seq_printf(s, "%s\n", reg->hr_dev_name);
return ret;
}
@@ -1443,7 +1443,7 @@ out:
struct o2hb_region_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct o2hb_region *, char *);
+ ssize_t (*show)(struct o2hb_region *, struct seq_file *);
ssize_t (*store)(struct o2hb_region *, const char *, size_t);
};
@@ -1489,7 +1489,7 @@ static struct configfs_attribute *o2hb_r
static ssize_t o2hb_region_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct o2hb_region *reg = to_o2hb_region(item);
struct o2hb_region_attribute *o2hb_region_attr =
@@ -1497,7 +1497,7 @@ static ssize_t o2hb_region_show(struct c
ssize_t ret = 0;
if (o2hb_region_attr->show)
- ret = o2hb_region_attr->show(reg, page);
+ ret = o2hb_region_attr->show(reg, s);
return ret;
}
@@ -1581,13 +1581,13 @@ static void o2hb_heartbeat_group_drop_it
struct o2hb_heartbeat_group_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct o2hb_heartbeat_group *, char *);
+ ssize_t (*show)(struct o2hb_heartbeat_group *, struct seq_file *);
ssize_t (*store)(struct o2hb_heartbeat_group *, const char *, size_t);
};
static ssize_t o2hb_heartbeat_group_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct o2hb_heartbeat_group *reg = to_o2hb_heartbeat_group(to_config_group(item));
struct o2hb_heartbeat_group_attribute *o2hb_heartbeat_group_attr =
@@ -1595,7 +1595,7 @@ static ssize_t o2hb_heartbeat_group_show
ssize_t ret = 0;
if (o2hb_heartbeat_group_attr->show)
- ret = o2hb_heartbeat_group_attr->show(reg, page);
+ ret = o2hb_heartbeat_group_attr->show(reg, s);
return ret;
}
@@ -1614,9 +1614,9 @@ static ssize_t o2hb_heartbeat_group_stor
}
static ssize_t o2hb_heartbeat_group_threshold_show(struct o2hb_heartbeat_group *group,
- char *page)
+ struct seq_file *s)
{
- return sprintf(page, "%u\n", o2hb_dead_threshold);
+ return seq_printf(s, "%u\n", o2hb_dead_threshold);
}
static ssize_t o2hb_heartbeat_group_threshold_store(struct o2hb_heartbeat_group *group,
Index: linux-2.6.18/fs/ocfs2/cluster/nodemanager.c
===================================================================
--- linux-2.6.18.orig/fs/ocfs2/cluster/nodemanager.c
+++ linux-2.6.18/fs/ocfs2/cluster/nodemanager.c
@@ -238,9 +238,9 @@ static void o2nm_node_release(struct con
kfree(node);
}
-static ssize_t o2nm_node_num_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_num_read(struct o2nm_node *node, struct seq_file *s)
{
- return sprintf(page, "%d\n", node->nd_num);
+ return seq_printf(s, "%d\n", node->nd_num);
}
static struct o2nm_cluster *to_o2nm_cluster_from_node(struct o2nm_node *node)
@@ -293,9 +293,10 @@ static ssize_t o2nm_node_num_write(struc
return count;
}
-static ssize_t o2nm_node_ipv4_port_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_ipv4_port_read(struct o2nm_node *node,
+ struct seq_file *s)
{
- return sprintf(page, "%u\n", ntohs(node->nd_ipv4_port));
+ return seq_printf(s, "%u\n", ntohs(node->nd_ipv4_port));
}
static ssize_t o2nm_node_ipv4_port_write(struct o2nm_node *node,
@@ -318,9 +319,10 @@ static ssize_t o2nm_node_ipv4_port_write
return count;
}
-static ssize_t o2nm_node_ipv4_address_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_ipv4_address_read(struct o2nm_node *node,
+ struct seq_file *s)
{
- return sprintf(page, "%u.%u.%u.%u\n", NIPQUAD(node->nd_ipv4_address));
+ return seq_printf(s, "%u.%u.%u.%u\n", NIPQUAD(node->nd_ipv4_address));
}
static ssize_t o2nm_node_ipv4_address_write(struct o2nm_node *node,
@@ -361,9 +363,10 @@ static ssize_t o2nm_node_ipv4_address_wr
return count;
}
-static ssize_t o2nm_node_local_read(struct o2nm_node *node, char *page)
+static ssize_t o2nm_node_local_read(struct o2nm_node *node,
+ struct seq_file *s)
{
- return sprintf(page, "%d\n", node->nd_local);
+ return seq_printf(s, "%d\n", node->nd_local);
}
static ssize_t o2nm_node_local_write(struct o2nm_node *node, const char *page,
@@ -417,7 +420,7 @@ static ssize_t o2nm_node_local_write(str
struct o2nm_node_attribute {
struct configfs_attribute attr;
- ssize_t (*show)(struct o2nm_node *, char *);
+ ssize_t (*show)(struct o2nm_node *, struct seq_file *);
ssize_t (*store)(struct o2nm_node *, const char *, size_t);
};
@@ -474,7 +477,7 @@ static int o2nm_attr_index(struct config
static ssize_t o2nm_node_show(struct config_item *item,
struct configfs_attribute *attr,
- char *page)
+ struct seq_file *s)
{
struct o2nm_node *node = to_o2nm_node(item);
struct o2nm_node_attribute *o2nm_node_attr =
@@ -482,7 +485,7 @@ static ssize_t o2nm_node_show(struct con
ssize_t ret = 0;
if (o2nm_node_attr->show)
- ret = o2nm_node_attr->show(node, page);
+ ret = o2nm_node_attr->show(node, s);
return ret;
}
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-10 18:20 [PATCH 0/5] Allow more than PAGESIZE data read in configfs Chandra Seetharaman
` (4 preceding siblings ...)
2006-10-10 18:21 ` [PATCH 5/5] Change the existing code to use " Chandra Seetharaman
@ 2006-10-10 20:35 ` Joel Becker
2006-10-10 21:31 ` [ckrm-tech] " Paul Menage
2006-10-11 20:19 ` Andrew Morton
5 siblings, 2 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-10 20:35 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: akpm, ckrm-tech, linux-kernel
On Tue, Oct 10, 2006 at 11:20:43AM -0700, Chandra Seetharaman wrote:
> Currently, maximum amount of data that can be read from a configfs
> attribute file is limited to PAGESIZE bytes. This is a limitation for
> some of the usages of configfs.
NAK. This forces a complex and inappropriate interface on the
majority of users, and doesn't honor configfs' simplicity-first design.
I understand Chandra's concerns, but this patch isn't the right
way to do it.
Joel
--
Dort wo man Bücher verbrennt, verbrennt man am Ende auch Mensch.
(Wherever they burn books, they will also end up burning people.)
- Heinrich Heine
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-10 20:35 ` [PATCH 0/5] Allow more than PAGESIZE data read in configfs Joel Becker
@ 2006-10-10 21:31 ` Paul Menage
2006-10-10 21:58 ` Joel Becker
2006-10-11 20:19 ` Andrew Morton
1 sibling, 1 reply; 56+ messages in thread
From: Paul Menage @ 2006-10-10 21:31 UTC (permalink / raw)
To: Chandra Seetharaman, akpm, ckrm-tech, linux-kernel
On 10/10/06, Joel Becker <Joel.Becker@oracle.com> wrote:
> On Tue, Oct 10, 2006 at 11:20:43AM -0700, Chandra Seetharaman wrote:
> > Currently, maximum amount of data that can be read from a configfs
> > attribute file is limited to PAGESIZE bytes. This is a limitation for
> > some of the usages of configfs.
>
> NAK. This forces a complex and inappropriate interface on the
> majority of users, and doesn't honor configfs' simplicity-first design.
How is the seq_file interface complex and inappropriate? For the
configfs clients it's basically a drop-in replacement for sprintf(),
as Chandra's patches show.
Paul
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-10 21:31 ` [ckrm-tech] " Paul Menage
@ 2006-10-10 21:58 ` Joel Becker
2006-10-10 23:13 ` Chandra Seetharaman
2006-10-11 0:49 ` Matt Helsley
0 siblings, 2 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-10 21:58 UTC (permalink / raw)
To: Paul Menage; +Cc: Chandra Seetharaman, akpm, ckrm-tech, linux-kernel
On Tue, Oct 10, 2006 at 02:31:43PM -0700, Paul Menage wrote:
> > NAK. This forces a complex and inappropriate interface on the
> >majority of users, and doesn't honor configfs' simplicity-first design.
>
> How is the seq_file interface complex and inappropriate? For the
> configfs clients it's basically a drop-in replacement for sprintf(),
> as Chandra's patches show.
Well, they now have to learn seq_file. They now get to assume
that "spewing large amounts of junk" is the default rather than "single
attribute", which is correct. None of it is relevant for the majority
of correct users.
It exposes the "I'm a file" knowledge down to the client module.
The entire point of configfs is that the "filesystem bits" are
independant of the "client bits". To the client, it's an item
hierarchy. To the user, the interface happens to be a filesystem.
Technically, the seq_printf() as a drop-in replacement seems to
be functional. I'm worried about lifetiming, but I think it's OK (what
do I mean? If I open the file, I'd better not be able to remove the
client module until everything is torn down. If I close the file, it
had better get all torn down before module_put() so that when
->release() returns, te module can safely be removed. I *think* this
change satisfies these worries, but it's something that absolutely has
to be done right. Yes, I'm very paranoid about this).
My bigger worry is that we haven't solved the write side. How
does one *set* a large attribute? It had better not be multiple
attributes. I know that your module doesn't set it, but hey, we don't
codify that requirement. Perhaps a patch where we say "if you are a
large display attribute, we'll use seq_file and error on write because
it isn't allowed" but that leaves the old buffer-based approach for
normal-sized read-write attributes.
Joel
--
Life's Little Instruction Book #252
"Take good care of those you love."
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 1/5] Fix a module count leak.
2006-10-10 18:20 ` [PATCH 1/5] Fix a module count leak Chandra Seetharaman
@ 2006-10-10 22:17 ` Joel Becker
0 siblings, 0 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-10 22:17 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: akpm, linux-kernel, ckrm-tech, mark.fasheh
Obviously a bug. I've pulled it into my tree and it should appear in
OCFS2's ALL branch very shortly.
Joel
On Tue, Oct 10, 2006 at 11:20:49AM -0700, Chandra Seetharaman wrote:
> check_perm() does not drop the reference to the module when kmalloc()
> failure occurs. This patch fixes the problem.
>
> Signed-Off-By: Chandra Seetharaman <sekharan@us.ibm.com>
> --
>
> fs/configfs/file.c | 16 +++++++++-------
> 1 files changed, 9 insertions(+), 7 deletions(-)
>
> Index: linux-2.6.18/fs/configfs/file.c
> ===================================================================
> --- linux-2.6.18.orig/fs/configfs/file.c
> +++ linux-2.6.18/fs/configfs/file.c
> @@ -275,14 +275,15 @@ static int check_perm(struct inode * ino
> * it in file->private_data for easy access.
> */
> buffer = kmalloc(sizeof(struct configfs_buffer),GFP_KERNEL);
> - if (buffer) {
> - memset(buffer,0,sizeof(struct configfs_buffer));
> - init_MUTEX(&buffer->sem);
> - buffer->needs_read_fill = 1;
> - buffer->ops = ops;
> - file->private_data = buffer;
> - } else
> + if (!buffer) {
> error = -ENOMEM;
> + goto Enomem;
> + }
> + memset(buffer,0,sizeof(struct configfs_buffer));
> + init_MUTEX(&buffer->sem);
> + buffer->needs_read_fill = 1;
> + buffer->ops = ops;
> + file->private_data = buffer;
> goto Done;
>
> Einval:
> @@ -290,6 +291,7 @@ static int check_perm(struct inode * ino
> goto Done;
> Eaccess:
> error = -EACCES;
> + Enomem:
> module_put(attr->ca_owner);
> Done:
> if (error && item)
>
> --
>
> ----------------------------------------------------------------------
> Chandra Seetharaman | Be careful what you choose....
> - sekharan@us.ibm.com | .......you may get it.
> ----------------------------------------------------------------------
--
"I almost ran over an angel
He had a nice big fat cigar.
'In a sense,' he said, 'You're alone here
So if you jump, you'd best jump far.'"
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-10 21:58 ` Joel Becker
@ 2006-10-10 23:13 ` Chandra Seetharaman
2006-10-11 0:15 ` Joel Becker
2006-10-11 0:49 ` Matt Helsley
1 sibling, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-10 23:13 UTC (permalink / raw)
To: Joel Becker; +Cc: Paul Menage, akpm, ckrm-tech, linux-kernel
On Tue, 2006-10-10 at 14:58 -0700, Joel Becker wrote:
> On Tue, Oct 10, 2006 at 02:31:43PM -0700, Paul Menage wrote:
> > > NAK. This forces a complex and inappropriate interface on the
> > >majority of users, and doesn't honor configfs' simplicity-first design.
> >
> > How is the seq_file interface complex and inappropriate? For the
> > configfs clients it's basically a drop-in replacement for sprintf(),
> > as Chandra's patches show.
>
> Well, they now have to learn seq_file. They now get to assume
If they are simple users, they don't have to "learn" seq_file semantics,
they would just replace their sprintf's with seq_printfs (as my changes
in OCFS2 show).
IMO, seq_file interface is not that complex to learn either.
> that "spewing large amounts of junk" is the default rather than "single
> attribute", which is correct. None of it is relevant for the majority
> of correct users.
"char *" can also be used to spew out large amount of data (ok, maybe up
to PAGESIZE in configfs's case :). My point is that changing char * to
seq_file doesn't necessarily "introduce" the issue (of spewing large
amounts of data).
> It exposes the "I'm a file" knowledge down to the client module.
> The entire point of configfs is that the "filesystem bits" are
> independant of the "client bits". To the client, it's an item
> hierarchy. To the user, the interface happens to be a filesystem.
This issue is moot, unless you have intentions of changing the user
interface of configfs to be anything other than a file system, isn't
it ?
> Technically, the seq_printf() as a drop-in replacement seems to
> be functional. I'm worried about lifetiming, but I think it's OK (what
> do I mean? If I open the file, I'd better not be able to remove the
> client module until everything is torn down. If I close the file, it
> had better get all torn down before module_put() so that when
> ->release() returns, te module can safely be removed. I *think* this
> change satisfies these worries, but it's something that absolutely has
> to be done right. Yes, I'm very paranoid about this).
> My bigger worry is that we haven't solved the write side. How
> does one *set* a large attribute? It had better not be multiple
> attributes. I know that your module doesn't set it, but hey, we don't
> codify that requirement. Perhaps a patch where we say "if you are a
> large display attribute, we'll use seq_file and error on write because
> it isn't allowed" but that leaves the old buffer-based approach for
> normal-sized read-write attributes.
Now we are in need of *large* reads. We can add this feature and let it
evolve to the next level later when somebody needs to *set* a large
attribute.
Also, these changes do not result in any change in the user level
interface. So, we can afford this interface changes to change again
later.
> Joel
>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-10 23:13 ` Chandra Seetharaman
@ 2006-10-11 0:15 ` Joel Becker
0 siblings, 0 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-11 0:15 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: Paul Menage, akpm, ckrm-tech, linux-kernel
On Tue, Oct 10, 2006 at 04:13:52PM -0700, Chandra Seetharaman wrote:
> On Tue, 2006-10-10 at 14:58 -0700, Joel Becker wrote:
> > Well, they now have to learn seq_file. They now get to assume
>
> If they are simple users, they don't have to "learn" seq_file semantics,
> they would just replace their sprintf's with seq_printfs (as my changes
> in OCFS2 show).
The sed(1) is trivial sure. seq_file isn't terribly complex.
It's less about mere code knowledge and more about intention. Really,
it's how people understand configfs will deal with their attributes.
> "char *" can also be used to spew out large amount of data (ok, maybe up
> to PAGESIZE in configfs's case :). My point is that changing char * to
> seq_file doesn't necessarily "introduce" the issue (of spewing large
> amounts of data).
If I see a seq_file, I assume there are multiple things to
iterate over. Don't you?
> This issue is moot, unless you have intentions of changing the user
> interface of configfs to be anything other than a file system, isn't
> it ?
It could be today, without much trouble. The entire point is to
prevent client modules from implementing a filesystem or any filesystem
semantics. They implement an item hierarchy with attributes. The
attributes are read-write with ->show() and ->set(). The filesystem
should be invisible to the client.
> Now we are in need of *large* reads. We can add this feature and let it
> evolve to the next level later when somebody needs to *set* a large
> attribute.
I don't want some set of ad-hoc rules based on legacy broken
ideas. "Well, you can do this, or this, or this, or even this, and all
sort of work, but it's a mess" is not a good thing.
Joel
--
Life's Little Instruction Book #20
"Be forgiving of yourself and others."
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-10 21:58 ` Joel Becker
2006-10-10 23:13 ` Chandra Seetharaman
@ 2006-10-11 0:49 ` Matt Helsley
2006-10-11 1:28 ` Joel Becker
1 sibling, 1 reply; 56+ messages in thread
From: Matt Helsley @ 2006-10-11 0:49 UTC (permalink / raw)
To: Joel Becker; +Cc: Paul Menage, linux-kernel, Chandra Seetharaman, ckrm-tech
On Tue, 2006-10-10 at 14:58 -0700, Joel Becker wrote:
> On Tue, Oct 10, 2006 at 02:31:43PM -0700, Paul Menage wrote:
> > > NAK. This forces a complex and inappropriate interface on the
> > >majority of users, and doesn't honor configfs' simplicity-first design.
> >
> > How is the seq_file interface complex and inappropriate? For the
> > configfs clients it's basically a drop-in replacement for sprintf(),
> > as Chandra's patches show.
>
> Well, they now have to learn seq_file. They now get to assume
> that "spewing large amounts of junk" is the default rather than "single
> attribute", which is correct. None of it is relevant for the majority
> of correct users.
We want to be able to export a sequence of small (<< 1 page),
homogenous, unstructured (scalar), attributes through configfs using the
same file. While this is rather specific, I'd guess it would be a common
occurrence.
Yes, keeping track of writing to these sequences (add, remove, replace)
is a problem. But that's what the file position is for. configfs could
support exporting sequences of common (scalar) types that need to be
exported from the kernel. Types like u32, u64, pid_t, etc. Then seek can
be used to index into the sequence to indicate append or replace. seek
and truncate would have to be translated into bytes so configfs can
manage the buffers behind the scenes while passing sequence position to
the code that uses configfs.
Does this seem reasonable?
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 0:49 ` Matt Helsley
@ 2006-10-11 1:28 ` Joel Becker
2006-10-11 22:39 ` Greg KH
[not found] ` <20061011220619.GB7911@ca-server1.us.oracle.com>
0 siblings, 2 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-11 1:28 UTC (permalink / raw)
To: Matt Helsley; +Cc: Paul Menage, linux-kernel, Chandra Seetharaman, ckrm-tech
On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> We want to be able to export a sequence of small (<< 1 page),
> homogenous, unstructured (scalar), attributes through configfs using the
> same file. While this is rather specific, I'd guess it would be a common
> occurrence.
Pray tell, why? "One attribute per file" is the mantra here.
You really should think hard before you break it. Simple heuristic:
would you have to parse the buffer? Then it's wrong.
> Yes, keeping track of writing to these sequences (add, remove, replace)
> is a problem. But that's what the file position is for. configfs could
>
> Does this seem reasonable?
No. It adds complexity to an interface that is supposed to be
simple. Now, I'm not sure what you are trying to do here, so I don't
know how it fits in. Is it really "multiple attributes per file", or
"this attribute is a list of entries"?
An example. If you had to set an IP address and a port, here's
your scenarios:
[Right]
echo "10.0.0.1" > /sys/kernel/config/subsys/item/address
echo "8000" > /sys/kernel/config/subsys/item/port
[Wrong]
echo 10.0.0.1\n8000" > /sys/kernel/config/subsys/item/address-and-port
But perhaps you are not setting two distinct attributes. I
don't understand what you are doing, so some detail would be nice.
Joel
--
Life's Little Instruction Book #407
"Every once in a while, take the scenic route."
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/5] Use seq_file for read side of operations
2006-10-10 18:20 ` [PATCH 2/5] Use seq_file for read side of operations Chandra Seetharaman
@ 2006-10-11 9:12 ` Joel Becker
0 siblings, 0 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-11 9:12 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: akpm, ckrm-tech, linux-kernel
On Tue, Oct 10, 2006 at 11:20:55AM -0700, Chandra Seetharaman wrote:
> + error = single_open(file, configfs_read_file, buffer);
> + if (error) {
> + kfree(buffer);
> + goto Enomem;
> + }
Btw, with single_open(), how do you ever get more than one call
to ->show()? Shouldn't single_next() prevent that?
Joel
--
Life's Little Instruction Book #451
"Don't be afraid to say, 'I'm sorry.'"
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-10 20:35 ` [PATCH 0/5] Allow more than PAGESIZE data read in configfs Joel Becker
2006-10-10 21:31 ` [ckrm-tech] " Paul Menage
@ 2006-10-11 20:19 ` Andrew Morton
2006-10-11 21:41 ` Joel Becker
2006-10-11 22:18 ` Joel Becker
1 sibling, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2006-10-11 20:19 UTC (permalink / raw)
To: Joel Becker; +Cc: Chandra Seetharaman, ckrm-tech, linux-kernel
On Tue, 10 Oct 2006 13:35:11 -0700
Joel Becker <Joel.Becker@oracle.com> wrote:
> On Tue, Oct 10, 2006 at 11:20:43AM -0700, Chandra Seetharaman wrote:
> > Currently, maximum amount of data that can be read from a configfs
> > attribute file is limited to PAGESIZE bytes. This is a limitation for
> > some of the usages of configfs.
>
> NAK. This forces a complex and inappropriate interface on the
> majority of users, and doesn't honor configfs' simplicity-first design.
The patch deletes a pile of custom code from configfs and replaces it with
calls to standard kernel infrastructure and fixes a shortcoming/bug in the
process. Migration over to the new interface is trivial and almost
scriptable.
Nice patch. What's not to like about it??
> I understand Chandra's concerns, but this patch isn't the right
> way to do it.
To do what? Fix the artificial PAGE_SIZE contraint? The patch would be
justified on cleanup grounds even if nobody was hitting that limit.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 20:19 ` Andrew Morton
@ 2006-10-11 21:41 ` Joel Becker
2006-10-11 22:18 ` Joel Becker
1 sibling, 0 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-11 21:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: Chandra Seetharaman, ckrm-tech, linux-kernel
On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> The patch deletes a pile of custom code from configfs and replaces it with
> calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> process. Migration over to the new interface is trivial and almost
> scriptable.
>
> Nice patch. What's not to like about it??
We're discussing it both online and offline because there are
concerns they have that do need to be addressed.
What's not to like? If you need more than 4K to display an
attribute, odds are you have more than one attribute in the file, which
is Wrong. These guys, it turns out, don't, which is why I'm trying to
help them. But I don't want to encourage others to do the wrong thing.
I'm also unhappy with the lack of symmetry this creates on the
store/show() prototypes. We're discussing whether the store() side
should support larger-than-a-page writes, and how to do it.
Joel
--
"Conservative, n. A statesman who is enamoured of existing evils,
as distinguished from the Liberal, who wishes to replace them
with others."
- Ambrose Bierce, The Devil's Dictionary
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 20:19 ` Andrew Morton
2006-10-11 21:41 ` Joel Becker
@ 2006-10-11 22:18 ` Joel Becker
2006-10-11 22:48 ` Andrew Morton
1 sibling, 1 reply; 56+ messages in thread
From: Joel Becker @ 2006-10-11 22:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Chandra Seetharaman, ckrm-tech, linux-kernel
On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> The patch deletes a pile of custom code from configfs and replaces it with
> calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> process. Migration over to the new interface is trivial and almost
> scriptable.
The configfs stuff is based on the sysfs code too. Should we
migrate sysfs/file.c to the same seq_file code? Serious question, if
the cleanup is considered better.
Joel
--
"I almost ran over an angel
He had a nice big fat cigar.
'In a sense,' he said, 'You're alone here
So if you jump, you'd best jump far.'"
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 1:28 ` Joel Becker
@ 2006-10-11 22:39 ` Greg KH
2006-10-11 23:26 ` Chandra Seetharaman
2006-10-12 2:17 ` Matt Helsley
[not found] ` <20061011220619.GB7911@ca-server1.us.oracle.com>
1 sibling, 2 replies; 56+ messages in thread
From: Greg KH @ 2006-10-11 22:39 UTC (permalink / raw)
To: Matt Helsley, Paul Menage, linux-kernel, Chandra Seetharaman,
ckrm-tech
On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > We want to be able to export a sequence of small (<< 1 page),
> > homogenous, unstructured (scalar), attributes through configfs using the
> > same file. While this is rather specific, I'd guess it would be a common
> > occurrence.
>
> Pray tell, why? "One attribute per file" is the mantra here.
> You really should think hard before you break it. Simple heuristic:
> would you have to parse the buffer? Then it's wrong.
I agree. You are trying to use configfs for something that it is not
entended to be used for. If you want to write/read large numbers of
attrbutes like this, use your own filesystem.
configfs has the same "one value per file" rule that sysfs has. And
because your userspace model doesn't fit that, don't try to change
configfs here.
What happened to your old ckrmfs? I thought you were handling all of
this in that.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 22:18 ` Joel Becker
@ 2006-10-11 22:48 ` Andrew Morton
2006-10-11 23:27 ` Chandra Seetharaman
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2006-10-11 22:48 UTC (permalink / raw)
To: Joel Becker; +Cc: Chandra Seetharaman, ckrm-tech, linux-kernel
On Wed, 11 Oct 2006 15:18:22 -0700
Joel Becker <Joel.Becker@oracle.com> wrote:
> On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > The patch deletes a pile of custom code from configfs and replaces it with
> > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > process. Migration over to the new interface is trivial and almost
> > scriptable.
>
> The configfs stuff is based on the sysfs code too. Should we
> migrate sysfs/file.c to the same seq_file code? Serious question, if
> the cleanup is considered better.
>
I don't see why not. I don't know if anyone has though of/proposed it
before.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 22:39 ` Greg KH
@ 2006-10-11 23:26 ` Chandra Seetharaman
2006-10-12 4:17 ` Paul Jackson
2006-10-12 23:51 ` Greg KH
2006-10-12 2:17 ` Matt Helsley
1 sibling, 2 replies; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-11 23:26 UTC (permalink / raw)
To: Greg KH; +Cc: Matt Helsley, Paul Menage, linux-kernel, ckrm-tech
On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > We want to be able to export a sequence of small (<< 1 page),
> > > homogenous, unstructured (scalar), attributes through configfs using the
> > > same file. While this is rather specific, I'd guess it would be a common
> > > occurrence.
> >
> > Pray tell, why? "One attribute per file" is the mantra here.
> > You really should think hard before you break it. Simple heuristic:
> > would you have to parse the buffer? Then it's wrong.
>
> I agree. You are trying to use configfs for something that it is not
> entended to be used for. If you want to write/read large numbers of
> attrbutes like this, use your own filesystem.
I would say it is a "large attribute" not "large numbers of attributes".
>
> configfs has the same "one value per file" rule that sysfs has. And
> because your userspace model doesn't fit that, don't try to change
> configfs here.
>
> What happened to your old ckrmfs? I thought you were handling all of
> this in that.
We decided to use an existing infrastructure instead of having our own
file system.
configfs is a perfect fit for us, except the size limitation.
BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
processes aggregation from cpuset to have an independent infrastructure
(http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
has its own file system. I was advocating him to use configfs. But, he
also has this issue/limitation.
>
> thanks,
>
> greg k-h
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 22:48 ` Andrew Morton
@ 2006-10-11 23:27 ` Chandra Seetharaman
2006-10-14 8:01 ` Greg KH
0 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-11 23:27 UTC (permalink / raw)
To: Andrew Morton; +Cc: Joel Becker, ckrm-tech, linux-kernel
On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> On Wed, 11 Oct 2006 15:18:22 -0700
> Joel Becker <Joel.Becker@oracle.com> wrote:
>
> > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > The patch deletes a pile of custom code from configfs and replaces it with
> > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > process. Migration over to the new interface is trivial and almost
> > > scriptable.
> >
> > The configfs stuff is based on the sysfs code too. Should we
> > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > the cleanup is considered better.
> >
>
> I don't see why not. I don't know if anyone has though of/proposed it
> before.
I can generate a patch for that too.
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 22:39 ` Greg KH
2006-10-11 23:26 ` Chandra Seetharaman
@ 2006-10-12 2:17 ` Matt Helsley
2006-10-12 23:54 ` Greg KH
1 sibling, 1 reply; 56+ messages in thread
From: Matt Helsley @ 2006-10-12 2:17 UTC (permalink / raw)
To: Greg KH
Cc: Joel Becker, Paul Menage, linux-kernel, Chandra Seetharaman,
ckrm-tech
On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > We want to be able to export a sequence of small (<< 1 page),
> > > homogenous, unstructured (scalar), attributes through configfs using the
> > > same file. While this is rather specific, I'd guess it would be a common
> > > occurrence.
> >
> > Pray tell, why? "One attribute per file" is the mantra here.
> > You really should think hard before you break it. Simple heuristic:
> > would you have to parse the buffer? Then it's wrong.
>
> I agree. You are trying to use configfs for something that it is not
> entended to be used for. If you want to write/read large numbers of
I disagree with your assertion that we're abusing configfs. "one value
per file" is not the purpose of configfs.
The purpose of configfs is to allow userspace to create and manipulate
kernel objects whose lifetime is under the control of userspace. That
perfectly matches the idea of being able to create, manipulate, and
destroy a resource group from userspace.
"one value per file" is a phrase describing what configfs and sysfs
files should normally look like. However it's not a rule since there is
precedent for sysfs files that require parsing:
/sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource
/sys/block/hda/stat
/sys/block/hda/dev
These are counterexamples to your assertion below that "one value per
file" is a rule.
> attrbutes like this, use your own filesystem.
This conflicts with the idea of reusing kernel code that was made to be
reused. Except for this 1 page limit configfs is nearly a perfect match.
I doubt we'd get a favorable reaction if we said:
"OK, let's copy configfs then remove the page size limit."
> configfs has the same "one value per file" rule that sysfs has. And
> because your userspace model doesn't fit that, don't try to change
> configfs here.
Please see my counterexample above.
> What happened to your old ckrmfs? I thought you were handling all of
> this in that.
We dropped RCFS more than 1 year ago after feedback suggested we should
try to share as much kernel code as possible. Other than the 1 page
limit to the list of pids, configfs is a perfect match for what we need.
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 23:26 ` Chandra Seetharaman
@ 2006-10-12 4:17 ` Paul Jackson
2006-10-12 23:51 ` Greg KH
1 sibling, 0 replies; 56+ messages in thread
From: Paul Jackson @ 2006-10-12 4:17 UTC (permalink / raw)
To: sekharan; +Cc: greg, menage, ckrm-tech, matthltc, linux-kernel
> I agree. You are trying to use configfs for something that it is not
> entended to be used for.
Yup - but perhaps the best answer is that the design should be
extended, to handle a simple vector, such as a list of task process
id's or a list of CPU numbers.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
[not found] ` <1160619516.18766.209.camel@localhost.localdomain>
@ 2006-10-12 7:08 ` Joel Becker
2006-10-12 21:44 ` Paul Jackson
2006-10-13 23:37 ` Matt Helsley
0 siblings, 2 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-12 7:08 UTC (permalink / raw)
To: Matt Helsley; +Cc: Linux Kernel, Greg KH
On Wed, Oct 11, 2006 at 07:18:36PM -0700, Matt Helsley wrote:
> On Wed, 2006-10-11 at 15:06 -0700, Joel Becker wrote:
> > I suspect I was too harsh here. Can you tell me what you are
Btw, Mark says 'hi' and that we met at his Scotch tasting party.
I didn't realize this was you ;-)
> A little too harsh IMHO -- this isn't something I'm suggesting out of
> laziness, lack of thought, nor am I some corporate stooge unfamiliar
> with Linux. I am well aware of this mantra. I am also well aware that
> there are counterexamples in sysfs (which has the same mantra):
>
> /sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource
> /sys/block/hda/stat
> /sys/block/hda/dev
Sure, but mere existence doesn't make it right ;-)
> Each of these has more than one number in them. And they don't all use
> the same delimeter. So yes, parsing is required in existing sysfs files.
> And the parsing I'm proposing is simpler than that needed in the pci
> resource file.
As sysfs grew, it became very clear that if you require parsing,
someone in userspace will get it wrong. It's happened with pci
resources, etc. What about people who say "I'll just add a field, no
one will notice" and everything breaks? That's just not acceptible.
> In fact we've discussed the problem with you in the past (on ckrm-tech)
> and described what we're looking for. You suggested adding file(s) to
> sysfs that replace this attribute. It was an interesting suggestion but
> does not work because we have one of these lists for each item in
> configfs. I think all of this discussion is in the ckrm-tech list
> archives too (in MARC, possibly gmane, etc.).
Sure it works. You have one per resource group. In
resource_group_make_object(), you sysfs_mkdir() the sysfs file. There
is no technical limitation at all. Now, it is nice that the pid list
lives in the configfs tree from a location standpoint. Sure. But is it
worth breaking the simplicity of configfs. Greg says NO! You say YES!
I say "I'm not sure". And here we are.
> A patch would best explain what I'm proposing. But since I don't have a
> working patch yet I've tried to answer your questions below.
Ok, some meat, good.
> Well that's sort of what we want. Not an array of u32s but a list of
> pids. Among other things Resource Groups uses configfs to let userspace
> create named groups of tasks. The list of pids describing the group of
> tasks can easily exceed one page in size when output via the members
> configfs attribute.
And here we go. In userspace, someone can't just do:
ATTR=$(cat /sys/kernel/config/ckrm/myresource/attr)
because ATTR needs to be split and parsed. And what if you decide to
change it from "<pid>\n" to "<pid> <tgid>\n" per line? Oops, can't do
it, or you break an ABI.
> Since there are so many of them and they don't have the same lifetime
> characteristics as a configfs object we don't want to incur the space
> and complexity cost of creating and managing one configfs attribute per
> pid. At the same time, the groups themselves do have the lifetime of a
> configfs object and there are few of them.
First, I agree that pids as an object each is very expensive.
If the groups are few, of course, creating a sysfs object per group
isn't that hard :-)
> Where's the large-attribute discussion going on?
lkml and also offline, though we're bringing it back to lkml.
I'm going to cc this too, I hope you don't mind.
> WARNING: It's possible I've confused my configfs terminology a bit in
> my description below.
I'll try and make do. :-)
> So for example we would have items with the following attributes:
>
> /sys/kernel/config/res_group/members:
> 1
> 2
> 3
> 4567
Is this toplevel members file the "not part of any other group"
list?
> /sys/kernel/config/res_group/mhelsleys_tasks/members:
> 4568
>
> /sys/kernel/config/res_group/jbeckers_tasks/members:
> 4569
>
> This is how Resource Groups (formerly known as CKRM) currently
> implements the list -- as a single attribute with one pid printed per
> line. This doesn't fit the accepted OVPA rule and doesn't fit in one
> page on large systems with crazy numbers of tasks.
>
> We've considered creating one attribute per pid but frankly it makes
> the rest of the code much nastier in every way (code complexity, memory
> use, locking, etc). We also considered your earlier suggestion of having
> a custom sysfs file for this. However it just doesn't fit: each list is
> different so we'd need one file per resource group. This naturally
> suggests mimicking the configfs tree in /sys but that's uniformly worse
> than one pid per attribute.
How is this more complex? One object per attribute is large,
but if the number of groups is "few" as you say above, isn't that
simpler?
> The OVPA rule exists for some very good reasons. We want to avoid the
> nightmares encountered with /proc. Specifically the nightmare of
> organizing and parsing all of those structure values. /proc files often
> incorporate too much structure into the body of the file and that in
> turn requires complex parsing; which tends to break compatibility as the
> structures they reflect change. (I've worked on some userspace C code
> that parses some /proc/sys/net files and, yes, it's nasty..)
>
> Hence I think we could allow one value per line in addition to one
> value per attribute -- but only under very specific restrictions. The
> restrictions could be enforced by an additional configfs object which
> represents simple lists and lacks direct access to the char buffer.
> Also, unlike Chandra's seq_file patches, a different interface would
> avoid impacting existing uses of configfs.
You're effectively suggesting that a specific attribute type of
"repeated value of type X". No mixed types, no exploded structures,
just a "list of this attr" sort of thing. This does fit my personal
requirement of avoiding a generic, abusable system.
Greg, what do you think?
Joel
--
"Depend on the rabbit's foot if you will, but remember, it didn't
help the rabbit."
- R. E. Shay
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 7:08 ` Joel Becker
@ 2006-10-12 21:44 ` Paul Jackson
2006-10-12 22:51 ` Joel Becker
2006-10-13 23:37 ` Matt Helsley
1 sibling, 1 reply; 56+ messages in thread
From: Paul Jackson @ 2006-10-12 21:44 UTC (permalink / raw)
To: Joel Becker; +Cc: matthltc, linux-kernel, gregkh
> And what if you decide to
> change it from "<pid>\n" to "<pid> <tgid>\n" per line?
I think that's a good argument for never changing the format of one
of these files, rather than a good argument for against a vector of
scalars of identical type and purpose.
And I'd agree that we should not use multiple values per file to
represent a structure either - so I'd agree that we should not allow
"<pid> <tgid>\n" in the first place.
In the cpuset file system:
1) There is one value, or one vector of equivalent scalar values, per file.
2) Once released into the wild, a file never changes what it does or how
it looks.
3) It's ok to add new files.
4) But, at least in the case of cpusets, not ok to add directories, as
the file system represents one directory per one cpuset. No other
directories not representing cpusets are allowed.
This configfs flap feels to me like someone slightly overgeneralized
the lesson to be learned from previous problems displaying entire,
evolving, structures in a single file, and then is being a bit over
zealous enforcing the resulting rule.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 21:44 ` Paul Jackson
@ 2006-10-12 22:51 ` Joel Becker
2006-10-13 0:01 ` Paul Jackson
0 siblings, 1 reply; 56+ messages in thread
From: Joel Becker @ 2006-10-12 22:51 UTC (permalink / raw)
To: Paul Jackson; +Cc: matthltc, linux-kernel, gregkh
On Thu, Oct 12, 2006 at 02:44:20PM -0700, Paul Jackson wrote:
> > And what if you decide to
> > change it from "<pid>\n" to "<pid> <tgid>\n" per line?
>
> I think that's a good argument for never changing the format of one
> of these files, rather than a good argument for against a vector of
> scalars of identical type and purpose.
Sure, no dispute here. My argument point isn't that "vector of
scalars is inherently evil" (I'll leave that aside), it's that a
facility allowing or encouraging the format change or blowup is
unhealthy.
> And I'd agree that we should not use multiple values per file to
> represent a structure either - so I'd agree that we should not allow
> "<pid> <tgid>\n" in the first place.
Cool, we're together on that as well. I know that we can't
fully prevent stupidity, but encouraging better behavior is something I
always try to do.
> This configfs flap feels to me like someone slightly overgeneralized
> the lesson to be learned from previous problems displaying entire,
> evolving, structures in a single file, and then is being a bit over
> zealous enforcing the resulting rule.
Someone (hi, it's me) rather tried to overspecialize configfs.
It's intentionally not all things to all people. Kitchen sinks cause
clogs, as it were.
And I'm intentionally overzealous enforcing it. Better I need
to be beat over the head before I accept a good idea than I accept bad
ones with an "eh?"
Joel
--
One look at the From:
understanding has blossomed
.procmailrc grows
- Alexander Viro
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 23:26 ` Chandra Seetharaman
2006-10-12 4:17 ` Paul Jackson
@ 2006-10-12 23:51 ` Greg KH
2006-10-13 0:16 ` Paul Jackson
` (2 more replies)
1 sibling, 3 replies; 56+ messages in thread
From: Greg KH @ 2006-10-12 23:51 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: Matt Helsley, Paul Menage, linux-kernel, ckrm-tech
On Wed, Oct 11, 2006 at 04:26:00PM -0700, Chandra Seetharaman wrote:
> On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > We want to be able to export a sequence of small (<< 1 page),
> > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > same file. While this is rather specific, I'd guess it would be a common
> > > > occurrence.
> > >
> > > Pray tell, why? "One attribute per file" is the mantra here.
> > > You really should think hard before you break it. Simple heuristic:
> > > would you have to parse the buffer? Then it's wrong.
> >
> > I agree. You are trying to use configfs for something that it is not
> > entended to be used for. If you want to write/read large numbers of
> > attrbutes like this, use your own filesystem.
>
> I would say it is a "large attribute" not "large numbers of attributes".
That attribute seems to violate the rule of "only one thing" and needs
to be parsed. That does not seem like a good fit for configfs or sysfs.
> > configfs has the same "one value per file" rule that sysfs has. And
> > because your userspace model doesn't fit that, don't try to change
> > configfs here.
> >
> > What happened to your old ckrmfs? I thought you were handling all of
> > this in that.
>
> We decided to use an existing infrastructure instead of having our own
> file system.
>
> configfs is a perfect fit for us, except the size limitation.
Then it's not a perfect fit, sorry, as you are trying to get it to do
things it is not supposed to, or designed to, do.
> BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> processes aggregation from cpuset to have an independent infrastructure
> (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> has its own file system. I was advocating him to use configfs. But, he
> also has this issue/limitation.
That's one reason it is so easy to just write your own filesystem then.
What is it these days, less than 200 lines of code? I bet you can even
condence more things to make it 100 lines if you really try. That seems
much more sane than trying to bend configfs into something different.
Why are people so opposed to creating their own filesystems?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 2:17 ` Matt Helsley
@ 2006-10-12 23:54 ` Greg KH
2006-10-13 3:22 ` Matt Helsley
0 siblings, 1 reply; 56+ messages in thread
From: Greg KH @ 2006-10-12 23:54 UTC (permalink / raw)
To: Matt Helsley
Cc: Joel Becker, Paul Menage, linux-kernel, Chandra Seetharaman,
ckrm-tech
On Wed, Oct 11, 2006 at 07:17:44PM -0700, Matt Helsley wrote:
> On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > We want to be able to export a sequence of small (<< 1 page),
> > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > same file. While this is rather specific, I'd guess it would be a common
> > > > occurrence.
> > >
> > > Pray tell, why? "One attribute per file" is the mantra here.
> > > You really should think hard before you break it. Simple heuristic:
> > > would you have to parse the buffer? Then it's wrong.
> >
> > I agree. You are trying to use configfs for something that it is not
> > entended to be used for. If you want to write/read large numbers of
>
> I disagree with your assertion that we're abusing configfs. "one value
> per file" is not the purpose of configfs.
>
> The purpose of configfs is to allow userspace to create and manipulate
> kernel objects whose lifetime is under the control of userspace. That
> perfectly matches the idea of being able to create, manipulate, and
> destroy a resource group from userspace.
>
> "one value per file" is a phrase describing what configfs and sysfs
> files should normally look like. However it's not a rule since there is
> precedent for sysfs files that require parsing:
> /sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource
Hold over from old procfs use of pci resources. And now a requirement
that X uses this, after it was pointed out to me.
> /sys/block/hda/stat
I never have liked that, it belongs somewhere else. Please move it.
> /sys/block/hda/dev
That is merely a dev_t, a single value.
Come on, there are way worse violators[1] of the "one value per file"
rule in sysfs that you could have found if you wanted to try to declare
how much it is not being followed in places. But if you look, the ratio
of good vs. bad usage is still very high, and keeping it high is my
goal.
> > What happened to your old ckrmfs? I thought you were handling all of
> > this in that.
>
> We dropped RCFS more than 1 year ago after feedback suggested we should
> try to share as much kernel code as possible. Other than the 1 page
> limit to the list of pids, configfs is a perfect match for what we need.
Feedback from whom? I know I sure liked the fact that you did your own
filesystem, despite the bugs that were found in it at times :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 22:51 ` Joel Becker
@ 2006-10-13 0:01 ` Paul Jackson
2006-10-14 4:40 ` Greg KH
0 siblings, 1 reply; 56+ messages in thread
From: Paul Jackson @ 2006-10-13 0:01 UTC (permalink / raw)
To: Joel Becker; +Cc: matthltc, linux-kernel, gregkh
Joel wrote:
> Sure, no dispute here. My argument point isn't that "vector of
> scalars is inherently evil" (I'll leave that aside), it's that a
> facility allowing or encouraging the format change or blowup is
> unhealthy.
Ok ... yes simply extending the size is changing the protocol
at too low a level ... an invitation to future abuse. I'll agree
somewhat with your concern there.
Is there some way to accept an array of scalars? Some change
in the API that configfs presents to the kernel code using it,
that would let that code say "here's an array of u32, of length
so-and-so?"
Hmmm ... having spouted off for three messages now ... decided to
actually look at this ;).
Let me consider a different approach ...
At the top of Documentation/filesystems/configfs/configfs.txt,
I see:
configfs is a ram-based filesystem that provides the converse of
sysfs's functionality. Where sysfs is a filesystem-based view of
kernel objects, configfs is a filesystem-based manager of kernel
objects, or config_items.
I guess this means that sysfs presents the gauges, and configfs
the knobs.
This seems like a bit of an artificial split to me - view (mostly
read) here, and manage (mostly write) here.
Not entirely surprisingly, I'm fond of the cpuset approach -- all
things cpuset-related in one place, both viewing and managing.
Similarly, /proc is a repository for all things task related, and
/dev for all devices (well, with various exceptions, contradictions,
confusions, and historical raisins ...)
There should indeed be a place for various kernel guages and knobs that
don't merit having their own entire file system. And since we already
have a separate sysfs and configfs, better to leave that aspect be, as
two separate name spaces ... despite my rant about this being an
artificial split a few lines above.
And if configfs wants to (continue to) impose a rule that there is a
single, simple scalar per file, that may well fit its needs and guide
its future extensions in the best way.
But if something like Resource Groups comes along, I'd think it deserves
its own file system name space (though sometimes I am sympathetic to
suggestions to merge this one into the cpuset name space -- not sure.)
Underneath each of these filesystems, sysfs, configfs, cpuset, resource
groups, ... it would seem ideal if we had a single kernel file system
infrastructure. Actually, we're only a half a layer from having that,
with vfs. It just takes a fair bit of glue to construct any of these
file systems out of vfs primitives.
I wonder if there might be someway to share that glue? I am envisioning
a new glue layer, mostly in the kernel internal headers and lib
directory, that sits on top of the current vfs, and makes it easy for
virtual file system presenters such as sysfs, configfs, cpusets and
resource groups to construct the particular virtual file system they
require.
I would expect this glue layer to support vectors of scalars, even if
some of its users, such as configfs, chose not to expose that
possibility. Arguments between configfs and resource group developers
over the value of presenting vectors suggest we've gone astray -
enforcing commonality where it is not beneficial to do so.
I would not expect such a change to using common vfs glue to change
much, if at all, existing client kernel code that uses configfs or
sysfs (or cpusets.) The configfs example code in:
Documentation/filesystems/configfs/configfs_example.c
should continue to work, as is.
I would expect to reduce a little some cut and paste code duplication,
hence reduce the kernel text size and reduce kernel maintenance and
improve a little the ease with which future features (Resource Groups,
say) can add their own virtual file system hierarchies.
This means creating a new kernel internal API, by which the various
clients (sysfs, configfs, cpusets, ...) of this common glue layer can
represent the particular instance of such a virtual file system that
they require, and hook in their operational methods. Then it means
porting the existing such virtual file systems, such as configfs, sysfs
and cpusets, to this common glue infrastructure.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 23:51 ` Greg KH
@ 2006-10-13 0:16 ` Paul Jackson
2006-10-13 23:38 ` Matt Helsley
2006-10-13 23:40 ` Matt Helsley
2006-10-16 19:10 ` Chandra Seetharaman
2 siblings, 1 reply; 56+ messages in thread
From: Paul Jackson @ 2006-10-13 0:16 UTC (permalink / raw)
To: Greg KH; +Cc: sekharan, matthltc, menage, linux-kernel, ckrm-tech
> Why are people so opposed to creating their own filesystems?
Well ... I'm not ;). Though I will be eternally grateful to
Simon Derr for doing the heavy lifting needed to create the
cpuset filesystem.
For those of us whose brains don't hold so many details at once,
creating a new file system can seem a bit daunting. And for those
of us not skilled in the art, it is more likely to end up being
300 lines of code, presenting several good provocations for a Hellwig
or a Viro to curse in the general direction of their monitors.
Instead of trying to hijack configfs to purposes ill suited for it,
I wonder if there isn't someway to lower the hurdles that us mere
mortals must leap to creating additional filesystems.
It shouldn't take that much lowering.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 23:54 ` Greg KH
@ 2006-10-13 3:22 ` Matt Helsley
0 siblings, 0 replies; 56+ messages in thread
From: Matt Helsley @ 2006-10-13 3:22 UTC (permalink / raw)
To: Greg KH
Cc: Joel Becker, Paul Menage, linux-kernel, Chandra Seetharaman,
ckrm-tech, Shailabh Nagar
On Thu, 2006-10-12 at 16:54 -0700, Greg KH wrote:
> On Wed, Oct 11, 2006 at 07:17:44PM -0700, Matt Helsley wrote:
> > On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > > We want to be able to export a sequence of small (<< 1 page),
> > > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > > same file. While this is rather specific, I'd guess it would be a common
> > > > > occurrence.
> > > >
> > > > Pray tell, why? "One attribute per file" is the mantra here.
> > > > You really should think hard before you break it. Simple heuristic:
> > > > would you have to parse the buffer? Then it's wrong.
> > >
> > > I agree. You are trying to use configfs for something that it is not
> > > entended to be used for. If you want to write/read large numbers of
> >
> > I disagree with your assertion that we're abusing configfs. "one value
> > per file" is not the purpose of configfs.
> >
> > The purpose of configfs is to allow userspace to create and manipulate
> > kernel objects whose lifetime is under the control of userspace. That
> > perfectly matches the idea of being able to create, manipulate, and
> > destroy a resource group from userspace.
> >
> > "one value per file" is a phrase describing what configfs and sysfs
> > files should normally look like. However it's not a rule since there is
> > precedent for sysfs files that require parsing:
> > /sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource
>
> Hold over from old procfs use of pci resources. And now a requirement
> that X uses this, after it was pointed out to me.
>
> > /sys/block/hda/stat
>
> I never have liked that, it belongs somewhere else. Please move it.
>
> > /sys/block/hda/dev
>
> That is merely a dev_t, a single value.
It's a single value internally but it's presented as multiple values --
and it needs to be parsed. Following the guidelines it should have been
a directory with a single value per file:
/sys/block/hda/dev/major
/sys/block/hda/dev/minor
So the parsing test cited earlier fails. It's not a matter of whether
parsing is required but of how complex the parsing is.
> Come on, there are way worse violators[1] of the "one value per file"
You mean like /sys/class/input/input0/modalias ?
It was not my intention to catalog all of the violators -- only to point
out that there are counterexamples that prove it's not a rule as you've
asserted.
> rule in sysfs that you could have found if you wanted to try to declare
> how much it is not being followed in places. But if you look, the ratio
> of good vs. bad usage is still very high, and keeping it high is my
> goal.
>
> > > What happened to your old ckrmfs? I thought you were handling all of
> > > this in that.
> >
> > We dropped RCFS more than 1 year ago after feedback suggested we should
> > try to share as much kernel code as possible. Other than the 1 page
> > limit to the list of pids, configfs is a perfect match for what we need.
>
> Feedback from whom? I know I sure liked the fact that you did your own
> filesystem, despite the bugs that were found in it at times :)
This article on LWN suggested the idea of using configfs:
http://lwn.net/Articles/148180/
Though I think the idea may have come up earlier I haven't been able to
find it with Google. Shailabh may be able to pinpoint the exact source
of the idea.
We also got feedback along the lines of "it's too
big" ( http://lwn.net/Articles/145135/ ) which motivated us to find ways
to shrink it. Using appropriate kernel systems that already exist is an
excellent way to reduce the size of the patches and cut down on the bugs
-- and we can contribute by fixing some configfs bugs. Of course using
configfs saves quite a bit of code:
http://lwn.net/Articles/149275/
I've also gotten the impression that new filesystem code tends to
distract (or, worse, confuse) folks from the meat of the patches. It
takes up review time that's better spent on the resource groups core and
controllers.
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 7:08 ` Joel Becker
2006-10-12 21:44 ` Paul Jackson
@ 2006-10-13 23:37 ` Matt Helsley
2006-10-14 0:09 ` Joel Becker
1 sibling, 1 reply; 56+ messages in thread
From: Matt Helsley @ 2006-10-13 23:37 UTC (permalink / raw)
To: Joel Becker
Cc: Linux Kernel, Greg KH, Chandra S. Seetharaman, Andrew Morton,
CKRM-Tech
On Thu, 2006-10-12 at 00:08 -0700, Joel Becker wrote:
> On Wed, Oct 11, 2006 at 07:18:36PM -0700, Matt Helsley wrote:
> > On Wed, 2006-10-11 at 15:06 -0700, Joel Becker wrote:
<snip>
> > /sys/devices/pciXXXX:XX/XXXX:XX:XX.X/resource
> > /sys/block/hda/stat
> > /sys/block/hda/dev
>
> Sure, but mere existence doesn't make it right ;-)
>
> > Each of these has more than one number in them. And they don't all use
> > the same delimeter. So yes, parsing is required in existing sysfs files.
> > And the parsing I'm proposing is simpler than that needed in the pci
> > resource file.
>
> As sysfs grew, it became very clear that if you require parsing,
> someone in userspace will get it wrong. It's happened with pci
> resources, etc. What about people who say "I'll just add a field, no
> one will notice" and everything breaks? That's just not acceptible.
I agree. And that means introducing new ABIs should have a high barrier
to acceptance. But in my opinion it also shouldn't prohibit carefully
considered deviation from the guidelines.
> > In fact we've discussed the problem with you in the past (on ckrm-tech)
> > and described what we're looking for. You suggested adding file(s) to
> > sysfs that replace this attribute. It was an interesting suggestion but
> > does not work because we have one of these lists for each item in
> > configfs. I think all of this discussion is in the ckrm-tech list
> > archives too (in MARC, possibly gmane, etc.).
>
> Sure it works. You have one per resource group. In
> resource_group_make_object(), you sysfs_mkdir() the sysfs file. There
That's the easy part. Next we need to make the pid attribute whenever a
new task is created. And delete it when the task dies. And move it
around whenever it changes groups. Is there rename() support in /sys? If
not, would changes to allow rename() be acceptable (I'm worried it would
impact alot of assumptions made in the existing code)?
> is no technical limitation at all. Now, it is nice that the pid list
> lives in the configfs tree from a location standpoint. Sure. But is it
> worth breaking the simplicity of configfs. Greg says NO! You say YES!
> I say "I'm not sure". And here we are.
Consider that having two very similar (but not symlinked!) trees in
both /sys/ ... /res_group and /sys/kernel/config/res_group could be
rather confusing to userspace programmers and users alike.
It would be strange because when you rmdir a group
in /sys/kernel/config/res_group... a directory in /sys would also
disappear. Yet you can't mkdir or rmdir the /sys dirs. And to edit the
group resources you'd be editting stuff in configfs but to add or remove
tasks to the group you'd be mv'ing stuff in /sys..
> > A patch would best explain what I'm proposing. But since I don't have a
> > working patch yet I've tried to answer your questions below.
>
> Ok, some meat, good.
>
> > Well that's sort of what we want. Not an array of u32s but a list of
> > pids. Among other things Resource Groups uses configfs to let userspace
> > create named groups of tasks. The list of pids describing the group of
> > tasks can easily exceed one page in size when output via the members
> > configfs attribute.
>
> And here we go. In userspace, someone can't just do:
>
> ATTR=$(cat /sys/kernel/config/ckrm/myresource/attr)
Hmm, that suggests a good point. While some one *can* do that or:
ATTR=( $(cat /sys/kernel/config/ckrm/myresource/attr) )
the space available for environment variables is limited. So attempting
to store a large (What's "large"?) attribute in an environment variable
is a potentially buggy practice. This is a significant problem affecting
large attributes in general.
> because ATTR needs to be split and parsed. And what if you decide to
> change it from "<pid>\n" to "<pid> <tgid>\n" per line? Oops, can't do
> it, or you break an ABI.
Once we have the pid the rest is easily reachable. I think Paul
Jackson's response -- we can add but not change existing files -- is the
best answer here.
> > Since there are so many of them and they don't have the same lifetime
> > characteristics as a configfs object we don't want to incur the space
> > and complexity cost of creating and managing one configfs attribute per
> > pid. At the same time, the groups themselves do have the lifetime of a
> > configfs object and there are few of them.
>
> First, I agree that pids as an object each is very expensive.
> If the groups are few, of course, creating a sysfs object per group
> isn't that hard :-)
One per group isn't hard or very expensive -- but that's not the issue
I was bringing up. One per pid is expensive because we'd need to alloc
and associate that object with the task. And then, if we're tearing down
all resource groups (the root group) we need to iterate over all the
tasks and free the object. We'd also need to free the object when the
task is freed. Those can race so we'd need some locking...
This is much more complex than being allowed to print out one pid per
line.
<snip>
> > WARNING: It's possible I've confused my configfs terminology a bit in
> > my description below.
>
> I'll try and make do. :-)
>
> > So for example we would have items with the following attributes:
> >
> > /sys/kernel/config/res_group/members:
> > 1
> > 2
> > 3
> > 4567
>
> Is this toplevel members file the "not part of any other group"
> list?
Yes. They all are -- a pid can only be in one members file. Pids 1, 2,
3, 4567 won't show up in any other members file.
> > /sys/kernel/config/res_group/mhelsleys_tasks/members:
> > 4568
pid 4568 won't show up in any other members file.
> > /sys/kernel/config/res_group/jbeckers_tasks/members:
> > 4569
etc.
> > We've considered creating one attribute per pid but frankly it makes
> > the rest of the code much nastier in every way (code complexity, memory
> > use, locking, etc). We also considered your earlier suggestion of having
> > a custom sysfs file for this. However it just doesn't fit: each list is
> > different so we'd need one file per resource group. This naturally
> > suggests mimicking the configfs tree in /sys but that's uniformly worse
> > than one pid per attribute.
>
> How is this more complex? One object per attribute is large,
> but if the number of groups is "few" as you say above, isn't that
> simpler?
There are two parts to the complexity: code complexity and the number
of userspace pieces to deal with. I think that in both of these
categories the OVPA approach is more complex. Here's how I see it:
|Complexity of|
| OVPA | OVPL |
----------------------+------+------+
Code | more | less |
| | |
Number of filesystems | | |
reflecting resource | 2 | 1 |
group hierarchy | | |
| | |
Attribute | less | more |
complexity | | |
----------------------+------+------+
Total | more | less |
I think the "more" in the OVPL column is much much smaller than the
"more" in the OVPA column. Lastly the OVPL code can be shared while the
OVPA code most likely cannot be shared.
> > The OVPA rule exists for some very good reasons. We want to avoid the
> > nightmares encountered with /proc. Specifically the nightmare of
> > organizing and parsing all of those structure values. /proc files often
> > incorporate too much structure into the body of the file and that in
> > turn requires complex parsing; which tends to break compatibility as the
> > structures they reflect change. (I've worked on some userspace C code
> > that parses some /proc/sys/net files and, yes, it's nasty..)
> >
> > Hence I think we could allow one value per line in addition to one
> > value per attribute -- but only under very specific restrictions. The
> > restrictions could be enforced by an additional configfs object which
> > represents simple lists and lacks direct access to the char buffer.
> > Also, unlike Chandra's seq_file patches, a different interface would
> > avoid impacting existing uses of configfs.
>
> You're effectively suggesting that a specific attribute type of
> "repeated value of type X". No mixed types, no exploded structures,
> just a "list of this attr" sort of thing. This does fit my personal
> requirement of avoiding a generic, abusable system.
Exactly.
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-13 0:16 ` Paul Jackson
@ 2006-10-13 23:38 ` Matt Helsley
0 siblings, 0 replies; 56+ messages in thread
From: Matt Helsley @ 2006-10-13 23:38 UTC (permalink / raw)
To: Paul Jackson; +Cc: Greg KH, sekharan, menage, linux-kernel, ckrm-tech
On Thu, 2006-10-12 at 17:16 -0700, Paul Jackson wrote:
<snip>
> For those of us whose brains don't hold so many details at once,
> creating a new file system can seem a bit daunting. And for those
> of us not skilled in the art, it is more likely to end up being
> 300 lines of code, presenting several good provocations for a Hellwig
> or a Viro to curse in the general direction of their monitors.
Definitely.
> Instead of trying to hijack configfs to purposes ill suited for it,
> I wonder if there isn't someway to lower the hurdles that us mere
> mortals must leap to creating additional filesystems.
Again, I don't think using configfs to define groups of tasks is
ill-suited to the purpose of configfs. I think we're confusing the
limitations of the implementation with the purpose for having configfs
in addition to sysfs.
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 23:51 ` Greg KH
2006-10-13 0:16 ` Paul Jackson
@ 2006-10-13 23:40 ` Matt Helsley
2006-10-13 23:47 ` Paul Menage
2006-10-14 6:17 ` Greg KH
2006-10-16 19:10 ` Chandra Seetharaman
2 siblings, 2 replies; 56+ messages in thread
From: Matt Helsley @ 2006-10-13 23:40 UTC (permalink / raw)
To: Greg KH; +Cc: Chandra Seetharaman, Paul Menage, linux-kernel, ckrm-tech
On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
<snip>
> > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > processes aggregation from cpuset to have an independent infrastructure
> > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > has its own file system. I was advocating him to use configfs. But, he
> > also has this issue/limitation.
>
> That's one reason it is so easy to just write your own filesystem then.
> What is it these days, less than 200 lines of code? I bet you can even
For my_school_project_fs perhaps 200 lines is sufficient.
Paul Menage's patch which Chandra was referring to:
http://lkml.org/lkml/2006/9/28/104
is 1700 insertions. RCFS was around 1500 lines -- similar to Paul's
patch -- before we moved to configfs and reduced that to about 300-400
lines. This suggests we'd need around 1500 lines of filesystem code --
7.5 times your estimate.
> condence more things to make it 100 lines if you really try. That seems
> much more sane than trying to bend configfs into something different.
I don't agree. I think it's insane not to use configfs just because we
need a list of pids for each group of tasks.
> Why are people so opposed to creating their own filesystems?
There are lots of reasons not to create your own filesystem. When
you're not already a kernel maintainer it's no small task to create and
get a non-trivial filesystem accepted into the kernel. Getting people to
review whole new filesystems has its own problems:
http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/1928.html
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-13 23:40 ` Matt Helsley
@ 2006-10-13 23:47 ` Paul Menage
2006-10-14 6:17 ` Greg KH
1 sibling, 0 replies; 56+ messages in thread
From: Paul Menage @ 2006-10-13 23:47 UTC (permalink / raw)
To: Matt Helsley; +Cc: Greg KH, ckrm-tech, Chandra Seetharaman, linux-kernel
On 10/13/06, Matt Helsley <matthltc@us.ibm.com> wrote:
> On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
>
> <snip>
>
> > > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > > processes aggregation from cpuset to have an independent infrastructure
> > > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > > has its own file system. I was advocating him to use configfs. But, he
> > > also has this issue/limitation.
> >
> > That's one reason it is so easy to just write your own filesystem then.
> > What is it these days, less than 200 lines of code? I bet you can even
>
> For my_school_project_fs perhaps 200 lines is sufficient.
>
> Paul Menage's patch which Chandra was referring to:
>
> http://lkml.org/lkml/2006/9/28/104
>
> is 1700 insertions.
To be fair, only about 350 lines of that is filesystem boilerplate.
There's also maybe 100-200 lines of interfacing with the filesystem,
but they'd probably be there as configfs-interfacing code if it was
over configfs.
Paul
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-13 23:37 ` Matt Helsley
@ 2006-10-14 0:09 ` Joel Becker
2006-10-15 1:06 ` Matt Helsley
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-14 0:09 UTC (permalink / raw)
To: Matt Helsley
Cc: Linux Kernel, Greg KH, Chandra S. Seetharaman, Andrew Morton,
CKRM-Tech
On Fri, Oct 13, 2006 at 04:37:38PM -0700, Matt Helsley wrote:
> > Sure it works. You have one per resource group. In
> > resource_group_make_object(), you sysfs_mkdir() the sysfs file. There
>
> That's the easy part. Next we need to make the pid attribute whenever a
> new task is created. And delete it when the task dies. And move it
> around whenever it changes groups. Is there rename() support in /sys? If
> not, would changes to allow rename() be acceptable (I'm worried it would
> impact alot of assumptions made in the existing code)?
No, you don't create a pid attribute per task. The sysfs file
is literally your large attribute. So, instead of echoing a new pid to
"/sys/kernel/config/ckrm/group1/pids", you echo to
"/sys/ckrm/group1/pids". To display them all, you just cat
"/sys/ckrm/group1/pids". It's exactly like the file you want in
configfs, just located in a place where it is allowed.
> Consider that having two very similar (but not symlinked!) trees in
> both /sys/ ... /res_group and /sys/kernel/config/res_group could be
> rather confusing to userspace programmers and users alike.
Not really. It's not identical (tons of attributes live in the
configfs part but not the sysfs part), and it has a clear deliniation of
what each does.
> It would be strange because when you rmdir a group
> in /sys/kernel/config/res_group... a directory in /sys would also
> disappear. Yet you can't mkdir or rmdir the /sys dirs. And to edit the
This is no different than tons of sysfs and procfs functionality
today.
> Hmm, that suggests a good point. While some one *can* do that or:
>
> ATTR=( $(cat /sys/kernel/config/ckrm/myresource/attr) )
>
> the space available for environment variables is limited. So attempting
> to store a large (What's "large"?) attribute in an environment variable
> is a potentially buggy practice. This is a significant problem affecting
> large attributes in general.
If you can't do it in sh, it's pretty much out of scope. This
is a taste rule I use, because like to shell program. Sure, not the end
of the world (not a hard rule, I guess), but worth thinking about.
> There are two parts to the complexity: code complexity and the number
> of userspace pieces to deal with. I think that in both of these
> categories the OVPA approach is more complex. Here's how I see it:
By your definition, sysfs, configfs, and other fs-style control
mechanisms are too complex. We should all just be using ioctl() so that
coders and users have only one namespace :-)
> > You're effectively suggesting that a specific attribute type of
> > "repeated value of type X". No mixed types, no exploded structures,
> > just a "list of this attr" sort of thing. This does fit my personal
> > requirement of avoiding a generic, abusable system.
>
> Exactly.
How do you implement it? Full on seq_file with restrictions
(ops->start,stop,next,show)? Some sort of array (how do I placehold
where the last read(2) was)? Some sort of linked list (again with the
placeholding and locking)? Anything short of seq_file+restrictions
would be perhaps binding that traversal, no?
Joel
--
"When I am working on a problem I never think about beauty. I
only think about how to solve the problem. But when I have finished, if
the solution is not beautiful, I know it is wrong."
- Buckminster Fuller
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-13 0:01 ` Paul Jackson
@ 2006-10-14 4:40 ` Greg KH
0 siblings, 0 replies; 56+ messages in thread
From: Greg KH @ 2006-10-14 4:40 UTC (permalink / raw)
To: Paul Jackson; +Cc: Joel Becker, matthltc, linux-kernel
On Thu, Oct 12, 2006 at 05:01:25PM -0700, Paul Jackson wrote:
> Underneath each of these filesystems, sysfs, configfs, cpuset, resource
> groups, ... it would seem ideal if we had a single kernel file system
> infrastructure. Actually, we're only a half a layer from having that,
> with vfs. It just takes a fair bit of glue to construct any of these
> file systems out of vfs primitives.
No, it's pretty small to create a ram-based filesystem these days. Look
at securityfs and debugfs for two simple examples that you should copy
if you want to create your own. If you can come up with ways of making
those files smaller, then that would be great. But as it is, it is
_really_ simple to do this right now, especially with two working
examples to copy from :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-13 23:40 ` Matt Helsley
2006-10-13 23:47 ` Paul Menage
@ 2006-10-14 6:17 ` Greg KH
2006-10-14 23:14 ` Matt Helsley
1 sibling, 1 reply; 56+ messages in thread
From: Greg KH @ 2006-10-14 6:17 UTC (permalink / raw)
To: Matt Helsley; +Cc: Chandra Seetharaman, Paul Menage, linux-kernel, ckrm-tech
On Fri, Oct 13, 2006 at 04:40:08PM -0700, Matt Helsley wrote:
> On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
>
> <snip>
>
> > > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > > processes aggregation from cpuset to have an independent infrastructure
> > > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > > has its own file system. I was advocating him to use configfs. But, he
> > > also has this issue/limitation.
> >
> > That's one reason it is so easy to just write your own filesystem then.
> > What is it these days, less than 200 lines of code? I bet you can even
>
> For my_school_project_fs perhaps 200 lines is sufficient.
>
> Paul Menage's patch which Chandra was referring to:
>
> http://lkml.org/lkml/2006/9/28/104
>
> is 1700 insertions. RCFS was around 1500 lines -- similar to Paul's
> patch -- before we moved to configfs and reduced that to about 300-400
> lines. This suggests we'd need around 1500 lines of filesystem code --
> 7.5 times your estimate.
Then I suggest that you are doing something extremely wrong here. The
base filesystem code for both debugfs and securityfs, is around 200
lines of code, including comments. And they are both not
"my_school_project_fs".
If you want to get a bit fancier, and parse some mount options and
provide a persistant mount state, like usbfs, it grows to about 350
lines of code.
Again, not a "fake" filesystem by any means.
And, for another example, ndevfs was posted to lkml over a year ago as a
bad joke and can be found at:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/bad/ndevfs.patch
and weighs in at a whopping 249 lines for all of inode.c (comments and
whitespace included.) It was a full-blown filesystem that handled
almost exactly what you will need to handle.
So please, before you critise me for not knowing exactly how much work
it takes to implement a ram-based filesystem for something like this,
take a look at a few of the already published and shipping
implementations, and note who wrote them...
Otherwise you look quite foolish.
> > condence more things to make it 100 lines if you really try. That seems
> > much more sane than trying to bend configfs into something different.
>
> I don't agree. I think it's insane not to use configfs just because we
> need a list of pids for each group of tasks.
Perhaps because your usage is not what it is intended for? Yeah, I
know, if all you have is a hammer...
> > Why are people so opposed to creating their own filesystems?
>
> There are lots of reasons not to create your own filesystem. When
> you're not already a kernel maintainer it's no small task to create and
> get a non-trivial filesystem accepted into the kernel. Getting people to
> review whole new filesystems has its own problems:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/1928.html
There's a world of difference between something as complex as unionfs
and something that merely makes a few calls to the libfs code already in
the kernel.
And if you don't realize this, then yes, I would not recommend that you
should write your own fs. But I would then recommend that the task then
be passed on to someone else in your group who does have the capability
to do so...
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-11 23:27 ` Chandra Seetharaman
@ 2006-10-14 8:01 ` Greg KH
2006-10-14 19:43 ` Andrew Morton
2006-10-16 19:16 ` Chandra Seetharaman
0 siblings, 2 replies; 56+ messages in thread
From: Greg KH @ 2006-10-14 8:01 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: Andrew Morton, Joel Becker, ckrm-tech, linux-kernel
On Wed, Oct 11, 2006 at 04:27:13PM -0700, Chandra Seetharaman wrote:
> On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> > On Wed, 11 Oct 2006 15:18:22 -0700
> > Joel Becker <Joel.Becker@oracle.com> wrote:
> >
> > > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > > The patch deletes a pile of custom code from configfs and replaces it with
> > > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > > process. Migration over to the new interface is trivial and almost
> > > > scriptable.
> > >
> > > The configfs stuff is based on the sysfs code too. Should we
> > > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > > the cleanup is considered better.
> > >
> >
> > I don't see why not. I don't know if anyone has though of/proposed it
> > before.
>
> I can generate a patch for that too.
Argh!!!!
Are you going to honestly tell me you have a single attribute in sysfs
that is larger than PAGE_SIZE? If you are getting anywhere close to
this, then something is really wrong and we need to talk.
greg k-h
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 8:01 ` Greg KH
@ 2006-10-14 19:43 ` Andrew Morton
2006-10-14 20:10 ` Joel Becker
2006-10-16 19:16 ` Chandra Seetharaman
1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2006-10-14 19:43 UTC (permalink / raw)
To: Greg KH; +Cc: Chandra Seetharaman, Joel Becker, ckrm-tech, linux-kernel
On Sat, 14 Oct 2006 01:01:07 -0700
Greg KH <greg@kroah.com> wrote:
> On Wed, Oct 11, 2006 at 04:27:13PM -0700, Chandra Seetharaman wrote:
> > On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> > > On Wed, 11 Oct 2006 15:18:22 -0700
> > > Joel Becker <Joel.Becker@oracle.com> wrote:
> > >
> > > > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > > > The patch deletes a pile of custom code from configfs and replaces it with
> > > > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > > > process. Migration over to the new interface is trivial and almost
> > > > > scriptable.
> > > >
> > > > The configfs stuff is based on the sysfs code too. Should we
> > > > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > > > the cleanup is considered better.
> > > >
> > >
> > > I don't see why not. I don't know if anyone has though of/proposed it
> > > before.
> >
> > I can generate a patch for that too.
>
> Argh!!!!
>
> Are you going to honestly tell me you have a single attribute in sysfs
> that is larger than PAGE_SIZE?
He does not. It's a matter of reusing existing facilities rather than
impementing similar things in multiple places. The equivalent patch in
configfs removed a decent amount of code:
fs/configfs/file.c | 130 ++++++++++-------------------------------------
include/linux/configfs.h | 3 -
2 files changed, 31 insertions(+), 102 deletions(-)
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 19:43 ` Andrew Morton
@ 2006-10-14 20:10 ` Joel Becker
2006-10-16 19:24 ` Chandra Seetharaman
0 siblings, 1 reply; 56+ messages in thread
From: Joel Becker @ 2006-10-14 20:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Greg KH, Chandra Seetharaman, ckrm-tech, linux-kernel
On Sat, Oct 14, 2006 at 12:43:51PM -0700, Andrew Morton wrote:
> On Sat, 14 Oct 2006 01:01:07 -0700
> Greg KH <greg@kroah.com> wrote:
> > Argh!!!!
> >
> > Are you going to honestly tell me you have a single attribute in sysfs
> > that is larger than PAGE_SIZE?
>
> He does not. It's a matter of reusing existing facilities rather than
> impementing similar things in multiple places. The equivalent patch in
> configfs removed a decent amount of code:
Issues of PAGE_SIZE and attribute size aside, the patch posted
was incorrect. While they used seq_file, they implemented it in a
completely inefficent fashion, filling the entire buffer in one ->show()
call rather than letting multiple read(2) calls iterate their list.
Joel
--
"Anything that is too stupid to be spoken is sung."
- Voltaire
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 6:17 ` Greg KH
@ 2006-10-14 23:14 ` Matt Helsley
0 siblings, 0 replies; 56+ messages in thread
From: Matt Helsley @ 2006-10-14 23:14 UTC (permalink / raw)
To: Greg KH; +Cc: Chandra Seetharaman, Paul Menage, linux-kernel, ckrm-tech
On Fri, 2006-10-13 at 23:17 -0700, Greg KH wrote:
> On Fri, Oct 13, 2006 at 04:40:08PM -0700, Matt Helsley wrote:
> > On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
> >
> > <snip>
> >
> > > > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > > > processes aggregation from cpuset to have an independent infrastructure
> > > > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > > > has its own file system. I was advocating him to use configfs. But, he
> > > > also has this issue/limitation.
> > >
> > > That's one reason it is so easy to just write your own filesystem then.
> > > What is it these days, less than 200 lines of code? I bet you can even
> >
> > For my_school_project_fs perhaps 200 lines is sufficient.
> >
> > Paul Menage's patch which Chandra was referring to:
> >
> > http://lkml.org/lkml/2006/9/28/104
> >
> > is 1700 insertions. RCFS was around 1500 lines -- similar to Paul's
> > patch -- before we moved to configfs and reduced that to about 300-400
> > lines. This suggests we'd need around 1500 lines of filesystem code --
> > 7.5 times your estimate.
>
> Then I suggest that you are doing something extremely wrong here. The
> base filesystem code for both debugfs and securityfs, is around 200
> lines of code, including comments. And they are both not
> "my_school_project_fs".
>
> If you want to get a bit fancier, and parse some mount options and
> provide a persistant mount state, like usbfs, it grows to about 350
> lines of code.
>
> Again, not a "fake" filesystem by any means.
>
> And, for another example, ndevfs was posted to lkml over a year ago as a
> bad joke and can be found at:
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/bad/ndevfs.patch
> and weighs in at a whopping 249 lines for all of inode.c (comments and
> whitespace included.) It was a full-blown filesystem that handled
> almost exactly what you will need to handle.
>
> So please, before you critise me for not knowing exactly how much work
> it takes to implement a ram-based filesystem for something like this,
> take a look at a few of the already published and shipping
> implementations, and note who wrote them...
>
> Otherwise you look quite foolish.
>
> > > condence more things to make it 100 lines if you really try. That seems
> > > much more sane than trying to bend configfs into something different.
> >
> > I don't agree. I think it's insane not to use configfs just because we
> > need a list of pids for each group of tasks.
>
> Perhaps because your usage is not what it is intended for? Yeah, I
> know, if all you have is a hammer...
>
> > > Why are people so opposed to creating their own filesystems?
> >
> > There are lots of reasons not to create your own filesystem. When
> > you're not already a kernel maintainer it's no small task to create and
> > get a non-trivial filesystem accepted into the kernel. Getting people to
> > review whole new filesystems has its own problems:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/1928.html
>
> There's a world of difference between something as complex as unionfs
> and something that merely makes a few calls to the libfs code already in
> the kernel.
>
> And if you don't realize this, then yes, I would not recommend that you
> should write your own fs. But I would then recommend that the task then
> be passed on to someone else in your group who does have the capability
> to do so...
>
> greg k-h
You're right. I'm sorry I suggested that 200 lines might be too small
an estimate. Clearly that was a moronic suggestion on my part and I'm
sorry you took it so personally.
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 0:09 ` Joel Becker
@ 2006-10-15 1:06 ` Matt Helsley
2006-10-15 19:07 ` Paul Jackson
2006-10-16 19:33 ` Chandra Seetharaman
2 siblings, 0 replies; 56+ messages in thread
From: Matt Helsley @ 2006-10-15 1:06 UTC (permalink / raw)
To: Joel Becker
Cc: Linux Kernel, Greg KH, Chandra S. Seetharaman, Andrew Morton,
CKRM-Tech
On Fri, 2006-10-13 at 17:09 -0700, Joel Becker wrote:
> On Fri, Oct 13, 2006 at 04:37:38PM -0700, Matt Helsley wrote:
> > > Sure it works. You have one per resource group. In
> > > resource_group_make_object(), you sysfs_mkdir() the sysfs file. There
> >
> > That's the easy part. Next we need to make the pid attribute whenever a
> > new task is created. And delete it when the task dies. And move it
> > around whenever it changes groups. Is there rename() support in /sys? If
> > not, would changes to allow rename() be acceptable (I'm worried it would
> > impact alot of assumptions made in the existing code)?
>
> No, you don't create a pid attribute per task. The sysfs file
> is literally your large attribute. So, instead of echoing a new pid to
> "/sys/kernel/config/ckrm/group1/pids", you echo to
> "/sys/ckrm/group1/pids". To display them all, you just cat
> "/sys/ckrm/group1/pids". It's exactly like the file you want in
> configfs, just located in a place where it is allowed.
Oh, sorry. I was still operating on the one-value-per-attribute
assumption. This indeed looks like it would work.
> > Consider that having two very similar (but not symlinked!) trees in
> > both /sys/ ... /res_group and /sys/kernel/config/res_group could be
> > rather confusing to userspace programmers and users alike.
>
> Not really. It's not identical (tons of attributes live in the
> configfs part but not the sysfs part), and it has a clear deliniation of
> what each does.
Clear delineation to who? I'm not convinced this is any less confusing
to a userspace programmer than parsing a single newline between multiple
values in a configfs attribute.
> > It would be strange because when you rmdir a group
> > in /sys/kernel/config/res_group... a directory in /sys would also
> > disappear. Yet you can't mkdir or rmdir the /sys dirs. And to edit the
>
> This is no different than tons of sysfs and procfs functionality
> today.
Yup.
> > There are two parts to the complexity: code complexity and the number
> > of userspace pieces to deal with. I think that in both of these
> > categories the OVPA approach is more complex. Here's how I see it:
>
> By your definition, sysfs, configfs, and other fs-style control
> mechanisms are too complex. We should all just be using ioctl() so that
> coders and users have only one namespace :-)
That's an absurd conclusion to draw from my argument that one
filesystem-based approach is less complex than another.
> > > You're effectively suggesting that a specific attribute type of
> > > "repeated value of type X". No mixed types, no exploded structures,
> > > just a "list of this attr" sort of thing. This does fit my personal
> > > requirement of avoiding a generic, abusable system.
> >
> > Exactly.
>
> How do you implement it? Full on seq_file with restrictions
> (ops->start,stop,next,show)?
That was the plan.
Cheers,
-Matt Helsley
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 0:09 ` Joel Becker
2006-10-15 1:06 ` Matt Helsley
@ 2006-10-15 19:07 ` Paul Jackson
2006-10-16 19:33 ` Chandra Seetharaman
2 siblings, 0 replies; 56+ messages in thread
From: Paul Jackson @ 2006-10-15 19:07 UTC (permalink / raw)
To: Joel Becker; +Cc: matthltc, linux-kernel, gregkh, sekharan, akpm, ckrm-tech
Joel, responding to Matt, wrote:
> > Hmm, that suggests a good point. While some one *can* do that or:
> >
> > ATTR=( $(cat /sys/kernel/config/ckrm/myresource/attr) )
> >
> > the space available for environment variables is limited. So attempting
> > to store a large (What's "large"?) attribute in an environment variable
> > is a potentially buggy practice. This is a significant problem affecting
> > large attributes in general.
>
> If you can't do it in sh, it's pretty much out of scope. This
> is a taste rule I use, because like to shell program. Sure, not the end
> of the world (not a hard rule, I guess), but worth thinking about.
I too like to shell program. Such potentially long vectors can
be worked in the shell, just not placed in a single command line
argument or environment variable.
Constructs that do work (using the list of process id's in the
file /dev/cpuset/tasks for these examples) include:
1. while read pid
do
... do something with each $pid ...
done < /dev/cpuset/tasks
2. < /dev/cpuset/tasks xargs ps -flp
3. sed -un p < /dev/cpuset/foo/tasks > /dev/cpuset/tasks
Such shell constructs for dealing with vectors that might be longer
than argument or environment length limits have been needed and known
for decades.
In an earlier message, Joel wrote:
> You're effectively suggesting that a specific attribute type of
> "repeated value of type X". No mixed types, no exploded structures,
> just a "list of this attr" sort of thing. This does fit my personal
> requirement of avoiding a generic, abusable system.
Yes - what I call above a "vector."
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-12 23:51 ` Greg KH
2006-10-13 0:16 ` Paul Jackson
2006-10-13 23:40 ` Matt Helsley
@ 2006-10-16 19:10 ` Chandra Seetharaman
2006-10-16 20:32 ` Paul Jackson
2 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-16 19:10 UTC (permalink / raw)
To: Greg KH; +Cc: Paul Menage, ckrm-tech, Matt Helsley, linux-kernel
On Thu, 2006-10-12 at 16:51 -0700, Greg KH wrote:
> On Wed, Oct 11, 2006 at 04:26:00PM -0700, Chandra Seetharaman wrote:
> > On Wed, 2006-10-11 at 15:39 -0700, Greg KH wrote:
> > > On Tue, Oct 10, 2006 at 06:28:51PM -0700, Joel Becker wrote:
> > > > On Tue, Oct 10, 2006 at 05:49:59PM -0700, Matt Helsley wrote:
> > > > > We want to be able to export a sequence of small (<< 1 page),
> > > > > homogenous, unstructured (scalar), attributes through configfs using the
> > > > > same file. While this is rather specific, I'd guess it would be a common
> > > > > occurrence.
> > > >
> > > > Pray tell, why? "One attribute per file" is the mantra here.
> > > > You really should think hard before you break it. Simple heuristic:
> > > > would you have to parse the buffer? Then it's wrong.
> > >
> > > I agree. You are trying to use configfs for something that it is not
> > > entended to be used for. If you want to write/read large numbers of
> > > attrbutes like this, use your own filesystem.
> >
> > I would say it is a "large attribute" not "large numbers of attributes".
>
> That attribute seems to violate the rule of "only one thing" and needs
> to be parsed. That does not seem like a good fit for configfs or sysfs.
As pointed by Paul Jackson in a reply, what we are trying to get is a
list of pids, which can be termed as vectors and doesn't need serious
parsing, but simply separating the items in the list by a '\n'.
What I was trying to do in the patches is to remove some code (which
maintains its own buffer) by an existing mechanism, seq_file.
As Andrew pointed in one of the email, my patch set reduces the number
of lines of code in configfs also.
You do not think that is a good idea ?
>
>
> > > configfs has the same "one value per file" rule that sysfs has. And
> > > because your userspace model doesn't fit that, don't try to change
> > > configfs here.
> > >
> > > What happened to your old ckrmfs? I thought you were handling all of
> > > this in that.
> >
> > We decided to use an existing infrastructure instead of having our own
> > file system.
> >
> > configfs is a perfect fit for us, except the size limitation.
>
> Then it's not a perfect fit, sorry, as you are trying to get it to do
> things it is not supposed to, or designed to, do.
>
> > BTW, it it not just CKRM/RG, Paul Menage as recently extracted the
> > processes aggregation from cpuset to have an independent infrastructure
> > (http://marc.theaimsgroup.com/?l=ckrm-tech&m=116006307018720&w=2), which
> > has its own file system. I was advocating him to use configfs. But, he
> > also has this issue/limitation.
>
> That's one reason it is so easy to just write your own filesystem then.
> What is it these days, less than 200 lines of code? I bet you can even
> condence more things to make it 100 lines if you really try. That seems
> much more sane than trying to bend configfs into something different.
I thought the mantra is "Use existing infrastructure, instead of writing
your own".
>
> Why are people so opposed to creating their own filesystems?
I do not think we were/are. As you pointed we did have our own file
system. And as I mentioned, we moved to configfs mainly to reduce code
size and to start using the existing infrastructure.
Same is true even for the (cpuset extracted) patch set submitted by Paul
Menage. It does have its own file system. I was advocating him to use
configfs for the same reason (to use existing interface).
> thanks,
>
> greg k-h
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> ckrm-tech mailing list
> https://lists.sourceforge.net/lists/listinfo/ckrm-tech
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 8:01 ` Greg KH
2006-10-14 19:43 ` Andrew Morton
@ 2006-10-16 19:16 ` Chandra Seetharaman
1 sibling, 0 replies; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-16 19:16 UTC (permalink / raw)
To: Greg KH; +Cc: Andrew Morton, Joel Becker, ckrm-tech, linux-kernel
On Sat, 2006-10-14 at 01:01 -0700, Greg KH wrote:
> On Wed, Oct 11, 2006 at 04:27:13PM -0700, Chandra Seetharaman wrote:
> > On Wed, 2006-10-11 at 15:48 -0700, Andrew Morton wrote:
> > > On Wed, 11 Oct 2006 15:18:22 -0700
> > > Joel Becker <Joel.Becker@oracle.com> wrote:
> > >
> > > > On Wed, Oct 11, 2006 at 01:19:35PM -0700, Andrew Morton wrote:
> > > > > The patch deletes a pile of custom code from configfs and replaces it with
> > > > > calls to standard kernel infrastructure and fixes a shortcoming/bug in the
> > > > > process. Migration over to the new interface is trivial and almost
> > > > > scriptable.
> > > >
> > > > The configfs stuff is based on the sysfs code too. Should we
> > > > migrate sysfs/file.c to the same seq_file code? Serious question, if
> > > > the cleanup is considered better.
> > > >
> > >
> > > I don't see why not. I don't know if anyone has though of/proposed it
> > > before.
> >
> > I can generate a patch for that too.
>
> Argh!!!!
>
> Are you going to honestly tell me you have a single attribute in sysfs
> that is larger than PAGE_SIZE? If you are getting anywhere close to
> this, then something is really wrong and we need to talk.
I was replying to Andrew's reply in the thread. Thats all.
I do _not_ have any personal interests here (sysfs).
>
> greg k-h
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 20:10 ` Joel Becker
@ 2006-10-16 19:24 ` Chandra Seetharaman
2006-10-16 23:09 ` Joel Becker
0 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-16 19:24 UTC (permalink / raw)
To: Joel Becker; +Cc: Andrew Morton, Greg KH, ckrm-tech, linux-kernel
On Sat, 2006-10-14 at 13:10 -0700, Joel Becker wrote:
> On Sat, Oct 14, 2006 at 12:43:51PM -0700, Andrew Morton wrote:
> > On Sat, 14 Oct 2006 01:01:07 -0700
> > Greg KH <greg@kroah.com> wrote:
> > > Argh!!!!
> > >
> > > Are you going to honestly tell me you have a single attribute in sysfs
> > > that is larger than PAGE_SIZE?
> >
> > He does not. It's a matter of reusing existing facilities rather than
> > impementing similar things in multiple places. The equivalent patch in
> > configfs removed a decent amount of code:
>
> Issues of PAGE_SIZE and attribute size aside, the patch posted
> was incorrect. While they used seq_file, they implemented it in a
I do not think it is incorrect. It provides a simplest interface to
configfs's clients.
> completely inefficent fashion, filling the entire buffer in one ->show()
> call rather than letting multiple read(2) calls iterate their list.
So, are you suggesting that we should provide a generic seq_operations
interface to configfs's clients ?
>
> Joel
>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-14 0:09 ` Joel Becker
2006-10-15 1:06 ` Matt Helsley
2006-10-15 19:07 ` Paul Jackson
@ 2006-10-16 19:33 ` Chandra Seetharaman
2006-10-16 23:07 ` Joel Becker
2 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-16 19:33 UTC (permalink / raw)
To: Joel Becker; +Cc: Matt Helsley, Linux Kernel, Greg KH, Andrew Morton, CKRM-Tech
On Fri, 2006-10-13 at 17:09 -0700, Joel Becker wrote:
> On Fri, Oct 13, 2006 at 04:37:38PM -0700, Matt Helsley wrote:
> > > Sure it works. You have one per resource group. In
> > > resource_group_make_object(), you sysfs_mkdir() the sysfs file. There
> >
> > That's the easy part. Next we need to make the pid attribute whenever a
> > new task is created. And delete it when the task dies. And move it
> > around whenever it changes groups. Is there rename() support in /sys? If
> > not, would changes to allow rename() be acceptable (I'm worried it would
> > impact alot of assumptions made in the existing code)?
>
> No, you don't create a pid attribute per task. The sysfs file
> is literally your large attribute. So, instead of echoing a new pid to
> "/sys/kernel/config/ckrm/group1/pids", you echo to
> "/sys/ckrm/group1/pids". To display them all, you just cat
> "/sys/ckrm/group1/pids". It's exactly like the file you want in
> configfs, just located in a place where it is allowed.
>From what I see, sysfs also has the PAGESIZE limitation. If that _is_
the case, then moving to sysfs does not help us any. Correct me if I am
wrong.
Won't we have the same arguments that we have now ?
<snip>
>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-16 19:10 ` Chandra Seetharaman
@ 2006-10-16 20:32 ` Paul Jackson
2006-10-16 22:29 ` Chandra Seetharaman
0 siblings, 1 reply; 56+ messages in thread
From: Paul Jackson @ 2006-10-16 20:32 UTC (permalink / raw)
To: sekharan; +Cc: greg, linux-kernel, menage, matthltc, ckrm-tech
Chandra wrote:
> As Andrew pointed in one of the email, my patch set reduces the number
> of lines of code in configfs also.
I think Andrew has mentioned this a couple of times ;)
Actually, I've never been particularly happy with the implicit
PAGESIZE limit on these writes. It's dangerous coding practice to
pass someone a buffer pointer in which they are to write, without
passing a corresponding length.
If it is appropriate for configfs to have some such fixed limit on
file lengths, then that should be a formal part of the API or imposed
in a safer manner than hoping the callback routine doesn't overwrite
a buffer.
Whether or not it should use homebrew code as it does now, or Chandra's
patch to make use of existing infrastructure should be a separate
question. I think Andrew's observations apply here.
And whether or not we add support for a vector of scalars of the same
type and meaning should be yet another separate question. No doubt
reasonable minds will differ on this question, though so far that
discussion has been clouded by these other issues.
Greg seems to be suggesting that if we use Chandra's patch, we cannot
impose any length limit, and that this opens the cover to Pandora's
box of all manner of changing and complex output per file.
Well, I could code some pretty ugly output in a single page, if I
set my mind to it. But it would be rejected, because we've learned
that hurts.
>From that I conclude that it is not the PAGESIZE limit that is saving
us from more unparseable file formats, but the discipline we have
gained from learning the hard way.
Chandra - I haven't looked at seq file lately - could a user of it
such as configfs impose a length limit of its choosing, building on
your patch, without pushing the number of lines of code back above
where it started?
Perhaps, say, we would let the callback routines could push stuff into
a seq file without small limits, but then the configfs code could
truncate that output to a limit of its choosing. This would impose a
length limit, safely.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-16 20:32 ` Paul Jackson
@ 2006-10-16 22:29 ` Chandra Seetharaman
2006-10-17 2:59 ` Paul Jackson
0 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-16 22:29 UTC (permalink / raw)
To: Paul Jackson; +Cc: greg, linux-kernel, menage, matthltc, ckrm-tech
On Mon, 2006-10-16 at 13:32 -0700, Paul Jackson wrote:
<snip>
> Chandra - I haven't looked at seq file lately - could a user of it
> such as configfs impose a length limit of its choosing, building on
Quick look at the seq_file interfaces shows there is no such capability.
(disclaimer: I am no expert of seq_file :)
> your patch, without pushing the number of lines of code back above
> where it started?
>
> Perhaps, say, we would let the callback routines could push stuff into
> a seq file without small limits, but then the configfs code could
> truncate that output to a limit of its choosing. This would impose a
> length limit, safely.
>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-16 19:33 ` Chandra Seetharaman
@ 2006-10-16 23:07 ` Joel Becker
0 siblings, 0 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-16 23:07 UTC (permalink / raw)
To: Chandra Seetharaman
Cc: Matt Helsley, Linux Kernel, Greg KH, Andrew Morton, CKRM-Tech
On Mon, Oct 16, 2006 at 12:33:41PM -0700, Chandra Seetharaman wrote:
> >From what I see, sysfs also has the PAGESIZE limitation. If that _is_
> the case, then moving to sysfs does not help us any. Correct me if I am
> wrong.
>
> Won't we have the same arguments that we have now ?
If you use a 'struct attribute' in sysfs, sure, you have the
same limit. The difference is that sysfs allows you to install your own
files. So you could create a seq_file of your own and insert it into
your sysfs directory.
Joel
--
"I think it would be a good idea."
- Mahatma Ghandi, when asked what he thought of Western
civilization
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-16 19:24 ` Chandra Seetharaman
@ 2006-10-16 23:09 ` Joel Becker
2006-10-18 0:55 ` Chandra Seetharaman
0 siblings, 1 reply; 56+ messages in thread
From: Joel Becker @ 2006-10-16 23:09 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: Andrew Morton, Greg KH, ckrm-tech, linux-kernel
On Mon, Oct 16, 2006 at 12:24:35PM -0700, Chandra Seetharaman wrote:
> On Sat, 2006-10-14 at 13:10 -0700, Joel Becker wrote:
> > Issues of PAGE_SIZE and attribute size aside, the patch posted
> > was incorrect. While they used seq_file, they implemented it in a
>
> I do not think it is incorrect. It provides a simplest interface to
> configfs's clients.
I understand that it provides a simple sed(1)-based change for
existing clients. However, it abuses seq_file in exactly the wrong way.
> > completely inefficent fashion, filling the entire buffer in one ->show()
> > call rather than letting multiple read(2) calls iterate their list.
>
> So, are you suggesting that we should provide a generic seq_operations
> interface to configfs's clients ?
Yes, if we provide a vector type in any way, it should be via
seq_operations or something like it. If we're going to use seq_file, we
should use it correctly.
Joel
--
"What no boss of a programmer can ever understand is that a programmer
is working when he's staring out of the window"
- With apologies to Burton Rascoe
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [ckrm-tech] [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-16 22:29 ` Chandra Seetharaman
@ 2006-10-17 2:59 ` Paul Jackson
0 siblings, 0 replies; 56+ messages in thread
From: Paul Jackson @ 2006-10-17 2:59 UTC (permalink / raw)
To: sekharan; +Cc: greg, menage, ckrm-tech, linux-kernel, matthltc
> Quick look at the seq_file interfaces shows there is no such capability.
Perhaps a seq file size limit could be added. Not sure if that's
a good idea or not ...
Ah - cap the count + ppos the user passed in to configfs_read_file,
before passing these values to flush_read_buffer().
If the user asks for more than is allowed, give them only what they are
allowed, by passing a smaller count to flush_read_buffer(). If they
start at a position past what's allowed, force a huge ppos and let them
see the resulting EOF. Disclaimer - I too am no seq_file expert ;).
This should be just a few more lines in configfs_read_file() on
top of your current patches adapting it to seq_file.
Granted - it is not going in directly in one step to the objective you
seek, that being a configfs suitable for displaying a long vector of
process id's, so that Resource Groups can make use of it (and maybe
someday cpusets, too.)
But it gains the code reduction and reuse benefits of your patch
set, and gets this conversation unwedged, so we can go on to discuss
whether or not it would be a good idea go add a suitable vector
interface to seq_file, without threatening the excellent improvements
that sysfs/configfs have made over the old /proc style mess.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-16 23:09 ` Joel Becker
@ 2006-10-18 0:55 ` Chandra Seetharaman
2006-10-19 18:42 ` Joel Becker
0 siblings, 1 reply; 56+ messages in thread
From: Chandra Seetharaman @ 2006-10-18 0:55 UTC (permalink / raw)
To: Joel Becker; +Cc: Andrew Morton, Greg KH, ckrm-tech, linux-kernel
On Mon, 2006-10-16 at 16:09 -0700, Joel Becker wrote:
> On Mon, Oct 16, 2006 at 12:24:35PM -0700, Chandra Seetharaman wrote:
> > On Sat, 2006-10-14 at 13:10 -0700, Joel Becker wrote:
> > > Issues of PAGE_SIZE and attribute size aside, the patch posted
> > > was incorrect. While they used seq_file, they implemented it in a
> >
> > I do not think it is incorrect. It provides a simplest interface to
> > configfs's clients.
>
> I understand that it provides a simple sed(1)-based change for
> existing clients. However, it abuses seq_file in exactly the wrong way.
>
> > > completely inefficent fashion, filling the entire buffer in one ->show()
> > > call rather than letting multiple read(2) calls iterate their list.
> >
> > So, are you suggesting that we should provide a generic seq_operations
> > interface to configfs's clients ?
>
> Yes, if we provide a vector type in any way, it should be via
> seq_operations or something like it. If we're going to use seq_file, we
> should use it correctly.
>
So, you want me to make a patch that uses seq_op instead of seq_file ?
I am willing to do it.
> Joel
>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 0/5] Allow more than PAGESIZE data read in configfs
2006-10-18 0:55 ` Chandra Seetharaman
@ 2006-10-19 18:42 ` Joel Becker
0 siblings, 0 replies; 56+ messages in thread
From: Joel Becker @ 2006-10-19 18:42 UTC (permalink / raw)
To: Chandra Seetharaman; +Cc: Andrew Morton, Greg KH, ckrm-tech, linux-kernel
On Tue, Oct 17, 2006 at 05:55:28PM -0700, Chandra Seetharaman wrote:
> On Mon, 2006-10-16 at 16:09 -0700, Joel Becker wrote:
> > Yes, if we provide a vector type in any way, it should be via
> > seq_operations or something like it. If we're going to use seq_file, we
> > should use it correctly.
> >
>
> So, you want me to make a patch that uses seq_op instead of seq_file ?
> I am willing to do it.
I'd like to see a patch that enforces vectorization and nothing
else, if we were to do anything.
Joel
--
"When arrows don't penetrate, see...
Cupid grabs the pistol."
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2006-10-19 18:42 UTC | newest]
Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-10 18:20 [PATCH 0/5] Allow more than PAGESIZE data read in configfs Chandra Seetharaman
2006-10-10 18:20 ` [PATCH 1/5] Fix a module count leak Chandra Seetharaman
2006-10-10 22:17 ` Joel Becker
2006-10-10 18:20 ` [PATCH 2/5] Use seq_file for read side of operations Chandra Seetharaman
2006-10-11 9:12 ` Joel Becker
2006-10-10 18:21 ` [PATCH 3/5] Change configfs_example.c to use the new interface Chandra Seetharaman
2006-10-10 18:21 ` [PATCH 4/5] Change Documentation to reflect " Chandra Seetharaman
2006-10-10 18:21 ` [PATCH 5/5] Change the existing code to use " Chandra Seetharaman
2006-10-10 20:35 ` [PATCH 0/5] Allow more than PAGESIZE data read in configfs Joel Becker
2006-10-10 21:31 ` [ckrm-tech] " Paul Menage
2006-10-10 21:58 ` Joel Becker
2006-10-10 23:13 ` Chandra Seetharaman
2006-10-11 0:15 ` Joel Becker
2006-10-11 0:49 ` Matt Helsley
2006-10-11 1:28 ` Joel Becker
2006-10-11 22:39 ` Greg KH
2006-10-11 23:26 ` Chandra Seetharaman
2006-10-12 4:17 ` Paul Jackson
2006-10-12 23:51 ` Greg KH
2006-10-13 0:16 ` Paul Jackson
2006-10-13 23:38 ` Matt Helsley
2006-10-13 23:40 ` Matt Helsley
2006-10-13 23:47 ` Paul Menage
2006-10-14 6:17 ` Greg KH
2006-10-14 23:14 ` Matt Helsley
2006-10-16 19:10 ` Chandra Seetharaman
2006-10-16 20:32 ` Paul Jackson
2006-10-16 22:29 ` Chandra Seetharaman
2006-10-17 2:59 ` Paul Jackson
2006-10-12 2:17 ` Matt Helsley
2006-10-12 23:54 ` Greg KH
2006-10-13 3:22 ` Matt Helsley
[not found] ` <20061011220619.GB7911@ca-server1.us.oracle.com>
[not found] ` <1160619516.18766.209.camel@localhost.localdomain>
2006-10-12 7:08 ` Joel Becker
2006-10-12 21:44 ` Paul Jackson
2006-10-12 22:51 ` Joel Becker
2006-10-13 0:01 ` Paul Jackson
2006-10-14 4:40 ` Greg KH
2006-10-13 23:37 ` Matt Helsley
2006-10-14 0:09 ` Joel Becker
2006-10-15 1:06 ` Matt Helsley
2006-10-15 19:07 ` Paul Jackson
2006-10-16 19:33 ` Chandra Seetharaman
2006-10-16 23:07 ` Joel Becker
2006-10-11 20:19 ` Andrew Morton
2006-10-11 21:41 ` Joel Becker
2006-10-11 22:18 ` Joel Becker
2006-10-11 22:48 ` Andrew Morton
2006-10-11 23:27 ` Chandra Seetharaman
2006-10-14 8:01 ` Greg KH
2006-10-14 19:43 ` Andrew Morton
2006-10-14 20:10 ` Joel Becker
2006-10-16 19:24 ` Chandra Seetharaman
2006-10-16 23:09 ` Joel Becker
2006-10-18 0:55 ` Chandra Seetharaman
2006-10-19 18:42 ` Joel Becker
2006-10-16 19:16 ` Chandra Seetharaman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox