* [BUG] sysfs on 2.5.48 unable to remove files while in use
@ 2002-11-21 15:39 Rusty Lynch
2002-11-21 17:04 ` Patrick Mochel
2002-11-21 20:45 ` Patrick Mochel
0 siblings, 2 replies; 15+ messages in thread
From: Rusty Lynch @ 2002-11-21 15:39 UTC (permalink / raw)
To: mochel; +Cc: Linux Kernel Mailing List
I have noticed some strange behavior on my 2.5.48 build
where sysfs is not able to handle file/directory removal
correctly when the file/directory is in use.
I can see this with a little kprobes example code that I have
been playing with that will create entries like:
/sysfsroot/noisy/0xc0107ae0/sys_fork
when someone uses that driver to insert a kernel probe
at 0xc0107ae0 that will printk "sys_fork".
What I have noticed, is that if I create a new probe
(which will create the sysfs entry), open a new shell and
cd to /sysfsroot/noisy/0xc0107ae0, and then use my
driver to remove the probe (which will remove the
sysfs entry), then /sysfsroot/noisy/0xc0107ae0 doesn't
go away after I cd out of the shell.
>From then on any attempts to create new sysfs entries
do not show up in /sysfsroot/ until I unload/load my driver
again.
It seems like this could be an issue with some real code
(not just this stupid play code of mine), like maybe hotswap
code that updates sysfs entries.
My patch to 2.5.48 (with kprobes already applied) can be
grabbed from:
http://www.stinkycat.com/patches/patch-kprobes_sample_with_sysfs-2.5.48.diff
- or here is an inline version that $%^&^ outlook will surely screw up:
diff -urN linux-2.5.48-kprobes/arch/i386/kernel/i386_ksyms.c
linux-2.5.48-kprobes-patched/arch/i386/kernel/i386_ksyms.c
--- linux-2.5.48-kprobes/arch/i386/kernel/i386_ksyms.c 2002-11-17
20:29:57.000000000 -0800
+++ linux-2.5.48-kprobes-patched/arch/i386/kernel/i386_ksyms.c 2002-11-18
15:14:42.000000000 -0800
@@ -59,6 +59,8 @@
extern unsigned long cpu_khz;
extern unsigned long get_cmos_time(void);
+extern int valid_kernel_address(unsigned long addr);
+
/* platform dependent support */
EXPORT_SYMBOL(boot_cpu_data);
#ifdef CONFIG_EISA
@@ -91,6 +93,7 @@
EXPORT_SYMBOL(get_cmos_time);
EXPORT_SYMBOL(cpu_khz);
EXPORT_SYMBOL(apm_info);
+EXPORT_SYMBOL(valid_kernel_address);
#ifdef CONFIG_DEBUG_IOVIRT
EXPORT_SYMBOL(__io_virt_debug);
diff -urN linux-2.5.48-kprobes/arch/i386/kernel/traps.c
linux-2.5.48-kprobes-patched/arch/i386/kernel/traps.c
--- linux-2.5.48-kprobes/arch/i386/kernel/traps.c 2002-11-18
15:14:25.000000000 -0800
+++ linux-2.5.48-kprobes-patched/arch/i386/kernel/traps.c 2002-11-18
15:14:42.000000000 -0800
@@ -131,6 +131,11 @@
#endif
+int valid_kernel_address(unsigned long addr)
+{
+ return kernel_text_address(addr);
+}
+
void show_trace(unsigned long * stack)
{
int i;
diff -urN linux-2.5.48-kprobes/drivers/char/Kconfig
linux-2.5.48-kprobes-patched/drivers/char/Kconfig
--- linux-2.5.48-kprobes/drivers/char/Kconfig 2002-11-17
20:29:21.000000000 -0800
+++ linux-2.5.48-kprobes-patched/drivers/char/Kconfig 2002-11-18
15:14:59.000000000 -0800
@@ -1270,5 +1270,20 @@
Once bound, I/O against /dev/raw/rawN uses efficient zero-copy I/O.
See the raw(8) manpage for more details.
+config NOISY
+ tristate "Noisy Interface Support"
+ ---help---
+ If you say Y here and create a character special file /dev/noisy with
+ major number 10 and minor number 221 using mknod ("man mknod"), you
+ will get access to an interface for inserting arbitrary printk's
+ into executing kernel code.
+
+ This driver is also available as a module ( = code which can be
+ inserted in and removed from the running kernel whenever you want).
+ The module is called noisy.o. If you want to compile it as a module,
+ say M here and read <file:Documentation/modules.txt>.
+
+ If unsure, say N.
+
endmenu
diff -urN linux-2.5.48-kprobes/drivers/char/Makefile
linux-2.5.48-kprobes-patched/drivers/char/Makefile
--- linux-2.5.48-kprobes/drivers/char/Makefile 2002-11-17
20:29:56.000000000 -0800
+++ linux-2.5.48-kprobes-patched/drivers/char/Makefile 2002-11-18
15:14:59.000000000 -0800
@@ -102,6 +102,7 @@
obj-$(CONFIG_AGP) += agp/
obj-$(CONFIG_DRM) += drm/
obj-$(CONFIG_PCMCIA) += pcmcia/
+obj-$(CONFIG_NOISY) += noisy.o
# Files generated that shall be removed upon make clean
clean-files := consolemap_deftbl.c defkeymap.c qtronixmap.c
diff -urN linux-2.5.48-kprobes/drivers/char/noisy.c
linux-2.5.48-kprobes-patched/drivers/char/noisy.c
--- linux-2.5.48-kprobes/drivers/char/noisy.c 1969-12-31
16:00:00.000000000 -0800
+++ linux-2.5.48-kprobes-patched/drivers/char/noisy.c 2002-11-20
18:28:41.000000000 -0800
@@ -0,0 +1,265 @@
+/*
+ * Noisy Interface for Linux
+ *
+ * This driver allows arbitrary printk's to be inserted into
+ * executing kernel code by using the new kprobes interface.
+ *
+ * Copyright (C) 2002 Rusty Lynch <rusty@linux.intel.com>
+ */
+
+#include <linux/module.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <asm/uaccess.h>
+
+/* exported by arch/YOURARCH/kernel/traps.c */
+extern int valid_kernel_address(unsigned long);
+
+#define MAX_MSG_SIZE 128
+
+/*
+ * Data structures for managing list of probe points
+ */
+static struct list_head probe_list;
+struct nprobe {
+ struct list_head list;
+ struct kprobe probe;
+ char message[MAX_MSG_SIZE + 1];
+ struct attribute attr;
+ struct kobject kobj;
+};
+
+/*
+ * sysfs stuff
+ */
+
+struct subsystem noisy_subsys = {
+ .kobj = { .name = "noisy" },
+};
+
+/*
+ * Probe handler called before the execution of all probe points
+ */
+static void noisy_pre_handler(struct kprobe *p, struct pt_regs *r)
+{
+ struct nprobe *c = container_of(p, struct nprobe, probe);
+ printk(KERN_CRIT "%s: %s\n", __FUNCTION__, c->message);
+}
+
+/*
+ * Probe handler called just after the probed address is single stepped
+ */
+static void noisy_post_handler(struct kprobe *p, struct pt_regs *r,
+ unsigned long flags)
+{
+}
+
+/*
+ * Fault handler that covers the pre_handler, single stepping, and
+ * post_handler executiion.
+ */
+static int noisy_fault_handler(struct kprobe *p, struct pt_regs *r, int
trapnr)
+{
+ /* Let the kernel handle this fault */
+ return 0;
+}
+
+/*
+ * Supported file operations
+ */
+static ssize_t noisy_read(struct file *, char *, size_t, loff_t *);
+static ssize_t noisy_write(struct file *, const char *, size_t, loff_t *);
+static int noisy_open(struct inode *, struct file *);
+static int noisy_release(struct inode *, struct file *);
+
+static struct file_operations noisy_fops = {
+ .owner = THIS_MODULE,
+ .read = noisy_read,
+ .write = noisy_write,
+ .open = noisy_open,
+ .release = noisy_release,
+};
+
+/*
+ * To conserve major numbers, this device uses
+ * the miscdevice subsystem.
+ */
+static struct miscdevice noisy_dev =
+{
+ .minor = NOISY_MINOR,
+ .name = "noisy",
+ .fops = &noisy_fops
+};
+
+static ssize_t noisy_read(struct file *file, char *buf,
+ size_t count, loff_t *ppos)
+{
+ struct nprobe *p;
+ list_for_each_entry(p, &probe_list, list) {
+ printk(KERN_CRIT "%p: %s\n", (p->probe).addr, p->message);
+ }
+
+ return 0;
+}
+
+/*
+ * A kprobe is created for each write operation where write is expecting
+ * a string of the form:
+ * 0xADDRESS MESSAGE
+ *
+ * The kprobe pre_handler will just do a printk using the MESSAGE passed
in.
+ *
+ * All kprobes are unregistered when the node is closed, so I use this
module
+ * like:
+ * $ mknod /dev/noisy c 10 221
+ * $ cat > /dev/noisy
+ * 0xc0107d50 sys_fork
+ *
+ * ... and then go do something that will trigger a sys_fork,
+ * and then control-c to stop the cat process (which
+ * will close the node and therefore stop syslog from further
+ * DOS attacks from this driver)
+ */
+static ssize_t noisy_write(struct file *file, const char *buf, size_t
count,
+ loff_t *ppos)
+{
+ struct nprobe *n = 0;
+ size_t ret = -ENOMEM;
+ char *tmp = 0;
+
+ if (count > MAX_MSG_SIZE) {
+ printk(KERN_CRIT
+ "noisy: Input buffer (%i bytes) is too big!\n",
+ count);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ tmp = kmalloc(count + 1, GFP_KERNEL);
+ if (!tmp) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ n = kmalloc(sizeof(struct nprobe), GFP_KERNEL);
+ if (!n) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ memset(n, '\0', sizeof(struct nprobe));
+
+ if (copy_from_user((void *)tmp, (void *)buf, count)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ tmp[count] = '\0';
+
+ if (2 != sscanf(tmp, "0x%x %s", &(n->probe).addr, n->message)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ (n->probe).pre_handler = noisy_pre_handler;
+ (n->probe).post_handler = noisy_post_handler;
+ (n->probe).fault_handler = noisy_fault_handler;
+
+
+ if (!valid_kernel_address((unsigned long)(n->probe).addr)) {
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ kobject_init(&(n->kobj));
+ (n->kobj).subsys = &noisy_subsys;
+ snprintf((n->kobj).name, KOBJ_NAME_LEN, "0x%02x",
+ (unsigned int)(n->probe).addr);
+
+ (n->attr).name = n->message;
+ (n->attr).mode = S_IRUGO;
+
+ if (register_kprobe(&(n->probe))) {
+ printk(KERN_CRIT "Unable to register probe at %p\n",
+ (n->probe).addr);
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (kobject_register(&(n->kobj))) {
+ printk(KERN_CRIT "Unable to add probe kobject!\n");
+ unregister_kprobe(&(n->probe));
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (sysfs_create_file(&(n->kobj), &(n->attr))) {
+ printk(KERN_CRIT "Unable to add probe attr file!\n");
+ unregister_kprobe(&(n->probe));
+ kobject_unregister(&(n->kobj));
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ list_add(&(n->list), &probe_list);
+ ret = count;
+
+ out:
+ kfree(tmp);
+ return ret;
+}
+
+static int noisy_open(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static int noisy_release(struct inode *inode, struct file *file)
+{
+ while(!list_empty(&probe_list)) {
+ struct list_head *n = probe_list.next;
+ struct nprobe *p = list_entry(n, struct nprobe, list);
+
+ printk("Releasing probe %p: %s\n",
+ (p->probe).addr, p->message);
+ sysfs_remove_file(&(p->kobj), &(p->attr));
+ kobject_unregister(&(p->kobj));
+ unregister_kprobe(&(p->probe));
+
+ list_del(n);
+ kfree(p);
+ }
+ return 0;
+}
+
+static int __init noisy_init(void)
+{
+ if (misc_register(&noisy_dev))
+ {
+ return -ENODEV;
+ }
+ INIT_LIST_HEAD(&probe_list);
+ subsystem_register(&noisy_subsys);
+ return 0;
+}
+
+static void __exit noisy_exit (void)
+{
+ misc_deregister(&noisy_dev);
+ subsystem_unregister(&noisy_subsys);
+}
+
+module_init(noisy_init);
+module_exit(noisy_exit);
+
+MODULE_AUTHOR("Rusty Lynch");
+MODULE_LICENSE("GPL");
diff -urN linux-2.5.48-kprobes/include/linux/miscdevice.h
linux-2.5.48-kprobes-patched/include/linux/miscdevice.h
--- linux-2.5.48-kprobes/include/linux/miscdevice.h 2002-11-17
20:29:47.000000000 -0800
+++ linux-2.5.48-kprobes-patched/include/linux/miscdevice.h 2002-11-18
15:14:59.000000000 -0800
@@ -23,6 +23,7 @@
#define MICROCODE_MINOR 184
#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
#define MPT_MINOR 220
+#define NOISY_MINOR 221
#define MISC_DYNAMIC_MINOR 255
#define SGI_GRAPHICS_MINOR 146
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-21 15:39 [BUG] sysfs on 2.5.48 unable to remove files while in use Rusty Lynch @ 2002-11-21 17:04 ` Patrick Mochel 2002-11-21 17:20 ` Rusty Lynch 2002-11-21 20:45 ` Patrick Mochel 1 sibling, 1 reply; 15+ messages in thread From: Patrick Mochel @ 2002-11-21 17:04 UTC (permalink / raw) To: Rusty Lynch; +Cc: Linux Kernel Mailing List Hi there. > My patch to 2.5.48 (with kprobes already applied) can be > grabbed from: > http://www.stinkycat.com/patches/patch-kprobes_sample_with_sysfs-2.5.48.diff Where is the kprobes patch? Thanks, -pat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-21 17:04 ` Patrick Mochel @ 2002-11-21 17:20 ` Rusty Lynch 0 siblings, 0 replies; 15+ messages in thread From: Rusty Lynch @ 2002-11-21 17:20 UTC (permalink / raw) To: Patrick Mochel, Rusty Lynch; +Cc: Linux Kernel Mailing List I used the one submitted by Vamsi in http://marc.theaimsgroup.com/?l=linux-kernel&m=103761676628192&w=2 I think IBM has a kprobes webpage with a link to the patch, but I also have copy of the one I used at: http://www.stinkycat.com/patches/patch-kprobes-2.5.48.diff -rustyl ----- Original Message ----- From: "Patrick Mochel" <mochel@osdl.org> To: "Rusty Lynch" <rusty@linux.co.intel.com> Cc: "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org> Sent: Thursday, November 21, 2002 9:04 AM Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use > > Hi there. > > > My patch to 2.5.48 (with kprobes already applied) can be > > grabbed from: > > http://www.stinkycat.com/patches/patch-kprobes_sample_with_sysfs-2.5.48.diff > > Where is the kprobes patch? > > Thanks, > > -pat > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-21 15:39 [BUG] sysfs on 2.5.48 unable to remove files while in use Rusty Lynch 2002-11-21 17:04 ` Patrick Mochel @ 2002-11-21 20:45 ` Patrick Mochel 2002-11-21 22:07 ` Rusty Lynch 1 sibling, 1 reply; 15+ messages in thread From: Patrick Mochel @ 2002-11-21 20:45 UTC (permalink / raw) To: Rusty Lynch; +Cc: Linux Kernel Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 10187 bytes --] Ok, I've had a chance to play with it a bit.. The kprobes patch didn't apply to current -bk. Attached is an updated patch. I also have some comments on your 'noisy' patch, but first.. > I can see this with a little kprobes example code that I have > been playing with that will create entries like: > > /sysfsroot/noisy/0xc0107ae0/sys_fork > > when someone uses that driver to insert a kernel probe > at 0xc0107ae0 that will printk "sys_fork". It seems that having a pure sysfs implementation would be superior, instead of having to use a character device to write to. After looking into this, I realize that a couple of pieces of infrastructure are needed, so I'm working on that, and will post a modified version of your module once I'm done.. > What I have noticed, is that if I create a new probe > (which will create the sysfs entry), open a new shell and > cd to /sysfsroot/noisy/0xc0107ae0, and then use my > driver to remove the probe (which will remove the > sysfs entry), then /sysfsroot/noisy/0xc0107ae0 doesn't > go away after I cd out of the shell. > > >From then on any attempts to create new sysfs entries > do not show up in /sysfsroot/ until I unload/load my driver > again. > > It seems like this could be an issue with some real code > (not just this stupid play code of mine), like maybe hotswap > code that updates sysfs entries. Yes, it was a real bug. I've mucked with the method to do file and directory deletion, and it turns out that the way I was doing it was wrong. I've gone back to mimmicking vfs_unlink() and vfs_rmdir(), which I shouldn't have diverged from in the first place. (doing d_delete() after low-level unlinking, instead of d_invalidate() before it). Appended to this email is my current working patch to sysfs, including fixes discussed and posted yesterday. I've pushed these to Linus, though he appears to be away. I'll push again, with this fix in the next day or so. Please try this patch and let me know if you still experience the problem you're seeing. Concerning your patch: > +config NOISY 'Noisy' seems like such a generic name, and the description doesn't seem to imply its dependence on kprobes. Maybe KPROBES_NOISY? And, you should put it under KPROBES, under DEBUG_KERNEL. > diff -urN linux-2.5.48-kprobes/drivers/char/noisy.c > linux-2.5.48-kprobes-patched/drivers/char/noisy.c > +static struct list_head probe_list; > +struct nprobe { > + struct list_head list; > + struct kprobe probe; > + char message[MAX_MSG_SIZE + 1]; > + struct attribute attr; > + struct kobject kobj; > +}; struct subsystem has a list, and kobject and entry, so you don't have to do it yourself.. > +static ssize_t noisy_write(struct file *file, const char *buf, size_t > count, > + loff_t *ppos) > +{ > + struct nprobe *n = 0; > + size_t ret = -ENOMEM; > + char *tmp = 0; > + > + if (count > MAX_MSG_SIZE) { > + printk(KERN_CRIT > + "noisy: Input buffer (%i bytes) is too big!\n", > + count); > + ret = -EINVAL; > + goto out; > + } > + > + tmp = kmalloc(count + 1, GFP_KERNEL); > + if (!tmp) { > + ret = -ENOMEM; > + goto out; > + } This should be memset to 0. > + n = kmalloc(sizeof(struct nprobe), GFP_KERNEL); > + if (!n) { > + ret = -ENOMEM; > + goto out; > + } > + memset(n, '\0', sizeof(struct nprobe)); > + > + if (copy_from_user((void *)tmp, (void *)buf, count)) { > + ret = -ENOMEM; > + goto out; > + } > + tmp[count] = '\0'; > + > + if (2 != sscanf(tmp, "0x%x %s", &(n->probe).addr, n->message)) { > + ret = -EINVAL; > + goto out; > + } You don't free n if you get an error from copy_from_user() or sscanf(). > + (n->attr).name = n->message; > + (n->attr).mode = S_IRUGO; Instead of doing this, you should just declare a static attribute called 'message' and have the contents be the message. This will save you from initializing it each time, and save you 4 bytes in struct nprobe. Something like this should work: struct noisy_attribute { struct attribute attr; ssize_t (*read)(struct nprobe *,char *,size_t,loff_t); }; static ssize_t noisy_attr_show(struct kobject * kobj, struct attribute * attr, char * page, size_t count, loff_t off) { struct nprobe * n = container_of(kobj,struct nprobe,kobj); struct noisy_attribute * noisy_attr = container_of(attr,struct noisy_attribute,attr); ssize_t ret = 0; return noisy_attr->show ? noisy_attr->show(n,page,count,off); } static struct sysfs_ops noisy_sysfs_ops = { .show = noisy_attr_show, }; /* Default Attribute */ static ssize_t noisy_message_read(struct nprobe * n, char * page, size_t count, loff_t off) { return off ? snprintf(page,MAX_MSG_SIZE,"%s",n->message); } static struct noisy_attribute attr_message = { .attr = { .name = "message", .mode = S_IRUGO }, }; static struct attribute * default_attrs[] = { &attr_message.attr, NULL, }; /* * sysfs stuff */ struct subsystem noisy_subsys = { .kobj = { .name = "noisy" }, .default_attrs = default_attrs, .sysfs_ops = noisy_sysfs_ops, }; Note that this will also save you from manually having to create and teardown the file when the kobject is registered and unregistered. -pat --- linux-2.5-virgin/fs/sysfs/inode.c Wed Nov 20 12:13:06 2002 +++ linux-2.5-core/fs/sysfs/inode.c Thu Nov 21 13:51:32 2002 @@ -23,6 +23,8 @@ * Please see Documentation/filesystems/sysfs.txt for more information. */ +#undef DEBUG + #include <linux/list.h> #include <linux/init.h> #include <linux/pagemap.h> @@ -94,9 +96,10 @@ if (!dentry->d_inode) { inode = sysfs_get_inode(dir->i_sb, mode, dev); - if (inode) + if (inode) { d_instantiate(dentry, inode); - else + dget(dentry); + } else error = -ENOSPC; } else error = -EEXIST; @@ -142,17 +145,6 @@ return error; } -static int sysfs_unlink(struct inode *dir, struct dentry *dentry) -{ - struct inode *inode = dentry->d_inode; - down(&inode->i_sem); - dentry->d_inode->i_nlink--; - up(&inode->i_sem); - d_invalidate(dentry); - dput(dentry); - return 0; -} - /** * sysfs_read_file - read an attribute. * @file: file pointer. @@ -173,17 +165,11 @@ sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos) { struct attribute * attr = file->f_dentry->d_fsdata; - struct sysfs_ops * ops = NULL; - struct kobject * kobj; + struct sysfs_ops * ops = file->private_data; + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; unsigned char *page; ssize_t retval = 0; - kobj = file->f_dentry->d_parent->d_fsdata; - if (kobj && kobj->subsys) - ops = kobj->subsys->sysfs_ops; - if (!ops || !ops->show) - return 0; - if (count > PAGE_SIZE) count = PAGE_SIZE; @@ -234,16 +220,11 @@ sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t *ppos) { struct attribute * attr = file->f_dentry->d_fsdata; - struct sysfs_ops * ops = NULL; - struct kobject * kobj; + struct sysfs_ops * ops = file->private_data; + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; ssize_t retval = 0; char * page; - kobj = file->f_dentry->d_parent->d_fsdata; - if (kobj && kobj->subsys) - ops = kobj->subsys->sysfs_ops; - if (!ops || !ops->store) - return -EINVAL; page = (char *)__get_free_page(GFP_KERNEL); if (!page) @@ -275,21 +256,72 @@ return retval; } -static int sysfs_open_file(struct inode * inode, struct file * filp) +static int check_perm(struct inode * inode, struct file * file) { - struct kobject * kobj; + struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata); + struct attribute * attr = file->f_dentry->d_fsdata; + struct sysfs_ops * ops = NULL; int error = 0; - kobj = filp->f_dentry->d_parent->d_fsdata; - if ((kobj = kobject_get(kobj))) { - struct attribute * attr = filp->f_dentry->d_fsdata; - if (!attr) - error = -EINVAL; - } else - error = -EINVAL; + if (!kobj || !attr) + goto Einval; + + if (kobj->subsys) + ops = kobj->subsys->sysfs_ops; + + /* No sysfs operations, either from having no subsystem, + * or the subsystem have no operations. + */ + if (!ops) + goto Eaccess; + + /* File needs write support. + * The inode's perms must say it's ok, + * and we must have a store method. + */ + if (file->f_mode & FMODE_WRITE) { + + if (!(inode->i_mode & S_IWUGO)) + goto Eperm; + if (!ops->store) + goto Eaccess; + + } + + /* File needs read support. + * The inode's perms must say it's ok, and we there + * must be a show method for it. + */ + if (file->f_mode & FMODE_READ) { + if (!(inode->i_mode & S_IRUGO)) + goto Eperm; + if (!ops->show) + goto Eaccess; + } + + /* No error? Great, store the ops in file->private_data + * for easy access in the read/write functions. + */ + file->private_data = ops; + goto Done; + + Einval: + error = -EINVAL; + goto Done; + Eaccess: + error = -EACCES; + goto Done; + Eperm: + error = -EPERM; + Done: return error; } +static int sysfs_open_file(struct inode * inode, struct file * filp) +{ + return check_perm(inode,filp); +} + static int sysfs_release(struct inode * inode, struct file * filp) { struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata; @@ -541,7 +573,8 @@ /* make sure dentry is really there */ if (victim->d_inode && (victim->d_parent->d_inode == dir->d_inode)) { - sysfs_unlink(dir->d_inode,victim); + simple_unlink(dir->d_inode,victim); + d_delete(victim); } } up(&dir->d_inode->i_sem); @@ -599,19 +632,16 @@ list_for_each_safe(node,next,&dentry->d_subdirs) { struct dentry * d = list_entry(node,struct dentry,d_child); /* make sure dentry is still there */ - if (d->d_inode) - sysfs_unlink(dentry->d_inode,d); + if (d->d_inode) { + simple_unlink(dentry->d_inode,d); + d_delete(dentry); + } } - d_invalidate(dentry); - if (simple_empty(dentry)) { - dentry->d_inode->i_nlink -= 2; - dentry->d_inode->i_flags |= S_DEAD; - parent->d_inode->i_nlink--; - } up(&dentry->d_inode->i_sem); - dput(dentry); - + d_invalidate(dentry); + simple_rmdir(parent->d_inode,dentry); + d_delete(dentry); up(&parent->d_inode->i_sem); dput(parent); } @@ -622,4 +652,3 @@ EXPORT_SYMBOL(sysfs_remove_link); EXPORT_SYMBOL(sysfs_create_dir); EXPORT_SYMBOL(sysfs_remove_dir); -MODULE_LICENSE("GPL"); [-- Attachment #2: Type: TEXT/PLAIN, Size: 6494 bytes --] ===== arch/i386/Kconfig 1.10 vs edited ===== --- 1.10/arch/i386/Kconfig Mon Nov 18 11:52:31 2002 +++ edited/arch/i386/Kconfig Thu Nov 21 11:20:31 2002 @@ -1551,6 +1551,15 @@ Say Y here if you are developing drivers or trying to debug and identify kernel problems. +config KPROBES + bool "Kprobes" + depends on DEBUG_KERNEL + help + Kprobes allows you to trap at almost any kernel address, using + register_kprobe(), and providing a callback function. This is useful + for kernel debugging, non-intrusive instrumentation and testing. If + in doubt, say "N". + config DEBUG_STACKOVERFLOW bool "Check for stack overflows" depends on DEBUG_KERNEL ===== arch/i386/kernel/Makefile 1.29 vs edited ===== --- 1.29/arch/i386/kernel/Makefile Thu Nov 7 11:26:45 2002 +++ edited/arch/i386/kernel/Makefile Thu Nov 21 11:20:31 2002 @@ -29,6 +29,7 @@ obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_EDD) += edd.o obj-$(CONFIG_MODULES) += module.o +obj-$(CONFIG_KPROBES) += kprobes.o EXTRA_AFLAGS := -traditional ===== arch/i386/kernel/entry.S 1.46 vs edited ===== --- 1.46/arch/i386/kernel/entry.S Wed Nov 20 12:56:06 2002 +++ edited/arch/i386/kernel/entry.S Thu Nov 21 11:20:31 2002 @@ -405,9 +405,16 @@ jmp ret_from_exception ENTRY(debug) + pushl $-1 # mark this as an int + SAVE_ALL + movl %esp,%edx pushl $0 - pushl $do_debug - jmp error_code + pushl %edx + call do_debug + addl $8,%esp + testl %eax,%eax + jnz restore_all + jmp ret_from_exception ENTRY(nmi) pushl %eax @@ -420,9 +427,16 @@ RESTORE_ALL ENTRY(int3) + pushl $-1 # mark this as an int + SAVE_ALL + movl %esp,%edx pushl $0 - pushl $do_int3 - jmp error_code + pushl %edx + call do_int3 + addl $8,%esp + testl %eax,%eax + jnz restore_all + jmp ret_from_exception ENTRY(overflow) pushl $0 ===== arch/i386/kernel/traps.c 1.36 vs edited ===== --- 1.36/arch/i386/kernel/traps.c Mon Nov 18 12:10:45 2002 +++ edited/arch/i386/kernel/traps.c Thu Nov 21 11:20:48 2002 @@ -24,6 +24,7 @@ #include <linux/interrupt.h> #include <linux/highmem.h> #include <linux/kallsyms.h> +#include <linux/kprobes.h> #ifdef CONFIG_EISA #include <linux/ioport.h> @@ -404,7 +405,6 @@ } DO_VM86_ERROR_INFO( 0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->eip) -DO_VM86_ERROR( 3, SIGTRAP, "int3", int3) DO_VM86_ERROR( 4, SIGSEGV, "overflow", overflow) DO_VM86_ERROR( 5, SIGSEGV, "bounds", bounds) DO_ERROR_INFO( 6, SIGILL, "invalid operand", invalid_op, ILL_ILLOPN, regs->eip) @@ -420,6 +420,9 @@ { if (regs->eflags & VM_MASK) goto gp_in_vm86; + + if (kprobe_running() && kprobe_fault_handler(regs, 13)) + return; if (!(regs->xcs & 3)) goto gp_in_kernel; @@ -551,6 +554,17 @@ nmi_callback = dummy_nmi_callback; } +asmlinkage int do_int3(struct pt_regs *regs, long error_code) +{ + if (kprobe_handler(regs)) + return 1; + /* This is an interrupt gate, because kprobes wants interrupts + disabled. Normal trap handlers don't. */ + restore_interrupts(regs); + do_trap(3, SIGTRAP, "int3", 1, regs, error_code, NULL); + return 0; +} + /* * Our handling of the processor debug registers is non-trivial. * We do not clear them on entry and exit from the kernel. Therefore @@ -573,7 +587,7 @@ * find every occurrence of the TF bit that could be saved away even * by user code) */ -asmlinkage void do_debug(struct pt_regs * regs, long error_code) +asmlinkage int do_debug(struct pt_regs * regs, long error_code) { unsigned int condition; struct task_struct *tsk = current; @@ -581,6 +595,12 @@ __asm__ __volatile__("movl %%db6,%0" : "=r" (condition)); + if (post_kprobe_handler(regs)) + return 1; + + /* Interrupts not disabled for normal trap handling. */ + restore_interrupts(regs); + /* Mask out spurious debug traps due to lazy DR7 setting */ if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) { if (!tsk->thread.debugreg[7]) @@ -631,15 +651,15 @@ __asm__("movl %0,%%db7" : /* no output */ : "r" (0)); - return; + return 0; debug_vm86: handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1); - return; + return 0; clear_TF: regs->eflags &= ~TF_MASK; - return; + return 0; } /* @@ -803,6 +823,8 @@ struct task_struct *tsk = current; clts(); /* Allow maths ops (or we recurse) */ + if (kprobe_running() && kprobe_fault_handler(®s, 7)) + return; if (!tsk->used_math) init_fpu(tsk); restore_fpu(tsk); @@ -896,9 +918,9 @@ #endif set_trap_gate(0,÷_error); - set_trap_gate(1,&debug); + _set_gate(idt_table+1,14,3,&debug); /* debug trap for kprobes */ set_intr_gate(2,&nmi); - set_system_gate(3,&int3); /* int3-5 can be called from all */ + _set_gate(idt_table+3,14,3,&int3); /* int3-5 can be called from all */ set_system_gate(4,&overflow); set_system_gate(5,&bounds); set_trap_gate(6,&invalid_op); ===== arch/i386/mm/fault.c 1.20 vs edited ===== --- 1.20/arch/i386/mm/fault.c Sat Oct 12 06:11:03 2002 +++ edited/arch/i386/mm/fault.c Thu Nov 21 11:20:31 2002 @@ -19,6 +19,7 @@ #include <linux/init.h> #include <linux/tty.h> #include <linux/vt_kern.h> /* For unblank_screen() */ +#include <linux/kprobes.h> #include <asm/system.h> #include <asm/uaccess.h> @@ -162,6 +163,9 @@ /* get the address */ __asm__("movl %%cr2,%0":"=r" (address)); + + if (kprobe_running() && kprobe_fault_handler(regs, 14)) + return; /* It's safe to allow irq's after cr2 has been saved */ if (regs->eflags & X86_EFLAGS_IF) ===== kernel/Makefile 1.21 vs edited ===== --- 1.21/kernel/Makefile Mon Nov 18 22:51:14 2002 +++ edited/kernel/Makefile Thu Nov 21 11:20:31 2002 @@ -4,7 +4,7 @@ export-objs = signal.o sys.o kmod.o workqueue.o ksyms.o pm.o exec_domain.o \ printk.o platform.o suspend.o dma.o module.o cpufreq.o \ - profile.o rcupdate.o intermodule.o + profile.o rcupdate.o intermodule.o kprobes.o obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \ exit.o itimer.o time.o softirq.o resource.o \ @@ -21,6 +21,7 @@ obj-$(CONFIG_CPU_FREQ) += cpufreq.o obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o +obj-$(CONFIG_KPROBES) += kprobes.o ifneq ($(CONFIG_IA64),y) # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-21 20:45 ` Patrick Mochel @ 2002-11-21 22:07 ` Rusty Lynch 2002-11-21 23:03 ` Patrick Mochel 0 siblings, 1 reply; 15+ messages in thread From: Rusty Lynch @ 2002-11-21 22:07 UTC (permalink / raw) To: Patrick Mochel, Rusty Lynch; +Cc: Linux Kernel Mailing List > > Ok, I've had a chance to play with it a bit.. > > The kprobes patch didn't apply to current -bk. Attached is an updated > patch. I also have some comments on your 'noisy' patch, but first.. > Very cool. Bitkeeper is one of those things I never bothered with yet (mainly because I feel some comfortable with CVS.) It looks like it might be worth going through ramp-up time on bk to keep up with changes to the kernel. > > I can see this with a little kprobes example code that I have > > been playing with that will create entries like: > > > > /sysfsroot/noisy/0xc0107ae0/sys_fork > > > > when someone uses that driver to insert a kernel probe > > at 0xc0107ae0 that will printk "sys_fork". > > It seems that having a pure sysfs implementation would be superior, > instead of having to use a character device to write to. After looking > into this, I realize that a couple of pieces of infrastructure are needed, > so I'm working on that, and will post a modified version of your module > once I'm done.. I look forward to seeing it. > > > What I have noticed, is that if I create a new probe > > (which will create the sysfs entry), open a new shell and > > cd to /sysfsroot/noisy/0xc0107ae0, and then use my > > driver to remove the probe (which will remove the > > sysfs entry), then /sysfsroot/noisy/0xc0107ae0 doesn't > > go away after I cd out of the shell. > > > > >From then on any attempts to create new sysfs entries > > do not show up in /sysfsroot/ until I unload/load my driver > > again. > > > > It seems like this could be an issue with some real code > > (not just this stupid play code of mine), like maybe hotswap > > code that updates sysfs entries. > > Yes, it was a real bug. I've mucked with the method to do file and > directory deletion, and it turns out that the way I was doing it was > wrong. I've gone back to mimmicking vfs_unlink() and vfs_rmdir(), which I > shouldn't have diverged from in the first place. (doing d_delete() after > low-level unlinking, instead of d_invalidate() before it). > > Appended to this email is my current working patch to sysfs, including > fixes discussed and posted yesterday. I've pushed these to Linus, though > he appears to be away. I'll push again, with this fix in the next day or > so. > > Please try this patch and let me know if you still experience the problem > you're seeing. > It looks like the patch is against the bk tree, and does not apply cleanly to strait 2.5.48. I don't know how much has changed to sysfs/inode.c, but I can see where the last hunk is looking too far up, so I'll try it anyway. In the meantime I'll be grabbing bk so I can see what everyone is looking at. > Concerning your patch: Thanks for taking the time to look at my module. The reason I created it was to get familiar with the 2.5 kernel, so this feedback is very helpful. I'll take a stab at the changes. > > > +config NOISY > > 'Noisy' seems like such a generic name, and the description doesn't seem > to imply its dependence on kprobes. Maybe KPROBES_NOISY? And, you should > put it under KPROBES, under DEBUG_KERNEL. > yea, I don't put much thought into a name, or where would show up in the config. I'll change it. > > diff -urN linux-2.5.48-kprobes/drivers/char/noisy.c > > linux-2.5.48-kprobes-patched/drivers/char/noisy.c > > > +static struct list_head probe_list; > > +struct nprobe { > > + struct list_head list; > > + struct kprobe probe; > > + char message[MAX_MSG_SIZE + 1]; > > + struct attribute attr; > > + struct kobject kobj; > > +}; > > struct subsystem has a list, and kobject and entry, so you don't have to > do it yourself.. > > > +static ssize_t noisy_write(struct file *file, const char *buf, size_t > > count, > > + loff_t *ppos) > > +{ > > + struct nprobe *n = 0; > > + size_t ret = -ENOMEM; > > + char *tmp = 0; > > + > > + if (count > MAX_MSG_SIZE) { > > + printk(KERN_CRIT > > + "noisy: Input buffer (%i bytes) is too big!\n", > > + count); > > + ret = -EINVAL; > > + goto out; > > + } > > + > > + tmp = kmalloc(count + 1, GFP_KERNEL); > > + if (!tmp) { > > + ret = -ENOMEM; > > + goto out; > > + } > > This should be memset to 0. > > > + n = kmalloc(sizeof(struct nprobe), GFP_KERNEL); > > + if (!n) { > > + ret = -ENOMEM; > > + goto out; > > + } > > + memset(n, '\0', sizeof(struct nprobe)); > > + > > + if (copy_from_user((void *)tmp, (void *)buf, count)) { > > + ret = -ENOMEM; > > + goto out; > > + } > > + tmp[count] = '\0'; > > + > > + if (2 != sscanf(tmp, "0x%x %s", &(n->probe).addr, n->message)) { > > + ret = -EINVAL; > > + goto out; > > + } > > You don't free n if you get an error from copy_from_user() or sscanf(). > > > + (n->attr).name = n->message; > > + (n->attr).mode = S_IRUGO; > > Instead of doing this, you should just declare a static attribute called > 'message' and have the contents be the message. This will save you from > initializing it each time, and save you 4 bytes in struct nprobe. > Something like this should work: > > > struct noisy_attribute { > struct attribute attr; > ssize_t (*read)(struct nprobe *,char *,size_t,loff_t); > }; > > static ssize_t noisy_attr_show(struct kobject * kobj, struct attribute * > attr, > char * page, size_t count, loff_t off) > { > struct nprobe * n = container_of(kobj,struct nprobe,kobj); > struct noisy_attribute * noisy_attr = container_of(attr,struct > noisy_attribute,attr); > ssize_t ret = 0; > return noisy_attr->show ? noisy_attr->show(n,page,count,off); > } > > static struct sysfs_ops noisy_sysfs_ops = { > .show = noisy_attr_show, > }; > > > /* Default Attribute */ > static ssize_t noisy_message_read(struct nprobe * n, char * page, size_t > count, loff_t off) > { > return off ? snprintf(page,MAX_MSG_SIZE,"%s",n->message); > } > > static struct noisy_attribute attr_message = { > .attr = { .name = "message", .mode = S_IRUGO }, > }; > > static struct attribute * default_attrs[] = { > &attr_message.attr, > NULL, > }; > > /* > * sysfs stuff > */ > > struct subsystem noisy_subsys = { > .kobj = { .name = "noisy" }, > .default_attrs = default_attrs, > .sysfs_ops = noisy_sysfs_ops, > }; > > > Note that this will also save you from manually having to create and > teardown the file when the kobject is registered and unregistered. > > > -pat > > --- linux-2.5-virgin/fs/sysfs/inode.c Wed Nov 20 12:13:06 2002 > +++ linux-2.5-core/fs/sysfs/inode.c Thu Nov 21 13:51:32 2002 > @@ -23,6 +23,8 @@ > * Please see Documentation/filesystems/sysfs.txt for more information. > */ > > +#undef DEBUG > + > #include <linux/list.h> > #include <linux/init.h> > #include <linux/pagemap.h> > @@ -94,9 +96,10 @@ > > if (!dentry->d_inode) { > inode = sysfs_get_inode(dir->i_sb, mode, dev); > - if (inode) > + if (inode) { > d_instantiate(dentry, inode); > - else > + dget(dentry); > + } else > error = -ENOSPC; > } else > error = -EEXIST; > @@ -142,17 +145,6 @@ > return error; > } > > -static int sysfs_unlink(struct inode *dir, struct dentry *dentry) > -{ > - struct inode *inode = dentry->d_inode; > - down(&inode->i_sem); > - dentry->d_inode->i_nlink--; > - up(&inode->i_sem); > - d_invalidate(dentry); > - dput(dentry); > - return 0; > -} > - > /** > * sysfs_read_file - read an attribute. > * @file: file pointer. > @@ -173,17 +165,11 @@ > sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos) > { > struct attribute * attr = file->f_dentry->d_fsdata; > - struct sysfs_ops * ops = NULL; > - struct kobject * kobj; > + struct sysfs_ops * ops = file->private_data; > + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; > unsigned char *page; > ssize_t retval = 0; > > - kobj = file->f_dentry->d_parent->d_fsdata; > - if (kobj && kobj->subsys) > - ops = kobj->subsys->sysfs_ops; > - if (!ops || !ops->show) > - return 0; > - > if (count > PAGE_SIZE) > count = PAGE_SIZE; > > @@ -234,16 +220,11 @@ > sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t *ppos) > { > struct attribute * attr = file->f_dentry->d_fsdata; > - struct sysfs_ops * ops = NULL; > - struct kobject * kobj; > + struct sysfs_ops * ops = file->private_data; > + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; > ssize_t retval = 0; > char * page; > > - kobj = file->f_dentry->d_parent->d_fsdata; > - if (kobj && kobj->subsys) > - ops = kobj->subsys->sysfs_ops; > - if (!ops || !ops->store) > - return -EINVAL; > > page = (char *)__get_free_page(GFP_KERNEL); > if (!page) > @@ -275,21 +256,72 @@ > return retval; > } > > -static int sysfs_open_file(struct inode * inode, struct file * filp) > +static int check_perm(struct inode * inode, struct file * file) > { > - struct kobject * kobj; > + struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata); > + struct attribute * attr = file->f_dentry->d_fsdata; > + struct sysfs_ops * ops = NULL; > int error = 0; > > - kobj = filp->f_dentry->d_parent->d_fsdata; > - if ((kobj = kobject_get(kobj))) { > - struct attribute * attr = filp->f_dentry->d_fsdata; > - if (!attr) > - error = -EINVAL; > - } else > - error = -EINVAL; > + if (!kobj || !attr) > + goto Einval; > + > + if (kobj->subsys) > + ops = kobj->subsys->sysfs_ops; > + > + /* No sysfs operations, either from having no subsystem, > + * or the subsystem have no operations. > + */ > + if (!ops) > + goto Eaccess; > + > + /* File needs write support. > + * The inode's perms must say it's ok, > + * and we must have a store method. > + */ > + if (file->f_mode & FMODE_WRITE) { > + > + if (!(inode->i_mode & S_IWUGO)) > + goto Eperm; > + if (!ops->store) > + goto Eaccess; > + > + } > + > + /* File needs read support. > + * The inode's perms must say it's ok, and we there > + * must be a show method for it. > + */ > + if (file->f_mode & FMODE_READ) { > + if (!(inode->i_mode & S_IRUGO)) > + goto Eperm; > + if (!ops->show) > + goto Eaccess; > + } > + > + /* No error? Great, store the ops in file->private_data > + * for easy access in the read/write functions. > + */ > + file->private_data = ops; > + goto Done; > + > + Einval: > + error = -EINVAL; > + goto Done; > + Eaccess: > + error = -EACCES; > + goto Done; > + Eperm: > + error = -EPERM; > + Done: > return error; > } > > +static int sysfs_open_file(struct inode * inode, struct file * filp) > +{ > + return check_perm(inode,filp); > +} > + > static int sysfs_release(struct inode * inode, struct file * filp) > { > struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata; > @@ -541,7 +573,8 @@ > /* make sure dentry is really there */ > if (victim->d_inode && > (victim->d_parent->d_inode == dir->d_inode)) { > - sysfs_unlink(dir->d_inode,victim); > + simple_unlink(dir->d_inode,victim); > + d_delete(victim); > } > } > up(&dir->d_inode->i_sem); > @@ -599,19 +632,16 @@ > list_for_each_safe(node,next,&dentry->d_subdirs) { > struct dentry * d = list_entry(node,struct dentry,d_child); > /* make sure dentry is still there */ > - if (d->d_inode) > - sysfs_unlink(dentry->d_inode,d); > + if (d->d_inode) { > + simple_unlink(dentry->d_inode,d); > + d_delete(dentry); > + } > } > > - d_invalidate(dentry); > - if (simple_empty(dentry)) { > - dentry->d_inode->i_nlink -= 2; > - dentry->d_inode->i_flags |= S_DEAD; > - parent->d_inode->i_nlink--; > - } > up(&dentry->d_inode->i_sem); > - dput(dentry); > - > + d_invalidate(dentry); > + simple_rmdir(parent->d_inode,dentry); > + d_delete(dentry); > up(&parent->d_inode->i_sem); > dput(parent); > } > @@ -622,4 +652,3 @@ > EXPORT_SYMBOL(sysfs_remove_link); > EXPORT_SYMBOL(sysfs_create_dir); > EXPORT_SYMBOL(sysfs_remove_dir); > -MODULE_LICENSE("GPL"); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-21 22:07 ` Rusty Lynch @ 2002-11-21 23:03 ` Patrick Mochel 2002-11-23 0:32 ` Rusty Lynch 2002-11-24 13:04 ` Werner Almesberger 0 siblings, 2 replies; 15+ messages in thread From: Patrick Mochel @ 2002-11-21 23:03 UTC (permalink / raw) To: Rusty Lynch; +Cc: Linux Kernel Mailing List [-- Attachment #1: Type: TEXT/PLAIN, Size: 9360 bytes --] > Very cool. Bitkeeper is one of those things I never bothered > with yet (mainly because I feel some comfortable with CVS.) > It looks like it might be worth going through ramp-up time > on bk to keep up with changes to the kernel. There are also nightly snapshots, though I don't recall the URL off the top of my head.. > > It seems that having a pure sysfs implementation would be superior, > > instead of having to use a character device to write to. After looking > > into this, I realize that a couple of pieces of infrastructure are needed, > > so I'm working on that, and will post a modified version of your module > > once I'm done.. > > I look forward to seeing it. Ok, there are basically two parts to it: the required modifications to sysfs and your updated module. The appended patch adds the support to sysfs be defining struct subsys_attribute for declaring attributes of subsystems themselves, as well as the needed helpers for creation/teardown and read/write. I've attached a replacement for noisy.c that creates a sysfs file named 'ctl' that handles addition and deletion of nprobes, similar to the char device you had created. From the top of the file: /* * Noisy Interface for Linux * * This driver allows arbitrary printk's to be inserted into * executing kernel code by using the new kprobes interface. * A message is attached to an address, and when that address * is reached, the message is printed. * * This uses a sysfs control file to manage a list of probes. * The sysfs directory is at * * /sys/noisy/ * * and the control is named 'ctl'. * * A Noisy Probe can be added by echoing into the file, like: * * $ echo "add <address> <message>" > /sys/noisy/ctl * * where <address> is the address to break on, and <message> * is the message to print when the address is reached. * * Probes can be removed by doing: * * $ echo "del <address>" > /sys/noisy/ctl * * where <address> is the address of the probe. * * The probes themselves get a directory under /sys/noisy/, and * the name of the directory is the address of the probe. Each * probe directory contains one file ('message') that contains * the message to be printed. (More may be added later). */ I've tried to comment the changes and the necessary steps in making this work. [ Note: While I'm generally happy with the way things work, I realize that it still requires a decent amount of overhead in using the sysfs interface (see the file). I'll be looking into shrinking this...] Everything seems to work.. > It looks like the patch is against the bk tree, and does not apply cleanly > to > strait 2.5.48. I don't know how much has changed to sysfs/inode.c, but > I can see where the last hunk is looking too far up, so I'll try it anyway. Ah yes, I forgot there was a patch applied to sysfs since 2.5.48. I've included everything since 2.5.48 in the one I've appended. -pat ===== fs/sysfs/inode.c 1.60 vs 1.67 ===== --- 1.60/fs/sysfs/inode.c Sat Nov 16 15:01:34 2002 +++ 1.67/fs/sysfs/inode.c Thu Nov 21 17:01:53 2002 @@ -23,6 +23,8 @@ * Please see Documentation/filesystems/sysfs.txt for more information. */ +#undef DEBUG + #include <linux/list.h> #include <linux/init.h> #include <linux/pagemap.h> @@ -87,16 +89,17 @@ return inode; } -static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int mode, int dev) +static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) { struct inode *inode; int error = 0; if (!dentry->d_inode) { inode = sysfs_get_inode(dir->i_sb, mode, dev); - if (inode) + if (inode) { d_instantiate(dentry, inode); - else + dget(dentry); + } else error = -ENOSPC; } else error = -EEXIST; @@ -142,17 +145,43 @@ return error; } -static int sysfs_unlink(struct inode *dir, struct dentry *dentry) +#define to_subsys(k) container_of(k,struct subsystem,kobj) +#define to_sattr(a) container_of(a,struct subsys_attribute,attr) + +/** + * Subsystem file operations. + * These operations allow subsystems to have files that can be + * read/written. + */ +ssize_t subsys_attr_show(struct kobject * kobj, struct attribute * attr, + char * page, size_t count, loff_t off) { - struct inode *inode = dentry->d_inode; - down(&inode->i_sem); - dentry->d_inode->i_nlink--; - up(&inode->i_sem); - d_invalidate(dentry); - dput(dentry); - return 0; + struct subsystem * s = to_subsys(kobj); + struct subsys_attribute * sattr = to_sattr(attr); + ssize_t ret = 0; + + if (sattr->show) + ret = sattr->show(s,page,count,off); + return ret; } +ssize_t subsys_attr_store(struct kobject * kobj, struct attribute * attr, + const char * page, size_t count, loff_t off) +{ + struct subsystem * s = to_subsys(kobj); + struct subsys_attribute * sattr = to_sattr(attr); + ssize_t ret = 0; + + if (sattr->store) + ret = sattr->store(s,page,count,off); + return ret; +} + +static struct sysfs_ops subsys_sysfs_ops = { + .show = subsys_attr_show, + .store = subsys_attr_store, +}; + /** * sysfs_read_file - read an attribute. * @file: file pointer. @@ -173,17 +202,11 @@ sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos) { struct attribute * attr = file->f_dentry->d_fsdata; - struct sysfs_ops * ops = NULL; - struct kobject * kobj; + struct sysfs_ops * ops = file->private_data; + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; unsigned char *page; ssize_t retval = 0; - kobj = file->f_dentry->d_parent->d_fsdata; - if (kobj && kobj->subsys) - ops = kobj->subsys->sysfs_ops; - if (!ops || !ops->show) - return 0; - if (count > PAGE_SIZE) count = PAGE_SIZE; @@ -234,16 +257,11 @@ sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t *ppos) { struct attribute * attr = file->f_dentry->d_fsdata; - struct sysfs_ops * ops = NULL; - struct kobject * kobj; + struct sysfs_ops * ops = file->private_data; + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; ssize_t retval = 0; char * page; - kobj = file->f_dentry->d_parent->d_fsdata; - if (kobj && kobj->subsys) - ops = kobj->subsys->sysfs_ops; - if (!ops || !ops->store) - return 0; page = (char *)__get_free_page(GFP_KERNEL); if (!page) @@ -275,21 +293,77 @@ return retval; } -static int sysfs_open_file(struct inode * inode, struct file * filp) +static int check_perm(struct inode * inode, struct file * file) { - struct kobject * kobj; + struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata); + struct attribute * attr = file->f_dentry->d_fsdata; + struct sysfs_ops * ops = NULL; int error = 0; - kobj = filp->f_dentry->d_parent->d_fsdata; - if ((kobj = kobject_get(kobj))) { - struct attribute * attr = filp->f_dentry->d_fsdata; - if (!attr) - error = -EINVAL; - } else - error = -EINVAL; + if (!kobj || !attr) + goto Einval; + + /* if the kobject has no subsystem, then it is a subsystem itself, + * so give it the subsys_sysfs_ops. + */ + if (kobj->subsys) + ops = kobj->subsys->sysfs_ops; + else + ops = &subsys_sysfs_ops; + + /* No sysfs operations, either from having no subsystem, + * or the subsystem have no operations. + */ + if (!ops) + goto Eaccess; + + /* File needs write support. + * The inode's perms must say it's ok, + * and we must have a store method. + */ + if (file->f_mode & FMODE_WRITE) { + + if (!(inode->i_mode & S_IWUGO)) + goto Eperm; + if (!ops->store) + goto Eaccess; + + } + + /* File needs read support. + * The inode's perms must say it's ok, and we there + * must be a show method for it. + */ + if (file->f_mode & FMODE_READ) { + if (!(inode->i_mode & S_IRUGO)) + goto Eperm; + if (!ops->show) + goto Eaccess; + } + + /* No error? Great, store the ops in file->private_data + * for easy access in the read/write functions. + */ + file->private_data = ops; + goto Done; + + Einval: + error = -EINVAL; + goto Done; + Eaccess: + error = -EACCES; + goto Done; + Eperm: + error = -EPERM; + Done: return error; } +static int sysfs_open_file(struct inode * inode, struct file * filp) +{ + return check_perm(inode,filp); +} + static int sysfs_release(struct inode * inode, struct file * filp) { struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata; @@ -541,7 +615,8 @@ /* make sure dentry is really there */ if (victim->d_inode && (victim->d_parent->d_inode == dir->d_inode)) { - sysfs_unlink(dir->d_inode,victim); + simple_unlink(dir->d_inode,victim); + d_delete(victim); } } up(&dir->d_inode->i_sem); @@ -599,19 +674,16 @@ list_for_each_safe(node,next,&dentry->d_subdirs) { struct dentry * d = list_entry(node,struct dentry,d_child); /* make sure dentry is still there */ - if (d->d_inode) - sysfs_unlink(dentry->d_inode,d); + if (d->d_inode) { + simple_unlink(dentry->d_inode,d); + d_delete(dentry); + } } - d_invalidate(dentry); - if (simple_empty(dentry)) { - dentry->d_inode->i_nlink -= 2; - dentry->d_inode->i_flags |= S_DEAD; - parent->d_inode->i_nlink--; - } up(&dentry->d_inode->i_sem); - dput(dentry); - + d_invalidate(dentry); + simple_rmdir(parent->d_inode,dentry); + d_delete(dentry); up(&parent->d_inode->i_sem); dput(parent); } @@ -622,4 +694,3 @@ EXPORT_SYMBOL(sysfs_remove_link); EXPORT_SYMBOL(sysfs_create_dir); EXPORT_SYMBOL(sysfs_remove_dir); -MODULE_LICENSE("GPL"); [-- Attachment #2: Type: TEXT/PLAIN, Size: 7403 bytes --] /* * Noisy Interface for Linux * * This driver allows arbitrary printk's to be inserted into * executing kernel code by using the new kprobes interface. * A message is attached to an address, and when that address * is reached, the message is printed. * * This uses a sysfs control file to manage a list of probes. * The sysfs directory is at * * /sys/noisy/ * * and the control is named 'ctl'. * * A Noisy Probe can be added by echoing into the file, like: * * $ echo "add <address> <message>" > /sys/noisy/ctl * * where <address> is the address to break on, and <message> * is the message to print when the address is reached. * * Probes can be removed by doing: * * $ echo "del <address>" > /sys/noisy/ctl * * where <address> is the address of the probe. * * The probes themselves get a directory under /sys/noisy/, and * the name of the directory is the address of the probe. Each * probe directory contains one file ('message') that contains * the message to be printed. (More may be added later). * * Copyright (C) 2002 Rusty Lynch <rusty@linux.intel.com> */ #include <linux/module.h> #include <linux/init.h> #include <linux/kprobes.h> #include <linux/slab.h> #include <linux/kobject.h> /* exported by arch/YOURARCH/kernel/traps.c */ extern int valid_kernel_address(unsigned long); #define MAX_MSG_SIZE 128 #define to_nprobe(entry) container_of(entry,struct nprobe,kobj.entry); static DECLARE_MUTEX(noisy_sem); static struct subsystem noisy_subsys; /* * struct nrpobe: data structure for managing list of probe points */ struct nprobe { struct kprobe probe; char message[MAX_MSG_SIZE + 1]; struct kobject kobj; }; /* * Probe handlers. * Only one is used (pre) to print the message out. */ static void noisy_pre_handler(struct kprobe *p, struct pt_regs *r) { struct nprobe *c = container_of(p, struct nprobe, probe); printk(KERN_CRIT "%s: %s\n", __FUNCTION__, c->message); } static void noisy_post_handler(struct kprobe *p, struct pt_regs *r, unsigned long flags) { } static int noisy_fault_handler(struct kprobe *p, struct pt_regs *r, int trapnr) { /* Let the kernel handle this fault */ return 0; } /* * struct noisy_attribute - used for defining probe attributes, with a * typesafe show method. */ struct noisy_attribute { struct attribute attr; ssize_t (*show)(struct nprobe *,char *,size_t,loff_t); }; /** * noisy_attr_show - forward sysfs read call to proper handler. * @kobj: kobject of probe being acessed. * @attr: generic attribute portion of struct noisy_attribute. * @page: buffer to write into. * @count: number of bytes requested. * @off: offset into buffer. * * This is called from sysfs and is necessary to convert the generic * kobject into the right type, and to convert the attribute into the * right attribute type. */ static ssize_t noisy_attr_show(struct kobject * kobj, struct attribute * attr, char * page, size_t count, loff_t off) { struct nprobe * n = container_of(kobj,struct nprobe,kobj); struct noisy_attribute * noisy_attr = container_of(attr,struct noisy_attribute,attr); return noisy_attr->show ? noisy_attr->show(n,page,count,off) : 0; } /* * noisy_sysfs_ops - sysfs operations for struct nprobes. */ static struct sysfs_ops noisy_sysfs_ops = { .show = noisy_attr_show, }; /* Default Attribute - the message to print out. */ static ssize_t noisy_message_read(struct nprobe * n, char * page, size_t count, loff_t off) { return off ? 0: snprintf(page,MAX_MSG_SIZE,"%s\n",n->message); } static struct noisy_attribute attr_message = { .attr = { .name = "message", .mode = S_IRUGO }, .show = noisy_message_read, }; /* Declare array of default attributes to be added when an nprobe is added */ static struct attribute * default_attrs[] = { &attr_message.attr, NULL, }; /* Declare noisy_subsys for addition to sysfs */ static struct subsystem noisy_subsys = { .kobj = { .name = "noisy" }, .default_attrs = default_attrs, .sysfs_ops = &noisy_sysfs_ops, }; /* * noisy ctl attribute. * This is declared as an attribute of the subsystem, and added in * noisy_init(). * * Reading this attribute dumps all the registered noisy probes. * Writing to it either adds or deletes a noisy probe, as described at * the beginning of the file. */ static ssize_t ctl_show(struct subsystem * s, char * page, size_t count, loff_t off) { char * str = page; int ret = 0; down(&noisy_sem); if (!off) { struct list_head * entry, * next; list_for_each_safe(entry,next,&noisy_subsys.list) { struct nprobe * n = to_nprobe(entry); if ((ret + MAX_MSG_SIZE) > PAGE_SIZE) break; str += snprintf(str,PAGE_SIZE - ret,"%p: %s\n", n->probe.addr,n->message); ret = str - page; } } up(&noisy_sem); return ret; } static int add(unsigned long addr, char * message) { struct nprobe * n; int error = 0; if (!valid_kernel_address(addr)) return -EFAULT; n = kmalloc(sizeof(struct nprobe),GFP_KERNEL); if (!n) return -ENOMEM; memset(n,0,sizeof(struct nprobe)); n->probe.addr = (kprobe_opcode_t *)addr; strncpy(n->message,message,MAX_MSG_SIZE); n->probe.pre_handler = noisy_pre_handler; n->probe.post_handler = noisy_post_handler; n->probe.fault_handler = noisy_fault_handler; /* Doing this manually will be unnecessary soon. */ kobject_init(&n->kobj); n->kobj.subsys = &noisy_subsys; snprintf(n->kobj.name, KOBJ_NAME_LEN, "%p", n->probe.addr); if ((error = register_kprobe(&n->probe))) { printk(KERN_ERR "Unable to register probe at %p\n", (n->probe).addr); goto Error; } if ((error = kobject_register(&n->kobj))) { unregister_kprobe(&n->probe); goto Error; } goto Done; Error: kfree(n); Done: return error; } static int del(unsigned long addr) { struct list_head * entry; struct nprobe * n; list_for_each(entry,&noisy_subsys.list) { n = to_nprobe(entry); if ((unsigned long)(n->probe.addr) == addr) { kobject_unregister(&n->kobj); unregister_kprobe(&n->probe); return 0; } } return -EFAULT; } static ssize_t ctl_store(struct subsystem * s, const char * page, size_t count, loff_t off) { char message[MAX_MSG_SIZE]; char ctl[16]; unsigned long addr; int num; int error; int ret = 0; down(&noisy_sem); if (off) goto Done; num = sscanf(page,"%15s 0x%lx %128s",ctl,&addr,message); if (!num) { error = -EINVAL; goto Done; } if (!strcmp(ctl,"add") && num == 3) error = add(addr,message); else if (!strcmp(ctl,"del") && num == 2) error = del(addr); else error = -EINVAL; ret = error ? error : count; Done: up(&noisy_sem); return ret; } static struct subsys_attribute subsys_attr_ctl = { .attr = { .name = "ctl", .mode = 0644 }, .show = ctl_show, .store = ctl_store, }; static int __init noisy_init(void) { subsystem_register(&noisy_subsys); subsys_create_file(&noisy_subsys,&subsys_attr_ctl); return 0; } static void __exit noisy_exit (void) { subsys_remove_file(&noisy_subsys,&subsys_attr_ctl); subsystem_unregister(&noisy_subsys); } module_init(noisy_init); module_exit(noisy_exit); MODULE_AUTHOR("Rusty Lynch"); MODULE_LICENSE("GPL"); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-21 23:03 ` Patrick Mochel @ 2002-11-23 0:32 ` Rusty Lynch 2002-11-25 17:29 ` Patrick Mochel 2002-11-24 13:04 ` Werner Almesberger 1 sibling, 1 reply; 15+ messages in thread From: Rusty Lynch @ 2002-11-23 0:32 UTC (permalink / raw) To: Patrick Mochel; +Cc: Linux Kernel Mailing List Patrick, Your changes added various subsys_* calls that do not seem to exist in the latest bk tree, or in the patch you sent earlier in this thread. Do you have a local implementation of: subsys_create_file() subsys_remove_file() and defines struct subsys_attribute? -rustyl ----- Original Message ----- From: "Patrick Mochel" <mochel@osdl.org> To: "Rusty Lynch" <rusty@linux.co.intel.com> Cc: "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org> Sent: Thursday, November 21, 2002 3:03 PM Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use > > > Very cool. Bitkeeper is one of those things I never bothered > > with yet (mainly because I feel some comfortable with CVS.) > > It looks like it might be worth going through ramp-up time > > on bk to keep up with changes to the kernel. > > There are also nightly snapshots, though I don't recall the URL off the > top of my head.. > > > > It seems that having a pure sysfs implementation would be superior, > > > instead of having to use a character device to write to. After looking > > > into this, I realize that a couple of pieces of infrastructure are needed, > > > so I'm working on that, and will post a modified version of your module > > > once I'm done.. > > > > I look forward to seeing it. > > Ok, there are basically two parts to it: the required modifications to > sysfs and your updated module. > > The appended patch adds the support to sysfs be defining struct > subsys_attribute for declaring attributes of subsystems themselves, as > well as the needed helpers for creation/teardown and read/write. > > I've attached a replacement for noisy.c that creates a sysfs file named > 'ctl' that handles addition and deletion of nprobes, similar to the char > device you had created. From the top of the file: > > /* > * Noisy Interface for Linux > * > * This driver allows arbitrary printk's to be inserted into > * executing kernel code by using the new kprobes interface. > * A message is attached to an address, and when that address > * is reached, the message is printed. > * > * This uses a sysfs control file to manage a list of probes. > * The sysfs directory is at > * > * /sys/noisy/ > * > * and the control is named 'ctl'. > * > * A Noisy Probe can be added by echoing into the file, like: > * > * $ echo "add <address> <message>" > /sys/noisy/ctl > * > * where <address> is the address to break on, and <message> > * is the message to print when the address is reached. > * > * Probes can be removed by doing: > * > * $ echo "del <address>" > /sys/noisy/ctl > * > * where <address> is the address of the probe. > * > * The probes themselves get a directory under /sys/noisy/, and > * the name of the directory is the address of the probe. Each > * probe directory contains one file ('message') that contains > * the message to be printed. (More may be added later). > */ > > I've tried to comment the changes and the necessary steps in making this > work. > > [ Note: While I'm generally happy with the way things work, I realize that > it still requires a decent amount of overhead in using the sysfs interface > (see the file). I'll be looking into shrinking this...] > > Everything seems to work.. > > > It looks like the patch is against the bk tree, and does not apply cleanly > > to > > strait 2.5.48. I don't know how much has changed to sysfs/inode.c, but > > I can see where the last hunk is looking too far up, so I'll try it anyway. > > Ah yes, I forgot there was a patch applied to sysfs since 2.5.48. I've > included everything since 2.5.48 in the one I've appended. > > > -pat > > ===== fs/sysfs/inode.c 1.60 vs 1.67 ===== > --- 1.60/fs/sysfs/inode.c Sat Nov 16 15:01:34 2002 > +++ 1.67/fs/sysfs/inode.c Thu Nov 21 17:01:53 2002 > @@ -23,6 +23,8 @@ > * Please see Documentation/filesystems/sysfs.txt for more information. > */ > > +#undef DEBUG > + > #include <linux/list.h> > #include <linux/init.h> > #include <linux/pagemap.h> > @@ -87,16 +89,17 @@ > return inode; > } > > -static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int mode, int dev) > +static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) > { > struct inode *inode; > int error = 0; > > if (!dentry->d_inode) { > inode = sysfs_get_inode(dir->i_sb, mode, dev); > - if (inode) > + if (inode) { > d_instantiate(dentry, inode); > - else > + dget(dentry); > + } else > error = -ENOSPC; > } else > error = -EEXIST; > @@ -142,17 +145,43 @@ > return error; > } > > -static int sysfs_unlink(struct inode *dir, struct dentry *dentry) > +#define to_subsys(k) container_of(k,struct subsystem,kobj) > +#define to_sattr(a) container_of(a,struct subsys_attribute,attr) > + > +/** > + * Subsystem file operations. > + * These operations allow subsystems to have files that can be > + * read/written. > + */ > +ssize_t subsys_attr_show(struct kobject * kobj, struct attribute * attr, > + char * page, size_t count, loff_t off) > { > - struct inode *inode = dentry->d_inode; > - down(&inode->i_sem); > - dentry->d_inode->i_nlink--; > - up(&inode->i_sem); > - d_invalidate(dentry); > - dput(dentry); > - return 0; > + struct subsystem * s = to_subsys(kobj); > + struct subsys_attribute * sattr = to_sattr(attr); > + ssize_t ret = 0; > + > + if (sattr->show) > + ret = sattr->show(s,page,count,off); > + return ret; > } > > +ssize_t subsys_attr_store(struct kobject * kobj, struct attribute * attr, > + const char * page, size_t count, loff_t off) > +{ > + struct subsystem * s = to_subsys(kobj); > + struct subsys_attribute * sattr = to_sattr(attr); > + ssize_t ret = 0; > + > + if (sattr->store) > + ret = sattr->store(s,page,count,off); > + return ret; > +} > + > +static struct sysfs_ops subsys_sysfs_ops = { > + .show = subsys_attr_show, > + .store = subsys_attr_store, > +}; > + > /** > * sysfs_read_file - read an attribute. > * @file: file pointer. > @@ -173,17 +202,11 @@ > sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos) > { > struct attribute * attr = file->f_dentry->d_fsdata; > - struct sysfs_ops * ops = NULL; > - struct kobject * kobj; > + struct sysfs_ops * ops = file->private_data; > + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; > unsigned char *page; > ssize_t retval = 0; > > - kobj = file->f_dentry->d_parent->d_fsdata; > - if (kobj && kobj->subsys) > - ops = kobj->subsys->sysfs_ops; > - if (!ops || !ops->show) > - return 0; > - > if (count > PAGE_SIZE) > count = PAGE_SIZE; > > @@ -234,16 +257,11 @@ > sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t *ppos) > { > struct attribute * attr = file->f_dentry->d_fsdata; > - struct sysfs_ops * ops = NULL; > - struct kobject * kobj; > + struct sysfs_ops * ops = file->private_data; > + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata; > ssize_t retval = 0; > char * page; > > - kobj = file->f_dentry->d_parent->d_fsdata; > - if (kobj && kobj->subsys) > - ops = kobj->subsys->sysfs_ops; > - if (!ops || !ops->store) > - return 0; > > page = (char *)__get_free_page(GFP_KERNEL); > if (!page) > @@ -275,21 +293,77 @@ > return retval; > } > > -static int sysfs_open_file(struct inode * inode, struct file * filp) > +static int check_perm(struct inode * inode, struct file * file) > { > - struct kobject * kobj; > + struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata); > + struct attribute * attr = file->f_dentry->d_fsdata; > + struct sysfs_ops * ops = NULL; > int error = 0; > > - kobj = filp->f_dentry->d_parent->d_fsdata; > - if ((kobj = kobject_get(kobj))) { > - struct attribute * attr = filp->f_dentry->d_fsdata; > - if (!attr) > - error = -EINVAL; > - } else > - error = -EINVAL; > + if (!kobj || !attr) > + goto Einval; > + > + /* if the kobject has no subsystem, then it is a subsystem itself, > + * so give it the subsys_sysfs_ops. > + */ > + if (kobj->subsys) > + ops = kobj->subsys->sysfs_ops; > + else > + ops = &subsys_sysfs_ops; > + > + /* No sysfs operations, either from having no subsystem, > + * or the subsystem have no operations. > + */ > + if (!ops) > + goto Eaccess; > + > + /* File needs write support. > + * The inode's perms must say it's ok, > + * and we must have a store method. > + */ > + if (file->f_mode & FMODE_WRITE) { > + > + if (!(inode->i_mode & S_IWUGO)) > + goto Eperm; > + if (!ops->store) > + goto Eaccess; > + > + } > + > + /* File needs read support. > + * The inode's perms must say it's ok, and we there > + * must be a show method for it. > + */ > + if (file->f_mode & FMODE_READ) { > + if (!(inode->i_mode & S_IRUGO)) > + goto Eperm; > + if (!ops->show) > + goto Eaccess; > + } > + > + /* No error? Great, store the ops in file->private_data > + * for easy access in the read/write functions. > + */ > + file->private_data = ops; > + goto Done; > + > + Einval: > + error = -EINVAL; > + goto Done; > + Eaccess: > + error = -EACCES; > + goto Done; > + Eperm: > + error = -EPERM; > + Done: > return error; > } > > +static int sysfs_open_file(struct inode * inode, struct file * filp) > +{ > + return check_perm(inode,filp); > +} > + > static int sysfs_release(struct inode * inode, struct file * filp) > { > struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata; > @@ -541,7 +615,8 @@ > /* make sure dentry is really there */ > if (victim->d_inode && > (victim->d_parent->d_inode == dir->d_inode)) { > - sysfs_unlink(dir->d_inode,victim); > + simple_unlink(dir->d_inode,victim); > + d_delete(victim); > } > } > up(&dir->d_inode->i_sem); > @@ -599,19 +674,16 @@ > list_for_each_safe(node,next,&dentry->d_subdirs) { > struct dentry * d = list_entry(node,struct dentry,d_child); > /* make sure dentry is still there */ > - if (d->d_inode) > - sysfs_unlink(dentry->d_inode,d); > + if (d->d_inode) { > + simple_unlink(dentry->d_inode,d); > + d_delete(dentry); > + } > } > > - d_invalidate(dentry); > - if (simple_empty(dentry)) { > - dentry->d_inode->i_nlink -= 2; > - dentry->d_inode->i_flags |= S_DEAD; > - parent->d_inode->i_nlink--; > - } > up(&dentry->d_inode->i_sem); > - dput(dentry); > - > + d_invalidate(dentry); > + simple_rmdir(parent->d_inode,dentry); > + d_delete(dentry); > up(&parent->d_inode->i_sem); > dput(parent); > } > @@ -622,4 +694,3 @@ > EXPORT_SYMBOL(sysfs_remove_link); > EXPORT_SYMBOL(sysfs_create_dir); > EXPORT_SYMBOL(sysfs_remove_dir); > -MODULE_LICENSE("GPL"); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-23 0:32 ` Rusty Lynch @ 2002-11-25 17:29 ` Patrick Mochel 0 siblings, 0 replies; 15+ messages in thread From: Patrick Mochel @ 2002-11-25 17:29 UTC (permalink / raw) To: Rusty Lynch; +Cc: Linux Kernel Mailing List On Fri, 22 Nov 2002, Rusty Lynch wrote: > Patrick, > > Your changes added various subsys_* calls that do not seem to > exist in the latest bk tree, or in the patch you sent earlier in this > thread. > > Do you have a local implementation of: > > subsys_create_file() > subsys_remove_file() > > and defines struct subsys_attribute? Woops, I forgot to include that part of the patch. Attached is what I have against current BK, which should also be 2.5.49. This should be everything you need to make it work. Sorry about that. -pat diff -Nru a/fs/sysfs/inode.c b/fs/sysfs/inode.c --- a/fs/sysfs/inode.c Mon Nov 25 11:28:31 2002 +++ b/fs/sysfs/inode.c Mon Nov 25 11:28:31 2002 @@ -145,6 +145,43 @@ return error; } +#define to_subsys(k) container_of(k,struct subsystem,kobj) +#define to_sattr(a) container_of(a,struct subsys_attribute,attr) + +/** + * Subsystem file operations. + * These operations allow subsystems to have files that can be + * read/written. + */ +ssize_t subsys_attr_show(struct kobject * kobj, struct attribute * attr, + char * page, size_t count, loff_t off) +{ + struct subsystem * s = to_subsys(kobj); + struct subsys_attribute * sattr = to_sattr(attr); + ssize_t ret = 0; + + if (sattr->show) + ret = sattr->show(s,page,count,off); + return ret; +} + +ssize_t subsys_attr_store(struct kobject * kobj, struct attribute * attr, + const char * page, size_t count, loff_t off) +{ + struct subsystem * s = to_subsys(kobj); + struct subsys_attribute * sattr = to_sattr(attr); + ssize_t ret = 0; + + if (sattr->store) + ret = sattr->store(s,page,count,off); + return ret; +} + +static struct sysfs_ops subsys_sysfs_ops = { + .show = subsys_attr_show, + .store = subsys_attr_store, +}; + /** * sysfs_read_file - read an attribute. * @file: file pointer. @@ -265,9 +302,14 @@ if (!kobj || !attr) goto Einval; - + + /* if the kobject has no subsystem, then it is a subsystem itself, + * so give it the subsys_sysfs_ops. + */ if (kobj->subsys) ops = kobj->subsys->sysfs_ops; + else + ops = &subsys_sysfs_ops; /* No sysfs operations, either from having no subsystem, * or the subsystem have no operations. diff -Nru a/include/linux/kobject.h b/include/linux/kobject.h --- a/include/linux/kobject.h Mon Nov 25 11:28:31 2002 +++ b/include/linux/kobject.h Mon Nov 25 11:28:31 2002 @@ -60,4 +60,13 @@ kobject_put(&s->kobj); } +struct subsys_attribute { + struct attribute attr; + ssize_t (*show)(struct subsystem *, char *, size_t, loff_t); + ssize_t (*store)(struct subsystem *, const char *, size_t, loff_t); +}; + +extern int subsys_create_file(struct subsystem * , struct subsys_attribute *); +extern void subsys_remove_file(struct subsystem * , struct subsys_attribute *); + #endif /* _KOBJECT_H_ */ diff -Nru a/lib/kobject.c b/lib/kobject.c --- a/lib/kobject.c Mon Nov 25 11:28:31 2002 +++ b/lib/kobject.c Mon Nov 25 11:28:31 2002 @@ -229,6 +229,38 @@ } +/** + * subsystem_create_file - export sysfs attribute file. + * @s: subsystem. + * @a: subsystem attribute descriptor. + */ + +int subsys_create_file(struct subsystem * s, struct subsys_attribute * a) +{ + int error = 0; + if (subsys_get(s)) { + error = sysfs_create_file(&s->kobj,&a->attr); + subsys_put(s); + } + return error; +} + + +/** + * subsystem_remove_file - remove sysfs attribute file. + * @s: subsystem. + * @a: attribute desciptor. + */ + +void subsys_remove_file(struct subsystem * s, struct subsys_attribute * a) +{ + if (subsys_get(s)) { + sysfs_remove_file(&s->kobj,&a->attr); + subsys_put(s); + } +} + + EXPORT_SYMBOL(kobject_init); EXPORT_SYMBOL(kobject_register); EXPORT_SYMBOL(kobject_unregister); @@ -238,3 +270,5 @@ EXPORT_SYMBOL(subsystem_init); EXPORT_SYMBOL(subsystem_register); EXPORT_SYMBOL(subsystem_unregister); +EXPORT_SYMBOL(subsys_create_file); +EXPORT_SYMBOL(subsys_remove_file); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-21 23:03 ` Patrick Mochel 2002-11-23 0:32 ` Rusty Lynch @ 2002-11-24 13:04 ` Werner Almesberger 2002-11-24 13:38 ` Alexander Viro 1 sibling, 1 reply; 15+ messages in thread From: Werner Almesberger @ 2002-11-24 13:04 UTC (permalink / raw) To: Patrick Mochel; +Cc: Rusty Lynch, Linux Kernel Mailing List Patrick Mochel wrote: > * This uses a sysfs control file to manage a list of probes. > * The sysfs directory is at > * > * /sys/noisy/ I really like the idea of controlling kprobes through sysfs. However, ... > * A Noisy Probe can be added by echoing into the file, like: > * > * $ echo "add <address> <message>" > /sys/noisy/ctl do you really need a "magic" file for this ? I don't know how well sysfs supports mkdir/rmdir (if at all), but they would seem to provide a much more natural interface. (VFS allows rmdir to remove non-empty directories, so you wouldn't have to rm -r.) I don't think you need probe installation and message setting to be atomic, so you could just assign a unique default message, e.g. the probe address. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-24 13:04 ` Werner Almesberger @ 2002-11-24 13:38 ` Alexander Viro 2002-11-24 14:32 ` Werner Almesberger 2002-11-24 14:51 ` Roman Zippel 0 siblings, 2 replies; 15+ messages in thread From: Alexander Viro @ 2002-11-24 13:38 UTC (permalink / raw) To: Werner Almesberger; +Cc: Patrick Mochel, Rusty Lynch, Linux Kernel Mailing List On Sun, 24 Nov 2002, Werner Almesberger wrote: > do you really need a "magic" file for this ? I don't know how > well sysfs supports mkdir/rmdir (if at all), but they would > seem to provide a much more natural interface. (VFS allows > rmdir to remove non-empty directories, so you wouldn't have > to rm -r.) a) sysfs doesn't allow mkdir/rmdir and thus avoids an imperial buttload of races - witness the crap in devfs. b) rmdir of non-empty directory pretty much guarantees another buttload of races. c) mkdir creating non-empty directory or rmdir removing non-empty directory is *ugly*. BTW, Roman's "filesystem" for modules in its current form is vetoed, as far as I'm concerned - this sort of magic is just plain wrong. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-24 13:38 ` Alexander Viro @ 2002-11-24 14:32 ` Werner Almesberger 2002-11-25 18:34 ` Patrick Mochel 2002-11-24 14:51 ` Roman Zippel 1 sibling, 1 reply; 15+ messages in thread From: Werner Almesberger @ 2002-11-24 14:32 UTC (permalink / raw) To: Alexander Viro; +Cc: Patrick Mochel, Rusty Lynch, Linux Kernel Mailing List Alexander Viro wrote: > a) sysfs doesn't allow mkdir/rmdir and thus avoids an imperial buttload > of races - witness the crap in devfs. But isn't one of the problems there that kernel and user space can both initiate changes ? What I'm proposing is to let this be driven by user space. You'd of course still have different policies in different parts of the sysfs hierarchy, but would that really be a problem ? > b) rmdir of non-empty directory pretty much guarantees another buttload of > races. Hmm, if the sysfs user could veto file creation while the removal is in progress, behaviour like rm -r would certainly be sufficient. Even if it can't veto it, this may be good enough, since user space is in charge of file creation (after mkdir) anyway. The main reason why I think rmdir with rm -r functionality would be useful in this case, is that, in order to implement a "remove probe" functionality, the application wouldn't have to know what exactly lives in that directory, and it also wouldn't have to implement a real rm -r (or rm xxx/* && rmdir xxx), which I'd also consider a more powerful operation than a simple rmdir. I can see a potential issue with a proliferation of callbacks, though. > c) mkdir creating non-empty directory or rmdir removing non-empty directory > is *ugly*. Uglier than a "magic" file that then goes and creates/removes directories and files in them ? Why don't we echo mkdir foo >. then ? ;-) Besides, there's some precedent: . and .. are also handled implicitly. (And do we really have no file systems that store some meta-data in "magic" files that are created/removed implicitly ?) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-24 14:32 ` Werner Almesberger @ 2002-11-25 18:34 ` Patrick Mochel 2002-11-25 19:13 ` Werner Almesberger 2002-11-25 19:40 ` Roman Zippel 0 siblings, 2 replies; 15+ messages in thread From: Patrick Mochel @ 2002-11-25 18:34 UTC (permalink / raw) To: Werner Almesberger; +Cc: Linux Kernel Mailing List On Sun, 24 Nov 2002, Werner Almesberger wrote: > Alexander Viro wrote: > > a) sysfs doesn't allow mkdir/rmdir and thus avoids an imperial buttload > > of races - witness the crap in devfs. > > But isn't one of the problems there that kernel and user space can > both initiate changes ? What I'm proposing is to let this be driven > by user space. You'd of course still have different policies in > different parts of the sysfs hierarchy, but would that really be a > problem ? Yes. The fs hasn't allowed userspace to create and remove files and directories because it's too complex to be worth it. There are a number of races to be dealt with, plus you have to make the filesystem deal with the policy of interpreting the user's request properly. > > c) mkdir creating non-empty directory or rmdir removing non-empty directory > > is *ugly*. > > Uglier than a "magic" file that then goes and creates/removes > directories and files in them ? Why don't we echo mkdir foo >. > then ? ;-) On the surface, that's what's happening, and it's the same way procfs has worked for ages. It's not great, but it works well for specific purposes. The difference is that sysfs directories are tied to kobjects. By writing to the file with the specific syntax, you are telling the module to create an object with the parameters you give. Once the object is registered, a directory is created for it, and it's only removed when the object is unregistered. We don't just randomly create directories. The object type is context dependent, as well as the parameters and the syntax to write to the file. That's the flexibility we can afford. [ From a purist standpoint, it's still a little weird. Al has been telling me for ages that the only proper way to do it is to have each object get a mountpoint instead of a directory. According to him, which I generally take as gospel, it's the only way to do in-kernel filesystems in a race-free way. It's hard, and IIRC, there are several infrastructural changes that must take place in order for it to happen. (I think I still have the IRC log somewhere..) But, it might happen someday.. ] -pat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-25 18:34 ` Patrick Mochel @ 2002-11-25 19:13 ` Werner Almesberger 2002-11-25 19:40 ` Roman Zippel 1 sibling, 0 replies; 15+ messages in thread From: Werner Almesberger @ 2002-11-25 19:13 UTC (permalink / raw) To: Patrick Mochel; +Cc: Linux Kernel Mailing List Patrick Mochel wrote: > The difference is that sysfs directories are tied to kobjects. By writing > to the file with the specific syntax, you are telling the module to create > an object with the parameters you give. Once the object is registered, a > directory is created for it, and it's only removed when the object is > unregistered. We don't just randomly create directories. Yes, and I think that's perfect. All I'm suggesting is to use mkdir/rmdir to make the creation/removal request, and then use whatever is convenient for ensuring that things stay unique (i.e. resolve it either at the VFS, kprobes, or "glue" level). I fully agree that creating interrelated objects at two distinct places in the kernel and then trying to "synchronize" them leads to madness. (*) (*) Actually, someone in academia who works with protocol validations, and has a bit too much time on his or her hands, should once try to find an elegant way of linking this type of in-kernel dependencies to a validation tool like Spin. Right now, the process for validating such a mess is still to abstract the design into a model in a language like Promela (very slick, but decidedly "alien"), SyncC++ (C-ish, but still too far from real kernel code), etc., and to validate the model. I've done this on some occasions (and discovered interesting bugs in the process), but the pain threshold required is a bit too high to suggest this as a general procedure. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-25 18:34 ` Patrick Mochel 2002-11-25 19:13 ` Werner Almesberger @ 2002-11-25 19:40 ` Roman Zippel 1 sibling, 0 replies; 15+ messages in thread From: Roman Zippel @ 2002-11-25 19:40 UTC (permalink / raw) To: Patrick Mochel Cc: Werner Almesberger, Alexander Viro, Linux Kernel Mailing List Hi, On Mon, 25 Nov 2002, Patrick Mochel wrote: > [ From a purist standpoint, it's still a little weird. Al has been telling > me for ages that the only proper way to do it is to have each object get a > mountpoint instead of a directory. According to him, which I generally > take as gospel, it's the only way to do in-kernel filesystems in a > race-free way. It's hard, and IIRC, there are several infrastructural > changes that must take place in order for it to happen. (I think I still > have the IRC log somewhere..) But, it might happen someday.. ] What advantages do mountpoints have compared to normal directories? bye, Roman ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] sysfs on 2.5.48 unable to remove files while in use 2002-11-24 13:38 ` Alexander Viro 2002-11-24 14:32 ` Werner Almesberger @ 2002-11-24 14:51 ` Roman Zippel 1 sibling, 0 replies; 15+ messages in thread From: Roman Zippel @ 2002-11-24 14:51 UTC (permalink / raw) To: Alexander Viro Cc: Werner Almesberger, Patrick Mochel, Rusty Lynch, Linux Kernel Mailing List Hi, On Sun, 24 Nov 2002, Alexander Viro wrote: > c) mkdir creating non-empty directory or rmdir removing non-empty directory > is *ugly*. BTW, Roman's "filesystem" for modules in its current form is > vetoed, as far as I'm concerned - this sort of magic is just plain wrong. I'm open to alternative ideas. If the contents of the dir again is controlled by the fs itself, I don't really see a problem. You need some way to tell the kernel "please create this object and give me an interface to control it". bye, Roman ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2002-11-25 19:33 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-11-21 15:39 [BUG] sysfs on 2.5.48 unable to remove files while in use Rusty Lynch 2002-11-21 17:04 ` Patrick Mochel 2002-11-21 17:20 ` Rusty Lynch 2002-11-21 20:45 ` Patrick Mochel 2002-11-21 22:07 ` Rusty Lynch 2002-11-21 23:03 ` Patrick Mochel 2002-11-23 0:32 ` Rusty Lynch 2002-11-25 17:29 ` Patrick Mochel 2002-11-24 13:04 ` Werner Almesberger 2002-11-24 13:38 ` Alexander Viro 2002-11-24 14:32 ` Werner Almesberger 2002-11-25 18:34 ` Patrick Mochel 2002-11-25 19:13 ` Werner Almesberger 2002-11-25 19:40 ` Roman Zippel 2002-11-24 14:51 ` Roman Zippel
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.