* [GIT PATCH] sysfs fixes for 2.6.22-rc4
@ 2007-06-12 23:21 Greg KH
2007-06-12 23:22 ` [PATCH 1/3] sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses Greg Kroah-Hartman
2007-06-13 3:24 ` [GIT PATCH] sysfs fixes for 2.6.22-rc4 Eric Sandeen
0 siblings, 2 replies; 6+ messages in thread
From: Greg KH @ 2007-06-12 23:21 UTC (permalink / raw)
To: Linus Torvalds, Andrew Morton; +Cc: linux-kernel, htejun, sandeen
Here are some sysfs fixes for 2.6.22-rc4
They are based on a set of patches from Tejun that have been in the -mm
tree for a while and fix a nasty sysfs problem that people have been
hitting in the real world (Google has hit this a lot and spent a lot of
time and effort in tracking this down, I'd really like to say thanks for
the help here.)
Tejun and Eric have backported a small set of patches that fix this for
your current tree, with the larger, more intrusive patches queued up for
after 2.6.22 is out.
Tejun and I have beat on these patches a lot and have not found any
problems. I know it's late in the series for them, but under the
circumstances, it seems reasonable.
Please pull from:
master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6.git/
Patches will be sent as a follow-on to this message to lkml for people
to see.
thanks,
greg k-h
fs/sysfs/dir.c | 38 +++++++++++++++++++++++++++++++-------
fs/sysfs/inode.c | 21 +++++++++++++++++++--
fs/sysfs/mount.c | 1 +
fs/sysfs/sysfs.h | 2 ++
4 files changed, 53 insertions(+), 9 deletions(-)
---------------
Eric Sandeen (1):
sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses
Tejun Heo (2):
sysfs: fix condition check in sysfs_drop_dentry()
sysfs: fix race condition around sd->s_dentry, take#2
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 1/3] sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses
2007-06-12 23:21 [GIT PATCH] sysfs fixes for 2.6.22-rc4 Greg KH
@ 2007-06-12 23:22 ` Greg Kroah-Hartman
2007-06-12 23:22 ` [PATCH 2/3] sysfs: fix condition check in sysfs_drop_dentry() Greg Kroah-Hartman
2007-06-13 3:24 ` [GIT PATCH] sysfs fixes for 2.6.22-rc4 Eric Sandeen
1 sibling, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2007-06-12 23:22 UTC (permalink / raw)
To: linux-kernel; +Cc: Eric Sandeen, Eric Sandeen, Tejun Heo, Greg Kroah-Hartman
From: Eric Sandeen <sandeen@sandeen.net>
Backport of
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/broken-out/gregkh-driver-sysfs-allocate-inode-number-using-ida.patch
For regular files in sysfs, sysfs_readdir wants to traverse
sysfs_dirent->s_dentry->d_inode->i_ino to get to the inode number.
But, the dentry can be reclaimed under memory pressure, and there is
no synchronization with readdir. This patch follows Tejun's scheme of
allocating and storing an inode number in the new s_ino member of a
sysfs_dirent, when dirents are created, and retrieving it from there
for readdir, so that the pointer chain doesn't have to be traversed.
Tejun's upstream patch uses a new-ish "ida" allocator which brings
along some extra complexity; this -stable patch has a brain-dead
incrementing counter which does not guarantee uniqueness, but because
sysfs doesn't hash inodes as iunique expects, uniqueness wasn't
guaranteed today anyway.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/sysfs/dir.c | 16 +++++++++++-----
fs/sysfs/inode.c | 1 +
fs/sysfs/mount.c | 1 +
fs/sysfs/sysfs.h | 1 +
4 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 85a6686..17a8191 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -30,6 +30,14 @@ static struct dentry_operations sysfs_dentry_ops = {
.d_iput = sysfs_d_iput,
};
+static unsigned int sysfs_inode_counter;
+ino_t sysfs_get_inum(void)
+{
+ if (unlikely(sysfs_inode_counter < 3))
+ sysfs_inode_counter = 3;
+ return sysfs_inode_counter++;
+}
+
/*
* Allocates a new sysfs_dirent and links it to the parent sysfs_dirent
*/
@@ -41,6 +49,7 @@ static struct sysfs_dirent * __sysfs_new_dirent(void * element)
if (!sd)
return NULL;
+ sd->s_ino = sysfs_get_inum();
atomic_set(&sd->s_count, 1);
atomic_set(&sd->s_event, 1);
INIT_LIST_HEAD(&sd->s_children);
@@ -509,7 +518,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
switch (i) {
case 0:
- ino = dentry->d_inode->i_ino;
+ ino = parent_sd->s_ino;
if (filldir(dirent, ".", 1, i, ino, DT_DIR) < 0)
break;
filp->f_pos++;
@@ -538,10 +547,7 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
name = sysfs_get_name(next);
len = strlen(name);
- if (next->s_dentry)
- ino = next->s_dentry->d_inode->i_ino;
- else
- ino = iunique(sysfs_sb, 2);
+ ino = next->s_ino;
if (filldir(dirent, name, len, filp->f_pos, ino,
dt_type(next)) < 0)
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index bdd30e7..082e2d4 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -141,6 +141,7 @@ struct inode * sysfs_new_inode(mode_t mode, struct sysfs_dirent * sd)
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
+ inode->i_ino = sd->s_ino;
lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);
if (sd->s_iattr) {
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 23a48a3..00ab912 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -33,6 +33,7 @@ static struct sysfs_dirent sysfs_root = {
.s_element = NULL,
.s_type = SYSFS_ROOT,
.s_iattr = NULL,
+ .s_ino = 1,
};
static void sysfs_clear_inode(struct inode *inode)
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index a77c57e..1966e1a 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -5,6 +5,7 @@ struct sysfs_dirent {
void * s_element;
int s_type;
umode_t s_mode;
+ ino_t s_ino;
struct dentry * s_dentry;
struct iattr * s_iattr;
atomic_t s_event;
--
1.5.2.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 2/3] sysfs: fix condition check in sysfs_drop_dentry()
2007-06-12 23:22 ` [PATCH 1/3] sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses Greg Kroah-Hartman
@ 2007-06-12 23:22 ` Greg Kroah-Hartman
2007-06-12 23:22 ` [PATCH 3/3] sysfs: fix race condition around sd->s_dentry, take#2 Greg Kroah-Hartman
0 siblings, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2007-06-12 23:22 UTC (permalink / raw)
To: linux-kernel; +Cc: Tejun Heo, Greg Kroah-Hartman
From: Tejun Heo <htejun@gmail.com>
The condition check doesn't make much sense as it basically always
succeeds. This causes NULL dereferencing on certain cases. It seems
that parentheses are put in the wrong place. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/sysfs/inode.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 082e2d4..38bbe07 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -252,7 +252,7 @@ void sysfs_drop_dentry(struct sysfs_dirent * sd, struct dentry * parent)
if (dentry) {
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
- if (!(d_unhashed(dentry) && dentry->d_inode)) {
+ if (!d_unhashed(dentry) && dentry->d_inode) {
inode = dentry->d_inode;
spin_lock(&inode->i_lock);
__iget(inode);
--
1.5.2.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH 3/3] sysfs: fix race condition around sd->s_dentry, take#2
2007-06-12 23:22 ` [PATCH 2/3] sysfs: fix condition check in sysfs_drop_dentry() Greg Kroah-Hartman
@ 2007-06-12 23:22 ` Greg Kroah-Hartman
0 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2007-06-12 23:22 UTC (permalink / raw)
To: linux-kernel; +Cc: Tejun Heo, Greg Kroah-Hartman
From: Tejun Heo <htejun@gmail.com>
Allowing attribute and symlink dentries to be reclaimed means
sd->s_dentry can change dynamically. However, updates to the field
are unsynchronized leading to race conditions. This patch adds
sysfs_lock and use it to synchronize updates to sd->s_dentry.
Due to the locking around ->d_iput, the check in sysfs_drop_dentry()
is complex. sysfs_lock only protect sd->s_dentry pointer itself. The
validity of the dentry is protected by dcache_lock, so whether dentry
is alive or not can only be tested while holding both locks.
This is minimal backport of sysfs_drop_dentry() rewrite in devel
branch.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/sysfs/dir.c | 22 ++++++++++++++++++++--
fs/sysfs/inode.c | 18 +++++++++++++++++-
fs/sysfs/sysfs.h | 1 +
3 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 17a8191..c4342a0 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -13,14 +13,26 @@
#include "sysfs.h"
DECLARE_RWSEM(sysfs_rename_sem);
+spinlock_t sysfs_lock = SPIN_LOCK_UNLOCKED;
static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;
if (sd) {
- BUG_ON(sd->s_dentry != dentry);
- sd->s_dentry = NULL;
+ /* sd->s_dentry is protected with sysfs_lock. This
+ * allows sysfs_drop_dentry() to dereference it.
+ */
+ spin_lock(&sysfs_lock);
+
+ /* The dentry might have been deleted or another
+ * lookup could have happened updating sd->s_dentry to
+ * point the new dentry. Ignore if it isn't pointing
+ * to this dentry.
+ */
+ if (sd->s_dentry == dentry)
+ sd->s_dentry = NULL;
+ spin_unlock(&sysfs_lock);
sysfs_put(sd);
}
iput(inode);
@@ -247,7 +259,10 @@ static int sysfs_attach_attr(struct sysfs_dirent * sd, struct dentry * dentry)
}
dentry->d_fsdata = sysfs_get(sd);
+ /* protect sd->s_dentry against sysfs_d_iput */
+ spin_lock(&sysfs_lock);
sd->s_dentry = dentry;
+ spin_unlock(&sysfs_lock);
error = sysfs_create(dentry, (attr->mode & S_IALLUGO) | S_IFREG, init);
if (error) {
sysfs_put(sd);
@@ -269,7 +284,10 @@ static int sysfs_attach_link(struct sysfs_dirent * sd, struct dentry * dentry)
int err = 0;
dentry->d_fsdata = sysfs_get(sd);
+ /* protect sd->s_dentry against sysfs_d_iput */
+ spin_lock(&sysfs_lock);
sd->s_dentry = dentry;
+ spin_unlock(&sysfs_lock);
err = sysfs_create(dentry, S_IFLNK|S_IRWXUGO, init_symlink);
if (!err) {
dentry->d_op = &sysfs_dentry_ops;
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 38bbe07..5266eec 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -246,9 +246,23 @@ static inline void orphan_all_buffers(struct inode *node)
*/
void sysfs_drop_dentry(struct sysfs_dirent * sd, struct dentry * parent)
{
- struct dentry * dentry = sd->s_dentry;
+ struct dentry *dentry = NULL;
struct inode *inode;
+ /* We're not holding a reference to ->s_dentry dentry but the
+ * field will stay valid as long as sysfs_lock is held.
+ */
+ spin_lock(&sysfs_lock);
+ spin_lock(&dcache_lock);
+
+ /* dget dentry if it's still alive */
+ if (sd->s_dentry && sd->s_dentry->d_inode)
+ dentry = dget_locked(sd->s_dentry);
+
+ spin_unlock(&dcache_lock);
+ spin_unlock(&sysfs_lock);
+
+ /* drop dentry */
if (dentry) {
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
@@ -268,6 +282,8 @@ void sysfs_drop_dentry(struct sysfs_dirent * sd, struct dentry * parent)
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
}
+
+ dput(dentry);
}
}
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 1966e1a..502c949 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -33,6 +33,7 @@ extern const unsigned char * sysfs_get_name(struct sysfs_dirent *sd);
extern void sysfs_drop_dentry(struct sysfs_dirent *sd, struct dentry *parent);
extern int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
+extern spinlock_t sysfs_lock;
extern struct rw_semaphore sysfs_rename_sem;
extern struct super_block * sysfs_sb;
extern const struct file_operations sysfs_dir_operations;
--
1.5.2.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [GIT PATCH] sysfs fixes for 2.6.22-rc4
2007-06-12 23:21 [GIT PATCH] sysfs fixes for 2.6.22-rc4 Greg KH
2007-06-12 23:22 ` [PATCH 1/3] sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses Greg Kroah-Hartman
@ 2007-06-13 3:24 ` Eric Sandeen
1 sibling, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2007-06-13 3:24 UTC (permalink / raw)
To: Greg KH; +Cc: Linus Torvalds, Andrew Morton, linux-kernel, htejun
Greg KH wrote:
> Eric Sandeen (1):
> sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses
To be fair, Tejun wrote that one too, I just backported & tried to
simplify it a bit for -stable. :)
> Tejun Heo (2):
> sysfs: fix condition check in sysfs_drop_dentry()
> sysfs: fix race condition around sd->s_dentry, take#2
>
Glad to see these are making their way out, we've had people hit these
bugs too. And they beat on this basic patchset quite a lot, with good
results.
Thanks!
-Eric
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCHSET 2.6.22-rc4] sysfs: fix race conditions
@ 2007-06-11 5:01 Tejun Heo
2007-06-11 5:03 ` [PATCH 2/3] sysfs: fix condition check in sysfs_drop_dentry() Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2007-06-11 5:01 UTC (permalink / raw)
To: linux-kernel, greg, akpm, cebbert, sandeen, maneesh, cs
[-- Attachment #1: Type: text/plain, Size: 1718 bytes --]
Hello, all.
Currently, there are several race conditions around dentry/inode
reclamation.
a. sysfs_dirent->s_dentry dereferencing in sysfs_readdir()
b. sysfs_dirent->s_dentry dereferencing in sysfs_drop_dentry()
c. sysfs_dirent->s_dentry clearing in sysfs_d_iput()
All aboves are done without synchronization and can cause oops if the
timing is right (or wrong).
These race conditions are difficult to trigger but with the attached
patch (sysfs-races.patch) and the following commands running
parallelly, all three are reliably reproducible (you may have to
change timings or disable others to trigger specific one).
1. while true; do insmod drivers/ata/libata.ko; insmod drivers/ata/ata_piix.ko; sleep 1; rmmod ata_piix; rmmod libata; sleep 1; echo -n . ; done
2. while true; do find /sys -type f | xargs cat > /dev/null; echo -n .; sleep 1; done
3. while true; do find /sys/class/scsi_disk -type f | sort | xargs cat > /dev/null; echo -n .; sleep 1; done
4. while true; do umount /sys; sleep 1; mount /sys; sleep 1; echo -n .; done
#1 assumes there are several devices attached to ata_piix controller.
Change #1 to any module or command which creates and removes sysfs
nodes repeatedly and adjust #3 to cat those sysfs nodes.
All known race conditions are fixed in the current -mm. #a is
replaced by adding sd->s_ino and allocating unique ino with ida. #b
and #c are fixed during sysfs_drop_dentry() rewrite. However, those
changes were too big to apply to 2.6.22-rcX or any stable branches.
This patchset contains three minimal backports of fixes in -mm. With
all patches in the patchset and sysfs-races.patch applied, kernel
survived ~20 hours of stress test without any problem.
Thanks.
--
tejun
[-- Attachment #2: sysfs-races.patch --]
[-- Type: text/x-diff, Size: 1187 bytes --]
---
fs/sysfs/dir.c | 7 +++++--
fs/sysfs/inode.c | 2 ++
2 files changed, 7 insertions(+), 2 deletions(-)
Index: work1/fs/sysfs/dir.c
===================================================================
--- work1.orig/fs/sysfs/dir.c
+++ work1/fs/sysfs/dir.c
@@ -18,6 +18,8 @@ static void sysfs_d_iput(struct dentry *
{
struct sysfs_dirent * sd = dentry->d_fsdata;
+ udelay(10);
+
if (sd) {
BUG_ON(sd->s_dentry != dentry);
sd->s_dentry = NULL;
@@ -538,9 +540,10 @@ static int sysfs_readdir(struct file * f
name = sysfs_get_name(next);
len = strlen(name);
- if (next->s_dentry)
+ if (next->s_dentry) {
+ msleep(1);
ino = next->s_dentry->d_inode->i_ino;
- else
+ } else
ino = iunique(sysfs_sb, 2);
if (filldir(dirent, name, len, filp->f_pos, ino,
Index: work1/fs/sysfs/inode.c
===================================================================
--- work1.orig/fs/sysfs/inode.c
+++ work1/fs/sysfs/inode.c
@@ -248,6 +248,8 @@ void sysfs_drop_dentry(struct sysfs_dire
struct dentry * dentry = sd->s_dentry;
struct inode *inode;
+ msleep(1);
+
if (dentry) {
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 2/3] sysfs: fix condition check in sysfs_drop_dentry()
2007-06-11 5:01 [PATCHSET 2.6.22-rc4] sysfs: fix race conditions Tejun Heo
@ 2007-06-11 5:03 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2007-06-11 5:03 UTC (permalink / raw)
To: linux-kernel, greg, akpm, cebbert, sandeen, maneesh, cs
The condition check doesn't make much sense as it basically always
succeeds. This causes NULL dereferencing on certain cases. It seems
that parentheses are put in the wrong place. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
---
fs/sysfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: work1/fs/sysfs/inode.c
===================================================================
--- work1.orig/fs/sysfs/inode.c
+++ work1/fs/sysfs/inode.c
@@ -252,7 +252,7 @@ void sysfs_drop_dentry(struct sysfs_dire
if (dentry) {
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
- if (!(d_unhashed(dentry) && dentry->d_inode)) {
+ if (!d_unhashed(dentry) && dentry->d_inode) {
inode = dentry->d_inode;
spin_lock(&inode->i_lock);
__iget(inode);
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-06-13 3:24 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-12 23:21 [GIT PATCH] sysfs fixes for 2.6.22-rc4 Greg KH
2007-06-12 23:22 ` [PATCH 1/3] sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses Greg Kroah-Hartman
2007-06-12 23:22 ` [PATCH 2/3] sysfs: fix condition check in sysfs_drop_dentry() Greg Kroah-Hartman
2007-06-12 23:22 ` [PATCH 3/3] sysfs: fix race condition around sd->s_dentry, take#2 Greg Kroah-Hartman
2007-06-13 3:24 ` [GIT PATCH] sysfs fixes for 2.6.22-rc4 Eric Sandeen
-- strict thread matches above, loose matches on Subject: below --
2007-06-11 5:01 [PATCHSET 2.6.22-rc4] sysfs: fix race conditions Tejun Heo
2007-06-11 5:03 ` [PATCH 2/3] sysfs: fix condition check in sysfs_drop_dentry() Tejun Heo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.