* [patch 0/2] Sorted seq_file
@ 2007-08-12 15:08 Mathieu Desnoyers
2007-08-12 15:08 ` [patch 1/2] Seq_file add support for sorted list Mathieu Desnoyers
2007-08-12 15:08 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
0 siblings, 2 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-12 15:08 UTC (permalink / raw)
To: akpm, linux-kernel
Hi Andrew,
I ran in a race regarding seq_file listing of modules. Here is a fix proposed to
get a coherent output even if modules are being unloaded at the same time.
It is needed by the linux kernel markers.
It applies on top of 2.6.23-rc2-mm2 in this order:
seq_file_sorted.patch
module.c-sort-module-list.patch
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 1/2] Seq_file add support for sorted list
2007-08-12 15:08 [patch 0/2] Sorted seq_file Mathieu Desnoyers
@ 2007-08-12 15:08 ` Mathieu Desnoyers
2007-08-12 15:08 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
1 sibling, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-12 15:08 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: seq_file_sorted.patch --]
[-- Type: text/plain, Size: 3172 bytes --]
Add support for sorted list in seq_file. It aims at changing the way
/proc/modules and kallsyms iterates on the module list to remove a race between
module unload and module/symbol listing.
The list is sorted by ascending list_head pointer address.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
fs/seq_file.c | 40 ++++++++++++++++++++++++++++++++++++++++
include/linux/seq_file.h | 20 ++++++++++++++++++++
2 files changed, 60 insertions(+)
Index: linux-2.6-lttng/fs/seq_file.c
===================================================================
--- linux-2.6-lttng.orig/fs/seq_file.c 2007-07-25 01:40:13.000000000 -0400
+++ linux-2.6-lttng/fs/seq_file.c 2007-07-25 02:47:55.000000000 -0400
@@ -485,3 +485,43 @@ struct list_head *seq_list_next(void *v,
}
EXPORT_SYMBOL(seq_list_next);
+
+struct list_head *seq_sorted_list_start(struct list_head *head, void *pos)
+{
+ struct list_head *lh;
+
+ list_for_each(lh, head)
+ if ((void*)lh >= pos)
+ return lh;
+ return NULL;
+}
+
+EXPORT_SYMBOL(seq_sorted_list_start);
+
+struct list_head *seq_sorted_list_start_head(struct list_head *head, void *pos)
+{
+ struct list_head *lh;
+
+ if (!pos)
+ return head;
+ list_for_each(lh, head)
+ if ((void*)lh >= pos)
+ return lh->prev;
+ return NULL;
+}
+
+EXPORT_SYMBOL(seq_sorted_list_start_head);
+
+struct list_head *seq_sorted_list_next(void *p, struct list_head *head,
+ void **ppos)
+{
+ struct list_head *lh;
+ void *next;
+
+ lh = ((struct list_head *)p)->next;
+ next = (lh == head) ? NULL : lh;
+ *ppos = next ? next : (void*)-1UL;
+ return next;
+}
+
+EXPORT_SYMBOL(seq_sorted_list_next);
Index: linux-2.6-lttng/include/linux/seq_file.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/seq_file.h 2007-07-25 01:40:13.000000000 -0400
+++ linux-2.6-lttng/include/linux/seq_file.h 2007-07-25 02:41:22.000000000 -0400
@@ -61,5 +61,25 @@ extern struct list_head *seq_list_start_
extern struct list_head *seq_list_next(void *v, struct list_head *head,
loff_t *ppos);
+/*
+ * Helpers for iteration over a list sorted by ascending head pointer address.
+ * To be used in contexts where preemption cannot be disabled to insure to
+ * continue iteration on a modified list starting at the same location where it
+ * stopped, or at a following location. It insures that the lost information
+ * will only be in elements added/removed from the list between iterations.
+ * void *pos is only used to get the next list element and may not be a valid
+ * list_head anymore when given to seq_sorted_list_start() or
+ * seq_sorted_list_start_head().
+ */
+extern struct list_head *seq_sorted_list_start(struct list_head *head,
+ void *pos);
+extern struct list_head *seq_sorted_list_start_head(struct list_head *head,
+ void *pos);
+/*
+ * next must be called with an existing p node
+ */
+extern struct list_head *seq_sorted_list_next(void *p, struct list_head *head,
+ void **ppos);
+
#endif
#endif
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-08-12 15:08 [patch 0/2] Sorted seq_file Mathieu Desnoyers
2007-08-12 15:08 ` [patch 1/2] Seq_file add support for sorted list Mathieu Desnoyers
@ 2007-08-12 15:08 ` Mathieu Desnoyers
[not found] ` <20070815033945.GA13134@mail.ustc.edu.cn>
1 sibling, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-12 15:08 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: module.c-sort-module-list.patch --]
[-- Type: text/plain, Size: 3510 bytes --]
A race that appears both in /proc/modules and in kallsyms: if, between the
seq file reads, the process is put to sleep and at this moment a module is
or removed from the module list, the listing will skip an amount of
modules/symbols corresponding to the amount of elements present in the unloaded
module, but at the current position in the list if the iteration is located
after the removed module.
The cleanest way I found to deal with this problem is to sort the module list.
We can then keep the old struct module * as the old iterator, knowing the it may
be removed between the seq file reads, but we only use it as "get next". If it
is not present in the module list, the next pointer will be used.
By doing this, removing a given module will now only fuzz the output related to
this specific module, not any random module anymore. Since modprobe uses
/proc/modules, it might be important to make sure multiple concurrent running
modprobes won't interfere with each other.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
kernel/module.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-08-12 10:05:39.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-08-12 10:22:05.000000000 -0400
@@ -63,8 +63,10 @@ extern int module_sysfs_initialized;
/* If this is set, the section belongs in the init part of the module */
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
-/* List of modules, protected by module_mutex or preempt_disable
- * (add/delete uses stop_machine). */
+/*
+ * List of modules, protected by module_mutex or preempt_disable
+ * (add/delete uses stop_machine). Sorted by ascending list node address.
+ */
DEFINE_MUTEX(module_mutex);
LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
@@ -2134,10 +2136,24 @@ nomodsectinfo:
/*
* link the module with the whole machine is stopped with interrupts off
* - this defends against kallsyms not taking locks
+ * We sort the modules by struct module pointer address to permit correct
+ * iteration over modules of, at least, kallsyms for preemptible operations,
+ * such as read(). Sorting by struct module pointer address is equivalent to
+ * sort by list node address.
*/
static int __link_module(void *_mod)
{
- struct module *mod = _mod;
+ struct module *mod = _mod, *iter;
+
+ list_for_each_entry_reverse(iter, &modules, list) {
+ BUG_ON(iter == mod); /* Should never be in the list twice */
+ if (iter < mod) {
+ /* We belong to the location right after iter. */
+ list_add(&mod->list, &iter->list);
+ return 0;
+ }
+ }
+ /* We should be added at the head of the list */
list_add(&mod->list, &modules);
return 0;
}
@@ -2402,12 +2418,14 @@ unsigned long module_kallsyms_lookup_nam
static void *m_start(struct seq_file *m, loff_t *pos)
{
mutex_lock(&module_mutex);
- return seq_list_start(&modules, *pos);
+ if (!*pos)
+ m->private = NULL;
+ return seq_sorted_list_start(&modules, m->private);
}
static void *m_next(struct seq_file *m, void *p, loff_t *pos)
{
- return seq_list_next(p, &modules, pos);
+ return seq_sorted_list_next(p, &modules, &m->private);
}
static void m_stop(struct seq_file *m, void *p)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
[not found] ` <20070815033945.GA13134@mail.ustc.edu.cn>
@ 2007-08-15 3:39 ` Fengguang Wu
2007-08-15 4:18 ` Al Viro
1 sibling, 0 replies; 23+ messages in thread
From: Fengguang Wu @ 2007-08-15 3:39 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, Randy Dunlap, Martin Bligh
On Sun, Aug 12, 2007 at 11:08:46AM -0400, Mathieu Desnoyers wrote:
>
> static void *m_next(struct seq_file *m, void *p, loff_t *pos)
> {
> - return seq_list_next(p, &modules, pos);
> + return seq_sorted_list_next(p, &modules, &m->private);
> }
In theory it is not safe to use something other than the passed in
*pos as an position indicator. Because seq_file do not always call
->next() to advance to the next item. Look at seq_file.c, it sometimes
increase the pos/index directly! Which also prevents pos to skip
forward, which is preferred in your case.
The attached patch tries to fix it.
The seq_file.c is so twisted!
Fengguang
===
seqfile: remove seq_file's assumption about iterators
The seq_file implementation has some hardcoded index++/pos++ lines,
which assumes iterators to be *continuous* integers.
This patch replaces the index++ lines with calls to m->next(), so that
seq_file users can freely use discrete forms of iterators, such as
ascending addresses.
Cc: Randy Dunlap <rddunlap@osdl.org>
Cc: Martin Bligh <mbligh@google.com>
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
---
fs/seq_file.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
--- linux-2.6.22-git15.orig/fs/seq_file.c
+++ linux-2.6.22-git15/fs/seq_file.c
@@ -13,6 +13,8 @@
#include <asm/uaccess.h>
#include <asm/page.h>
+#define SEQFILE_SHOW_NEXT LONG_MAX
+
/**
* seq_open - initialize sequential file
* @file: file we initialize
@@ -93,6 +95,7 @@ ssize_t seq_read(struct file *file, char
/* if not empty - flush it first */
if (m->count) {
n = min(m->count, size);
+ BUG_ON(m->from == SEQFILE_SHOW_NEXT);
err = copy_to_user(buf, m->buf + m->from, n);
if (err)
goto Efault;
@@ -102,7 +105,7 @@ ssize_t seq_read(struct file *file, char
buf += n;
copied += n;
if (!m->count)
- m->index++;
+ m->from = SEQFILE_SHOW_NEXT;
if (!size)
goto Done;
}
@@ -113,9 +116,11 @@ ssize_t seq_read(struct file *file, char
err = PTR_ERR(p);
if (!p || IS_ERR(p))
break;
- err = m->op->show(m, p);
- if (err)
- break;
+ if (m->from != SEQFILE_SHOW_NEXT) {
+ err = m->op->show(m, p);
+ if (err)
+ break;
+ }
if (m->count < m->size)
goto Fill;
m->op->stop(m, p);
@@ -156,7 +161,7 @@ Fill:
if (m->count)
m->from = n;
else
- pos++;
+ m->from = SEQFILE_SHOW_NEXT;
m->index = pos;
Done:
if (!copied)
@@ -211,12 +216,9 @@ static int traverse(struct seq_file *m,
}
pos += m->count;
m->count = 0;
- if (pos == offset) {
- index++;
- m->index = index;
- break;
- }
p = m->op->next(m, p, &index);
+ if (pos == offset)
+ break;
}
m->op->stop(m, p);
return error;
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
[not found] ` <20070815033945.GA13134@mail.ustc.edu.cn>
2007-08-15 3:39 ` Fengguang Wu
@ 2007-08-15 4:18 ` Al Viro
[not found] ` <20070815063741.GB5175@mail.ustc.edu.cn>
1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2007-08-15 4:18 UTC (permalink / raw)
To: Mathieu Desnoyers, akpm, linux-kernel, Randy Dunlap, Martin Bligh
On Wed, Aug 15, 2007 at 11:39:45AM +0800, Fengguang Wu wrote:
> On Sun, Aug 12, 2007 at 11:08:46AM -0400, Mathieu Desnoyers wrote:
> >
> > static void *m_next(struct seq_file *m, void *p, loff_t *pos)
> > {
> > - return seq_list_next(p, &modules, pos);
> > + return seq_sorted_list_next(p, &modules, &m->private);
> > }
>
> In theory it is not safe to use something other than the passed in
> *pos as an position indicator. Because seq_file do not always call
> ->next() to advance to the next item. Look at seq_file.c, it sometimes
> increase the pos/index directly! Which also prevents pos to skip
> forward, which is preferred in your case.
>
> The attached patch tries to fix it.
>
> The seq_file.c is so twisted!
>
> Fengguang
> ===
>
> seqfile: remove seq_file's assumption about iterators
>
> The seq_file implementation has some hardcoded index++/pos++ lines,
> which assumes iterators to be *continuous* integers.
What the fuck? It assumes no such thing and a lot of iterators are
nothing like integers. What are you talking about?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
[not found] ` <20070815063741.GB5175@mail.ustc.edu.cn>
@ 2007-08-15 6:37 ` Fengguang Wu
2007-08-15 6:53 ` Al Viro
1 sibling, 0 replies; 23+ messages in thread
From: Fengguang Wu @ 2007-08-15 6:37 UTC (permalink / raw)
To: Al Viro; +Cc: Mathieu Desnoyers, akpm, linux-kernel, Randy Dunlap, Martin Bligh
On Wed, Aug 15, 2007 at 05:18:45AM +0100, Al Viro wrote:
>> On Wed, Aug 15, 2007 at 11:39:45AM +0800, Fengguang Wu wrote:
>> seqfile: remove seq_file's assumption about iterators
>>
>> The seq_file implementation has some hardcoded index++/pos++ lines,
>> which assumes iterators to be *continuous* integers.
>
>What the fuck? It assumes no such thing and a lot of iterators are
>nothing like integers. What are you talking about?
Oh I used the wrong term...
Take for example this function from lwn.net:
static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
loff_t *spos = (loff_t *) v;
*pos = ++(*spos);
return spos;
}
I mean 'pos' is sometimes increased in ct_seq_next(), and sometimes from
seq_file.c/seq_read(), too. Thus we cannot reliably do this:
*pos = (*spos) + some_variable_offset;
You are referring to spos as the iterator, are you?
Maybe I'm wrong. I'll dip more into it.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
[not found] ` <20070815063741.GB5175@mail.ustc.edu.cn>
2007-08-15 6:37 ` Fengguang Wu
@ 2007-08-15 6:53 ` Al Viro
[not found] ` <20070815083645.GA6544@mail.ustc.edu.cn>
` (2 more replies)
1 sibling, 3 replies; 23+ messages in thread
From: Al Viro @ 2007-08-15 6:53 UTC (permalink / raw)
To: Mathieu Desnoyers, akpm, linux-kernel, Randy Dunlap, Martin Bligh
On Wed, Aug 15, 2007 at 02:37:41PM +0800, Fengguang Wu wrote:
> static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
> {
> loff_t *spos = (loff_t *) v;
> *pos = ++(*spos);
> return spos;
> }
>
> I mean 'pos' is sometimes increased in ct_seq_next(), and sometimes from
> seq_file.c/seq_read(), too. Thus we cannot reliably do this:
>
> *pos = (*spos) + some_variable_offset;
Of course we can. These guys can be sparse - note that ->start()
takes a pointer, and for a good reason. ->start(m, p, pos) should
get the first entry with offset >= *pos (or NULL if we are done) and
set *pos accordingly.
That m->index++ is "we are done with the partial, step just past it, so
that ->start() will pick the first real entry after it the next time it's
called".
For dense case we don't need to update *pos in ->start() - either
we already have one with offset == *pos (and no update is needed),
or we are finished and should return NULL.
However, we have every right to live with sparse offsets; prototype of
->start() had been done the way it's done exactly to allow that kind
of use.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
[not found] ` <20070815083645.GA6544@mail.ustc.edu.cn>
@ 2007-08-15 8:36 ` Fengguang Wu
0 siblings, 0 replies; 23+ messages in thread
From: Fengguang Wu @ 2007-08-15 8:36 UTC (permalink / raw)
To: Al Viro; +Cc: Mathieu Desnoyers, akpm, linux-kernel, Randy Dunlap, Martin Bligh
On Wed, Aug 15, 2007 at 07:53:01AM +0100, Al Viro wrote:
> On Wed, Aug 15, 2007 at 02:37:41PM +0800, Fengguang Wu wrote:
> > static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
> > {
> > loff_t *spos = (loff_t *) v;
> > *pos = ++(*spos);
> > return spos;
> > }
> >
> > I mean 'pos' is sometimes increased in ct_seq_next(), and sometimes from
> > seq_file.c/seq_read(), too. Thus we cannot reliably do this:
> >
> > *pos = (*spos) + some_variable_offset;
>
> Of course we can. These guys can be sparse - note that ->start()
> takes a pointer, and for a good reason. ->start(m, p, pos) should
> get the first entry with offset >= *pos (or NULL if we are done) and
> set *pos accordingly.
>
> That m->index++ is "we are done with the partial, step just past it, so
> that ->start() will pick the first real entry after it the next time it's
> called".
>
> For dense case we don't need to update *pos in ->start() - either
> we already have one with offset == *pos (and no update is needed),
> or we are finished and should return NULL.
>
> However, we have every right to live with sparse offsets; prototype of
> ->start() had been done the way it's done exactly to allow that kind
> of use.
So sparse offsets are supported, with some special cares on ->start.
My case is to scan the address space in ranges. The "object" is the
start offset of a range:
__________________#######______________________#############__________
^start ^start
Now the solution can be:
- ->show shows the current range
- ->next seeks to next range
- ->start must *also* do the seek
The last requirement is made clear by you, a fact I refused to accept :)
My old concept was that a ->next should be called to move pages
forward after a new start.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
[not found] ` <20070815084625.GA18892@mail.ustc.edu.cn>
@ 2007-08-15 8:46 ` Fengguang Wu
2007-08-18 15:56 ` Mathieu Desnoyers
2007-08-18 16:09 ` [PATCH] Fix f_version type: should be u64 instead of unsigned long Mathieu Desnoyers
2 siblings, 0 replies; 23+ messages in thread
From: Fengguang Wu @ 2007-08-15 8:46 UTC (permalink / raw)
To: Al Viro; +Cc: Mathieu Desnoyers, akpm, linux-kernel, Randy Dunlap, Martin Bligh
Al Viro,
Does this sounds like a good fix?
===
seq_file version fixes
- f_version is 'unsigned long', it's pointless to do more than that.
- m->version should not be reset when we are bumping up the buf size.
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/seq_file.c | 1 -
include/linux/seq_file.h | 2 +-
2 files changed, 1 insertion(+), 2 deletions(-)
--- linux-2.6.23-rc3.orig/include/linux/seq_file.h
+++ linux-2.6.23-rc3/include/linux/seq_file.h
@@ -18,7 +18,7 @@ struct seq_file {
size_t from;
size_t count;
loff_t index;
- loff_t version;
+ unsigned long version;
struct mutex lock;
const struct seq_operations *op;
void *private;
--- linux-2.6.23-rc3.orig/fs/seq_file.c
+++ linux-2.6.23-rc3/fs/seq_file.c
@@ -134,7 +134,6 @@ ssize_t seq_read(struct file *file, char
if (!m->buf)
goto Enomem;
m->count = 0;
- m->version = 0;
}
m->op->stop(m, p);
m->count = 0;
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
[not found] ` <20070815084625.GA18892@mail.ustc.edu.cn>
2007-08-15 8:46 ` Fengguang Wu
@ 2007-08-18 15:56 ` Mathieu Desnoyers
2007-08-18 16:09 ` [PATCH] Fix f_version type: should be u64 instead of unsigned long Mathieu Desnoyers
2 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-18 15:56 UTC (permalink / raw)
To: Fengguang Wu; +Cc: Al Viro, akpm, linux-kernel, Randy Dunlap, Martin Bligh
* Fengguang Wu (wfg@mail.ustc.edu.cn) wrote:
> Al Viro,
>
> Does this sounds like a good fix?
> ===
>
> seq_file version fixes
>
> - f_version is 'unsigned long', it's pointless to do more than that.
Hrm, this is weird...
fs.h:
struct inode
u64 i_version;
and
struct file
unsigned long f_version;
Users do:
fs/ext3/dir.c:
if (filp->f_version != inode->i_version) {
So why isn't f_version a u64 ? It becomes a problem if versions gets
higher than 2^32 and we are on an architecture where longs are 32 bits.
I think the problem is the f_version field type, not in seq_file at all.
I'll prepare a patch for this.
> - m->version should not be reset when we are bumping up the buf size.
>
Hrmmmm, what is this twisted use of versions anyway ?!?
If I look at other version users elsewhere in the kernel, they mostly
do:
repeat:
f_version = i_version
do something
if (f_version != i_version)
repeat;
So they can see if the underlying inode has changed during the
operation. seq_file does it completely the other way around:
m->version = f_version;
do something
and, well, versions are never really used at all.
If we want to use versioning there, we should keep a version counter
associated with the ressource pointed used by seq_files that would be
incremented each time the data structures are modified.
Then, in the read side, we could sanely do:
seq open():
f_version = current version
seq read():
repeat:
m->version = f_version;
do something
if (m->version != current version)
repeat;
This would only make sure that the given read operation has consistent
data. It would not certify data consistency across reads.
I have looked at fs/proc.c/task_mmu.c use of m->version, and I think it
is just really weird. I think the proper way to do it would be to put
the last_addr in a field of a structure to which m->private would point
to.
Mathieu
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
> fs/seq_file.c | 1 -
> include/linux/seq_file.h | 2 +-
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
> --- linux-2.6.23-rc3.orig/include/linux/seq_file.h
> +++ linux-2.6.23-rc3/include/linux/seq_file.h
> @@ -18,7 +18,7 @@ struct seq_file {
> size_t from;
> size_t count;
> loff_t index;
> - loff_t version;
> + unsigned long version;
> struct mutex lock;
> const struct seq_operations *op;
> void *private;
> --- linux-2.6.23-rc3.orig/fs/seq_file.c
> +++ linux-2.6.23-rc3/fs/seq_file.c
> @@ -134,7 +134,6 @@ ssize_t seq_read(struct file *file, char
> if (!m->buf)
> goto Enomem;
> m->count = 0;
> - m->version = 0;
> }
> m->op->stop(m, p);
> m->count = 0;
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] Fix f_version type: should be u64 instead of unsigned long
[not found] ` <20070815084625.GA18892@mail.ustc.edu.cn>
2007-08-15 8:46 ` Fengguang Wu
2007-08-18 15:56 ` Mathieu Desnoyers
@ 2007-08-18 16:09 ` Mathieu Desnoyers
2 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-18 16:09 UTC (permalink / raw)
To: akpm, Al Viro; +Cc: linux-kernel, Randy Dunlap, Martin Bligh, Fengguang Wu
Fix f_version type: should be u64 instead of long
There is a type inconsistency between struct inode i_version and struct file
f_version.
fs.h:
struct inode
u64 i_version;
and
struct file
unsigned long f_version;
Users do:
fs/ext3/dir.c:
if (filp->f_version != inode->i_version) {
So why isn't f_version a u64 ? It becomes a problem if versions gets
higher than 2^32 and we are on an architecture where longs are 32 bits.
This patch changes the f_version type to u64, and updates the users accordingly.
It applies to 2.6.23-rc2-mm2.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Martin Bligh <mbligh@google.com>
CC: Randy Dunlap <rddunlap@osdl.org>
CC: Al Viro <viro@ftp.linux.org.uk>
---
fs/ext3/dir.c | 2 +-
fs/ext4/dir.c | 2 +-
fs/ocfs2/dir.c | 2 +-
fs/proc/base.c | 4 ++--
include/linux/fs.h | 2 +-
include/linux/seq_file.h | 2 +-
6 files changed, 7 insertions(+), 7 deletions(-)
Index: linux-2.6-lttng/include/linux/fs.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/fs.h 2007-08-18 11:05:10.000000000 -0400
+++ linux-2.6-lttng/include/linux/fs.h 2007-08-18 11:05:56.000000000 -0400
@@ -799,7 +799,7 @@ struct file {
unsigned int f_uid, f_gid;
struct file_ra_state f_ra;
- unsigned long f_version;
+ u64 f_version;
#ifdef CONFIG_SECURITY
void *f_security;
#endif
Index: linux-2.6-lttng/fs/ext3/dir.c
===================================================================
--- linux-2.6-lttng.orig/fs/ext3/dir.c 2007-08-18 11:08:25.000000000 -0400
+++ linux-2.6-lttng/fs/ext3/dir.c 2007-08-18 11:08:32.000000000 -0400
@@ -210,7 +210,7 @@ revalidate:
* not the directory has been modified
* during the copy operation.
*/
- unsigned long version = filp->f_version;
+ u64 version = filp->f_version;
error = filldir(dirent, de->name,
de->name_len,
Index: linux-2.6-lttng/fs/ext4/dir.c
===================================================================
--- linux-2.6-lttng.orig/fs/ext4/dir.c 2007-08-18 11:08:47.000000000 -0400
+++ linux-2.6-lttng/fs/ext4/dir.c 2007-08-18 11:08:53.000000000 -0400
@@ -210,7 +210,7 @@ revalidate:
* not the directory has been modified
* during the copy operation.
*/
- unsigned long version = filp->f_version;
+ u64 version = filp->f_version;
error = filldir(dirent, de->name,
de->name_len,
Index: linux-2.6-lttng/fs/ocfs2/dir.c
===================================================================
--- linux-2.6-lttng.orig/fs/ocfs2/dir.c 2007-08-18 11:09:23.000000000 -0400
+++ linux-2.6-lttng/fs/ocfs2/dir.c 2007-08-18 11:09:30.000000000 -0400
@@ -183,7 +183,7 @@ revalidate:
* not the directory has been modified
* during the copy operation.
*/
- unsigned long version = filp->f_version;
+ u64 version = filp->f_version;
unsigned char d_type = DT_UNKNOWN;
if (de->file_type < OCFS2_FT_MAX)
Index: linux-2.6-lttng/fs/proc/base.c
===================================================================
--- linux-2.6-lttng.orig/fs/proc/base.c 2007-08-18 11:11:21.000000000 -0400
+++ linux-2.6-lttng/fs/proc/base.c 2007-08-18 11:11:39.000000000 -0400
@@ -2570,7 +2570,7 @@ static int proc_task_readdir(struct file
/* f_version caches the tgid value that the last readdir call couldn't
* return. lseek aka telldir automagically resets f_version to 0.
*/
- tid = filp->f_version;
+ tid = (int)filp->f_version;
filp->f_version = 0;
for (task = first_tid(leader, tid, pos - 2);
task;
@@ -2579,7 +2579,7 @@ static int proc_task_readdir(struct file
if (proc_task_fill_cache(filp, dirent, filldir, task, tid) < 0) {
/* returning this tgid failed, save it as the first
* pid for the next readir call */
- filp->f_version = tid;
+ filp->f_version = (u64)tid;
put_task_struct(task);
break;
}
Index: linux-2.6-lttng/include/linux/seq_file.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/seq_file.h 2007-08-18 11:12:39.000000000 -0400
+++ linux-2.6-lttng/include/linux/seq_file.h 2007-08-18 11:12:50.000000000 -0400
@@ -18,7 +18,7 @@ struct seq_file {
size_t from;
size_t count;
loff_t index;
- loff_t version;
+ u64 version;
struct mutex lock;
const struct seq_operations *op;
void *private;
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-08-20 20:26 [patch 0/2] Sort module list Mathieu Desnoyers
@ 2007-08-20 20:26 ` Mathieu Desnoyers
2007-08-21 0:08 ` Rusty Russell
0 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-20 20:26 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: module.c-sort-module-list.patch --]
[-- Type: text/plain, Size: 3510 bytes --]
A race that appears both in /proc/modules and in kallsyms: if, between the
seq file reads, the process is put to sleep and at this moment a module is
or removed from the module list, the listing will skip an amount of
modules/symbols corresponding to the amount of elements present in the unloaded
module, but at the current position in the list if the iteration is located
after the removed module.
The cleanest way I found to deal with this problem is to sort the module list.
We can then keep the old struct module * as the old iterator, knowing the it may
be removed between the seq file reads, but we only use it as "get next". If it
is not present in the module list, the next pointer will be used.
By doing this, removing a given module will now only fuzz the output related to
this specific module, not any random module anymore. Since modprobe uses
/proc/modules, it might be important to make sure multiple concurrent running
modprobes won't interfere with each other.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
kernel/module.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-08-12 10:05:39.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-08-12 10:22:05.000000000 -0400
@@ -63,8 +63,10 @@ extern int module_sysfs_initialized;
/* If this is set, the section belongs in the init part of the module */
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
-/* List of modules, protected by module_mutex or preempt_disable
- * (add/delete uses stop_machine). */
+/*
+ * List of modules, protected by module_mutex or preempt_disable
+ * (add/delete uses stop_machine). Sorted by ascending list node address.
+ */
DEFINE_MUTEX(module_mutex);
LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
@@ -2134,10 +2136,24 @@ nomodsectinfo:
/*
* link the module with the whole machine is stopped with interrupts off
* - this defends against kallsyms not taking locks
+ * We sort the modules by struct module pointer address to permit correct
+ * iteration over modules of, at least, kallsyms for preemptible operations,
+ * such as read(). Sorting by struct module pointer address is equivalent to
+ * sort by list node address.
*/
static int __link_module(void *_mod)
{
- struct module *mod = _mod;
+ struct module *mod = _mod, *iter;
+
+ list_for_each_entry_reverse(iter, &modules, list) {
+ BUG_ON(iter == mod); /* Should never be in the list twice */
+ if (iter < mod) {
+ /* We belong to the location right after iter. */
+ list_add(&mod->list, &iter->list);
+ return 0;
+ }
+ }
+ /* We should be added at the head of the list */
list_add(&mod->list, &modules);
return 0;
}
@@ -2402,12 +2418,14 @@ unsigned long module_kallsyms_lookup_nam
static void *m_start(struct seq_file *m, loff_t *pos)
{
mutex_lock(&module_mutex);
- return seq_list_start(&modules, *pos);
+ if (!*pos)
+ m->private = NULL;
+ return seq_sorted_list_start(&modules, m->private);
}
static void *m_next(struct seq_file *m, void *p, loff_t *pos)
{
- return seq_list_next(p, &modules, pos);
+ return seq_sorted_list_next(p, &modules, &m->private);
}
static void m_stop(struct seq_file *m, void *p)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-08-20 20:26 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
@ 2007-08-21 0:08 ` Rusty Russell
2007-08-24 15:45 ` Mathieu Desnoyers
0 siblings, 1 reply; 23+ messages in thread
From: Rusty Russell @ 2007-08-21 0:08 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
On Mon, 2007-08-20 at 16:26 -0400, Mathieu Desnoyers wrote:
> plain text document attachment (module.c-sort-module-list.patch)
> A race that appears both in /proc/modules and in kallsyms: if, between the
> seq file reads, the process is put to sleep and at this moment a module is
> or removed from the module list, the listing will skip an amount of
> modules/symbols corresponding to the amount of elements present in the unloaded
> module, but at the current position in the list if the iteration is located
> after the removed module.
>
> The cleanest way I found to deal with this problem is to sort the module list.
> We can then keep the old struct module * as the old iterator, knowing the it may
> be removed between the seq file reads, but we only use it as "get next". If it
> is not present in the module list, the next pointer will be used.
>
> By doing this, removing a given module will now only fuzz the output related to
> this specific module, not any random module anymore. Since modprobe uses
> /proc/modules, it might be important to make sure multiple concurrent running
> modprobes won't interfere with each other.
You've reduced, but not eliminated, the problem. A new module inserted
is quite likely to reuse the same address.
I don't have a real problem with this patch, but I'm wondering if the
problem is theoretical or demonstrated.
Rusty.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] Sort module list - use ppos instead of m->private
2007-08-15 6:53 ` Al Viro
[not found] ` <20070815083645.GA6544@mail.ustc.edu.cn>
[not found] ` <20070815084625.GA18892@mail.ustc.edu.cn>
@ 2007-08-24 15:39 ` Mathieu Desnoyers
2007-08-24 23:34 ` Andrew Morton
2 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-24 15:39 UTC (permalink / raw)
To: Al Viro; +Cc: akpm, linux-kernel, Randy Dunlap, Martin Bligh
Sort modules list - use ppos instead of m->private
When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
incremented by seq_read() directly and no "next" callback is called when going
to the next module.
Therefore, use ppos instead of m->private to deal with the fact that this index
is incremented directly to pass to the next module in seq_read() after the
buffer has been emptied.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
fs/seq_file.c | 17 +++++++++--------
include/linux/seq_file.h | 4 ++--
kernel/module.c | 6 ++----
3 files changed, 13 insertions(+), 14 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-08-24 11:34:36.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-08-24 11:35:40.000000000 -0400
@@ -2418,14 +2418,12 @@ unsigned long module_kallsyms_lookup_nam
static void *m_start(struct seq_file *m, loff_t *pos)
{
mutex_lock(&module_mutex);
- if (!*pos)
- m->private = NULL;
- return seq_sorted_list_start(&modules, m->private);
+ return seq_sorted_list_start(&modules, (void*)(long)pos);
}
static void *m_next(struct seq_file *m, void *p, loff_t *pos)
{
- return seq_sorted_list_next(p, &modules, &m->private);
+ return seq_sorted_list_next(p, &modules, (void**)pos);
}
static void m_stop(struct seq_file *m, void *p)
Index: linux-2.6-lttng/include/linux/seq_file.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/seq_file.h 2007-08-24 11:34:01.000000000 -0400
+++ linux-2.6-lttng/include/linux/seq_file.h 2007-08-24 11:34:59.000000000 -0400
@@ -73,9 +73,9 @@ extern struct list_head *seq_list_next(v
* seq_sorted_list_start_head().
*/
extern struct list_head *seq_sorted_list_start(struct list_head *head,
- void *pos);
+ void **ppos);
extern struct list_head *seq_sorted_list_start_head(struct list_head *head,
- void *pos);
+ void **ppos);
/*
* next must be called with an existing p node
*/
Index: linux-2.6-lttng/fs/seq_file.c
===================================================================
--- linux-2.6-lttng.orig/fs/seq_file.c 2007-08-24 11:34:01.000000000 -0400
+++ linux-2.6-lttng/fs/seq_file.c 2007-08-24 11:34:59.000000000 -0400
@@ -501,27 +501,28 @@ struct list_head *seq_list_next(void *v,
EXPORT_SYMBOL(seq_list_next);
-struct list_head *seq_sorted_list_start(struct list_head *head, void *pos)
+struct list_head *seq_sorted_list_start(struct list_head *head, void **ppos)
{
struct list_head *lh;
list_for_each(lh, head)
- if ((void*)lh >= pos)
- return lh;
+ if ((void*)lh >= *ppos)
+ return *ppos = lh;
return NULL;
}
EXPORT_SYMBOL(seq_sorted_list_start);
-struct list_head *seq_sorted_list_start_head(struct list_head *head, void *pos)
+struct list_head *seq_sorted_list_start_head(struct list_head *head,
+ void **ppos)
{
struct list_head *lh;
- if (!pos)
- return head;
+ if (!ppos)
+ return *ppos = head;
list_for_each(lh, head)
- if ((void*)lh >= pos)
- return lh->prev;
+ if ((void*)lh >= *ppos)
+ return *ppos = lh->prev;
return NULL;
}
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-08-21 0:08 ` Rusty Russell
@ 2007-08-24 15:45 ` Mathieu Desnoyers
2007-08-25 21:44 ` Rusty Russell
0 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-24 15:45 UTC (permalink / raw)
To: Rusty Russell; +Cc: akpm, linux-kernel
* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Mon, 2007-08-20 at 16:26 -0400, Mathieu Desnoyers wrote:
> > plain text document attachment (module.c-sort-module-list.patch)
> > A race that appears both in /proc/modules and in kallsyms: if, between the
> > seq file reads, the process is put to sleep and at this moment a module is
> > or removed from the module list, the listing will skip an amount of
> > modules/symbols corresponding to the amount of elements present in the unloaded
> > module, but at the current position in the list if the iteration is located
> > after the removed module.
> >
> > The cleanest way I found to deal with this problem is to sort the module list.
> > We can then keep the old struct module * as the old iterator, knowing the it may
> > be removed between the seq file reads, but we only use it as "get next". If it
> > is not present in the module list, the next pointer will be used.
> >
> > By doing this, removing a given module will now only fuzz the output related to
> > this specific module, not any random module anymore. Since modprobe uses
> > /proc/modules, it might be important to make sure multiple concurrent running
> > modprobes won't interfere with each other.
>
> You've reduced, but not eliminated, the problem. A new module inserted
> is quite likely to reuse the same address.
>
Hi Rusty,
Please tell me if I'm wrong, but I think it would not be a problem:
- seq_read() makes sure that a buffer large enough is available so that
m_show() can fully extract and print the information relative to 1
module.
- m_start() and m_stop() takes the module_mutex, therefore within one
seq_read(), once m_start has returned, the struct module * that we
have is valid and will be consistent during the whole seq_read
operation.
- If a module is removed, and then a different one is inserted at the
same address, while we are between two seq_reads for this given struct
module address, the seq_reads will copy to user-space the information
that is still in the buffer for the _first_ struct module encountered,
not the new one.
- After that, iteration will continue to the new struct module address,
effectively skipping the newly inserted module.
> I don't have a real problem with this patch, but I'm wondering if the
> problem is theoretical or demonstrated.
>
Small test module for this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFSIZE 1024
int main()
{
int fd = open("/proc/modules", O_RDONLY);
char buf[BUFSIZE];
ssize_t size;
do {
size = read(fd, buf, 1);
printf("%c", buf[0]);
usleep(100000);
} while(size > 0);
close(fd);
return 0;
}
(compiled as ./module)
Actual test (kernel 2.6.23-rc3):
dijkstra:~# lsmod
Module Size Used by
pl2303 18564 0
usbserial 29032 1 pl2303
ppdev 7844 0
sky2 37476 0
skge 36368 0
rtc 10104 0
snd_hda_intel 265628 0
(here, while we are printing the 2nd line, I rmmod pl2303)
compudj@dijkstra:~/test$ ./module
pl2303 18564 0 - Live 0xf886e000
usbserial 29032 1 pl2303, Live 0xf8865000
sky2 37476 0 - Live 0xf884f000
skge 36368 0 - Live 0xf8838000
rtc 10104 0 - Live 0xf8825000
We see the the 2nd line is garbage.
Now, with my patch applied:
(here, while we are printing the rtc module, I rmmod rtc)
nd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
We see that since the rtc line was already in the buffer, it has been
printed completely.
(here, while we are printing the skge module, I rmmod rtc)
snd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
sky2 38420 0 - Live 0xf88cd000
We see that the iteration continued at the same position even though the
rtc module, located in earlier addresses, was removed.
Note that this test is done with the
"Sort modules list - use ppos instead of m->private"
patch applied.
Thanks for the review,
Mathieu
> Rusty.
>
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sort module list - use ppos instead of m->private
2007-08-24 15:39 ` [PATCH] Sort module list - use ppos instead of m->private Mathieu Desnoyers
@ 2007-08-24 23:34 ` Andrew Morton
2007-08-25 0:05 ` Mathieu Desnoyers
0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2007-08-24 23:34 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Al Viro, linux-kernel, Martin Bligh, Randy.Dunlap
On Fri, 24 Aug 2007 11:39:33 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> Sort modules list - use ppos instead of m->private
>
> When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
> incremented by seq_read() directly and no "next" callback is called when going
> to the next module.
>
> Therefore, use ppos instead of m->private to deal with the fact that this index
> is incremented directly to pass to the next module in seq_read() after the
> buffer has been emptied.
Confused. What problem is this patch fixing? I'm guessing that something
is going wrong when /proc/modules is read one-byte-at-a-time?
<tests that>
<nope>
Better changelogs, please.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sort module list - use ppos instead of m->private
2007-08-24 23:34 ` Andrew Morton
@ 2007-08-25 0:05 ` Mathieu Desnoyers
0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-25 0:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: Al Viro, linux-kernel, Martin Bligh, Randy.Dunlap
* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Fri, 24 Aug 2007 11:39:33 -0400
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
>
> > Sort modules list - use ppos instead of m->private
> >
> > When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
> > incremented by seq_read() directly and no "next" callback is called when going
> > to the next module.
> >
> > Therefore, use ppos instead of m->private to deal with the fact that this index
> > is incremented directly to pass to the next module in seq_read() after the
> > buffer has been emptied.
>
> Confused. What problem is this patch fixing? I'm guessing that something
> is going wrong when /proc/modules is read one-byte-at-a-time?
>
> <tests that>
>
> <nope>
>
> Better changelogs, please.
>
Ok, will append this in the changelog (I sent this to Rusty earlier
today):
Small test program for this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFSIZE 1024
int main()
{
int fd = open("/proc/modules", O_RDONLY);
char buf[BUFSIZE];
ssize_t size;
do {
size = read(fd, buf, 1);
printf("%c", buf[0]);
usleep(100000);
} while(size > 0);
close(fd);
return 0;
}
Before fix, it prints the first module indefinitely. The patch fixes
this.
I will also append more detail to "Sort module list by pointer address
to get coherent sleepable seq_file iterators" changelog before the
2.6.23-rc3-mm1 repost.
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-08-24 15:45 ` Mathieu Desnoyers
@ 2007-08-25 21:44 ` Rusty Russell
2007-08-25 21:53 ` Mathieu Desnoyers
0 siblings, 1 reply; 23+ messages in thread
From: Rusty Russell @ 2007-08-25 21:44 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: akpm, linux-kernel
On Fri, 2007-08-24 at 11:45 -0400, Mathieu Desnoyers wrote:
> * Rusty Russell (rusty@rustcorp.com.au) wrote:
> > On Mon, 2007-08-20 at 16:26 -0400, Mathieu Desnoyers wrote:
> > > plain text document attachment (module.c-sort-module-list.patch)
> > > A race that appears both in /proc/modules and in kallsyms: if, between the
> > > seq file reads, the process is put to sleep and at this moment a module is
> > > or removed from the module list, the listing will skip an amount of
> > > modules/symbols corresponding to the amount of elements present in the unloaded
> > > module, but at the current position in the list if the iteration is located
> > > after the removed module.
> > >
> > > The cleanest way I found to deal with this problem is to sort the module list.
> > > We can then keep the old struct module * as the old iterator, knowing the it may
> > > be removed between the seq file reads, but we only use it as "get next". If it
> > > is not present in the module list, the next pointer will be used.
> > >
> > > By doing this, removing a given module will now only fuzz the output related to
> > > this specific module, not any random module anymore. Since modprobe uses
> > > /proc/modules, it might be important to make sure multiple concurrent running
> > > modprobes won't interfere with each other.
> >
> > You've reduced, but not eliminated, the problem. A new module inserted
> > is quite likely to reuse the same address.
> >
>
> Hi Rusty,
>
> Please tell me if I'm wrong, but I think it would not be a problem:
>
> - seq_read() makes sure that a buffer large enough is available so that
> m_show() can fully extract and print the information relative to 1
> module.
> - m_start() and m_stop() takes the module_mutex, therefore within one
> seq_read(), once m_start has returned, the struct module * that we
> have is valid and will be consistent during the whole seq_read
> operation.
> - If a module is removed, and then a different one is inserted at the
> same address, while we are between two seq_reads for this given struct
> module address, the seq_reads will copy to user-space the information
> that is still in the buffer for the _first_ struct module encountered,
> not the new one.
> - After that, iteration will continue to the new struct module address,
> effectively skipping the newly inserted module.
Indeed, I thought that this was a general problem: the seq_list code was
never intended to work on modifiable lists unless you get them in one
big read.
If we accept this problem, what do we do about all the other users?
Rusty.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-08-25 21:44 ` Rusty Russell
@ 2007-08-25 21:53 ` Mathieu Desnoyers
0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-25 21:53 UTC (permalink / raw)
To: Rusty Russell; +Cc: akpm, linux-kernel
* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Fri, 2007-08-24 at 11:45 -0400, Mathieu Desnoyers wrote:
> > * Rusty Russell (rusty@rustcorp.com.au) wrote:
> > > On Mon, 2007-08-20 at 16:26 -0400, Mathieu Desnoyers wrote:
> > > > plain text document attachment (module.c-sort-module-list.patch)
> > > > A race that appears both in /proc/modules and in kallsyms: if, between the
> > > > seq file reads, the process is put to sleep and at this moment a module is
> > > > or removed from the module list, the listing will skip an amount of
> > > > modules/symbols corresponding to the amount of elements present in the unloaded
> > > > module, but at the current position in the list if the iteration is located
> > > > after the removed module.
> > > >
> > > > The cleanest way I found to deal with this problem is to sort the module list.
> > > > We can then keep the old struct module * as the old iterator, knowing the it may
> > > > be removed between the seq file reads, but we only use it as "get next". If it
> > > > is not present in the module list, the next pointer will be used.
> > > >
> > > > By doing this, removing a given module will now only fuzz the output related to
> > > > this specific module, not any random module anymore. Since modprobe uses
> > > > /proc/modules, it might be important to make sure multiple concurrent running
> > > > modprobes won't interfere with each other.
> > >
> > > You've reduced, but not eliminated, the problem. A new module inserted
> > > is quite likely to reuse the same address.
> > >
> >
> > Hi Rusty,
> >
> > Please tell me if I'm wrong, but I think it would not be a problem:
> >
> > - seq_read() makes sure that a buffer large enough is available so that
> > m_show() can fully extract and print the information relative to 1
> > module.
> > - m_start() and m_stop() takes the module_mutex, therefore within one
> > seq_read(), once m_start has returned, the struct module * that we
> > have is valid and will be consistent during the whole seq_read
> > operation.
> > - If a module is removed, and then a different one is inserted at the
> > same address, while we are between two seq_reads for this given struct
> > module address, the seq_reads will copy to user-space the information
> > that is still in the buffer for the _first_ struct module encountered,
> > not the new one.
> > - After that, iteration will continue to the new struct module address,
> > effectively skipping the newly inserted module.
>
> Indeed, I thought that this was a general problem: the seq_list code was
> never intended to work on modifiable lists unless you get them in one
> big read.
>
> If we accept this problem, what do we do about all the other users?
>
Hum, I guess it would be best for them to switch to the proposed seq
sorted list too. I think that having one example (module.c) that shows
well how this works will be an incentive for other developers to port
their seq_file code to the sorted list (I am thinking, among others,
about kallsyms).
Mathieu
> Rusty.
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-08-27 16:02 [patch 0/2] Sort module list for /proc/modules seq file reads Mathieu Desnoyers
@ 2007-08-27 16:02 ` Mathieu Desnoyers
0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-08-27 16:02 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: module.c-sort-module-list.patch --]
[-- Type: text/plain, Size: 5638 bytes --]
A race that appears both in /proc/modules and in kallsyms: if, between the
seq file reads, the process is put to sleep and at this moment a module is
or removed from the module list, the listing will skip an amount of
modules/symbols corresponding to the amount of elements present in the unloaded
module, but at the current position in the list if the iteration is located
after the removed module.
The cleanest way I found to deal with this problem is to sort the module list.
We can then keep the old struct module * as the old iterator, knowing the it may
be removed between the seq file reads, but we only use it as "get next". If it
is not present in the module list, the next pointer will be used.
By doing this, removing a given module will now only fuzz the output related to
this specific module, not any random module anymore. Since modprobe uses
/proc/modules, it might be important to make sure multiple concurrent running
modprobes won't interfere with each other.
Small test program for this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFSIZE 1024
int main()
{
int fd = open("/proc/modules", O_RDONLY);
char buf[BUFSIZE];
ssize_t size;
do {
size = read(fd, buf, 1);
printf("%c", buf[0]);
usleep(100000);
} while(size > 0);
close(fd);
return 0;
}
Actual test (kernel 2.6.23-rc3):
dijkstra:~# lsmod
Module Size Used by
pl2303 18564 0
usbserial 29032 1 pl2303
ppdev 7844 0
sky2 37476 0
skge 36368 0
rtc 10104 0
snd_hda_intel 265628 0
(here, while we are printing the 2nd line, I rmmod pl2303)
compudj@dijkstra:~/test$ ./module
pl2303 18564 0 - Live 0xf886e000
usbserial 29032 1 pl2303, Live 0xf8865000
sky2 37476 0 - Live 0xf884f000
skge 36368 0 - Live 0xf8838000
rtc 10104 0 - Live 0xf8825000
We see the the 2nd line is garbage.
Now, with my patch applied:
(here, while we are printing the rtc module, I rmmod rtc)
nd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
We see that since the rtc line was already in the buffer, it has been
printed completely.
(here, while we are printing the skge module, I rmmod rtc)
snd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
sky2 38420 0 - Live 0xf88cd000
We see that the iteration continued at the same position even though the
rtc module, located in earlier addresses, was removed.
Changelog:
When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
incremented by seq_read() directly and no "next" callback is called when going
to the next module.
Therefore, use ppos instead of m->private to deal with the fact that this index
is incremented directly to pass to the next module in seq_read() after the
buffer has been emptied.
Before fix, it prints the first module indefinitely. The patch fixes
this.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
kernel/module.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-08-27 11:11:37.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-08-27 11:13:13.000000000 -0400
@@ -63,8 +63,10 @@ extern int module_sysfs_initialized;
/* If this is set, the section belongs in the init part of the module */
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
-/* List of modules, protected by module_mutex or preempt_disable
- * (add/delete uses stop_machine). */
+/*
+ * List of modules, protected by module_mutex or preempt_disable
+ * (add/delete uses stop_machine). Sorted by ascending list node address.
+ */
DEFINE_MUTEX(module_mutex);
LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
@@ -2130,10 +2132,24 @@ nomodsectinfo:
/*
* link the module with the whole machine is stopped with interrupts off
* - this defends against kallsyms not taking locks
+ * We sort the modules by struct module pointer address to permit correct
+ * iteration over modules of, at least, kallsyms for preemptible operations,
+ * such as read(). Sorting by struct module pointer address is equivalent to
+ * sort by list node address.
*/
static int __link_module(void *_mod)
{
- struct module *mod = _mod;
+ struct module *mod = _mod, *iter;
+
+ list_for_each_entry_reverse(iter, &modules, list) {
+ BUG_ON(iter == mod); /* Should never be in the list twice */
+ if (iter < mod) {
+ /* We belong to the location right after iter. */
+ list_add(&mod->list, &iter->list);
+ return 0;
+ }
+ }
+ /* We should be added at the head of the list */
list_add(&mod->list, &modules);
return 0;
}
@@ -2398,12 +2414,12 @@ unsigned long module_kallsyms_lookup_nam
static void *m_start(struct seq_file *m, loff_t *pos)
{
mutex_lock(&module_mutex);
- return seq_list_start(&modules, *pos);
+ return seq_sorted_list_start(&modules, pos);
}
static void *m_next(struct seq_file *m, void *p, loff_t *pos)
{
- return seq_list_next(p, &modules, pos);
+ return seq_sorted_list_next(p, &modules, pos);
}
static void m_stop(struct seq_file *m, void *p)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-09-06 20:05 [patch 0/2] Sort module list for /proc/modules Mathieu Desnoyers
@ 2007-09-06 20:05 ` Mathieu Desnoyers
0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-09-06 20:05 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: module.c-sort-module-list.patch --]
[-- Type: text/plain, Size: 5638 bytes --]
A race that appears both in /proc/modules and in kallsyms: if, between the
seq file reads, the process is put to sleep and at this moment a module is
or removed from the module list, the listing will skip an amount of
modules/symbols corresponding to the amount of elements present in the unloaded
module, but at the current position in the list if the iteration is located
after the removed module.
The cleanest way I found to deal with this problem is to sort the module list.
We can then keep the old struct module * as the old iterator, knowing the it may
be removed between the seq file reads, but we only use it as "get next". If it
is not present in the module list, the next pointer will be used.
By doing this, removing a given module will now only fuzz the output related to
this specific module, not any random module anymore. Since modprobe uses
/proc/modules, it might be important to make sure multiple concurrent running
modprobes won't interfere with each other.
Small test program for this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFSIZE 1024
int main()
{
int fd = open("/proc/modules", O_RDONLY);
char buf[BUFSIZE];
ssize_t size;
do {
size = read(fd, buf, 1);
printf("%c", buf[0]);
usleep(100000);
} while(size > 0);
close(fd);
return 0;
}
Actual test (kernel 2.6.23-rc3):
dijkstra:~# lsmod
Module Size Used by
pl2303 18564 0
usbserial 29032 1 pl2303
ppdev 7844 0
sky2 37476 0
skge 36368 0
rtc 10104 0
snd_hda_intel 265628 0
(here, while we are printing the 2nd line, I rmmod pl2303)
compudj@dijkstra:~/test$ ./module
pl2303 18564 0 - Live 0xf886e000
usbserial 29032 1 pl2303, Live 0xf8865000
sky2 37476 0 - Live 0xf884f000
skge 36368 0 - Live 0xf8838000
rtc 10104 0 - Live 0xf8825000
We see the the 2nd line is garbage.
Now, with my patch applied:
(here, while we are printing the rtc module, I rmmod rtc)
nd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
We see that since the rtc line was already in the buffer, it has been
printed completely.
(here, while we are printing the skge module, I rmmod rtc)
snd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
sky2 38420 0 - Live 0xf88cd000
We see that the iteration continued at the same position even though the
rtc module, located in earlier addresses, was removed.
Changelog:
When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
incremented by seq_read() directly and no "next" callback is called when going
to the next module.
Therefore, use ppos instead of m->private to deal with the fact that this index
is incremented directly to pass to the next module in seq_read() after the
buffer has been emptied.
Before fix, it prints the first module indefinitely. The patch fixes
this.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
kernel/module.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-09-06 15:07:16.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-09-06 15:07:19.000000000 -0400
@@ -63,8 +63,10 @@ extern int module_sysfs_initialized;
/* If this is set, the section belongs in the init part of the module */
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
-/* List of modules, protected by module_mutex or preempt_disable
- * (add/delete uses stop_machine). */
+/*
+ * List of modules, protected by module_mutex or preempt_disable
+ * (add/delete uses stop_machine). Sorted by ascending list node address.
+ */
DEFINE_MUTEX(module_mutex);
LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
@@ -2126,10 +2128,24 @@ nomodsectinfo:
/*
* link the module with the whole machine is stopped with interrupts off
* - this defends against kallsyms not taking locks
+ * We sort the modules by struct module pointer address to permit correct
+ * iteration over modules of, at least, kallsyms for preemptible operations,
+ * such as read(). Sorting by struct module pointer address is equivalent to
+ * sort by list node address.
*/
static int __link_module(void *_mod)
{
- struct module *mod = _mod;
+ struct module *mod = _mod, *iter;
+
+ list_for_each_entry_reverse(iter, &modules, list) {
+ BUG_ON(iter == mod); /* Should never be in the list twice */
+ if (iter < mod) {
+ /* We belong to the location right after iter. */
+ list_add(&mod->list, &iter->list);
+ return 0;
+ }
+ }
+ /* We should be added at the head of the list */
list_add(&mod->list, &modules);
return 0;
}
@@ -2394,12 +2410,12 @@ unsigned long module_kallsyms_lookup_nam
static void *m_start(struct seq_file *m, loff_t *pos)
{
mutex_lock(&module_mutex);
- return seq_list_start(&modules, *pos);
+ return seq_sorted_list_start(&modules, pos);
}
static void *m_next(struct seq_file *m, void *p, loff_t *pos)
{
- return seq_list_next(p, &modules, pos);
+ return seq_sorted_list_next(p, &modules, pos);
}
static void m_stop(struct seq_file *m, void *p)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-09-17 18:43 [patch 0/2] Sort module list Mathieu Desnoyers
@ 2007-09-17 18:43 ` Mathieu Desnoyers
0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-09-17 18:43 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: module.c-sort-module-list.patch --]
[-- Type: text/plain, Size: 5601 bytes --]
A race that appears both in /proc/modules and in kallsyms: if, between the
seq file reads, the process is put to sleep and at this moment a module is
or removed from the module list, the listing will skip an amount of
modules/symbols corresponding to the amount of elements present in the unloaded
module, but at the current position in the list if the iteration is located
after the removed module.
The cleanest way I found to deal with this problem is to sort the module list.
We can then keep the old struct module * as the old iterator, knowing the it may
be removed between the seq file reads, but we only use it as "get next". If it
is not present in the module list, the next pointer will be used.
By doing this, removing a given module will now only fuzz the output related to
this specific module, not any random module anymore. Since modprobe uses
/proc/modules, it might be important to make sure multiple concurrent running
modprobes won't interfere with each other.
Small test program for this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFSIZE 1024
int main()
{
int fd = open("/proc/modules", O_RDONLY);
char buf[BUFSIZE];
ssize_t size;
do {
size = read(fd, buf, 1);
printf("%c", buf[0]);
usleep(100000);
} while(size > 0);
close(fd);
return 0;
}
Actual test (kernel 2.6.23-rc3):
dijkstra:~# lsmod
Module Size Used by
pl2303 18564 0
usbserial 29032 1 pl2303
ppdev 7844 0
sky2 37476 0
skge 36368 0
rtc 10104 0
snd_hda_intel 265628 0
(here, while we are printing the 2nd line, I rmmod pl2303)
compudj@dijkstra:~/test$ ./module
pl2303 18564 0 - Live 0xf886e000
usbserial 29032 1 pl2303, Live 0xf8865000
sky2 37476 0 - Live 0xf884f000
skge 36368 0 - Live 0xf8838000
rtc 10104 0 - Live 0xf8825000
We see the the 2nd line is garbage.
Now, with my patch applied:
(here, while we are printing the rtc module, I rmmod rtc)
nd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
We see that since the rtc line was already in the buffer, it has been
printed completely.
(here, while we are printing the skge module, I rmmod rtc)
snd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
sky2 38420 0 - Live 0xf88cd000
We see that the iteration continued at the same position even though the
rtc module, located in earlier addresses, was removed.
Changelog:
When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
incremented by seq_read() directly and no "next" callback is called when going
to the next module.
Therefore, use ppos instead of m->private to deal with the fact that this index
is incremented directly to pass to the next module in seq_read() after the
buffer has been emptied.
Before fix, it prints the first module indefinitely. The patch fixes
this.
Changelog:
- Remove module_mutex usage: depend on functions implemented in module.c for
that.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
kernel/module.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-09-13 17:24:46.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-09-13 17:25:26.000000000 -0400
@@ -64,7 +64,8 @@ extern int module_sysfs_initialized;
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
/* List of modules, protected by module_mutex or preempt_disable
- * (add/delete uses stop_machine). */
+ * (add/delete uses stop_machine). Sorted by ascending list node address.
+ */
static DEFINE_MUTEX(module_mutex);
static LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
@@ -2130,10 +2131,24 @@ nomodsectinfo:
/*
* link the module with the whole machine is stopped with interrupts off
* - this defends against kallsyms not taking locks
+ * We sort the modules by struct module pointer address to permit correct
+ * iteration over modules of, at least, kallsyms for preemptible operations,
+ * such as read(). Sorting by struct module pointer address is equivalent to
+ * sort by list node address.
*/
static int __link_module(void *_mod)
{
- struct module *mod = _mod;
+ struct module *mod = _mod, *iter;
+
+ list_for_each_entry_reverse(iter, &modules, list) {
+ BUG_ON(iter == mod); /* Should never be in the list twice */
+ if (iter < mod) {
+ /* We belong to the location right after iter. */
+ list_add(&mod->list, &iter->list);
+ return 0;
+ }
+ }
+ /* We should be added at the head of the list */
list_add(&mod->list, &modules);
return 0;
}
@@ -2398,12 +2413,12 @@ unsigned long module_kallsyms_lookup_nam
static void *m_start(struct seq_file *m, loff_t *pos)
{
mutex_lock(&module_mutex);
- return seq_list_start(&modules, *pos);
+ return seq_sorted_list_start(&modules, pos);
}
static void *m_next(struct seq_file *m, void *p, loff_t *pos)
{
- return seq_list_next(p, &modules, pos);
+ return seq_sorted_list_next(p, &modules, pos);
}
static void m_stop(struct seq_file *m, void *p)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators
2007-09-18 21:09 [patch 0/2] Sorted Module List for 2.6.23-rc6-mm1 Mathieu Desnoyers
@ 2007-09-18 21:09 ` Mathieu Desnoyers
0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2007-09-18 21:09 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: Mathieu Desnoyers
[-- Attachment #1: module.c-sort-module-list.patch --]
[-- Type: text/plain, Size: 5601 bytes --]
A race that appears both in /proc/modules and in kallsyms: if, between the
seq file reads, the process is put to sleep and at this moment a module is
or removed from the module list, the listing will skip an amount of
modules/symbols corresponding to the amount of elements present in the unloaded
module, but at the current position in the list if the iteration is located
after the removed module.
The cleanest way I found to deal with this problem is to sort the module list.
We can then keep the old struct module * as the old iterator, knowing the it may
be removed between the seq file reads, but we only use it as "get next". If it
is not present in the module list, the next pointer will be used.
By doing this, removing a given module will now only fuzz the output related to
this specific module, not any random module anymore. Since modprobe uses
/proc/modules, it might be important to make sure multiple concurrent running
modprobes won't interfere with each other.
Small test program for this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define BUFSIZE 1024
int main()
{
int fd = open("/proc/modules", O_RDONLY);
char buf[BUFSIZE];
ssize_t size;
do {
size = read(fd, buf, 1);
printf("%c", buf[0]);
usleep(100000);
} while(size > 0);
close(fd);
return 0;
}
Actual test (kernel 2.6.23-rc3):
dijkstra:~# lsmod
Module Size Used by
pl2303 18564 0
usbserial 29032 1 pl2303
ppdev 7844 0
sky2 37476 0
skge 36368 0
rtc 10104 0
snd_hda_intel 265628 0
(here, while we are printing the 2nd line, I rmmod pl2303)
compudj@dijkstra:~/test$ ./module
pl2303 18564 0 - Live 0xf886e000
usbserial 29032 1 pl2303, Live 0xf8865000
sky2 37476 0 - Live 0xf884f000
skge 36368 0 - Live 0xf8838000
rtc 10104 0 - Live 0xf8825000
We see the the 2nd line is garbage.
Now, with my patch applied:
(here, while we are printing the rtc module, I rmmod rtc)
nd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
We see that since the rtc line was already in the buffer, it has been
printed completely.
(here, while we are printing the skge module, I rmmod rtc)
snd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
sky2 38420 0 - Live 0xf88cd000
We see that the iteration continued at the same position even though the
rtc module, located in earlier addresses, was removed.
Changelog:
When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
incremented by seq_read() directly and no "next" callback is called when going
to the next module.
Therefore, use ppos instead of m->private to deal with the fact that this index
is incremented directly to pass to the next module in seq_read() after the
buffer has been emptied.
Before fix, it prints the first module indefinitely. The patch fixes
this.
Changelog:
- Remove module_mutex usage: depend on functions implemented in module.c for
that.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
kernel/module.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2007-09-13 17:24:46.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2007-09-13 17:25:26.000000000 -0400
@@ -64,7 +64,8 @@ extern int module_sysfs_initialized;
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
/* List of modules, protected by module_mutex or preempt_disable
- * (add/delete uses stop_machine). */
+ * (add/delete uses stop_machine). Sorted by ascending list node address.
+ */
static DEFINE_MUTEX(module_mutex);
static LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
@@ -2130,10 +2131,24 @@ nomodsectinfo:
/*
* link the module with the whole machine is stopped with interrupts off
* - this defends against kallsyms not taking locks
+ * We sort the modules by struct module pointer address to permit correct
+ * iteration over modules of, at least, kallsyms for preemptible operations,
+ * such as read(). Sorting by struct module pointer address is equivalent to
+ * sort by list node address.
*/
static int __link_module(void *_mod)
{
- struct module *mod = _mod;
+ struct module *mod = _mod, *iter;
+
+ list_for_each_entry_reverse(iter, &modules, list) {
+ BUG_ON(iter == mod); /* Should never be in the list twice */
+ if (iter < mod) {
+ /* We belong to the location right after iter. */
+ list_add(&mod->list, &iter->list);
+ return 0;
+ }
+ }
+ /* We should be added at the head of the list */
list_add(&mod->list, &modules);
return 0;
}
@@ -2398,12 +2413,12 @@ unsigned long module_kallsyms_lookup_nam
static void *m_start(struct seq_file *m, loff_t *pos)
{
mutex_lock(&module_mutex);
- return seq_list_start(&modules, *pos);
+ return seq_sorted_list_start(&modules, pos);
}
static void *m_next(struct seq_file *m, void *p, loff_t *pos)
{
- return seq_list_next(p, &modules, pos);
+ return seq_sorted_list_next(p, &modules, pos);
}
static void m_stop(struct seq_file *m, void *p)
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2007-09-18 21:12 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-12 15:08 [patch 0/2] Sorted seq_file Mathieu Desnoyers
2007-08-12 15:08 ` [patch 1/2] Seq_file add support for sorted list Mathieu Desnoyers
2007-08-12 15:08 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
[not found] ` <20070815033945.GA13134@mail.ustc.edu.cn>
2007-08-15 3:39 ` Fengguang Wu
2007-08-15 4:18 ` Al Viro
[not found] ` <20070815063741.GB5175@mail.ustc.edu.cn>
2007-08-15 6:37 ` Fengguang Wu
2007-08-15 6:53 ` Al Viro
[not found] ` <20070815083645.GA6544@mail.ustc.edu.cn>
2007-08-15 8:36 ` Fengguang Wu
[not found] ` <20070815084625.GA18892@mail.ustc.edu.cn>
2007-08-15 8:46 ` Fengguang Wu
2007-08-18 15:56 ` Mathieu Desnoyers
2007-08-18 16:09 ` [PATCH] Fix f_version type: should be u64 instead of unsigned long Mathieu Desnoyers
2007-08-24 15:39 ` [PATCH] Sort module list - use ppos instead of m->private Mathieu Desnoyers
2007-08-24 23:34 ` Andrew Morton
2007-08-25 0:05 ` Mathieu Desnoyers
-- strict thread matches above, loose matches on Subject: below --
2007-08-20 20:26 [patch 0/2] Sort module list Mathieu Desnoyers
2007-08-20 20:26 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
2007-08-21 0:08 ` Rusty Russell
2007-08-24 15:45 ` Mathieu Desnoyers
2007-08-25 21:44 ` Rusty Russell
2007-08-25 21:53 ` Mathieu Desnoyers
2007-08-27 16:02 [patch 0/2] Sort module list for /proc/modules seq file reads Mathieu Desnoyers
2007-08-27 16:02 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
2007-09-06 20:05 [patch 0/2] Sort module list for /proc/modules Mathieu Desnoyers
2007-09-06 20:05 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
2007-09-17 18:43 [patch 0/2] Sort module list Mathieu Desnoyers
2007-09-17 18:43 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
2007-09-18 21:09 [patch 0/2] Sorted Module List for 2.6.23-rc6-mm1 Mathieu Desnoyers
2007-09-18 21:09 ` [patch 2/2] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox