* [PATCH 0/4] devmem and readahead fixes for 2.6.33
@ 2010-01-22 4:59 Wu Fengguang
2010-01-22 4:59 ` [PATCH 1/4] devmem: check vmalloc address on kmem read/write Wu Fengguang
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-01-22 4:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg Kroah-Hartman, stable, Andi Kleen, KAMEZAWA Hiroyuki,
Wu Fengguang, LKML, Linux Memory Management List, linux-fsdevel
Andrew,
Here are some good fixes for 2.6.33. They have been floating around
with other patches for some time; I really should have separated them
out earlier.
Greg,
The first two patches are for devmem. 2.6.32 also needs these fixes, but
the patches only apply cleanly to 2.6.33. I can backport them if
necessary.
[PATCH 1/4] devmem: check vmalloc address on kmem read/write
[PATCH 2/4] devmem: fix kmem write bug on memory holes
The next two patches are for readahead. All previous kernels need these
fixes, and the patches also apply cleanly to 2.6.32.
[PATCH 3/4] vfs: take f_lock on modifying f_mode after open time
[PATCH 4/4] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM
Thanks,
Fengguang
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] devmem: check vmalloc address on kmem read/write
2010-01-22 4:59 [PATCH 0/4] devmem and readahead fixes for 2.6.33 Wu Fengguang
@ 2010-01-22 4:59 ` Wu Fengguang
2010-01-22 4:59 ` [PATCH 2/4] devmem: fix kmem write bug on memory holes Wu Fengguang
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-01-22 4:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg Kroah-Hartman, Hugh Dickins, stable, KAMEZAWA Hiroyuki,
Wu Fengguang, Andi Kleen, LKML, Linux Memory Management List,
linux-fsdevel
[-- Attachment #1: vmalloc-addr-fix.patch --]
[-- Type: text/plain, Size: 2935 bytes --]
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Otherwise vmalloc_to_page() will BUG().
This also makes the kmem read/write implementation aligned with mem(4):
"References to nonexistent locations cause errors to be returned." Here
we return -ENXIO (inspired by Hugh) if no bytes have been transferred
to/from user space, otherwise return partial read/write results.
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Hugh Dickins <hugh.dickins@tiscali.co.uk>
CC: <stable@kernel.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
drivers/char/mem.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
--- linux-mm.orig/drivers/char/mem.c 2010-01-11 10:22:35.000000000 +0800
+++ linux-mm/drivers/char/mem.c 2010-01-11 10:32:32.000000000 +0800
@@ -395,6 +395,7 @@ static ssize_t read_kmem(struct file *fi
unsigned long p = *ppos;
ssize_t low_count, read, sz;
char * kbuf; /* k-addr because vread() takes vmlist_lock rwlock */
+ int err = 0;
read = 0;
if (p < (unsigned long) high_memory) {
@@ -441,12 +442,16 @@ static ssize_t read_kmem(struct file *fi
return -ENOMEM;
while (count > 0) {
sz = size_inside_page(p, count);
+ if (!is_vmalloc_or_module_addr((void *)p)) {
+ err = -ENXIO;
+ break;
+ }
sz = vread(kbuf, (char *)p, sz);
if (!sz)
break;
if (copy_to_user(buf, kbuf, sz)) {
- free_page((unsigned long)kbuf);
- return -EFAULT;
+ err = -EFAULT;
+ break;
}
count -= sz;
buf += sz;
@@ -455,8 +460,8 @@ static ssize_t read_kmem(struct file *fi
}
free_page((unsigned long)kbuf);
}
- *ppos = p;
- return read;
+ *ppos = p;
+ return read ? read : err;
}
@@ -520,6 +525,7 @@ static ssize_t write_kmem(struct file *
ssize_t wrote = 0;
ssize_t virtr = 0;
char * kbuf; /* k-addr because vwrite() takes vmlist_lock rwlock */
+ int err = 0;
if (p < (unsigned long) high_memory) {
unsigned long to_write = min_t(unsigned long, count,
@@ -540,12 +546,14 @@ static ssize_t write_kmem(struct file *
unsigned long sz = size_inside_page(p, count);
unsigned long n;
+ if (!is_vmalloc_or_module_addr((void *)p)) {
+ err = -ENXIO;
+ break;
+ }
n = copy_from_user(kbuf, buf, sz);
if (n) {
- if (wrote + virtr)
- break;
- free_page((unsigned long)kbuf);
- return -EFAULT;
+ err = -EFAULT;
+ break;
}
sz = vwrite(kbuf, (char *)p, sz);
count -= sz;
@@ -556,8 +564,8 @@ static ssize_t write_kmem(struct file *
free_page((unsigned long)kbuf);
}
- *ppos = p;
- return virtr + wrote;
+ *ppos = p;
+ return virtr + wrote ? : err;
}
#endif
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/4] devmem: fix kmem write bug on memory holes
2010-01-22 4:59 [PATCH 0/4] devmem and readahead fixes for 2.6.33 Wu Fengguang
2010-01-22 4:59 ` [PATCH 1/4] devmem: check vmalloc address on kmem read/write Wu Fengguang
@ 2010-01-22 4:59 ` Wu Fengguang
2010-01-22 4:59 ` [PATCH 3/4] vfs: take f_lock on modifying f_mode after open time Wu Fengguang
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-01-22 4:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg Kroah-Hartman, Andi Kleen, Benjamin Herrenschmidt,
Christoph Lameter, Ingo Molnar, Tejun Heo, Nick Piggin,
KAMEZAWA Hiroyuki, stable, Wu Fengguang, LKML,
Linux Memory Management List, linux-fsdevel
[-- Attachment #1: vwrite-fix.patch --]
[-- Type: text/plain, Size: 1298 bytes --]
write_kmem() used to assume that vwrite() always returns the full buffer
length. However, vwrite() can now return 0 to indicate a memory hole, in
which case "buf" is not advanced accordingly.
Fix it by simply ignoring the return value, and hence the memory hole.
CC: Andi Kleen <andi@firstfloor.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Tejun Heo <tj@kernel.org>
CC: Nick Piggin <npiggin@suse.de>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: <stable@kernel.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
drivers/char/mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- linux-mm.orig/drivers/char/mem.c 2010-01-11 10:32:32.000000000 +0800
+++ linux-mm/drivers/char/mem.c 2010-01-11 10:32:34.000000000 +0800
@@ -555,7 +555,7 @@ static ssize_t write_kmem(struct file *
err = -EFAULT;
break;
}
- sz = vwrite(kbuf, (char *)p, sz);
+ vwrite(kbuf, (char *)p, sz);
count -= sz;
buf += sz;
virtr += sz;
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/4] vfs: take f_lock on modifying f_mode after open time
2010-01-22 4:59 [PATCH 0/4] devmem and readahead fixes for 2.6.33 Wu Fengguang
2010-01-22 4:59 ` [PATCH 1/4] devmem: check vmalloc address on kmem read/write Wu Fengguang
2010-01-22 4:59 ` [PATCH 2/4] devmem: fix kmem write bug on memory holes Wu Fengguang
@ 2010-01-22 4:59 ` Wu Fengguang
2010-01-22 4:59 ` [PATCH 4/4] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM Wu Fengguang
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-01-22 4:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg Kroah-Hartman, Al Viro, Christoph Hellwig, Wu Fengguang,
stable, Andi Kleen, LKML, Linux Memory Management List,
linux-fsdevel
[-- Attachment #1: fmode-lock.patch --]
[-- Type: text/plain, Size: 1165 bytes --]
We'll introduce FMODE_RANDOM, which will be modified at runtime.
So protect all runtime modifications of f_mode with f_lock to
avoid races.
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
fs/file_table.c | 2 ++
fs/nfsd/nfs4state.c | 2 ++
2 files changed, 4 insertions(+)
--- linux.orig/fs/file_table.c 2010-01-15 09:11:07.000000000 +0800
+++ linux/fs/file_table.c 2010-01-15 09:11:15.000000000 +0800
@@ -392,7 +392,9 @@ retry:
continue;
if (!(f->f_mode & FMODE_WRITE))
continue;
+ spin_lock(&f->f_lock);
f->f_mode &= ~FMODE_WRITE;
+ spin_unlock(&f->f_lock);
if (file_check_writeable(f) != 0)
continue;
file_release_write(f);
--- linux.orig/fs/nfsd/nfs4state.c 2010-01-15 09:08:22.000000000 +0800
+++ linux/fs/nfsd/nfs4state.c 2010-01-15 09:11:15.000000000 +0800
@@ -1998,7 +1998,9 @@ nfs4_file_downgrade(struct file *filp, u
{
if (share_access & NFS4_SHARE_ACCESS_WRITE) {
drop_file_write_access(filp);
+ spin_lock(&filp->f_lock);
filp->f_mode = (filp->f_mode | FMODE_READ) & ~FMODE_WRITE;
+ spin_unlock(&filp->f_lock);
}
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4/4] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM
2010-01-22 4:59 [PATCH 0/4] devmem and readahead fixes for 2.6.33 Wu Fengguang
` (2 preceding siblings ...)
2010-01-22 4:59 ` [PATCH 3/4] vfs: take f_lock on modifying f_mode after open time Wu Fengguang
@ 2010-01-22 4:59 ` Wu Fengguang
2010-01-22 5:31 ` [PATCH 0/4] devmem and readahead fixes for 2.6.33 Greg KH
2010-02-03 23:47 ` [stable] " Greg KH
5 siblings, 0 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-01-22 4:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg Kroah-Hartman, Nick Piggin, Andi Kleen, Steven Whitehouse,
David Howells, Al Viro, Jonathan Corbet, Christoph Hellwig,
Wu Fengguang, stable, LKML, Linux Memory Management List,
linux-fsdevel
[-- Attachment #1: fadvise-random.patch --]
[-- Type: text/plain, Size: 3781 bytes --]
This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.
POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor
performance: a 16K read will be carried out in 4 _sync_ 1-page reads.
In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.
POSIX_FADV_RANDOM actually wants different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submits read IO for whatever the application requests.
So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.
Note that the random hint is not likely to help random read performance
noticeably. And it may be too permissive on huge request sizes (its IO
size is not limited by read_ahead_kb).
In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!
v6: use FMODE_RANDOM (proposed by Christoph Hellwig)
v5: use bit 0200000000; explicitly nuke the O_RANDOM bit in __dentry_open()
(Stephen Rothwell)
v4: resolve bit conflicts with sparc and parisc;
use bit 040000000(=FMODE_NONOTIFY), which will be masked out by
__dentry_open(), so that open(O_RANDOM) is disabled
(Stephen Rothwell and Christoph Hellwig)
v3: use O_RANDOM to indicate both read/write access pattern as in
posix_fadvise(), although it only takes effect for read() now
(proposed by Quentin)
v2: use O_RANDOM_READ to avoid race conditions (pointed out by Andi)
CC: Nick Piggin <npiggin@suse.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: David Howells <dhowells@redhat.com>
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Jonathan Corbet <corbet@lwn.net>
CC: Christoph Hellwig <hch@infradead.org>
Tested-by: Quentin Barnes <qbarnes+nfs@yahoo-inc.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
include/linux/fs.h | 3 +++
mm/fadvise.c | 10 +++++++++-
mm/readahead.c | 6 ++++++
3 files changed, 18 insertions(+), 1 deletion(-)
--- linux-2.6.orig/mm/fadvise.c 2009-08-23 14:44:23.000000000 +0800
+++ linux-2.6/mm/fadvise.c 2010-01-22 12:57:07.000000000 +0800
@@ -77,12 +77,20 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, lof
switch (advice) {
case POSIX_FADV_NORMAL:
file->f_ra.ra_pages = bdi->ra_pages;
+ spin_lock(&file->f_lock);
+ file->f_mode &= ~FMODE_RANDOM;
+ spin_unlock(&file->f_lock);
break;
case POSIX_FADV_RANDOM:
- file->f_ra.ra_pages = 0;
+ spin_lock(&file->f_lock);
+ file->f_mode |= FMODE_RANDOM;
+ spin_unlock(&file->f_lock);
break;
case POSIX_FADV_SEQUENTIAL:
file->f_ra.ra_pages = bdi->ra_pages * 2;
+ spin_lock(&file->f_lock);
+ file->f_mode &= ~FMODE_RANDOM;
+ spin_unlock(&file->f_lock);
break;
case POSIX_FADV_WILLNEED:
if (!mapping->a_ops->readpage) {
--- linux-2.6.orig/mm/readahead.c 2010-01-22 12:55:48.000000000 +0800
+++ linux-2.6/mm/readahead.c 2010-01-22 12:57:07.000000000 +0800
@@ -501,6 +501,12 @@ void page_cache_sync_readahead(struct ad
if (!ra->ra_pages)
return;
+ /* be dumb */
+ if (filp->f_mode & FMODE_RANDOM) {
+ force_page_cache_readahead(mapping, filp, offset, req_size);
+ return;
+ }
+
/* do read-ahead */
ondemand_readahead(mapping, ra, filp, false, offset, req_size);
}
--- linux-2.6.orig/include/linux/fs.h 2010-01-22 12:55:47.000000000 +0800
+++ linux-2.6/include/linux/fs.h 2010-01-22 12:57:08.000000000 +0800
@@ -87,6 +87,9 @@ struct inodes_stat_t {
*/
#define FMODE_NOCMTIME ((__force fmode_t)2048)
+/* Expect random access pattern */
+#define FMODE_RANDOM ((__force fmode_t)0x1000)
+
/*
* The below are the various read and write types that we support. Some of
* them include behavioral modifiers that send information down to the
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] devmem and readahead fixes for 2.6.33
2010-01-22 4:59 [PATCH 0/4] devmem and readahead fixes for 2.6.33 Wu Fengguang
` (3 preceding siblings ...)
2010-01-22 4:59 ` [PATCH 4/4] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM Wu Fengguang
@ 2010-01-22 5:31 ` Greg KH
2010-01-27 0:50 ` Andrew Morton
2010-02-03 23:47 ` [stable] " Greg KH
5 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2010-01-22 5:31 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, stable, Andi Kleen, KAMEZAWA Hiroyuki, LKML,
Linux Memory Management List, linux-fsdevel
On Fri, Jan 22, 2010 at 12:59:14PM +0800, Wu Fengguang wrote:
> Andrew,
>
> Here are some good fixes for 2.6.33. They have been floating around
> with other patches for some time; I really should have separated them
> out earlier.
>
> Greg,
>
> The first two patches are for devmem. 2.6.32 also needs these fixes, but
> the patches only apply cleanly to 2.6.33. I can backport them if
> necessary.
>
> [PATCH 1/4] devmem: check vmalloc address on kmem read/write
> [PATCH 2/4] devmem: fix kmem write bug on memory holes
After these hit Linus's tree, please send the backport to
stable@kernel.org and I will be glad to queue them up.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] devmem and readahead fixes for 2.6.33
2010-01-22 5:31 ` [PATCH 0/4] devmem and readahead fixes for 2.6.33 Greg KH
@ 2010-01-27 0:50 ` Andrew Morton
2010-01-27 1:39 ` Greg KH
2010-01-27 2:45 ` Wu Fengguang
0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2010-01-27 0:50 UTC (permalink / raw)
To: Greg KH
Cc: Wu Fengguang, stable, Andi Kleen, KAMEZAWA Hiroyuki, LKML,
Linux Memory Management List, linux-fsdevel
On Thu, 21 Jan 2010 21:31:57 -0800
Greg KH <gregkh@suse.de> wrote:
> On Fri, Jan 22, 2010 at 12:59:14PM +0800, Wu Fengguang wrote:
> > Andrew,
> >
> > Here are some good fixes for 2.6.33. They have been floating around
> > with other patches for some time; I really should have separated them
> > out earlier.
> >
> > Greg,
> >
> > The first two patches are for devmem. 2.6.32 also needs these fixes, but
> > the patches only apply cleanly to 2.6.33. I can backport them if
> > necessary.
> >
> > [PATCH 1/4] devmem: check vmalloc address on kmem read/write
> > [PATCH 2/4] devmem: fix kmem write bug on memory holes
>
> After these hit Linus's tree, please send the backport to
> stable@kernel.org and I will be glad to queue them up.
>
I tagged the first two patches for -stable and shall send them in for 2.6.33.
The second two patches aren't quite as obvious - perhaps a risk of
weird regressions. So I'm thinking I'll send them in for 2.6.34-rc1
and I tagged them as "[2.6.33.x]" for -stable, so you can feed them
into 2.6.33.x once 2.6.34-rcX has had a bit of testing time, OK?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] devmem and readahead fixes for 2.6.33
2010-01-27 0:50 ` Andrew Morton
@ 2010-01-27 1:39 ` Greg KH
2010-01-27 2:45 ` Wu Fengguang
1 sibling, 0 replies; 15+ messages in thread
From: Greg KH @ 2010-01-27 1:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Wu Fengguang, stable, Andi Kleen, KAMEZAWA Hiroyuki, LKML,
Linux Memory Management List, linux-fsdevel
On Tue, Jan 26, 2010 at 04:50:50PM -0800, Andrew Morton wrote:
> On Thu, 21 Jan 2010 21:31:57 -0800
> Greg KH <gregkh@suse.de> wrote:
>
> > On Fri, Jan 22, 2010 at 12:59:14PM +0800, Wu Fengguang wrote:
> > > Andrew,
> > >
> > > Here are some good fixes for 2.6.33. They have been floating around
> > > with other patches for some time; I really should have separated them
> > > out earlier.
> > >
> > > Greg,
> > >
> > > The first two patches are for devmem. 2.6.32 also needs these fixes, but
> > > the patches only apply cleanly to 2.6.33. I can backport them if
> > > necessary.
> > >
> > > [PATCH 1/4] devmem: check vmalloc address on kmem read/write
> > > [PATCH 2/4] devmem: fix kmem write bug on memory holes
> >
> > After these hit Linus's tree, please send the backport to
> > stable@kernel.org and I will be glad to queue them up.
> >
>
> I tagged the first two patches for -stable and shall send them in for 2.6.33.
>
> The second two patches aren't quite as obvious - perhaps a risk of
> weird regressions. So I'm thinking I'll send them in for 2.6.34-rc1
> and I tagged them as "[2.6.33.x]" for -stable, so you can feed them
> into 2.6.33.x once 2.6.34-rcX has had a bit of testing time, OK?
Sounds good to me.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] devmem and readahead fixes for 2.6.33
2010-01-27 0:50 ` Andrew Morton
2010-01-27 1:39 ` Greg KH
@ 2010-01-27 2:45 ` Wu Fengguang
1 sibling, 0 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-01-27 2:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg KH, stable@kernel.org, Andi Kleen, KAMEZAWA Hiroyuki, LKML,
Linux Memory Management List, linux-fsdevel@vger.kernel.org
On Tue, Jan 26, 2010 at 05:50:50PM -0700, Andrew Morton wrote:
> On Thu, 21 Jan 2010 21:31:57 -0800
> Greg KH <gregkh@suse.de> wrote:
>
> > On Fri, Jan 22, 2010 at 12:59:14PM +0800, Wu Fengguang wrote:
> > > Andrew,
> > >
> > > Here are some good fixes for 2.6.33. They have been floating around
> > > with other patches for some time; I really should have separated them
> > > out earlier.
> > >
> > > Greg,
> > >
> > > The first two patches are for devmem. 2.6.32 also needs these fixes, but
> > > the patches only apply cleanly to 2.6.33. I can backport them if
> > > necessary.
> > >
> > > [PATCH 1/4] devmem: check vmalloc address on kmem read/write
> > > [PATCH 2/4] devmem: fix kmem write bug on memory holes
> >
> > After these hit Linus's tree, please send the backport to
> > stable@kernel.org and I will be glad to queue them up.
> >
>
> I tagged the first two patches for -stable and shall send them in for 2.6.33.
>
> The second two patches aren't quite as obvious - perhaps a risk of
> weird regressions. So I'm thinking I'll send them in for 2.6.34-rc1
> and I tagged them as "[2.6.33.x]" for -stable, so you can feed them
> into 2.6.33.x once 2.6.34-rcX has had a bit of testing time, OK?
OK, I'll send the patches to stable kernel once they hit mainline.
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [stable] [PATCH 0/4] devmem and readahead fixes for 2.6.33
2010-01-22 4:59 [PATCH 0/4] devmem and readahead fixes for 2.6.33 Wu Fengguang
` (4 preceding siblings ...)
2010-01-22 5:31 ` [PATCH 0/4] devmem and readahead fixes for 2.6.33 Greg KH
@ 2010-02-03 23:47 ` Greg KH
2010-02-04 2:42 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write Wu Fengguang
5 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2010-02-03 23:47 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Greg Kroah-Hartman, LKML,
Linux Memory Management List, Andi Kleen, linux-fsdevel, stable,
KAMEZAWA Hiroyuki
On Fri, Jan 22, 2010 at 12:59:14PM +0800, Wu Fengguang wrote:
> Greg,
>
> The first two patches are for devmem. 2.6.32 also needs these fixes, but
> the patches only apply cleanly to 2.6.33. I can backport them if
> necessary.
>
> [PATCH 1/4] devmem: check vmalloc address on kmem read/write
> [PATCH 2/4] devmem: fix kmem write bug on memory holes
As these patches are now in Linus's tree, can you provide backports for
them and send them to the stable@kernel.org address?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 15+ messages in thread
* [stable] [PATCH] devmem: check vmalloc address on kmem read/write
2010-02-03 23:47 ` [stable] " Greg KH
@ 2010-02-04 2:42 ` Wu Fengguang
2010-02-04 2:43 ` [stable] [PATCH] devmem: fix kmem write bug on memory holes Wu Fengguang
2010-02-04 2:58 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write KAMEZAWA Hiroyuki
0 siblings, 2 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-02-04 2:42 UTC (permalink / raw)
To: Greg KH
Cc: Andrew Morton, Greg Kroah-Hartman, LKML,
Linux Memory Management List, Andi Kleen,
linux-fsdevel@vger.kernel.org, stable@kernel.org,
KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
commit 325fda71d0badc1073dc59f12a948f24ff05796a upstream.
Otherwise vmalloc_to_page() will BUG().
This also makes the kmem read/write implementation aligned with mem(4):
"References to nonexistent locations cause errors to be returned." Here
we return -ENXIO (inspired by Hugh) if no bytes have been transferred
to/from user space, otherwise return partial read/write results.
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Hugh Dickins <hugh.dickins@tiscali.co.uk>
CC: <stable@kernel.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
drivers/char/mem.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
--- linux-2.6.32.orig/drivers/char/mem.c 2010-02-04 10:28:19.000000000 +0800
+++ linux-2.6.32/drivers/char/mem.c 2010-02-04 10:37:55.000000000 +0800
@@ -408,6 +408,7 @@ static ssize_t read_kmem(struct file *fi
unsigned long p = *ppos;
ssize_t low_count, read, sz;
char * kbuf; /* k-addr because vread() takes vmlist_lock rwlock */
+ int err = 0;
read = 0;
if (p < (unsigned long) high_memory) {
@@ -464,14 +465,18 @@ static ssize_t read_kmem(struct file *fi
while (count > 0) {
int len = count;
+ if (!is_vmalloc_or_module_addr((void *)p)) {
+ err = -ENXIO;
+ break;
+ }
if (len > PAGE_SIZE)
len = PAGE_SIZE;
len = vread(kbuf, (char *)p, len);
if (!len)
break;
if (copy_to_user(buf, kbuf, len)) {
- free_page((unsigned long)kbuf);
- return -EFAULT;
+ err = -EFAULT;
+ break;
}
count -= len;
buf += len;
@@ -480,8 +485,8 @@ static ssize_t read_kmem(struct file *fi
}
free_page((unsigned long)kbuf);
}
- *ppos = p;
- return read;
+ *ppos = p;
+ return read ? read : err;
}
@@ -557,6 +562,7 @@ static ssize_t write_kmem(struct file *
ssize_t virtr = 0;
ssize_t written;
char * kbuf; /* k-addr because vwrite() takes vmlist_lock rwlock */
+ int err = 0;
if (p < (unsigned long) high_memory) {
@@ -580,15 +586,17 @@ static ssize_t write_kmem(struct file *
while (count > 0) {
int len = count;
+ if (!is_vmalloc_or_module_addr((void *)p)) {
+ err = -ENXIO;
+ break;
+ }
if (len > PAGE_SIZE)
len = PAGE_SIZE;
if (len) {
written = copy_from_user(kbuf, buf, len);
if (written) {
- if (wrote + virtr)
- break;
- free_page((unsigned long)kbuf);
- return -EFAULT;
+ err = -EFAULT;
+ break;
}
}
len = vwrite(kbuf, (char *)p, len);
@@ -600,8 +608,8 @@ static ssize_t write_kmem(struct file *
free_page((unsigned long)kbuf);
}
- *ppos = p;
- return virtr + wrote;
+ *ppos = p;
+ return virtr + wrote ? : err;
}
#endif
^ permalink raw reply [flat|nested] 15+ messages in thread
* [stable] [PATCH] devmem: fix kmem write bug on memory holes
2010-02-04 2:42 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write Wu Fengguang
@ 2010-02-04 2:43 ` Wu Fengguang
2010-02-04 2:58 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write KAMEZAWA Hiroyuki
1 sibling, 0 replies; 15+ messages in thread
From: Wu Fengguang @ 2010-02-04 2:43 UTC (permalink / raw)
To: Greg KH
Cc: Andrew Morton, Greg Kroah-Hartman, LKML,
Linux Memory Management List, Andi Kleen,
linux-fsdevel@vger.kernel.org, stable@kernel.org,
KAMEZAWA Hiroyuki
From: Wu Fengguang <fengguang.wu@intel.com>
commit c85e9a97c4102ce2e83112da850d838cfab5ab13 upstream.
write_kmem() used to assume that vwrite() always returns the full buffer
length. However, vwrite() can now return 0 to indicate a memory hole, in
which case "buf" is not advanced accordingly.
Fix it by simply ignoring the return value, and hence the memory hole.
CC: Andi Kleen <andi@firstfloor.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Tejun Heo <tj@kernel.org>
CC: Nick Piggin <npiggin@suse.de>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: <stable@kernel.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
drivers/char/mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- linux-2.6.32.orig/drivers/char/mem.c 2010-02-04 10:37:55.000000000 +0800
+++ linux-2.6.32/drivers/char/mem.c 2010-02-04 10:37:59.000000000 +0800
@@ -599,7 +599,7 @@ static ssize_t write_kmem(struct file *
break;
}
}
- len = vwrite(kbuf, (char *)p, len);
+ vwrite(kbuf, (char *)p, len);
count -= len;
buf += len;
virtr += len;
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [stable] [PATCH] devmem: check vmalloc address on kmem read/write
2010-02-04 2:42 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write Wu Fengguang
2010-02-04 2:43 ` [stable] [PATCH] devmem: fix kmem write bug on memory holes Wu Fengguang
@ 2010-02-04 2:58 ` KAMEZAWA Hiroyuki
2010-02-04 3:18 ` Wu Fengguang
1 sibling, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-04 2:58 UTC (permalink / raw)
To: Wu Fengguang
Cc: Greg KH, Andrew Morton, Greg Kroah-Hartman, LKML,
Linux Memory Management List, Andi Kleen,
linux-fsdevel@vger.kernel.org, stable@kernel.org,
juha_motorsportcom
On Thu, 4 Feb 2010 10:42:02 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> commit 325fda71d0badc1073dc59f12a948f24ff05796a upstream.
>
> Otherwise vmalloc_to_page() will BUG().
>
> This also makes the kmem read/write implementation aligned with mem(4):
> "References to nonexistent locations cause errors to be returned." Here
> we return -ENXIO (inspired by Hugh) if no bytes have been transferred
> to/from user space, otherwise return partial read/write results.
>
Wu-san, I have an additional fix for this patch. Currently, the *ppos
update is unreliable. Could you make a merged version?
I think this one makes the overall behavior clearer.
==
This is a further fix for devmem-check-vmalloc-address-on-kmem-read-write.patch
Currently, the condition for updating *ppos is wrong (it is updated even if
EFAULT occurs). This fixes that.
Reported-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
CC: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
drivers/char/mem.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
Index: mmotm-2.6.33-Feb01/drivers/char/mem.c
===================================================================
--- mmotm-2.6.33-Feb01.orig/drivers/char/mem.c
+++ mmotm-2.6.33-Feb01/drivers/char/mem.c
@@ -460,14 +460,18 @@ static ssize_t read_kmem(struct file *fi
}
free_page((unsigned long)kbuf);
}
+ /* EFAULT is always critical */
+ if (err == -EFAULT)
+ return err;
+ if (err == -ENXIO && !read)
+ return -ENXIO;
*ppos = p;
- return read ? read : err;
+ return read;
}
static inline ssize_t
-do_write_kmem(unsigned long p, const char __user *buf,
- size_t count, loff_t *ppos)
+do_write_kmem(unsigned long p, const char __user *buf, size_t count)
{
ssize_t written, sz;
unsigned long copied;
@@ -510,7 +514,6 @@ do_write_kmem(unsigned long p, const cha
written += sz;
}
- *ppos += written;
return written;
}
@@ -521,6 +524,7 @@ do_write_kmem(unsigned long p, const cha
static ssize_t write_kmem(struct file * file, const char __user * buf,
size_t count, loff_t *ppos)
{
+ /* Kernel virtual memory never exceeds unsigned long */
unsigned long p = *ppos;
ssize_t wrote = 0;
ssize_t virtr = 0;
@@ -530,7 +534,7 @@ static ssize_t write_kmem(struct file *
if (p < (unsigned long) high_memory) {
unsigned long to_write = min_t(unsigned long, count,
(unsigned long)high_memory - p);
- wrote = do_write_kmem(p, buf, to_write, ppos);
+ wrote = do_write_kmem(p, buf, to_write);
if (wrote != to_write)
return wrote;
p += wrote;
@@ -540,8 +544,13 @@ static ssize_t write_kmem(struct file *
if (count > 0) {
kbuf = (char *)__get_free_page(GFP_KERNEL);
- if (!kbuf)
- return wrote ? wrote : -ENOMEM;
+ if (!kbuf) {
+ if (wrote) { /* update ppos and return copied bytes */
+ *ppos = p;
+ return wrote;
+ } else
+ return -ENOMEM;
+ }
while (count > 0) {
unsigned long sz = size_inside_page(p, count);
unsigned long n;
@@ -563,9 +572,16 @@ static ssize_t write_kmem(struct file *
}
free_page((unsigned long)kbuf);
}
-
+ /* EFAULT is always critical. */
+ if (err == -EFAULT)
+ return err;
+ if (err == -ENXIO) {
+ /* We reached the end of vmalloc area..check real bug or not*/
+ if (!(virtr + wrote)) /* nothing written */
+ return -ENXIO;
+ }
*ppos = p;
- return virtr + wrote ? : err;
+ return virtr + wrote;
}
#endif
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [stable] [PATCH] devmem: check vmalloc address on kmem read/write
2010-02-04 2:58 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write KAMEZAWA Hiroyuki
@ 2010-02-04 3:18 ` Wu Fengguang
2010-02-04 3:27 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 15+ messages in thread
From: Wu Fengguang @ 2010-02-04 3:18 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Greg KH, Andrew Morton, Greg Kroah-Hartman, LKML,
Linux Memory Management List, Andi Kleen,
linux-fsdevel@vger.kernel.org, stable@kernel.org,
juha_motorsportcom@luukku.com
On Thu, Feb 04, 2010 at 10:58:01AM +0800, KAMEZAWA Hiroyuki wrote:
> On Thu, 4 Feb 2010 10:42:02 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > commit 325fda71d0badc1073dc59f12a948f24ff05796a upstream.
> >
> > Otherwise vmalloc_to_page() will BUG().
> >
> > This also makes the kmem read/write implementation aligned with mem(4):
> > "References to nonexistent locations cause errors to be returned." Here
> > we return -ENXIO (inspired by Hugh) if no bytes have been transferred
> > to/from user space, otherwise return partial read/write results.
> >
>
> Wu-san, I have an additional fix for this patch. Right now the *ppos update is inconsistent.
> Could you make a merged one?
> Maybe this one makes the overall behavior clearer.
>
> ==
> This is a further fix for devmem-check-vmalloc-address-on-kmem-read-write.patch.
> Currently the condition for updating *ppos is wrong (it is updated even if EFAULT
> occurs). This fixes that.
>
>
> Reported-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
Sorry, can you elaborate on the problem? How does it break the application?
It looks like do_generic_file_read() also updates *ppos progressively, and
no one complains about that.
Thanks,
Fengguang
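For illustration only, the behavior under discussion (advance *ppos after each chunk, return the partial count when some bytes were transferred, and return -ENXIO only when nothing was) can be modeled in user space. This is a sketch under stated assumptions, not the kernel code: model_read(), CHUNK, and the flat dev buffer are hypothetical names made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical user-space model, not kernel code: a "device" of dev_size
 * bytes is read in chunks of at most CHUNK bytes. */
#define CHUNK 4

static ssize_t model_read(const char *dev, size_t dev_size,
                          char *buf, size_t count, off_t *ppos)
{
    size_t p;
    ssize_t done = 0;

    assert(ppos != NULL && *ppos >= 0);
    p = (size_t)*ppos;

    while (count > 0) {
        size_t sz = count < CHUNK ? count : CHUNK;

        if (p >= dev_size)          /* nonexistent location: stop here */
            break;
        if (sz > dev_size - p)      /* clamp the last chunk to the device end */
            sz = dev_size - p;

        memcpy(buf + done, dev + p, sz);
        p += sz;
        done += (ssize_t)sz;
        count -= sz;
        *ppos = (off_t)p;           /* position advances with every chunk */
    }

    /* Partial results win; -ENXIO only when nothing was transferred. */
    return done ? done : -ENXIO;
}
```

In this model a caller that asks for more bytes than remain gets a short count and an advanced position; only a request that starts past the end sees -ENXIO.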
* Re: [stable] [PATCH] devmem: check vmalloc address on kmem read/write
2010-02-04 3:18 ` Wu Fengguang
@ 2010-02-04 3:27 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-04 3:27 UTC (permalink / raw)
To: Wu Fengguang
Cc: Greg KH, Andrew Morton, Greg Kroah-Hartman, LKML,
Linux Memory Management List, Andi Kleen,
linux-fsdevel@vger.kernel.org, stable@kernel.org,
juha_motorsportcom@luukku.com
On Thu, 4 Feb 2010 11:18:54 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Thu, Feb 04, 2010 at 10:58:01AM +0800, KAMEZAWA Hiroyuki wrote:
> > On Thu, 4 Feb 2010 10:42:02 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> >
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > >
> > > commit 325fda71d0badc1073dc59f12a948f24ff05796a upstream.
> > >
> > > Otherwise vmalloc_to_page() will BUG().
> > >
> > > This also makes the kmem read/write implementation aligned with mem(4):
> > > "References to nonexistent locations cause errors to be returned." Here
> > > we return -ENXIO (inspired by Hugh) if no bytes have been transferred
> > > to/from user space, otherwise return partial read/write results.
> > >
> >
> > Wu-san, I have an additional fix for this patch. Right now the *ppos update is inconsistent.
> > Could you make a merged one?
> > Maybe this one makes the overall behavior clearer.
> >
> > ==
> > This is a further fix for devmem-check-vmalloc-address-on-kmem-read-write.patch.
> > Currently the condition for updating *ppos is wrong (it is updated even if EFAULT
> > occurs). This fixes that.
> >
> >
> > Reported-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
>
> Sorry, can you elaborate on the problem? How does it break the application?
>
> It looks like do_generic_file_read() also updates *ppos progressively, and
> no one complains about that.
>
Ah, it seems I misunderstood something. OK, *ppos should be updated every time.
I started by adding a comment on the following line and got lost in a maze.
> return (virtr + wrote) ? : err;
Sorry for the noise.
-Kame
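As a side note on the line quoted above: the "? :" with an omitted middle operand is a GNU C extension, supported by GCC and Clang. "a ?: b" evaluates to a when a is nonzero and to b otherwise, with a evaluated only once. A minimal sketch, where result_gnu() and result_iso() are hypothetical helper names for illustration and not functions from the patch:

```c
#include <errno.h>
#include <sys/types.h>

/* GNU C extension: "a ?: b" yields a if a is nonzero, otherwise b,
 * and evaluates a only once. */
static ssize_t result_gnu(ssize_t virtr, ssize_t wrote, ssize_t err)
{
    return (virtr + wrote) ? : err;
}

/* Portable ISO C spelling of the same expression. */
static ssize_t result_iso(ssize_t virtr, ssize_t wrote, ssize_t err)
{
    ssize_t total = virtr + wrote;

    return total ? total : err;
}
```

Both helpers return the number of bytes transferred whenever it is nonzero and fall back to the error code only when nothing was transferred.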
Thread overview: 15+ messages
2010-01-22 4:59 [PATCH 0/4] devmem and readahead fixes for 2.6.33 Wu Fengguang
2010-01-22 4:59 ` [PATCH 1/4] devmem: check vmalloc address on kmem read/write Wu Fengguang
2010-01-22 4:59 ` [PATCH 2/4] devmem: fix kmem write bug on memory holes Wu Fengguang
2010-01-22 4:59 ` [PATCH 3/4] vfs: take f_lock on modifying f_mode after open time Wu Fengguang
2010-01-22 4:59 ` [PATCH 4/4] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM Wu Fengguang
2010-01-22 5:31 ` [PATCH 0/4] devmem and readahead fixes for 2.6.33 Greg KH
2010-01-27 0:50 ` Andrew Morton
2010-01-27 1:39 ` Greg KH
2010-01-27 2:45 ` Wu Fengguang
2010-02-03 23:47 ` [stable] " Greg KH
2010-02-04 2:42 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write Wu Fengguang
2010-02-04 2:43 ` [stable] [PATCH] devmem: fix kmem write bug on memory holes Wu Fengguang
2010-02-04 2:58 ` [stable] [PATCH] devmem: check vmalloc address on kmem read/write KAMEZAWA Hiroyuki
2010-02-04 3:18 ` Wu Fengguang
2010-02-04 3:27 ` KAMEZAWA Hiroyuki