public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"mtosatti@redhat.com" <mtosatti@redhat.com>,
	"gregkh@suse.de" <gregkh@suse.de>,
	"broonie@opensource.wolfsonmicro.com" 
	<broonie@opensource.wolfsonmicro.com>,
	"johannes@sipsolutions.net" <johannes@sipsolutions.net>,
	"avi@qumranet.com" <avi@qumranet.com>,
	"andi@firstfloor.org" <andi@firstfloor.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	WANG Cong <xiyou.wangcong@gmail.com>,
	Mike Smith <scgtrp@gmail.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>
Subject: Re: [PATCH] devmem: handle partial kmem write/read
Date: Tue, 15 Sep 2009 10:57:10 +0800	[thread overview]
Message-ID: <20090915025710.GA23881@localhost> (raw)
In-Reply-To: <20090915113113.126f90f0.kamezawa.hiroyu@jp.fujitsu.com>

[-- Attachment #1: Type: text/plain, Size: 10101 bytes --]

On Tue, Sep 15, 2009 at 10:31:13AM +0800, KAMEZAWA Hiroyuki wrote:
> On Tue, 15 Sep 2009 10:02:08 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Tue, Sep 15, 2009 at 08:24:48AM +0800, KAMEZAWA Hiroyuki wrote:
> > > On Mon, 14 Sep 2009 12:34:44 +0800
> > > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > 
> > > > Hi Kame,
> > > > 
> > > > This patch needs more work. I first intent to fix a bug:
> > > > 
> > > >                         sz = vwrite(kbuf, (char *)p, sz);
> > > >                         p += sz;
> > > >                 }
> > > > 
> > > > So if the returned len is 0, the kbuf/p pointers will mismatch. 
> > > > 
> > > > Then I realize it changed the write behavior. The current vwrite()
> > > > behavior is strange, it returns 0 if _whole range_ is hole, otherwise
> > > > ignores the hole silently. So holes can be treated differently even in
> > > > the original code.
> > > > 
> > > Ah, ok...
> > > 
> > > > I'm not really sure about the right behavior. KAME-san, do you have
> > > > any suggestions?
> > > > 
> > > maybe following make sense.
> > > =
> > > written = vwrite(kbuf, (char *p), sz);
> > > if (!written) // whole vmem was a hole
> > > 	written = sz;
> > 
> > Since the 0 return value won't be used at all, it would be simpler to
> > tell vwrite() return the untouched buflen/sz, like this. It will ignore
> > _all_ the holes silently. Need to update comments too.
> > 
> 
> Hmm. IIUC the original kmem code returns immediately if vread/vwrite returns 0.

Agreed for vread. It seems vwrite don't do that for both kmem and
vmalloc part.

> But it seems this depends on wrong assumption that vmap area is continuous,
> at the first look.

You mean it is possible that not all pages are mapped in a vmlist addr,size range?
Is vmalloc areas guaranteed to be page aligned? Sorry for the dumb questions.

> In man(4) mem,kmem
> ==
>        Byte  addresses  in  mem  are interpreted as physical memory addresses.
>        References to nonexistent locations cause errors to be returned.
> 	.....
>        The file kmem is the same as mem, except that the kernel virtual memory
>        rather than physical memory is accessed.
> 
> ==
> 
> Then, we have to return error for accesses to "nonexistent locations".

That's one important factor to consider. On the other hand, the
original kmem read/write implementation don't return error code for
holes. Instead kmem read returns early, while kmem write ignores holes
but is buggy..

> memory-hole in vmap area is ....."nonexistent" ? 
> I think it's nonexistent if there are no overlaps between requested [pos, pos+len)
> and registerred vmalloc area.
> But, hmm, there are no way for users to know "existing vmalloc area".
> Then, my above idea may be wrong.
> 
> Then, I'd like to modify as following,
> 
>   - If is_vmalloc_or_module_addr(requested address) is false, return -EFAULT.
>   - If is_vmalloc_or_module_addr(requested address) is true, return no error.
>     Even if specified range doesn't include no exsiting vmalloc area.
> 
> How do you think ?

Looks reasonable to me. But it's good to hear more wide opinions..

> Thanks,
> -Kame
> p.s. I wonder current /dev/kmem cannot read text area of kernel if it's not
> directly mapped.

Attached is a small tool (from LKML) for reading 'modprobe_path' from kmem,
it's not text, but is close..

Thanks,
Fengguang

> 
> 
> > --- linux-mm.orig/mm/vmalloc.c  2009-09-15 09:40:08.000000000 +0800                                                                  
> > +++ linux-mm/mm/vmalloc.c       2009-09-15 09:43:33.000000000 +0800                                                                  
> > @@ -1834,7 +1834,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
> >         struct vm_struct *tmp;                                                                                                       
> >         char *vaddr;                                                                                                                 
> >         unsigned long n, buflen;                                                                                                     
> > -       int copied = 0;                                                                                                              
> >                                                                                                                                      
> >         /* Don't allow overflow */                                                                                                   
> >         if ((unsigned long) addr + count < count)                                                                                    
> > @@ -1858,7 +1857,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
> >                         n = count;                                                                                                   
> >                 if (!(tmp->flags & VM_IOREMAP)) {                                                                                    
> >                         aligned_vwrite(buf, addr, n);                                                                                
> > -                       copied++;                                                                                                    
> >                 }                                                                                                                    
> >                 buf += n;                                                                                                            
> >                 addr += n;                                                                                                           
> > @@ -1866,8 +1864,6 @@ long vwrite(char *buf, char *addr, unsig                                                                       
> >         }                                                                                                                            
> >  finished:                                                                                                                           
> >         read_unlock(&vmlist_lock);                                                                                                   
> > -       if (!copied)                                                                                                                 
> > -               return 0;                                                                                                            
> >         return buflen;                                                                                                               
> >  }                                                                                                                                   
> > 
> > > ==
> > > needs fix.
> > > 
> > > Anyway, I wonder kmem is broken. It's should be totally rewritten.
> > > 
> > > For example, this doesn't check anything.
> > > ==
> > >         if (p < (unsigned long) high_memory) {
> > > 
> > > ==
> > > 
> > > But....are there users ?
> > > If necessary, I'll write some...
> > 
> > I'm trying to stop possible mem/kmem users to access hwpoison pages.. 
> > I'm not the user, but rather a tester ;)
> > 
> > Thanks,
> > Fengguang
> > 
> > > Thanks,
> > > -Kame
> > > 
> > > 
> > > > Thanks,
> > > > Fengguang
> > > > 
> > > > On Mon, Sep 14, 2009 at 11:29:01AM +0800, Wu Fengguang wrote:
> > > > > Current vwrite silently ignores vwrite() to vmap holes.
> > > > > Better behavior should be stop the write and return
> > > > > on such holes.
> > > > > 
> > > > > Also return on partial read, which may happen in future
> > > > > (eg. hit hwpoison pages).
> > > > > 
> > > > > CC: Andi Kleen <andi@firstfloor.org>
> > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > > ---
> > > > >  drivers/char/mem.c |   30 ++++++++++++++++++------------
> > > > >  1 file changed, 18 insertions(+), 12 deletions(-)
> > > > > 
> > > > > --- linux-mm.orig/drivers/char/mem.c	2009-09-14 10:59:50.000000000 +0800
> > > > > +++ linux-mm/drivers/char/mem.c	2009-09-14 11:06:25.000000000 +0800
> > > > > @@ -444,18 +444,22 @@ static ssize_t read_kmem(struct file *fi
> > > > >  		if (!kbuf)
> > > > >  			return -ENOMEM;
> > > > >  		while (count > 0) {
> > > > > +			unsigned long n;
> > > > > +
> > > > >  			sz = size_inside_page(p, count);
> > > > > -			sz = vread(kbuf, (char *)p, sz);
> > > > > -			if (!sz)
> > > > > +			n = vread(kbuf, (char *)p, sz);
> > > > > +			if (!n)
> > > > >  				break;
> > > > > -			if (copy_to_user(buf, kbuf, sz)) {
> > > > > +			if (copy_to_user(buf, kbuf, n)) {
> > > > >  				free_page((unsigned long)kbuf);
> > > > >  				return -EFAULT;
> > > > >  			}
> > > > > -			count -= sz;
> > > > > -			buf += sz;
> > > > > -			read += sz;
> > > > > -			p += sz;
> > > > > +			count -= n;
> > > > > +			buf += n;
> > > > > +			read += n;
> > > > > +			p += n;
> > > > > +			if (n < sz)
> > > > > +				break;
> > > > >  		}
> > > > >  		free_page((unsigned long)kbuf);
> > > > >  	}
> > > > > @@ -551,11 +555,13 @@ static ssize_t write_kmem(struct file * 
> > > > >  				free_page((unsigned long)kbuf);
> > > > >  				return -EFAULT;
> > > > >  			}
> > > > > -			sz = vwrite(kbuf, (char *)p, sz);
> > > > > -			count -= sz;
> > > > > -			buf += sz;
> > > > > -			virtr += sz;
> > > > > -			p += sz;
> > > > > +			n = vwrite(kbuf, (char *)p, sz);
> > > > > +			count -= n;
> > > > > +			buf += n;
> > > > > +			virtr += n;
> > > > > +			p += n;
> > > > > +			if (n < sz)
> > > > > +				break;
> > > > >  		}
> > > > >  		free_page((unsigned long)kbuf);
> > > > >  	}
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > Please read the FAQ at  http://www.tux.org/lkml/
> > > > 
> > 

[-- Attachment #2: tmap.c --]
[-- Type: text/x-csrc, Size: 1385 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/poll.h>
#include <sys/mman.h>

int page_size;
#define PAGE_SIZE page_size
#define PAGE_MASK (~(PAGE_SIZE-1))

void get_var (unsigned long addr) {
	off_t ptr = addr & ~(PAGE_MASK);
	off_t offset = addr & PAGE_MASK;
	int i = 0;
	char *map;
	static int kfd = -1;

	kfd = open("/dev/kmem",O_RDONLY);
	if (kfd < 0) {
		perror("open");
		exit(0);
	}

	map = mmap(NULL,PAGE_SIZE,PROT_READ,MAP_SHARED,kfd,offset);
	if (map == MAP_FAILED) {
		perror("mmap");
		exit(-1);
	}
	printf("%s\n",map+ptr);

	return;
}

int main(int argc, char **argv)
{
	FILE *fp;
	char addr_str[11]="0x";
	char var[51];
	unsigned long addr;
	char ch;
	int r;

	if (argc != 2) {
		fprintf(stderr,"usage: %s System.map\n",argv[0]);
		exit(-1);
	}

	if ((fp = fopen(argv[1],"r")) == NULL) {
		perror("fopen");
		exit(-1);
	}

	do {
		/* ffffffff81723880 D modprobe_path */
		r = fscanf(fp,"%16s %c %50s\n",&addr_str[2],&ch,var);
		if (strcmp(var,"modprobe_path")==0)
			break;
	} while(r > 0);
	if (r < 0) {
		printf("could not find modprobe_path\n");
		exit(-1);
	}
	page_size = getpagesize();
	addr = strtoul(addr_str,NULL,16);
	printf("found modprobe_path at (%s) %08lx\n",addr_str,addr);
	get_var(addr);

	return 0;
}


  reply	other threads:[~2009-09-15  2:57 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-14  3:29 [PATCH] devmem: handle partial kmem write/read Wu Fengguang
2009-09-14  4:34 ` Wu Fengguang
2009-09-15  0:24   ` KAMEZAWA Hiroyuki
2009-09-15  1:52     ` KAMEZAWA Hiroyuki
2009-09-15  2:05       ` Wu Fengguang
2009-09-15  2:02     ` Wu Fengguang
2009-09-15  2:31       ` KAMEZAWA Hiroyuki
2009-09-15  2:57         ` Wu Fengguang [this message]
2009-09-15  7:58 ` Question: how to handle too big f_pos " KAMEZAWA Hiroyuki
2009-09-15  8:11   ` Wu Fengguang
2009-09-15  9:52   ` Hugh Dickins
2009-09-16  5:29   ` [RFC][PATCH][bugfix] more checks for negative f_pos handling (Was Re: Question: how to handle too big f_pos KAMEZAWA Hiroyuki
2009-09-16  8:20     ` Américo Wang
2009-09-16  8:44       ` KAMEZAWA Hiroyuki
2009-09-16  9:13         ` Américo Wang
2009-09-16 12:06           ` KAMEZAWA Hiroyuki
2009-09-17  3:06             ` Américo Wang
2009-09-17  5:53     ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v2 KAMEZAWA Hiroyuki
2009-09-17  6:07       ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v3 KAMEZAWA Hiroyuki
2009-09-17  6:21         ` Wu Fengguang
2009-09-17  6:31           ` KAMEZAWA Hiroyuki
2009-09-17  6:53             ` Wu Fengguang
2009-09-17  6:51           ` [RFC][PATCH][bugfix] more checks for negative f_pos handling v4 KAMEZAWA Hiroyuki
2009-09-17  7:14             ` Wu Fengguang
2009-09-17  7:23               ` KAMEZAWA Hiroyuki
2009-09-17  7:30                 ` Wu Fengguang
2009-09-17  9:42                 ` Wu Fengguang
2009-09-17 10:54                   ` KAMEZAWA Hiroyuki
2009-09-17 10:58                     ` KAMEZAWA Hiroyuki
2009-09-17 12:40                     ` Wu Fengguang
2009-09-18  0:02                       ` KAMEZAWA Hiroyuki
2009-09-18  2:25                         ` Américo Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090915025710.GA23881@localhost \
    --to=fengguang.wu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=avi@qumranet.com \
    --cc=broonie@opensource.wolfsonmicro.com \
    --cc=gregkh@suse.de \
    --cc=johannes@sipsolutions.net \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=nickpiggin@yahoo.com.au \
    --cc=scgtrp@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox