Re: writepage return value check in vmscan.c

From: chrisl@vmware.com
To: Andrea Arcangeli <andrea@suse.de>
Cc: Andrew Morton <akpm@digeo.com>,
	linux-kernel@vger.kernel.org, chrisl@gnuchina.org,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: writepage return value check in vmscan.c
Date: Thu, 24 Oct 2002 12:15:32 -0700
Message-ID: <20021024191531.GD1398@vmware.com>
In-Reply-To: <20021024183327.GS3354@dualathlon.random>

On Thu, Oct 24, 2002 at 08:33:27PM +0200, Andrea Arcangeli wrote:
> On Thu, Oct 24, 2002 at 10:57:18AM -0700, chrisl@vmware.com wrote:
> > 			if ((gfp_mask & __GFP_FS) && writepage) {
> > +                               unsigned long flags = page->flags;
> > 
> > 				ClearPageDirty(page);
> > 				SetPageLaunder(page);
> > 				page_cache_get(page);
> > 				spin_unlock(&pagemap_lru_lock);
> > 
> > -				writepage(page);
> > +				if (writepage(page))
> > +					page->flags = flags;
> > 
> > 				page_cache_release(page);
> > 
> > 				spin_lock(&pagemap_lru_lock);
> > 				continue;
> >                         }
> 
> side note, you should use atomic bitflag operations here or you risk to
> lose a bit set by another cpu between the read and the write. you

Thanks. I am just shooting in the dark.

> basically meant SetPageDirty() if writepage fails. That is supposed to
> happen in the lowlevel layer (like in fail_writepage) but the problem
> here is that this isn't ramfs, and block_write_full_page could left
> locked in ram lots of pages if it would disallow these pages to be
> discared from the vm.

Exactly.
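
A minimal userspace sketch of the lost-bit problem noted above, assuming C11 atomics as a stand-in for the kernel's page flag operations. The bit values and both function names are hypothetical and exist only for illustration; this is not the vmscan.c code. The "other CPU" is simulated inline so the outcome is deterministic.

```c
#include <stdatomic.h>

/* Hypothetical bit values for this model; the real kernel layout differs. */
#define PG_DIRTY      (1UL << 0)
#define PG_REFERENCED (1UL << 1)

/*
 * Pattern from the patch: snapshot the whole flags word, then write the
 * snapshot back on failure. Any bit another CPU set in between is lost.
 */
unsigned long redirty_by_snapshot(atomic_ulong *flags,
				  unsigned long other_cpu_bits)
{
	unsigned long saved = atomic_load(flags);  /* flags = page->flags */

	atomic_fetch_and(flags, ~PG_DIRTY);        /* ClearPageDirty(page) */
	atomic_fetch_or(flags, other_cpu_bits);    /* simulated other CPU */
	atomic_store(flags, saved);                /* page->flags = flags */
	return atomic_load(flags);
}

/*
 * Suggested pattern: set only the dirty bit again with an atomic
 * read-modify-write, leaving every other bit untouched.
 */
unsigned long redirty_by_bitop(atomic_ulong *flags,
			       unsigned long other_cpu_bits)
{
	atomic_fetch_and(flags, ~PG_DIRTY);        /* ClearPageDirty(page) */
	atomic_fetch_or(flags, other_cpu_bits);    /* simulated other CPU */
	atomic_fetch_or(flags, PG_DIRTY);          /* SetPageDirty(page) */
	return atomic_load(flags);
}
```

Starting from a word with only PG_DIRTY set, the snapshot version ends with PG_DIRTY alone (the bit set by the simulated other CPU is gone), while the bit operation version ends with both bits set.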

> 
> > > A few fixes have been discussed.  One way would be to allocate
> > > the space for the page when it is first faulted into reality and
> > > deliver SIGBUS if backing store for it could not be allocated.
> > 
> > I am not sure how the user program handle that signal...
> > 
> > >  
> > > Ayup.  MAP_SHARED is a crock.  If you want to write to a file, use write().
> > > 
> > > View MAP_SHARED as a tool by which separate processes can attach
> > > to some shared memory which is identified by the filesystem namespace.
> > > It's not a very good way of performing I/O.
> > 
> > That is exactly the case for vmware ram file. VMware only use it to share
> > memory. Those are the virtual machine's memory. We don't want to write
> > it back to disk and we don't care what is left on the file system because
> > when vmware exit, we will throw the guest ram data away just like a real
> > machine power off ram will lost. We are not talking about machine using
> > flash ram :-). 
> > 
> > It is kswapd try to flush the data and it should take response to handle
> > the error. If it fail, one thing it should do is keep the page dirty
> > if write back fail. At least not corrupt memory like that.
> > 
> > If we can deliver the error to user program that would be a plus.
> > But this need to be fix frist.
> 
> as said this cannot be fixed easily in kernel, or it would be trivial to
> lockup a machine by filling the fs changing the i_size of a file and by
> marking all ram in the machine dirty in the hole, the vm must be allowed

Yes, but even nowadays it is possible to lock up the machine by doing that.

Try the bigmm test program attached to this mail. It simulates vmware's
memory mapping. It can easily lock up the machine even though there is
enough disk space.

See the comments in the source for the parameters. Basically, if you want
3 virtual machines, each with 2 processes and 1 GB of RAM, you can run:

bigmm -i 3 -t 2 -c 1024

I ran it on two SMP machines, one with 4 GB and one with 8 GB of RAM.
Both can deadlock if I mmap enough memory.

I haven't tried it on the latest kernel yet. But the last time I tried it,
it locked up the machine every time and I had to reset it. That is with
the RAM file created on a normal file system.

But if I create it on /dev/shm, the kernel can correctly kill
some of the processes and free the memory.

Be prepared to reset the machine if you try that. You have been warned :-)


> to discard those pages and invaliding those posted writes. At least
> until a true solution will be available you should change vmware to
> preallocate the file, then it will work fine because you will catch the
> ENOSPC error during the preallocation. If you work on shmfs that will be
> very quick indeed.

Yes, shmfs seems to be the only choice so far.
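
A rough sketch of the preallocation idea, assuming posix_fallocate() is available on the target system. The helper name open_preallocated() is hypothetical. The point is that ENOSPC shows up as an ordinary error at setup time, before anything is mapped, instead of as a failed writepage later.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, reserve `size` bytes of backing
 * store, and return the open descriptor, or -1 with errno set.
 */
int open_preallocated(const char *path, off_t size)
{
	int fd, err;

	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
	if (fd < 0)
		return -1;

	/* posix_fallocate() returns the error number; it does not set errno */
	err = posix_fallocate(fd, 0, size);
	if (err != 0) {
		fprintf(stderr, "preallocate %s: %s\n", path, strerror(err));
		close(fd);
		unlink(path);
		errno = err;
		return -1;
	}
	return fd;
}
```

A caller would mmap the returned descriptor only after this succeeds, so every page in the mapping already has backing store.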

Chris

[-- Attachment #2: bigmm.c --]
[-- Type: text/plain, Size: 2305 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int pagesize = 4096;
int instance = 1;
int thread = 1;
int blocksize = 1024*1024; // 1M
int blocks = 1024;	// total 1 G
char *filename="myram";
char *memblock[1024*256];

int memmap(char *argv[], int ppid)
{
	int i,j,fd,err=0,pid,k;
	char file[256];
	/* rename the process for ps/top; assumes argv[0] is long enough */
	sprintf(argv[0],"a%d",ppid);
	sprintf(file,"%s.%d",filename,ppid);
	fd = open(file,O_CREAT|O_TRUNC|O_RDWR,00644);
	if (fd<0) {
		perror("open");
		err = fd;
		goto exit;
	}
	err = unlink(file);
	if (err<0) {
		perror("unlink");
		goto exit_close;
	}
	/* widen before multiplying so sizes of 2G and above do not overflow int */
	err = ftruncate(fd,(off_t)blocksize*blocks);
	if (err<0) {
		perror("ftruncate");
		goto exit_close;
	}
	/* fork more process(thread) to use same share mem file */
	for (i=1;i<thread;i++) {
		pid = fork();
		if (pid <0) {
			perror("fork");
			exit(1);
		}
		if(pid) {
			printf("%d fork  child %d\n",ppid,pid);
		}else {
			sprintf(argv[0],"a%d.%d",ppid,i);
			break;
		}
	}
	for (i=0;i<blocks;i++) {
		void * ptr;
		ptr = mmap(NULL,blocksize, PROT_READ|PROT_WRITE,
			MAP_SHARED, fd, (off_t)i*blocksize);
		if (ptr==MAP_FAILED) {
			perror("mmap");
			/* only touch the blocks that were actually mapped */
			blocks = i;
			break;
		}
		memblock[i] = ptr;
	}
	/* touch all the share memory like a guest vm */
	while (1) {
		sleep(1);
		for (i=0;i<blocks;i++)
			for (j=0;j<blocksize;j+=pagesize)
				memblock[i][j]=i&0xff;
	}
	err = 0;
exit_close:
	close(fd);
exit:
	return err;
}


int main (int argc, char *argv[])
{
	int i,pid;
	int c;
	while (1) {
		c = getopt(argc,argv,"b:c:i:t:");
		if (c<=0) break;
		switch (c) {
			case 'i':
				/* number of virtual machine */
				instance = strtol(optarg,NULL,0);
				break;
			case 't':
				/* number of process per virtual machine */
				thread = strtol(optarg,NULL,0);
				break;
			case 'c':
				/* number of memory blocks in each virtual machine */
				blocks = strtol(optarg,NULL,0);
				break;
			case 'b':
				/* size of a memory block */
				blocksize = strtol(optarg,NULL,0);
				break;
		}
	}
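	/* memblock[] is a fixed-size table; refuse block counts that do not fit */
	if (blocks < 0 || blocks > (int)(sizeof(memblock)/sizeof(memblock[0]))) {
		fprintf(stderr,"block count must be between 0 and %d\n",
			(int)(sizeof(memblock)/sizeof(memblock[0])));
		exit(1);
	}
	printf("i:%d t:%d b:%d c:%d\n",instance,thread,blocksize,blocks);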
	for (i=0;i<instance;i++) {
		pid = fork();
		if (pid <0) {
			perror("mainfork");
			exit(1);
		}
		if(pid) {
			printf("mainfork  child %d\n",pid);
		}else {
			return memmap(argv,i);
		}
	}
	return 0;
}
