From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <418C03CD.2080501@sgi.com>
Date: Fri, 05 Nov 2004 16:50:53 -0600
From: Ray Bryant
MIME-Version: 1.0
Subject: manual page migration, revisited...
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Marcelo Tosatti, Hirokazu Takahashi
Cc: linux-mm
List-ID:

Marcelo and Takahashi-san (and anyone else who would like to comment),

This is a little off topic, but this is as good a thread as any to start this
discussion on. Feel free to peel this off as a separate discussion thread
asap if you like.

We have a requirement (for a potential customer) to do the following kind of
thing:

(1) Suspend and swap out a running process so that the node where the process
    is running can be reassigned to a higher priority job.

(2) Resume and swap back in those suspended jobs, restoring the original
    memory layout on the original nodes, or

(3) Resume and swap back in those suspended jobs on a new set of nodes, with
    as similar a topological layout as possible. (It's also possible we may
    want to just move the jobs directly from one set of nodes to another
    without swapping them out first.)

This is all in the context of a batch scheduler being used to run jobs on a
large parallel machine.

As I understand it, there are various patches floating around (including the
migration code that you are working on, the memory hotplug removal code, etc.)
that do parts of this, but I've had a little trouble piecing together the
status of those various patches and where to get them (e.g., where do I get
the latest migration cache code?).

There was also a thread in early April 2004 on this list about manual page
migration, I think, but I don't know where that went, if anywhere (that would
satisfy requirement 3).

So the question I am asking, I guess, is where would you suggest we start on
an implementation for something like the above?
Which existing bits and pieces can I pick up, if anything, from your
migration cache work and/or the memory hotplug work, do you think? Or, which
patches should I be looking at for ideas? I'm not asking you to >>do<< this
work, of course; I'm just trying to get a start on the above and not
unnecessarily duplicate anyone's previous work in this area.

Any pointers or advice would be greatly appreciated.
--
Best Regards,

Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com              raybry@austin.rr.com
The box said: "Requires Windows 98 or better", so I installed Linux.
-----------------------------------------------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: manual page migration, revisited...
From: Nigel Cunningham
Reply-To: ncunningham@linuxmail.org
In-Reply-To: <418C03CD.2080501@sgi.com>
References: <418C03CD.2080501@sgi.com>
Content-Type: text/plain
Message-Id: <1099695742.4507.114.camel@desktop.cunninghams>
Mime-Version: 1.0
Date: Sat, 06 Nov 2004 10:02:22 +1100
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Ray Bryant
Cc: Marcelo Tosatti, Hirokazu Takahashi, Linux Memory Management
List-ID:

Hi.

On Sat, 2004-11-06 at 09:50, Ray Bryant wrote:
> Marcelo and Takahashi-san (and anyone else who would like to comment),
>
> This is a little off topic, but this is as good a thread as any to start this
> discussion on. Feel free to peel this off as a separate discussion thread
> asap if you like.
>
> We have a requirement (for a potential customer) to do the following kind of
> thing:
>
> (1) Suspend and swap out a running process so that the node where the process
>     is running can be reassigned to a higher priority job.
>
> (2) Resume and swap back in those suspended jobs, restoring the original
>     memory layout on the original nodes, or
>
> (3) Resume and swap back in those suspended jobs on a new set of nodes, with
>     as similar a topological layout as possible. (It's also possible we may
>     want to just move the jobs directly from one set of nodes to another
>     without swapping them out first.)

You may not even need any kernel patches to accomplish this. Bernard
Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
haven't tried it myself, but it sounds like it might be at least part of
what you're after.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly.
-- Romans 5:6

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 6 Nov 2004 15:48:58 -0200
From: Marcelo Tosatti
Subject: Re: manual page migration, revisited...
Message-ID: <20041106174857.GA23420@logos.cnet>
References: <418C03CD.2080501@sgi.com> <1099695742.4507.114.camel@desktop.cunninghams>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1099695742.4507.114.camel@desktop.cunninghams>
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Nigel Cunningham
Cc: Ray Bryant, Hirokazu Takahashi, Linux Memory Management
List-ID:

On Sat, Nov 06, 2004 at 10:02:22AM +1100, Nigel Cunningham wrote:
> Hi.
>
> On Sat, 2004-11-06 at 09:50, Ray Bryant wrote:
> > Marcelo and Takahashi-san (and anyone else who would like to comment),
> >
> > This is a little off topic, but this is as good a thread as any to start this
> > discussion on. Feel free to peel this off as a separate discussion thread
> > asap if you like.
> >
> > We have a requirement (for a potential customer) to do the following kind of
> > thing:
> >
> > (1) Suspend and swap out a running process so that the node where the process
> >     is running can be reassigned to a higher priority job.
> >
> > (2) Resume and swap back in those suspended jobs, restoring the original
> >     memory layout on the original nodes, or
> >
> > (3) Resume and swap back in those suspended jobs on a new set of nodes, with
> >     as similar a topological layout as possible. (It's also possible we may
> >     want to just move the jobs directly from one set of nodes to another
> >     without swapping them out first.)
>
> You may not even need any kernel patches to accomplish this. Bernard
> Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> haven't tried it myself, but it sounds like it might be at least part of
> what you're after.

Hi Ray, Nigel,

And the swsusp code itself, isn't that what it's doing? Stopping all processes,
saving their memory to disk, and resuming later on.

You should just need an API to stop a specific process?

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: manual page migration, revisited...
From: Nigel Cunningham
Reply-To: ncunningham@linuxmail.org
In-Reply-To: <20041106174857.GA23420@logos.cnet>
References: <418C03CD.2080501@sgi.com> <1099695742.4507.114.camel@desktop.cunninghams> <20041106174857.GA23420@logos.cnet>
Content-Type: text/plain
Message-Id: <1099796318.3811.9.camel@desktop.cunninghams>
Mime-Version: 1.0
Date: Sun, 07 Nov 2004 13:58:38 +1100
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Marcelo Tosatti
Cc: Ray Bryant, Hirokazu Takahashi, Linux Memory Management
List-ID:

Hi.
On Sun, 2004-11-07 at 04:48, Marcelo Tosatti wrote:
> On Sat, Nov 06, 2004 at 10:02:22AM +1100, Nigel Cunningham wrote:
> > On Sat, 2004-11-06 at 09:50, Ray Bryant wrote:
> > > Marcelo and Takahashi-san (and anyone else who would like to comment),
> > >
> > > This is a little off topic, but this is as good a thread as any to start this
> > > discussion on. Feel free to peel this off as a separate discussion thread
> > > asap if you like.
> > >
> > > We have a requirement (for a potential customer) to do the following kind of
> > > thing:
> > >
> > > (1) Suspend and swap out a running process so that the node where the process
> > >     is running can be reassigned to a higher priority job.
> > >
> > > (2) Resume and swap back in those suspended jobs, restoring the original
> > >     memory layout on the original nodes, or
> > >
> > > (3) Resume and swap back in those suspended jobs on a new set of nodes, with
> > >     as similar a topological layout as possible. (It's also possible we may
> > >     want to just move the jobs directly from one set of nodes to another
> > >     without swapping them out first.)
> >
> > You may not even need any kernel patches to accomplish this. Bernard
> > Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> > haven't tried it myself, but it sounds like it might be at least part of
> > what you're after.
>
> Hi Ray, Nigel,
>
> And the swsusp code itself, isn't that what it's doing? Stopping all processes,
> saving their memory to disk, and resuming later on.

Software suspend does the whole machine; I understood, perhaps wrongly,
that Ray only wants to move particular processes.

> You should just need an API to stop a specific process?

(And save its state.)

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly.
-- Romans 5:6

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <418DAB45.7040907@sgi.com>
Date: Sat, 06 Nov 2004 22:57:41 -0600
From: Ray Bryant
MIME-Version: 1.0
Subject: Re: manual page migration, revisited...
References: <418C03CD.2080501@sgi.com> <1099695742.4507.114.camel@desktop.cunninghams>
In-Reply-To: <1099695742.4507.114.camel@desktop.cunninghams>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: ncunningham@linuxmail.org
Cc: Marcelo Tosatti, Hirokazu Takahashi, Linux Memory Management
List-ID:

Nigel Cunningham wrote:
> Hi.
>
> On Sat, 2004-11-06 at 09:50, Ray Bryant wrote:
>
>>Marcelo and Takahashi-san (and anyone else who would like to comment),
>>
>>This is a little off topic, but this is as good a thread as any to start this
>>discussion on. Feel free to peel this off as a separate discussion thread
>>asap if you like.
>>
>>We have a requirement (for a potential customer) to do the following kind of
>>thing:
>>
>>(1) Suspend and swap out a running process so that the node where the process
>>    is running can be reassigned to a higher priority job.
>>
>>(2) Resume and swap back in those suspended jobs, restoring the original
>>    memory layout on the original nodes, or
>>
>>(3) Resume and swap back in those suspended jobs on a new set of nodes, with
>>    as similar a topological layout as possible. (It's also possible we may
>>    want to just move the jobs directly from one set of nodes to another
>>    without swapping them out first.)
>
>
> You may not even need any kernel patches to accomplish this. Bernard
> Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> haven't tried it myself, but it sounds like it might be at least part of
> what you're after.
>
> Regards,
>
> Nigel

Nigel,

I think that having the resumed processes show up with a different pid than
they had before is a show-stopper. In a multiprocess parallel program, we have
no idea whether the program itself has saved away pids and is using them to
send signals or whatnot. So I don't think there is a user space-only solution
that will solve this problem for us, but it is an interesting alternative to
the kernel-only solutions I've been contemplating. There is probably some
intermediate ground there which holds the real solution.
--
Best Regards,

Ray

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <418DADDC.1030601@sgi.com>
Date: Sat, 06 Nov 2004 23:08:44 -0600
From: Ray Bryant
MIME-Version: 1.0
Subject: Re: manual page migration, revisited...
References: <418C03CD.2080501@sgi.com> <1099695742.4507.114.camel@desktop.cunninghams> <20041106174857.GA23420@logos.cnet>
In-Reply-To: <20041106174857.GA23420@logos.cnet>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Marcelo Tosatti
Cc: Nigel Cunningham, Hirokazu Takahashi, Linux Memory Management
List-ID:

Marcelo Tosatti wrote:
>>You may not even need any kernel patches to accomplish this. Bernard
>>Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
>>haven't tried it myself, but it sounds like it might be at least part of
>>what you're after.
>
>
> Hi Ray, Nigel,
>
> And the swsusp code itself, isn't that what it's doing?
> Stopping all processes,
> saving their memory to disk, and resuming later on.
>
> You should just need an API to stop a specific process?
>

I think that sending the process a SIGSTOP is probably good enough to stop
it for our purposes. But in addition to that, the reason we stopped the
process is so we can start up another process on that node. Now, we can
wait for memory pressure to grow to the point that kswapd will force out
the stopped process's pages, but why should the VM have to go to the
effort to figure that out? Why not tell the VM somehow that we don't
want these pages in memory, and to please swap them out to make space for
the new program that is running?

Of course, one can argue that we don't know for sure that the new program
will use enough space to force the other process out, but we worry that in
that case, the new program could still end up with non-local memory allocation,
and that is an anathema to the HPC world, where we require the good performance
that local storage allocation provides. We want the new process that is
run on the node to get as good performance as it would have gotten if it had
started on an idle node.
--
Best Regards,

Ray

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 07 Nov 2004 20:19:32 +0900 (JST)
Message-Id: <20041107.201932.104031093.taka@valinux.co.jp>
Subject: Re: manual page migration, revisited...
From: Hirokazu Takahashi
In-Reply-To: <418DADDC.1030601@sgi.com>
References: <1099695742.4507.114.camel@desktop.cunninghams> <20041106174857.GA23420@logos.cnet> <418DADDC.1030601@sgi.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: raybry@sgi.com
Cc: marcelo.tosatti@cyclades.com, ncunningham@linuxmail.org, linux-mm@kvack.org
List-ID:

Hi, Ray,

> Marcelo Tosatti wrote:
>
> >>You may not even need any kernel patches to accomplish this. Bernard
> >>Blackham wrote some code called cryopid: http://cryopid.berlios.de/. I
> >>haven't tried it myself, but it sounds like it might be at least part of
> >>what you're after.
> >
> >
> > Hi Ray, Nigel,
> >
> > And the swsusp code itself, isn't that what it's doing? Stopping all processes,
> > saving their memory to disk, and resuming later on.

Looks interesting.

> > You should just need an API to stop a specific process?
> >
>
> I think that sending the process a SIGSTOP is probably good enough to stop
> it for our purposes. But in addition to that, the reason we stopped the
> process is so we can start up another process on that node. Now, we can
> wait for memory pressure to grow to the point that kswapd will force out
> the stopped process's pages, but why should the VM have to go to the
> effort to figure that out? Why not tell the VM somehow that we don't
> want these pages in memory, and to please swap them out to make space for
> the new program that is running?

I agree that stopping the target processes is enough.

I think you want to introduce a whole-process swapout mechanism, which
Linux hasn't implemented. I don't think it would be difficult to
implement. The following steps may work:

  1. Stop the target processes with a SIGSTOP signal.
  2. Choose the pages that belong to the processes.
  3. Pass them to shrink_list() with the proper parameters.
     shrink_list() may have to be called several times, to handle active
     pages and to wait for completion of the writeback I/O that the
     previous shrink_list() call started.

If you just want to migrate the pages to another node, the migration
code may help you. This is called process migration, which the NUMA
folks may also be interested in:

  1. Select the target node the processes are going to move to, and move
     them to a runqueue on that node.
  2. Choose the pages that belong to the processes.
  3. Start memory migration on those pages.

> Of course, one can argue that we don't know for sure that the new program
> will use enough space to force the other process out, but we worry that in
> that case, the new program could still end up with non-local memory allocation,
> and that is an anathema to the HPC world, where we require the good performance
> that local storage allocation provides. We want the new process that is
> run on the node to get as good performance as it would have gotten if it had
> started on an idle node.
> --

Thanks,
Hirokazu Takahashi.

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: manual page migration, revisited...
From: Nigel Cunningham
Reply-To: ncunningham@linuxmail.org
In-Reply-To: <418DAB45.7040907@sgi.com>
References: <418C03CD.2080501@sgi.com> <1099695742.4507.114.camel@desktop.cunninghams> <418DAB45.7040907@sgi.com>
Content-Type: text/plain
Message-Id: <1099861888.5461.2.camel@desktop.cunninghams>
Mime-Version: 1.0
Date: Mon, 08 Nov 2004 08:11:28 +1100
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Ray Bryant
Cc: Marcelo Tosatti, Hirokazu Takahashi, Linux Memory Management
List-ID:

Hi.
On Sun, 2004-11-07 at 15:57, Ray Bryant wrote:
> I think that having the resumed processes show up with a different pid than
> they had before is a show-stopper. In a multiprocess parallel program, we have
> no idea whether the program itself has saved away pids and is using them to
> send signals or whatnot. So I don't think there is a user space-only solution
> that will solve this problem for us, but it is an interesting alternative to
> the kernel-only solutions I've been contemplating. There is probably some
> intermediate ground there which holds the real solution.

I agree; it should be pretty trivial to add a patch that checks that a
given PID is not in use, allocates it, and gets the resumed program known
by that PID. I won't offer to do it, though. I've got enough work at the
moment :>

Nigel