From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: haveblue@us.ibm.com, akpm@osdl.org, linux-mm@kvack.org,
	piggin@cyberone.com.au, arjanv@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] memory defragmentation to satisfy high order allocations
Date: Sat, 2 Oct 2004 15:33:49 -0300	[thread overview]
Message-ID: <20041002183349.GA7986@logos.cnet> (raw)
In-Reply-To: <20041002.183015.41630389.taka@valinux.co.jp>

On Sat, Oct 02, 2004 at 06:30:15PM +0900, Hirokazu Takahashi wrote:
> Hello, Marcelo.
> 
> Generic memory defragmentation will be very nice for me to implement
> hugetlbpage migration, as allocating a new hugetlbpage is a hard job.
> 
> > For the "defragmentation" operation we want to do an "easy" try, i.e. if we
> > can't remap, give up.
> > 
> > I feel we should try to "untie" the code which checks for remapping availability /
> > does the remapping from the page migration, so as to share as much
> > code as possible between it and other users of the same functionality.
> 
> I think it's possible to introduce a non-wait mode to the migration code,
> as you may expect. Shall I implement it?
> 
> > Curiosity: How did you guys test the migration operation? Several threads on 
> > several processors operating on the memory, etc? 
> 
> I always test it with the zone hotplug emulation patch, which Mr. Iwamoto
> has made. I usually run the following jobs concurrently while zones are added
> and removed repeatedly on an SMP machine:
>       - building the Linux kernel
>       - copying file trees
>       - overwriting file trees
>       - removing file trees
>       - some pages are swapped out automatically :)
> 
> And Mr. Iwamoto has some small programs to check that any kind of page
> can be migrated. The programs repeat one of the following actions:
>     - read/write files.
>     - use MAP_SHARED and MAP_PRIVATE mmap()'s and read/write there.
>     - use Direct I/O.
>     - use AIO.
>     - fork to have COW pages.
>     - use shmem.
>     - use sendfile.
> 
> > Cool. I'll take a closer look at the relevant parts of memory hotplug patches 
> > this weekend, hopefully. See if I can help with testing of these patches too.
> 
> Any comments are very welcome.


I have a few comments about the code:

1) 
I'm pretty sure you should transfer the radix tree tags in radix_tree_replace().
If, for example, you move a dirty-tagged page to another zone, mpage_writepages()
will miss it (because it uses pagevec_lookup_tag(PAGECACHE_TAG_DIRTY)).

This should be quite trivial to do: save the tags before deleting the old
entry and set them on the new one, all within radix_tree_replace().

My implementation also contained the same bug.
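To make the suggestion in 1) concrete, here is a minimal sketch of a tag-preserving replace on a toy, fixed-size index rather than the kernel's radix tree. Everything in it (struct toy_tree, toy_replace(), the TAG_* constants) is hypothetical; it only models the save / delete / insert / restore order described above and is not the radix_tree API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag indices, playing the role of the page cache's
 * dirty and writeback tags. */
enum { TAG_DIRTY = 0, TAG_WRITEBACK = 1, NR_TAGS = 2 };
#define NR_SLOTS 16

/* Toy stand-in for a radix tree: one item pointer and one tag bit
 * per tag type for each index. Not the kernel data structure. */
struct toy_tree {
    void *slot[NR_SLOTS];
    bool tag[NR_TAGS][NR_SLOTS];
};

/* Deleting an entry drops its tags along with it. */
static void toy_delete(struct toy_tree *t, size_t index)
{
    t->slot[index] = NULL;
    for (int i = 0; i < NR_TAGS; i++)
        t->tag[i][index] = false;
}

/* A fresh insert starts out with no tags set. */
static void toy_insert(struct toy_tree *t, size_t index, void *item)
{
    t->slot[index] = item;
}

/* Replace the item at index and carry its tags over to the new entry:
 * save tags, delete old, insert new, restore tags. Returns the old item. */
static void *toy_replace(struct toy_tree *t, size_t index, void *new_item)
{
    bool saved[NR_TAGS];
    void *old;

    assert(index < NR_SLOTS && t->slot[index] != NULL);
    old = t->slot[index];
    for (int i = 0; i < NR_TAGS; i++)
        saved[i] = t->tag[i][index];
    toy_delete(t, index);
    toy_insert(t, index, new_item);
    for (int i = 0; i < NR_TAGS; i++)
        t->tag[i][index] = saved[i];
    return old;
}

/* Count tagged entries: a stand-in for a tag-based lookup such as the
 * dirty-page scan in mpage_writepages(). */
static size_t toy_count_tagged(const struct toy_tree *t, int tag)
{
    size_t n = 0;

    for (size_t i = 0; i < NR_SLOTS; i++)
        if (t->slot[i] != NULL && t->tag[tag][i])
            n++;
    return n;
}
```

Without the restore loop, toy_count_tagged(&t, TAG_DIRTY) would drop to 0 after the replace, which is exactly the missed-writeback case described in 1).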

2) 
In migrate_onepage() you add anonymous pages that don't have swap allocated
to the swap cache:
+       /*
+        * Put the page in a radix tree if it isn't in the tree yet.
+        */
+#ifdef CONFIG_SWAP
+       if (PageAnon(page) && !PageSwapCache(page))
+               if (!add_to_swap(page, GFP_KERNEL)) {
+                       unlock_page(page);
+                       return ERR_PTR(-ENOSPC);
+               }
+#endif /* CONFIG_SWAP */

Why's that? You can copy anonymous pages without adding them to swap (that's
what the patch I posted does).
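A minimal sketch of the idea in 2), assuming a toy page descriptor and a one-entry "page table" (struct toy_page, struct toy_pte and toy_migrate_anon() are all hypothetical): the anonymous page's contents and dirty state are copied and the mapping is repointed, with no swap entry allocated at any point. A real implementation also has to find and update every PTE that maps the page, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct toy_page {
    bool anon;       /* stands in for PageAnon() */
    bool swapcache;  /* stands in for PageSwapCache() */
    bool dirty;      /* stands in for PageDirty() */
    unsigned char data[TOY_PAGE_SIZE];
};

/* Hypothetical page table entry: it just points at a page. */
struct toy_pte {
    struct toy_page *page;
};

/* Migrate an anonymous page by copying it and repointing the mapping,
 * without putting it in the swap cache first. Returns 0 on success. */
static int toy_migrate_anon(struct toy_pte *pte, struct toy_page *newpage)
{
    struct toy_page *old = pte->page;

    /* Only covers anonymous pages with no swap slot: the case the
     * quoted hunk handles by calling add_to_swap(). */
    assert(old != NULL && old->anon && !old->swapcache);

    memcpy(newpage->data, old->data, TOY_PAGE_SIZE);
    newpage->anon = true;
    newpage->dirty = old->dirty;  /* dirty state travels with the copy */
    newpage->swapcache = false;   /* no swap was allocated */
    pte->page = newpage;          /* repoint the mapping */
    return 0;
}
```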

3) In migrate_page_common() you assume that additional page references
(page_migratable() returning -EAGAIN) mean the code should try to write out
the page.

Is that assumption always valid?

In theory there is no need to write out pages when migrating them to
other zones: they will be copied and the dirty information retained (either
in the PageDirty bit or in the radix tree tag).

I just noticed you do that in later patches (migrate_page_buffer), but AFAICS
the writeout remains. Why aren't you using migrate_page_buffer yet?

I think the final aim should be to remove the need for pageout()
completely.
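One way to read point 3), sketched under stated assumptions: treat extra references as transient, report -EAGAIN so the caller can retry later, and rely on copying the dirty bit instead of calling pageout(). The structure and the refcount rule in toy_page_migratable() (references equal to mappings plus one) are illustrative assumptions, not the actual page_migratable() logic from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page state; field names are illustrative only. */
struct toy_page {
    int count;     /* total references held on the page */
    int mapcount;  /* references from mappings, which migration fixes up */
    bool dirty;    /* stands in for PageDirty() / the dirty radix tree tag */
};

/* Assumed rule: one reference per mapping plus one for the cache.
 * Anything beyond that belongs to someone else (e.g. I/O in flight). */
static int toy_page_migratable(const struct toy_page *p)
{
    return p->count == p->mapcount + 1 ? 0 : -EAGAIN;
}

/* One migration attempt that never writes the page out. */
static int toy_try_migrate(struct toy_page *old, struct toy_page *newp)
{
    int err = toy_page_migratable(old);

    assert(old != newp);
    if (err)
        return err;  /* caller retries later; no pageout() */

    /* The copy carries the dirty state with it, so the data never has
     * to be written out just to be moved. */
    newp->dirty = old->dirty;
    newp->count = old->count;
    newp->mapcount = old->mapcount;
    return 0;
}
```

The design choice here is that a busy page costs a retry rather than a disk write; whether retrying is always enough is exactly the open question raised in 3).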

4) 
About implementing a nonblocking version: the easiest way, it
seems to me, is to pass a "block" argument to generic_migrate_page() and
use that.
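A sketch of the "block" argument suggested in 4), using a hypothetical toy_migrate_page() in place of generic_migrate_page(). With block == false it gives up as soon as the page is locked or under writeback, which is the "easy try" wanted for defragmentation; with block == true it waits (modelled here by a hypothetical helper that just clears the flags) and proceeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page state used only for this sketch. */
struct toy_page {
    bool locked;     /* stands in for PageLocked() */
    bool writeback;  /* stands in for PageWriteback() */
    int waits;       /* how many times the caller had to wait */
};

/* Hypothetical wait helper: in the kernel this would sleep until the
 * page is unlocked and its I/O completes; here it just clears the flags. */
static void toy_wait_on_page(struct toy_page *p)
{
    p->waits++;
    p->locked = false;
    p->writeback = false;
}

/* Migration entry point with a "block" argument. With block == false
 * it never sleeps: a busy page makes it give up with -EAGAIN. */
static int toy_migrate_page(struct toy_page *p, bool block)
{
    if (p->locked || p->writeback) {
        if (!block)
            return -EAGAIN;  /* can't remap right now: give up */
        toy_wait_on_page(p);
    }
    /* ... copy the page and remap it here (omitted) ... */
    return 0;
}
```

Passing a flag keeps a single code path for both callers, instead of duplicating the migration logic in a separate nonblocking variant.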

Questions: are there any documents on the memory hotplug userspace tools? 
Where can I find them?

Are Iwamoto's test programs available?

In general the code looks nice to me! I'll jump in and help with 
testing.

