From: Ray Bryant <raybry@engr.sgi.com>
To: Andi Kleen <ak@muc.de>
Cc: Christoph Hellwig <hch@engr.sgi.com>,
linux-mm <linux-mm@kvack.org>,
lhms <lhms-devel@lists.sourceforge.net>
Subject: Re: manual page migration and madvise/mbind
Date: Tue, 17 May 2005 23:02:20 -0500 [thread overview]
Message-ID: <428ABE4C.1020702@engr.sgi.com> (raw)
In-Reply-To: <20050518012627.GA33395@muc.de>
Andi Kleen wrote:
> Sorry for late answer.
>
> On Tue, May 17, 2005 at 11:44:31AM -0500, Ray Bryant wrote:
>
>>(Remember that the migrate_pages() system call takes a pid, a count,
>>and a list of old and new node so that this process is allowed to
>>migrate that process over there, which is what the batch manager needs
>>to do. Running madvise() in the current process's address space doesn't
>>help much unless it marks something deeper in the address space hierarchy
>>than a vma.)
>>
>>This is something quite a bit different than what madvise() or mbind()
>>do today. (They just manipulate vma's AFAIK.)
>
>
> Nah, mbind manipulates backing objects too, in particular for shared
> memory. It is not right now implemented for files, but that was planned
> and Steve L's patches went into that direction with some limitations.
>
That's what I need then.
> And yes, the state would need to be stored in the address_space, which
> is shared. In my version it was in private backing store objects.
> Check Steve's patch.
>
I'm in the process of building Steve's stuff for Altix.
> The main problem I see with the "hack ld.so" approach is that it
> doesn't work for non program files. So if you really want to handle
> them you would need a daemon that sets the policies once a file
> is mapped or hack all the programs to set the policies. I don't
> see that as being practicable. Ok you could always add a "sticky" process
> policy that actually allocates mempolicies for newly read files
> and so marks them using your new flags. But that would seem
> somewhat ugly to me and is probably incompatible with your batch manager
> anyways. The only sane way to handle arbitary files like this
> would be the xattr.
>
> If you ignore data files then it would be ok to keep it to
> ELF loaders and ld.so I guess.
>
> -Andi
>
This turns out to be problematic. For example, if I look at /proc/pid/maps
for /bin/tcsh, the following files are mapped in, but they are not elf files
at all, they are just data files (at least according to the "file" command):
/usr/lib/gconv/gconv-modules.cache
/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION
/usr/lib/locale/en_US.utf8/LC_MEASUREMENT
/usr/lib/locale/en_US.utf8/LC_TELEPHONE
/usr/lib/locale/en_US.utf8/LC_ADDRESS
/usr/lib/locale/en_US.utf8/LC_NAME
/usr/lib/locale/en_US.utf8/LC_PAPER
/usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES
/usr/lib/locale/en_US.utf8/LC_MONETARY
/usr/lib/locale/en_US.utf8/LC_COLLATE
/usr/lib/locale/en_US.utf8/LC_TIME
/usr/lib/locale/en_US.utf8/LC_NUMERIC
/usr/lib/locale/en_US.utf8/LC_CTYPE
Admittedly, most of this is National Language stuff, so not all programs
map this in, but nonetheless, it begs the question as to how to mark such
stuff as not-migratable, or at least migrate-non-shared, since that is how
they are marked now (we typically mark these files with the extended attribute
value "libr".) So we want to migrate the anonymous pages found in those
vma's, but not the shared pages.
Also the files are all small (a couple of pages each), so migrating them all
the time would not be such a problem, but it seems untidy to do so.
What could be done (beware of ugly hack following) would be in the migration
application (e. g. the batch manager), to look at /proc/pid/maps for each
process to be migrated and examine the file names specified there. The
batch manager could then do whatever algorithm it liked (it is just user code)
to determine whether or not, say, /usr/lib/locale/en_US.utf8/LC_TIME is
migratable or not. It could then use a modified mbind() system call to reach
into the kernel and set a bit (or 2) in the address_space object. (It would
have to map the file in first, but that is no big deal.) Those bits could
be used to control the migration the way we do it now with extended
attributes. There is a small performance hit here (mapping all of those files
in just before migration time and doing the mbind() system calls) but it is
probably doable and will be trivial in comparison to the actual time required
to do the migration. (I suppose the batch manager could keep a cache of files
it has mapped in and marked and not have to do this every time a migration
call is made.)
(Andi -- I know you dislike bringing /proc/pid/maps back into this because of
the raciness of reading that file, but here we are reading it before the
migration operation itself starts. And reading the file is a performance
assist, not required for correctness of the migration operation.)
This all can be done in preparation for using Steve Longerbeam's patches for
file numapolicy support and his patches for ld.so etc, which I believe I can
use, with only slight modifications, to support migration policies based on
information in elf program headers, at which point the ugly hack above can
ignore all elf files. The ugly hack, however, lives on forever for data
files, so I am not sure how much simplicity we have bought ourselves through
this whole process.
The ugly hack in some sense replaces the logic in ld.so until we can get a
modified version of ld.so into the glibc trees. The kernel interface
remains the same regardless of whether we use a modified ld.so or not.
(My personal preference is still just to set the user.migration extended
attribute on such data files; it just seems so much simpler than all of
the other approaches that have been suggested.)
Christoph, would the ugly hack above be acceptable or is that worse than the
original approach?
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@sgi.com raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2005-05-18 4:02 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-05-17 16:44 manual page migration and madvise/mbind Ray Bryant
2005-05-17 16:55 ` Ray Bryant
2005-05-18 1:26 ` Andi Kleen
2005-05-18 4:02 ` Ray Bryant [this message]
2005-05-28 8:49 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=428ABE4C.1020702@engr.sgi.com \
--to=raybry@engr.sgi.com \
--cc=ak@muc.de \
--cc=hch@engr.sgi.com \
--cc=lhms-devel@lists.sourceforge.net \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox