* [RFC][PATCH] nameing reserved pages [0/3]
From: KAMEZAWA Hiroyuki @ 2005-04-20 12:02 UTC
To: linux-kernel; +Cc: Dave Hansen, hari
Hi,
There are several types of PG_reserved pages:
(a) Memory Hole
(b) Used by Kernel
(c) Set by drivers
(d) Isolated by MCA
(e) Used by perfmon
etc.
I think it would be useful to distinguish between these types of PG_reserved pages.
For example, Memory Hotplug can ignore (a).
Patches [1/3] and [2/3] name PG_reserved pages.
The type of a page is recorded in page->private.
I'm not sure whether this is safe, so currently only reserved-at-boot pages are named.
Patch [3/3] adds /dev/memstate, an interface that shows the state of the memmap.
In /dev/memstate, the file offset is the pfn, and each byte represents the state of one page.
In this patch, memory holes and reserved pages each have their own value.
Below is the output from my box.
0xff --- Invalid page
0x00 --- Common page
0x02 --- Reserved at boot page
[root@casares char]# od -t x1 -j 0 -N 65535 /dev/memstate
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0001540 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 02
0001560 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
*
0002400 02 02 02 00 00 00 00 00 00 02 02 02 02 02 02 02
0002420 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
*
0003400 02 02 02 02 02 02 02 02 02 02 02 00 00 00 00 00
0003420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0010000 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
*
0010640 02 02 02 02 02 02 02 02 02 02 02 02 02 02 00 00
0010660 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
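As a rough illustration of the layout described above (file offset = pfn, one state byte per page), the following Python sketch collapses such a byte stream into runs of identical state. Only the three state values come from the message; the function name `summarize_states` and the sample bytes are hypothetical. Note that od prints offsets in octal by default, so the row at 0001540 starts at pfn 864.

```python
from itertools import groupby

# State values as listed in the message; any other value is reported as unknown.
STATE_NAMES = {0xFF: "invalid", 0x00: "common", 0x02: "reserved-at-boot"}


def summarize_states(data, first_pfn=0):
    """Collapse per-page state bytes into (start_pfn, end_pfn, name) runs."""
    runs = []
    pfn = first_pfn
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        name = STATE_NAMES.get(value, "unknown(0x%02x)" % value)
        runs.append((pfn, pfn + length - 1, name))
        pfn += length
    return runs


if __name__ == "__main__":
    # Synthetic sample, not real /dev/memstate contents.
    sample = bytes([0xFF] * 4 + [0x02] * 3 + [0x00] * 2)
    for start, end, name in summarize_states(sample):
        print("pfn %d-%d: %s" % (start, end, name))
```

Reading the whole file this way is only practical for small ranges; it is meant to show how the byte-per-page encoding maps to pfn ranges, not how the patch itself works.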
This would be useful for Memory Hotplug and some other things.
I think more detailed types can be supported.
Thanks.
-- Kame <kamezawa.hiroyu@jp.fujitsu.com>
* Re: [RFC][PATCH] nameing reserved pages [0/3]
From: Arjan van de Ven @ 2005-04-20 12:34 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, Dave Hansen, hari
On Wed, 2005-04-20 at 21:02 +0900, KAMEZAWA Hiroyuki wrote:
> Hi,
>
> There are several types of PG_reserved pages,
> (a) Memory Hole
> (b) Used by Kernel
> (c) Set by drivers
> (d) Isolated by MCA
> (e) used by perfmon
> etc....
>
> I think it's useful to distinguish many types of PG_reserved pages.
I'm not so sure about this at all.
> For example, Memory Hotplug can ignore (a).
Memory Hotplug can also use page_is_ram().
/dev/memstate really looks like a bad idea to me as well... I'd rather
have fewer /dev/*mem* devices than more.
* Re: [RFC][PATCH] nameing reserved pages [0/3]
From: Kamezawa Hiroyuki @ 2005-04-20 14:15 UTC
To: Arjan van de Ven; +Cc: linux-kernel, Dave Hansen, hari
Arjan van de Ven wrote:
>>For example, Memory Hotplug can ignore (a).
>
>Memory Hotplug can also use page_is_ram().
Yes, we can use page_is_ram() to find (a) memory holes.
But I'd like to catch other removable PG_reserved pages, such as (d) isolated
by MCA, (e) used by perfmon, and
some of (b) used by kernel and (c) set by drivers.
What I have in mind is detecting whether memory is hot-removable
before actually removing it.
>/dev/memstate really looks like a bad idea to me as well... I rather
>have less than more /dev/*mem*
To show page usage and its "location", I've considered other
interfaces such as sysfs and procfs,
but I have no good idea.
The physical memory address space is vast, so I want to use lseek() or
ioctl() (though I don't like ioctl()).
Do you have any recommendations?
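To make the lseek() idea concrete, here is a minimal Python sketch of looking up a single page's state by seeking to its pfn. It assumes the byte-per-page layout of the proposed /dev/memstate; the helper name `read_page_state` is hypothetical, and an in-memory buffer stands in for the device, which exists only in this patch set.

```python
import io
import os


def read_page_state(stream, pfn):
    """Return the state byte for one pfn, or None if the offset is past the end."""
    stream.seek(pfn, os.SEEK_SET)  # file offset == pfn in the proposed layout
    byte = stream.read(1)
    return byte[0] if byte else None


if __name__ == "__main__":
    # In-memory stand-in; real use would open the proposed device in binary mode.
    fake_device = io.BytesIO(bytes([0xFF, 0xFF, 0x02, 0x00]))
    print(hex(read_page_state(fake_device, 2)))
```

Seeking lets a reader sample one page out of a very large pfn space without reading everything before it, which is the property being asked for here.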
-- Kame
* Re: [RFC][PATCH] nameing reserved pages [0/3]
From: Arjan van de Ven @ 2005-04-20 14:30 UTC
To: Kamezawa Hiroyuki; +Cc: linux-kernel, Dave Hansen, hari
On Wed, 2005-04-20 at 23:15 +0900, Kamezawa Hiroyuki wrote:
> Arjan van de Ven wrote:
>
> >>For example, Memory Hotplug can ignore (a).
> >
> >Memory Hotplug can also use page_is_ram().
> >
> Yes. we can use page_is_ram() for finding (a)memory hole.
> But I'd like to catch other removable PG_reserved pages like (d)Isolated
> by MCA (e)used by perfmon and
> some of (b) used by kernel and (c) Set by drivers.
> What I'm thinking of is to detect whether memory is hot-removable or not
> before removing actually.
MCAs probably shouldn't set PG_reserved; I don't see why they should.
They could just steal the page and "leak" it.
>
> >/dev/memstate really looks like a bad idea to me as well... I rather
> >have less than more /dev/*mem*
> >
> For showing page usage and its "location", I've thought of other
> interface, sysfs, procfs...
> But I have no idea.
Why do you want this exported to userspace? There is absolutely no way
you can export this race-free without shutting the VM down, and
without being race-free this information has absolutely no meaning!
(And when you shut the VM down, you really shouldn't depend on userspace
anymore either.)
* Re: [RFC][PATCH] nameing reserved pages [0/3]
From: Kamezawa Hiroyuki @ 2005-04-20 14:58 UTC
To: Arjan van de Ven; +Cc: linux-kernel, Dave Hansen, hari
Arjan van de Ven wrote:
>On Wed, 2005-04-20 at 23:15 +0900, Kamezawa Hiroyuki wrote:
>
>MCA's probably shouldn't set PG_reserved; I don't see why they should.
>They could just steal the page and "leak" it.
Actually, leaked pages cannot be hot-removed or replaced, so we have to
track which pages were removed by MCA.
I think setting PG_reserved and setting page->private = Removed_by_MCA is a
simple way to do that.
>>>/dev/memstate really looks like a bad idea to me as well... I rather
>>>have less than more /dev/*mem*
>>
>>For showing page usage and its "location", I've thought of other
>>interface, sysfs, procfs...
>>But I have no idea.
>
>Why do you want this exported to userspace? There is absolutely no way
>you can get this exported race free without shutting the VM down, and
>without being race free this information has absolutely no meaning !!
No meaning?
Before memory hot-remove, we can guess whether memory is hot-removable
or not.
As you say, this is not atomic and not fully reliable.
After a failed memory hot-remove, finding out why it failed is
very important.
I think that when memory hot-remove fails, the memory area stays isolated until
an operator pushes it back.
So we can get a real snapshot of the specified memory area.
Regards,
-- Kame
* Re: [RFC][PATCH] nameing reserved pages [0/3]
From: Dave Hansen @ 2005-04-20 17:00 UTC
To: Arjan van de Ven
Cc: KAMEZAWA Hiroyuki, Linux Kernel Mailing List,
Hariprasad Nellitheertha [imap]
On Wed, 2005-04-20 at 14:34 +0200, Arjan van de Ven wrote:
> On Wed, 2005-04-20 at 21:02 +0900, KAMEZAWA Hiroyuki wrote:
> > Hi,
> >
> > There are several types of PG_reserved pages,
> > (a) Memory Hole
> > (b) Used by Kernel
> > (c) Set by drivers
> > (d) Isolated by MCA
> > (e) used by perfmon
> > etc....
> >
> > I think it's useful to distinguish many types of PG_reserved pages.
>
> I'm not so sure about this at all.
Neither am I, that's why I hoped somebody would figure out something
better :)
> > For example, Memory Hotplug can ignore (a).
>
> Memory Hotplug can also use page_is_ram().
It uses this, to some degree, internally. But, things like the e820
table don't get updated as memory hotplugs occur.
This should be a way to give more fine-grained information about what pages
are available as RAM at any point in time. kdump would need something
like this to figure out which pages inside of /dev/mem are actually
valid to dump. Here was another approach that used /proc files:
http://lkml.org/lkml/2005/3/24/11
> /dev/memstate really looks like a bad idea to me as well... I rather
> have less than more /dev/*mem*
Any other ideas?
-- Dave
* Re: [RFC][PATCH] nameing reserved pages [0/3]
From: Dave Hansen @ 2005-04-20 17:18 UTC
To: Arjan van de Ven
Cc: KAMEZAWA Hiroyuki, Linux Kernel Mailing List,
Hariprasad Nellitheertha [imap]
On Wed, 2005-04-20 at 16:30 +0200, Arjan van de Ven wrote:
> Why do you want this exported to userspace? There is absolutely no way
> you can get this exported race free without shutting the VM down, and
> without being race free this information has absolutely no meaning !!
> (and when you shut the VM down you really shouldn't depend on userspace
> anymore either)
The two cases where this is expected to be used are not concerned with
races. The first is when a memory remove operation occurs. It first
looks at the hotplug area, and removes all the pages that it can from
the allocator. Then, it sets about migrating all of the other pages
that are being used for things like page cache or anonymous memory.
After that, the question sometimes remains why particular pages can't be
removed. Kame's patch is an attempt to help figure that out.
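The post-mortem step described above could look roughly like the following Python sketch, which scans the state bytes of one memory area and lists the pages that would still block removal. The state values are the ones from the original posting; the grouping into "ignorable" and "removable" sets, and the name `find_blockers`, are assumptions made purely for illustration and are not part of the patch.

```python
# State values from the original posting.
INVALID, COMMON, RESERVED_AT_BOOT = 0xFF, 0x00, 0x02

# Assumption for illustration: holes need no work, and "common" pages
# can be freed or migrated. Real removability depends on much more.
IGNORABLE = {INVALID}
REMOVABLE = {COMMON}


def find_blockers(states, first_pfn):
    """List (pfn, state) pairs that would keep a memory area from being removed."""
    return [
        (first_pfn + i, state)
        for i, state in enumerate(states)
        if state not in IGNORABLE and state not in REMOVABLE
    ]


if __name__ == "__main__":
    # Synthetic snapshot of a five-page area starting at pfn 100.
    area = bytes([INVALID, COMMON, RESERVED_AT_BOOT, COMMON, RESERVED_AT_BOOT])
    for pfn, state in find_blockers(area, 100):
        print("pfn %d blocks removal (state 0x%02x)" % (pfn, state))
```

With more detailed page types, the same scan could report why each page is stuck rather than only that it is reserved.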
That's one reason I suggested having an individual device file for each
of the memory areas that get added or removed. It would keep the
confusion to a minimum, and you'd be more sure that what you were
looking at was information only for the memory area that is *almost*
removed.
I don't know what state the system is in when the kdump folks want to
read this information.
-- Dave