linux-mm.kvack.org archive mirror
* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
       [not found]     ` <CAKbGBLiVqaHEOZx6y4MW4xDTUdKRhVLZXTTGiqYT7vuH2Wgeww@mail.gmail.com>
@ 2014-03-25 20:16       ` Linus Torvalds
  2014-03-31 12:26         ` Mel Gorman
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2014-03-25 20:16 UTC
  To: Steven Noonan, Mel Gorman, Rik van Riel
  Cc: David Vrabel, Andrew Morton, linux-mm

On Mon, Mar 24, 2014 at 8:31 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> Vrabel's comments make me think that revisiting the elimination of the
> _PAGE_NUMA bit implementation would be a good idea... should I CC you
> on this discussion (not sure if you're subscribed to xen-devel, or if
> LKML is a better place for that discussion)?

I detest the PAGE_NUMA games myself, but:

From: David Vrabel <david.vrabel@citrix.com>:
>
> I really do not understand how you're supposed to distinguish between a
> PTE for a PROT_NONE page and one with _PAGE_NUMA -- they're identical.
> i.e., pte_numa() will return true for a PROT_NONE protected page which
> just seems wrong to me.

The way to distinguish PAGE_NUMA from PROTNONE is *supposed* to be by
looking at the vma, and PROTNONE goes together with a vma with
PROT_NONE. That's what the comments in pgtable_types.h say.

However, as far as I can tell, that is pure and utter bullshit.  It's
true that generally handle_mm_fault() shouldn't be called for
PROT_NONE pages, since it will fail the protection checks. However, we
have FOLL_FORCE that overrides those protection checks for things like
ptrace etc. So people have tried to convince me that _PAGE_NUMA works,
but I'm really not at all sure they are right.

I fundamentally think that it was a horrible horrible disaster to make
_PAGE_NUMA alias onto _PAGE_PROTNONE.

But I'm cc'ing the people who tried to convince me otherwise last time
around, to see if they can articulate this mess better now.

The argument *seems* to be that if things are truly PROT_NONE, then
the page will never be touched by page faulting code (and as
mentioned, I think that argument is fundamentally broken), and if it's
PROT_NUMA then the page faulting code will magically do the right
thing.
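To make the intended rule concrete, here is a minimal userspace model. It assumes a hypothetical `struct model_vma` in place of the real vma flags and uses the x86 bit positions quoted later in the thread (P = bit 0, PROTNONE = Global = bit 8). It only illustrates the claim that the vma, not the pte, decides which kind of fault this is; it is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the x86 pte bits under discussion. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_PROTNONE (UINT64_C(1) << 8)   /* reuses the Global bit when P=0 */
#define PTE_NUMA     PTE_PROTNONE         /* the aliasing being debated */

static_assert(PTE_NUMA == PTE_PROTNONE, "the pte alone cannot tell them apart");

/* Hypothetical stand-in for the vma's access rights. */
struct model_vma {
    bool read, write, exec;
};

enum fault_kind { FAULT_OTHER, FAULT_PROTNONE, FAULT_NUMA_HINT };

/* The vma breaks the tie: a vma with no access rights at all is a real
 * PROT_NONE mapping; any other vma means the trapping pte is a NUMA hint. */
static enum fault_kind classify_fault(uint64_t pte, const struct model_vma *vma)
{
    if ((pte & PTE_PRESENT) || !(pte & PTE_PROTNONE))
        return FAULT_OTHER;
    if (!vma->read && !vma->write && !vma->exec)
        return FAULT_PROTNONE;
    return FAULT_NUMA_HINT;
}
```

The disagreement is over whether every path really consults the vma like this. A handler that keys on the pte bits alone would report a NUMA hint for a no-access vma as well, which is the FOLL_FORCE concern raised above.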

FURTHERMORE, the argument was that we can't just call things PROT_NONE
and just say that "those are the semantics", because on other
architectures PROT_NONE might mean/do something else.  Which really
makes no sense either, because if the argument was that PROT_NONE
causes faults that can either be handled as faults (for PROT_NONE) or
as NUMA issues (for NUMA), then dammit, that argument should be
completely architecture-independent.

But I gave up arguing with people. I will state (again) that I think
this is a f*cking mess, and saying that PROTNONE and NUMA are somehow
the exact same thing on x86 but not in general is bogus crap. And
saying that you can determine which it is from the vma is very
debatable too.

Let the people responsible for the crap try to explain why it works
and has to be that mess. Again. Rik, Mel?

             Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-03-25 20:16       ` [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags() Linus Torvalds
@ 2014-03-31 12:26         ` Mel Gorman
  2014-03-31 15:41           ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2014-03-31 12:26 UTC
  To: Linus Torvalds
  Cc: Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, linux-mm

On Tue, Mar 25, 2014 at 01:16:02PM -0700, Linus Torvalds wrote:
> On Mon, Mar 24, 2014 at 8:31 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> > Vrabel's comments make me think that revisiting the elimination of the
> > _PAGE_NUMA bit implementation would be a good idea... should I CC you
> > on this discussion (not sure if you're subscribed to xen-devel, or if
> > LKML is a better place for that discussion)?
> 
> I detest the PAGE_NUMA games myself, but:
> 

First of all, sorry for the slow response, even by my standards. I was at
LSF/MM and the Collaboration Summit all last week and it took up all my
attention. Today is my first day properly back online and trawling through
the inbox mess.

> From: David Vrabel <david.vrabel@citrix.com>:
> >
> > I really do not understand how you're supposed to distinguish between a
> > PTE for a PROT_NONE page and one with _PAGE_NUMA -- they're identical.
> > i.e., pte_numa() will return true for a PROT_NONE protected page which
> > just seems wrong to me.
> 
> The way to distinguish PAGE_NUMA from PROTNONE is *supposed* to be by
> looking at the vma, and PROTNONE goes together with a vma with
> PROT_NONE. That's what the comments in pgtable_types.h say.
> 

This is the expectation. We did not want to even attempt tracking NUMA
hints on a per-VMA basis because the fault handler would go to hell with
the need to fixup vmas.

> However, as far as I can tell, that is pure and utter bullshit.  It's
> true that generally handle_mm_fault() shouldn't be called for
> PROT_NONE pages, since it will fail the protection checks. However, we
> have FOLL_FORCE that overrides those protection checks for things like
> ptrace etc. So people have tried to convince me that _PAGE_NUMA works,
> but I'm really not at all sure they are right.
> 

For FOLL_FORCE, we do not set FOLL_NUMA; this is the relevant chunk:

        /*
         * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault
         * would be called on PROT_NONE ranges. We must never invoke
         * handle_mm_fault on PROT_NONE ranges or the NUMA hinting
         * page faults would unprotect the PROT_NONE ranges if
         * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd
         * bitflag. So to avoid that, don't set FOLL_NUMA if
         * FOLL_FORCE is set.
         */
        if (!(gup_flags & FOLL_FORCE))
                gup_flags |= FOLL_NUMA;

Without FOLL_NUMA, we do not do the "pmd_numa" checks, because they cannot
distinguish between a PROT_NONE pmd and a NUMA pmd: both use identical bits
on x86. This is in follow_page_mask:

        if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
                goto no_page_table;

Without the checks FOLL_FORCE would screw up when it encountered a page
protected for NUMA hinting faults. I recognise that it further muddies
the waters on what _PAGE_NUMA actually means.
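The interaction between the two quoted checks can be modelled in userspace. The flag values and the helper names `gup_flags_for()` and `follow()` are hypothetical; only the two conditions are taken from the code above, and "no page table" is treated, roughly, as "the caller goes down the fault path".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for the kernel's FOLL_* bits. */
enum {
    M_FOLL_FORCE = 1u << 0,
    M_FOLL_NUMA  = 1u << 1,
};

static_assert((M_FOLL_FORCE & M_FOLL_NUMA) == 0, "flags must be distinct");

enum follow_result { FOLLOW_PAGE, FOLLOW_NO_PAGE_TABLE };

/* First chunk: NUMA checks are requested only when not forcing. */
static unsigned int gup_flags_for(unsigned int gup_flags)
{
    if (!(gup_flags & M_FOLL_FORCE))
        gup_flags |= M_FOLL_NUMA;
    return gup_flags;
}

/* Second chunk: a pmd that looks like a NUMA hint is reported as
 * "no page table" only when FOLL_NUMA was requested. */
static enum follow_result follow(unsigned int flags, bool pmd_looks_numa)
{
    if ((flags & M_FOLL_NUMA) && pmd_looks_numa)
        return FOLLOW_NO_PAGE_TABLE;
    return FOLLOW_PAGE;
}
```

In this model a forced access follows a NUMA-looking pmd without a hinting fault. That is safe for a genuine PROT_NONE range, but it also means _PAGE_NUMA is simply ignored on that path, which is the muddying Mel acknowledges.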

A potential alternative would have been to use two pte bits -- _PAGE_NONE
and an unused PTE bit (if there is one) that we'd call _PAGE_NUMA, with
pmd_mknuma setting both. _PAGE_NONE is what would cause a hinting fault,
but we'd use the second bit to distinguish between PROT_NONE and a NUMA
hinting fault. I doubt the end result would be much cleaner, though, and
it would be a mess.
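A sketch of that two-bit idea, assuming bit 8 stays the trapping protnone bit and picking bit 62 purely as a hypothetical spare bit (the text above leaves the choice open). The helper names are illustrative, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit 8 traps (the existing protnone bit), bit 62
 * is an assumed spare software bit used only to disambiguate. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_PROTNONE (UINT64_C(1) << 8)
#define PTE_NUMA     (UINT64_C(1) << 62)

static_assert((PTE_PROTNONE & PTE_NUMA) == 0, "the two bits must not alias");

/* pmd_mknuma-like helper: clear P so the access traps, set both bits. */
static uint64_t model_mknuma(uint64_t pte)
{
    return (pte & ~PTE_PRESENT) | PTE_PROTNONE | PTE_NUMA;
}

/* Both kinds of entry trap on the protnone bit; only the extra bit
 * says which one it was, so no vma lookup is needed. */
static bool model_is_numa(uint64_t pte)
{
    return !(pte & PTE_PRESENT) && (pte & PTE_NUMA);
}

static bool model_is_protnone(uint64_t pte)
{
    return !(pte & PTE_PRESENT) && (pte & PTE_PROTNONE) && !(pte & PTE_NUMA);
}
```

The cost is a second bit that every other not-present encoding must also keep clear.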

Another alternative is to simply not allow NUMA_BALANCING on Xen. It's not
even clear what it means as the Xen NUMA topology may or may not correspond
to underlying physical nodes. It's even less clear what happens if both
guest and host use automatic balancing.

> I fundamentally think that it was a horrible horrible disaster to make
> _PAGE_NUMA alias onto _PAGE_PROTNONE.
> 

We did not have much of a choice. We needed something that would trap a
fault and _PAGE_PROTNONE is not available on all architectures. ppc64
reused _PAGE_COHERENT for example.

> But I'm cc'ing the people who tried to convince me otherwise last time
> around, to see if they can articulate this mess better now.
> 
> The argument *seems* to be that if things are truly PROT_NONE, then
> the page will never be touched by page faulting code (and as
> mentioned, I think that argument is fundamentally broken), and if it's
> PROT_NUMA then the page faulting code will magically do the right
> thing.
> 

This is essentially the argument with the addendum that follow_page is
meant to avoid trying pmd_numa checks on FOLL_FORCE.

> FURTHERMORE, the argument was that we can't just call things PROT_NONE
> and just say that "those are the semantics", because on other
> architectures PROT_NONE might mean/do something else.

Or that the equivalent of _PAGE_PROTNONE did not exist and was
implemented by some other means.

> Which really
> makes no sense either, because if the argument was that PROT_NONE
> causes faults that can either be handled as faults (for PROT_NONE) or
> as NUMA issues (for NUMA), then dammit, that argument should be
> completely architecture-independent.
> 
> But I gave up arguing with people. I will state (again) that I think
> this is a f*cking mess, and saying that PROTNONE and NUMA are somehow
> the exact same thing on x86 but not in general is bogus crap. And
> saying that you can determine which it is from the vma is very
> debatable too.
> 

Ok, so how do you suggest that _PAGE_NUMA could have been implemented
that did *not* use _PAGE_PROTNONE on x86, trapped a fault and was not
expensive as hell to handle?

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-03-31 12:26         ` Mel Gorman
@ 2014-03-31 15:41           ` Linus Torvalds
  2014-03-31 16:10             ` Linus Torvalds
  2014-04-01 18:18             ` David Vrabel
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2014-03-31 15:41 UTC
  To: Mel Gorman
  Cc: Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, linux-mm

On Mon, Mar 31, 2014 at 5:26 AM, Mel Gorman <mgorman@suse.de> wrote:
>
> Ok, so how do you suggest that _PAGE_NUMA could have been implemented
> that did *not* use _PAGE_PROTNONE on x86, trapped a fault and was not
> expensive as hell to handle?

So on x86, the obvious model is to use another bit. We've got several.
The _PAGE_NUMA case only matters for when _PAGE_PRESENT is clear, and
when that bit is clear the hardware doesn't care about any of the
other bits. Currently we use:

  #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
  #define _PAGE_BIT_FILE          _PAGE_BIT_DIRTY

which are bits 8 and 6 respectively, afaik.

and the only rule is that (a) we should *not* use a bit we already use
when the page is not present (since that is ambiguous!) and (b) we
should *not* use a bit that is used by the swap index cases. I think
bit 7 should work, but maybe I missed something.

Can somebody tell me why _PAGE_NUMA is *not* that bit seven? Make
"pte_present()" on x86 just check all of the present/numa/protnone
bits, and if any of them is set, it's a "present" page.
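The bit-seven proposal can be sketched as a userspace model, assuming the bit numbers given above (FILE = 6, NUMA = 7, PROTNONE = 8). The `model_*` helpers are hypothetical, and the swap/file offset encodings covered by rule (b) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the proposed P=0 encodings on x86. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_FILE     (UINT64_C(1) << 6)   /* aliases _PAGE_BIT_DIRTY today */
#define PTE_NUMA     (UINT64_C(1) << 7)   /* the proposed "bit seven" */
#define PTE_PROTNONE (UINT64_C(1) << 8)   /* aliases _PAGE_BIT_GLOBAL today */

static_assert((PTE_NUMA & (PTE_FILE | PTE_PROTNONE | PTE_PRESENT)) == 0,
              "NUMA must not reuse a bit already meaningful when P=0");

/* "Present" from the kernel's point of view: any of the three bits. */
static bool model_pte_present(uint64_t pte)
{
    return (pte & (PTE_PRESENT | PTE_PROTNONE | PTE_NUMA)) != 0;
}

/* The NUMA and PROTNONE markers only mean something when P=0; with
 * P=1 those positions are hardware bits (PAT and Global). */
static bool model_pte_numa(uint64_t pte)
{
    return (pte & (PTE_PRESENT | PTE_NUMA)) == PTE_NUMA;
}

static bool model_pte_protnone(uint64_t pte)
{
    return (pte & (PTE_PRESENT | PTE_PROTNONE)) == PTE_PROTNONE;
}
```

Because NUMA and PROTNONE now have separate bits, the two predicates never both report true for the same entry.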

Now, unlike x86, some other architectures do *not* have free bits, so
there may be problems elsewhere.

            Linus

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-03-31 15:41           ` Linus Torvalds
@ 2014-03-31 16:10             ` Linus Torvalds
  2014-03-31 16:27               ` Cyrill Gorcunov
  2014-04-01 18:18             ` David Vrabel
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2014-03-31 16:10 UTC
  To: Mel Gorman, Peter Anvin, Ingo Molnar, the arch/x86 maintainers
  Cc: Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
	Peter Zijlstra, linux-mm, Cyrill Gorcunov

[ Adding x86 maintainers - Ingo was involved earlier, make it more explicit ]

On Mon, Mar 31, 2014 at 8:41 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So on x86, the obvious model is to use another bit. We've got several.
> The _PAGE_NUMA case only matters for when _PAGE_PRESENT is clear, and
> when that bit is clear the hardware doesn't care about any of the
> other bits. Currently we use:
>
>   #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
>   #define _PAGE_BIT_FILE          _PAGE_BIT_DIRTY
>
> which are bits 8 and 6 respectively, afaik.

Side note to the x86 guys: I think it was a mistake (long long long
ago) to define these "valid when not present" bits in terms of the
"valid when present" bits.

It causes insane situations like this:

  #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE

which makes no sense *except* if you think that those bits can have
random odd hardware-defined values. But they really can't. They are
just random bit numbers we chose.

It has *also* caused horrible pain with the whole "soft dirty" thing,
and we have absolutely ridiculous macros in pgtable-2level.h for the
insane soft-dirty case, trying to keep the swap bits spread out "just
right" to make soft-dirty (_PAGE_BIT_HIDDEN aka bit 11) not alias with
the bits we use for swap offsets etc.

So how about we just say:

 - define the bits we use when PAGE_PRESENT==0 separately and explicitly

 - clean up the crazy soft-dirty crap, preferably by just making it
depend on a 64-bit pte (so you have to have X86_PAE enabled or be on
x86-64)

that would sound like a good cleanup, because right now it's a
complete nightmare to think about which bits are used how when P is 0.
The above insane #if being the prime example of that confusion.
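One way to read "define the bits we use when PAGE_PRESENT==0 separately and explicitly" is a separate namespace for not-present bits, with overlaps rejected at compile time. This is a minimal 64-bit sketch, assuming the FILE/NUMA/PROTNONE numbering Cyrill floats later in the thread; the swap type/offset layout and all `NP_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BIT64(n) (UINT64_C(1) << (n))

/* Hardware-defined meaning, valid only when P=1 (subset). */
#define HW_BIT_PRESENT 0

/* Software meaning, valid only when P=0: numbered on its own instead of
 * being defined as aliases of hardware bits.  Swap layout is hypothetical. */
#define NP_BIT_FILE         1
#define NP_BIT_NUMA         2
#define NP_BIT_PROTNONE     3
#define NP_SWP_TYPE_SHIFT   9
#define NP_SWP_TYPE_BITS    5
#define NP_SWP_OFFSET_SHIFT (NP_SWP_TYPE_SHIFT + NP_SWP_TYPE_BITS)

#define NP_FLAG_MASK (BIT64(NP_BIT_FILE) | BIT64(NP_BIT_NUMA) | BIT64(NP_BIT_PROTNONE))
#define NP_SWP_MASK  (~UINT64_C(0) << NP_SWP_TYPE_SHIFT)

/* Overlaps are caught by the compiler instead of by reasoning about
 * "#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE". */
static_assert(NP_BIT_FILE != NP_BIT_NUMA && NP_BIT_NUMA != NP_BIT_PROTNONE &&
              NP_BIT_FILE != NP_BIT_PROTNONE, "flags must be distinct");
static_assert((NP_FLAG_MASK & BIT64(HW_BIT_PRESENT)) == 0, "P must stay clear");
static_assert((NP_FLAG_MASK & NP_SWP_MASK) == 0, "flags overlap swap fields");

/* type must fit in NP_SWP_TYPE_BITS bits */
static uint64_t np_swp_entry(unsigned int type, uint64_t offset)
{
    return ((uint64_t)type << NP_SWP_TYPE_SHIFT) | (offset << NP_SWP_OFFSET_SHIFT);
}

static unsigned int np_swp_type(uint64_t pte)
{
    return (unsigned int)(pte >> NP_SWP_TYPE_SHIFT) & ((1u << NP_SWP_TYPE_BITS) - 1);
}

static uint64_t np_swp_offset(uint64_t pte)
{
    return pte >> NP_SWP_OFFSET_SHIFT;
}
```

The 2-level (32-bit, non-PAE) case, where the bits are much scarcer, is deliberately left out, in line with the suggestion to require a 64-bit pte.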

             Linus

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-03-31 16:10             ` Linus Torvalds
@ 2014-03-31 16:27               ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2014-03-31 16:27 UTC
  To: Linus Torvalds
  Cc: Mel Gorman, Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
	Peter Zijlstra, linux-mm

On Mon, Mar 31, 2014 at 09:10:07AM -0700, Linus Torvalds wrote:
> 
> It causes insane situations like this:
> 
>   #if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
> 
> which makes no sense *except* if you think that those bits can have
> random odd hardware-defined values. But they really can't. They are
> just random bit numbers we chose.

I never understood this #if (I asked about it once but got no reply).

> It has *also* caused horrible pain with the whole "soft dirty" thing,
> and we have absolutely ridiculous macros in pgtable-2level.h for the
> insane soft-dirty case, trying to keep the swap bits spread out "just
> right" to make soft-dirty (_PAGE_BIT_HIDDEN aka bit 11) not alias with
> the bits we use for swap offsets etc.
> 
> So how about we just say:
> 
>  - define the bits we use when PAGE_PRESENT==0 separately and explicitly
> 
>  - clean up the crazy soft-dirty crap, preferably by just making it
> depend on a 64-bit pte (so you have to have X86_PAE enabled or be on
> x86-64)

Sounds good to me; I'll try my best (if no one objects).

> 
> that would sound like a good cleanup, because right now it's a
> complete nightmare to think about which bits are used how when P is 0.
> The above insane #if being the prime example of that confusion.

	Cyrill

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-03-31 15:41           ` Linus Torvalds
  2014-03-31 16:10             ` Linus Torvalds
@ 2014-04-01 18:18             ` David Vrabel
  2014-04-01 18:43               ` Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: David Vrabel @ 2014-04-01 18:18 UTC
  To: Linus Torvalds
  Cc: Mel Gorman, Steven Noonan, Rik van Riel, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, linux-mm

On 31/03/14 16:41, Linus Torvalds wrote:
> On Mon, Mar 31, 2014 at 5:26 AM, Mel Gorman <mgorman@suse.de> wrote:
>>
>> Ok, so how do you suggest that _PAGE_NUMA could have been implemented
>> that did *not* use _PAGE_PROTNONE on x86, trapped a fault and was not
>> expensive as hell to handle?
> 
> So on x86, the obvious model is to use another bit. We've got several.
> The _PAGE_NUMA case only matters for when _PAGE_PRESENT is clear, and
> when that bit is clear the hardware doesn't care about any of the
> other bits. Currently we use:
> 
>   #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
>   #define _PAGE_BIT_FILE          _PAGE_BIT_DIRTY
> 
> which are bits 8 and 6 respectively, afaik.
> 
> and the only rule is that (a) we should *not* use a bit we already use
> when the page is not present (since that is ambiguous!) and (b) we
> should *not* use a bit that is used by the swap index cases. I think
> bit 7 should work, but maybe I missed something.

I don't think it's sufficient to avoid collisions with bits used only
with P=0.  The original value of this bit must be retained when the
_PAGE_NUMA bit is set/cleared.

Bit 7 is PAT[2] and, whilst Linux currently sets up the PAT such that
PAT[2] is a 'don't care', there has been talk of adjusting the PAT to
include more types. So I'm not sure it's a good idea to use bit 7.
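The "original value must be retained" point can be shown with a small userspace model, assuming bit 7 is PAT[2] on a present pte and is reused as the NUMA marker once P is cleared. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_PAT     (UINT64_C(1) << 7)   /* PAT[2] when P=1 */
#define PTE_NUMA_B7 PTE_PAT              /* the bit-7 proposal */

static_assert(PTE_NUMA_B7 == PTE_PAT, "the marker overwrites PAT[2]");

/* Mark for a hinting fault: clear P, set bit 7. */
static uint64_t mknuma_bit7(uint64_t pte)
{
    return (pte & ~PTE_PRESENT) | PTE_NUMA_B7;
}

/* Undo after the fault: clear bit 7, set P.  Whatever PAT[2] was
 * before mknuma_bit7() is no longer known here. */
static uint64_t mknonnuma_bit7(uint64_t pte)
{
    return (pte & ~PTE_NUMA_B7) | PTE_PRESENT;
}
```

The round trip only gives back the original pte when PAT[2] was already 0, which is harmless today only because the current PAT setup makes that bit a "don't care".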

What's wrong with using e.g., bit 62? And not supporting this NUMA
rebalancing feature on 32-bit non-PAE builds?

David

> Can somebody tell me why _PAGE_NUMA is *not* that bit seven? Make
> "pte_present()" on x86 just check all of the present/numa/protnone
> bits, and if any of them is set, it's a "present" page.
> 
> Now, unlike x86, some other architectures do *not* have free bits, so
> there may be problems elsewhere.
> 
>             Linus

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-04-01 18:18             ` David Vrabel
@ 2014-04-01 18:43               ` Linus Torvalds
  2014-04-01 19:03                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2014-04-01 18:43 UTC
  To: David Vrabel
  Cc: Mel Gorman, Steven Noonan, Rik van Riel, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, linux-mm

On Tue, Apr 1, 2014 at 11:18 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>
> I don't think it's sufficient to avoid collisions with bits used only
> with P=0.  The original value of this bit must be retained when the
> _PAGE_NUMA bit is set/cleared.
>
> Bit 7 is PAT[2] and whilst Linux currently sets up the PAT such that
> PAT[2] is a 'don't care', there has been talk up adjusting the PAT to
> include more types. So I'm not sure it's a good idea to use bit 7.
>
> What's wrong with using e.g., bit 62? And not supporting this NUMA
> rebalancing feature on 32-bit non-PAE builds?

Sounds good to me, but it's not available in 32-bit PAE. The high bits
are all reserved, afaik.

But you'd have to be insane to care about NUMA balancing on 32-bit,
even with PAE. So restricting it to x86-64 and using the high bits (I
think bits 52-62 are all available to SW) sounds fine to me.

Same goes for soft-dirty. I think it's fine if we say that you won't
have soft-dirty with a 32-bit kernel. Even with PAE.
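A sketch of the x86-64-only variant, assuming (as the "I think" above suggests) that bits 52-62 are ignored by the hardware, and using bit 62 as in David's example. Unlike the bit-7 variant, the marker sits outside every hardware-defined low bit, so PAT[2] survives a round trip. Names are hypothetical, and bit 62 is assumed to be otherwise unused.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 52..62 inclusive, i.e. 0x7FF0000000000000. */
#define SW_HIGH_MASK (((UINT64_C(1) << 63) - 1) & ~((UINT64_C(1) << 52) - 1))
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_PAT      (UINT64_C(1) << 7)
#define PTE_NUMA_HI  (UINT64_C(1) << 62)   /* David's example bit */

static_assert(SW_HIGH_MASK == UINT64_C(0x7FF0000000000000), "bits 52-62");
static_assert((PTE_NUMA_HI & SW_HIGH_MASK) == PTE_NUMA_HI, "marker is a SW bit");

/* Mark for a hinting fault: clear P, set the high software bit. */
static uint64_t mknuma_hi(uint64_t pte)
{
    return (pte & ~PTE_PRESENT) | PTE_NUMA_HI;
}

/* Undo: low hardware bits, including PAT[2], were never touched. */
static uint64_t mknonnuma_hi(uint64_t pte)
{
    return (pte & ~PTE_NUMA_HI) | PTE_PRESENT;
}
```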

                Linus

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-04-01 18:43               ` Linus Torvalds
@ 2014-04-01 19:03                 ` Cyrill Gorcunov
  2014-04-02 11:33                   ` Pavel Emelyanov
  0 siblings, 1 reply; 10+ messages in thread
From: Cyrill Gorcunov @ 2014-04-01 19:03 UTC
  To: Linus Torvalds, Pavel Emelyanov
  Cc: David Vrabel, Mel Gorman, Steven Noonan, Rik van Riel,
	Andrew Morton, Ingo Molnar, Peter Zijlstra, linux-mm

On Tue, Apr 01, 2014 at 11:43:11AM -0700, Linus Torvalds wrote:
> On Tue, Apr 1, 2014 at 11:18 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> >
> > I don't think it's sufficient to avoid collisions with bits used only
> > with P=0.  The original value of this bit must be retained when the
> > _PAGE_NUMA bit is set/cleared.
> >
> > Bit 7 is PAT[2] and whilst Linux currently sets up the PAT such that
> > PAT[2] is a 'don't care', there has been talk up adjusting the PAT to
> > include more types. So I'm not sure it's a good idea to use bit 7.
> >
> > What's wrong with using e.g., bit 62? And not supporting this NUMA
> > rebalancing feature on 32-bit non-PAE builds?
> 
> Sounds good to me, but it's not available in 32-bit PAE. The high bits
> are all reserved, afaik.
> 
> But you'd have to be insane to care about NUMA balancing on 32-bit,
> even with PAE. So restricting it to x86-64 and using the high bits (I
> think bits 52-62 are all available to SW) sounds fine to me.
> 
> Same goes for soft-dirty. I think it's fine if we say that you won't
> have soft-dirty with a 32-bit kernel. Even with PAE.

Well, at the moment CRIU uses soft-dirty on x86-64 only, but there
were plans to implement full 32-bit support as well. While I personally
don't mind dropping soft-dirty for the non-x86-64 case, I would like
to hear Pavel's opinion. Pavel?

(N.B.: I'm still working on cleaning up the _PAGE bits; it turned out
 to be harder than I expected.)

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-04-01 19:03                 ` Cyrill Gorcunov
@ 2014-04-02 11:33                   ` Pavel Emelyanov
  2014-04-02 13:29                     ` Cyrill Gorcunov
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Emelyanov @ 2014-04-02 11:33 UTC
  To: Cyrill Gorcunov
  Cc: Linus Torvalds, David Vrabel, Mel Gorman, Steven Noonan,
	Rik van Riel, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	linux-mm

On 04/01/2014 11:03 PM, Cyrill Gorcunov wrote:
> On Tue, Apr 01, 2014 at 11:43:11AM -0700, Linus Torvalds wrote:
>> On Tue, Apr 1, 2014 at 11:18 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>>
>>> I don't think it's sufficient to avoid collisions with bits used only
>>> with P=0.  The original value of this bit must be retained when the
>>> _PAGE_NUMA bit is set/cleared.
>>>
>>> Bit 7 is PAT[2] and whilst Linux currently sets up the PAT such that
>>> PAT[2] is a 'don't care', there has been talk up adjusting the PAT to
>>> include more types. So I'm not sure it's a good idea to use bit 7.
>>>
>>> What's wrong with using e.g., bit 62? And not supporting this NUMA
>>> rebalancing feature on 32-bit non-PAE builds?
>>
>> Sounds good to me, but it's not available in 32-bit PAE. The high bits
>> are all reserved, afaik.
>>
>> But you'd have to be insane to care about NUMA balancing on 32-bit,
>> even with PAE. So restricting it to x86-64 and using the high bits (I
>> think bits 52-62 are all available to SW) sounds fine to me.
>>
>> Same goes for soft-dirty. I think it's fine if we say that you won't
>> have soft-dirty with a 32-bit kernel. Even with PAE.
> 
> Well, at the moment we use soft-dirty for x86-64 only in criu but there
> were plans to implement complete 32bit support as well. While personally
> I don't mind dropping soft-dirty for non x86-64 case, I would like
> to hear Pavel's opinion, Pavel?

We (Parallels) have no plans for C/R on 32-bit kernels, but I speak only
for Parallels. That said, the people I know who need 32-bit C/R use ARM :)

> (n.b, i'm still working on cleaning up _page bits, it appeared to
>  be harder than I've been expecting).

Thanks,
Pavel

* Re: [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags()
  2014-04-02 11:33                   ` Pavel Emelyanov
@ 2014-04-02 13:29                     ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2014-04-02 13:29 UTC
  To: Pavel Emelyanov
  Cc: Linus Torvalds, David Vrabel, Mel Gorman, Steven Noonan,
	Rik van Riel, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	linux-mm

On Wed, Apr 02, 2014 at 03:33:48PM +0400, Pavel Emelyanov wrote:
...
> >>
> >> But you'd have to be insane to care about NUMA balancing on 32-bit,
> >> even with PAE. So restricting it to x86-64 and using the high bits (I
> >> think bits 52-62 are all available to SW) sounds fine to me.
> >>
> >> Same goes for soft-dirty. I think it's fine if we say that you won't
> >> have soft-dirty with a 32-bit kernel. Even with PAE.
> > 
> > Well, at the moment we use soft-dirty for x86-64 only in criu but there
> > were plans to implement complete 32bit support as well. While personally
> > I don't mind dropping soft-dirty for non x86-64 case, I would like
> > to hear Pavel's opinion, Pavel?
> 
> We (Parallels) don't have plans on C/R on 32-bit kernels, but I speak only
> for Parallels. However, people I know who need 32-bit C/R use ARM :)

OK, since it's x86-specific, I can prepare a patch dropping soft-dirty on
x86-32 (this will simplify the ugly file-mapping macros a bit, though not
significantly).

Guys, while looking into how to redefine the _PAGE bits for the case where
the present bit is clear, I thought about something like

#define _PAGE_BIT_FILE		(_PAGE_BIT_PRESENT + 1)	/* _PAGE_BIT_RW */
#define _PAGE_BIT_NUMA		(_PAGE_BIT_PRESENT + 2)	/* _PAGE_BIT_USER */
#define _PAGE_BIT_PROTNONE	(_PAGE_BIT_PRESENT + 3)	/* _PAGE_BIT_PWT */

While the _PAGE_BIT_FILE case should work (as should swap pages), I'm less
sure about the NUMA and PROTNONE cases. I fear there are code paths that
depend on the old bit positions, i.e. on

	_PAGE_BIT_PROTNONE = _PAGE_BIT_NUMA = _PAGE_BIT_GLOBAL.

One user of _PAGE_BIT_GLOBAL is the page-attribute code. It seems to always
check _PAGE_BIT_PRESENT together with _PAGE_BIT_GLOBAL, so redefining
_PAGE_BIT_PROTNONE to a new value should not break it. The main concern is
therefore the PROTNONE + NUMA code, which I must admit I don't know well
enough yet.
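Why a Present-plus-Global check survives the move can be shown with a small model, assuming PROTNONE moves from bit 8 (Global) to bit 3 as in the defines above. The careless variant is a hypothetical example of the kind of code path that would silently depend on the old aliasing; it is not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_GLOBAL   (UINT64_C(1) << 8)
#define OLD_PROTNONE PTE_GLOBAL            /* today: aliases Global */
#define NEW_PROTNONE (UINT64_C(1) << 3)    /* proposal: _PAGE_BIT_PRESENT + 3 */

static_assert(OLD_PROTNONE != NEW_PROTNONE, "the proposal moves the bit");

/* Style described for the page-attribute code: Global only counts
 * together with Present. */
static bool global_checked_with_present(uint64_t pte)
{
    return (pte & (PTE_PRESENT | PTE_GLOBAL)) == (PTE_PRESENT | PTE_GLOBAL);
}

/* Hypothetical careless check: tests bit 8 alone, so its answer for a
 * PROTNONE entry changes when the bit is moved. */
static bool global_checked_alone(uint64_t pte)
{
    return (pte & PTE_GLOBAL) != 0;
}
```

The first helper gives the same answer for a PROTNONE entry before and after the move; the second does not, which is the class of code path worth auditing in the PROTNONE and NUMA code.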

end of thread, other threads:[~2014-04-02 13:29 UTC | newest]

Thread overview: 10+ messages
     [not found] <1395425902-29817-1-git-send-email-david.vrabel@citrix.com>
     [not found] ` <1395425902-29817-3-git-send-email-david.vrabel@citrix.com>
     [not found]   ` <533016CB.4090807@citrix.com>
     [not found]     ` <CAKbGBLiVqaHEOZx6y4MW4xDTUdKRhVLZXTTGiqYT7vuH2Wgeww@mail.gmail.com>
2014-03-25 20:16       ` [PATCH 2/2] x86: use pv-ops in {pte,pmd}_{set,clear}_flags() Linus Torvalds
2014-03-31 12:26         ` Mel Gorman
2014-03-31 15:41           ` Linus Torvalds
2014-03-31 16:10             ` Linus Torvalds
2014-03-31 16:27               ` Cyrill Gorcunov
2014-04-01 18:18             ` David Vrabel
2014-04-01 18:43               ` Linus Torvalds
2014-04-01 19:03                 ` Cyrill Gorcunov
2014-04-02 11:33                   ` Pavel Emelyanov
2014-04-02 13:29                     ` Cyrill Gorcunov
