linux-mm.kvack.org archive mirror
* Re: your mail
@ 2000-03-28  8:19 pnilesh
  2000-03-28 13:26 ` Stephen C. Tweedie
  0 siblings, 1 reply; 54+ messages in thread
From: pnilesh @ 2000-03-28  8:19 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: linux-mm




No, if both processes have faulted the page into their ptes, it will
be 2. The page count is normally the number of references from user
ptes, plus any long- or short-term holds kernel code establishes on the
page.

I was confused because Maurice Bach increases the region reference count
when a region (say, text) is shared among more than one process, not the
page reference count.

One more thing: if the process incurs a page fault on a text page, it calls
file_no_page().
From what you said, the page count should be incremented in this case, but
nowhere in this function could I see the page count getting incremented.
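
For concreteness, a small userspace sketch of the two-ptes case discussed
above (illustrative only, not from the thread): once both the parent and
the child touch the mapping, the shared page-cache page is referenced from
two ptes, giving the count of 2, plus any transient kernel holds.

/* Sketch: two processes fault the same file-backed page into their ptes. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/bin/sh", O_RDONLY);	/* any file-backed "text" */
	char *p;
	volatile char c;

	if (fd < 0)
		return 1;
	p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	if (fork() == 0) {
		c = p[0];		/* child faults the page in */
		_exit(0);
	}
	c = p[0];			/* parent faults the same page */
	(void)c;
	wait(NULL);
	munmap(p, 4096);
	close(fd);
	return 0;
}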


>
> Q    When a page of a file is in the page hash queue, does this page
> have a page table entry in any process?

Possibly, if the file is mmaped into some other process.

> Q     Can this be discarded right away, if the need arises?
>
At a minimum, you need to write the modified contents back to disk, if
the file page has not already been discarded.

David Rusling's book says that when reducing the page cache and buffer
cache, the page table entries are not modified and the pages can be
dropped directly.

Kanoj

> Nilesh Patel





* [PATCH v7 0/7]  mseal system mappings
@ 2025-02-24 22:52 jeffxu
  2025-02-25 15:18 ` Lorenzo Stoakes
  0 siblings, 1 reply; 54+ messages in thread
From: jeffxu @ 2025-02-24 22:52 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
	ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
	willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
	rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
	ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
	aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu

From: Jeff Xu <jeffxu@chromium.org>

This is the V7 version, addressing comments from V6, with no code logic
changes.

--------------------------------------------------

History:
V7:
 - Remove cover letter from the first patch (Liam R. Howlett)
 - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
 - Fix logging and fclose() in selftest (Liam R. Howlett)

V6:
  https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/

V5:
  https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/

V4:
  https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/

V3:
  https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/

V2:
  https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/

V1:
  https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/

--------------------------------------------------
As discussed during the mseal() upstream process [1], mseal() protects
the VMAs of a given virtual memory range against modifications, such
as the read/write (RW) and no-execute (NX) bits. For complete
descriptions of memory sealing, please see mseal.rst [2].

mseal() is useful for mitigating memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees, since read-only memory that is supposed to be trusted can
become writable, or .text pages can get remapped.

System mappings are read-only; memory sealing can protect them from
ever being changed to writable, or unmapped/remapped with different
attributes.

System mappings such as vdso, vvar, sigpage (arm), and vectors (arm)
are created by the kernel during program initialization, and can be
sealed after creation.

Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [3], so it can be sealed from creation.

The vsyscall on x86-64 uses a special address (0xffffffffff600000),
which is outside the mm-managed range. This means mprotect, munmap, and
mremap won't work on the vsyscall. Since sealing doesn't enhance
the vsyscall's security, it is skipped in this patch. If we ever seal
the vsyscall, it would probably be only for decorative purposes, i.e.
showing the 'sl' flag in /proc/pid/smaps. For this patch, it is ignored.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML (User Mode Linux),
gVisor, and rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.

To support msealing of system mappings, architectures must define
CONFIG_ARCH_HAS_MSEAL_SYSTEM_MAPPINGS and update their special-mapping
calls to pass the mseal flag. Additionally, architectures must confirm
they do not unmap/remap system mappings during the process lifetime.
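
For illustration, a minimal sketch of the architecture-side call
(hypothetical fragment: the exact VM flags and mapping structure vary per
architecture, and "mm", "vdso_base", "vdso_len" and "vdso_mapping" are
placeholder names; VM_SEALED_SYSMAP is the macro introduced in patch 1):

/* Sketch only: an architecture installs its vdso with the sealing flag.
 * VM_SEALED_SYSMAP is expected to resolve to the sealing VM flag when
 * CONFIG_MSEAL_SYSTEM_MAPPINGS is enabled, and to VM_NONE otherwise. */
struct vm_area_struct *vma;

vma = _install_special_mapping(mm, vdso_base, vdso_len,
			       VM_READ | VM_EXEC |
			       VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC |
			       VM_SEALED_SYSMAP,
			       &vdso_mapping);
if (IS_ERR(vma))
	return PTR_ERR(vma);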

In this version, we've improved the handling of system mapping sealing
over previous versions: instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architectures that require it. This targeted approach offers two key
advantages: 1) it limits the code change's impact to the necessary
architectures, and 2) it aligns with the software architecture by keeping
the core memory management within the mm layer while delegating the
decision to seal system mappings to the individual architectures, which
is particularly relevant since 32-bit architectures never require sealing.

Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for the vdso
  (smaller than its actual size). The dynamic linker, which relies on
  PT_LOAD information to determine mapping size, would then split and
  partially seal the vdso mapping. Since each architecture has its own
  vdso/vvar code, fixing this in the kernel would require going through
  each architecture. Our initial goal was to enable sealing of read-only
  mappings, e.g. .text, across all architectures; sealing the vdso from
  the kernel at creation appears simpler than sealing it in glibc.
- The [vvar] mapping header only contains address information, not
  length information. Similar issues might exist for other special
  mappings.
- Mappings like uprobe are not covered by the dynamic linker, and there
  is no effective solution for them.

This feature's security enhancements will benefit ChromeOS, Android,
and other high security systems.

Testing:
This feature was tested on ChromeOS and Android, for both x86-64 and ARM64.
- Enabled sealing and verified that vdso/vvar, sigpage, and vectors are
  sealed properly, i.e. "sl" is shown in smaps for those mappings and
  mremap is blocked.
- Passed various automation tests (e.g. pre-checkin) on ChromeOS and
  Android to ensure the sealing doesn't affect the functionality of
  Chromebooks and Android phones.

I also tested the feature on Ubuntu on x86-64:
- With the config disabled, vdso/vvar is not sealed.
- With the config enabled, vdso/vvar is sealed, booting Ubuntu works, and
  normal operations such as browsing the web and opening/editing documents
  are OK.

In addition, Benjamin Berg tested this on UML.
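
For reference, a minimal standalone check along the lines of the selftest
in patch 2 (a sketch, not the actual selftest): it locates the vdso in
/proc/self/maps and tries to mremap it, which should fail once the mapping
is sealed. Run it in a throwaway process, since moving an unsealed vdso
can break the process.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];
	unsigned long start = 0, end = 0;
	void *dst, *p;

	while (f && fgets(line, sizeof(line), f)) {
		if (strstr(line, "[vdso]")) {
			sscanf(line, "%lx-%lx", &start, &end);
			break;
		}
	}
	if (!start)
		return 1;

	/* Use a private destination so MREMAP_FIXED cannot clobber an
	 * unrelated mapping. */
	dst = mmap(NULL, end - start, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED)
		return 1;
	p = mremap((void *)start, end - start, end - start,
		   MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	printf("mremap(vdso): %s\n",
	       p == MAP_FAILED ? "blocked" : "succeeded");
	return 0;
}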

Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]




Jeff Xu (7):
  mseal, system mappings: kernel config and header change
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal, system mappings: enable x86-64
  mseal, system mappings: enable arm64
  mseal, system mappings: enable uml architecture
  mseal, system mappings: uprobe mapping
  mseal, system mappings: update mseal.rst

 Documentation/userspace-api/mseal.rst         |  7 +++
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/kernel/vdso.c                      | 22 +++++++---
 arch/um/Kconfig                               |  1 +
 arch/x86/Kconfig                              |  1 +
 arch/x86/entry/vdso/vma.c                     | 16 ++++---
 arch/x86/um/vdso/vma.c                        |  6 ++-
 include/linux/mm.h                            | 10 +++++
 init/Kconfig                                  | 18 ++++++++
 kernel/events/uprobes.c                       |  5 ++-
 security/Kconfig                              | 18 ++++++++
 .../testing/selftests/x86/test_mremap_vdso.c  | 43 +++++++++++++++++++
 12 files changed, 132 insertions(+), 16 deletions(-)

-- 
2.48.1.658.g4767266eb4-goog



* [PATCH] maple_tree: Fix a few documentation issues, 
@ 2023-05-10 19:01 Thomas Gleixner
  2023-05-15 19:27 ` your mail Liam R. Howlett
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2023-05-10 19:01 UTC (permalink / raw)
  To: LKML; +Cc: Liam R. Howlett, Matthew Wilcox, linux-mm, Shanker Donthineni

The documentation of mt_next() claims that it starts the search at the
provided index. That's incorrect as it starts the search after the provided
index.

The documentation of mt_find() is slightly confusing. "Handles locking" is
not really helpful as it does not explain how the "locking" works. Also, the
documentation of @index talks about a range, while in reality @index
is updated on a successful search to the index of the found entry plus one.

Fix similar issues for mt_find_after() and mt_prev().

Remove the completely confusing and pointless "Note: Will not return the
zero entry." comment from mt_for_each() and document @__index correctly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/maple_tree.h |    4 +---
 lib/maple_tree.c           |   23 ++++++++++++++++++-----
 2 files changed, 19 insertions(+), 8 deletions(-)

--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -659,10 +659,8 @@ void *mt_next(struct maple_tree *mt, uns
  * mt_for_each - Iterate over each entry starting at index until max.
  * @__tree: The Maple Tree
  * @__entry: The current entry
- * @__index: The index to update to track the location in the tree
+ * @__index: The index to start the search from. Subsequently used as iterator.
  * @__max: The maximum limit for @index
- *
- * Note: Will not return the zero entry.
  */
 #define mt_for_each(__tree, __entry, __index, __max) \
 	for (__entry = mt_find(__tree, &(__index), __max); \
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5947,7 +5947,10 @@ EXPORT_SYMBOL_GPL(mas_next);
  * @index: The start index
  * @max: The maximum index to check
  *
- * Return: The entry at @index or higher, or %NULL if nothing is found.
+ * Takes RCU read lock internally to protect the search, which does not
+ * protect the returned pointer after dropping RCU read lock.
+ *
+ * Return: The entry higher than @index or %NULL if nothing is found.
  */
 void *mt_next(struct maple_tree *mt, unsigned long index, unsigned long max)
 {
@@ -6012,7 +6015,10 @@ EXPORT_SYMBOL_GPL(mas_prev);
  * @index: The start index
  * @min: The minimum index to check
  *
- * Return: The entry at @index or lower, or %NULL if nothing is found.
+ * Takes RCU read lock internally to protect the search, which does not
+ * protect the returned pointer after dropping RCU read lock.
+ *
+ * Return: The entry before @index or %NULL if nothing is found.
  */
 void *mt_prev(struct maple_tree *mt, unsigned long index, unsigned long min)
 {
@@ -6487,9 +6493,14 @@ EXPORT_SYMBOL(mtree_destroy);
  * mt_find() - Search from the start up until an entry is found.
  * @mt: The maple tree
  * @index: Pointer which contains the start location of the search
- * @max: The maximum value to check
+ * @max: The maximum value of the search range
+ *
+ * Takes RCU read lock internally to protect the search, which does not
+ * protect the returned pointer after dropping RCU read lock.
  *
- * Handles locking.  @index will be incremented to one beyond the range.
+ * In case that an entry is found @index contains the index of the found
+ * entry plus one, so it can be used as iterator index to find the next
+ * entry.
  *
  * Return: The entry at or after the @index or %NULL
  */
@@ -6548,7 +6559,9 @@ EXPORT_SYMBOL(mt_find);
  * @index: Pointer which contains the start location of the search
  * @max: The maximum value to check
  *
- * Handles locking, detects wrapping on index == 0
+ * Same as mt_find() except that it checks @index for 0 before
+ * searching. If @index == 0, the search is aborted. This covers a wrap
+ * around of @index to 0 in an iterator loop.
  *
  * Return: The entry at or after the @index or %NULL
  */
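
To make the corrected @__index semantics concrete, here is a short
kernel-context sketch (illustrative only, not part of the patch; the tree
name is a placeholder) of the iterator pattern these comments describe:

#include <linux/maple_tree.h>
#include <linux/printk.h>

static DEFINE_MTREE(my_tree);

static void walk_tree(void)
{
	unsigned long index = 0;	/* start of the search */
	void *entry;

	mt_for_each(&my_tree, entry, index, ULONG_MAX) {
		/*
		 * mt_find() has already advanced @index to the found
		 * entry's index plus one, so the entry itself lives at
		 * index - 1.  Note that the RCU read lock taken inside
		 * mt_find() does not protect @entry once it returns.
		 */
		pr_info("entry %p found at index %lu\n", entry, index - 1);
	}
}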


[parent not found: <20190225201635.4648-1-hannes@cmpxchg.org>]
* [PATCH -v2 0/9] mm: make movable onlining suck less
@ 2017-04-10 11:03 Michal Hocko
  2017-04-15 12:17 ` Michal Hocko
  0 siblings, 1 reply; 54+ messages in thread
From: Michal Hocko @ 2017-04-10 11:03 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Joonsoo Kim, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML, Dan Williams,
	Heiko Carstens, Lai Jiangshan, Martin Schwidefsky, Michal Hocko,
	Tobias Regnery

Hi,
The last version of this series was posted here [1]. It has seen
some more serious testing (thanks to Reza Arbab) and fixes for the
issues found. I have also decided to drop patch 1 [2] because it turned
out to be more complicated than I initially thought [3]. A few more
patches were added to deal with expectations on zone/node initialization.

I have rebased on top of the current mmotm-2017-04-07-15-53. It
conflicts with HMM because it touches memory hotplug as
well. We have discussed this [4] with Jérôme and he agreed to
rebase on top of this rework [5], so I have reverted his series
before applying mine. I will help him resolve the resulting
conflicts. You can find the whole series, including the HMM reverts, in
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git branch
attempts/rewrite-mem_hotplug

Motivation:
Movable onlining is a real hack with many downsides - mainly the
reintroduction of the lowmem/highmem issues we used to have on 32-bit
systems - but it is the only way to make memory hotremove more reliable,
which is something that people are asking for.

The current semantics of movable memory onlining are really cumbersome,
however. The main reason for this is that the udev-driven approach is
basically unusable, because udev races with the memory probing, while only
the last memory block, or the one adjacent to the existing zone_movable,
is allowed to be onlined movable. In short, the criterion for a
successful online_movable changes under udev's feet. A reliable udev
approach would require a 2-phase scheme where the first successful
movable online would have to check all the previous blocks and online
them in descending order. This can hardly be considered sane.

This patchset aims at making the onlining semantics more usable. First of
all, it allows memory to be onlined movable as long as it doesn't clash
with the existing ZONE_NORMAL; that means ZONE_NORMAL and ZONE_MOVABLE
cannot overlap. Currently I preserve the original ordering semantic, so
the normal zone always precedes the movable zone, but I have plans to
remove this restriction in the future because it is not really necessary.

First 3 patches are cleanups which should be ready to be merged right
away (unless I have missed something subtle of course).

Patch 4 deals with ZONE_DEVICE dependencies down the __add_pages path.

Patch 5 deals with implicit assumptions of register_one_node on pgdat
initialization.

Patch 6 is the core of the change. In order to make it easier to review,
I have tried to keep it as minimalistic as possible; the large code
removal is moved to patch 9.

Patch 7 is a trivial follow up cleanup. Patch 8 fixes sparse warnings
and finally patch 9 removes the unused code.

I have tested the patches in kvm:
# qemu-system-x86_64 -enable-kvm -monitor pty -m 2G,slots=4,maxmem=4G -numa node,mem=1G -numa node,mem=1G ...

and then probed the additional memory by
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

Then I have used this simple script to probe memory blocks by hand
# cat probe_memblock.sh
#!/bin/sh

BLOCK_NR=$1

echo $((0x100000000+$BLOCK_NR*(128<<20))) > /sys/devices/system/memory/probe

# for i in $(seq 10); do sh probe_memblock.sh $i; done
# grep . /sys/devices/system/memory/memory3?/valid_zones 2>/dev/null 
/sys/devices/system/memory/memory33/valid_zones:Normal Movable                                                                                                                                                     
/sys/devices/system/memory/memory34/valid_zones:Normal Movable                                                                                                                                                     
/sys/devices/system/memory/memory35/valid_zones:Normal Movable                                                                                                                                                     
/sys/devices/system/memory/memory36/valid_zones:Normal Movable                                                                                                                                                     
/sys/devices/system/memory/memory37/valid_zones:Normal Movable                                                                                                                                                     
/sys/devices/system/memory/memory38/valid_zones:Normal Movable                                                                                                                                                     
/sys/devices/system/memory/memory39/valid_zones:Normal Movable

The main difference from the original implementation is that all new
memblocks can be both online_kernel and online_movable initially,
because there is obviously no clash. For comparison, the original
implementation would have

/sys/devices/system/memory/memory33/valid_zones:Normal                                                                                                                                                     
/sys/devices/system/memory/memory34/valid_zones:Normal                                                                                                                                                     
/sys/devices/system/memory/memory35/valid_zones:Normal                                                                                                                                                     
/sys/devices/system/memory/memory36/valid_zones:Normal                                                                                                                                                     
/sys/devices/system/memory/memory37/valid_zones:Normal                                                                                                                                                     
/sys/devices/system/memory/memory38/valid_zones:Normal                                                                                                                                                     
/sys/devices/system/memory/memory39/valid_zones:Normal Movable

Now
# echo online_movable > /sys/devices/system/memory/memory34/state                                                                                                                                      
# grep . /sys/devices/system/memory/memory3?/valid_zones 2>/dev/null                                                                                                                                   
/sys/devices/system/memory/memory33/valid_zones:Normal Movable                                                                                                                                                     
/sys/devices/system/memory/memory34/valid_zones:Movable                                                                                                                                                            
/sys/devices/system/memory/memory35/valid_zones:Movable                                                                                                                                                            
/sys/devices/system/memory/memory36/valid_zones:Movable                                                                                                                                                            
/sys/devices/system/memory/memory37/valid_zones:Movable                                                                                                                                                            
/sys/devices/system/memory/memory38/valid_zones:Movable
/sys/devices/system/memory/memory39/valid_zones:Movable

Block 33 can still be onlined both kernel and movable, while all
the remaining blocks can only be onlined movable.
/proc/zoneinfo says
Node 0, zone   Normal
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
--
Node 0, zone  Movable
  pages free     32753
        min      85
        low      117
        high     149
        spanned  32768
        present  32768

Probing at a lower address will result in a new memblock (32),
which will still allow both Normal and Movable.
# sh probe_memblock.sh 0
# grep . /sys/devices/system/memory/memory3[2-5]/valid_zones 2>/dev/null
/sys/devices/system/memory/memory32/valid_zones:Normal Movable
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Movable
/sys/devices/system/memory/memory35/valid_zones:Movable

and online_kernel will properly convert it to the Normal zone,
while 33 can still be onlined both ways.
# echo online_kernel > /sys/devices/system/memory/memory32/state
# grep . /sys/devices/system/memory/memory3[2-5]/valid_zones 2>/dev/null
/sys/devices/system/memory/memory32/valid_zones:Normal
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Movable
/sys/devices/system/memory/memory35/valid_zones:Movable

/proc/zoneinfo now shows
Node 0, zone   Normal
  pages free     65441
        min      165
        low      230
        high     295
        spanned  65536
        present  65536
--
Node 0, zone  Movable
  pages free     32740
        min      82
        low      114
        high     146
        spanned  32768
        present  32768

so both zones have one memblock spanned and present.

Onlining 39 should associate this block with the movable zone
# echo online > /sys/devices/system/memory/memory39/state

/proc/zoneinfo now shows
Node 0, zone   Normal
  pages free     32765
        min      80
        low      112
        high     144
        spanned  32768
        present  32768
--
Node 0, zone  Movable
  pages free     65501
        min      160
        low      225
        high     290
        spanned  196608
        present  65536

so we will have a movable zone which spans 6 memblocks, 2 present and 4
representing a hole.

Offlining both movable blocks will lead to a zone with no present
pages, which is the expected behavior, I believe.
# echo offline > /sys/devices/system/memory/memory39/state
# echo offline > /sys/devices/system/memory/memory34/state
# grep -A6 "Movable\|Normal" /proc/zoneinfo 
Node 0, zone   Normal
  pages free     32735
        min      90
        low      122
        high     154
        spanned  32768
        present  32768
--
Node 0, zone  Movable
  pages free     0
        min      0
        low      0
        high     0
        spanned  196608
        present  0

Any thoughts, complaints, suggestions?

As a bonus we will get a nice cleanup of the memory hotplug codebase:
 arch/ia64/mm/init.c            |  11 +-
 arch/powerpc/mm/mem.c          |  12 +-
 arch/s390/mm/init.c            |  32 +--
 arch/sh/mm/init.c              |  10 +-
 arch/x86/mm/init_32.c          |   7 +-
 arch/x86/mm/init_64.c          |  11 +-
 drivers/base/memory.c          |  74 ++++---
 drivers/base/node.c            |  58 ++----
 include/linux/memory_hotplug.h |  19 +-
 include/linux/mmzone.h         |  16 +-
 include/linux/node.h           |  35 +++-
 kernel/memremap.c              |   6 +-
 mm/memory_hotplug.c            | 451 ++++++++++++++---------------------------
 mm/page_alloc.c                |   8 +-
 mm/sparse.c                    |   3 +-
 15 files changed, 284 insertions(+), 469 deletions(-)

Shortlog says:
Michal Hocko (9):
      mm: remove return value from init_currently_empty_zone
      mm, memory_hotplug: use node instead of zone in can_online_high_movable
      mm: drop page_initialized check from get_nid_for_pfn
      mm, memory_hotplug: get rid of is_zone_device_section
      mm, memory_hotplug: split up register_one_node
      mm, memory_hotplug: do not associate hotadded memory to zones until online
      mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory
      mm, memory_hotplug: fix the section mismatch warning
      mm, memory_hotplug: remove unused cruft after memory hotplug rework

[1] http://lkml.kernel.org/r/20170330115454.32154-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20170331073954.GF27098@dhcp22.suse.cz
[3] http://lkml.kernel.org/r/20170405081400.GE6035@dhcp22.suse.cz
[4] http://lkml.kernel.org/r/20170407121349.GB16392@dhcp22.suse.cz
[5] http://lkml.kernel.org/r/20170407182752.GA17852@redhat.com


* (no subject)
@ 2012-10-04 16:50 Andrea Arcangeli
  2012-10-04 18:17 ` your mail Christoph Lameter
  0 siblings, 1 reply; 54+ messages in thread
From: Andrea Arcangeli @ 2012-10-04 16:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-mm, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Ingo Molnar, Mel Gorman, Hugh Dickins,
	Rik van Riel, Johannes Weiner, Hillf Danton, Andrew Jones,
	Dan Smith, Thomas Gleixner, Paul Turner, Suresh Siddha,
	Mike Galbraith, Paul E. McKenney

Subject: Re: [PATCH 29/33] autonuma: page_autonuma
Reply-To: 
In-Reply-To: <0000013a2c223da2-632aa43e-21f8-4abd-a0ba-2e1b49881e3a-000000@email.amazonses.com>

Hi Christoph,

On Thu, Oct 04, 2012 at 02:16:14PM +0000, Christoph Lameter wrote:
> On Thu, 4 Oct 2012, Andrea Arcangeli wrote:
> 
> > Move the autonuma_last_nid from the "struct page" to a separate
> > page_autonuma data structure allocated in the memsection (with
> > sparsemem) or in the pgdat (with flatmem).
> 
> Note that there is an available word in struct page before the autonuma
> patches on x86_64 with CONFIG_HAVE_ALIGNED_STRUCT_PAGE.
> 
> In fact the page_autonuma fills up the structure to nicely fit in one 64
> byte cacheline.

Good point indeed.

So we could drop page_autonuma by creating a CONFIG_SLUB=y dependency
(AUTONUMA wouldn't be available in the kernel config if SLAB=y, and it
also wouldn't be available on 32-bit archs, but the latter isn't a
problem).
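
A build-time sketch of such a dependency (illustrative only; the posted
series kept the separate page_autonuma structure instead):

/* Illustrative guard, not from the patches: only allow the in-page word
 * when CONFIG_HAVE_ALIGNED_STRUCT_PAGE (set by SLUB) pads struct page to
 * the 64-byte cacheline that leaves the spare word available. */
#if defined(CONFIG_AUTONUMA) && !defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
#error "AUTONUMA would need SLUB's aligned struct page"
#endif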

I think it's a reasonable alternative to page_autonuma. Certainly it
looks more appealing than taking over 16 precious bits from
page->flags. There are still pros and cons. I'm neutral on it so more
comments would be welcome ;).

Andrea

PS. I randomly moved some people from Cc over to Bcc as I overflowed the
max header size allowed on linux-kernel, oops!


* (no subject)
@ 2010-06-16 16:33 Jan Kara
  2010-06-16 22:15 ` your mail Dave Chinner
  2010-06-22  2:59 ` Wu Fengguang
  0 siblings, 2 replies; 54+ messages in thread
From: Jan Kara @ 2010-06-16 16:33 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, Andrew Morton, npiggin

  Hello,

  here is the fourth version of the writeback livelock avoidance patches
for data integrity writes. To quickly summarize the idea: we tag dirty
pages at the beginning of write_cache_pages() with a new TOWRITE tag and
then write only the tagged pages, so that parallel writers cannot livelock
us. See the changelogs of the patches for more details.
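
As a condensed kernel-context sketch of that idea (the helper names below
follow the variant that was eventually merged and may differ in detail
from this posting):

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/writeback.h>

/* Sketch of the livelock-avoidance pattern: for data-integrity writeback,
 * snapshot the dirty set once as TOWRITE, then write back only pages
 * carrying that tag. */
static void choose_writeback_tag(struct address_space *mapping,
				 struct writeback_control *wbc,
				 pgoff_t index, pgoff_t end, int *tag)
{
	if (wbc->sync_mode == WB_SYNC_ALL) {
		/* Pages dirtied after this point are not tagged, so
		 * parallel writers cannot keep us looping forever. */
		tag_pages_for_writeback(mapping, index, end);
		*tag = PAGECACHE_TAG_TOWRITE;
	} else {
		*tag = PAGECACHE_TAG_DIRTY;
	}
}
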
  I have tested the patches with fsx and a test program I wrote which
checks that if we crash after fsync, the data is indeed on disk.
  If there are no more concerns, can these patches get merged?

								Honza

  Changes since last version:
- tagging function was changed to stop after given amount of pages to
  avoid keeping tree_lock and irqs disabled for too long
- changed names and updated comments as Andrew suggested
- measured memory impact and reported it in the changelog

  Things suggested but not changed (I want to avoid going in circles ;):
- use tagging also for WB_SYNC_NONE writeback - there's a problem with the
  interaction with wbc->nr_to_write. If we tag all dirty pages, we can
  spend too much time tagging when we write only a few pages in the end
  because of nr_to_write. If we tag only, say, nr_to_write pages, we may
  not have enough pages tagged because some pages are written out by
  someone else, so we would have to restart and tagging would become
  essentially useless. So my preferred option is to switch to tagging for
  WB_SYNC_NONE writeback once we can get rid of nr_to_write. But that's a
  story for a different patch set.
- implement a function for clearing several tags (TOWRITE, DIRTY) at once
  - IMHO not worth it because we would save only the conversion of page
  index to radix tree offsets. The rest would have to be separate anyway.
  And the interface would be inconsistent as well...
- use __lookup_tag to implement radix_tree_range_tag_if_tagged - doesn't
  quite work because __lookup_tag returns only leaf nodes, so we'd have to
  implement tree traversal anyway to tag internal nodes as well.


[parent not found: <1131.86.55.168.2.1170690089.squirrel@mail.thinknet.ro>]
* Re: your mail
@ 2003-01-24  5:54 Anoop J.
  2003-01-24  6:28 ` David Lang
  0 siblings, 1 reply; 54+ messages in thread
From: Anoop J. @ 2003-01-24  5:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel

How is this different from a fully associative cache? It would be better
if you could explain it based on the address bits used.

Thanks
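
For the address-bit view asked about above, a small illustrative program
(a sketch with an assumed cache geometry, not from the thread): in a
physically indexed cache, a page's "color" is the cache-index bits that
lie above the page offset, so pages of the same color compete for the
same cache lines - which is exactly what a fully associative cache would
not do.

/* Sketch: computing a physical page's cache color from its address bits.
 * Assumed geometry: 512 KiB direct-mapped cache, 4 KiB pages; the color
 * is then bits [18:12] of the physical address (128 colors). */
#include <stdio.h>

#define CACHE_BYTES (512 * 1024)
#define PAGE_BYTES  4096
#define NUM_COLORS  (CACHE_BYTES / PAGE_BYTES)	/* 128 */

static unsigned color_of(unsigned long paddr)
{
	return (paddr / PAGE_BYTES) % NUM_COLORS;
}

int main(void)
{
	/* Two pages exactly CACHE_BYTES apart share a color: they map to
	 * the same cache lines and evict each other. */
	printf("color(0x00000000) = %u\n", color_of(0x00000000UL));
	printf("color(0x00080000) = %u\n", color_of(0x00080000UL));
	printf("color(0x00001000) = %u\n", color_of(0x00001000UL));
	return 0;
}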

David Lang wrote:

>The idea of page coloring is based on the fact that common implementations
>of caching can't put any page in memory into any line in the cache (such an
>implementation is possible, but is more expensive, so it is not commonly
>done).
>
>With this implementation it means that if your program happens to use
>memory that cannot be mapped to half of the cache lines, then effectively
>the CPU cache is half its rated size for your program. The next time your
>program runs it may get a more favorable memory allocation and be able to
>use all of the cache and therefore run faster.
>
>Page coloring is an attempt to take this into account when allocating
>memory to programs so that every program gets to use all of the cache.
>
>David Lang
>
>
> On Fri, 24 Jan 2003, Anoop J. wrote:
>
>>Date: Fri, 24 Jan 2003 10:38:03 +0530 (IST)
>>From: Anoop J. <cs99001@nitc.ac.in>
>>To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
>>
>>
>>How does page coloring work? I want its mechanism, not the implementation.
>>I went through some pages of W.L. Lynch's paper on cache and VM. Still not
>>able to grasp it.
>>
>>
>>Thanks in advance
>>
>>
>>
>




* (unknown), 
@ 2003-01-24  5:08 Anoop J.
  2003-01-24  5:11 ` your mail David Lang
  0 siblings, 1 reply; 54+ messages in thread
From: Anoop J. @ 2003-01-24  5:08 UTC (permalink / raw)
  To: linux-kernel, linux-mm


How does page coloring work? I want its mechanism, not the implementation.
I went through some pages of W.L. Lynch's paper on cache and VM. Still not
able to grasp it.


Thanks in advance

* (no subject)
@ 2002-04-21 14:54 raciel
  2002-04-21 19:12 ` your mail William Lee Irwin III
  0 siblings, 1 reply; 54+ messages in thread
From: raciel @ 2002-04-21 14:54 UTC (permalink / raw)
  To: linux-mm


Hello all :)

	I have been trying to understand the rmap patch from Rik van Riel, but
I don't understand what the rmap patch does. Can somebody explain it to me,
or does someone know a good site where I can get documentation?

Regards Raciel 


* (no subject)
@ 2002-01-02 14:20 mehul radheshyam choube
  2002-01-03 16:40 ` your mail Rik van Riel
  0 siblings, 1 reply; 54+ messages in thread
From: mehul radheshyam choube @ 2002-01-02 14:20 UTC (permalink / raw)
  To: linux-mm

hi friends,

   i would like to do some development work for linux-mm.
   please guide me.

mehul. 


* (no subject)
@ 2001-08-04 11:10 Mahmoud Taghizadeh
  2001-08-04 13:18 ` your mail Francois Romieu
  0 siblings, 1 reply; 54+ messages in thread
From: Mahmoud Taghizadeh @ 2001-08-04 11:10 UTC (permalink / raw)
  To: linux-mm

I am sorry for my stupid question!
Is there any Linux distribution without a memory management unit?
I mean, paging is disabled and memory management is done only
by segmentation.

I am thankful in advance.


* (no subject)
@ 2001-06-08  1:36 jnn
  2001-06-08 13:16 ` your mail Ralf Baechle
  0 siblings, 1 reply; 54+ messages in thread
From: jnn @ 2001-06-08  1:36 UTC (permalink / raw)
  To: linux-mm


hi all,
     can somebody please tell me where I can find some documentation about
the mechanism of cache flushing?


* (no subject)
@ 2000-09-04 12:01 Sahil
  2000-09-04 15:35 ` your mail Rik van Riel
  0 siblings, 1 reply; 54+ messages in thread
From: Sahil @ 2000-09-04 12:01 UTC (permalink / raw)
  To: linux-mm

dear friends,
I am a newbie to kernel programming.
can anyone suggest some good readings for the same?

Shahil


* (no subject)
@ 1998-02-25 22:15 Rik van Riel
  1998-02-25 22:48 ` your mail Linus Torvalds
  0 siblings, 1 reply; 54+ messages in thread
From: Rik van Riel @ 1998-02-25 22:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen C. Tweedie, linux-mm

Hi,

I've just come up with a very simple idea to limit
thrashing, and I'm asking you if you want it implemented
(there's some cost involved :-( ).

We could simply prohibit the VM subsystem from swapping
out pages which have been allocated less than one second
ago; this way the movement of pages becomes 'slower', and
thrashing might get somewhat less severe.
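
A sketch of the proposed bookkeeping in modern idiom (hypothetical; the
per-page timestamp field is invented for illustration):

#include <linux/jiffies.h>

/* Hypothetical helper, not an actual patch: "allocated_jiffies" would be
 * a new per-page timestamp set at allocation time. */
static inline int page_recently_allocated(unsigned long allocated_jiffies)
{
	return time_before(jiffies, allocated_jiffies + HZ);
}

/* The swap-out scan would skip any page for which this returns true,
 * slowing page movement as described above. */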

The cost involved is that we have to add a new entry
to the page struct :-( and do some (relatively cheap)
bookkeeping on every page. Also, this might limit the
rate of allocation some programs do, giving rise to
all sorts of new and exciting problems.

Rik.
+-----------------------------+------------------------------+
| For Linux mm-patches, go to | "I'm busy managing memory.." |
| my homepage (via LinuxHQ).  | H.H.vanRiel@fys.ruu.nl       |
| ...submissions welcome...   | http://www.fys.ruu.nl/~riel/ |
+-----------------------------+------------------------------+


end of thread, other threads: [~2025-02-28 17:30 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-03-28  8:19 your mail pnilesh
2000-03-28 13:26 ` Stephen C. Tweedie
  -- strict thread matches above, loose matches on Subject: below --
2025-02-24 22:52 [PATCH v7 0/7] mseal system mappings jeffxu
2025-02-25 15:18 ` Lorenzo Stoakes
2025-02-26  0:12   ` Jeff Xu
2025-02-26  5:42     ` your mail Lorenzo Stoakes
2025-02-28  0:55       ` Jeff Xu
2025-02-28  9:35         ` Lorenzo Stoakes
2025-02-28 17:24           ` Jeff Xu
2025-02-28 17:30             ` Lorenzo Stoakes
2023-05-10 19:01 [PATCH] maple_tree: Fix a few documentation issues, Thomas Gleixner
2023-05-15 19:27 ` your mail Liam R. Howlett
2023-05-15 21:16   ` Thomas Gleixner
2023-05-16 22:47   ` Thomas Gleixner
2023-05-23 13:46     ` Liam R. Howlett
     [not found] <20190225201635.4648-1-hannes@cmpxchg.org>
2019-02-26 23:49 ` Roman Gushchin
2017-04-10 11:03 [PATCH -v2 0/9] mm: make movable onlining suck less Michal Hocko
2017-04-15 12:17 ` Michal Hocko
2017-04-17  5:47   ` your mail Joonsoo Kim
2017-04-17  8:15     ` Michal Hocko
2017-04-20  1:27       ` Joonsoo Kim
2017-04-20  7:28         ` Michal Hocko
2017-04-20  8:49           ` Michal Hocko
2017-04-20 11:56             ` Vlastimil Babka
2017-04-20 12:13               ` Michal Hocko
2017-04-21  4:38           ` Joonsoo Kim
2017-04-21  7:16             ` Michal Hocko
2017-04-24  1:44               ` Joonsoo Kim
2017-04-24  7:53                 ` Michal Hocko
2017-04-25  2:50                   ` Joonsoo Kim
2017-04-26  9:19                     ` Michal Hocko
2017-04-27  2:08                       ` Joonsoo Kim
2017-04-27 15:10                         ` Michal Hocko
2012-10-04 16:50 Andrea Arcangeli
2012-10-04 18:17 ` your mail Christoph Lameter
2010-06-16 16:33 Jan Kara
2010-06-16 22:15 ` your mail Dave Chinner
2010-06-22  2:59 ` Wu Fengguang
2010-06-22 13:54   ` Jan Kara
2010-06-22 14:12     ` Wu Fengguang
     [not found] <1131.86.55.168.2.1170690089.squirrel@mail.thinknet.ro>
2007-02-05 12:36 ` Joerg Roedel
2003-01-24  5:54 Anoop J.
2003-01-24  6:28 ` David Lang
2003-01-24  8:51   ` Anoop J.
2003-01-24  8:48     ` David Lang
2003-01-24  9:49       ` Anoop J.
2003-01-24 19:14         ` David Lang
2003-01-24 19:40           ` Maciej W. Rozycki
2003-01-24  5:08 (unknown), Anoop J.
2003-01-24  5:11 ` your mail David Lang
2003-01-24  6:06   ` John Alvord
2003-01-25  2:29     ` Jason Papadopoulos
2003-01-25  2:26       ` Larry McVoy
2003-01-25 17:47         ` Eric W. Biederman
2003-01-25 23:10           ` Larry McVoy
2003-01-26  8:12             ` David S. Miller
2002-04-21 14:54 raciel
2002-04-21 19:12 ` your mail William Lee Irwin III
2002-01-02 14:20 mehul radheshyam choube
2002-01-03 16:40 ` your mail Rik van Riel
2001-08-04 11:10 Mahmoud Taghizadeh
2001-08-04 13:18 ` your mail Francois Romieu
2001-06-08  1:36 jnn
2001-06-08 13:16 ` your mail Ralf Baechle
2000-09-04 12:01 Sahil
2000-09-04 15:35 ` your mail Rik van Riel
1998-02-25 22:15 Rik van Riel
1998-02-25 22:48 ` your mail Linus Torvalds
1998-02-25 23:26   ` Rik van Riel

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).