* [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove.
@ 2013-01-15 10:54 Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation Tang Chen
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: Tang Chen @ 2013-01-15 10:54 UTC (permalink / raw)
To: akpm, rientjes, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai, glommer, jiang.liu
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
Hi Andrew,
Here are some bug fix patches for physical hot-remove patches.
And there are some new ones reported by others, I'll fix them soon.
Thanks. :)
Tang Chen (6):
Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X
operation.
Bug fix: Do not calculate direct mapping pages when freeing vmemmap
pagetables.
Bug fix: Do not free direct mapping pages twice.
Bug fix: Do not free page split from hugepage one by one.
Bug fix: Fix the wrong comments of map_entries.
Bug fix: Reuse the storage of /sys/firmware/memmap/X/ allocated by
bootmem.
arch/x86/mm/init_64.c | 92 ++++++++++++++++++++++++++++-----
drivers/firmware/memmap.c | 124 ++++++++++++++++++++++++++++++++++++--------
2 files changed, 179 insertions(+), 37 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation.
2013-01-15 10:54 [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove Tang Chen
@ 2013-01-15 10:54 ` Tang Chen
2013-01-15 22:26 ` Julian Calaby
2013-01-15 10:54 ` [BUG Fix Patch 2/6] Bug fix: Do not calculate direct mapping pages when freeing vmemmap pagetables Tang Chen
` (4 subsequent siblings)
5 siblings, 1 reply; 9+ messages in thread
From: Tang Chen @ 2013-01-15 10:54 UTC (permalink / raw)
To: akpm, rientjes, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai, glommer, jiang.liu
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
It is unsafe to return an entry pointer and release the map_entries_lock. So we should
not hold the map_entries_lock separately in firmware_map_find_entry() and
firmware_map_remove_entry(). Hold the map_entries_lock across find and remove
/sys/firmware/memmap/X operation.
And also, users of these two functions need to be careful to hold the lock when using
these two functions.
The suggestion is from Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
drivers/firmware/memmap.c | 25 +++++++++++++++++--------
1 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index 4211da5..940c4e9 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -150,12 +150,12 @@ static int firmware_map_add_entry(u64 start, u64 end,
* firmware_map_remove_entry() - Does the real work to remove a firmware
* memmap entry.
* @entry: removed entry.
+ *
+ * The caller must hold map_entries_lock, and release it properly.
**/
static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
{
- spin_lock(&map_entries_lock);
list_del(&entry->list);
- spin_unlock(&map_entries_lock);
}
/*
@@ -188,23 +188,28 @@ static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
}
/*
- * Search memmap entry
+ * firmware_map_find_entry: Search memmap entry.
+ * @start: Start of the memory range.
+ * @end: End of the memory range (exclusive).
+ * @type: Type of the memory range.
+ *
+ * This function is to find the memmap entey of a given memory range.
+ * The caller must hold map_entries_lock, and must not release the lock
+ * until the processing of the returned entry has completed.
+ *
+ * Return pointer to the entry to be found on success, or NULL on failure.
*/
-
static struct firmware_map_entry * __meminit
firmware_map_find_entry(u64 start, u64 end, const char *type)
{
struct firmware_map_entry *entry;
- spin_lock(&map_entries_lock);
list_for_each_entry(entry, &map_entries, list)
if ((entry->start == start) && (entry->end == end) &&
(!strcmp(entry->type, type))) {
- spin_unlock(&map_entries_lock);
return entry;
}
- spin_unlock(&map_entries_lock);
return NULL;
}
@@ -274,11 +279,15 @@ int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
{
struct firmware_map_entry *entry;
+ spin_lock(&map_entries_lock);
entry = firmware_map_find_entry(start, end - 1, type);
- if (!entry)
+ if (!entry) {
+ spin_unlock(&map_entries_lock);
return -EINVAL;
+ }
firmware_map_remove_entry(entry);
+ spin_unlock(&map_entries_lock);
/* remove the memmap entry */
remove_sysfs_fw_map_entry(entry);
--
1.7.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [BUG Fix Patch 2/6] Bug fix: Do not calculate direct mapping pages when freeing vmemmap pagetables.
2013-01-15 10:54 [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation Tang Chen
@ 2013-01-15 10:54 ` Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 3/6] Bug fix: Do not free direct mapping pages twice Tang Chen
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Tang Chen @ 2013-01-15 10:54 UTC (permalink / raw)
To: akpm, rientjes, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai, glommer, jiang.liu
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
We only need to update direct_pages_count[level] when we freeing direct mapped
pagetables.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/init_64.c | 17 +++++++----------
1 files changed, 7 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e829113..368cc3f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -804,14 +804,13 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
if (IS_ALIGNED(addr, PAGE_SIZE) &&
IS_ALIGNED(next, PAGE_SIZE)) {
- if (!direct) {
+ if (!direct)
free_pagetable(pte_page(*pte), 0);
- pages++;
- }
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
spin_unlock(&init_mm.page_table_lock);
+ pages++;
} else {
/*
* If we are not removing the whole page, it means
@@ -824,11 +823,11 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
free_pagetable(pte_page(*pte), 0);
- pages++;
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
spin_unlock(&init_mm.page_table_lock);
+ pages++;
}
}
}
@@ -857,15 +856,14 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
if (pmd_large(*pmd)) {
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
- if (!direct) {
+ if (!direct)
free_pagetable(pmd_page(*pmd),
get_order(PMD_SIZE));
- pages++;
- }
spin_lock(&init_mm.page_table_lock);
pmd_clear(pmd);
spin_unlock(&init_mm.page_table_lock);
+ pages++;
continue;
}
@@ -914,15 +912,14 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
if (pud_large(*pud)) {
if (IS_ALIGNED(addr, PUD_SIZE) &&
IS_ALIGNED(next, PUD_SIZE)) {
- if (!direct) {
+ if (!direct)
free_pagetable(pud_page(*pud),
get_order(PUD_SIZE));
- pages++;
- }
spin_lock(&init_mm.page_table_lock);
pud_clear(pud);
spin_unlock(&init_mm.page_table_lock);
+ pages++;
continue;
}
--
1.7.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [BUG Fix Patch 3/6] Bug fix: Do not free direct mapping pages twice.
2013-01-15 10:54 [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 2/6] Bug fix: Do not calculate direct mapping pages when freeing vmemmap pagetables Tang Chen
@ 2013-01-15 10:54 ` Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 4/6] Bug fix: Do not free page split from hugepage one by one Tang Chen
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Tang Chen @ 2013-01-15 10:54 UTC (permalink / raw)
To: akpm, rientjes, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai, glommer, jiang.liu
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
Direct mapped pages were freed when they were offlined, or they were
not allocated. So we only need to free vmemmap pages, no need to free
direct mapped pages.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/init_64.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 368cc3f..e77d312 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -804,6 +804,10 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
if (IS_ALIGNED(addr, PAGE_SIZE) &&
IS_ALIGNED(next, PAGE_SIZE)) {
+ /*
+ * Do not free direct mapping pages since they were
+ * freed when offlining.
+ */
if (!direct)
free_pagetable(pte_page(*pte), 0);
@@ -819,10 +823,11 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
* remove the page when it is wholly filled with 0xFD.
*/
memset((void *)addr, PAGE_INUSE, next - addr);
- page_addr = page_address(pte_page(*pte));
+ page_addr = page_address(pte_page(*pte));
if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
- free_pagetable(pte_page(*pte), 0);
+ if (!direct)
+ free_pagetable(pte_page(*pte), 0);
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
--
1.7.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [BUG Fix Patch 4/6] Bug fix: Do not free page split from hugepage one by one.
2013-01-15 10:54 [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove Tang Chen
` (2 preceding siblings ...)
2013-01-15 10:54 ` [BUG Fix Patch 3/6] Bug fix: Do not free direct mapping pages twice Tang Chen
@ 2013-01-15 10:54 ` Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 5/6] Bug fix: Fix the wrong comments of map_entries Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 6/6] Bug fix: Reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem Tang Chen
5 siblings, 0 replies; 9+ messages in thread
From: Tang Chen @ 2013-01-15 10:54 UTC (permalink / raw)
To: akpm, rientjes, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai, glommer, jiang.liu
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
When we split a larger page into smaller ones, we should not free them
one by one because only the _count of the first page struct makes sense.
Otherwise, the kernel will panic.
So fulfill the unused small pages with 0xFD, and when the whole larger
page is fulfilled with 0xFD, free the whole larger page.
The call trace is like the following:
[ 1052.819430] ------------[ cut here ]------------
[ 1052.874575] kernel BUG at include/linux/mm.h:278!
[ 1052.930754] invalid opcode: 0000 [#1] SMP
[ 1052.979888] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 i2c_core lpc_ich mfd_core ioatdma e1000e i7core_edac edac_core igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[ 1053.580111] CPU 0
[ 1053.602026] Pid: 4, comm: kworker/0:0 Tainted: G W 3.8.0-rc2-memory-hotremove+ #3 FUJITSU-SV PRIMEQUEST 1800E/SB
[ 1053.736188] RIP: 0010:[<ffffffff81175bd7>] [<ffffffff81175bd7>] __free_pages+0x37/0x50
[ 1053.831952] RSP: 0018:ffff8807bdcb37f8 EFLAGS: 00010246
[ 1053.895403] RAX: 0000000000000000 RBX: ffff88077c401000 RCX: 000000000000002c
[ 1053.980660] RDX: ffff8807fffd7000 RSI: 0000000000000000 RDI: ffffea001df10040
[ 1054.065917] RBP: ffff8807bdcb37f8 R08: 0000000000000000 R09: 0000000000000000
[ 1054.151178] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1054.236433] R13: ffffea0040008000 R14: ffffea0040002000 R15: 00003ffffffff000
[ 1054.321691] FS: 0000000000000000(0000) GS:ffff8807c1a00000(0000) knlGS:0000000000000000
[ 1054.418372] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1054.487015] CR2: 00007fbc137ad000 CR3: 0000000001c0c000 CR4: 00000000000007f0
[ 1054.572272] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1054.657533] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1054.742797] Process kworker/0:0 (pid: 4, threadinfo ffff8807bdcb2000, task ffff8807bde18000)
[ 1054.843628] Stack:
[ 1054.867622] ffff8807bdcb3818 ffffffff81175ccf 000000077c401000 ffff880796349008
[ 1054.956421] ffff8807bdcb3848 ffffffff816a16e2 ffffea0040001000 ffff880796349008
[ 1055.045227] ffffea0040008000 ffffea0040002000 ffff8807bdcb38b8 ffffffff816a181c
[ 1055.134033] Call Trace:
[ 1055.163230] [<ffffffff81175ccf>] free_pages+0x5f/0x70
[ 1055.224611] [<ffffffff816a16e2>] free_pagetable+0x7f/0xee
[ 1055.290147] [<ffffffff816a181c>] remove_pte_table+0xcb/0x1cd
[ 1055.358806] [<ffffffff81055bd0>] ? leave_mm+0x50/0x50
[ 1055.420187] [<ffffffff810df55d>] ? trace_hardirqs_on+0xd/0x10
[ 1055.489882] [<ffffffff816a1f8b>] remove_pmd_table+0x191/0x253
[ 1055.559576] [<ffffffff816a261e>] remove_pud_table+0x194/0x24d
[ 1055.629270] [<ffffffff811bbc3f>] ? sparse_remove_one_section+0x2f/0x150
[ 1055.709348] [<ffffffff816a278c>] remove_pagetable+0xb5/0x17c
[ 1055.778002] [<ffffffff81692f28>] vmemmap_free+0x18/0x20
[ 1055.841465] [<ffffffff811bbd15>] sparse_remove_one_section+0x105/0x150
[ 1055.920508] [<ffffffff811c953c>] __remove_pages+0xec/0x110
[ 1055.987087] [<ffffffff81692fa7>] arch_remove_memory+0x77/0xc0
[ 1056.056781] [<ffffffff81694138>] remove_memory+0xb8/0xf0
[ 1056.121284] [<ffffffff814040aa>] acpi_memory_device_remove+0x76/0xbc
[ 1056.198244] [<ffffffff813c1e50>] acpi_device_remove+0x90/0xb2
[ 1056.267941] [<ffffffff8146bf3c>] __device_release_driver+0x7c/0xf0
[ 1056.342824] [<ffffffff8146c0bf>] device_release_driver+0x2f/0x50
[ 1056.415635] [<ffffffff813c3142>] acpi_bus_remove+0x32/0x6d
[ 1056.482215] [<ffffffff813c320e>] acpi_bus_trim+0x91/0x102
[ 1056.547755] [<ffffffff813c3307>] acpi_bus_hot_remove_device+0x88/0x180
[ 1056.626794] [<ffffffff813be152>] acpi_os_execute_deferred+0x27/0x34
[ 1056.702717] [<ffffffff81091c5e>] process_one_work+0x20e/0x5c0
[ 1056.772411] [<ffffffff81091bef>] ? process_one_work+0x19f/0x5c0
[ 1056.844184] [<ffffffff813be12b>] ? acpi_os_wait_events_complete+0x23/0x23
[ 1056.926337] [<ffffffff81093cfe>] worker_thread+0x12e/0x370
[ 1056.992908] [<ffffffff81093bd0>] ? manage_workers+0x180/0x180
[ 1057.062602] [<ffffffff81099e3e>] kthread+0xee/0x100
[ 1057.121913] [<ffffffff810e10b9>] ? __lock_release+0x129/0x190
[ 1057.191609] [<ffffffff81099d50>] ? __init_kthread_worker+0x70/0x70
[ 1057.266494] [<ffffffff816b33ac>] ret_from_fork+0x7c/0xb0
[ 1057.330992] [<ffffffff81099d50>] ? __init_kthread_worker+0x70/0x70
[ 1057.405863] Code: 85 c0 74 27 f0 ff 4f 1c 0f 94 c0 84 c0 74 0a 85 f6 74 11 90 e8 bb e3 ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 e8 1b fe ff ff c9 c3 <0f> 0b 0f 1f 80 00 00 00 00 eb f7 66 66 66 66 66 2e 0f 1f 84 00
[ 1057.638271] RIP [<ffffffff81175bd7>] __free_pages+0x37/0x50
[ 1057.705994] RSP <ffff8807bdcb37f8>
[ 1057.747882] ---[ end trace 0f90e1e054d174f9 ]---
[ 1057.803158] Kernel panic - not syncing: Fatal exception
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
arch/x86/mm/init_64.c | 80 +++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 71 insertions(+), 9 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e77d312..ff0206b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -777,7 +777,7 @@ static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
static void __meminit
remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
- bool direct)
+ bool direct, bool split)
{
unsigned long next, pages = 0;
pte_t *pte;
@@ -806,14 +806,30 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
IS_ALIGNED(next, PAGE_SIZE)) {
/*
* Do not free direct mapping pages since they were
- * freed when offlining.
+ * freed when offlining, or simplely not in use.
+ *
+ * Do not free pages split from larger page since only
+ * the _count of the 1st page struct is available.
+ * Free the larger page when it is fulfilled with 0xFD.
*/
- if (!direct)
- free_pagetable(pte_page(*pte), 0);
+ if (!direct) {
+ if (split) {
+ /*
+ * Fill the split 4KB page with 0xFD.
+ * When the whole 2MB page is fulfilled
+ * with 0xFD, it could be freed.
+ */
+ memset((void *)addr, PAGE_INUSE,
+ PAGE_SIZE);
+ } else
+ free_pagetable(pte_page(*pte), 0);
+ }
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
spin_unlock(&init_mm.page_table_lock);
+
+ /* For non-direct mapping, pages means nothing. */
pages++;
} else {
/*
@@ -824,6 +840,10 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
*/
memset((void *)addr, PAGE_INUSE, next - addr);
+ /*
+ * If the range is not aligned to PAGE_SIZE, then the
+ * page is definitely not split from larger page.
+ */
page_addr = page_address(pte_page(*pte));
if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
if (!direct)
@@ -845,11 +865,13 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
static void __meminit
remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
- bool direct)
+ bool direct, bool split)
{
unsigned long pte_phys, next, pages = 0;
pte_t *pte_base;
pmd_t *pmd;
+ void *page_addr;
+ bool split_pmd = split, split_pte = false;
pmd = pmd_start + pmd_index(addr);
for (; addr < end; addr = next, pmd++) {
@@ -861,14 +883,32 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
if (pmd_large(*pmd)) {
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
- if (!direct)
- free_pagetable(pmd_page(*pmd),
+ if (!direct) {
+ if (split_pmd) {
+ /*
+ * Fill the split 2MB page with
+ * 0xFD. When the whole 1GB page
+ * is fulfilled with 0xFD, it
+ * could be freed.
+ */
+ memset((void *)addr, PAGE_INUSE,
+ PMD_SIZE);
+ } else {
+ free_pagetable(pmd_page(*pmd),
get_order(PMD_SIZE));
+ }
+ }
spin_lock(&init_mm.page_table_lock);
pmd_clear(pmd);
spin_unlock(&init_mm.page_table_lock);
+
+ /*
+ * For non-direct mapping, pages means
+ * nothing.
+ */
pages++;
+
continue;
}
@@ -880,6 +920,7 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
BUG_ON(!pte_base);
__split_large_page((pte_t *)pmd, addr,
(pte_t *)pte_base);
+ split_pte = true;
spin_lock(&init_mm.page_table_lock);
pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
@@ -889,7 +930,16 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
}
pte_base = (pte_t *)map_low_page((pte_t *)pmd_page_vaddr(*pmd));
- remove_pte_table(pte_base, addr, next, direct);
+ remove_pte_table(pte_base, addr, next, direct, split_pte);
+
+ if (!direct && split_pte) {
+ page_addr = page_address(pmd_page(*pmd));
+ if (!memchr_inv(page_addr, PAGE_INUSE, PMD_SIZE)) {
+ free_pagetable(pmd_page(*pmd),
+ get_order(PMD_SIZE));
+ }
+ }
+
free_pte_table(pte_base, pmd);
unmap_low_page(pte_base);
}
@@ -906,6 +956,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
unsigned long pmd_phys, next, pages = 0;
pmd_t *pmd_base;
pud_t *pud;
+ void *page_addr;
+ bool split_pmd = false;
pud = pud_start + pud_index(addr);
for (; addr < end; addr = next, pud++) {
@@ -936,6 +988,7 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
BUG_ON(!pmd_base);
__split_large_page((pte_t *)pud, addr,
(pte_t *)pmd_base);
+ split_pmd = true;
spin_lock(&init_mm.page_table_lock);
pud_populate(&init_mm, pud, __va(pmd_phys));
@@ -945,7 +998,16 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
}
pmd_base = (pmd_t *)map_low_page((pmd_t *)pud_page_vaddr(*pud));
- remove_pmd_table(pmd_base, addr, next, direct);
+ remove_pmd_table(pmd_base, addr, next, direct, split_pmd);
+
+ if (!direct && split_pmd) {
+ page_addr = page_address(pud_page(*pud));
+ if (!memchr_inv(page_addr, PAGE_INUSE, PUD_SIZE)) {
+ free_pagetable(pud_page(*pud),
+ get_order(PUD_SIZE));
+ }
+ }
+
free_pmd_table(pmd_base, pud);
unmap_low_page(pmd_base);
}
--
1.7.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [BUG Fix Patch 5/6] Bug fix: Fix the wrong comments of map_entries.
2013-01-15 10:54 [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove Tang Chen
` (3 preceding siblings ...)
2013-01-15 10:54 ` [BUG Fix Patch 4/6] Bug fix: Do not free page split from hugepage one by one Tang Chen
@ 2013-01-15 10:54 ` Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 6/6] Bug fix: Reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem Tang Chen
5 siblings, 0 replies; 9+ messages in thread
From: Tang Chen @ 2013-01-15 10:54 UTC (permalink / raw)
To: akpm, rientjes, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai, glommer, jiang.liu
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
Now we have a map_entries_lock to protect map_entries list.
So we need to update the comments.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
drivers/firmware/memmap.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)
diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index 940c4e9..ae823b4 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -108,11 +108,7 @@ static struct kobj_type memmap_ktype = {
* Registration functions ------------------------------------------------------
*/
-/*
- * Firmware memory map entries. No locking is needed because the
- * firmware_map_add() and firmware_map_add_early() functions are called
- * in firmware initialisation code in one single thread of execution.
- */
+/* Firmware memory map entries. */
static LIST_HEAD(map_entries);
static DEFINE_SPINLOCK(map_entries_lock);
--
1.7.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [BUG Fix Patch 6/6] Bug fix: Reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem.
2013-01-15 10:54 [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove Tang Chen
` (4 preceding siblings ...)
2013-01-15 10:54 ` [BUG Fix Patch 5/6] Bug fix: Fix the wrong comments of map_entries Tang Chen
@ 2013-01-15 10:54 ` Tang Chen
5 siblings, 0 replies; 9+ messages in thread
From: Tang Chen @ 2013-01-15 10:54 UTC (permalink / raw)
To: akpm, rientjes, len.brown, benh, paulus, cl, minchan.kim,
kosaki.motohiro, isimatu.yasuaki, wujianguo, wency, tangchen, hpa,
linfeng, laijs, mgorman, yinghai, glommer, jiang.liu
Cc: linux-s390, linux-ia64, linux-acpi, linux-sh, x86, linux-kernel,
cmetcalf, linux-mm, sparclinux, linuxppc-dev
Now we don't free firmware_map_entry which is allocated by bootmem because
there is no way to do so when the system is up. But we can at least remember
the address of that memory and reuse the storage when the memory is added
next time.
This patch introduces a new list map_entries_bootmem to link the map entries
allocated by bootmem when they are removed, and a lock to protect it.
And these entries will be reused when the memory is hot-added again.
The idea is suggestted by Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
drivers/firmware/memmap.c | 103 ++++++++++++++++++++++++++++++++++++++-------
1 files changed, 88 insertions(+), 15 deletions(-)
diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index ae823b4..0710179 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -53,6 +53,9 @@ static ssize_t start_show(struct firmware_map_entry *entry, char *buf);
static ssize_t end_show(struct firmware_map_entry *entry, char *buf);
static ssize_t type_show(struct firmware_map_entry *entry, char *buf);
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry(u64 start, u64 end, const char *type);
+
/*
* Static data -----------------------------------------------------------------
*/
@@ -80,6 +83,19 @@ static const struct sysfs_ops memmap_attr_ops = {
.show = memmap_attr_show,
};
+/* Firmware memory map entries. */
+static LIST_HEAD(map_entries);
+static DEFINE_SPINLOCK(map_entries_lock);
+
+/*
+ * For memory hotplug, there is no way to free memory map entries allocated
+ * by boot mem after the system is up. So when we hot-remove memory whose
+ * map entry is allocated by bootmem, we need to remember the storage and
+ * reuse it when the memory is hot-added again.
+ */
+static LIST_HEAD(map_entries_bootmem);
+static DEFINE_SPINLOCK(map_entries_bootmem_lock);
+
static inline struct firmware_map_entry *
to_memmap_entry(struct kobject *kobj)
@@ -91,9 +107,22 @@ static void release_firmware_map_entry(struct kobject *kobj)
{
struct firmware_map_entry *entry = to_memmap_entry(kobj);
- if (PageReserved(virt_to_page(entry)))
- /* There is no way to free memory allocated from bootmem */
+ if (PageReserved(virt_to_page(entry))) {
+ /*
+ * Remember the storage allocated by bootmem, and reuse it when
+ * the memory is hot-added again. The entry will be added to
+ * map_entries_bootmem here, and deleted from &map_entries in
+ * firmware_map_remove_entry().
+ */
+ if (firmware_map_find_entry(entry->start, entry->end,
+ entry->type)) {
+ spin_lock(&map_entries_bootmem_lock);
+ list_add(&entry->list, &map_entries_bootmem);
+ spin_unlock(&map_entries_bootmem_lock);
+ }
+
return;
+ }
kfree(entry);
}
@@ -108,10 +137,6 @@ static struct kobj_type memmap_ktype = {
* Registration functions ------------------------------------------------------
*/
-/* Firmware memory map entries. */
-static LIST_HEAD(map_entries);
-static DEFINE_SPINLOCK(map_entries_lock);
-
/**
* firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
* @start: Start of the memory range.
@@ -184,23 +209,25 @@ static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
}
/*
- * firmware_map_find_entry: Search memmap entry.
+ * firmware_map_find_entry_in_list: Search memmap entry in a given list.
* @start: Start of the memory range.
* @end: End of the memory range (exclusive).
* @type: Type of the memory range.
+ * @list: In which to find the entry.
*
- * This function is to find the memmap entey of a given memory range.
- * The caller must hold map_entries_lock, and must not release the lock
- * until the processing of the returned entry has completed.
+ * This function is to find the memmap entey of a given memory range in a
+ * given list. The caller must hold map_entries_lock, and must not release
+ * the lock until the processing of the returned entry has completed.
*
* Return pointer to the entry to be found on success, or NULL on failure.
*/
static struct firmware_map_entry * __meminit
-firmware_map_find_entry(u64 start, u64 end, const char *type)
+firmware_map_find_entry_in_list(u64 start, u64 end, const char *type,
+ struct list_head *list)
{
struct firmware_map_entry *entry;
- list_for_each_entry(entry, &map_entries, list)
+ list_for_each_entry(entry, list, list)
if ((entry->start == start) && (entry->end == end) &&
(!strcmp(entry->type, type))) {
return entry;
@@ -209,6 +236,42 @@ firmware_map_find_entry(u64 start, u64 end, const char *type)
return NULL;
}
+/*
+ * firmware_map_find_entry: Search memmap entry in map_entries.
+ * @start: Start of the memory range.
+ * @end: End of the memory range (exclusive).
+ * @type: Type of the memory range.
+ *
+ * This function is to find the memmap entey of a given memory range.
+ * The caller must hold map_entries_lock, and must not release the lock
+ * until the processing of the returned entry has completed.
+ *
+ * Return pointer to the entry to be found on success, or NULL on failure.
+ */
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry(u64 start, u64 end, const char *type)
+{
+ return firmware_map_find_entry_in_list(start, end, type, &map_entries);
+}
+
+/*
+ * firmware_map_find_entry_bootmem: Search memmap entry in map_entries_bootmem.
+ * @start: Start of the memory range.
+ * @end: End of the memory range (exclusive).
+ * @type: Type of the memory range.
+ *
+ * This function is similar to firmware_map_find_entry except that it find the
+ * given entry in map_entries_bootmem.
+ *
+ * Return pointer to the entry to be found on success, or NULL on failure.
+ */
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry_bootmem(u64 start, u64 end, const char *type)
+{
+ return firmware_map_find_entry_in_list(start, end, type,
+ &map_entries_bootmem);
+}
+
/**
* firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
* memory hotplug.
@@ -226,9 +289,19 @@ int __meminit firmware_map_add_hotplug(u64 start, u64 end, const char *type)
{
struct firmware_map_entry *entry;
- entry = kzalloc(sizeof(struct firmware_map_entry), GFP_ATOMIC);
- if (!entry)
- return -ENOMEM;
+ entry = firmware_map_find_entry_bootmem(start, end, type);
+ if (!entry) {
+ entry = kzalloc(sizeof(struct firmware_map_entry), GFP_ATOMIC);
+ if (!entry)
+ return -ENOMEM;
+ } else {
+ /* Reuse storage allocated by bootmem. */
+ spin_lock(&map_entries_bootmem_lock);
+ list_del(&entry->list);
+ spin_unlock(&map_entries_bootmem_lock);
+
+ memset(entry, 0, sizeof(*entry));
+ }
firmware_map_add_entry(start, end, type, entry);
/* create the memmap entry */
--
1.7.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation.
2013-01-15 10:54 ` [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation Tang Chen
@ 2013-01-15 22:26 ` Julian Calaby
2013-01-16 2:12 ` Tang Chen
0 siblings, 1 reply; 9+ messages in thread
From: Julian Calaby @ 2013-01-15 22:26 UTC (permalink / raw)
To: Tang Chen
Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
kosaki.motohiro, rientjes, len.brown, jiang.liu, wency, cmetcalf,
glommer, wujianguo, yinghai, laijs, linux-kernel, minchan.kim,
akpm, linuxppc-dev
Hi Tang,
One minor point.
On Tue, Jan 15, 2013 at 9:54 PM, Tang Chen <tangchen@cn.fujitsu.com> wrote:
> It is unsafe to return an entry pointer and release the map_entries_lock. So we should
> not hold the map_entries_lock separately in firmware_map_find_entry() and
> firmware_map_remove_entry(). Hold the map_entries_lock across find and remove
> /sys/firmware/memmap/X operation.
>
> And also, users of these two functions need to be careful to hold the lock when using
> these two functions.
>
> The suggestion is from Andrew Morton <akpm@linux-foundation.org>
>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
> drivers/firmware/memmap.c | 25 +++++++++++++++++--------
> 1 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
> index 4211da5..940c4e9 100644
> --- a/drivers/firmware/memmap.c
> +++ b/drivers/firmware/memmap.c
> @@ -188,23 +188,28 @@ static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
> }
>
> /*
> - * Search memmap entry
> + * firmware_map_find_entry: Search memmap entry.
> + * @start: Start of the memory range.
> + * @end: End of the memory range (exclusive).
> + * @type: Type of the memory range.
> + *
> + * This function is to find the memmap entey of a given memory range.
> + * The caller must hold map_entries_lock, and must not release the lock
> + * until the processing of the returned entry has completed.
> + *
> + * Return pointer to the entry to be found on success, or NULL on failure.
Why not make this completely kernel-doc compliant as you're already
re-writing the comment?
Thanks,
--
Julian Calaby
Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation.
2013-01-15 22:26 ` Julian Calaby
@ 2013-01-16 2:12 ` Tang Chen
0 siblings, 0 replies; 9+ messages in thread
From: Tang Chen @ 2013-01-16 2:12 UTC (permalink / raw)
To: Julian Calaby
Cc: linux-ia64, linux-sh, linux-mm, paulus, hpa, sparclinux, cl,
linux-s390, x86, linux-acpi, isimatu.yasuaki, linfeng, mgorman,
kosaki.motohiro, rientjes, len.brown, jiang.liu, wency, cmetcalf,
glommer, wujianguo, yinghai, laijs, linux-kernel, minchan.kim,
akpm, linuxppc-dev
On 01/16/2013 06:26 AM, Julian Calaby wrote:
> Hi Tang,
>
> One minor point.
>
>>
>> /*
>> - * Search memmap entry
>> + * firmware_map_find_entry: Search memmap entry.
>> + * @start: Start of the memory range.
>> + * @end: End of the memory range (exclusive).
>> + * @type: Type of the memory range.
>> + *
>> + * This function is to find the memmap entey of a given memory range.
>> + * The caller must hold map_entries_lock, and must not release the lock
>> + * until the processing of the returned entry has completed.
>> + *
>> + * Return pointer to the entry to be found on success, or NULL on failure.
>
> Why not make this completely kernel-doc compliant as you're already
> re-writing the comment?
Hi Julian,
Thank you for reminding me this. I think I may have some more problems
like this. I'll post a patch to fix as many of them as I can. :)
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-01-16 2:12 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-15 10:54 [BUG Fix Patch 0/6] Bug fix for physical memory hot-remove Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 1/6] Bug fix: Hold spinlock across find|remove /sys/firmware/memmap/X operation Tang Chen
2013-01-15 22:26 ` Julian Calaby
2013-01-16 2:12 ` Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 2/6] Bug fix: Do not calculate direct mapping pages when freeing vmemmap pagetables Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 3/6] Bug fix: Do not free direct mapping pages twice Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 4/6] Bug fix: Do not free page split from hugepage one by one Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 5/6] Bug fix: Fix the wrong comments of map_entries Tang Chen
2013-01-15 10:54 ` [BUG Fix Patch 6/6] Bug fix: Reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem Tang Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).