Re: [PATCH] makedumpfile: ARM: get correct mem_map offset

From: Liu hua <sdu.liu@huawei.com>
To: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH] makedumpfile: ARM: get correct mem_map offset
Date: Mon, 26 May 2014 13:17:04 +0800	[thread overview]
Message-ID: <5382CE50.4050807@huawei.com> (raw)
In-Reply-To: <0910DD04CBD6DE4193FCF86B9C00BE97212378@BPXM01GP.gisp.nec.co.jp>

于 2014/5/20 16:12, Atsushi Kumagai 写道:
>>>>> Did you mean the patch below is wrong?
>>>>>
>>>>>   commit 1e93ee75f9d47c219e833210eb31e4a747cc3a8d
>>>>>   Author: Mika Westerberg <ext-mika.1.westerberg@nokia.com>
>>>>>   Date:   Tue Jun 22 09:59:10 2010 +0300
>>>>>
>>>>>       use ARCH_PFN_OFFSET for pfn_to_paddr/paddr_to_pfn translations
>>>>>
>>>>> Your description sounds we should fix the way to convert paddr to pfn,
>>>>> but there is no such fix in your patch.
>>>>
>>>>
>>>> Yes, my first version does just as what you say. But the patch is huge.
>>>> I thing this patch is much better.
>>>>
>>>> Though commit 1e93ee75f9d47c219e833210eb31e4a747cc3a8d brings some problems
>>>> . But we can easy fix them.
>>>>
>>>>
>>>> Make my platform for example:	80000 sparse memory model.
>>>> mem 1G ; SECTION_SIZE_BITS 26
>>>>
>>>> (a) for the kernel
>>>>
>>>>    section number |phy start |  start pfn   | end pfn  |  valid  | mem_section  	|
>>>> 	0	   |0	      |   0          |  3fff    |   0	  |   [0]	|
>>>> 	1	   |4000000   |   4000       |  7fff    |   0	  |   [1]	|
>>>> 	2	   |8000000   |   8000       |  bfff    |   0	  |   [2]	|
>>>>
>>>>    [cut ...]
>>>>
>>>>       32 	   |80000000  |  80000	     | 83fff	|   1	  |  [32]	|
>>>>       33 	   |84000000  |  84000	     | 87fff	|   1	  |  [33]	|
>>>>
>>>>    [cut ...]
>>>>
>>>>       47 	   |bfc00000  |  bfc000	     | bffff	|   1	  |  [47]	|
>>>>
>>>>
>>>>  (b) for makedumpfile
>>>>
>>>>
>>>>       0	   |80000000  |    0         |  3fff    |   0	  |   [0]	|
>>>>       1 	   |84000000  |  4000        |  7fff    |   0	  |   [1]	|
>>>>
>>>>    [cut ...]
>>>>
>>>>       15 	   |bfc00000  |  3c000       | 3ffff	|   1	  |  [15]	|
>>>>
>>>>
>>>> So makedumpfile removes the offset of section number and pfn. The relationship between
>>>> pfn and section number remains as before. So this will not introduce problem.
>>>>
>>>> But the section nember and mem_section array do not match each other.
>>>>
>>>> For paddr 80000000
>>>> 	kernel        : pfn 8000: mem_section: 32
>>>> 	makedumpfile  : pfn 0   : mem_section: 0
>>>>
>>>> And we do not remove the offset of array mem_section. So makedumofile can not
>>>> get the right page struct. When fix this offset, everything is ok.
>>>
>>> Thanks for your explanation, I understand the sparse_mem case.
>>>
>>>> But If we revert 1e93ee75f9d:
>>>>
>>>>  (a) codes likes "for(pfn = 0" ,"for_each_cycle(0" and "for (section_nr = 0"  should be changed;
>>>>  (b) Due to "set_bit_on_1st_bitmap(pfn, cycle)", we will waste some bits.
>>>>  (c) crash utility should also be changed.
>>>>
>>>> BTW, when ARCH_PFN_OFFSET=0, section nember and mem_section matches each other..
>>>> So no problem was intrduced
>>>>
>>>>>
>>>>>> For the cases of ARCH_PFN_OFFSET=0 or non sparse memormy model,
>>>>>> this introduces no problem.
>>>>>>
>>>>>> But for my arma9 platform with ARCH_PFN_OFFSET=0x80000 and sparse
>>>>>> memory model. Makedumfile can not get the mem_map correctly. It it
>>>>>> due to there is still offset for mem_map array.
>>>>>
>>>>> Why the other memory models are OK? There is no offset even if ARCH_PFN_OFFSET!=0?
>>>>> I need more explanation to understand this issue.
>>>>
>>>> (1) For flatmem, the mem_map is continuous, And the start address of mem_map comes from
>>>> the kernel symbol.
>>>>
>>>> For paddr 80000000
>>>> 	kernel        : pfn 8000: mem_map[0]
>>>> 	makedumpfile  : pfn 0   : mem_map[0]
>>>>
>>>> This will not introduce problem.
>>>
>>> I understand that alloc_node_mem_map() allocates mem_map for flatmem and it
>>> considers ARCH_PFN_OFFSET like:
>>>
>>>         if (pgdat == NODE_DATA(0)) {
>>>                 mem_map = NODE_DATA(0)->node_mem_map;
>>> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>>>                         mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
>>> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>
>> In all cases, mem_map indicates the start address of the mem_map.
>>
>> I think this is the inner process for the kernel, which we should not consider. Because once
>> we get the mem_map symbol value and the maxpfn from the vmcore. We know the start and length
>> of mem_map. And we can get every page struct correctly.
>>
>> For makedumpfile:
>>
>> get_mm_flatmem(void)
>> {
>>  ....
>> 2409         if (!readmem(VADDR, SYMBOL(mem_map), &mem_map, sizeof mem_map)//get the mem_map value
>>  ....
>> 2421         if (is_xen_memory())
>> 2422                 dump_mem_map(0, info->dom0_mapnr, mem_map, 0);
>> 2423         else
>> 2424                 dump_mem_map(0, info->max_mapnr, mem_map, 0);
>>
>> }
>>
>> So for flat memory model, makedumpfile can always get the correct mem_map.
> 
> I don't worry that we can't get the start address of the mem_map.
> 
> You said the kernel doesn't consider ARCH_PFN_OFFSET when converting paddr
> to pfn, this sounds the kernel doesn't make an exception for the pages lower
> than ARCH_PFN_OFFSET in page management to me.
> I mean I worry about a situation like below:
> 
> (For example, ARCH_PFN_OFFSET=0x4)
> 
>       phys addr   |   pfn for    |  pfn for       |  valid  |  mem_map  
>                   |   kernel     |  makedumpfile  |         |(struct page)
>     --------------+--------------+----------------+---------+------------
>         0 -  fff  |      0       |       X        |    0    |    [0]    
>      1000 - 1fff  |      1       |       X        |    0    |    [1]    
>      2000 - 2fff  |      2       |       X        |    0    |    [2]    
>      3000 - 3fff  |      3       |       X        |    0    |    [3]    
>      4000 - 4fff  |      4       |       0        |    1    |    [4]    
>      5000 - 5fff  |      5       |       1        |    1    |    [5]    
>      6000 - 6fff  |      6       |       2        |    1    |    [6]    
>          ...
> 
> When we check the page flag of the page[4000-4fff] in makedumpfile, we
> have to read mem_map[4], but makedumpfile reads mem_map[0] due to
> paddr_to_pfn(). This is my worry.

Hi Atsushi,

Sorry, I missed the point.

(1)for flatmem model, kernel get the page struct after minusing ARCH_PFN_OFFSET
 28 #if defined(CONFIG_FLATMEM)
 29
 30 #define __pfn_to_page(pfn)      (mem_map + ((pfn) - ARCH_PFN_OFFSET))
 31 #define __page_to_pfn(page)     ((unsigned long)((page) - mem_map) + \
 32                                  ARCH_PFN_OFFSET)

So, 1e93ee75f9d47c219e833210eb31e4a747cc3a8d do no harm.

(2)for discontigmem model

 18 #ifndef arch_local_page_offset
 19 #define arch_local_page_offset(pfn, nid)        \
 20         ((pfn) - NODE_DATA(nid)->node_start_pfn)
 21 #endif
 22
 23 #endif /* CONFIG_DISCONTIGMEM */

.....

 33 #elif defined(CONFIG_DISCONTIGMEM)
 34
 35 #define __pfn_to_page(pfn)                      \
 36 ({      unsigned long __pfn = (pfn);            \
 37         unsigned long __nid = arch_pfn_to_nid(__pfn);  \
 38         NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
 39 })

The kernel minuses "NODE_DATA(nid)->node_start_pfn". So relative postion is got
for the tranfser.

So to get page struct

kernel        :  paddr --> pfn  -[consider ARCH_PFN_OFFSET ] -> page
makedumpfile  :  paddr -[consider ARCH_PFN_OFFSET] -> pfn  --> page

So I think commit 1e93ee75f9d47c219e833210eb31e4a747cc3a8d can fit discontigmem and flatmem.
After applying this patch, it also fits sparsemem model. What do you think?

Thanks,
Liu Hua

> 
> Actually, the similar gap exists in the sparse_mem case as you described,
> so I suspect we have to take care of it also for other memory models.
> 
>>>         }
>>>
>>> So there is no problem in this model since the top of mem_map corresponds to
>>> ARCH_PFN_OFFSET, right?
>>
>> I don't think so. Is it clear for my words above?
>>
>>>
>>>> (2) For discontigmem, it manages the mem_map with node_memblk. commit
>>>> 1e93ee75f9d47c21 also does no harm.
>>>
>>> alloc_node_mem_map() allocates mem_map also for discontigmem, but I can't find
>>> any codes to consider ARCH_PFN_OFFSET for this model.
>>> So I suspect the mismatch between the pfn for makedumpfile and the actual content
>>> of mem_map can exist. Could you explain why this case is OK in more detail?
>>>
>> Actually I did not test this memory model. I reach my conclusion via the codes.
>>
>> get_mm_discontigmem
>> {
>> ....
>> for (i = 0; i < vt.numnodes; i++) {  //loop for every node
>> 2591                 if (!readmem(VADDR, pgdat + OFFSET(pglist_data.node_start_pfn),
>> 2592                     &pfn_start, sizeof pfn_start)) {           //get pfn_start for this node
>> ....
>> 2596                 if (!readmem(VADDR,pgdat+OFFSET(pglist_data.node_spanned_pages),
>> 2597                     &node_spanned_pages, sizeof node_spanned_pages)) { //get the number of pages in this node
>>
>> 2603                 if (SYMBOL(vmem_map) == NOT_FOUND_SYMBOL) {
>> 2604                         if (!readmem(VADDR, pgdat + OFFSET(pglist_data.node_mem_map),  //get the mem_map for this
>> node.
>> 2605                             &mem_map, sizeof mem_map)) {
>> 2606                                 ERRMSG("Can't get mem_map.\n");
>> 2607                                 return FALSE;
>> 2608                         }
>> 2609                 } else
>> 2610                         mem_map = vmem_map + (SIZE(page) * pfn_start);
>> ....
>> }
>>
>> So I think for discontigmem, makedumpfile can get the start address and length of mem_map from vmcore directly.
>> And Everything can go well without ARCH_PFN_OFFSET.
> 
> The same can be said, is it not needed to consider ARCH_PFN_OFFSET
> to get a page struct from the mem_map?
> 
> 
> Thanks
> Atsushi Kumagai
> 
>>
>> Perhaps I need some tests on discontigmem. Did I explan my idea clearly?
>>
>>
>>
>>>
>>> Thanks
>>> Atsushi Kumagai
>>>
>>>> What do you think?
>>>>
>>>> Thanks,
>>>> Liu Hua

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec