* [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() @ 2016-08-23 18:43 Toshi Kani 2016-08-23 20:42 ` Andrew Morton 2016-08-23 22:32 ` Dan Williams 0 siblings, 2 replies; 17+ messages in thread From: Toshi Kani @ 2016-08-23 18:43 UTC (permalink / raw) To: dan.j.williams Cc: m.abhilash-kumar, linux-nvdimm, linux-kernel, Toshi Kani, Andrew Morton, Ard Biesheuvel, Brian Starkey The following BUG was observed while starting up KVM with nvdimm device as memory-backend-file to /dev/dax. BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30 Call Trace: follow_devmap_pmd+0x298/0x2c0 follow_page_mask+0x275/0x530 __get_user_pages+0xe3/0x750 __gfn_to_pfn_memslot+0x1b2/0x450 [kvm] ? hrtimer_try_to_cancel+0x2c/0x120 ? kvm_read_l1_tsc+0x55/0x60 [kvm] try_async_pf+0x66/0x230 [kvm] ? kvm_host_page_size+0x90/0xa0 [kvm] tdp_page_fault+0x130/0x280 [kvm] kvm_mmu_page_fault+0x5f/0xf0 [kvm] handle_ept_violation+0x94/0x180 [kvm_intel] vmx_handle_exit+0x1d3/0x1440 [kvm_intel] ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel] ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm] ? wake_up_q+0x44/0x80 kvm_vcpu_ioctl+0x33c/0x620 [kvm] ? __vfs_write+0x37/0x160 do_vfs_ioctl+0xa2/0x5d0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x1a/0xa4 devm_memremap_pages() calls for_each_device_pfn() to walk through all pfns in page_map. pfn_first(), however, returns a wrong pfn that leaves page->pgmap uninitialized. Since arch_add_memory() has set up direct mappings to the NVDIMM range with altmap, pfn_first() should not modify the start pfn. Change pfn_first() to simply return pfn of res->start. Reported-and-tested-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Brian Starkey <brian.starkey@arm.com> --- kernel/memremap.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index 251d16b..50ea577 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -210,15 +210,9 @@ static void pgmap_radix_release(struct resource *res) static unsigned long pfn_first(struct page_map *page_map) { - struct dev_pagemap *pgmap = &page_map->pgmap; const struct resource *res = &page_map->res; - struct vmem_altmap *altmap = pgmap->altmap; - unsigned long pfn; - pfn = res->start >> PAGE_SHIFT; - if (altmap) - pfn += vmem_altmap_offset(altmap); - return pfn; + return res->start >> PAGE_SHIFT; } static unsigned long pfn_end(struct page_map *page_map) ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-23 18:43 [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() Toshi Kani @ 2016-08-23 20:42 ` Andrew Morton 2016-08-23 21:25 ` Kani, Toshimitsu 2016-08-23 22:32 ` Dan Williams 1 sibling, 1 reply; 17+ messages in thread From: Andrew Morton @ 2016-08-23 20:42 UTC (permalink / raw) To: Toshi Kani Cc: dan.j.williams, m.abhilash-kumar, linux-nvdimm, linux-kernel, Ard Biesheuvel, Brian Starkey On Tue, 23 Aug 2016 12:43:20 -0600 Toshi Kani <toshi.kani@hpe.com> wrote: > The following BUG was observed while starting up KVM with nvdimm > device as memory-backend-file to /dev/dax. > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 > > ... > > devm_memremap_pages() calls for_each_device_pfn() to walk through > all pfns in page_map. pfn_first(), however, returns a wrong pfn > that leaves page->pgmap uninitialized. > > Since arch_add_memory() has set up direct mappings to the NVDIMM > range with altmap, pfn_first() should not modify the start pfn. > Change pfn_first() to simply return pfn of res->start. Which kernel version(s) do you think need fixing? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-23 20:42 ` Andrew Morton @ 2016-08-23 21:25 ` Kani, Toshimitsu 0 siblings, 0 replies; 17+ messages in thread From: Kani, Toshimitsu @ 2016-08-23 21:25 UTC (permalink / raw) To: akpm@linux-foundation.org Cc: dan.j.williams@intel.com, Mulumudi, Abhilash Kumar, linux-kernel@vger.kernel.org, ard.biesheuvel@linaro.org, linux-nvdimm@ml01.01.org, brian.starkey@arm.com On Tue, 2016-08-23 at 13:42 -0700, Andrew Morton wrote: > On Tue, 23 Aug 2016 12:43:20 -0600 Toshi Kani <toshi.kani@hpe.com> > wrote: > > > > > The following BUG was observed while starting up KVM with nvdimm > > device as memory-backend-file to /dev/dax. > > > > BUG: unable to handle kernel NULL pointer dereference at > > 0000000000000008 > > > > ... > > > > devm_memremap_pages() calls for_each_device_pfn() to walk through > > all pfns in page_map. pfn_first(), however, returns a wrong pfn > > that leaves page->pgmap uninitialized. > > > > Since arch_add_memory() has set up direct mappings to the NVDIMM > > range with altmap, pfn_first() should not modify the start pfn. > > Change pfn_first() to simply return pfn of res->start. > > Which kernel version(s) do you think need fixing? The fix applies to v4.5 and newer versions. Thanks, -Toshi ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-23 18:43 [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() Toshi Kani 2016-08-23 20:42 ` Andrew Morton @ 2016-08-23 22:32 ` Dan Williams 2016-08-23 23:47 ` Kani, Toshimitsu 1 sibling, 1 reply; 17+ messages in thread From: Dan Williams @ 2016-08-23 22:32 UTC (permalink / raw) To: Toshi Kani Cc: m.abhilash-kumar, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, Andrew Morton, Ard Biesheuvel, Brian Starkey On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > The following BUG was observed while starting up KVM with nvdimm > device as memory-backend-file to /dev/dax. > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 > IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30 > Call Trace: > follow_devmap_pmd+0x298/0x2c0 > follow_page_mask+0x275/0x530 > __get_user_pages+0xe3/0x750 > __gfn_to_pfn_memslot+0x1b2/0x450 [kvm] > ? hrtimer_try_to_cancel+0x2c/0x120 > ? kvm_read_l1_tsc+0x55/0x60 [kvm] > try_async_pf+0x66/0x230 [kvm] > ? kvm_host_page_size+0x90/0xa0 [kvm] > tdp_page_fault+0x130/0x280 [kvm] > kvm_mmu_page_fault+0x5f/0xf0 [kvm] > handle_ept_violation+0x94/0x180 [kvm_intel] > vmx_handle_exit+0x1d3/0x1440 [kvm_intel] > ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel] > ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel] > kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm] > ? wake_up_q+0x44/0x80 > kvm_vcpu_ioctl+0x33c/0x620 [kvm] > ? __vfs_write+0x37/0x160 > do_vfs_ioctl+0xa2/0x5d0 > SyS_ioctl+0x79/0x90 > entry_SYSCALL_64_fastpath+0x1a/0xa4 > > devm_memremap_pages() calls for_each_device_pfn() to walk through > all pfns in page_map. pfn_first(), however, returns a wrong pfn > that leaves page->pgmap uninitialized. > > Since arch_add_memory() has set up direct mappings to the NVDIMM > range with altmap, pfn_first() should not modify the start pfn. > Change pfn_first() to simply return pfn of res->start. > > Reported-and-tested-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> > Cc: Brian Starkey <brian.starkey@arm.com> > --- > kernel/memremap.c | 8 +------- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/kernel/memremap.c b/kernel/memremap.c > index 251d16b..50ea577 100644 > --- a/kernel/memremap.c > +++ b/kernel/memremap.c > @@ -210,15 +210,9 @@ static void pgmap_radix_release(struct resource *res) > > static unsigned long pfn_first(struct page_map *page_map) > { > - struct dev_pagemap *pgmap = &page_map->pgmap; > const struct resource *res = &page_map->res; > - struct vmem_altmap *altmap = pgmap->altmap; > - unsigned long pfn; > > - pfn = res->start >> PAGE_SHIFT; > - if (altmap) > - pfn += vmem_altmap_offset(altmap); > - return pfn; > + return res->start >> PAGE_SHIFT; > } I'm not sure about this fix. The point of honoring vmem_altmap_offset() is because a portion of the resource that is passed to devm_memremap_pages() also contains the metadata info block for the device. The offset says "use everything past this point for pages". This may work for avoiding a crash, but it may corrupt info block metadata in the process. Can you provide more information about the failing scenario to be sure that we are not triggering a fault on an address that is not meant to have a page mapping? I.e. what is the host physical address of the page that caused this fault, and is it valid? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-23 22:32 ` Dan Williams @ 2016-08-23 23:47 ` Kani, Toshimitsu 2016-08-24 0:34 ` Dan Williams 2016-08-24 1:00 ` Dan Williams 0 siblings, 2 replies; 17+ messages in thread From: Kani, Toshimitsu @ 2016-08-23 23:47 UTC (permalink / raw) To: dan.j.williams@intel.com Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: > On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> > wrote: : > I'm not sure about this fix. The point of honoring > vmem_altmap_offset() is because a portion of the resource that is > passed to devm_memremap_pages() also contains the metadata info block > for the device. The offset says "use everything past this point for > pages". This may work for avoiding a crash, but it may corrupt info > block metadata in the process. Can you provide more information > about the failing scenario to be sure that we are not triggering a > fault on an address that is not meant to have a page mapping? I.e. > what is the host physical address of the page that caused this fault, > and is it valid? The fault address in question was the 2nd page of an NVDIMM range. I assumed this fault address was valid and needed to be handled. Here is some info about the base and patched cases. Let me know if you need more info. Base ==== The following NVDIMM range was set to /dev/dax. /proc/iomem 480000000-87fffffff : Persistent Memory devm_memremap_pages() initialized struct page from 0x490200-0x87ffff. This left 0x48000-0x4901ff uninitialized for page->pgmap. devm_memremap_pages: pgmap 0xffff88046d0453f0 [0] : pfn 0x490200, page ffffea0012408000, pgmap ffff88046d0453f0 [1] : pfn 0x490201, page ffffea0012408040, pgmap ffff88046d0453f0 [2] : pfn 0x490202, page ffffea0012408080, pgmap ffff88046d0453f0 [3] : pfn 0x490203, page ffffea00124080c0, pgmap ffff88046d0453f0 [4] : pfn 0x490204, page ffffea0012408100, pgmap ffff88046d0453f0 : [E+1]: pfn 0x880000, page ffffea0021ffffc0, pgmap ffff88046d0453f0 The faulted page was pfn 0x480001, which was the 2nd page in the NVDIMM range and did not have valid pgmap. This led the BUG. pfn 0x480001 page 0xffffea0012000040 page->pgmap 0xffffea0012000060 page->pgmap->ref (null) Patch ===== With the patch, devm_memremap_pages() initializes as follows. devm_memremap_pages: pgmap ffff880462b3b4b0 [0] : pfn 0x480000, page ffffea0012000000, pgmap ffff880462b3b4b0 [1] : pfn 0x480001, page ffffea0012000040, pgmap ffff880462b3b4b0 [2] : pfn 0x480002, page ffffea0012000080, pgmap ffff880462b3b4b0 [3] : pfn 0x480003, page ffffea00120000c0, pgmap ffff880462b3b4b0 [4] : pfn 0x480004, page ffffea0012000100, pgmap ffff880462b3b4b0 : [E+1]: pfn 0x880000, page ffffea0021ffffc0, pgmap ffff880462b3b4b0 A page fault to pfn 0x480001 is handled as it has valid pgmap. pfn 0x480001 page 0xffffea0012000040 page->pgmap 0xffff880462b3b4b0 page->pgmap->ref 0xffff880462b3b530 Its dev_pagemap and vmem_altmap are as follows. crash> p {struct dev_pagemap} 0xffff880462b3b4b0 $2 = { altmap = 0xffff880462b3b4d0, res = 0xffff880462b3b468, ref = 0xffff880462b3b530, dev = 0xffff880463e37010 } crash> p {struct vmem_altmap} 0xffff880462b3b4d0 $3 = { base_pfn = 0x480000, reserve = 0x2, free = 0x101fe, align = 0x1fe, alloc = 0x10000 } This page entry is physically located at 0x480200040. crash> vtop 0xffffea0012000040 VIRTUAL PHYSICAL ffffea0012000040 480200040 PML4 DIRECTORY: ffffffff81c06000 PAGE DIRECTORY: 47ffe6067 PUD: 47ffe6000 => 47ffe5067 PMD: 47ffe5480 => 80000004802001e3 PAGE: 480200000 (2MB) PTE PHYSICAL FLAGS 80000004802001e3 480200000 (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX) PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea0012008000 480200000 0 0 1 4fffe000000400 reserved Thanks, -Toshi ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-23 23:47 ` Kani, Toshimitsu @ 2016-08-24 0:34 ` Dan Williams 2016-08-24 1:29 ` Kani, Toshimitsu 2016-08-24 1:00 ` Dan Williams 1 sibling, 1 reply; 17+ messages in thread From: Dan Williams @ 2016-08-24 0:34 UTC (permalink / raw) To: Kani, Toshimitsu Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> >> wrote: > : >> I'm not sure about this fix. The point of honoring >> vmem_altmap_offset() is because a portion of the resource that is >> passed to devm_memremap_pages() also contains the metadata info block >> for the device. The offset says "use everything past this point for >> pages". This may work for avoiding a crash, but it may corrupt info >> block metadata in the process. Can you provide more information >> about the failing scenario to be sure that we are not triggering a >> fault on an address that is not meant to have a page mapping? I.e. >> what is the host physical address of the page that caused this fault, >> and is it valid? > > The fault address in question was the 2nd page of an NVDIMM range. I > assumed this fault address was valid and needed to be handled. Here is > some info about the base and patched cases. Let me know if you need > more info. > > Base > ==== > > The following NVDIMM range was set to /dev/dax. With ndctl create-namespace or manually via sysfs? Specifically I'm looking for what the 'align' attribute was set to when the configuration was established. Can you provide a dump of the sysfs attributes for the /dev/dax parent device? ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 0:34 ` Dan Williams @ 2016-08-24 1:29 ` Kani, Toshimitsu 2016-08-24 2:53 ` Dan Williams 0 siblings, 1 reply; 17+ messages in thread From: Kani, Toshimitsu @ 2016-08-24 1:29 UTC (permalink / raw) To: Dan Williams Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com > On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: > >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> > >> wrote: > > : > >> I'm not sure about this fix. The point of honoring > >> vmem_altmap_offset() is because a portion of the resource that is > >> passed to devm_memremap_pages() also contains the metadata info > block > >> for the device. The offset says "use everything past this point for > >> pages". This may work for avoiding a crash, but it may corrupt info > >> block metadata in the process. Can you provide more information > >> about the failing scenario to be sure that we are not triggering a > >> fault on an address that is not meant to have a page mapping? I.e. > >> what is the host physical address of the page that caused this fault, > >> and is it valid? > > > > The fault address in question was the 2nd page of an NVDIMM range. I > > assumed this fault address was valid and needed to be handled. Here is > > some info about the base and patched cases. Let me know if you need > > more info. > > > > Base > > ==== > > > > The following NVDIMM range was set to /dev/dax. > > With ndctl create-namespace or manually via sysfs? Specifically I'm > looking for what the 'align' attribute was set to when the > configuration was established. Can you provide a dump of the sysfs > attributes for the /dev/dax parent device? I used the ndctl command below. ndctl create-namespace -f -e namespace0.0 -m dax Here is additional info from my note for the base case. p {struct dev_pagemap} 0xffff88046d0453f0 $3 = { altmap = 0xffff88046d045410, res = 0xffff88046d0453a8, ref = 0xffff88046d0452f0, dev = 0xffff880464790410 } crash> p {struct vmem_altmap} 0xffff88046d045410 $6 = { base_pfn = 0x480000, reserve = 0x2, // PHYS_PFN(SZ_8K) free = 0x101fe, align = 0x1fe, alloc = 0x10000 } crash> p {struct resource} 0xffff88046d0453a8 $4 = { start = 0x480000000, end = 0x87fffffff, name = 0xffff880c7da5d4a8 "region0", flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0, child = 0x0 } crash> p {struct percpu_ref} 0xffff88046d0452f0 $7 = { count = { counter = 0x8000000000000001 }, percpu_count_ptr = 0x60f380403a98, release = 0xffffffffa008a0a0, confirm_switch = 0x0, force_atomic = 0x0, rcu = { next = 0x0, func = 0x0 } } Thanks, -Toshi ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 1:29 ` Kani, Toshimitsu @ 2016-08-24 2:53 ` Dan Williams 2016-08-24 3:58 ` Dan Williams 0 siblings, 1 reply; 17+ messages in thread From: Dan Williams @ 2016-08-24 2:53 UTC (permalink / raw) To: Kani, Toshimitsu Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> >> wrote: >> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: >> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> >> >> wrote: >> > : >> >> I'm not sure about this fix. The point of honoring >> >> vmem_altmap_offset() is because a portion of the resource that is >> >> passed to devm_memremap_pages() also contains the metadata info >> block >> >> for the device. The offset says "use everything past this point for >> >> pages". This may work for avoiding a crash, but it may corrupt info >> >> block metadata in the process. Can you provide more information >> >> about the failing scenario to be sure that we are not triggering a >> >> fault on an address that is not meant to have a page mapping? I.e. >> >> what is the host physical address of the page that caused this fault, >> >> and is it valid? >> > >> > The fault address in question was the 2nd page of an NVDIMM range. I >> > assumed this fault address was valid and needed to be handled. Here is >> > some info about the base and patched cases. Let me know if you need >> > more info. >> > >> > Base >> > ==== >> > >> > The following NVDIMM range was set to /dev/dax. >> >> With ndctl create-namespace or manually via sysfs? Specifically I'm >> looking for what the 'align' attribute was set to when the >> configuration was established. Can you provide a dump of the sysfs >> attributes for the /dev/dax parent device? > > I used the ndctl command below. > ndctl create-namespace -f -e namespace0.0 -m dax > > Here is additional info from my note for the base case. > > p {struct dev_pagemap} 0xffff88046d0453f0 > $3 = { > altmap = 0xffff88046d045410, > res = 0xffff88046d0453a8, > ref = 0xffff88046d0452f0, > dev = 0xffff880464790410 > } > > crash> p {struct vmem_altmap} 0xffff88046d045410 > $6 = { > base_pfn = 0x480000, > reserve = 0x2, // PHYS_PFN(SZ_8K) > free = 0x101fe, > align = 0x1fe, > alloc = 0x10000 > } Ah, so, on second look the 0x490200000 data offset looks correct. The total size of the address range is 16GB which equates to 256MB needed for struct page, plus 2MB more to re-align the data on the next 2MB boundary. The question now is why is the guest faulting on an access to an address less than 0x490200000? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 2:53 ` Dan Williams @ 2016-08-24 3:58 ` Dan Williams 2016-08-24 4:28 ` Kani, Toshimitsu 2016-08-24 5:48 ` Dan Williams 0 siblings, 2 replies; 17+ messages in thread From: Dan Williams @ 2016-08-24 3:58 UTC (permalink / raw) To: Kani, Toshimitsu Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com [-- Attachment #1: Type: text/plain, Size: 2608 bytes --] On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> >>> wrote: >>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: >>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> >>> >> wrote: >>> > : >>> >> I'm not sure about this fix. The point of honoring >>> >> vmem_altmap_offset() is because a portion of the resource that is >>> >> passed to devm_memremap_pages() also contains the metadata info >>> block >>> >> for the device. The offset says "use everything past this point for >>> >> pages". This may work for avoiding a crash, but it may corrupt info >>> >> block metadata in the process. Can you provide more information >>> >> about the failing scenario to be sure that we are not triggering a >>> >> fault on an address that is not meant to have a page mapping? I.e. >>> >> what is the host physical address of the page that caused this fault, >>> >> and is it valid? >>> > >>> > The fault address in question was the 2nd page of an NVDIMM range. I >>> > assumed this fault address was valid and needed to be handled. Here is >>> > some info about the base and patched cases. Let me know if you need >>> > more info. >>> > >>> > Base >>> > ==== >>> > >>> > The following NVDIMM range was set to /dev/dax. >>> >>> With ndctl create-namespace or manually via sysfs? Specifically I'm >>> looking for what the 'align' attribute was set to when the >>> configuration was established. Can you provide a dump of the sysfs >>> attributes for the /dev/dax parent device? >> >> I used the ndctl command below. >> ndctl create-namespace -f -e namespace0.0 -m dax >> >> Here is additional info from my note for the base case. >> >> p {struct dev_pagemap} 0xffff88046d0453f0 >> $3 = { >> altmap = 0xffff88046d045410, >> res = 0xffff88046d0453a8, >> ref = 0xffff88046d0452f0, >> dev = 0xffff880464790410 >> } >> >> crash> p {struct vmem_altmap} 0xffff88046d045410 >> $6 = { >> base_pfn = 0x480000, >> reserve = 0x2, // PHYS_PFN(SZ_8K) >> free = 0x101fe, >> align = 0x1fe, >> alloc = 0x10000 >> } > > Ah, so, on second look the 0x490200000 data offset looks correct. The > total size of the address range is 16GB which equates to 256MB needed > for struct page, plus 2MB more to re-align the data on the next 2MB > boundary. > > The question now is why is the guest faulting on an access to an > address less than 0x490200000? Does the attached patch fix this for you? [-- Attachment #2: 0001-dax-fix-device-dax-region-base.patch --] [-- Type: text/x-patch, Size: 3372 bytes --] From 506cdd2c4ec0f9283ac4eb92259f2e8a71c5b637 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams@intel.com> Date: Tue, 23 Aug 2016 19:59:31 -0700 Subject: [PATCH] dax: fix device-dax region base The data offset for a dax region needs to account for an altmap reservation in the resource range. Otherwise, device-dax is allowing mappings directly into the memmap or device info-block area, with crash signatures like the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30 Call Trace: follow_devmap_pmd+0x298/0x2c0 follow_page_mask+0x275/0x530 __get_user_pages+0xe3/0x750 __gfn_to_pfn_memslot+0x1b2/0x450 [kvm] ? hrtimer_try_to_cancel+0x2c/0x120 ? kvm_read_l1_tsc+0x55/0x60 [kvm] try_async_pf+0x66/0x230 [kvm] ? kvm_host_page_size+0x90/0xa0 [kvm] tdp_page_fault+0x130/0x280 [kvm] kvm_mmu_page_fault+0x5f/0xf0 [kvm] handle_ept_violation+0x94/0x180 [kvm_intel] vmx_handle_exit+0x1d3/0x1440 [kvm_intel] ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel] ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm] ? wake_up_q+0x44/0x80 kvm_vcpu_ioctl+0x33c/0x620 [kvm] ? __vfs_write+0x37/0x160 do_vfs_ioctl+0xa2/0x5d0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x1a/0xa4 Cc: <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory") Reported-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/dax/pmem.c | 2 ++ include/linux/memremap.h | 1 + kernel/memremap.c | 9 +++++++++ 3 files changed, 12 insertions(+) diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c index dfb168568af1..0603637df162 100644 --- a/drivers/dax/pmem.c +++ b/drivers/dax/pmem.c @@ -116,6 +116,8 @@ static int dax_pmem_probe(struct device *dev) if (rc) return rc; + res.start += vmem_altmap_resource_offset(altmap); + nd_region = to_nd_region(dev->parent); dax_region = alloc_dax_region(dev, nd_region->id, &res, le32_to_cpu(pfn_sb->align), addr, PFN_DEV|PFN_MAP); diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 93416196ba64..7265b83cdbea 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -24,6 +24,7 @@ struct vmem_altmap { }; unsigned long vmem_altmap_offset(struct vmem_altmap *altmap); +resource_size_t vmem_altmap_resource_offset(struct vmem_altmap *altmap); void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns); #ifdef CONFIG_ZONE_DEVICE diff --git a/kernel/memremap.c b/kernel/memremap.c index 251d16b4cb41..941e897c52a8 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -384,6 +384,15 @@ unsigned long vmem_altmap_offset(struct vmem_altmap *altmap) return altmap->reserve + altmap->free; } +resource_size_t vmem_altmap_resource_offset(struct vmem_altmap *altmap) +{ + /* number of bytes from base where data starts after the memmap */ + if (!altmap) + return 0; + return vmem_altmap_offset(altmap) * PAGE_SIZE; +} +EXPORT_SYMBOL_GPL(vmem_altmap_resource_offset); + void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns) { altmap->alloc -= nr_pfns; -- 2.5.5 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* RE: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 3:58 ` Dan Williams @ 2016-08-24 4:28 ` Kani, Toshimitsu 2016-08-24 4:48 ` Dan Williams 2016-08-24 5:48 ` Dan Williams 1 sibling, 1 reply; 17+ messages in thread From: Kani, Toshimitsu @ 2016-08-24 4:28 UTC (permalink / raw) To: Dan Williams Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com > On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com> > wrote: > > On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > >>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu > <toshi.kani@hpe.com> > >>> wrote: : > >> > >> crash> p {struct vmem_altmap} 0xffff88046d045410 > >> $6 = { > >> base_pfn = 0x480000, > >> reserve = 0x2, // PHYS_PFN(SZ_8K) > >> free = 0x101fe, > >> align = 0x1fe, > >> alloc = 0x10000 > >> } > > > > Ah, so, on second look the 0x490200000 data offset looks correct. The > > total size of the address range is 16GB which equates to 256MB needed > > for struct page, plus 2MB more to re-align the data on the next 2MB > > boundary. > > > > The question now is why is the guest faulting on an access to an > > address less than 0x490200000? > > Does the attached patch fix this for you? Yeah, that makes sense. I will test it tomorrow. BTW, why does devm_memremap_pages() put a whole range to pgmap_radix as device memory, but only initialize page->pgmap for its data range? Is there particular reason for this inconsistency? Thanks, -Toshi ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 4:28 ` Kani, Toshimitsu @ 2016-08-24 4:48 ` Dan Williams 2016-08-24 14:40 ` Kani, Toshimitsu 0 siblings, 1 reply; 17+ messages in thread From: Dan Williams @ 2016-08-24 4:48 UTC (permalink / raw) To: Kani, Toshimitsu Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com On Tue, Aug 23, 2016 at 9:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >> On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com> >> wrote: >> > On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> >> wrote: >> >>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu >> <toshi.kani@hpe.com> >> >>> wrote: > : >> >> >> >> crash> p {struct vmem_altmap} 0xffff88046d045410 >> >> $6 = { >> >> base_pfn = 0x480000, >> >> reserve = 0x2, // PHYS_PFN(SZ_8K) >> >> free = 0x101fe, >> >> align = 0x1fe, >> >> alloc = 0x10000 >> >> } >> > >> > Ah, so, on second look the 0x490200000 data offset looks correct. The >> > total size of the address range is 16GB which equates to 256MB needed >> > for struct page, plus 2MB more to re-align the data on the next 2MB >> > boundary. >> > >> > The question now is why is the guest faulting on an access to an >> > address less than 0x490200000? >> >> Does the attached patch fix this for you? > > Yeah, that makes sense. I will test it tomorrow. > > BTW, why does devm_memremap_pages() put a whole range to pgmap_radix > as device memory, but only initialize page->pgmap for its data range? Is there > particular reason for this inconsistency? > The radix tree is indexed by section number, but we don't always initialize a full section. The cases when we don't use a full section is when it overlaps device metadata, or if a platform multiplexes the device memory range with another resource within the same section. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 4:48 ` Dan Williams @ 2016-08-24 14:40 ` Kani, Toshimitsu 2016-08-24 14:55 ` Dan Williams 0 siblings, 1 reply; 17+ messages in thread From: Kani, Toshimitsu @ 2016-08-24 14:40 UTC (permalink / raw) To: dan.j.williams@intel.com Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com On Tue, 2016-08-23 at 21:48 -0700, Dan Williams wrote: > On Tue, Aug 23, 2016 at 9:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com > > > > BTW, why does devm_memremap_pages() put a whole range to > > pgmap_radix as device memory, but only initialize page->pgmap for > > its data range? Is there particular reason for this inconsistency? > > The radix tree is indexed by section number, but we don't always > initialize a full section. The cases when we don't use a full > section is when it overlaps device metadata, or if a platform > multiplexes the device memory range with another resource within the > same section. I see, but I still feel odd about making get_dev_pagemap() to work for metadata, but get_page() -> get_zone_device_page() to crash like this. follow_devmap_pmd() assumes get_page() to work when get_dev_pagemap() returns a valid pgmap... Thanks, -Toshi ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 14:40 ` Kani, Toshimitsu @ 2016-08-24 14:55 ` Dan Williams 0 siblings, 0 replies; 17+ messages in thread From: Dan Williams @ 2016-08-24 14:55 UTC (permalink / raw) To: Kani, Toshimitsu Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com [ adding Konstantin ] On Wed, Aug 24, 2016 at 7:40 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Tue, 2016-08-23 at 21:48 -0700, Dan Williams wrote: >> On Tue, Aug 23, 2016 at 9:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com >> > >> > BTW, why does devm_memremap_pages() put a whole range to >> > pgmap_radix as device memory, but only initialize page->pgmap for >> > its data range? Is there particular reason for this inconsistency? >> >> The radix tree is indexed by section number, but we don't always >> initialize a full section. The cases when we don't use a full >> section is when it overlaps device metadata, or if a platform >> multiplexes the device memory range with another resource within the >> same section. > > I see, but I still feel odd about making get_dev_pagemap() to work for > metadata, but get_page() -> get_zone_device_page() to crash like this. > follow_devmap_pmd() assumes get_page() to work when get_dev_pagemap() > returns a valid pgmap... > We could leave the unmapped portions of a section out of the radix, but I'm also worried about keeping the get_dev_pagemap() lookup cheap. I saw that Konstantin has some proposed changes to the radix api to make it easier to fill a range [1]. I might switch to radix_tree_fill_range() when it becomes available. [1]: https://github.com/koct9i/linux/commits/radix-tree ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 3:58 ` Dan Williams 2016-08-24 4:28 ` Kani, Toshimitsu @ 2016-08-24 5:48 ` Dan Williams 2016-08-24 15:21 ` Kani, Toshimitsu 1 sibling, 1 reply; 17+ messages in thread From: Dan Williams @ 2016-08-24 5:48 UTC (permalink / raw) To: Kani, Toshimitsu Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com [-- Attachment #1: Type: text/plain, Size: 2854 bytes --] On Tue, Aug 23, 2016 at 8:58 PM, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com> wrote: >> On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >>>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> >>>> wrote: >>>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: >>>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> >>>> >> wrote: >>>> > : >>>> >> I'm not sure about this fix. The point of honoring >>>> >> vmem_altmap_offset() is because a portion of the resource that is >>>> >> passed to devm_memremap_pages() also contains the metadata info >>>> block >>>> >> for the device. The offset says "use everything past this point for >>>> >> pages". This may work for avoiding a crash, but it may corrupt info >>>> >> block metadata in the process. Can you provide more information >>>> >> about the failing scenario to be sure that we are not triggering a >>>> >> fault on an address that is not meant to have a page mapping? I.e. >>>> >> what is the host physical address of the page that caused this fault, >>>> >> and is it valid? >>>> > >>>> > The fault address in question was the 2nd page of an NVDIMM range. I >>>> > assumed this fault address was valid and needed to be handled. Here is >>>> > some info about the base and patched cases. Let me know if you need >>>> > more info. >>>> > >>>> > Base >>>> > ==== >>>> > >>>> > The following NVDIMM range was set to /dev/dax. >>>> >>>> With ndctl create-namespace or manually via sysfs? Specifically I'm >>>> looking for what the 'align' attribute was set to when the >>>> configuration was established. Can you provide a dump of the sysfs >>>> attributes for the /dev/dax parent device? >>> >>> I used the ndctl command below. >>> ndctl create-namespace -f -e namespace0.0 -m dax >>> >>> Here is additional info from my note for the base case. >>> >>> p {struct dev_pagemap} 0xffff88046d0453f0 >>> $3 = { >>> altmap = 0xffff88046d045410, >>> res = 0xffff88046d0453a8, >>> ref = 0xffff88046d0452f0, >>> dev = 0xffff880464790410 >>> } >>> >>> crash> p {struct vmem_altmap} 0xffff88046d045410 >>> $6 = { >>> base_pfn = 0x480000, >>> reserve = 0x2, // PHYS_PFN(SZ_8K) >>> free = 0x101fe, >>> align = 0x1fe, >>> alloc = 0x10000 >>> } >> >> Ah, so, on second look the 0x490200000 data offset looks correct. The >> total size of the address range is 16GB which equates to 256MB needed >> for struct page, plus 2MB more to re-align the data on the next 2MB >> boundary. >> >> The question now is why is the guest faulting on an access to an >> address less than 0x490200000? > > Does the attached patch fix this for you? Sorry, should be this much simpler patch that also mirrors what driver/nvdimm/pmem.c is doing... [-- Attachment #2: 0001-dax-fix-device-dax-region-base.patch --] [-- Type: text/x-patch, Size: 2245 bytes --] From 3369f0e825c12bb2f17c0f7d3ccecb7c60f645e0 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams@intel.com> Date: Tue, 23 Aug 2016 19:59:31 -0700 Subject: [PATCH] dax: fix device-dax region base The data offset for a dax region needs to account for an altmap reservation in the resource range. Otherwise, device-dax is allowing mappings directly into the memmap or device info-block area, with crash signatures like the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30 Call Trace: follow_devmap_pmd+0x298/0x2c0 follow_page_mask+0x275/0x530 __get_user_pages+0xe3/0x750 __gfn_to_pfn_memslot+0x1b2/0x450 [kvm] ? hrtimer_try_to_cancel+0x2c/0x120 ? kvm_read_l1_tsc+0x55/0x60 [kvm] try_async_pf+0x66/0x230 [kvm] ? kvm_host_page_size+0x90/0xa0 [kvm] tdp_page_fault+0x130/0x280 [kvm] kvm_mmu_page_fault+0x5f/0xf0 [kvm] handle_ept_violation+0x94/0x180 [kvm_intel] vmx_handle_exit+0x1d3/0x1440 [kvm_intel] ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel] ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm] ? wake_up_q+0x44/0x80 kvm_vcpu_ioctl+0x33c/0x620 [kvm] ? __vfs_write+0x37/0x160 do_vfs_ioctl+0xa2/0x5d0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x1a/0xa4 Cc: <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory") Reported-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/dax/pmem.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c index dfb168568af1..1f01e98c83c7 100644 --- a/drivers/dax/pmem.c +++ b/drivers/dax/pmem.c @@ -116,6 +116,9 @@ static int dax_pmem_probe(struct device *dev) if (rc) return rc; + /* adjust the dax_region resource to the start of data */ + res.start += le64_to_cpu(pfn_sb->dataoff); + nd_region = to_nd_region(dev->parent); dax_region = alloc_dax_region(dev, nd_region->id, &res, le32_to_cpu(pfn_sb->align), addr, PFN_DEV|PFN_MAP); -- 2.5.5 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 5:48 ` Dan Williams @ 2016-08-24 15:21 ` Kani, Toshimitsu 0 siblings, 0 replies; 17+ messages in thread From: Kani, Toshimitsu @ 2016-08-24 15:21 UTC (permalink / raw) To: dan.j.williams@intel.com Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com On Tue, 2016-08-23 at 22:48 -0700, Dan Williams wrote: > On Tue, Aug 23, 2016 at 8:58 PM, Dan Williams <dan.j.williams@intel.c > > > > > Does the attached patch fix this for you? > > Sorry, should be this much simpler patch that also mirrors what > driver/nvdimm/pmem.c is doing... Yes, this change works fine. :-) Tested-by: Toshi Kani <toshi.kani@hpe.com> Thanks for the quick fix! -Toshi ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-23 23:47 ` Kani, Toshimitsu 2016-08-24 0:34 ` Dan Williams @ 2016-08-24 1:00 ` Dan Williams 2016-08-24 1:38 ` Kani, Toshimitsu 1 sibling, 1 reply; 17+ messages in thread From: Dan Williams @ 2016-08-24 1:00 UTC (permalink / raw) To: Kani, Toshimitsu Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> >> wrote: > : >> I'm not sure about this fix. The point of honoring >> vmem_altmap_offset() is because a portion of the resource that is >> passed to devm_memremap_pages() also contains the metadata info block >> for the device. The offset says "use everything past this point for >> pages". This may work for avoiding a crash, but it may corrupt info >> block metadata in the process. Can you provide more information >> about the failing scenario to be sure that we are not triggering a >> fault on an address that is not meant to have a page mapping? I.e. >> what is the host physical address of the page that caused this fault, >> and is it valid? > > The fault address in question was the 2nd page of an NVDIMM range. I > assumed this fault address was valid and needed to be handled. Here is > some info about the base and patched cases. Let me know if you need > more info. > > Base > ==== > > The following NVDIMM range was set to /dev/dax. > > /proc/iomem > 480000000-87fffffff : Persistent Memory > > devm_memremap_pages() initialized struct page from 0x490200-0x87ffff. This seems like the start of the trouble. What happened to the first 1GB of the address range? I'm assuming the 'align' attribute is set to 2MB because we start at a pfn offset of 0x200, but this should be starting at 0x480200. ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() 2016-08-24 1:00 ` Dan Williams @ 2016-08-24 1:38 ` Kani, Toshimitsu 0 siblings, 0 replies; 17+ messages in thread From: Kani, Toshimitsu @ 2016-08-24 1:38 UTC (permalink / raw) To: Dan Williams Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org, ard.biesheuvel@linaro.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, brian.starkey@arm.com > On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: > >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> > >> wrote: > > : > >> I'm not sure about this fix. The point of honoring > >> vmem_altmap_offset() is because a portion of the resource that is > >> passed to devm_memremap_pages() also contains the metadata info > block > >> for the device. The offset says "use everything past this point for > >> pages". This may work for avoiding a crash, but it may corrupt info > >> block metadata in the process. Can you provide more information > >> about the failing scenario to be sure that we are not triggering a > >> fault on an address that is not meant to have a page mapping? I.e. > >> what is the host physical address of the page that caused this fault, > >> and is it valid? > > > > The fault address in question was the 2nd page of an NVDIMM range. I > > assumed this fault address was valid and needed to be handled. Here is > > some info about the base and patched cases. Let me know if you need > > more info. > > > > Base > > ==== > > > > The following NVDIMM range was set to /dev/dax. > > > > /proc/iomem > > 480000000-87fffffff : Persistent Memory > > > > devm_memremap_pages() initialized struct page from 0x490200-0x87ffff. > > This seems like the start of the trouble. What happened to the first > 1GB of the address range? I'm assuming the 'align' attribute is set > to 2MB because we start at a pfn offset of 0x200, but this should be > starting at 0x480200. pfn_first() adds an offset from vmem_altmap_offset(), which is altmap->reserve + altmap->free. You can see why it ended up with this offset from the dumps in my previous email. Thanks -Toshi ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2016-08-24 15:39 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-08-23 18:43 [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() Toshi Kani 2016-08-23 20:42 ` Andrew Morton 2016-08-23 21:25 ` Kani, Toshimitsu 2016-08-23 22:32 ` Dan Williams 2016-08-23 23:47 ` Kani, Toshimitsu 2016-08-24 0:34 ` Dan Williams 2016-08-24 1:29 ` Kani, Toshimitsu 2016-08-24 2:53 ` Dan Williams 2016-08-24 3:58 ` Dan Williams 2016-08-24 4:28 ` Kani, Toshimitsu 2016-08-24 4:48 ` Dan Williams 2016-08-24 14:40 ` Kani, Toshimitsu 2016-08-24 14:55 ` Dan Williams 2016-08-24 5:48 ` Dan Williams 2016-08-24 15:21 ` Kani, Toshimitsu 2016-08-24 1:00 ` Dan Williams 2016-08-24 1:38 ` Kani, Toshimitsu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox