public inbox for linux-nvdimm@lists.01.org
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH V1 1/2] libnvdimm/nsio: differentiate between probe mapping and runtime mapping
Date: Thu, 17 Oct 2019 08:13:00 +0530	[thread overview]
Message-ID: <07a4dd71-65f0-a911-60be-e3ea1ce8305b@linux.ibm.com> (raw)
In-Reply-To: <CAPcyv4iAz1OSDCKhNt+weBOTg1OsKbs6h740vG8P2NxRHbUrPw@mail.gmail.com>

On 10/17/19 12:35 AM, Dan Williams wrote:
> On Wed, Oct 16, 2019 at 9:59 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> On 10/16/19 9:34 PM, Dan Williams wrote:
>>> On Tue, Oct 15, 2019 at 10:29 PM Aneesh Kumar K.V
>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>
>>>> On 10/16/19 3:32 AM, Dan Williams wrote:
>>>>> On Tue, Oct 15, 2019 at 8:33 AM Aneesh Kumar K.V
>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>
>>>>>> nvdimm core currently maps the full namespace to an ioremap range
>>>>>> while probing the namespace mode. This can result in probe failures
>>>>>> on architectures that have limited ioremap space.
>>>>>
>>>>> Is there a #define for that limit?
>>>>>
>>>>
>>>> Arch-specific #define. For example, ppc64 has different limits based on
>>>> platform and translation mode. Hash translation with a 4K PAGE_SIZE limits
>>>> the ioremap range to 8TB.
>>>>
>>>>>> nvdimm core can avoid this failure by mapping only the reserved block area
>>>>>> to check for the pfn superblock type, and mapping the full namespace
>>>>>> resource only before using the namespace. nvdimm core uses the ioremap
>>>>>> range only for raw and btt namespaces, so we can limit the max namespace
>>>>>> size for those two modes. For both fsdax and devdax, this change enables
>>>>>> nvdimm to map namespaces larger than the ioremap limit.
>>>>>
>>>>> If the direct map has more space I think it would be better to add a
>>>>> way to use that to map for all namespaces rather than introduce
>>>>> arbitrary failures based on the mode.
>>>>>
>>>>> I would buy a performance argument to avoid overmapping, but for
>>>>> namespace access compatibility where an alternate mapping method would
>>>>> succeed I think we should aim for that to be used instead. Thoughts?
>>>>>
>>>>
>>>> That would require struct pages to be allocated for these ranges, and
>>>> neither raw nor btt needs a struct page backing?
>>>>
>>>
>>> I was thinking a new mapping interface that just consumed direct-map
>>> space, but did not allocate pages.
>>>
>>
>> Not sure how easy that would be. We are looking at having part of the
>> direct-map address space not managed by any zone, and then archs would
>> possibly need to be taught to handle that? (For example, on ppc64 we
>> "bolt" the direct-map range, whereas we allow taking low-level hash
>> faults for the I/O remap range.)
>>
>> Even though you don't consider the patch complete, and given that the
>> approach you outlined would require larger changes, do you think this
>> patch can be accepted as a bug fix? Right now namespace initialization
>> can fail during boot or during ndctl enable-namespace all.
>>
>> For example, with ppc64 and an I/O remap range limit of 8TB, we can
>> individually create a 6TB namespace. We also allow creating multiple
>> such namespaces. But if we try to enable them all together using ndctl
>> enable-namespace all, that fails with the error
>>
>> [   54.259910] vmap allocation for size x failed: use vmalloc=<size> to
>> increase size
>>
>> because we probe these namespaces in parallel.
> 
> The patch is incomplete, right?

Incomplete in the sense that we still don't allow large
raw and btt namespaces.


> It does not fix the raw mode namespace
> case, and that error message seems to indicate to the user how to fix
> the problem. I was under the impression it was a fixed range in the
> address map. Could you instead try to autodetect the potential pmem
> usage and auto-increase the vmap space?
> 
The error is printed by generic code, and the failures are due to the fixed
size of the region. We can't work around that with the vmalloc=<size> option.

-aneesh


Thread overview: 12+ messages
2019-10-15 15:33 [PATCH V1 1/2] libnvdimm/nsio: differentiate between probe mapping and runtime mapping Aneesh Kumar K.V
2019-10-15 15:33 ` [PATCH V1 2/2] libnvdimm/nsio: Rename devm_nsio_enable/disable to devm_nsio_probe_enable/disable Aneesh Kumar K.V
2019-10-15 22:02 ` [PATCH V1 1/2] libnvdimm/nsio: differentiate between probe mapping and runtime mapping Dan Williams
2019-10-16  5:29   ` Aneesh Kumar K.V
2019-10-16 16:04     ` Dan Williams
2019-10-16 16:58       ` Aneesh Kumar K.V
2019-10-16 19:05         ` Dan Williams
2019-10-17  2:43           ` Aneesh Kumar K.V [this message]
2019-10-17  3:04             ` Dan Williams
2019-10-17  3:18               ` Aneesh Kumar K.V
2019-10-17  4:12                 ` Dan Williams
2019-10-17  7:10                   ` Aneesh Kumar K.V
