From: Mike Travis <travis@sgi.com>
To: Toshi Kani <toshi.kani@hp.com>
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
akpm@linux-foundation.org, roland@purestorage.com,
dan.j.williams@intel.com, x86@kernel.org,
linux-nvdimm@ml01.01.org, linux-kernel@vger.kernel.org,
Clive Harding <clive@sgi.com>, Russ Anderson <rja@sgi.com>
Subject: Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap
Date: Mon, 22 Jun 2015 11:22:38 -0700 [thread overview]
Message-ID: <5588526E.7010801@sgi.com> (raw)
In-Reply-To: <1434993807.11808.155.camel@misato.fc.hp.com>
On 6/22/2015 10:23 AM, Toshi Kani wrote:
> On Mon, 2015-06-22 at 09:21 -0700, Mike Travis wrote:
>>
>> On 6/19/2015 2:44 PM, Toshi Kani wrote:
>>> __ioremap_caller() calls region_is_ram() to look up the resource
>>> to check if a target range is RAM, which was added as an additinal
>>> check to improve the lookup performance over page_is_ram() (commit
>>> 906e36c5c717 "x86: use optimized ioresource lookup in ioremap
>>> function").
>>>
>>> __ioremap_caller() then calls walk_system_ram_range(), which had
>>> replaced page_is_ram() to improve the lookup performance (commit
>>> c81c8a1eeede "x86, ioremap: Speed up check for RAM pages").
>>>
>>> Since both functions walk through the resource table, there is
>>> no need to call the two functions. Furthermore, region_is_ram()
>>> has bugs and always returns with -1. This makes
>>> walk_system_ram_range() as the only check being used.
>>
>> Do you have an example of a failing case?
>
> Well, region_is_ram() fails with -1 for every case... This is because it
> breaks the for-loop at 'if (p->end < start)' in the first entry (i.e.
> the lowest address range) of the resource table. For example, the first
> entry of 'p->end' is 0xfff on my system, which is certainly smaller than
> 'start'.
>
> # cat /proc/iomem
> 00000000-00000fff : reserved
> 00001000-00092fff : System RAM
> 00093000-00093fff : reserved
> :
That is a small entry. But I don't understand that when it
returned the -1, the page_is_ram function was not used instead?
Or am I missing something?
- /* If could not be identified(-1), check page by page */
- if (ram_region < 0) {
- pfn = phys_addr >> PAGE_SHIFT;
- last_pfn = last_addr >> PAGE_SHIFT;
- if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
>> Also, I didn't know that
>> IOREMAP'd addresses were allowed to be on non-page boundaries?
>
> Yes, that is allowed. Here is a comment in __ioremap_caller().
>
> * NOTE! We need to allow non-page-aligned mappings too: we will
> obviously
> * have to convert them into an offset in a page-aligned mapping, but
> the
> * caller shouldn't need to know that small detail.
You're right, I forgot about that realignment. The original
intent was to try to optimize and if that failed, fall back
to the original method. I think this case would have been
caught as well?
>
>> Here's the comment and reason for the patches from Patch 0:
>>
>> <<<
>> We have a large university system in the UK that is experiencing
>> very long delays modprobing the driver for a specific I/O device.
>> The delay is from 8-10 minutes per device and there are 31 devices
>> in the system. This 4 to 5 hour delay in starting up those I/O
>> devices is very much a burden on the customer.
>> ...
>> The problem was tracked down to a very slow IOREMAP operation and
>> the excessively long ioresource lookup to insure that the user is
>> not attempting to ioremap RAM. These patches provide a speed up
>> to that function.
>>>>>
>>
>> The speed up was pretty dramatic, I think to about 15-20 minutes
>> (the test was done by our local CS person in the UK). I think this
>> would prove the function was working since it would have fallen
>> back to the previous page_is_ram function and the 4 to 5 hour
>> startup.
>
> I wonder how this test was conducted. When the region_is_ram() change
> got in, the kernel already had the other speed up change (c81c8a1eeede),
> which had replaced page_is_ram().
The onsite system was not running the latest kernel (these large
system customers are very reluctant to disrupt their running
environments.)
>
>> If there is a failure, it would be better for all to fix the specific
>> bug and not re-introduce the original problem. Perhaps drop to
>> page is ram if the address is not page aligned?
>
> walk_system_ram_range() is the one that has the issue with
> page-unaligned address. region_is_ram() after fixed by patch 3/3 does
> not have this issue. ioremap() does not call page_is_ram() anymore.
It appears ioremap does not call region_is_ram either.
- /* First check if whole region can be identified as RAM or not */
- ram_region = region_is_ram(phys_addr, size);
- if (ram_region > 0) {
...
+ pfn = phys_addr >> PAGE_SHIFT;
+ last_pfn = last_addr >> PAGE_SHIFT;
+ if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
It appears that the walk_system_ram_range() patch does supersede
the region_is_ram patch. Perhaps we can add a caveat to this
patch that you should not use this patch without also using
the patch from c81c8a1eeede? Otherwise the excessive slowdown
in ioremap will be reintroduced?
(I'm thinking about back ports to distro kernels that customers use.)
>
> pcibios_add_device(), ksysfs.c, kdebugfs.c (and possibly more) call
> ioremap for setup_data, which is page-unaligned. If we are going to
> disallow such callers, they all need to be converted with a different
> mapping interface, such as vmap(). But vmap() takes an array of page
> pointers as an argument, and is not convenient for them to use. Since
> setup_data is a special range, if they need to be changed may be
> arguable. I think issue 3 described in patch 0/3 needs further
> discussion. For now, I'd like to fix issue 1 & 2.
Since __ioremap_caller() was the only caller of region_is_ram, then
the function can be removed instead of being fixed.
Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2015-06-22 18:31 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-19 21:44 [PATCH 0/3] mm, x86: Fix ioremap RAM check interfaces Toshi Kani
2015-06-19 21:44 ` [PATCH 1/3] mm, x86: Fix warning in ioremap RAM check Toshi Kani
2015-06-19 21:44 ` [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap Toshi Kani
2015-06-22 16:21 ` Mike Travis
2015-06-22 17:23 ` Toshi Kani
2015-06-22 18:22 ` Mike Travis [this message]
2015-06-22 19:06 ` Toshi Kani
2015-06-23 9:01 ` Ingo Molnar
2015-06-23 15:19 ` Toshi Kani
2015-06-23 18:57 ` Mike Travis
2015-06-19 21:44 ` [PATCH 3/3] mm: Fix bugs in region_is_ram() Toshi Kani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5588526E.7010801@sgi.com \
--to=travis@sgi.com \
--cc=akpm@linux-foundation.org \
--cc=clive@sgi.com \
--cc=dan.j.williams@intel.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvdimm@ml01.01.org \
--cc=mingo@redhat.com \
--cc=rja@sgi.com \
--cc=roland@purestorage.com \
--cc=tglx@linutronix.de \
--cc=toshi.kani@hp.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox