* Re: [PATCH v3] powerpc/32: Remove left over function prototypes
From: Michael Ellerman @ 2018-06-21 11:27 UTC (permalink / raw)
To: Mathieu Malaterre
Cc: Mathieu Malaterre, Benjamin Herrenschmidt, Paul Mackerras,
Nicholas Piggin, linuxppc-dev, linux-kernel
In-Reply-To: <20180620190038.3250-1-malat@debian.org>
Mathieu Malaterre <malat@debian.org> writes:
> In commit 4aea909eeba3 ("powerpc: Add missing prototypes in setup_32.c")
I don't have that commit ^ ?
That might be because I squashed some of your fixes together or something?
> diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
> index 35ca309848d7..829ed66f0a40 100644
> --- a/arch/powerpc/kernel/setup.h
> +++ b/arch/powerpc/kernel/setup.h
> @@ -19,9 +19,6 @@ void irqstack_early_init(void);
> void setup_power_save(void);
> unsigned long __init early_init(unsigned long dt_ptr);
> void __init machine_init(u64 dt_ptr);
> -int __init ppc_setup_l2cr(char *str);
> -int __init ppc_setup_l3cr(char *str);
> -int __init ppc_init(void);
> #else
> static inline void setup_power_save(void) { };
> #endif
I have:
#ifdef CONFIG_PPC32
void setup_power_save(void);
#else
static inline void setup_power_save(void) { };
#endif
cheers
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config
From: Balbir Singh @ 2018-06-21 11:32 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Nicholas Piggin, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20180621083158.30849-2-aneesh.kumar@linux.ibm.com>
On Thu, Jun 21, 2018 at 6:31 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> We do this only with VMEMMAP config so that our page_to_[nid/section] etc are not
> impacted.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Why 128TB, given that it's sparse_vmemmap_extreme by default, why not
1PB directly (50 bits)?
Balbir Singh
^ permalink raw reply
* Re: [PATCH 0/7 v5] Support for fsl-mc bus and its devices in SMMU
From: Will Deacon @ 2018-06-21 11:40 UTC (permalink / raw)
To: Nipun Gupta
Cc: robin.murphy@arm.com, gregkh@linuxfoundation.org, hch@lst.de,
joro@8bytes.org, m.szyprowski@samsung.com, shawnguo@kernel.org,
frowand.list@gmail.com, iommu@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
Bharat Bhushan, stuyoder@gmail.com, Leo Li
In-Reply-To: <HE1PR0401MB24250CEFBC0B7CC919C4388AE6760@HE1PR0401MB2425.eurprd04.prod.outlook.com>
Hi Nipun,
On Thu, Jun 21, 2018 at 03:59:27AM +0000, Nipun Gupta wrote:
> Will this patch-set be taken for the next kernel release (and via which
> tree)?
I think you need Acks from Robin and Joerg in order for this to be queued.
Robin should be back at the beginning of next month, so there's still time
for 4.19.
Will
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config
From: Aneesh Kumar K.V @ 2018-06-21 15:42 UTC (permalink / raw)
To: Balbir Singh
Cc: Nicholas Piggin, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAKTCnz=qh7fLAuor9YNm5LH5ZKETTP9RjKv_SpR4c=wX37VTEA@mail.gmail.com>
On 06/21/2018 05:02 PM, Balbir Singh wrote:
> On Thu, Jun 21, 2018 at 6:31 PM, Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> We do this only with VMEMMAP config so that our page_to_[nid/section] etc are not
>> impacted.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>
> Why 128TB, given that it's sparse_vmemmap_extreme by default, why not
> 1PB directly (50 bits)?
>
That will impact configs with VMEMMAP_EXTREME disabled, with no real
immediate benefit. We could possibly make MAX_PHYSMEM_BITS a Kconfig variable.
s390 does that. Not sure we want to do that.
-aneesh
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config
From: Aneesh Kumar K.V @ 2018-06-21 15:43 UTC (permalink / raw)
To: Balbir Singh
Cc: Nicholas Piggin, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAKTCnz=qh7fLAuor9YNm5LH5ZKETTP9RjKv_SpR4c=wX37VTEA@mail.gmail.com>
On 06/21/2018 05:02 PM, Balbir Singh wrote:
> On Thu, Jun 21, 2018 at 6:31 PM, Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> We do this only with VMEMMAP config so that our page_to_[nid/section] etc are not
>> impacted.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>
> Why 128TB, given that it's sparse_vmemmap_extreme by default, why not
> 1PB directly (50 bits)?
>
That will impact configs with VMEMMAP_EXTREME disabled, with no real
immediate benefit. We could possibly make MAX_PHYSMEM_BITS a Kconfig
variable. s390 does that. Not sure we want to do that.
-aneesh
^ permalink raw reply
* Re: [PATCH v2 1/6] powerpc/pkeys: Enable all user-allocatable pkeys at init.
From: Ram Pai @ 2018-06-21 17:24 UTC (permalink / raw)
To: Michael Ellerman
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, Ulrich.Weigand, fweimer, luto, msuchanek
In-Reply-To: <878t78wzb6.fsf@concordia.ellerman.id.au>
On Thu, Jun 21, 2018 at 02:14:53PM +1000, Michael Ellerman wrote:
> Ram Pai <linuxram@us.ibm.com> writes:
> > On Tue, Jun 19, 2018 at 10:39:52PM +1000, Michael Ellerman wrote:
> >> Ram Pai <linuxram@us.ibm.com> writes:
> >>
> >> > In a multithreaded application, a key allocated by one thread must
> >> > be activate and usable on all threads.
> >> >
> >> > Currently this is not the case, because the UAMOR bits for all keys are
> >> > disabled by default. When a new key is allocated in one thread, though
> >> > the corresponding UAMOR bits for that thread get enabled, the UAMOR bits
> >> > for all other existing threads continue to have their bits disabled.
> >> > Other threads have no way to set permissions on the key, effectively
> >> > making the key useless.
> >>
> >> This all seems a bit strongly worded to me. It's arguable whether a key
> >> should be usable by the thread that allocated it or all threads.
> >>
> >> You could conceivably have a design where threads are blocked from using
> >> a key until they're given permission to do so by the thread that
> >> allocated the key.
> >>
> >> But we're changing the behaviour to match x86 and because we don't have
> >> an API to grant another thread access to a key. Right?
> >
> > correct. The other threads have no way to access or change the
> > permissions on the key.
>
> OK.
>
> Though prior to patch 6 all threads have read/write permissions for all
> keys, so they don't necessarily need to change permissions on a key
> allocated by another thread.
>
> >> > Enable the UAMOR bits for all keys, at process creation. Since the
> >> > contents of UAMOR are inherited at fork, all threads are capable of
> >> > modifying the permissions on any key.
> >> >
> >> > BTW: changing the permission on unallocated keys has no effect, till
> >> > those keys are not associated with any PTEs. The kernel will anyway
> >> > disallow to association of unallocated keys with PTEs.
> >>
> >> This is an ABI change, which is bad, but I guess we call it a bug fix
> >> because things didn't really work previously?
> >
> > Yes its a behaviorial change for the better. There is no downside
> > to the change because no applications should break. Single threaded
> > apps will continue to just work fine. Multithreaded applications,
> > which were unable to consume the API/ABI, will now be able to do so.
>
> Multi-threaded applications were able to use the API, as long as they
> were satisfied with the semantics it provided, ie. that restrictions on
> a key were only possible on the thread that allocated the key.
>
> I'm not trying to argue for the sake of it, it's important that we
> understand the subtleties of what we're changing and how it affects
> existing software - even if we think there is essentially no existing
> software.
>
> I'll try and massage the change log to capture it.
>
> I ended up with what's below.
>
> cheers
>
> powerpc/pkeys: Give all threads control of their key permissions
>
> Currently in a multithreaded application, a key allocated by one
> thread is not usable by other threads. By "not usable" we mean that
> other threads are unable to change the access permissions for that
> key for themselves.
>
> When a new key is allocated in one thread, the corresponding UAMOR
> bits for that thread get enabled, however the UAMOR bits for that key
> for all other threads remain disabled.
>
> Other threads have no way to set permissions on the key, and the
> current default permissions are that read/write is enabled for all
> keys, which means the key has no effect for other threads. Although
> that may be the desired behaviour in some circumstances, having all
> threads able to control their permissions for the key is more
> flexible.
>
> The current behaviour also differs from the x86 behaviour, which is
> problematic for users.
>
> To fix this, enable the UAMOR bits for all keys, at process
> creation (in start_thread(), ie exec time). Since the contents of
> UAMOR are inherited at fork, all threads are capable of modifying the
> permissions on any key.
>
> This is technically an ABI break on powerpc, but pkey support is
> fairly new on powerpc and not widely used, and this brings us into
> line with x86.
Wow! Yes, it crisply captures the subtle API change and the reasoning
behind it.
RP
^ permalink raw reply
* Re: [PATCH v2 2/6] powerpc/pkeys: Save the pkey registers before fork
From: Ram Pai @ 2018-06-21 17:35 UTC (permalink / raw)
To: Michael Ellerman
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, Ulrich.Weigand, fweimer, luto, msuchanek
In-Reply-To: <87a7rowzd7.fsf@concordia.ellerman.id.au>
On Thu, Jun 21, 2018 at 02:13:40PM +1000, Michael Ellerman wrote:
> Ram Pai <linuxram@us.ibm.com> writes:
>
> > On Tue, Jun 19, 2018 at 10:39:56PM +1000, Michael Ellerman wrote:
> >> Ram Pai <linuxram@us.ibm.com> writes:
> >>
> >> > When a thread forks the contents of AMR, IAMR, UAMOR registers in the
> >> > newly forked thread are not inherited.
> >> >
> >> > Save the registers before forking, for content of those
> >> > registers to be automatically copied into the new thread.
> >> >
> >> > CC: Michael Ellerman <mpe@ellerman.id.au>
> >> > CC: Florian Weimer <fweimer@redhat.com>
> >> > CC: Andy Lutomirski <luto@kernel.org>
> >> > CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> >> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> >>
> >> Again this is an ABI change but we'll call it a bug fix I guess.
> >
> > yes. the same defense here too. its a behaviorial change for the better.
> > Single threaded applications will not see any behaviorial change.
> > Multithreaded apps, which were unable to consume, the behavior will now be
> > able to do so.
>
> Well threads is one thing, but this also affects processes.
>
> And actually without this fix it's possible that a child process could
> fault on a region protected in the parent, if the value in the AMR in
> the thread struct happens to block access at the time of fork(). The
> value in the thread struct would be whatever was in the AMR the last
> time the parent was scheduled in. I think?
Right. Child processes will see a stale value of AMR. Technically this
behavior is a bug, since existing applications, if any, cannot rely on
this stale AMR value.
RP
^ permalink raw reply
* Re: [PATCH v2 0/6] powerpc/pkeys: fixes to pkeys
From: Ram Pai @ 2018-06-21 18:10 UTC (permalink / raw)
To: Michael Ellerman
Cc: Florian Weimer, linuxppc-dev, dave.hansen, aneesh.kumar,
bsingharora, hbabu, mhocko, bauerman, Ulrich.Weigand, luto,
msuchanek
In-Reply-To: <878t78fn6o.fsf@concordia.ellerman.id.au>
On Thu, Jun 21, 2018 at 08:28:47PM +1000, Michael Ellerman wrote:
> Florian Weimer <fweimer@redhat.com> writes:
>
> > On 06/19/2018 02:40 PM, Michael Ellerman wrote:
> >>> I tested the whole series with the new selftests, with the printamr.c
> >>> program I posted earlier, and the glibc test for pkey_alloc &c. The
> >>> latter required some test fixes, but now passes as well. As far as I
> >>> can tell, everything looks good now.
> >>>
> >>> Tested-By: Florian Weimer<fweimer@redhat.com>
> >> Thanks. I'll add that to each patch I guess, if you're happy with that?
> >
> > Sure, but I only tested the whole series as a whole.
>
> Yeah OK. We don't have a good way to express that, other than using a
> merge which I'd prefer to avoid.
>
> So I've tagged them all with your Tested-by. If any of them turn out to
> have bugs you can blame me :)
I just tested the patches incrementally using the pkey selftests.
So I feel confident these patches do not introduce bugs. I will take the
blame if the blame lands on mpe :)
RP
^ permalink raw reply
* Re: [PATCH v04 9/9] hotplug/pmt: Update topology after PMT
From: Michael Bringmann @ 2018-06-21 19:37 UTC (permalink / raw)
To: kbuild test robot
Cc: Thomas Falcon, John Allen, kbuild-all, Nathan Fontenot,
linuxppc-dev, Tyrel Datwyler
In-Reply-To: <201806211048.moFJfljN%fengguang.wu@intel.com>
I posted the wrong copy of the file the first time.
That is what broke here. I posted the correction almost
immediately to the list. The correct one has
Message ID <8c437fe5-632c-a7ed-1f11-66c4578a1d93@linux.vnet.ibm.com>
Sorry for the inconvenience.
Michael
On 06/20/2018 09:13 PM, kbuild test robot wrote:
> Hi Michael,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.18-rc1 next-20180620]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-hotplug-Update-affinity-for-migrated-CPUs/20180621-085543
> base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-defconfig (attached as .config)
> compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=powerpc
>
> All errors (new ones prefixed by >>):
>
> arch/powerpc/platforms/pseries/dlpar.c: In function 'dlpar_pmt':
>>> arch/powerpc/platforms/pseries/dlpar.c:453:2: error: implicit declaration of function 'rebuild_sched_domains' [-Werror=implicit-function-declaration]
> rebuild_sched_domains();
> ^~~~~~~~~~~~~~~~~~~~~
> cc1: all warnings being treated as errors
>
> vim +/rebuild_sched_domains +453 arch/powerpc/platforms/pseries/dlpar.c
>
> 435
> 436 static int dlpar_pmt(struct pseries_hp_errorlog *work)
> 437 {
> 438 struct list_head *pos, *q;
> 439
> 440 ssleep(15);
> 441
> 442 list_for_each_safe(pos, q, &dlpar_delayed_list) {
> 443 struct pseries_hp_errorlog *tmp;
> 444
> 445 tmp = list_entry(pos, struct pseries_hp_errorlog, list);
> 446 handle_dlpar_errorlog(tmp);
> 447
> 448 list_del(pos);
> 449 kfree(tmp);
> 450 }
> 451
> 452 ssleep(5);
> > 453 rebuild_sched_domains();
> 454
> 455 return 0;
> 456 }
> 457
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
>
--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@linux.vnet.ibm.com
^ permalink raw reply
* [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API
From: Matthew Wilcox @ 2018-06-21 21:28 UTC (permalink / raw)
To: linux-kernel
Cc: Matthew Wilcox, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Aneesh Kumar K.V, Nicholas Piggin,
Thiago Jung Bauermann, Ram Pai, linuxppc-dev
In-Reply-To: <20180621212835.5636-1-willy@infradead.org>
ida_alloc_range is the perfect fit for this use case. Eliminates
a custom spinlock, a call to ida_pre_get and a local check for the
allocated ID exceeding a maximum.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
arch/powerpc/mm/mmu_context_book3s64.c | 44 +++-----------------------
1 file changed, 4 insertions(+), 40 deletions(-)
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index f3d4b4a0e561..5a0cf2cc8ba0 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -26,48 +26,16 @@
#include <asm/mmu_context.h>
#include <asm/pgalloc.h>
-static DEFINE_SPINLOCK(mmu_context_lock);
static DEFINE_IDA(mmu_context_ida);
static int alloc_context_id(int min_id, int max_id)
{
- int index, err;
-
-again:
- if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
- return -ENOMEM;
-
- spin_lock(&mmu_context_lock);
- err = ida_get_new_above(&mmu_context_ida, min_id, &index);
- spin_unlock(&mmu_context_lock);
-
- if (err == -EAGAIN)
- goto again;
- else if (err)
- return err;
-
- if (index > max_id) {
- spin_lock(&mmu_context_lock);
- ida_remove(&mmu_context_ida, index);
- spin_unlock(&mmu_context_lock);
- return -ENOMEM;
- }
-
- return index;
+ return ida_alloc_range(&mmu_context_ida, min_id, max_id, GFP_KERNEL);
}
void hash__reserve_context_id(int id)
{
- int rc, result = 0;
-
- do {
- if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
- break;
-
- spin_lock(&mmu_context_lock);
- rc = ida_get_new_above(&mmu_context_ida, id, &result);
- spin_unlock(&mmu_context_lock);
- } while (rc == -EAGAIN);
+ int result = ida_alloc_range(&mmu_context_ida, id, id, GFP_KERNEL);
WARN(result != id, "mmu: Failed to reserve context id %d (rc %d)\n", id, result);
}
@@ -172,9 +140,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
void __destroy_context(int context_id)
{
- spin_lock(&mmu_context_lock);
- ida_remove(&mmu_context_ida, context_id);
- spin_unlock(&mmu_context_lock);
+ ida_free(&mmu_context_ida, context_id);
}
EXPORT_SYMBOL_GPL(__destroy_context);
@@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
{
int index, context_id;
- spin_lock(&mmu_context_lock);
for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
context_id = ctx->extended_id[index];
if (context_id)
- ida_remove(&mmu_context_ida, context_id);
+ ida_free(&mmu_context_ida, context_id);
}
- spin_unlock(&mmu_context_lock);
}
static void pte_frag_destroy(void *pte_frag)
--
2.17.1
^ permalink raw reply related
* [PATCH 15/26] ppc: Convert vas ID allocation to new IDA API
From: Matthew Wilcox @ 2018-06-21 21:28 UTC (permalink / raw)
To: linux-kernel
Cc: Matthew Wilcox, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, linuxppc-dev
In-Reply-To: <20180621212835.5636-1-willy@infradead.org>
Removes a custom spinlock and simplifies the code.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
arch/powerpc/platforms/powernv/vas-window.c | 26 ++++-----------------
1 file changed, 4 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
index ff9f48812331..2a5e68a2664d 100644
--- a/arch/powerpc/platforms/powernv/vas-window.c
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -515,35 +515,17 @@ int init_winctx_regs(struct vas_window *window, struct vas_winctx *winctx)
return 0;
}
-static DEFINE_SPINLOCK(vas_ida_lock);
-
static void vas_release_window_id(struct ida *ida, int winid)
{
- spin_lock(&vas_ida_lock);
- ida_remove(ida, winid);
- spin_unlock(&vas_ida_lock);
+ ida_free(ida, winid);
}
static int vas_assign_window_id(struct ida *ida)
{
- int rc, winid;
-
- do {
- rc = ida_pre_get(ida, GFP_KERNEL);
- if (!rc)
- return -EAGAIN;
-
- spin_lock(&vas_ida_lock);
- rc = ida_get_new(ida, &winid);
- spin_unlock(&vas_ida_lock);
- } while (rc == -EAGAIN);
-
- if (rc)
- return rc;
+ int winid = ida_alloc_max(ida, VAS_WINDOWS_PER_CHIP, GFP_KERNEL);
- if (winid > VAS_WINDOWS_PER_CHIP) {
- pr_err("Too many (%d) open windows\n", winid);
- vas_release_window_id(ida, winid);
+ if (winid == -ENOSPC) {
+ pr_err("Too many (%d) open windows\n", VAS_WINDOWS_PER_CHIP);
return -EAGAIN;
}
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] selftests/powerpc: Fix strncpy usage
From: Segher Boessenkool @ 2018-06-21 23:18 UTC (permalink / raw)
To: Breno Leitao; +Cc: linuxppc-dev, Anshuman Khandual
In-Reply-To: <1529535071-14555-1-git-send-email-leitao@debian.org>
On Wed, Jun 20, 2018 at 07:51:11PM -0300, Breno Leitao wrote:
> - strncpy(prog, argv[0], strlen(argv[0]));
> + strncpy(prog, argv[0], sizeof(prog) - 1);
strncpy(prog, argv[0], sizeof prog);
if (prog[sizeof prog - 1])
	scream_bloody_murder();
Silently using the wrong data is a worse habit than not checking for
overflows ;-)
Segher
^ permalink raw reply
* Re: [PATCH kernel v2 REPOST] powerpc/powernv/ioda: Allocate indirect TCE levels on demand
From: David Gibson @ 2018-06-22 0:48 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: linuxppc-dev, kvm-ppc, Alex Williamson, Benjamin Herrenschmidt,
Russell Currey
In-Reply-To: <20180621093627.2900-1-aik@ozlabs.ru>
On Thu, Jun 21, 2018 at 07:36:27PM +1000, Alexey Kardashevskiy wrote:
> At the moment we allocate the entire TCE table, twice (hardware part and
> userspace translation cache). This normally works as we normally have
> contiguous memory and the guest will map entire RAM for 64bit DMA.
>
> However if we have sparse RAM (one example is a memory device), then
> we will allocate TCEs which will never be used as the guest only maps
> actual memory for DMA. If it is a single level TCE table, there is nothing
> we can really do but if it is a multilevel table, we can skip allocating
> TCEs we know we won't need.
>
> This adds ability to allocate only first level, saving memory.
>
> This changes iommu_table::free() to avoid allocating of an extra level;
> iommu_table::set() will do this when needed.
>
> This adds @alloc parameter to iommu_table::exchange() to tell the callback
> if it can allocate an extra level; the flag is set to "false" for
> the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
> H_TOO_HARD.
>
> This still requires the entire table to be counted in mm::locked_vm.
>
> To be conservative, this only does on-demand allocation when
> the userspace cache table is requested, which is the case for VFIO.
>
> The example math for a system replicating a powernv setup with NVLink2
> in a guest:
> 16GB RAM mapped at 0x0
> 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000
>
> the table to cover that all with 64K pages takes:
> (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4656MB
>
> If we allocate only necessary TCE levels, we will only need:
> (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
> levels).
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>
>
> This is what I meant to post few days ago, sorry for the noise.
>
>
> ---
> Changes:
> v2:
> * fixed bug in cleanup path which forced the entire table to be
> allocated right before destroying
> * added memory allocation error handling to pnv_tce()
> ---
> arch/powerpc/include/asm/iommu.h | 7 ++-
> arch/powerpc/platforms/powernv/pci.h | 6 ++-
> arch/powerpc/kvm/book3s_64_vio_hv.c | 4 +-
> arch/powerpc/platforms/powernv/pci-ioda-tce.c | 73 +++++++++++++++++++++------
> arch/powerpc/platforms/powernv/pci-ioda.c | 8 +--
> drivers/vfio/vfio_iommu_spapr_tce.c | 2 +-
> 6 files changed, 73 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 4bdcf22..daa3ee5 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -70,7 +70,7 @@ struct iommu_table_ops {
> unsigned long *hpa,
> enum dma_data_direction *direction);
>
> - __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
> + __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
> #endif
> void (*clear)(struct iommu_table *tbl,
> long index, long npages);
> @@ -122,10 +122,13 @@ struct iommu_table {
> __be64 *it_userspace; /* userspace view of the table */
> struct iommu_table_ops *it_ops;
> struct kref it_kref;
> + int it_nid;
> };
>
> +#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
> + ((tbl)->it_ops->useraddrptr((tbl), (entry), false))
> #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> - ((tbl)->it_ops->useraddrptr((tbl), (entry)))
> + ((tbl)->it_ops->useraddrptr((tbl), (entry), true))
>
> /* Pure 2^n version of get_order */
> static inline __attribute_const__
> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
> index 5e02408..1fa5590 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -267,8 +267,10 @@ extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> unsigned long attrs);
> extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
> extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
> - unsigned long *hpa, enum dma_data_direction *direction);
> -extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
> + unsigned long *hpa, enum dma_data_direction *direction,
> + bool alloc);
> +extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index,
> + bool alloc);
> extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>
> extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 8cc1caf..efb90d8 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
> {
> struct mm_iommu_table_group_mem_t *mem = NULL;
> const unsigned long pgsize = 1ULL << tbl->it_page_shift;
> - __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
> + __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
>
> if (!pua)
> /* it_userspace allocation might be delayed */
> @@ -264,7 +264,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
> {
> long ret;
> unsigned long hpa = 0;
> - __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
> + __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
> struct mm_iommu_table_group_mem_t *mem;
>
> if (!pua)
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> index 36c2eb0..fe96910 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> @@ -48,7 +48,7 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
> return addr;
> }
>
> -static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx)
> +static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
> {
> __be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
> int level = tbl->it_indirect_levels;
> @@ -57,7 +57,23 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx)
>
> while (level) {
> int n = (idx & mask) >> (level * shift);
> - unsigned long tce = be64_to_cpu(tmp[n]);
> + unsigned long tce;
> +
> + if (tmp[n] == 0) {
> + __be64 *tmp2;
> +
> + if (!alloc)
> + return NULL;
> +
> + tmp2 = pnv_alloc_tce_level(tbl->it_nid,
> + ilog2(tbl->it_level_size) + 3);
> + if (!tmp2)
> + return NULL;
> +
> + tmp[n] = cpu_to_be64(__pa(tmp2) |
> + TCE_PCI_READ | TCE_PCI_WRITE);
> + }
> + tce = be64_to_cpu(tmp[n]);
>
> tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
> idx &= ~mask;
> @@ -84,7 +100,7 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> ((rpn + i) << tbl->it_page_shift);
> unsigned long idx = index - tbl->it_offset + i;
>
> - *(pnv_tce(tbl, false, idx)) = cpu_to_be64(newtce);
> + *(pnv_tce(tbl, false, idx, true)) = cpu_to_be64(newtce);
> }
>
> return 0;
> @@ -92,31 +108,46 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
>
> #ifdef CONFIG_IOMMU_API
> int pnv_tce_xchg(struct iommu_table *tbl, long index,
> - unsigned long *hpa, enum dma_data_direction *direction)
> + unsigned long *hpa, enum dma_data_direction *direction,
> + bool alloc)
> {
> u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> unsigned long newtce = *hpa | proto_tce, oldtce;
> unsigned long idx = index - tbl->it_offset;
> + __be64 *ptce = NULL;
>
> BUG_ON(*hpa & ~IOMMU_PAGE_MASK(tbl));
>
> + if (*direction == DMA_NONE) {
> + ptce = pnv_tce(tbl, false, idx, false);
> + if (!ptce) {
> + *hpa = 0;
> + return 0;
> + }
> + }
> +
> + if (!ptce) {
> + ptce = pnv_tce(tbl, false, idx, alloc);
> + if (!ptce)
> + return alloc ? H_HARDWARE : H_TOO_HARD;
> + }
> +
> if (newtce & TCE_PCI_WRITE)
> newtce |= TCE_PCI_READ;
>
> - oldtce = be64_to_cpu(xchg(pnv_tce(tbl, false, idx),
> - cpu_to_be64(newtce)));
> + oldtce = be64_to_cpu(xchg(ptce, cpu_to_be64(newtce)));
> *hpa = oldtce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
> *direction = iommu_tce_direction(oldtce);
>
> return 0;
> }
>
> -__be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index)
> +__be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index, bool alloc)
> {
> if (WARN_ON_ONCE(!tbl->it_userspace))
> return NULL;
>
> - return pnv_tce(tbl, true, index - tbl->it_offset);
> + return pnv_tce(tbl, true, index - tbl->it_offset, alloc);
> }
> #endif
>
> @@ -126,14 +157,19 @@ void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
>
> for (i = 0; i < npages; i++) {
> unsigned long idx = index - tbl->it_offset + i;
> + __be64 *ptce = pnv_tce(tbl, false, idx, false);
>
> - *(pnv_tce(tbl, false, idx)) = cpu_to_be64(0);
> + if (ptce)
> + *ptce = cpu_to_be64(0);
> }
> }
>
> unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
> {
> - __be64 *ptce = pnv_tce(tbl, false, index - tbl->it_offset);
> + __be64 *ptce = pnv_tce(tbl, false, index - tbl->it_offset, false);
> +
> + if (!ptce)
> + return 0;
>
> return be64_to_cpu(*ptce);
> }
> @@ -224,6 +260,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> unsigned int table_shift = max_t(unsigned int, entries_shift + 3,
> PAGE_SHIFT);
> const unsigned long tce_table_size = 1UL << table_shift;
> + unsigned int tmplevels = levels;
>
> if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
> return -EINVAL;
> @@ -231,6 +268,9 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> if (!is_power_of_2(window_size))
> return -EINVAL;
>
> + if (alloc_userspace_copy && (window_size > (1ULL << 32)))
> + tmplevels = 1;
> +
> /* Adjust direct table size from window_size and levels */
> entries_shift = (entries_shift + levels - 1) / levels;
> level_shift = entries_shift + 3;
> @@ -241,7 +281,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>
> /* Allocate TCE table */
> addr = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
> - levels, tce_table_size, &offset, &total_allocated);
> + tmplevels, tce_table_size, &offset, &total_allocated);
>
> /* addr==NULL means that the first level allocation failed */
> if (!addr)
> @@ -252,7 +292,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> * we did not allocate as much as we wanted,
> * release partially allocated table.
> */
> - if (offset < tce_table_size)
> + if (tmplevels == levels && offset < tce_table_size)
> goto free_tces_exit;
>
> /* Allocate userspace view of the TCE table */
> @@ -263,8 +303,8 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> &total_allocated_uas);
> if (!uas)
> goto free_tces_exit;
> - if (offset < tce_table_size ||
> - total_allocated_uas != total_allocated)
> + if (tmplevels == levels && (offset < tce_table_size ||
> + total_allocated_uas != total_allocated))
> goto free_uas_exit;
> }
>
> @@ -275,10 +315,11 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> tbl->it_indirect_levels = levels - 1;
> tbl->it_allocated_size = total_allocated;
> tbl->it_userspace = uas;
> + tbl->it_nid = nid;
>
> - pr_debug("Created TCE table: ws=%08llx ts=%lx @%08llx base=%lx uas=%p levels=%d\n",
> + pr_debug("Created TCE table: ws=%08llx ts=%lx @%08llx base=%lx uas=%p levels=%d/%d\n",
> window_size, tce_table_size, bus_offset, tbl->it_base,
> - tbl->it_userspace, levels);
> + tbl->it_userspace, tmplevels, levels);
>
> return 0;
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f3a7829..81489ae 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2011,7 +2011,7 @@ static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
> static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
> unsigned long *hpa, enum dma_data_direction *direction)
> {
> - long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> + long ret = pnv_tce_xchg(tbl, index, hpa, direction, true);
>
> if (!ret)
> pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, false);
> @@ -2022,7 +2022,7 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
> static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
> unsigned long *hpa, enum dma_data_direction *direction)
> {
> - long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> + long ret = pnv_tce_xchg(tbl, index, hpa, direction, false);
>
> if (!ret)
> pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, true);
> @@ -2176,7 +2176,7 @@ static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
> static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, long index,
> unsigned long *hpa, enum dma_data_direction *direction)
> {
> - long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> + long ret = pnv_tce_xchg(tbl, index, hpa, direction, true);
>
> if (!ret)
> pnv_pci_ioda2_tce_invalidate(tbl, index, 1, false);
> @@ -2187,7 +2187,7 @@ static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, long index,
> static int pnv_ioda2_tce_xchg_rm(struct iommu_table *tbl, long index,
> unsigned long *hpa, enum dma_data_direction *direction)
> {
> - long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> + long ret = pnv_tce_xchg(tbl, index, hpa, direction, false);
>
> if (!ret)
> pnv_pci_ioda2_tce_invalidate(tbl, index, 1, true);
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 5a2e8e4..6e174ef 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -635,7 +635,7 @@ static long tce_iommu_create_table(struct tce_container *container,
> page_shift, window_size, levels, ptbl);
>
> WARN_ON(!ret && !(*ptbl)->it_ops->free);
> - WARN_ON(!ret && ((*ptbl)->it_allocated_size != table_size));
> + WARN_ON(!ret && ((*ptbl)->it_allocated_size > table_size));
>
> return ret;
> }
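For readers following the on-demand allocation part of the diff, here is a minimal userspace sketch of the idea, not the powernv code itself. It assumes a hypothetical two-level table with 16 entries per level; the names `two_level_table`, `entry_ptr`, `entry_get` and `entry_clear` are made up for illustration. The point it tries to show is the `alloc` flag: lookups for reads and clears pass false and tolerate a missing level, while only the mapping path is allowed to allocate.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVEL_BITS 4                    /* hypothetical: 16 entries per level */
#define LEVEL_SIZE (1u << LEVEL_BITS)

struct two_level_table {
	uint64_t *l2[LEVEL_SIZE];       /* second-level tables, NULL until needed */
};

/*
 * Return a pointer to the entry for idx, or NULL if its second-level
 * table is absent and alloc is false (or the allocation failed).
 */
static uint64_t *entry_ptr(struct two_level_table *t, unsigned int idx, int alloc)
{
	unsigned int hi = idx >> LEVEL_BITS;
	unsigned int lo = idx & (LEVEL_SIZE - 1);

	assert(hi < LEVEL_SIZE);
	if (!t->l2[hi]) {
		if (!alloc)
			return NULL;
		t->l2[hi] = calloc(LEVEL_SIZE, sizeof(uint64_t));
		if (!t->l2[hi])
			return NULL;
	}
	return &t->l2[hi][lo];
}

/* Reads never allocate: a missing level simply reads as 0. */
static uint64_t entry_get(struct two_level_table *t, unsigned int idx)
{
	uint64_t *p = entry_ptr(t, idx, 0);

	return p ? *p : 0;
}

/* Clearing never allocates either: nothing to clear if the level is absent. */
static void entry_clear(struct two_level_table *t, unsigned int idx)
{
	uint64_t *p = entry_ptr(t, idx, 0);

	if (p)
		*p = 0;
}
```

This mirrors the shape of the pnv_tce_get() and pnv_tce_free() hunks above, where a NULL return from the lookup is treated as "entry is empty" rather than as an error.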
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH v2] powerpc/numa: Correct kernel message severity
From: Michael Ellerman @ 2018-06-22 1:32 UTC (permalink / raw)
To: Vipin K Parashar, linuxppc-dev; +Cc: nfont, Vipin K Parashar
In-Reply-To: <1521013954-21348-1-git-send-email-vipin@linux.vnet.ibm.com>
Vipin K Parashar <vipin@linux.vnet.ibm.com> writes:
> printk() in unmap_cpu_from_node() uses KERN_ERR message severity,
> for a WARNING message. Change it to pr_warn().
>
> Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
> ---
> arch/powerpc/mm/numa.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index edd8d0b..1632f4b 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -163,8 +163,7 @@ static void unmap_cpu_from_node(unsigned long cpu)
> if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
> cpumask_clear_cpu(cpu, node_to_cpumask_map[node]);
> } else {
> - printk(KERN_ERR "WARNING: cpu %lu not found in node %d\n",
> - cpu, node);
> + pr_warn("WARNING: cpu %lu not found in node %d\n", cpu, node);
> }
> }
The full function is:
static void unmap_cpu_from_node(unsigned long cpu)
{
int node = numa_cpu_lookup_table[cpu];
dbg("removing cpu %lu from node %d\n", cpu, node);
if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
cpumask_clear_cpu(cpu, node_to_cpumask_map[node]);
} else {
printk(KERN_ERR "WARNING: cpu %lu not found in node %d\n",
cpu, node);
}
}
So we look up what node the CPU is on, and then we look up the set of
CPUs on that node, and they don't match.
That seems like a bug, not a warning.
Have you looked at why we're seeing this warning? It seems like maybe
something else is going wrong to get us into this situation to begin
with.
If there's some good reason why this is happening, and it's truly
harmless, then we can just remove the printk() entirely.
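To make the mismatch concrete, here is a small userspace sketch, not the kernel code. It assumes two toy structures standing in for numa_cpu_lookup_table and node_to_cpumask_map (the names `cpu_to_node`, `node_cpus`, `map_cpu` and `unmap_cpu` are hypothetical). The message in question corresponds to the -1 return: the per-CPU table says the CPU is on a node whose mask does not contain it.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS  8
#define NR_NODES 2

static int cpu_to_node[NR_CPUS];        /* stand-in for numa_cpu_lookup_table */
static uint32_t node_cpus[NR_NODES];    /* stand-in for node_to_cpumask_map */

/* Record the CPU in both structures so they stay consistent. */
static void map_cpu(int cpu, int node)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(node >= 0 && node < NR_NODES);
	cpu_to_node[cpu] = node;
	node_cpus[node] |= 1u << cpu;
}

/* Returns 0 on success, -1 if the two structures disagree. */
static int unmap_cpu(int cpu)
{
	int node = cpu_to_node[cpu];

	if (!(node_cpus[node] & (1u << cpu)))
		return -1;      /* the case the printk() reports */
	node_cpus[node] &= ~(1u << cpu);
	return 0;
}
```

In this toy model the only ways to hit the -1 path are unmapping twice or unmapping a CPU that was never mapped, which is why the message looks more like a symptom of an earlier problem than a routine warning.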
cheers
* Re: [PATCH kernel] KVM: PPC: Fix hardware and emulated TCE tables matching
From: David Gibson @ 2018-06-22 1:17 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras
In-Reply-To: <20180620084258.1155-1-aik@ozlabs.ru>
On Wed, Jun 20, 2018 at 06:42:58PM +1000, Alexey Kardashevskiy wrote:
> When attaching a hardware table to LIOBN in KVM, we match table parameters
> such as page size, table offset and table size. However the tables are
> created via very different paths - VFIO and KVM - and the VFIO path goes
> through the platform code which has a minimum TCE page size requirement
> (which is 4K but since we allocate memory by pages and cannot avoid
> alignment anyway, we align to 64k pages for powernv_defconfig).
>
> So when we match the tables, one might be bigger than the other which
> means the hardware table cannot get attached to LIOBN and DMA mapping
> fails.
>
> This removes the table size alignment from the guest visible table.
> This does not affect the memory allocation which is still aligned -
> kvmppc_tce_pages() takes care of this.
>
> This relaxes the check we do when attaching tables to allow the hardware
> table to be bigger than the guest visible table.
>
> Ideally we want the KVM table to cover the same space as the hardware
> table does but since the hardware table may use multiple levels, and
> all levels must use the same table size (IODA2 design), the area it can
> actually cover might get very different from the window size which
> the guest requested, even though the guest won't map it all.
>
> Fixes: ca1fc489cf "KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages"
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
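As a rough illustration of the relaxed check, the following standalone sketch restates the three conditions from the first hunk below with plain integers. The `struct win` type and the `hw_table_matches` name are assumptions made for this example only; offset and size are counted in pages of the given page_shift, as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of a DMA window: offset and size in pages. */
struct win {
	unsigned int page_shift;
	uint64_t offset;
	uint64_t size;
};

/*
 * The hardware table may use smaller pages than the guest view, must
 * start at the same bus address, and after this patch may cover more
 * than the guest asked for (>= instead of ==).
 */
static bool hw_table_matches(const struct win *hw, const struct win *guest)
{
	assert(hw->page_shift < 64 && guest->page_shift < 64);

	return hw->page_shift <= guest->page_shift &&
	       (hw->offset << hw->page_shift) ==
			(guest->offset << guest->page_shift) &&
	       (hw->size << hw->page_shift) >=
			(guest->size << guest->page_shift);
}
```

With 64K pages, a hardware table of 0x8000 entries accepts a guest table of 0x7000 entries under the new rule, while the old equality test would have rejected it.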
> ---
> arch/powerpc/kvm/book3s_64_vio.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 8c456fa..8167ce8 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -180,7 +180,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
> if ((tbltmp->it_page_shift <= stt->page_shift) &&
> (tbltmp->it_offset << tbltmp->it_page_shift ==
> stt->offset << stt->page_shift) &&
> - (tbltmp->it_size << tbltmp->it_page_shift ==
> + (tbltmp->it_size << tbltmp->it_page_shift >=
> stt->size << stt->page_shift)) {
> /*
> * Reference the table to avoid races with
> @@ -296,7 +296,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> {
> struct kvmppc_spapr_tce_table *stt = NULL;
> struct kvmppc_spapr_tce_table *siter;
> - unsigned long npages, size;
> + unsigned long npages, size = args->size;
> int ret = -ENOMEM;
> int i;
>
> @@ -304,7 +304,6 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> (args->offset + args->size > (ULLONG_MAX >> args->page_shift)))
> return -EINVAL;
>
> - size = _ALIGN_UP(args->size, PAGE_SIZE >> 3);
> npages = kvmppc_tce_pages(size);
> ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
> if (ret)
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API
From: Nicholas Piggin @ 2018-06-22 2:15 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Aneesh Kumar K.V, Thiago Jung Bauermann,
Ram Pai, linuxppc-dev
In-Reply-To: <20180621212835.5636-14-willy@infradead.org>
On Thu, 21 Jun 2018 14:28:22 -0700
Matthew Wilcox <willy@infradead.org> wrote:
> ida_alloc_range is the perfect fit for this use case. Eliminates
> a custom spinlock, a call to ida_pre_get and a local check for the
> allocated ID exceeding a maximum.
>
> Signed-off-by: Matthew Wilcox <willy@infradead.org>
> ---
> arch/powerpc/mm/mmu_context_book3s64.c | 44 +++-----------------------
> 1 file changed, 4 insertions(+), 40 deletions(-)
>
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index f3d4b4a0e561..5a0cf2cc8ba0 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -26,48 +26,16 @@
> #include <asm/mmu_context.h>
> #include <asm/pgalloc.h>
>
> -static DEFINE_SPINLOCK(mmu_context_lock);
> static DEFINE_IDA(mmu_context_ida);
>
> static int alloc_context_id(int min_id, int max_id)
> {
> - int index, err;
> -
> -again:
> - if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
> - return -ENOMEM;
> -
> - spin_lock(&mmu_context_lock);
> - err = ida_get_new_above(&mmu_context_ida, min_id, &index);
> - spin_unlock(&mmu_context_lock);
> -
> - if (err == -EAGAIN)
> - goto again;
> - else if (err)
> - return err;
> -
> - if (index > max_id) {
> - spin_lock(&mmu_context_lock);
> - ida_remove(&mmu_context_ida, index);
> - spin_unlock(&mmu_context_lock);
> - return -ENOMEM;
> - }
> -
> - return index;
> + return ida_alloc_range(&mmu_context_ida, min_id, max_id, GFP_KERNEL);
> }
>
> void hash__reserve_context_id(int id)
> {
> - int rc, result = 0;
> -
> - do {
> - if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
> - break;
> -
> - spin_lock(&mmu_context_lock);
> - rc = ida_get_new_above(&mmu_context_ida, id, &result);
> - spin_unlock(&mmu_context_lock);
> - } while (rc == -EAGAIN);
> + int result = ida_alloc_range(&mmu_context_ida, id, id, GFP_KERNEL);
>
> WARN(result != id, "mmu: Failed to reserve context id %d (rc %d)\n", id, result);
> }
> @@ -172,9 +140,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>
> void __destroy_context(int context_id)
> {
> - spin_lock(&mmu_context_lock);
> - ida_remove(&mmu_context_ida, context_id);
> - spin_unlock(&mmu_context_lock);
> + ida_free(&mmu_context_ida, context_id);
> }
> EXPORT_SYMBOL_GPL(__destroy_context);
>
> @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
> {
> int index, context_id;
>
> - spin_lock(&mmu_context_lock);
> for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
> context_id = ctx->extended_id[index];
> if (context_id)
> - ida_remove(&mmu_context_ida, context_id);
> + ida_free(&mmu_context_ida, context_id);
> }
> - spin_unlock(&mmu_context_lock);
> }
>
> static void pte_frag_destroy(void *pte_frag)
This hunk should be okay because the mmu_context_lock does not protect
the extended_id array, right Aneesh?
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Thanks,
Nick
* Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API
From: Matthew Wilcox @ 2018-06-22 4:38 UTC (permalink / raw)
To: Nicholas Piggin
Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Aneesh Kumar K.V, Thiago Jung Bauermann,
Ram Pai, linuxppc-dev
In-Reply-To: <20180622121511.00ae9d00@roar.ozlabs.ibm.com>
On Fri, Jun 22, 2018 at 12:15:11PM +1000, Nicholas Piggin wrote:
> On Thu, 21 Jun 2018 14:28:22 -0700
> Matthew Wilcox <willy@infradead.org> wrote:
> > static int alloc_context_id(int min_id, int max_id)
...
> > - spin_lock(&mmu_context_lock);
> > - err = ida_get_new_above(&mmu_context_ida, min_id, &index);
> > - spin_unlock(&mmu_context_lock);
...
> > @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
> > {
> > int index, context_id;
> >
> > - spin_lock(&mmu_context_lock);
> > for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
> > context_id = ctx->extended_id[index];
> > if (context_id)
> > - ida_remove(&mmu_context_ida, context_id);
> > + ida_free(&mmu_context_ida, context_id);
> > }
> > - spin_unlock(&mmu_context_lock);
> > }
> >
> > static void pte_frag_destroy(void *pte_frag)
>
> This hunk should be okay because the mmu_context_lock does not protect
> the extended_id array, right Aneesh?
That's my understanding. The code today does this:
static inline int alloc_extended_context(struct mm_struct *mm,
unsigned long ea)
{
int context_id;
int index = ea >> MAX_EA_BITS_PER_CONTEXT;
context_id = hash__alloc_context_id();
if (context_id < 0)
return context_id;
VM_WARN_ON(mm->context.extended_id[index]);
mm->context.extended_id[index] = context_id;
so it's not currently protected by this lock. I suppose we are currently
protected from destroy_contexts() being called twice simultaneously, but
you'll notice that we don't zero the array elements in destroy_contexts(),
so if we somehow had a code path which could call it concurrently, we'd
be seeing warnings when the second caller tried to remove the context
IDs from the IDA. I deduced that something else must be preventing
this situation from occurring (like, oh i don't know, this function only
being called on process exit, so implicitly only called once per context).
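For anyone unfamiliar with the range-allocation semantics being relied on here, below is a toy userspace model, not the kernel IDA. It assumes at most 64 IDs kept in a single bitmap word; `id_alloc_range` and `id_free` are hypothetical names. It shows the two behaviours discussed in this thread: the allocator enforces the upper bound itself, so no caller-side "index > max_id" check is needed, and releasing an ID that is not allocated is detectable.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_IDS 64

static uint64_t used;   /* bit n set => ID n is currently allocated */

/* Return the lowest free ID in [min, max], or -ENOSPC if none is free. */
static int id_alloc_range(int min, int max)
{
	assert(min >= 0);
	if (max >= MAX_IDS)
		max = MAX_IDS - 1;

	for (int id = min; id <= max; id++) {
		if (!(used & (1ULL << id))) {
			used |= 1ULL << id;
			return id;
		}
	}
	return -ENOSPC;
}

/* Release an ID; returns -EINVAL if it was not allocated (double free). */
static int id_free(int id)
{
	if (id < 0 || id >= MAX_IDS || !(used & (1ULL << id)))
		return -EINVAL;
	used &= ~(1ULL << id);
	return 0;
}
```

Reserving one specific ID, as hash__reserve_context_id() does in the patch, is just the min == max case: `id_alloc_range(5, 5)` either returns 5 or fails. The toy model has no locking; the real allocator's internal locking is what lets the patch drop mmu_context_lock.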
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Thanks.
* Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API
From: Nicholas Piggin @ 2018-06-22 4:53 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Aneesh Kumar K.V, Thiago Jung Bauermann,
Ram Pai, linuxppc-dev
In-Reply-To: <20180622043815.GA31255@bombadil.infradead.org>
On Thu, 21 Jun 2018 21:38:15 -0700
Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Jun 22, 2018 at 12:15:11PM +1000, Nicholas Piggin wrote:
> > On Thu, 21 Jun 2018 14:28:22 -0700
> > Matthew Wilcox <willy@infradead.org> wrote:
> > > static int alloc_context_id(int min_id, int max_id)
> ...
> > > - spin_lock(&mmu_context_lock);
> > > - err = ida_get_new_above(&mmu_context_ida, min_id, &index);
> > > - spin_unlock(&mmu_context_lock);
> ...
> > > @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
> > > {
> > > int index, context_id;
> > >
> > > - spin_lock(&mmu_context_lock);
> > > for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
> > > context_id = ctx->extended_id[index];
> > > if (context_id)
> > > - ida_remove(&mmu_context_ida, context_id);
> > > + ida_free(&mmu_context_ida, context_id);
> > > }
> > > - spin_unlock(&mmu_context_lock);
> > > }
> > >
> > > static void pte_frag_destroy(void *pte_frag)
> >
> > This hunk should be okay because the mmu_context_lock does not protect
> > the extended_id array, right Aneesh?
>
> That's my understanding. The code today does this:
>
> static inline int alloc_extended_context(struct mm_struct *mm,
> unsigned long ea)
> {
> int context_id;
>
> int index = ea >> MAX_EA_BITS_PER_CONTEXT;
>
> context_id = hash__alloc_context_id();
> if (context_id < 0)
> return context_id;
>
> VM_WARN_ON(mm->context.extended_id[index]);
> mm->context.extended_id[index] = context_id;
>
> so it's not currently protected by this lock. I suppose we are currently
> protected from destroy_contexts() being called twice simultaneously, but
> you'll notice that we don't zero the array elements in destroy_contexts(),
> so if we somehow had a code path which could call it concurrently, we'd
> be seeing warnings when the second caller tried to remove the context
Yeah that'd be an existing bug.
> IDs from the IDA. I deduced that something else must be preventing
> this situation from occurring (like, oh i don't know, this function only
> being called on process exit, so implicitly only called once per context).
I think that's exactly right.
Thanks,
Nick
* Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling
From: Finn Thain @ 2018-06-22 5:26 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Paul Mackerras, Michael Ellerman, Geert Uytterhoeven,
Joshua Thompson, Mathieu Malaterre, Benjamin Herrenschmidt,
Greg Ungerer, linux-m68k, linuxppc-dev, linux-kernel, y2038,
Meelis Roos, Andreas Schwab
In-Reply-To: <20180619140229.3615110-2-arnd@arndb.de>
On Tue, 19 Jun 2018, Arnd Bergmann wrote:
> The real-time clock on m68k (and powerpc) mac systems uses an unsigned
> 32-bit value starting in 1904, which overflows in 2040, about two years
> later than everyone else, but this gets wrapped around in the Linux code
> in 2038 already because of the deprecated usage of time_t and/or long in
> the conversion.
>
> Getting rid of the deprecated interfaces makes it work until 2040 as
> documented, and it could be easily extended by reinterpreting the
> resulting time64_t as a positive number. For the moment, I'm adding a
> WARN_ON() that triggers if we encounter a time before 1970 or after 2040
> (the two are indistinguishable).
>
I really don't like the WARN_ON(), but I'd prefer to address that in a
separate patch rather than impede the progress of this patch (or of this
series, since 3/3 seems to be unrelated).
BTW, have you considered using the same wrap-around test (i.e. YY < 70)
that we use for the year register in the other RTC chips?
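To spell out what such a wrap-around interpretation would look like for the 32-bit seconds counter, here is a standalone sketch. It reuses the RTC_OFFSET value from the patch and the same "add 2^32 if the raw value is below the offset" step that the via_read_time() hunk below uses; the function names `mac_rtc_to_unix` and `unix_to_mac_rtc` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between the Mac epoch (1904) and the Unix epoch (1970). */
#define RTC_OFFSET 2082844800LL

/*
 * Convert a raw 32-bit Mac RTC value to 64-bit Unix time. Raw values
 * below RTC_OFFSET would be pre-1970, so they are reinterpreted as
 * having wrapped past 2040 instead.
 */
static int64_t mac_rtc_to_unix(uint32_t raw)
{
	int64_t t = raw;

	if (t < RTC_OFFSET)
		t += 0x100000000LL;
	return t - RTC_OFFSET;
}

/* Convert 64-bit Unix time back to the 32-bit value the RTC stores. */
static uint32_t unix_to_mac_rtc(int64_t t)
{
	return (uint32_t)((uint64_t)(t + RTC_OFFSET));
}
```

With this interpretation a raw value of 0 maps to 2212122496 seconds after the Unix epoch (the first second after the counter wraps) rather than to 1904, so no warning is needed; the cost is that genuine pre-1970 settings can no longer be represented.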
> This brings it in line with the corresponding code that we have on
> powerpc macintosh.
>
Your recent patches to the Mac RTC routines (which are duplicated under
arch/m68k and arch/powerpc) conflict with my recent patch that
deduplicates the same code. So I will rebase and resubmit after someone
merges these fixes.
Apparently the PowerMac routines work now, which is sufficient testing for
me; the PowerMac routines will get tested on m68k Macs when that code gets
deduplicated again.
BTW, Joshua tells me that he is not doing code review. We should probably
drop the "M68K ON APPLE MACINTOSH" entry from the MAINTAINERS file, like
the Amiga and Atari ports...
--
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: Fix varargs passing bug pointed out by Andreas Schwab
> Fix a typo that caused a build regression
> ---
> arch/m68k/mac/misc.c | 62 +++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 39 insertions(+), 23 deletions(-)
>
> diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
> index c68054361615..0a2572a6bfe5 100644
> --- a/arch/m68k/mac/misc.c
> +++ b/arch/m68k/mac/misc.c
> @@ -26,33 +26,39 @@
>
> #include <asm/machdep.h>
>
> -/* Offset between Unix time (1970-based) and Mac time (1904-based) */
> +/* Offset between Unix time (1970-based) and Mac time (1904-based). Cuda and PMU
> + * times wrap in 2040. If we need to handle later times, the read_time functions
> + * need to be changed to interpret wrapped times as post-2040. */
>
> #define RTC_OFFSET 2082844800
>
> static void (*rom_reset)(void);
>
> #ifdef CONFIG_ADB_CUDA
> -static long cuda_read_time(void)
> +static time64_t cuda_read_time(void)
> {
> struct adb_request req;
> - long time;
> + time64_t time;
>
> if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0)
> return 0;
> while (!req.complete)
> cuda_poll();
>
> - time = (req.reply[3] << 24) | (req.reply[4] << 16) |
> - (req.reply[5] << 8) | req.reply[6];
> + time = (u32)((req.reply[3] << 24) | (req.reply[4] << 16) |
> + (req.reply[5] << 8) | req.reply[6]);
> +
> + /* it's either after year 2040, or the RTC has gone backwards */
> + WARN_ON(time < RTC_OFFSET);
> +
> return time - RTC_OFFSET;
> }
>
> -static void cuda_write_time(long data)
> +static void cuda_write_time(time64_t time)
> {
> struct adb_request req;
> + u32 data = lower_32_bits(time + RTC_OFFSET);
>
> - data += RTC_OFFSET;
> if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME,
> (data >> 24) & 0xFF, (data >> 16) & 0xFF,
> (data >> 8) & 0xFF, data & 0xFF) < 0)
> @@ -86,26 +92,30 @@ static void cuda_write_pram(int offset, __u8 data)
> #endif /* CONFIG_ADB_CUDA */
>
> #ifdef CONFIG_ADB_PMU68K
> -static long pmu_read_time(void)
> +static time64_t pmu_read_time(void)
> {
> struct adb_request req;
> - long time;
> + time64_t time;
>
> if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
> return 0;
> while (!req.complete)
> pmu_poll();
>
> - time = (req.reply[1] << 24) | (req.reply[2] << 16) |
> - (req.reply[3] << 8) | req.reply[4];
> + time = (u32)((req.reply[1] << 24) | (req.reply[2] << 16) |
> + (req.reply[3] << 8) | req.reply[4]);
> +
> + /* it's either after year 2040, or the RTC has gone backwards */
> + WARN_ON(time < RTC_OFFSET);
> +
> return time - RTC_OFFSET;
> }
>
> -static void pmu_write_time(long data)
> +static void pmu_write_time(time64_t time)
> {
> struct adb_request req;
> + u32 data = lower_32_bits(time + RTC_OFFSET);
>
> - data += RTC_OFFSET;
> if (pmu_request(&req, NULL, 5, PMU_SET_RTC,
> (data >> 24) & 0xFF, (data >> 16) & 0xFF,
> (data >> 8) & 0xFF, data & 0xFF) < 0)
> @@ -269,8 +279,12 @@ static long via_read_time(void)
> via_pram_command(0x89, &result.cdata[1]);
> via_pram_command(0x8D, &result.cdata[0]);
>
> - if (result.idata == last_result.idata)
> + if (result.idata == last_result.idata) {
> + if (result.idata < RTC_OFFSET)
> + result.idata += 0x100000000ull;
> +
> return result.idata - RTC_OFFSET;
> + }
>
> if (++count > 10)
> break;
> @@ -291,11 +305,11 @@ static long via_read_time(void)
> * is basically any machine with Mac II-style ADB.
> */
>
> -static void via_write_time(long time)
> +static void via_write_time(time64_t time)
> {
> union {
> __u8 cdata[4];
> - long idata;
> + __u32 idata;
> } data;
> __u8 temp;
>
> @@ -585,12 +599,15 @@ void mac_reset(void)
> * This function translates seconds since 1970 into a proper date.
> *
> * Algorithm cribbed from glibc2.1, __offtime().
> + *
> + * This is roughly same as rtc_time64_to_tm(), which we should probably
> + * use here, but it's only available when CONFIG_RTC_LIB is enabled.
> */
> #define SECS_PER_MINUTE (60)
> #define SECS_PER_HOUR (SECS_PER_MINUTE * 60)
> #define SECS_PER_DAY (SECS_PER_HOUR * 24)
>
> -static void unmktime(unsigned long time, long offset,
> +static void unmktime(time64_t time, long offset,
> int *yearp, int *monp, int *dayp,
> int *hourp, int *minp, int *secp)
> {
> @@ -602,11 +619,10 @@ static void unmktime(unsigned long time, long offset,
> /* Leap years. */
> { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366 }
> };
> - long int days, rem, y, wday, yday;
> + int days, rem, y, wday, yday;
> const unsigned short int *ip;
>
> - days = time / SECS_PER_DAY;
> - rem = time % SECS_PER_DAY;
> + days = div_u64_rem(time, SECS_PER_DAY, &rem);
> rem += offset;
> while (rem < 0) {
> rem += SECS_PER_DAY;
> @@ -657,7 +673,7 @@ static void unmktime(unsigned long time, long offset,
>
> int mac_hwclk(int op, struct rtc_time *t)
> {
> - unsigned long now;
> + time64_t now;
>
> if (!op) { /* read */
> switch (macintosh_config->adb_type) {
> @@ -693,8 +709,8 @@ int mac_hwclk(int op, struct rtc_time *t)
> __func__, t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
> t->tm_hour, t->tm_min, t->tm_sec);
>
> - now = mktime(t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
> - t->tm_hour, t->tm_min, t->tm_sec);
> + now = mktime64(t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
> + t->tm_hour, t->tm_min, t->tm_sec);
>
> switch (macintosh_config->adb_type) {
> case MAC_ADB_IOP:
>
* Re: [PATCH 2/3] migration/memory: Evaluate LMB assoc changes
From: kbuild test robot @ 2018-06-22 5:41 UTC (permalink / raw)
To: Michael Bringmann
Cc: kbuild-all, linuxppc-dev, Nathan Fontenot, Michael Bringmann,
Thomas Falcon, Tyrel Datwyler, John Allen
In-Reply-To: <8e886b24-1226-756e-369a-ceca2e30c620@linux.vnet.ibm.com>
Hi Michael,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.18-rc1 next-20180621]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-migration-Affinity-fix-for-memory/20180621-090151
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc
All errors (new ones prefixed by >>):
arch/powerpc/platforms/pseries/hotplug-memory.c: In function 'pseries_update_drconf_memory':
>> arch/powerpc/platforms/pseries/hotplug-memory.c:1032:4: error: implicit declaration of function 'dlpar_queue_action'; did you mean 'waitqueue_active'? [-Werror=implicit-function-declaration]
dlpar_queue_action(
^~~~~~~~~~~~~~~~~~
waitqueue_active
cc1: some warnings being treated as errors
vim +1032 arch/powerpc/platforms/pseries/hotplug-memory.c
999
1000 static int pseries_update_drconf_memory(struct drmem_lmb_info *new_dinfo)
1001 {
1002 struct drmem_lmb *old_lmb, *new_lmb;
1003 unsigned long memblock_size;
1004 int rc = 0;
1005
1006 if (rtas_hp_event)
1007 return 0;
1008
1009 memblock_size = pseries_memory_block_size();
1010 if (!memblock_size)
1011 return -EINVAL;
1012
1013 /* Arrays should have the same size and DRC indexes */
1014 for_each_pair_dinfo_lmb(drmem_info, old_lmb, new_dinfo, new_lmb) {
1015
1016 if (new_lmb->drc_index != old_lmb->drc_index)
1017 continue;
1018
1019 if ((old_lmb->flags & DRCONF_MEM_ASSIGNED) &&
1020 (!(new_lmb->flags & DRCONF_MEM_ASSIGNED))) {
1021 rc = pseries_remove_memblock(
1022 old_lmb->base_addr, memblock_size);
1023 break;
1024 } else if ((!(old_lmb->flags & DRCONF_MEM_ASSIGNED)) &&
1025 (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
1026 rc = memblock_add(old_lmb->base_addr,
1027 memblock_size);
1028 rc = (rc < 0) ? -EINVAL : 0;
1029 break;
1030 } else if ((old_lmb->aa_index != new_lmb->aa_index) &&
1031 (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
> 1032 dlpar_queue_action(
1033 PSERIES_HP_ELOG_RESOURCE_MEM,
1034 PSERIES_HP_ELOG_ACTION_READD,
1035 new_lmb->drc_index);
1036 }
1037 }
1038 return rc;
1039 }
1040
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56130 bytes --]
* Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API
From: Aneesh Kumar K.V @ 2018-06-22 5:47 UTC (permalink / raw)
To: Nicholas Piggin, Matthew Wilcox
Cc: Ram Pai, linux-kernel, Paul Mackerras, Thiago Jung Bauermann,
linuxppc-dev
In-Reply-To: <20180622121511.00ae9d00@roar.ozlabs.ibm.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> On Thu, 21 Jun 2018 14:28:22 -0700
> Matthew Wilcox <willy@infradead.org> wrote:
>
>> ida_alloc_range is the perfect fit for this use case. Eliminates
>> a custom spinlock, a call to ida_pre_get and a local check for the
>> allocated ID exceeding a maximum.
>>
>> Signed-off-by: Matthew Wilcox <willy@infradead.org>
>> ---
>> arch/powerpc/mm/mmu_context_book3s64.c | 44 +++-----------------------
>> 1 file changed, 4 insertions(+), 40 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
>> index f3d4b4a0e561..5a0cf2cc8ba0 100644
>> --- a/arch/powerpc/mm/mmu_context_book3s64.c
>> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
>> @@ -26,48 +26,16 @@
>> #include <asm/mmu_context.h>
>> #include <asm/pgalloc.h>
>>
>> -static DEFINE_SPINLOCK(mmu_context_lock);
>> static DEFINE_IDA(mmu_context_ida);
>>
>> static int alloc_context_id(int min_id, int max_id)
>> {
>> - int index, err;
>> -
>> -again:
>> - if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
>> - return -ENOMEM;
>> -
>> - spin_lock(&mmu_context_lock);
>> - err = ida_get_new_above(&mmu_context_ida, min_id, &index);
>> - spin_unlock(&mmu_context_lock);
>> -
>> - if (err == -EAGAIN)
>> - goto again;
>> - else if (err)
>> - return err;
>> -
>> - if (index > max_id) {
>> - spin_lock(&mmu_context_lock);
>> - ida_remove(&mmu_context_ida, index);
>> - spin_unlock(&mmu_context_lock);
>> - return -ENOMEM;
>> - }
>> -
>> - return index;
>> + return ida_alloc_range(&mmu_context_ida, min_id, max_id, GFP_KERNEL);
>> }
>>
>> void hash__reserve_context_id(int id)
>> {
>> - int rc, result = 0;
>> -
>> - do {
>> - if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
>> - break;
>> -
>> - spin_lock(&mmu_context_lock);
>> - rc = ida_get_new_above(&mmu_context_ida, id, &result);
>> - spin_unlock(&mmu_context_lock);
>> - } while (rc == -EAGAIN);
>> + int result = ida_alloc_range(&mmu_context_ida, id, id, GFP_KERNEL);
>>
>> WARN(result != id, "mmu: Failed to reserve context id %d (rc %d)\n", id, result);
>> }
>> @@ -172,9 +140,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>>
>> void __destroy_context(int context_id)
>> {
>> - spin_lock(&mmu_context_lock);
>> - ida_remove(&mmu_context_ida, context_id);
>> - spin_unlock(&mmu_context_lock);
>> + ida_free(&mmu_context_ida, context_id);
>> }
>> EXPORT_SYMBOL_GPL(__destroy_context);
>>
>> @@ -182,13 +148,11 @@ static void destroy_contexts(mm_context_t *ctx)
>> {
>> int index, context_id;
>>
>> - spin_lock(&mmu_context_lock);
>> for (index = 0; index < ARRAY_SIZE(ctx->extended_id); index++) {
>> context_id = ctx->extended_id[index];
>> if (context_id)
>> - ida_remove(&mmu_context_ida, context_id);
>> + ida_free(&mmu_context_ida, context_id);
>> }
>> - spin_unlock(&mmu_context_lock);
>> }
>>
>> static void pte_frag_destroy(void *pte_frag)
>
> This hunk should be okay because the mmu_context_lock does not protect
> the extended_id array, right Aneesh?
Yes. This is called at process exit, so we should not find parallel
calls. On the allocation side, we are protected by mmap_sem. We do
allocate extended_id when doing mmap.
-aneesh
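
For readers following the thread, the allocation semantics the quoted
patch relies on can be modelled in userspace. This is only a toy sketch:
the names toy_ida, toy_ida_alloc_range and toy_ida_free are hypothetical,
the fixed-size array stands in for the kernel's internal data structure,
and no locking is modelled. It mirrors two documented properties of the
real ida_alloc_range(): it hands back the lowest free ID in [min, max],
and it returns -ENOSPC when that range is exhausted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_IDS 64 /* hypothetical capacity of the toy allocator */

/* Toy stand-in for struct ida: one flag per ID, no locking. */
struct toy_ida {
	bool used[TOY_MAX_IDS];
};

/* Return the lowest free ID in [min, max], or -ENOSPC if none is free. */
int toy_ida_alloc_range(struct toy_ida *ida, unsigned int min,
			unsigned int max)
{
	assert(ida != NULL);

	/* Clamp the upper bound to what the toy array can represent. */
	if (max >= TOY_MAX_IDS)
		max = TOY_MAX_IDS - 1;

	for (unsigned int id = min; id <= max; id++) {
		if (!ida->used[id]) {
			ida->used[id] = true;
			return (int)id;
		}
	}
	return -ENOSPC;
}

/* Release a previously allocated ID so it can be handed out again. */
void toy_ida_free(struct toy_ida *ida, unsigned int id)
{
	assert(ida != NULL);

	if (id < TOY_MAX_IDS)
		ida->used[id] = false;
}
```

Calling toy_ida_alloc_range(&ida, id, id) corresponds to the
hash__reserve_context_id() case in the patch: with min == max the call
either returns exactly that ID or fails, which is what the WARN() checks.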
^ permalink raw reply
* Re: [PATCH 13/26] ppc: Convert mmu context allocation to new IDA API
From: Aneesh Kumar K.V @ 2018-06-22 5:47 UTC (permalink / raw)
To: Matthew Wilcox, Nicholas Piggin
Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Thiago Jung Bauermann, Ram Pai, linuxppc-dev
In-Reply-To: <20180622043815.GA31255@bombadil.infradead.org>
Matthew Wilcox <willy@infradead.org> writes:
> this situation from occurring (like, oh i don't know, this function only
> being called on process exit, so implicitly only called once per context).
>
That is correct.
^ permalink raw reply
* [PATCH 2/2] powerpc: Document issues with TM on POWER9
From: Michael Neuling @ 2018-06-22 6:14 UTC (permalink / raw)
To: mpe; +Cc: linuxppc-dev, mikey
In-Reply-To: <20180622061452.19016-1-mikey@neuling.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
.../powerpc/transactional_memory.txt | 44 +++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
index e32fdbb4c9..b254eab517 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -198,3 +198,47 @@ presented). The transaction cannot then be continued and will take the failure
handler route. Furthermore, the transactional 2nd register state will be
inaccessible. GDB can currently be used on programs using TM, but not sensibly
in parts within transactions.
+
+POWER9
+======
+
+TM on POWER9 has issues with storing the complete register state. This
+is described in this commit:
+
+ commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
+ Author: Paul Mackerras <paulus@ozlabs.org>
+ Date: Wed Mar 21 21:32:01 2018 +1100
+ KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
+
+To account for this, different POWER9 chips have TM enabled in
+different ways.
+
+On POWER9N DD2.01 and below, TM is disabled. ie
+HWCAP2[PPC_FEATURE2_HTM] is not set.
+
+On POWER9N DD2.1 TM is configured by firmware to always abort a
+transaction when tm suspend occurs. So tsuspend will cause a
+transaction to be aborted and rolled back. Kernel exceptions will also
+cause the transaction to be aborted and rolled back and the exception
+will not occur. If userspace constructs a sigcontext that enables TM
+suspend, the sigcontext will be rejected by the kernel. This mode is
+advertised to users with HWCAP2[PPC_FEATURE2_HTM_NO_SUSPEND] set.
+HWCAP2[PPC_FEATURE2_HTM] is not set in this mode.
+
+On POWER9N DD2.2 and above, KVM and POWERVM emulate TM for guests (as
+described in commit 4bb3c7a0208f), hence TM is enabled for guests
+ie. HWCAP2[PPC_FEATURE2_HTM] is set for guest userspace. Guests that
+make heavy use of TM suspend (tsuspend or kernel suspend) will result
+in traps into the hypervisor and hence will suffer a performance
+degradation. Host userspace has TM disabled
+ie. HWCAP2[PPC_FEATURE2_HTM] is not set. (although we may enable it
+at some point in the future if we bring the emulation into host
+userspace context switching).
+
+POWER9C DD1.2 and above are only available with POWERVM and hence
+Linux only runs as a guest. On these systems TM is emulated like on
+POWER9N DD2.2.
+
+Guest migration from POWER8 to POWER9 will work with POWER9N DD2.2 and
+POWER9C DD1.2. Since earlier POWER9 processors don't support TM
+emulation, migration from POWER8 to POWER9 is not supported there.
--
2.17.1
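
As a rough illustration of how userspace might tell the three modes
described above apart, the sketch below classifies an AT_HWCAP2 value.
The function and enum names are hypothetical. The fallback bit values
are assumed to match arch/powerpc/include/uapi/asm/cputable.h and should
be checked against that header; on a real system <asm/cputable.h>
would normally supply them.

```c
#include <assert.h>
#include <sys/auxv.h>

/* Fallback definitions (assumed values, see asm/cputable.h on powerpc). */
#ifndef PPC_FEATURE2_HTM
#define PPC_FEATURE2_HTM		0x40000000
#endif
#ifndef PPC_FEATURE2_HTM_NO_SUSPEND
#define PPC_FEATURE2_HTM_NO_SUSPEND	0x00080000
#endif
#ifndef AT_HWCAP2
#define AT_HWCAP2			26
#endif

/* Hypothetical names for the modes described in the patch. */
enum tm_mode {
	TM_DISABLED,	/* neither bit set, e.g. POWER9N DD2.01 and below */
	TM_NO_SUSPEND,	/* HTM_NO_SUSPEND only, e.g. POWER9N DD2.1 */
	TM_FULL,	/* HTM set, e.g. POWER8 or an emulated POWER9 guest */
};

/* Classify a raw AT_HWCAP2 value. Pure function, so easy to test. */
enum tm_mode tm_mode_from_hwcap2(unsigned long hwcap2)
{
	if (hwcap2 & PPC_FEATURE2_HTM)
		return TM_FULL;
	if (hwcap2 & PPC_FEATURE2_HTM_NO_SUSPEND)
		return TM_NO_SUSPEND;
	return TM_DISABLED;
}

/*
 * Classify the running process. Only meaningful on powerpc: on other
 * architectures the AT_HWCAP2 bits have unrelated meanings.
 */
enum tm_mode tm_mode_current(void)
{
	return tm_mode_from_hwcap2(getauxval(AT_HWCAP2));
}
```

A program that uses tsuspend could call tm_mode_current() at startup and
fall back to a lock-based path unless it gets TM_FULL, since in the
no-suspend mode a suspend aborts the transaction.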
^ permalink raw reply related
* [PATCH 1/2] powerpc: Document issues with the DAWR on POWER9
From: Michael Neuling @ 2018-06-22 6:14 UTC (permalink / raw)
To: mpe; +Cc: linuxppc-dev, mikey
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Documentation/powerpc/DAWR-POWER9.txt | 58 +++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
create mode 100644 Documentation/powerpc/DAWR-POWER9.txt
diff --git a/Documentation/powerpc/DAWR-POWER9.txt b/Documentation/powerpc/DAWR-POWER9.txt
new file mode 100644
index 0000000000..cc6e69b69b
--- /dev/null
+++ b/Documentation/powerpc/DAWR-POWER9.txt
@@ -0,0 +1,58 @@
+DAWR issues on POWER9
+============================
+
+On POWER9 the DAWR can cause a checkstop if it points to cache
+inhibited (CI) memory. Currently Linux has no way to distinguish CI
+memory when configuring the DAWR, so (for now) the DAWR is disabled by
+this commit:
+
+ commit 9654153158d3e0684a1bdb76dbababdb7111d5a0
+ Author: Michael Neuling <mikey@neuling.org>
+ Date: Tue Mar 27 15:37:24 2018 +1100
+ powerpc: Disable DAWR in the base POWER9 CPU features
+
+Technical Details:
+============================
+
+DAWR has 5 different ways of being set.
+1) ptrace
+2) h_set_mode(DAWR)
+3) h_set_dabr()
+4) kvmppc_set_one_reg()
+5) xmon
+
+For ptrace, we now advertise zero breakpoints on POWER9 via the
+PPC_PTRACE_GETHWDBGINFO call. This results in GDB falling back to
+software emulation of the watchpoint (which is slow).
+
+h_set_mode(DAWR) and h_set_dabr() will now return an error to the
+guest on a POWER9 host. Current Linux guests ignore this error, so
+they will silently not get the DAWR.
+
+kvmppc_set_one_reg() will store the value in the vcpu but won't
+actually set it on POWER9 hardware. This is done so we don't break
+migration from POWER8 to POWER9, at the cost of silently losing the
+DAWR on the migration.
+
+For xmon, the 'bd' command will return an error on P9.
+
+Consequences for users
+============================
+
+For GDB watchpoints (ie 'watch' command) on POWER9 bare metal, GDB
+will accept the command. Unfortunately since there is no hardware
+support for the watchpoint, GDB will software emulate the watchpoint
+making it run very slowly.
+
+The same will also be true for any guests started on a POWER9
+host. The watchpoint will fail and GDB will fall back to software
+emulation.
+
+If a guest is started on a POWER8 host, GDB will accept the watchpoint
+and configure the hardware to use the DAWR. This will run at full
+speed since it can use the hardware emulation. Unfortunately if this
+guest is migrated to a POWER9 host, the watchpoint will be lost on the
+POWER9. Loads and stores to the watchpoint locations will not be
+trapped in GDB. The watchpoint is remembered, so if the guest is
+migrated back to the POWER8 host, it will start working again.
+
--
2.17.1
^ permalink raw reply related