* Re: [PATCH v10 09/12] ima: Implement support for module-style appended signatures
From: Thiago Jung Bauermann @ 2019-05-28 19:27 UTC (permalink / raw)
To: Mimi Zohar
Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
Jonathan Corbet, linux-kernel, James Morris, David Howells,
AKASHI, Takahiro, linux-security-module, keyrings, linux-crypto,
Jessica Yu, linux-integrity, linuxppc-dev, David Woodhouse,
Serge E. Hallyn
In-Reply-To: <1557835765.4139.9.camel@linux.ibm.com>
Mimi Zohar <zohar@linux.ibm.com> writes:
> Hi Thiago,
>
> On Thu, 2019-04-18 at 00:51 -0300, Thiago Jung Bauermann wrote:
>>
>> @@ -326,6 +356,10 @@ int ima_appraise_measurement(enum ima_hooks func,
>> case INTEGRITY_UNKNOWN:
>> break;
>> case INTEGRITY_NOXATTRS:/* No EVM protected xattrs. */
>> +/* It's fine not to have xattrs when using a modsig. */
>> +if (try_modsig)
>> +break;
>> +/* fall through */
>> case INTEGRITY_NOLABEL:/* No security.evm xattr. */
>> cause = "missing-HMAC";
>> goto out;
>> @@ -340,6 +374,14 @@ int ima_appraise_measurement(enum ima_hooks func,
>> rc = xattr_verify(func, iint, xattr_value, xattr_len, &status,
>> &cause);
>>
>> +/*
>> + * If we have a modsig and either no imasig or the imasig's key isn't
>> + * known, then try verifying the modsig.
>> + */
>> +if (status != INTEGRITY_PASS && try_modsig &&
>> + (!xattr_value || rc == -ENOKEY))
>> +rc = modsig_verify(func, modsig, &status, &cause);
>
> EVM protects other security xattrs, not just security.ima, if they
> exist. As a result, evm_verifyxattr() could pass based on the other
> security xattrs.
Indeed! It doesn't make sense to test for status != INTEGRITY_PASS here.
Not sure what I was thinking. Thanks for spotting it. With your other
comments about this if clause, this code now reads:
/*
* If we have a modsig and either no imasig or the imasig's key isn't
* known, then try verifying the modsig.
*/
if (try_modsig &&
(!xattr_value || xattr_value->type == IMA_XATTR_DIGEST_NG ||
rc == -ENOKEY))
rc = modsig_verify(func, modsig, &status, &cause);
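The condition above can be read as a standalone predicate. The following is only a minimal sketch under stated assumptions: `should_try_modsig()`, `struct xattr_data` and the `XATTR_*` enumerators are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ENOKEY
#define ENOKEY 126	/* fallback for platforms whose errno.h lacks it */
#endif

/* Hypothetical stand-ins for the kernel's xattr types. */
enum xattr_type { XATTR_DIGEST, XATTR_SIG, XATTR_DIGEST_NG };

struct xattr_data {
	enum xattr_type type;
};

/*
 * Return true when the appended signature should be tried: a modsig is
 * available and the xattr is absent, holds only a digest, or was signed
 * with a key that is not known (rc == -ENOKEY).
 */
static bool should_try_modsig(bool try_modsig,
			      const struct xattr_data *xattr, int rc)
{
	if (!try_modsig)
		return false;
	return !xattr || xattr->type == XATTR_DIGEST_NG || rc == -ENOKEY;
}
```

Note that the predicate no longer looks at the EVM status at all, which is the point of the review comment: a passing status may come from other security xattrs.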
--
Thiago Jung Bauermann
IBM Linux Technology Center
* Re: [PATCH RFC 0/5] Remove some notrace RCU APIs
From: Paul E. McKenney @ 2019-05-28 20:00 UTC (permalink / raw)
To: Joel Fernandes
Cc: rcu, Jonathan Corbet, linux-doc, Lai Jiangshan, Josh Triplett,
Steven Rostedt, linux-kernel, Miguel Ojeda, Ingo Molnar,
Mathieu Desnoyers, kvm-ppc, linuxppc-dev
In-Reply-To: <20190528190007.GC252809@google.com>
On Tue, May 28, 2019 at 03:00:07PM -0400, Joel Fernandes wrote:
> On Tue, May 28, 2019 at 05:24:47AM -0700, Paul E. McKenney wrote:
> > On Sat, May 25, 2019 at 02:14:07PM -0400, Joel Fernandes wrote:
> > > On Sat, May 25, 2019 at 08:50:35AM -0700, Paul E. McKenney wrote:
> > > > On Sat, May 25, 2019 at 10:19:54AM -0400, Joel Fernandes wrote:
> > > > > On Sat, May 25, 2019 at 07:08:26AM -0400, Steven Rostedt wrote:
> > > > > > On Sat, 25 May 2019 04:14:44 -0400
> > > > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > > >
> > > > > > > > I guess the difference between the _raw_notrace and just _raw variants
> > > > > > > > is that _notrace ones do a rcu_check_sparse(). Don't we want to keep
> > > > > > > > that check?
> > > > > > >
> > > > > > > This is true.
> > > > > > >
> > > > > > > Since the users of _raw_notrace are very few, is it worth keeping this API
> > > > > > > just for sparse checking? The API naming is also confusing. I was expecting
> > > > > > > _raw_notrace to do fewer checks than _raw, instead of more. Honestly, I just
> > > > > > > want to nuke _raw_notrace as done in this series and later we can introduce a
> > > > > > > sparse checking version of _raw if need-be. The other option could be to
> > > > > > > always do sparse checking for _raw however that used to be the case and got
> > > > > > > changed in http://lists.infradead.org/pipermail/linux-afs/2016-July/001016.html
> > > > > >
> > > > > > What if we just rename _raw to _raw_nocheck, and _raw_notrace to _raw ?
> > > > >
> > > > > That would also mean changing 160 usages of _raw to _raw_nocheck in the
> > > > > kernel :-/.
> > > > >
> > > > > The tracing usage of _raw_notrace is only like 2 or 3 users. Can we just call
> > > > > rcu_check_sparse directly in the calling code for those and eliminate the APIs?
> > > > >
> > > > > I wonder what Paul thinks about the matter as well.
> > > >
> > > > My thought is that it is likely that a goodly number of the current uses
> > > > of _raw should really be some form of _check, with lockdep expressions
> > > > spelled out. Not that working out what exactly those lockdep expressions
> > > > should be is necessarily a trivial undertaking. ;-)
> > >
> > > Yes, currently where I am a bit stuck is the rcu_dereference_raw()
> > > cannot possibly know what SRCU domain it is under, so lockdep cannot check if
> > > an SRCU lock is held without the user also passing along the SRCU domain. I
> > > am trying to change lockdep to see if it can check if *any* srcu domain lock
> > > is held (regardless of which one) and complain if none are. This is at least
> > > better than no check at all.
> > >
> > > However, I think it gets tricky for mutexes. If you have something like:
> > > mutex_lock(some_mutex);
> > > p = rcu_dereference_raw(gp);
> > > mutex_unlock(some_mutex);
> > >
> > > This might be a perfectly valid invocation of _raw, however my checks (patch
> > > is still cooking) trigger a lockdep warning because _raw cannot know that this
> > > is Ok. lockdep thinks it is not in a reader section. This then gets into the
> > > territory of a new rcu_dereference_raw_protected(gp, assert_held(some_mutex))
> > > which sucks because it's yet another API. To circumvent this issue, can we
> > > just have callers of rcu_dereference_raw ensure that they call
> > > rcu_read_lock() if they are protecting dereferences by a mutex? That would
> > > make things a lot easier and also may be Ok since rcu_read_lock is quite
> > > cheap.
> >
> > Why not just rcu_dereference_protected(lockdep_is_held(some_mutex))?
> > The API is already there, and no need for spurious readers.
>
> Hmm, so I gave a bad example, here is a better example:
>
> fib_get_table calls hlist_for_each_entry_rcu()
> hlist_for_each_entry_rcu calls rcu_dereference_raw().
>
> This is perfectly Ok to be called under rtnl_mutex. However rcu_dereference_raw
> in hlist_for_each_entry_rcu has no way of knowing that the rtnl_mutex held is
> sufficient for the protection since it is not directly called by the caller.
Agreed, and this just happens to be one of the use cases that led to
rcu_dereference_raw(). The calling code (in this case, FIB) simply has
no idea what the synchronization strategy might be.
> I am almost sure I saw other examples of rcu_dereference_raw being called
> this way as well.
And I am OK with this sort of use case. The ones I am less happy with
are the ones where there really is a lockdep expression that could be
constructed.
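The distinction being drawn here can be illustrated with a toy user-space model. This is a sketch only: `deref_raw()`, `deref_protected()`, `walk_helper()` and `struct toy_mutex` are hypothetical names, and the real RCU primitives also deal with memory ordering and lockdep, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int check_failures;	/* counts failed protection checks */

/* Toy analogue of rcu_dereference_raw(): no protection check at all. */
static void *deref_raw(void *const *pp)
{
	return *pp;
}

/* Toy analogue of rcu_dereference_protected(p, cond): caller supplies cond. */
static void *deref_protected(void *const *pp, bool cond)
{
	if (!cond)
		check_failures++;
	return *pp;
}

struct toy_mutex {
	bool held;
};

/*
 * A list walker buried in a helper: it cannot see which lock its caller
 * holds, so it has no condition to pass and falls back to the raw form.
 */
static void *walk_helper(void *const *head)
{
	return deref_raw(head);
}
```

A direct caller that knows its mutex can write `deref_protected(&gp, m.held)` and get checked. The helper cannot, which mirrors the hlist_for_each_entry_rcu() case described above.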
> I was trying to make an "automatic" lockdep check for all this, but it is
> quite hard to do so without passing lockdep expressions down a call
> chain thus complicating all such callchains.
Understood! Not an easy task.
> Further I don't think code can trivially be converted from
> rcu_dereference_raw to rcu_dereference_protected even if the protection being
> offered is known, since the former does not do sparse checking and the latter
> might trigger false sparse checks in case the pointer in concern is protected
> both by RCU and non-RCU methods. I believe this is why you removed sparse
> checking from rcu_dereference_raw as well:
>
> http://lists.infradead.org/pipermail/linux-afs/2016-July/001016.html
Good point!
> > > > That aside, if we are going to change the name of an API that is
> > > > used 160 places throughout the tree, we would need to have a pretty
> > > > good justification. Without such a justification, it will just look
> > > > like pointless churn to the various developers and maintainers on the
> > > > receiving end of the patches.
> > >
> > > Actually, the API name change is not something I want to do, it is Steven's
> > > suggestion. My suggestion is let us just delete _raw_notrace and just use the
> > > _raw API for tracing, since _raw doesn't do any tracing anyway. Steve pointed
> > > that _raw_notrace does sparse checking unlike _raw, but I think that isn't an
> > > issue since _raw doesn't do such checking at the moment anyway.. (if possible
> > > check my cover letter again for details/motivation of this series).
> >
> > Understood, but regardless of who suggested it, if we are to go through
> > with it, good justification will be required. ;-)
>
> Ok ;-). About the names of the APIs, I thought of leaving rcu_dereference_raw
> and its callers intact, and just rename:
>
> * hlist_for_each_entry_rcu_notrace
> * rcu_dereference_raw_notrace
>
> to:
> * hlist_for_each_entry_rcu_sparse
> * rcu_dereference_raw_sparse
>
> The _sparse would stand for "sparse checking". However I am open to better
> names..
>
> Such renaming would avoid confusion and keep the fact about sparse checking
> less ambiguous.
Let's give people a few days to propose different names, and if nothing
compelling, those names look good. There are not very many of them, so
the penalty for having to rename is quite low.
Thanx, Paul
* Re: [PATCH v10 09/12] ima: Implement support for module-style appended signatures
From: Mimi Zohar @ 2019-05-28 20:06 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: Herbert Xu, linux-doc, Dmitry Kasatkin, David S. Miller,
Jonathan Corbet, linux-kernel, James Morris, David Howells,
AKASHI, Takahiro, linux-security-module, keyrings, linux-crypto,
Jessica Yu, linux-integrity, linuxppc-dev, David Woodhouse,
Serge E. Hallyn
In-Reply-To: <87zhn65qor.fsf@morokweng.localdomain>
On Tue, 2019-05-28 at 16:23 -0300, Thiago Jung Bauermann wrote:
> Mimi Zohar <zohar@linux.ibm.com> writes:
>
> > Hi Thiago,
> >
> >> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
> >> index fca7a3f23321..a7a20a8c15c1 100644
> >> --- a/security/integrity/ima/ima_policy.c
> >> +++ b/security/integrity/ima/ima_policy.c
> >> @@ -1144,6 +1144,12 @@ void ima_delete_rules(void)
> >> }
> >> }
> >>
> >> +#define __ima_hook_stringify(str) (#str),
> >> +
> >> +const char *const func_tokens[] = {
> >> + __ima_hooks(__ima_hook_stringify)
> >> +};
> >> +
> >> #ifdef CONFIG_IMA_READ_POLICY
> >> enum {
> >> mask_exec = 0, mask_write, mask_read, mask_append
> >> @@ -1156,12 +1162,6 @@ static const char *const mask_tokens[] = {
> >> "MAY_APPEND"
> >> };
> >>
> >> -#define __ima_hook_stringify(str) (#str),
> >> -
> >> -static const char *const func_tokens[] = {
> >> - __ima_hooks(__ima_hook_stringify)
> >> -};
> >> -
> >> void *ima_policy_start(struct seq_file *m, loff_t *pos)
> >> {
> >> loff_t l = *pos;
> >
> > Is moving this something left over from previous versions or there is
> > a need for this change?
>
> Well, it's not a strong need, but it's still relevant in the current
> version. I use func_tokens in ima_read_modsig() in order to be able to
> mention the hook name in mod_check_sig()'s error message:
>
> In ima_read_modsig():
>
> rc = mod_check_sig(sig, buf_len, func_tokens[func]);
>
> And in mod_check_sig():
>
> pr_err("%s: Module is not signed with expected PKCS#7 message\n",
> name);
>
> If you think it's not worth it to expose func_tokens, I can make
> ima_read_modsig() pass a more generic const string such as "IMA modsig"
> for example.
This is fine. I somehow missed that func_tokens[] was moved outside of
the ifdef in order to make it independent of "CONFIG_IMA_READ_POLICY".
thanks,
Mimi
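The func_tokens[] table discussed above is built with the X-macro pattern: one list macro is expanded twice, once into enum constants and once into matching strings. A minimal sketch follows, assuming a hypothetical three-entry hook list (`demo_hooks` and the `HOOK_*` names are illustrative, not the real __ima_hooks() list).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical hook list; the real __ima_hooks() list differs. */
#define demo_hooks(hook)	\
	hook(HOOK_NONE)		\
	hook(HOOK_FILE_CHECK)	\
	hook(HOOK_MODULE_CHECK)

#define demo_hook_enumify(ENUM)		ENUM,
#define demo_hook_stringify(str)	(#str),

/* First expansion: the enum constants, plus a trailing count. */
enum demo_hook {
	demo_hooks(demo_hook_enumify)
	DEMO_HOOK_MAX
};

/* Second expansion: a string table indexed by the same constants. */
static const char *const demo_func_tokens[] = {
	demo_hooks(demo_hook_stringify)
};
```

Because both expansions come from the same list, `demo_func_tokens[HOOK_MODULE_CHECK]` is always the string "HOOK_MODULE_CHECK", which is what makes the table usable in an error message such as the one in mod_check_sig().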
* [PATCH] powerpc/pseries: avoid blocking in irq when queuing hotplug events
From: Nathan Lynch @ 2019-05-28 23:28 UTC (permalink / raw)
To: linuxppc-dev
A couple of bugs in queue_hotplug_event():
1. Unchecked kmalloc result which could lead to an oops.
2. Use of GFP_KERNEL allocations in interrupt context (this code's
only caller is ras_hotplug_interrupt()).
Use kmemdup to avoid open-coding the allocation+copy and check for
failure; use GFP_ATOMIC for both allocations.
Ultimately it probably would be better to avoid or reduce allocations
in this path if possible.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
Found by inspection, built but not runtime-tested.
arch/powerpc/platforms/pseries/dlpar.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 17958043e7f7..d70f9b925378 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -386,11 +386,11 @@ void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog)
struct pseries_hp_work *work;
struct pseries_hp_errorlog *hp_errlog_copy;
- hp_errlog_copy = kmalloc(sizeof(struct pseries_hp_errorlog),
- GFP_KERNEL);
- memcpy(hp_errlog_copy, hp_errlog, sizeof(struct pseries_hp_errorlog));
+ hp_errlog_copy = kmemdup(hp_errlog, sizeof(*hp_errlog), GFP_ATOMIC);
+ if (!hp_errlog_copy)
+ return;
- work = kmalloc(sizeof(struct pseries_hp_work), GFP_KERNEL);
+ work = kmalloc(sizeof(struct pseries_hp_work), GFP_ATOMIC);
if (work) {
INIT_WORK((struct work_struct *)work, pseries_hp_work_fn);
work->errlog = hp_errlog_copy;
--
2.20.1
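The allocate-copy-check shape used by the patch can be sketched in user space. This is an illustration only: `memdup_sketch()`, `queue_event_sketch()`, `struct event` and `struct work` are hypothetical, malloc() stands in for the kernel allocator, and the GFP_KERNEL versus GFP_ATOMIC distinction has no user-space equivalent here. The sketch also frees the copy when the second allocation fails, which is a choice made for this example rather than something shown in the hunk above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* User-space analogue of kmemdup(): allocate and copy, or return NULL. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *copy = malloc(len);

	if (copy)
		memcpy(copy, src, len);
	return copy;
}

struct event {			/* hypothetical payload */
	int resource;
	int action;
};

struct work {
	struct event *errlog;
};

/* Returns 0 on success, -1 if either allocation fails; nothing leaks. */
static int queue_event_sketch(const struct event *ev, struct work **out)
{
	struct event *copy = memdup_sketch(ev, sizeof(*ev));
	struct work *w;

	if (!copy)
		return -1;

	w = malloc(sizeof(*w));
	if (!w) {
		free(copy);
		return -1;
	}
	w->errlog = copy;
	*out = w;
	return 0;
}
```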
* Re: [PATCH] powerpc/configs: Rename foo_basic_defconfig to foo_base.config
From: Masahiro Yamada @ 2019-05-29 1:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Linux Kbuild mailing list, linuxppc-dev
In-Reply-To: <20190528121009.GA11901@infradead.org>
On Tue, May 28, 2019 at 9:10 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, May 28, 2019 at 06:16:14PM +1000, Michael Ellerman wrote:
> > We have several "defconfigs" that are not actually full defconfigs
> > they are just a base set of options which are then merged with other
> > fragments to produce a working defconfig.
The default values from Kconfig files are used
where CONFIG options are not specified by the defconfig.
So, I think corenet_basic_defconfig is a full defconfig
even if it contains a single CONFIG option.
Since the difference between "*_defconfig" and "*.config"
is ambiguous in some cases, it depends on the intended usage.
> > The most obvious example is corenet_basic_defconfig which only
> > contains one symbol CONFIG_CORENET_GENERIC=y. But there is also
> > mpc85xx_base_defconfig which doesn't actually enable CONFIG_PPC_85xx.
> >
> > To avoid confusion, rename these config fragments to "foo_base.config"
> > to make it clearer that they are not full defconfigs.
>
> Adding linux-kbuild, maybe we can make the handling of these fragments
> generic and actually document it..
I do not know how it should be documented.
> >
> > Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> > ---
> > arch/powerpc/Makefile | 12 ++++++------
> > .../{corenet_basic_defconfig => corenet_base.config} | 0
> > .../{mpc85xx_basic_defconfig => mpc85xx_base.config} | 0
> > .../{mpc86xx_basic_defconfig => mpc86xx_base.config} | 0
> > 4 files changed, 6 insertions(+), 6 deletions(-)
> > rename arch/powerpc/configs/{corenet_basic_defconfig => corenet_base.config} (100%)
> > rename arch/powerpc/configs/{mpc85xx_basic_defconfig => mpc85xx_base.config} (100%)
> > rename arch/powerpc/configs/{mpc86xx_basic_defconfig => mpc86xx_base.config} (100%)
> >
> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> > index c345b79414a9..94f735db2229 100644
> > --- a/arch/powerpc/Makefile
> > +++ b/arch/powerpc/Makefile
> > @@ -333,32 +333,32 @@ PHONY += powernv_be_defconfig
> >
> > PHONY += mpc85xx_defconfig
> > mpc85xx_defconfig:
> > - $(call merge_into_defconfig,mpc85xx_basic_defconfig,\
> > + $(call merge_into_defconfig,mpc85xx_base.config,\
> > 85xx-32bit 85xx-hw fsl-emb-nonhw)
> >
> > PHONY += mpc85xx_smp_defconfig
> > mpc85xx_smp_defconfig:
> > - $(call merge_into_defconfig,mpc85xx_basic_defconfig,\
> > + $(call merge_into_defconfig,mpc85xx_base.config,\
> > 85xx-32bit 85xx-smp 85xx-hw fsl-emb-nonhw)
> >
> > PHONY += corenet32_smp_defconfig
> > corenet32_smp_defconfig:
> > - $(call merge_into_defconfig,corenet_basic_defconfig,\
> > + $(call merge_into_defconfig,corenet_base.config,\
> > 85xx-32bit 85xx-smp 85xx-hw fsl-emb-nonhw dpaa)
> >
> > PHONY += corenet64_smp_defconfig
> > corenet64_smp_defconfig:
> > - $(call merge_into_defconfig,corenet_basic_defconfig,\
> > + $(call merge_into_defconfig,corenet_base.config,\
> > 85xx-64bit 85xx-smp altivec 85xx-hw fsl-emb-nonhw dpaa)
> >
> > PHONY += mpc86xx_defconfig
> > mpc86xx_defconfig:
> > - $(call merge_into_defconfig,mpc86xx_basic_defconfig,\
> > + $(call merge_into_defconfig,mpc86xx_base.config,\
> > 86xx-hw fsl-emb-nonhw)
> >
> > PHONY += mpc86xx_smp_defconfig
> > mpc86xx_smp_defconfig:
> > - $(call merge_into_defconfig,mpc86xx_basic_defconfig,\
> > + $(call merge_into_defconfig,mpc86xx_base.config,\
> > 86xx-smp 86xx-hw fsl-emb-nonhw)
> >
> > PHONY += ppc32_allmodconfig
> > diff --git a/arch/powerpc/configs/corenet_basic_defconfig b/arch/powerpc/configs/corenet_base.config
> > similarity index 100%
> > rename from arch/powerpc/configs/corenet_basic_defconfig
> > rename to arch/powerpc/configs/corenet_base.config
> > diff --git a/arch/powerpc/configs/mpc85xx_basic_defconfig b/arch/powerpc/configs/mpc85xx_base.config
> > similarity index 100%
> > rename from arch/powerpc/configs/mpc85xx_basic_defconfig
> > rename to arch/powerpc/configs/mpc85xx_base.config
> > diff --git a/arch/powerpc/configs/mpc86xx_basic_defconfig b/arch/powerpc/configs/mpc86xx_base.config
> > similarity index 100%
> > rename from arch/powerpc/configs/mpc86xx_basic_defconfig
> > rename to arch/powerpc/configs/mpc86xx_base.config
> > --
> > 2.20.1
> >
> ---end quoted text---
--
Best Regards
Masahiro Yamada
* [PATCH v4 2/2] powerpc: Fix compile issue with force DAWR
From: Michael Neuling @ 2019-05-29 2:01 UTC (permalink / raw)
To: mpe; +Cc: Mathieu Malaterre, mikey, linuxppc-dev
In-Reply-To: <20190529020115.14201-1-mikey@neuling.org>
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
at linking with:
arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable'
This was caused by commit c1fe190c0672 ("powerpc: Add force enable of
DAWR on P9 option").
This moves a bunch of code around to fix this. It moves a lot of the
DAWR code into a new file and creates a new CONFIG_PPC_DAWR to enable
compiling it.
Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Neuling <mikey@neuling.org>
--
v4:
- Fix merge conflict with patch from Mathieu Malaterre:
powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool
- Fixed checkpatch issues noticed by Christophe Leroy.
v3:
Fixes based on Christophe Leroy's comments:
- Fix Kconfig options to better reflect reality
- Reorder alphabetically
- Inline vs #define
- Fixed default return for dawr_enabled() when CONFIG_PPC_DAWR=N
V2:
Fixes based on Christophe Leroy's comments:
- Fix commit message formatting
- Move more DAWR code into dawr.c
---
arch/powerpc/Kconfig | 5 ++
arch/powerpc/include/asm/hw_breakpoint.h | 21 +++--
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/kernel/dawr.c | 100 +++++++++++++++++++++++
arch/powerpc/kernel/hw_breakpoint.c | 61 --------------
arch/powerpc/kernel/process.c | 28 -------
arch/powerpc/kvm/Kconfig | 1 +
7 files changed, 121 insertions(+), 96 deletions(-)
create mode 100644 arch/powerpc/kernel/dawr.c
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308..87a3ce4e92 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -234,6 +234,7 @@ config PPC
select OLD_SIGSUSPEND
select PCI_DOMAINS if PCI
select PCI_SYSCALL if PCI
+ select PPC_DAWR if PPC64
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
@@ -370,6 +371,10 @@ config PPC_ADV_DEBUG_DAC_RANGE
depends on PPC_ADV_DEBUG_REGS && 44x
default y
+config PPC_DAWR
+ bool
+ default n
+
config ZONE_DMA
bool
default y if PPC_BOOK3E_64
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index 0fe8c1e46b..41abdae6d0 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -90,18 +90,25 @@ static inline void hw_breakpoint_disable(void)
extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs);
int hw_breakpoint_handler(struct die_args *args);
-extern int set_dawr(struct arch_hw_breakpoint *brk);
+#else /* CONFIG_HAVE_HW_BREAKPOINT */
+static inline void hw_breakpoint_disable(void) { }
+static inline void thread_change_pc(struct task_struct *tsk,
+ struct pt_regs *regs) { }
+
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+
+#ifdef CONFIG_PPC_DAWR
extern bool dawr_force_enable;
static inline bool dawr_enabled(void)
{
return dawr_force_enable;
}
-
-#else /* CONFIG_HAVE_HW_BREAKPOINT */
-static inline void hw_breakpoint_disable(void) { }
-static inline void thread_change_pc(struct task_struct *tsk,
- struct pt_regs *regs) { }
+int set_dawr(struct arch_hw_breakpoint *brk);
+#else
static inline bool dawr_enabled(void) { return false; }
-#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+static inline int set_dawr(struct arch_hw_breakpoint *brk) { return -1; }
+#endif
+
#endif /* __KERNEL__ */
#endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a..56dfa7a2a6 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \
obj-$(CONFIG_VDSO32) += vdso32/
obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
+obj-$(CONFIG_PPC_DAWR) += dawr.o
obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o
obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o
obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
new file mode 100644
index 0000000000..c8b3fb610c
--- /dev/null
+++ b/arch/powerpc/kernel/dawr.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// DAWR infrastructure
+//
+// Copyright 2019, Michael Neuling, IBM Corporation.
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <asm/debugfs.h>
+#include <asm/machdep.h>
+#include <asm/hvcall.h>
+
+bool dawr_force_enable;
+EXPORT_SYMBOL_GPL(dawr_force_enable);
+
+int set_dawr(struct arch_hw_breakpoint *brk)
+{
+ unsigned long dawr, dawrx, mrd;
+
+ dawr = brk->address;
+
+ dawrx = (brk->type & (HW_BRK_TYPE_READ | HW_BRK_TYPE_WRITE))
+ << (63 - 58);
+ dawrx |= ((brk->type & (HW_BRK_TYPE_TRANSLATE)) >> 2) << (63 - 59);
+ dawrx |= (brk->type & (HW_BRK_TYPE_PRIV_ALL)) >> 3;
+ /* dawr length is stored in field MDR bits 48:53. Matches range in
+ * doublewords (64 bits) baised by -1 eg. 0b000000=1DW and
+ * 0b111111=64DW.
+ * brk->len is in bytes.
+ * This aligns up to double word size, shifts and does the bias.
+ */
+ mrd = ((brk->len + 7) >> 3) - 1;
+ dawrx |= (mrd & 0x3f) << (63 - 53);
+
+ if (ppc_md.set_dawr)
+ return ppc_md.set_dawr(dawr, dawrx);
+ mtspr(SPRN_DAWR, dawr);
+ mtspr(SPRN_DAWRX, dawrx);
+ return 0;
+}
+
+static void set_dawr_cb(void *info)
+{
+ set_dawr(info);
+}
+
+static ssize_t dawr_write_file_bool(struct file *file,
+ const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct arch_hw_breakpoint null_brk = {0, 0, 0};
+ size_t rc;
+
+ /* Send error to user if they hypervisor won't allow us to write DAWR */
+ if ((!dawr_force_enable) &&
+ (firmware_has_feature(FW_FEATURE_LPAR)) &&
+ (set_dawr(&null_brk) != H_SUCCESS))
+ return -1;
+
+ rc = debugfs_write_file_bool(file, user_buf, count, ppos);
+ if (rc)
+ return rc;
+
+ /* If we are clearing, make sure all CPUs have the DAWR cleared */
+ if (!dawr_force_enable)
+ smp_call_function(set_dawr_cb, &null_brk, 0);
+
+ return rc;
+}
+
+static const struct file_operations dawr_enable_fops = {
+ .read = debugfs_read_file_bool,
+ .write = dawr_write_file_bool,
+ .open = simple_open,
+ .llseek = default_llseek,
+};
+
+static int __init dawr_force_setup(void)
+{
+ dawr_force_enable = false;
+
+ if (cpu_has_feature(CPU_FTR_DAWR)) {
+ /* Don't setup sysfs file for user control on P8 */
+ dawr_force_enable = true;
+ return 0;
+ }
+
+ if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9) {
+ /* Turn DAWR off by default, but allow admin to turn it on */
+ dawr_force_enable = false;
+ debugfs_create_file_unsafe("dawr_enable_dangerous", 0600,
+ powerpc_debugfs_root,
+ &dawr_force_enable,
+ &dawr_enable_fops);
+ }
+ return 0;
+}
+arch_initcall(dawr_force_setup);
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index ca3a2358b7..95605a9c9a 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -380,64 +380,3 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
{
/* TODO */
}
-
-bool dawr_force_enable;
-EXPORT_SYMBOL_GPL(dawr_force_enable);
-
-static void set_dawr_cb(void *info)
-{
- set_dawr(info);
-}
-
-static ssize_t dawr_write_file_bool(struct file *file,
- const char __user *user_buf,
- size_t count, loff_t *ppos)
-{
- struct arch_hw_breakpoint null_brk = {0, 0, 0};
- size_t rc;
-
- /* Send error to user if they hypervisor won't allow us to write DAWR */
- if ((!dawr_force_enable) &&
- (firmware_has_feature(FW_FEATURE_LPAR)) &&
- (set_dawr(&null_brk) != H_SUCCESS))
- return -1;
-
- rc = debugfs_write_file_bool(file, user_buf, count, ppos);
- if (rc)
- return rc;
-
- /* If we are clearing, make sure all CPUs have the DAWR cleared */
- if (!dawr_force_enable)
- smp_call_function(set_dawr_cb, &null_brk, 0);
-
- return rc;
-}
-
-static const struct file_operations dawr_enable_fops = {
- .read = debugfs_read_file_bool,
- .write = dawr_write_file_bool,
- .open = simple_open,
- .llseek = default_llseek,
-};
-
-static int __init dawr_force_setup(void)
-{
- dawr_force_enable = false;
-
- if (cpu_has_feature(CPU_FTR_DAWR)) {
- /* Don't setup sysfs file for user control on P8 */
- dawr_force_enable = true;
- return 0;
- }
-
- if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9) {
- /* Turn DAWR off by default, but allow admin to turn it on */
- dawr_force_enable = false;
- debugfs_create_file_unsafe("dawr_enable_dangerous", 0600,
- powerpc_debugfs_root,
- &dawr_force_enable,
- &dawr_enable_fops);
- }
- return 0;
-}
-arch_initcall(dawr_force_setup);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 87da401299..03a2da35ce 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -797,34 +797,6 @@ static inline int set_dabr(struct arch_hw_breakpoint *brk)
return __set_dabr(dabr, dabrx);
}
-int set_dawr(struct arch_hw_breakpoint *brk)
-{
- unsigned long dawr, dawrx, mrd;
-
- dawr = brk->address;
-
- dawrx = (brk->type & (HW_BRK_TYPE_READ | HW_BRK_TYPE_WRITE)) \
- << (63 - 58); //* read/write bits */
- dawrx |= ((brk->type & (HW_BRK_TYPE_TRANSLATE)) >> 2) \
- << (63 - 59); //* translate */
- dawrx |= (brk->type & (HW_BRK_TYPE_PRIV_ALL)) \
- >> 3; //* PRIM bits */
- /* dawr length is stored in field MDR bits 48:53. Matches range in
- doublewords (64 bits) baised by -1 eg. 0b000000=1DW and
- 0b111111=64DW.
- brk->len is in bytes.
- This aligns up to double word size, shifts and does the bias.
- */
- mrd = ((brk->len + 7) >> 3) - 1;
- dawrx |= (mrd & 0x3f) << (63 - 53);
-
- if (ppc_md.set_dawr)
- return ppc_md.set_dawr(dawr, dawrx);
- mtspr(SPRN_DAWR, dawr);
- mtspr(SPRN_DAWRX, dawrx);
- return 0;
-}
-
void __set_breakpoint(struct arch_hw_breakpoint *brk)
{
 memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index f53997a8ca..b8e13d5a4a 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -38,6 +38,7 @@ config KVM_BOOK3S_32_HANDLER
config KVM_BOOK3S_64_HANDLER
bool
select KVM_BOOK3S_HANDLER
+ select PPC_DAWR
config KVM_BOOK3S_PR_POSSIBLE
bool
--
2.21.0
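The length encoding in set_dawr() is easier to follow with numbers. The sketch below isolates just that arithmetic; `dawr_len_to_mrd()` and `dawrx_len_field()` are hypothetical helper names, and the 1 to 512 byte range is inferred from the 6-bit field described in the comment in the patch.

```c
#include <assert.h>

/*
 * Length in bytes -> 6-bit doubleword count biased by -1
 * (0 means 1 doubleword, 63 means 64 doublewords).
 */
static unsigned long dawr_len_to_mrd(unsigned long len_bytes)
{
	assert(len_bytes >= 1 && len_bytes <= 512);
	/* Round up to whole 8-byte doublewords, then apply the -1 bias. */
	return ((len_bytes + 7) >> 3) - 1;
}

/* Place the field where the patch does: shifted left by (63 - 53) bits. */
static unsigned long dawrx_len_field(unsigned long len_bytes)
{
	return (dawr_len_to_mrd(len_bytes) & 0x3f) << (63 - 53);
}
```

For example, lengths of 1 and 8 bytes both encode as 0 (one doubleword), 9 bytes encodes as 1, and 512 bytes encodes as 63.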
* [PATCH v4 1/2] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool
From: Michael Neuling @ 2019-05-29 2:01 UTC (permalink / raw)
To: mpe; +Cc: Mathieu Malaterre, mikey, linuxppc-dev
From: Mathieu Malaterre <malat@debian.org>
In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9
option") the following piece of code was added:
smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);
Since GCC 8 this triggers the following warning about incompatible
function types:
arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void *)' [-Werror=cast-function-type]
Since the warning is there for a reason, and should not be hidden behind
a cast, provide an intermediate callback function to avoid the warning.
Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/hw_breakpoint.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index da307dd93e..ca3a2358b7 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -384,6 +384,11 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
bool dawr_force_enable;
EXPORT_SYMBOL_GPL(dawr_force_enable);
+static void set_dawr_cb(void *info)
+{
+ set_dawr(info);
+}
+
static ssize_t dawr_write_file_bool(struct file *file,
const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -403,7 +408,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
/* If we are clearing, make sure all CPUs have the DAWR cleared */
if (!dawr_force_enable)
- smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);
+ smp_call_function(set_dawr_cb, &null_brk, 0);
return rc;
}
--
2.21.0
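The adapter approach taken by this patch can be shown outside the kernel. In this sketch `struct brk`, `set_reg()`, `set_reg_cb()` and `call_function_sketch()` are hypothetical stand-ins. Only the callback shape, `void (*)(void *)`, is taken from the warning text quoted above.

```c
#include <assert.h>

struct brk {				/* hypothetical stand-in */
	unsigned long address;
};

typedef void (*call_func_t)(void *info);	/* shape from the warning text */

static int calls;

/* The real worker returns int, so its type does not match call_func_t. */
static int set_reg(struct brk *b)
{
	calls++;
	return b->address ? 1 : 0;
}

/* Adapter with the exact callback signature; the return value is dropped. */
static void set_reg_cb(void *info)
{
	set_reg(info);
}

/* Stand-in for smp_call_function(): here it just invokes the callback once. */
static void call_function_sketch(call_func_t func, void *info)
{
	func(info);
}
```

Calling through a pointer of the wrong function type is undefined behaviour in C, so the adapter is more than a way to quiet the compiler.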
* Re: [PATCH v4 2/2] powerpc: Fix compile issue with force DAWR
From: Christoph Hellwig @ 2019-05-29 6:28 UTC (permalink / raw)
To: Michael Neuling; +Cc: Mathieu Malaterre, linuxppc-dev
In-Reply-To: <20190529020115.14201-2-mikey@neuling.org>
> +config PPC_DAWR
> + bool
> + default n
"default n" is the default default. No need to write this line.
> +++ b/arch/powerpc/kernel/dawr.c
> @@ -0,0 +1,100 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +//
> +// DAWR infrastructure
> +//
> +// Copyright 2019, Michael Neuling, IBM Corporation.
Normal top of file header should be /* */, //-style comments are only
for the actual SPDX header line.
> + /* Send error to user if they hypervisor won't allow us to write DAWR */
> + if ((!dawr_force_enable) &&
> + (firmware_has_feature(FW_FEATURE_LPAR)) &&
> + (set_dawr(&null_brk) != H_SUCCESS))
None of the three inner sets of parentheses here is required, and the
code becomes much easier to read without them.
> + return -1;
What about returning a proper error code?
> +static int __init dawr_force_setup(void)
> +{
> + dawr_force_enable = false;
This variable is already initialized to false by default, so this line
is not required.
> + if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9) {
> + /* Turn DAWR off by default, but allow admin to turn it on */
> + dawr_force_enable = false;
.. and neither is this one.
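Taken together, the comments above might lead to something like the following sketch. Everything here is an assumption for illustration: `force_enable`, `running_as_guest`, `hcall_result` and `write_check_sketch()` are hypothetical stand-ins, and -ENODEV is only one plausible errno, since no specific code is named in this review.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* File-scope objects start out zero, so no explicit "= false" is needed. */
static bool force_enable;

/* Hypothetical stand-ins for firmware_has_feature() and set_dawr(). */
static bool running_as_guest;
static int hcall_result;	/* 0 plays the role of H_SUCCESS */

static int write_check_sketch(void)
{
	/* Same three-way test as the patch, without the inner parentheses. */
	if (!force_enable && running_as_guest && hcall_result != 0)
		return -ENODEV;	/* assumed errno, not taken from the thread */
	return 0;
}
```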
* Re: [PATCH v4 1/3] powerpc: Fix vDSO clock_getres()
From: Sasha Levin @ 2019-05-29 13:14 UTC (permalink / raw)
To: Sasha Levin, Vincenzo Frascino, linux-arch, linuxppc-dev
Cc: stable, Paul Mackerras, vincenzo.frascino
In-Reply-To: <20190523112116.19233-2-vincenzo.frascino@arm.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: a7f290dad32ee [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel.
The bot has tested the following trees: v5.1.4, v5.0.18, v4.19.45, v4.14.121, v4.9.178, v4.4.180, v3.18.140.
v5.1.4: Build OK!
v5.0.18: Build OK!
v4.19.45: Build OK!
v4.14.121: Failed to apply! Possible dependencies:
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
v4.9.178: Failed to apply! Possible dependencies:
4546561551106 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
5d451a87e5ebb ("powerpc/64: Retrieve number of L1 cache sets from device-tree")
7c5b06cadf274 ("KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9")
83677f551e0a6 ("KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9")
902e06eb86cd6 ("powerpc/32: Change the stack protector canary value per task")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
bd067f83b0840 ("powerpc/64: Fix naming of cache block vs. cache line")
e2827fe5c1566 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
e9cf1e085647b ("KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs")
f4c51f841d2ac ("KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests")
v4.4.180: Failed to apply! Possible dependencies:
153086644fd1f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
3eb5d5888dc68 ("powerpc: Add ppc_strict_facility_enable boot option")
4546561551106 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
579e633e764e6 ("powerpc: create flush_all_to_thread()")
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
70fe3d980f5f1 ("powerpc: Restore FPU/VEC/VSX if previously used")
85baa095497f3 ("powerpc/livepatch: Add live patching support on ppc64le")
902e06eb86cd6 ("powerpc/32: Change the stack protector canary value per task")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
bf76f73c5f655 ("powerpc: enable UBSAN support")
c208505900b23 ("powerpc: create giveup_all()")
d1e1cf2e38def ("powerpc: clean up asm/switch_to.h")
dc4fbba11e466 ("powerpc: Create disable_kernel_{fp,altivec,vsx,spe}()")
f17c4e01e906c ("powerpc/module: Mark module stubs with a magic value")
v3.18.140: Failed to apply! Possible dependencies:
10239733ee861 ("powerpc: Remove bootmem allocator")
2449acc5348b9 ("powerpc/kernel: Enable seccomp filter")
4546561551106 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
49e4e15619cd7 ("tile: support CONTEXT_TRACKING and thus NOHZ_FULL")
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
73569d87e2cc5 ("MIPS: OCTEON: Enable little endian kernel.")
817820b0226a1 ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask")
83fe27ea53116 ("rcu: Make SRCU optional by using CONFIG_SRCU")
85baa095497f3 ("powerpc/livepatch: Add live patching support on ppc64le")
b01aec9b2c7d3 ("EDAC: Cleanup atomic_scrub mess")
b30e759072c18 ("powerpc/mm: Switch to generic RCU get_user_pages_fast")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
bf76f73c5f655 ("powerpc: enable UBSAN support")
c54b2bf1b5e99 ("powerpc: Add ppc64 hard lockup detector support")
f30c59e921f12 ("mm: Update generic gup implementation to handle hugepage directory")
f47436734dc89 ("tile: Use the more common pr_warn instead of pr_warning")
How should we proceed with this patch?
--
Thanks,
Sasha
^ permalink raw reply
* Re: [PATCH v3 1/3] powerpc: Fix vDSO clock_getres()
From: Sasha Levin @ 2019-05-29 13:14 UTC (permalink / raw)
To: Sasha Levin, Vincenzo Frascino, linux-arch, linuxppc-dev
Cc: stable, Paul Mackerras
In-Reply-To: <20190522110722.28094-2-vincenzo.frascino@arm.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: a7f290dad32ee [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel.
The bot has tested the following trees: v5.1.4, v5.0.18, v4.19.45, v4.14.121, v4.9.178, v4.4.180, v3.18.140.
v5.1.4: Build OK!
v5.0.18: Build OK!
v4.19.45: Build OK!
v4.14.121: Failed to apply! Possible dependencies:
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
v4.9.178: Failed to apply! Possible dependencies:
4546561551106 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
5d451a87e5ebb ("powerpc/64: Retrieve number of L1 cache sets from device-tree")
7c5b06cadf274 ("KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9")
83677f551e0a6 ("KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9")
902e06eb86cd6 ("powerpc/32: Change the stack protector canary value per task")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
bd067f83b0840 ("powerpc/64: Fix naming of cache block vs. cache line")
e2827fe5c1566 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
e9cf1e085647b ("KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs")
f4c51f841d2ac ("KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests")
v4.4.180: Failed to apply! Possible dependencies:
153086644fd1f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
3eb5d5888dc68 ("powerpc: Add ppc_strict_facility_enable boot option")
4546561551106 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
579e633e764e6 ("powerpc: create flush_all_to_thread()")
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
70fe3d980f5f1 ("powerpc: Restore FPU/VEC/VSX if previously used")
85baa095497f3 ("powerpc/livepatch: Add live patching support on ppc64le")
902e06eb86cd6 ("powerpc/32: Change the stack protector canary value per task")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
bf76f73c5f655 ("powerpc: enable UBSAN support")
c208505900b23 ("powerpc: create giveup_all()")
d1e1cf2e38def ("powerpc: clean up asm/switch_to.h")
dc4fbba11e466 ("powerpc: Create disable_kernel_{fp,altivec,vsx,spe}()")
f17c4e01e906c ("powerpc/module: Mark module stubs with a magic value")
v3.18.140: Failed to apply! Possible dependencies:
10239733ee861 ("powerpc: Remove bootmem allocator")
2449acc5348b9 ("powerpc/kernel: Enable seccomp filter")
4546561551106 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
49e4e15619cd7 ("tile: support CONTEXT_TRACKING and thus NOHZ_FULL")
5c929885f1bb4 ("powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
73569d87e2cc5 ("MIPS: OCTEON: Enable little endian kernel.")
817820b0226a1 ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask")
83fe27ea53116 ("rcu: Make SRCU optional by using CONFIG_SRCU")
85baa095497f3 ("powerpc/livepatch: Add live patching support on ppc64le")
b01aec9b2c7d3 ("EDAC: Cleanup atomic_scrub mess")
b30e759072c18 ("powerpc/mm: Switch to generic RCU get_user_pages_fast")
b5b4453e7912f ("powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038")
bf76f73c5f655 ("powerpc: enable UBSAN support")
c54b2bf1b5e99 ("powerpc: Add ppc64 hard lockup detector support")
f30c59e921f12 ("mm: Update generic gup implementation to handle hugepage directory")
f47436734dc89 ("tile: Use the more common pr_warn instead of pr_warning")
How should we proceed with this patch?
--
Thanks,
Sasha
^ permalink raw reply
* [PATCH] powerpc/64s: Fix misleading SPR and timebase information
From: Shaokun Zhang @ 2019-05-29 9:21 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Shaokun Zhang, Nicholas Piggin
pr_info shows SPR and timebase as a decimal value with a '0x'
prefix, which is somewhat misleading.
Fix it to print hexadecimal, as was intended.
Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
arch/powerpc/platforms/powernv/idle.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index c9133f7908ca..77f2e0a4ee37 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -1159,10 +1159,10 @@ static void __init pnv_power9_idle_init(void)
pnv_deepest_stop_psscr_mask);
}
- pr_info("cpuidle-powernv: First stop level that may lose SPRs = 0x%lld\n",
+ pr_info("cpuidle-powernv: First stop level that may lose SPRs = 0x%llx\n",
pnv_first_spr_loss_level);
- pr_info("cpuidle-powernv: First stop level that may lose timebase = 0x%lld\n",
+ pr_info("cpuidle-powernv: First stop level that may lose timebase = 0x%llx\n",
pnv_first_tb_loss_level);
}
--
2.7.4
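As a small userspace illustration of why the old strings were misleading, the sketch below renders the same value with both format strings using snprintf. The helper names format_level and formats_agree are made up for this example. pr_info is not used, since it is kernel-only.

```c
#include <stdio.h>
#include <string.h>

/* Render a level with the old ("0x%lld") or the fixed ("0x%llx") format. */
static void format_level(char *buf, size_t len, unsigned long long level,
			 int fixed)
{
	if (fixed)
		snprintf(buf, len, "0x%llx", level);		/* hex digits after 0x */
	else
		snprintf(buf, len, "0x%lld", (long long)level);	/* decimal digits after 0x */
}

/* Return 1 when both format strings produce the same text for this value. */
static int formats_agree(unsigned long long level)
{
	char old_text[32], new_text[32];

	format_level(old_text, sizeof(old_text), level, 0);
	format_level(new_text, sizeof(new_text), level, 1);
	return strcmp(old_text, new_text) == 0;
}
```

The two forms only agree for values below 10. A level of 16 is printed as "0x16" by the old string and as "0x10" by the fixed one, so anyone reading the old output as hexadecimal would take it for 22.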
^ permalink raw reply related
* Re: [PATCH] powerpc/configs: Rename foo_basic_defconfig to foo_base.config
From: Michael Ellerman @ 2019-05-29 13:55 UTC (permalink / raw)
To: Masahiro Yamada, Christoph Hellwig
Cc: linuxppc-dev, Linux Kbuild mailing list
In-Reply-To: <CAK7LNAS3iTOeX5b2F7E9PeWqma1_hx7Tbrt2V=3fvrqhSk5Zug@mail.gmail.com>
Masahiro Yamada <yamada.masahiro@socionext.com> writes:
> On Tue, May 28, 2019 at 9:10 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Tue, May 28, 2019 at 06:16:14PM +1000, Michael Ellerman wrote:
>> > We have several "defconfigs" that are not actually full defconfigs
>> > they are just a base set of options which are then merged with other
>> > fragments to produce a working defconfig.
>
> The default values from Kconfig files are used
> where CONFIG options are not specified by the defconfig.
>
> So, I think corenet_basic_defconfig is a full defconfig
> even if it contains a single CONFIG option.
That's technically true, but it's not a full defconfig in the sense that
it doesn't define a meaningful set of options for building for a
specific machine. In fact if you build it you get a .config that doesn't
include the one option it defines, CONFIG_CORENET_GENERIC=y.
> Since the difference between "*_defconfig" and "*.config"
> is ambiguous in some cases, it depends on the intended usage.
I'm pretty sure all the existing foo.config files are fragments that are
intended to be merged with an existing .config or other fragments.
ie:
These are fragments:
arch/arm/configs/dram_0x00000000.config
arch/arm/configs/dram_0xc0000000.config
arch/arm/configs/dram_0xd0000000.config
These are all fragments:
arch/powerpc/configs/be.config
arch/powerpc/configs/book3s_32.config
arch/powerpc/configs/altivec.config
arch/powerpc/configs/85xx-hw.config
arch/powerpc/configs/guest.config
arch/powerpc/configs/85xx-smp.config
arch/powerpc/configs/85xx-64bit.config
arch/powerpc/configs/dpaa.config
arch/powerpc/configs/85xx-32bit.config
arch/powerpc/configs/fsl-emb-nonhw.config
arch/powerpc/configs/86xx-smp.config
arch/powerpc/configs/le.config
arch/powerpc/configs/86xx-hw.config
Pretty sure these all are, they're used in gen_generic_defconfigs in arch/mips/Makefile:
arch/mips/configs/generic/board-xilfpga.config
arch/mips/configs/generic/board-ocelot.config
arch/mips/configs/generic/board-ni169445.config
arch/mips/configs/generic/32r6.config
arch/mips/configs/generic/64r1.config
arch/mips/configs/generic/32r1.config
arch/mips/configs/generic/64r6.config
arch/mips/configs/generic/eb.config
arch/mips/configs/generic/micro32r2.config
arch/mips/configs/generic/32r2.config
arch/mips/configs/generic/board-boston.config
arch/mips/configs/generic/el.config
arch/mips/configs/generic/board-ranchu.config
arch/mips/configs/generic/64r2.config
arch/mips/configs/generic/board-sead-3.config
These are also both fragments:
arch/x86/configs/tiny.config
arch/x86/configs/xen.config
>> > The most obvious example is corenet_basic_defconfig which only
>> > contains one symbol CONFIG_CORENET_GENERIC=y. But there is also
>> > mpc85xx_base_defconfig which doesn't actually enable CONFIG_PPC_85xx.
>> >
>> > To avoid confusion, rename these config fragments to "foo_base.config"
>> > to make it clearer that they are not full defconfigs.
>>
>> Adding linux-kbuild, maybe we can make the handling of these fragments
>> generic and actually document it..
>
> I do not know how it should be documented.
Me either.
cheers
^ permalink raw reply
* Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
From: Bjorn Helgaas @ 2019-05-29 14:00 UTC (permalink / raw)
To: Oliver
Cc: Shawn Anastasio, Sam Bobroff, Linux Kernel Mailing List, rppt,
Paul Mackerras, linux-pci, xyjxie, linuxppc-dev
In-Reply-To: <CAOSf1CEFfbmwfvmdqT1xdt8SFb=tYdYXLfXeyZ8=iRnhg4a3Pg@mail.gmail.com>
On Tue, May 28, 2019 at 03:36:34PM +1000, Oliver wrote:
> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@anastas.io> wrote:
> >
> > Introduce a new pcibios function pcibios_ignore_alignment_request
> > which allows the PCI core to defer to platform-specific code to
> > determine whether or not to ignore alignment requests for PCI resources.
> >
> > The existing behavior is to simply ignore alignment requests when
> > PCI_PROBE_ONLY is set. This is behavior is maintained by the
> > default implementation of pcibios_ignore_alignment_request.
> >
> > Signed-off-by: Shawn Anastasio <shawn@anastas.io>
> > ---
> > drivers/pci/pci.c | 9 +++++++--
> > include/linux/pci.h | 1 +
> > 2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 8abc843b1615..8207a09085d1 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
> > return 0;
> > }
> >
> > +int __weak pcibios_ignore_alignment_request(void)
> > +{
> > + return pci_has_flag(PCI_PROBE_ONLY);
> > +}
> > +
> > #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
> > static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
> > static DEFINE_SPINLOCK(resource_alignment_lock);
> > @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
> > p = resource_alignment_param;
> > if (!*p && !align)
> > goto out;
> > - if (pci_has_flag(PCI_PROBE_ONLY)) {
> > + if (pcibios_ignore_alignment_request()) {
> > align = 0;
> > - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
> > + pr_info_once("PCI: Ignoring requested alignments\n");
> > goto out;
> > }
>
> I think the logic here is questionable to begin with. If the user has
> explicitly requested re-aligning a resource via the command line then
> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
> they get to keep the pieces.
I agree. I don't like PCI_PROBE_ONLY in the first place. It's a
sledgehammer approach that doesn't tell us which resource assignments
need to be preserved or why. I'd rather use IORESOURCE_PCI_FIXED and
set it for the BARs where there's actually some sort of
hypervisor/firmware/OS dependency.
If there's a way to avoid another pcibios_*() weak function, that
would also be better.
Bjorn
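To make the per-resource idea above concrete, here is a rough userspace model of it. It is a sketch only: struct bar, RES_FIXED and apply_requested_alignment are hypothetical names invented for this example, RES_FIXED merely stands in for IORESOURCE_PCI_FIXED, and none of this is the PCI core's actual code.

```c
#include <stddef.h>

/* Stand-in for IORESOURCE_PCI_FIXED; the value is arbitrary in this model. */
#define RES_FIXED 0x1UL

/* Hypothetical, heavily simplified model of one BAR. */
struct bar {
	unsigned long flags;
	unsigned long align;	/* current alignment in bytes */
};

/*
 * Apply a user-requested alignment to every BAR that is not marked fixed.
 * Returns the number of BARs whose alignment was raised.
 */
static size_t apply_requested_alignment(struct bar *bars, size_t n,
					unsigned long requested)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		if (bars[i].flags & RES_FIXED)
			continue;	/* something depends on this assignment */
		if (bars[i].align < requested) {
			bars[i].align = requested;
			changed++;
		}
	}

	return changed;
}
```

The point of the model is that the decision is made per BAR rather than by one global flag, so an explicit alignment request can still be honored for every resource that has no firmware or hypervisor dependency.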
^ permalink raw reply
* Re: [PATCH v3 3/3] powerpc/pseries: Allow user-specified PCI resource alignment after init
From: Bjorn Helgaas @ 2019-05-29 14:02 UTC (permalink / raw)
To: Shawn Anastasio
Cc: sbobroff, linux-pci, linux-kernel, rppt, paulus, xyjxie,
linuxppc-dev
In-Reply-To: <20190528040313.35582-4-shawn@anastas.io>
On Mon, May 27, 2019 at 11:03:13PM -0500, Shawn Anastasio wrote:
> On pseries, custom PCI resource alignment specified with the commandline
> argument pci=resource_alignment is disabled due to PCI resources being
> managed by the firmware. However, in the case of PCI hotplug the
> resources are managed by the kernel, so custom alignments should be
> honored in these cases. This is done by only honoring custom
> alignments after initial PCI initialization is done, to ensure that
> all devices managed by the firmware are excluded.
>
> Without this ability, sub-page BARs sometimes get mapped in between
> page boundaries for hotplugged devices and are therefore unusable
> with the VFIO framework. This change allows users to request
> page alignment for devices they wish to access via VFIO using
> the pci=resource_alignment commandline argument.
>
> In the future, this could be extended to provide page-aligned
> resources by default for hotplugged devices, similar to what is
> done on powernv by commit 382746376993 ("powerpc/powernv: Override
> pcibios_default_alignment() to force PCI devices to be page aligned")
>
> Signed-off-by: Shawn Anastasio <shawn@anastas.io>
> ---
> arch/powerpc/include/asm/machdep.h | 3 +++
> arch/powerpc/kernel/pci-common.c | 9 +++++++++
> arch/powerpc/platforms/pseries/setup.c | 22 ++++++++++++++++++++++
> 3 files changed, 34 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index 2fbfaa9176ed..46eb62c0954e 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -179,6 +179,9 @@ struct machdep_calls {
>
> resource_size_t (*pcibios_default_alignment)(void);
>
> + /* Called when determining PCI resource alignment */
> + int (*pcibios_ignore_alignment_request)(void);
> +
> #ifdef CONFIG_PCI_IOV
> void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
> resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index ff4b7539cbdf..8e0d73b4c188 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -238,6 +238,15 @@ resource_size_t pcibios_default_alignment(void)
> return 0;
> }
>
> +int pcibios_ignore_alignment_request(void)
> +{
> + if (ppc_md.pcibios_ignore_alignment_request)
> + return ppc_md.pcibios_ignore_alignment_request();
> +
> + /* Fall back to default method of checking PCI_PROBE_ONLY */
> + return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
> #ifdef CONFIG_PCI_IOV
> resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
> {
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index e4f0dfd4ae33..07f03be02afe 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -82,6 +82,8 @@ EXPORT_SYMBOL(CMO_PageSize);
>
> int fwnmi_active; /* TRUE if an FWNMI handler is present */
>
> +static int initial_pci_init_done; /* TRUE if initial pcibios init has completed */
> +
> static void pSeries_show_cpuinfo(struct seq_file *m)
> {
> struct device_node *root;
> @@ -749,6 +751,23 @@ static resource_size_t pseries_pci_iov_resource_alignment(struct pci_dev *pdev,
> }
> #endif
>
> +static void pseries_after_init(void)
> +{
> + initial_pci_init_done = 1;
> +}
> +
> +static int pseries_ignore_alignment_request(void)
> +{
> + if (initial_pci_init_done)
> + /*
> + * Allow custom alignments after init for things
> + * like PCI hotplugging.
> + */
> + return 0;
Hmm, if there's any way to avoid this sort of early/late flag, that
would be nicer.
> +
> + return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
> static void __init pSeries_setup_arch(void)
> {
> set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
> @@ -797,6 +816,9 @@ static void __init pSeries_setup_arch(void)
> }
>
> ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
> + ppc_md.pcibios_after_init = pseries_after_init;
> + ppc_md.pcibios_ignore_alignment_request =
> + pseries_ignore_alignment_request;
> }
>
> static void pseries_panic(char *str)
> --
> 2.20.1
>
^ permalink raw reply
* Re: [PATCH] powerpc/mm: Move some of the boot time info print to generic file
From: Christophe Leroy @ 2019-05-29 15:55 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, npiggin, paulus
In-Reply-To: <20190528053513.1966-1-aneesh.kumar@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
> With radix translation enabled we find in dmesg
>
> hash-mmu: ppc64_pft_size = 0x0
> hash-mmu: kernel vmalloc start = 0xc008000000000000
> hash-mmu: kernel IO start = 0xc00a000000000000
> hash-mmu: kernel vmemmap start = 0xc00c000000000000
>
> This is because these pr_info calls are in hash_utils.c which has
>
> #define pr_fmt(fmt) "hash-mmu: " fmt
>
> The information printed in generic and hence move that to generic file
Some similarities with Nick's patch
https://patchwork.ozlabs.org/patch/1100245/ ?
Christophe
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> arch/powerpc/kernel/setup-common.c | 4 ++++
> arch/powerpc/mm/book3s64/hash_utils.c | 5 -----
> 2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kernel/setup-common.c
> b/arch/powerpc/kernel/setup-common.c
> index aad9f5df6ab6..a73a91f2c21f 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -810,6 +810,10 @@ static __init void print_system_info(void)
> pr_info("mmu_features = 0x%08x\n", cur_cpu_spec->mmu_features);
> #ifdef CONFIG_PPC64
> pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features);
> + pr_info("ppc64_pft_size = 0x%llx\n", ppc64_pft_size);
> + pr_info("kernel vmalloc start = 0x%lx\n", KERN_VIRT_START);
> + pr_info("kernel IO start = 0x%lx\n", KERN_IO_START);
> + pr_info("kernel vmemmap start = 0x%lx\n", (unsigned long)vmemmap);
> #endif
>
> print_system_hash_info();
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 919a861a8ec0..2f677914bfd2 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1950,11 +1950,6 @@ machine_device_initcall(pseries, hash64_debugfs);
>
> void __init print_system_hash_info(void)
> {
> - pr_info("ppc64_pft_size = 0x%llx\n", ppc64_pft_size);
> -
> if (htab_hash_mask)
> pr_info("htab_hash_mask = 0x%lx\n", htab_hash_mask);
> - pr_info("kernel vmalloc start = 0x%lx\n", KERN_VIRT_START);
> - pr_info("kernel IO start = 0x%lx\n", KERN_IO_START);
> - pr_info("kernel vmemmap start = 0x%lx\n", (unsigned long)vmemmap);
> }
> --
> 2.21.0
^ permalink raw reply
* Re: kmemleak: 1157 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
From: Catalin Marinas @ 2019-05-29 17:04 UTC (permalink / raw)
To: Mathieu Malaterre; +Cc: linuxppc-dev
In-Reply-To: <CA+7wUszCdg_xRRh_DX=wAoWnpZTyc7dG=RsiEUCYJN=p_yBX6A@mail.gmail.com>
On Tue, May 28, 2019 at 09:14:12PM +0200, Mathieu Malaterre wrote:
> On Tue, May 28, 2019 at 7:21 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> > Mathieu Malaterre <malat@debian.org> writes:
> > > Is there a way to dump more context (somewhere in of tree
> > > flattening?). I cannot make sense of the following:
> >
> > Hmm. Not that I know of.
> >
> > Those don't look related to OF flattening/unflattening. That's just
> > sysfs setup based on the unflattened device tree.
> >
> > The allocations are happening in safe_name() AFAICS.
> >
> > int __of_add_property_sysfs(struct device_node *np, struct property *pp)
> > {
> > ...
> > pp->attr.attr.name = safe_name(&np->kobj, pp->name);
> >
> > And the free is in __of_sysfs_remove_bin_file():
> >
> > void __of_sysfs_remove_bin_file(struct device_node *np, struct property *prop)
> > {
> > if (!IS_ENABLED(CONFIG_SYSFS))
> > return;
> >
> > sysfs_remove_bin_file(&np->kobj, &prop->attr);
> > kfree(prop->attr.attr.name);
> >
>
> Right. That helped a lot !
>
> > There is this check which could be failing leading to us not calling the
> > free at all:
> >
> > void __of_remove_property_sysfs(struct device_node *np, struct property *prop)
> > {
> > /* at early boot, bail here and defer setup to of_init() */
> > if (of_kset && of_node_is_attached(np))
> > __of_sysfs_remove_bin_file(np, prop);
> > }
> >
> >
> > So maybe stick a printk() in there to see if you're hitting that
> > condition, eg something like:
> >
> > if (of_kset && of_node_is_attached(np))
> > __of_sysfs_remove_bin_file(np, prop);
> > else
> > printk("%s: leaking prop %s on node %pOF\n", __func__, prop->attr.attr.name, np);
> >
>
> If I understand correctly those are false positive. I was first
> starting to consider using something like kmemleak_not_leak, but I
> remember that I have been using kmemleak for a couple of years now.
> Those reports starting to show up only recently.
>
> Catalin, do you have an idea why on a non-SMP machine kmemleak reports
> leaks from:
>
> [...]
> void __init of_core_init(void)
> {
> [...]
> for_each_of_allnodes(np)
> __of_attach_node_sysfs(np);
It's likely that they are false positives but usually, rather than just
adding a kmemleak_not_leak(), it's better to figure out why kmemleak
reports them. The strings allocated above through kstrdup() can't be
tracked starting with the root objects. I think for the OF stuff, this
should be the of_root pointer.
Is it only with non-SMP that this happens? I can't reproduce it on arm64
to be able to dig further.
Even better if you could bisect to the commit that's causing this.
--
Catalin
^ permalink raw reply
* Re: [PATCH v2] mm: add account_locked_vm utility function
From: Ira Weiny @ 2019-05-29 18:05 UTC (permalink / raw)
To: Daniel Jordan
Cc: Mark Rutland, Davidlohr Bueso, kvm, Alan Tull,
Alexey Kardashevskiy, linux-fpga, linux-kernel, kvm-ppc, linux-mm,
Alex Williamson, Jason Gunthorpe, Moritz Fischer, Steve Sistare,
akpm, linuxppc-dev, Christoph Lameter, Wu Hao
In-Reply-To: <20190524175045.26897-1-daniel.m.jordan@oracle.com>
On Fri, May 24, 2019 at 01:50:45PM -0400, Daniel Jordan wrote:
[snip]
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e8834ac32b7..72c1034d2ec7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1564,6 +1564,25 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> int get_user_pages_fast(unsigned long start, int nr_pages,
> unsigned int gup_flags, struct page **pages);
>
> +int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> + struct task_struct *task, bool bypass_rlim);
> +
> +static inline int account_locked_vm(struct mm_struct *mm, unsigned long pages,
> + bool inc)
> +{
> + int ret;
> +
> + if (pages == 0 || !mm)
> + return 0;
> +
> + down_write(&mm->mmap_sem);
> + ret = __account_locked_vm(mm, pages, inc, current,
> + capable(CAP_IPC_LOCK));
> + up_write(&mm->mmap_sem);
> +
> + return ret;
> +}
> +
> /* Container for pinned pfns / pages */
> struct frame_vector {
> unsigned int nr_allocated; /* Number of frames we have space for */
> diff --git a/mm/util.c b/mm/util.c
> index e2e4f8c3fa12..bd3bdf16a084 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -6,6 +6,7 @@
> #include <linux/err.h>
> #include <linux/sched.h>
> #include <linux/sched/mm.h>
> +#include <linux/sched/signal.h>
> #include <linux/sched/task_stack.h>
> #include <linux/security.h>
> #include <linux/swap.h>
> @@ -346,6 +347,51 @@ int __weak get_user_pages_fast(unsigned long start,
> }
> EXPORT_SYMBOL_GPL(get_user_pages_fast);
>
> +/**
> + * __account_locked_vm - account locked pages to an mm's locked_vm
> + * @mm: mm to account against, may be NULL
This kernel doc is wrong. You dereference mm straight away...
> + * @pages: number of pages to account
> + * @inc: %true if @pages should be considered positive, %false if not
> + * @task: task used to check RLIMIT_MEMLOCK
> + * @bypass_rlim: %true if checking RLIMIT_MEMLOCK should be skipped
> + *
> + * Assumes @task and @mm are valid (i.e. at least one reference on each), and
> + * that mmap_sem is held as writer.
> + *
> + * Return:
> + * * 0 on success
> + * * 0 if @mm is NULL (can happen for example if the task is exiting)
> + * * -ENOMEM if RLIMIT_MEMLOCK would be exceeded.
> + */
> +int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> + struct task_struct *task, bool bypass_rlim)
> +{
> + unsigned long locked_vm, limit;
> + int ret = 0;
> +
> + locked_vm = mm->locked_vm;
here...
Perhaps the comment was meant to document account_locked_vm()? Or should the
parameter checks be moved here?
Ira
> + if (inc) {
> + if (!bypass_rlim) {
> + limit = task_rlimit(task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> + if (locked_vm + pages > limit)
> + ret = -ENOMEM;
> + }
> + if (!ret)
> + mm->locked_vm = locked_vm + pages;
> + } else {
> + WARN_ON_ONCE(pages > locked_vm);
> + mm->locked_vm = locked_vm - pages;
> + }
> +
> + pr_debug("%s: [%d] caller %ps %c%lu %lu/%lu%s\n", __func__, task->pid,
> + (void *)_RET_IP_, (inc) ? '+' : '-', pages << PAGE_SHIFT,
> + locked_vm << PAGE_SHIFT, task_rlimit(task, RLIMIT_MEMLOCK),
> + ret ? " - exceeded" : "");
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(__account_locked_vm);
>
> +
> unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
> unsigned long len, unsigned long prot,
> unsigned long flag, unsigned long pgoff)
>
> base-commit: a188339ca5a396acc588e5851ed7e19f66b0ebd9
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH v2] mm: add account_locked_vm utility function
From: Daniel Jordan @ 2019-05-29 18:35 UTC (permalink / raw)
To: Ira Weiny
Cc: Mark Rutland, kvm, Alexey Kardashevskiy, linux-fpga, linux-mm,
Steve Sistare, Christoph Lameter, Davidlohr Bueso, Daniel Jordan,
Jason Gunthorpe, Wu Hao, Alan Tull, kvm-ppc, Alex Williamson,
Moritz Fischer, linux-kernel, akpm, linuxppc-dev
In-Reply-To: <20190529180547.GA16182@iweiny-DESK2.sc.intel.com>
On Wed, May 29, 2019 at 11:05:48AM -0700, Ira Weiny wrote:
> On Fri, May 24, 2019 at 01:50:45PM -0400, Daniel Jordan wrote:
> > +static inline int account_locked_vm(struct mm_struct *mm, unsigned long pages,
> > + bool inc)
> > +{
> > + int ret;
> > +
> > + if (pages == 0 || !mm)
> > + return 0;
> > +
> > + down_write(&mm->mmap_sem);
> > + ret = __account_locked_vm(mm, pages, inc, current,
> > + capable(CAP_IPC_LOCK));
> > + up_write(&mm->mmap_sem);
> > +
> > + return ret;
> > +}
> > +
...snip...
> > +/**
> > + * __account_locked_vm - account locked pages to an mm's locked_vm
> > + * @mm: mm to account against, may be NULL
>
> This kernel doc is wrong. You dereference mm straight away...
...snip...
> > +
> > + locked_vm = mm->locked_vm;
>
> here...
>
> Perhaps the comment was meant to document account_locked_vm()?
Yes, the comment got out of sync when I moved the !mm check outside
__account_locked_vm. Thanks for catching, will fix.
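The split discussed above can be modeled in userspace roughly as follows. This is a sketch under stated assumptions: struct mm_model and both function names are hypothetical, limit_pages stands in for task_rlimit(task, RLIMIT_MEMLOCK) >> PAGE_SHIFT, and the mmap_sem locking, the WARN_ON_ONCE and the debug print from the patch are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one mm_struct field used here. */
struct mm_model {
	unsigned long locked_vm;	/* pages currently accounted */
};

/*
 * Inner helper: mm must be valid, because locked_vm is read
 * unconditionally, as in the quoted __account_locked_vm().
 */
static int account_locked_vm_inner(struct mm_model *mm, unsigned long pages,
				   bool inc, unsigned long limit_pages,
				   bool bypass_rlim)
{
	unsigned long locked_vm = mm->locked_vm;

	if (inc) {
		if (!bypass_rlim && locked_vm + pages > limit_pages)
			return -ENOMEM;	/* limit would be exceeded, nothing changes */
		mm->locked_vm = locked_vm + pages;
	} else {
		mm->locked_vm = locked_vm - pages;
	}

	return 0;
}

/* Outer wrapper: the only place where a NULL mm or zero pages is tolerated. */
static int account_locked_vm_model(struct mm_model *mm, unsigned long pages,
				   bool inc, unsigned long limit_pages,
				   bool bypass_rlim)
{
	if (pages == 0 || !mm)
		return 0;

	return account_locked_vm_inner(mm, pages, inc, limit_pages, bypass_rlim);
}
```

In this model the "may be NULL" wording belongs on the wrapper, which matches the reading that the kernel-doc comment drifted out of sync when the check moved out of the inner function.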
^ permalink raw reply
* Re: [PATCH v2] mm: add account_locked_vm utility function
From: Alex Williamson @ 2019-05-29 18:56 UTC (permalink / raw)
To: Daniel Jordan
Cc: Mark Rutland, Davidlohr Bueso, kvm, Alan Tull,
Alexey Kardashevskiy, linux-fpga, linux-kernel, kvm-ppc, linux-mm,
Jason Gunthorpe, Moritz Fischer, Steve Sistare, Andrew Morton,
linuxppc-dev, Christoph Lameter, Wu Hao
In-Reply-To: <20190528150424.tjbaiptpjhzg7y75@ca-dmjordan1.us.oracle.com>
On Tue, 28 May 2019 11:04:24 -0400
Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
> On Sat, May 25, 2019 at 02:51:18PM -0700, Andrew Morton wrote:
> > On Fri, 24 May 2019 13:50:45 -0400 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
> >
> > > locked_vm accounting is done roughly the same way in five places, so
> > > unify them in a helper. Standardize the debug prints, which vary
> > > slightly, but include the helper's caller to disambiguate between
> > > callsites.
> > >
> > > Error codes stay the same, so user-visible behavior does too. The one
> > > exception is that the -EPERM case in tce_account_locked_vm is removed
> > > because Alexey has never seen it triggered.
> > >
> > > ...
> > >
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -1564,6 +1564,25 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> > > int get_user_pages_fast(unsigned long start, int nr_pages,
> > > unsigned int gup_flags, struct page **pages);
> > >
> > > +int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> > > + struct task_struct *task, bool bypass_rlim);
> > > +
> > > +static inline int account_locked_vm(struct mm_struct *mm, unsigned long pages,
> > > + bool inc)
> > > +{
> > > + int ret;
> > > +
> > > + if (pages == 0 || !mm)
> > > + return 0;
> > > +
> > > + down_write(&mm->mmap_sem);
> > > + ret = __account_locked_vm(mm, pages, inc, current,
> > > + capable(CAP_IPC_LOCK));
> > > + up_write(&mm->mmap_sem);
> > > +
> > > + return ret;
> > > +}
> >
> > That's quite a mouthful for an inlined function. How about uninlining
> > the whole thing and fiddling drivers/vfio/vfio_iommu_type1.c to suit.
> > I wonder why it does down_write_killable and whether it really needs
> > to...
>
> Sure, I can uninline it. vfio changelogs don't show a particular reason for
> _killable[1]. Maybe Alex has something to add. Otherwise I'll respin without
> it since the simplification seems worth removing _killable.
>
> [1] 0cfef2b7410b ("vfio/type1: Remove locked page accounting workqueue")
A userspace vfio driver maps DMA via an ioctl through this path, so I
believe I used killable here just to be friendly: the wait can be
interrupted, and we can fall out with an errno if it gets stuck here.
No harm, no foul, the user's mapping is aborted and unwound. If we're
deadlocked or seriously contended on mmap_sem, maybe we're already in
trouble, but it seemed like a valid and low-hanging use case for
killable. Thanks,
Alex
* [PATCH v3] mm: add account_locked_vm utility function
From: Daniel Jordan @ 2019-05-29 20:50 UTC (permalink / raw)
To: akpm
Cc: Mark Rutland, kvm, Alexey Kardashevskiy, linux-fpga, linux-mm,
Steve Sistare, Christoph Lameter, Ira Weiny, Davidlohr Bueso,
Daniel Jordan, Jason Gunthorpe, Wu Hao, Alan Tull, kvm-ppc,
Alex Williamson, Moritz Fischer, linux-kernel, linuxppc-dev
In-Reply-To: <20190529125627.0cb5b704@x1.home>
locked_vm accounting is done roughly the same way in five places, so
unify them in a helper.
Include the helper's caller in the debug print to distinguish between
callsites.
Error codes stay the same, so user-visible behavior does too. The one
exception is that the -EPERM case in tce_account_locked_vm is removed
because Alexey has never seen it triggered.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Alan Tull <atull@kernel.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Wu Hao <hao.wu@intel.com>
Cc: linux-mm@kvack.org
Cc: kvm@vger.kernel.org
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-fpga@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
v3:
- uninline account_locked_vm (Andrew)
- fix doc comment (Ira)
- retain down_write_killable in vfio type1 (Alex)
- leave Alexey's T-b since the code is the same aside from uninlining
- sanity tested with vfio type1, sanity-built on ppc
arch/powerpc/kvm/book3s_64_vio.c | 44 ++--------------
arch/powerpc/mm/book3s64/iommu_api.c | 41 ++-------------
drivers/fpga/dfl-afu-dma-region.c | 53 ++------------------
drivers/vfio/vfio_iommu_spapr_tce.c | 54 ++------------------
drivers/vfio/vfio_iommu_type1.c | 17 +------
include/linux/mm.h | 4 ++
mm/util.c | 75 ++++++++++++++++++++++++++++
7 files changed, 98 insertions(+), 190 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 66270e07449a..768b645c7edf 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -30,6 +30,7 @@
#include <linux/anon_inodes.h>
#include <linux/iommu.h>
#include <linux/file.h>
+#include <linux/mm.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
@@ -56,43 +57,6 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE;
}
-static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
-{
- long ret = 0;
-
- if (!current || !current->mm)
- return ret; /* process exited */
-
- down_write(&current->mm->mmap_sem);
-
- if (inc) {
- unsigned long locked, lock_limit;
-
- locked = current->mm->locked_vm + stt_pages;
- lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
- if (locked > lock_limit && !capable(CAP_IPC_LOCK))
- ret = -ENOMEM;
- else
- current->mm->locked_vm += stt_pages;
- } else {
- if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
- stt_pages = current->mm->locked_vm;
-
- current->mm->locked_vm -= stt_pages;
- }
-
- pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
- inc ? '+' : '-',
- stt_pages << PAGE_SHIFT,
- current->mm->locked_vm << PAGE_SHIFT,
- rlimit(RLIMIT_MEMLOCK),
- ret ? " - exceeded" : "");
-
- up_write(&current->mm->mmap_sem);
-
- return ret;
-}
-
static void kvm_spapr_tce_iommu_table_free(struct rcu_head *head)
{
struct kvmppc_spapr_tce_iommu_table *stit = container_of(head,
@@ -302,7 +266,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
kvm_put_kvm(stt->kvm);
- kvmppc_account_memlimit(
+ account_locked_vm(current->mm,
kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
call_rcu(&stt->rcu, release_spapr_tce_table);
@@ -327,7 +291,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
return -EINVAL;
npages = kvmppc_tce_pages(size);
- ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
+ ret = account_locked_vm(current->mm, kvmppc_stt_pages(npages), true);
if (ret)
return ret;
@@ -373,7 +337,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
kfree(stt);
fail_acct:
- kvmppc_account_memlimit(kvmppc_stt_pages(npages), false);
+ account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
return ret;
}
diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 5c521f3924a5..18d22eec0ebd 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -19,6 +19,7 @@
#include <linux/hugetlb.h>
#include <linux/swap.h>
#include <linux/sizes.h>
+#include <linux/mm.h>
#include <asm/mmu_context.h>
#include <asm/pte-walk.h>
#include <linux/mm_inline.h>
@@ -51,40 +52,6 @@ struct mm_iommu_table_group_mem_t {
u64 dev_hpa; /* Device memory base address */
};
-static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
- unsigned long npages, bool incr)
-{
- long ret = 0, locked, lock_limit;
-
- if (!npages)
- return 0;
-
- down_write(&mm->mmap_sem);
-
- if (incr) {
- locked = mm->locked_vm + npages;
- lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
- if (locked > lock_limit && !capable(CAP_IPC_LOCK))
- ret = -ENOMEM;
- else
- mm->locked_vm += npages;
- } else {
- if (WARN_ON_ONCE(npages > mm->locked_vm))
- npages = mm->locked_vm;
- mm->locked_vm -= npages;
- }
-
- pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
- current ? current->pid : 0,
- incr ? '+' : '-',
- npages << PAGE_SHIFT,
- mm->locked_vm << PAGE_SHIFT,
- rlimit(RLIMIT_MEMLOCK));
- up_write(&mm->mmap_sem);
-
- return ret;
-}
-
bool mm_iommu_preregistered(struct mm_struct *mm)
{
return !list_empty(&mm->context.iommu_group_mem_list);
@@ -101,7 +68,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
unsigned long entry, chunk;
if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
- ret = mm_iommu_adjust_locked_vm(mm, entries, true);
+ ret = account_locked_vm(mm, entries, true);
if (ret)
return ret;
@@ -216,7 +183,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
kfree(mem);
unlock_exit:
- mm_iommu_adjust_locked_vm(mm, locked_entries, false);
+ account_locked_vm(mm, locked_entries, false);
return ret;
}
@@ -316,7 +283,7 @@ long mm_iommu_put(struct mm_struct *mm, struct mm_iommu_table_group_mem_t *mem)
unlock_exit:
mutex_unlock(&mem_list_mutex);
- mm_iommu_adjust_locked_vm(mm, unlock_entries, false);
+ account_locked_vm(mm, unlock_entries, false);
return ret;
}
diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
index c438722bf4e1..0a532c602d8f 100644
--- a/drivers/fpga/dfl-afu-dma-region.c
+++ b/drivers/fpga/dfl-afu-dma-region.c
@@ -12,6 +12,7 @@
#include <linux/dma-mapping.h>
#include <linux/sched/signal.h>
#include <linux/uaccess.h>
+#include <linux/mm.h>
#include "dfl-afu.h"
@@ -31,52 +32,6 @@ void afu_dma_region_init(struct dfl_feature_platform_data *pdata)
afu->dma_regions = RB_ROOT;
}
-/**
- * afu_dma_adjust_locked_vm - adjust locked memory
- * @dev: port device
- * @npages: number of pages
- * @incr: increase or decrease locked memory
- *
- * Increase or decrease the locked memory size with npages input.
- *
- * Return 0 on success.
- * Return -ENOMEM if locked memory size is over the limit and no CAP_IPC_LOCK.
- */
-static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
-{
- unsigned long locked, lock_limit;
- int ret = 0;
-
- /* the task is exiting. */
- if (!current->mm)
- return 0;
-
- down_write(&current->mm->mmap_sem);
-
- if (incr) {
- locked = current->mm->locked_vm + npages;
- lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
- if (locked > lock_limit && !capable(CAP_IPC_LOCK))
- ret = -ENOMEM;
- else
- current->mm->locked_vm += npages;
- } else {
- if (WARN_ON_ONCE(npages > current->mm->locked_vm))
- npages = current->mm->locked_vm;
- current->mm->locked_vm -= npages;
- }
-
- dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
- incr ? '+' : '-', npages << PAGE_SHIFT,
- current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
- ret ? "- exceeded" : "");
-
- up_write(&current->mm->mmap_sem);
-
- return ret;
-}
-
/**
* afu_dma_pin_pages - pin pages of given dma memory region
* @pdata: feature device platform data
@@ -92,7 +47,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
struct device *dev = &pdata->dev->dev;
int ret, pinned;
- ret = afu_dma_adjust_locked_vm(dev, npages, true);
+ ret = account_locked_vm(current->mm, npages, true);
if (ret)
return ret;
@@ -121,7 +76,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
free_pages:
kfree(region->pages);
unlock_vm:
- afu_dma_adjust_locked_vm(dev, npages, false);
+ account_locked_vm(current->mm, npages, false);
return ret;
}
@@ -141,7 +96,7 @@ static void afu_dma_unpin_pages(struct dfl_feature_platform_data *pdata,
put_all_pages(region->pages, npages);
kfree(region->pages);
- afu_dma_adjust_locked_vm(dev, npages, false);
+ account_locked_vm(current->mm, npages, false);
dev_dbg(dev, "%ld pages unpinned\n", npages);
}
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 40ddc0c5f677..d06e8e291924 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -22,6 +22,7 @@
#include <linux/vmalloc.h>
#include <linux/sched/mm.h>
#include <linux/sched/signal.h>
+#include <linux/mm.h>
#include <asm/iommu.h>
#include <asm/tce.h>
@@ -34,51 +35,6 @@
static void tce_iommu_detach_group(void *iommu_data,
struct iommu_group *iommu_group);
-static long try_increment_locked_vm(struct mm_struct *mm, long npages)
-{
- long ret = 0, locked, lock_limit;
-
- if (WARN_ON_ONCE(!mm))
- return -EPERM;
-
- if (!npages)
- return 0;
-
- down_write(&mm->mmap_sem);
- locked = mm->locked_vm + npages;
- lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
- if (locked > lock_limit && !capable(CAP_IPC_LOCK))
- ret = -ENOMEM;
- else
- mm->locked_vm += npages;
-
- pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
- npages << PAGE_SHIFT,
- mm->locked_vm << PAGE_SHIFT,
- rlimit(RLIMIT_MEMLOCK),
- ret ? " - exceeded" : "");
-
- up_write(&mm->mmap_sem);
-
- return ret;
-}
-
-static void decrement_locked_vm(struct mm_struct *mm, long npages)
-{
- if (!mm || !npages)
- return;
-
- down_write(&mm->mmap_sem);
- if (WARN_ON_ONCE(npages > mm->locked_vm))
- npages = mm->locked_vm;
- mm->locked_vm -= npages;
- pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
- npages << PAGE_SHIFT,
- mm->locked_vm << PAGE_SHIFT,
- rlimit(RLIMIT_MEMLOCK));
- up_write(&mm->mmap_sem);
-}
-
/*
* VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
*
@@ -336,7 +292,7 @@ static int tce_iommu_enable(struct tce_container *container)
return ret;
locked = table_group->tce32_size >> PAGE_SHIFT;
- ret = try_increment_locked_vm(container->mm, locked);
+ ret = account_locked_vm(container->mm, locked, true);
if (ret)
return ret;
@@ -355,7 +311,7 @@ static void tce_iommu_disable(struct tce_container *container)
container->enabled = false;
BUG_ON(!container->mm);
- decrement_locked_vm(container->mm, container->locked_pages);
+ account_locked_vm(container->mm, container->locked_pages, false);
}
static void *tce_iommu_open(unsigned long arg)
@@ -659,7 +615,7 @@ static long tce_iommu_create_table(struct tce_container *container,
if (!table_size)
return -EINVAL;
- ret = try_increment_locked_vm(container->mm, table_size >> PAGE_SHIFT);
+ ret = account_locked_vm(container->mm, table_size >> PAGE_SHIFT, true);
if (ret)
return ret;
@@ -678,7 +634,7 @@ static void tce_iommu_free_table(struct tce_container *container,
unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
iommu_tce_table_put(tbl);
- decrement_locked_vm(container->mm, pages);
+ account_locked_vm(container->mm, pages, false);
}
static long tce_iommu_create_window(struct tce_container *container,
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3ddc375e7063..bf449ace1676 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -275,21 +275,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
ret = down_write_killable(&mm->mmap_sem);
if (!ret) {
- if (npage > 0) {
- if (!dma->lock_cap) {
- unsigned long limit;
-
- limit = task_rlimit(dma->task,
- RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
- if (mm->locked_vm + npage > limit)
- ret = -ENOMEM;
- }
- }
-
- if (!ret)
- mm->locked_vm += npage;
-
+ ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
+ dma->lock_cap);
up_write(&mm->mmap_sem);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..95510f6fad45 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1564,6 +1564,10 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
int get_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages);
+int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
+int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
+ struct task_struct *task, bool bypass_rlim);
+
/* Container for pinned pfns / pages */
struct frame_vector {
unsigned int nr_allocated; /* Number of frames we have space for */
diff --git a/mm/util.c b/mm/util.c
index 91682a2090ee..cbbcc035b12b 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -7,6 +7,7 @@
#include <linux/err.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
#include <linux/sched/task_stack.h>
#include <linux/security.h>
#include <linux/swap.h>
@@ -347,6 +348,80 @@ int __weak get_user_pages_fast(unsigned long start,
}
EXPORT_SYMBOL_GPL(get_user_pages_fast);
+/**
+ * __account_locked_vm - account locked pages to an mm's locked_vm
+ * @mm: mm to account against
+ * @pages: number of pages to account
+ * @inc: %true if @pages should be considered positive, %false if not
+ * @task: task used to check RLIMIT_MEMLOCK
+ * @bypass_rlim: %true if checking RLIMIT_MEMLOCK should be skipped
+ *
+ * Assumes @task and @mm are valid (i.e. at least one reference on each), and
+ * that mmap_sem is held as writer.
+ *
+ * Return:
+ * * 0 on success
+ * * -ENOMEM if RLIMIT_MEMLOCK would be exceeded.
+ */
+int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
+ struct task_struct *task, bool bypass_rlim)
+{
+ unsigned long locked_vm, limit;
+ int ret = 0;
+
+ lockdep_assert_held_exclusive(&mm->mmap_sem);
+
+ locked_vm = mm->locked_vm;
+ if (inc) {
+ if (!bypass_rlim) {
+ limit = task_rlimit(task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+ if (locked_vm + pages > limit)
+ ret = -ENOMEM;
+ }
+ if (!ret)
+ mm->locked_vm = locked_vm + pages;
+ } else {
+ WARN_ON_ONCE(pages > locked_vm);
+ mm->locked_vm = locked_vm - pages;
+ }
+
+ pr_debug("%s: [%d] caller %ps %c%lu %lu/%lu%s\n", __func__, task->pid,
+ (void *)_RET_IP_, (inc) ? '+' : '-', pages << PAGE_SHIFT,
+ locked_vm << PAGE_SHIFT, task_rlimit(task, RLIMIT_MEMLOCK),
+ ret ? " - exceeded" : "");
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__account_locked_vm);
+
+/**
+ * account_locked_vm - account locked pages to an mm's locked_vm
+ * @mm: mm to account against, may be NULL
+ * @pages: number of pages to account
+ * @inc: %true if @pages should be considered positive, %false if not
+ *
+ * Assumes a non-NULL @mm is valid (i.e. at least one reference on it).
+ *
+ * Return:
+ * * 0 on success, or if mm is NULL
+ * * -ENOMEM if RLIMIT_MEMLOCK would be exceeded.
+ */
+int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc)
+{
+ int ret;
+
+ if (pages == 0 || !mm)
+ return 0;
+
+ down_write(&mm->mmap_sem);
+ ret = __account_locked_vm(mm, pages, inc, current,
+ capable(CAP_IPC_LOCK));
+ up_write(&mm->mmap_sem);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(account_locked_vm);
+
unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot,
unsigned long flag, unsigned long pgoff)
base-commit: cd6c84d8f0cdc911df435bb075ba22ce3c605b07
--
2.21.0
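For readers following the thread, the accounting rule that the five call
sites now share can be modelled outside the kernel. What follows is a minimal
userspace sketch, not the kernel code: struct model_mm, model_account_step()
and model_account() are hypothetical names, limit_pages stands in for
task_rlimit(task, RLIMIT_MEMLOCK) >> PAGE_SHIFT, has_cap stands in for
capable(CAP_IPC_LOCK), and the mmap_sem locking, the lockdep assertion and
the debug print are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the single mm_struct field the helper updates. */
struct model_mm {
	unsigned long locked_vm;	/* pages currently accounted as locked */
};

/*
 * Models the core step of __account_locked_vm(): the caller guarantees a
 * valid mm (and, in the kernel, holds mmap_sem for writing).
 */
int model_account_step(struct model_mm *mm, unsigned long pages, bool inc,
		       unsigned long limit_pages, bool bypass_rlim)
{
	assert(mm != NULL);

	if (inc) {
		/* Refuse the increment if it would exceed the limit. */
		if (!bypass_rlim && mm->locked_vm + pages > limit_pages)
			return -ENOMEM;
		mm->locked_vm += pages;
	} else {
		/*
		 * The kernel helper fires WARN_ON_ONCE() when
		 * pages > locked_vm and still subtracts; this model
		 * subtracts without warning.
		 */
		mm->locked_vm -= pages;
	}

	return 0;
}

/*
 * Models the account_locked_vm() wrapper: a NULL mm or a zero page count
 * is treated as success and nothing is accounted.
 */
int model_account(struct model_mm *mm, unsigned long pages, bool inc,
		  unsigned long limit_pages, bool has_cap)
{
	if (pages == 0 || !mm)
		return 0;

	return model_account_step(mm, pages, inc, limit_pages, has_cap);
}
```

With a limit of 8 pages, accounting 4 pages succeeds; a further 5 pages
fails with -ENOMEM and leaves the counter at 4, unless the limit is bypassed,
in which case the counter goes to 9.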
* Re: [PATCH][next] soc: fsl: fix spelling mistake "Firmaware" -> "Firmware"
From: Li Yang @ 2019-05-29 20:53 UTC (permalink / raw)
To: Colin King
Cc: kernel-janitors, linuxppc-dev, lkml,
moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
In-Reply-To: <20190521085624.13665-1-colin.king@canonical.com>
On Tue, May 21, 2019 at 3:57 AM Colin King <colin.king@canonical.com> wrote:
>
> From: Colin Ian King <colin.king@canonical.com>
>
> There is a spelling mistake in a pr_err message. Fix it.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Applied. Thanks!
Regards,
Leo
> ---
> drivers/soc/fsl/dpaa2-console.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/soc/fsl/dpaa2-console.c b/drivers/soc/fsl/dpaa2-console.c
> index 9168d8ddc932..27243f706f37 100644
> --- a/drivers/soc/fsl/dpaa2-console.c
> +++ b/drivers/soc/fsl/dpaa2-console.c
> @@ -73,7 +73,7 @@ static u64 get_mc_fw_base_address(void)
>
> mcfbaregs = ioremap(mc_base_addr.start, resource_size(&mc_base_addr));
> if (!mcfbaregs) {
> - pr_err("could not map MC Firmaware Base registers\n");
> + pr_err("could not map MC Firmware Base registers\n");
> return 0;
> }
>
> --
> 2.20.1
>
* [PATCH v2] KVM: PPC: Report single stepping capability
From: Fabiano Rosas @ 2019-05-29 22:22 UTC (permalink / raw)
To: kvm-ppc; +Cc: kvm, rkrcmar, aik, pbonzini, linuxppc-dev, david
When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
the next instruction to be single stepped via the
KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
We currently don't have support for guest single stepping implemented
in Book3S HV.
This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
to inform userspace about the state of single stepping support.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
v1 -> v2:
- add capability description to Documentation/virtual/kvm/api.txt
Documentation/virtual/kvm/api.txt | 3 +++
arch/powerpc/kvm/powerpc.c | 5 +++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 9 insertions(+)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ba6c42c576dd..a77643bfa917 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2969,6 +2969,9 @@ can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
indicating the number of supported registers.
+For ppc, the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability indicates whether
+the single-step debug event (KVM_GUESTDBG_SINGLESTEP) is supported.
+
When debug events exit the main run loop with the reason
KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
structure containing architecture specific debug information.
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3393b166817a..fd7e7d55637e 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -538,6 +538,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IMMEDIATE_EXIT:
r = 1;
break;
+ case KVM_CAP_PPC_GUEST_DEBUG_SSTEP:
+#ifdef CONFIG_BOOKE
+ r = 1;
+ break;
+#endif
case KVM_CAP_PPC_PAIRED_SINGLES:
case KVM_CAP_PPC_OSI:
case KVM_CAP_PPC_GET_PVINFO:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2fe12b40d503..cad9fcd90f39 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -993,6 +993,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_ARM_SVE 170
#define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
#define KVM_CAP_ARM_PTRAUTH_GENERIC 172
+#define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 173
#ifdef KVM_CAP_IRQ_ROUTING
--
2.20.1
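From the userspace side, a capability like this one is normally probed with
the KVM_CHECK_EXTENSION ioctl. A minimal sketch, assuming a Linux system
with the <linux/kvm.h> uapi header: kvm_check_cap() is a hypothetical helper
name, and the capability number is passed in by the caller because the value
173 proposed in this posting may differ from what a released header defines.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Query a KVM capability on an open KVM file descriptor.
 *
 * Returns 1 if the capability is reported as present, 0 if it is not,
 * and -1 if the ioctl itself failed (errno is left as set by ioctl()).
 */
int kvm_check_cap(int fd, long cap)
{
	int r = ioctl(fd, KVM_CHECK_EXTENSION, cap);

	if (r < 0)
		return -1;

	/* Some capabilities report a count; collapse that to yes or no. */
	return r > 0;
}
```

A caller would pass an fd obtained from /dev/kvm (or from KVM_CREATE_VM)
together with KVM_CAP_PPC_GUEST_DEBUG_SSTEP, and skip setting
KVM_GUESTDBG_SINGLESTEP when the result is not 1.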
* [PATCH 09/22] docs: mark orphan documents as such
From: Mauro Carvalho Chehab @ 2019-05-29 23:23 UTC (permalink / raw)
To: Linux Doc Mailing List
Cc: kvm, Radim Krčmář, Maxime Ripard, dri-devel,
platform-driver-x86, Paul Mackerras, Mauro Carvalho Chehab,
linux-stm32, Alexandre Torgue, Jonathan Corbet, David Airlie,
Andrew Donnellan, linux-pm, Maarten Lankhorst, Matan Ziv-Av,
Mauro Carvalho Chehab, Daniel Vetter, Sean Paul, linux-arm-kernel,
linux-kernel, Maxime Coquelin, Frederic Barrat, Paolo Bonzini,
linuxppc-dev, Georgi Djakov
In-Reply-To: <cover.1559171394.git.mchehab+samsung@kernel.org>
Sphinx doesn't like orphan documents:
Documentation/accelerators/ocxl.rst: WARNING: document isn't included in any toctree
Documentation/arm/stm32/overview.rst: WARNING: document isn't included in any toctree
Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't included in any toctree
Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't included in any toctree
Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't included in any toctree
Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't included in any toctree
Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't included in any toctree
Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in any toctree
Documentation/interconnect/interconnect.rst: WARNING: document isn't included in any toctree
Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in any toctree
Documentation/powerpc/isa-versions.rst: WARNING: document isn't included in any toctree
Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document isn't included in any toctree
Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't included in any toctree
So, while they aren't on any toctree, add :orphan: to them, in order
to silence this warning.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
Documentation/accelerators/ocxl.rst | 2 ++
Documentation/arm/stm32/overview.rst | 2 ++
Documentation/arm/stm32/stm32f429-overview.rst | 2 ++
Documentation/arm/stm32/stm32f746-overview.rst | 2 ++
Documentation/arm/stm32/stm32f769-overview.rst | 2 ++
Documentation/arm/stm32/stm32h743-overview.rst | 2 ++
Documentation/arm/stm32/stm32mp157-overview.rst | 2 ++
Documentation/gpu/msm-crash-dump.rst | 2 ++
Documentation/interconnect/interconnect.rst | 2 ++
Documentation/laptops/lg-laptop.rst | 2 ++
Documentation/powerpc/isa-versions.rst | 2 ++
Documentation/virtual/kvm/amd-memory-encryption.rst | 2 ++
Documentation/virtual/kvm/vcpu-requests.rst | 2 ++
13 files changed, 26 insertions(+)
diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
index 14cefc020e2d..b1cea19a90f5 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -1,3 +1,5 @@
+:orphan:
+
========================================================
OpenCAPI (Open Coherent Accelerator Processor Interface)
========================================================
diff --git a/Documentation/arm/stm32/overview.rst b/Documentation/arm/stm32/overview.rst
index 85cfc8410798..f7e734153860 100644
--- a/Documentation/arm/stm32/overview.rst
+++ b/Documentation/arm/stm32/overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
========================
STM32 ARM Linux Overview
========================
diff --git a/Documentation/arm/stm32/stm32f429-overview.rst b/Documentation/arm/stm32/stm32f429-overview.rst
index 18feda97f483..65bbb1c3b423 100644
--- a/Documentation/arm/stm32/stm32f429-overview.rst
+++ b/Documentation/arm/stm32/stm32f429-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
STM32F429 Overview
==================
diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst
index b5f4b6ce7656..42d593085015 100644
--- a/Documentation/arm/stm32/stm32f746-overview.rst
+++ b/Documentation/arm/stm32/stm32f746-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
STM32F746 Overview
==================
diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst
index 228656ced2fe..f6adac862b17 100644
--- a/Documentation/arm/stm32/stm32f769-overview.rst
+++ b/Documentation/arm/stm32/stm32f769-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
STM32F769 Overview
==================
diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst
index 3458dc00095d..c525835e7473 100644
--- a/Documentation/arm/stm32/stm32h743-overview.rst
+++ b/Documentation/arm/stm32/stm32h743-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
STM32H743 Overview
==================
diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst
index 62e176d47ca7..2c52cd020601 100644
--- a/Documentation/arm/stm32/stm32mp157-overview.rst
+++ b/Documentation/arm/stm32/stm32mp157-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
STM32MP157 Overview
===================
diff --git a/Documentation/gpu/msm-crash-dump.rst b/Documentation/gpu/msm-crash-dump.rst
index 757cd257e0d8..240ef200f76c 100644
--- a/Documentation/gpu/msm-crash-dump.rst
+++ b/Documentation/gpu/msm-crash-dump.rst
@@ -1,3 +1,5 @@
+:orphan:
+
=====================
MSM Crash Dump Format
=====================
diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst
index c3e004893796..56e331dab70e 100644
--- a/Documentation/interconnect/interconnect.rst
+++ b/Documentation/interconnect/interconnect.rst
@@ -1,5 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
+:orphan:
+
=====================================
GENERIC SYSTEM INTERCONNECT SUBSYSTEM
=====================================
diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst
index aa503ee9b3bc..f2c2ffe31101 100644
--- a/Documentation/laptops/lg-laptop.rst
+++ b/Documentation/laptops/lg-laptop.rst
@@ -1,5 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0+
+:orphan:
+
LG Gram laptop extra features
=============================
diff --git a/Documentation/powerpc/isa-versions.rst b/Documentation/powerpc/isa-versions.rst
index 812e20cc898c..66c24140ebf1 100644
--- a/Documentation/powerpc/isa-versions.rst
+++ b/Documentation/powerpc/isa-versions.rst
@@ -1,3 +1,5 @@
+:orphan:
+
CPU to ISA Version Mapping
==========================
diff --git a/Documentation/virtual/kvm/amd-memory-encryption.rst b/Documentation/virtual/kvm/amd-memory-encryption.rst
index 659bbc093b52..33d697ab8a58 100644
--- a/Documentation/virtual/kvm/amd-memory-encryption.rst
+++ b/Documentation/virtual/kvm/amd-memory-encryption.rst
@@ -1,3 +1,5 @@
+:orphan:
+
======================================
Secure Encrypted Virtualization (SEV)
======================================
diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
index 5feb3706a7ae..c1807a1b92e6 100644
--- a/Documentation/virtual/kvm/vcpu-requests.rst
+++ b/Documentation/virtual/kvm/vcpu-requests.rst
@@ -1,3 +1,5 @@
+:orphan:
+
=================
KVM VCPU Requests
=================
--
2.21.0
* [PATCH 00/22] Some documentation fixes
From: Mauro Carvalho Chehab @ 2019-05-29 23:23 UTC (permalink / raw)
To: Linux Doc Mailing List
Cc: alsa-devel, kvm, linux-pci, dri-devel, virtualization, linux-mm,
keyrings, linux-mtd, linux-i2c, linux-kselftest,
Mauro Carvalho Chehab, devel, Jonathan Corbet, x86, linux-acpi,
xen-devel, devicetree, linux-pm, Mauro Carvalho Chehab,
linux-gpio, linux-amlogic, bpf, devel, patches, linux-kernel,
linux-security-module, netdev, linux-integrity, linuxppc-dev
Fix several warnings and broken links.
This series was generated against linux-next, but was rebased to apply on
docs-next. It should apply cleanly on either tree.
There's a git tree with all of them applied on top of docs/docs-next
at:
https://git.linuxtv.org/mchehab/experimental.git/log/?h=fix_doc_links_v2
Mauro Carvalho Chehab (21):
ABI: sysfs-devices-system-cpu: point to the right docs
isdn: mISDN: remove a bogus reference to a non-existing doc
dt: fix broken references to nand.txt
docs: zh_CN: get rid of basic_profiling.txt
doc: it_IT: fix reference to magic-number.rst
docs: mm: numaperf.rst: get rid of a build warning
docs: bpf: get rid of two warnings
docs: mark orphan documents as such
docs: amd-memory-encryption.rst get rid of warnings
gpu: amdgpu: fix broken amdgpu_dma_buf.c references
gpu: i915.rst: Fix references to renamed files
docs: zh_CN: avoid duplicate citation references
docs: vm: hmm.rst: fix some warnings
docs: it: license-rules.rst: get rid of warnings
docs: gpio: driver.rst: fix a bad tag
docs: soundwire: locking: fix tags for a code-block
docs: security: trusted-encrypted.rst: fix code-block tag
docs: security: core.rst: Fix several warnings
docs: net: dpio-driver.rst: fix two codeblock warnings
docs: net: sja1105.rst: fix table format
docs: fix broken documentation links
Otto Sabart (1):
mfd: madera: Fix bad reference to pinctrl.txt file
.../ABI/testing/sysfs-devices-system-cpu | 3 +-
Documentation/accelerators/ocxl.rst | 2 +
Documentation/acpi/dsd/leds.txt | 2 +-
.../admin-guide/kernel-parameters.rst | 6 +-
.../admin-guide/kernel-parameters.txt | 16 ++---
Documentation/admin-guide/mm/numaperf.rst | 5 +-
Documentation/admin-guide/ras.rst | 2 +-
Documentation/arm/stm32/overview.rst | 2 +
.../arm/stm32/stm32f429-overview.rst | 2 +
.../arm/stm32/stm32f746-overview.rst | 2 +
.../arm/stm32/stm32f769-overview.rst | 2 +
.../arm/stm32/stm32h743-overview.rst | 2 +
.../arm/stm32/stm32mp157-overview.rst | 2 +
Documentation/bpf/btf.rst | 2 +
.../bindings/mtd/amlogic,meson-nand.txt | 2 +-
.../devicetree/bindings/mtd/gpmc-nand.txt | 2 +-
.../devicetree/bindings/mtd/marvell-nand.txt | 2 +-
.../devicetree/bindings/mtd/tango-nand.txt | 2 +-
.../devicetree/bindings/net/fsl-enetc.txt | 7 +-
.../bindings/pci/amlogic,meson-pcie.txt | 2 +-
.../regulator/qcom,rpmh-regulator.txt | 2 +-
.../devicetree/booting-without-of.txt | 2 +-
Documentation/driver-api/gpio/board.rst | 2 +-
Documentation/driver-api/gpio/consumer.rst | 2 +-
Documentation/driver-api/gpio/driver.rst | 2 +-
.../driver-api/soundwire/locking.rst | 4 +-
.../firmware-guide/acpi/enumeration.rst | 2 +-
.../firmware-guide/acpi/method-tracing.rst | 2 +-
Documentation/gpu/amdgpu.rst | 4 +-
Documentation/gpu/i915.rst | 6 +-
Documentation/gpu/msm-crash-dump.rst | 2 +
Documentation/i2c/instantiating-devices | 2 +-
Documentation/interconnect/interconnect.rst | 2 +
Documentation/laptops/lg-laptop.rst | 2 +
.../freescale/dpaa2/dpio-driver.rst | 4 +-
Documentation/networking/dsa/sja1105.rst | 6 +-
Documentation/powerpc/isa-versions.rst | 2 +
Documentation/security/keys/core.rst | 16 +++--
.../security/keys/trusted-encrypted.rst | 4 +-
Documentation/sysctl/kernel.txt | 4 +-
.../translations/it_IT/process/howto.rst | 2 +-
.../it_IT/process/license-rules.rst | 28 ++++----
.../it_IT/process/magic-number.rst | 2 +-
.../it_IT/process/stable-kernel-rules.rst | 4 +-
.../translations/zh_CN/basic_profiling.txt | 71 -------------------
.../translations/zh_CN/process/4.Coding.rst | 2 +-
.../zh_CN/process/management-style.rst | 4 +-
.../zh_CN/process/programming-language.rst | 28 ++++----
.../virtual/kvm/amd-memory-encryption.rst | 5 ++
Documentation/virtual/kvm/vcpu-requests.rst | 2 +
Documentation/vm/hmm.rst | 9 ++-
Documentation/x86/x86_64/5level-paging.rst | 2 +-
Documentation/x86/x86_64/boot-options.rst | 4 +-
.../x86/x86_64/fake-numa-for-cpusets.rst | 2 +-
MAINTAINERS | 6 +-
arch/arm/Kconfig | 2 +-
arch/arm64/kernel/kexec_image.c | 2 +-
arch/powerpc/Kconfig | 2 +-
arch/x86/Kconfig | 16 ++---
arch/x86/Kconfig.debug | 2 +-
arch/x86/boot/header.S | 2 +-
arch/x86/entry/entry_64.S | 2 +-
arch/x86/include/asm/bootparam_utils.h | 2 +-
arch/x86/include/asm/page_64_types.h | 2 +-
arch/x86/include/asm/pgtable_64_types.h | 2 +-
arch/x86/kernel/cpu/microcode/amd.c | 2 +-
arch/x86/kernel/kexec-bzimage64.c | 2 +-
arch/x86/kernel/pci-dma.c | 2 +-
arch/x86/mm/tlb.c | 2 +-
arch/x86/platform/pvh/enlighten.c | 2 +-
drivers/acpi/Kconfig | 10 +--
drivers/isdn/mISDN/dsp_core.c | 2 -
drivers/net/ethernet/faraday/ftgmac100.c | 2 +-
.../fieldbus/Documentation/fieldbus_dev.txt | 4 +-
drivers/vhost/vhost.c | 2 +-
include/acpi/acpi_drivers.h | 2 +-
include/linux/fs_context.h | 2 +-
include/linux/lsm_hooks.h | 2 +-
include/linux/mfd/madera/pdata.h | 3 +-
mm/Kconfig | 2 +-
security/Kconfig | 2 +-
tools/include/linux/err.h | 2 +-
.../Documentation/stack-validation.txt | 4 +-
tools/testing/selftests/x86/protection_keys.c | 2 +-
84 files changed, 183 insertions(+), 212 deletions(-)
delete mode 100644 Documentation/translations/zh_CN/basic_profiling.txt
--
2.21.0
* Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage
From: Mike Kravetz @ 2019-05-29 23:31 UTC (permalink / raw)
To: Wanpeng Li
Cc: kvm, Punit Agrawal, Xiao Guangrong, linux-kernel@vger.kernel.org,
Michal Hocko, linux-mm@kvack.org, yongkaiwu, Aneesh Kumar K.V,
Paolo Bonzini, Andrew Morton, lidongchen,
linuxppc-dev@lists.ozlabs.org, Naoya Horiguchi, Anshuman Khandual
In-Reply-To: <CANRm+CxAgWVv5aVzQ0wdP_A7QQgqfy7nN_SxyaactG7Mnqfr2A@mail.gmail.com>
On 5/28/19 2:49 AM, Wanpeng Li wrote:
> Cc Paolo,
> Hi all,
> On Wed, 14 Feb 2018 at 06:34, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> On 02/12/2018 06:48 PM, Michael Ellerman wrote:
>>> Andrew Morton <akpm@linux-foundation.org> writes:
>>>
>>>> On Thu, 08 Feb 2018 12:30:45 +0000 Punit Agrawal <punit.agrawal@arm.com> wrote:
>>>>
>>>>>>
>>>>>> So I don't think that the above test result means that errors are properly
>>>>>> handled, and the proposed patch should help for arm64.
>>>>>
>>>>> Although, the deviation of pud_huge() avoids a kernel crash the code
>>>>> would be easier to maintain and reason about if arm64 helpers are
>>>>> consistent with expectations by core code.
>>>>>
>>>>> I'll look to update the arm64 helpers once this patch gets merged. But
>>>>> it would be helpful if there was a clear expression of semantics for
>>>>> pud_huge() for various cases. Is there any version that can be used as
>>>>> reference?
>>>>
>>>> Is that an ack or tested-by?
>>>>
>>>> Mike keeps plaintively asking the powerpc developers to take a look,
>>>> but they remain steadfastly in hiding.
>>>
>>> Cc'ing linuxppc-dev is always a good idea :)
>>>
>>
>> Thanks Michael,
>>
>> I was mostly concerned about use cases for soft/hard offline of huge pages
>> larger than PMD_SIZE on powerpc. I know that powerpc supports PGD_SIZE
>> huge pages, and soft/hard offline support was specifically added for this.
>> See, 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
>> at PGD level"
>>
>> This patch will disable that functionality. So, at a minimum this is a
>> 'heads up'. If there are actual use cases that depend on this, then more
>> work/discussions will need to happen. From the e-mail thread on PGD_SIZE
>> support, I can not tell if there is a real use case or this is just a
>> 'nice to have'.
>
> 1GB hugetlbfs pages are used by DPDK and VMs in cloud deployment, we
> encounter gup_pud_range() panic several times in product environment.
> Is there any plan to reenable and fix arch codes?
I too am aware of slightly more interest in 1G huge pages. I suspect that as
Intel MMU capacity increases to handle more TLB entries, there will be more
and more interest.
Personally, I am not looking at this issue. Perhaps Naoya will comment, as
he knows the most about this code.
> In addition, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/mmu.c#n3213
> The memory in guest can be 1GB/2MB/4K, though the host-backed memory
> are 1GB hugetlbfs pages, after above PUD panic is fixed,
> try_to_unmap() which is called in MCA recovery path will mark the PUD
> hwpoison entry. The guest will vmexit and retry endlessly when
> accessing any memory in the guest which is backed by this 1GB poisoned
> hugetlbfs page. We have a plan to split this 1GB hugetlbfs page by 2MB
> hugetlbfs pages/4KB pages, maybe file remap to a virtual address range
> which is 2MB/4KB page granularity, also split the KVM MMU 1GB SPTE
> into 2MB/4KB and mark the offensive SPTE w/ a hwpoison flag, a sigbus
> will be delivered to VM at page fault next time for the offensive
> SPTE. Is this proposal acceptable?
I am not sure of the error handling design, but this does sound reasonable.
That block of code which potentially dissolves a huge page on memory error
is hard to understand and I'm not sure if that is even the 'normal'
functionality. Certainly, we would hate to waste/poison an entire 1G page
for an error on a small subsection.
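To put rough numbers on that last point, here is a minimal userspace sketch
(not kernel code) of the arithmetic involved. It assumes x86-64 page sizes of
4 KiB, 2 MiB and 1 GiB; the macro and helper names are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 page sizes; other architectures use different values. */
#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Hypothetical helper: start of the size-aligned block containing addr. */
static inline uint64_t block_start(uint64_t addr, uint64_t block_size)
{
	/* The mask trick below only works for power-of-two sizes. */
	assert(block_size && (block_size & (block_size - 1)) == 0);
	return addr & ~(block_size - 1);
}

/*
 * Hypothetical helper: index of the block_size sub-block, counted from
 * the start of the enclosing 1 GiB page, that contains addr.
 */
static inline uint64_t subblock_index(uint64_t addr, uint64_t block_size)
{
	return (addr & (SZ_1G - 1)) / block_size;
}

/*
 * Bytes of the 1 GiB page that remain usable if only the block_size
 * block containing the error is poisoned.
 */
static inline uint64_t bytes_preserved(uint64_t block_size)
{
	return SZ_1G - block_size;
}
```

Under these assumptions a 1 GiB page holds 512 blocks of 2 MiB or 262144
blocks of 4 KiB, so poisoning at 4 KiB granularity gives up 4 KiB instead of
the whole 1 GiB. How the kernel would actually split the mapping is a
separate question that this sketch does not address.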
--
Mike Kravetz