* Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code
From: Mathieu Desnoyers @ 2020-12-04 20:24 UTC (permalink / raw)
To: Andy Lutomirski, Peter Zijlstra
Cc: linux-arch, Nadav Amit, Arnd Bergmann, riel, Will Deacon, x86,
Dave Hansen, linux-kernel, Nicholas Piggin, linux-mm,
Catalin Marinas, Jann Horn, linuxppc-dev
In-Reply-To: <203d39d11562575fd8bd6a094d97a3a332d8b265.1607059162.git.luto@kernel.org>
----- On Dec 4, 2020, at 12:26 AM, Andy Lutomirski luto@kernel.org wrote:
> The core scheduler isn't a great place for
> membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't
> actually know whether we are lazy. With the old code, if a CPU is
> running a membarrier-registered task, goes idle, gets unlazied via a TLB
> shootdown IPI, and switches back to the membarrier-registered task, it
> will do an unnecessary core sync.
>
> Conveniently, x86 is the only architecture that does anything in this
> hook, so we can just move the code.
>
> XXX: there are some comments in swich_mm_irqs_off() that seem to be
> trying to document what barriers are expected, and it's not clear to me
> that these barriers are actually present in all paths through the
> code. So I think this change makes the code more comprehensible and
> has no effect on the code's correctness, but I'm not at all convinced
> that the code is correct.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> arch/x86/mm/tlb.c | 17 ++++++++++++++++-
> kernel/sched/core.c | 14 +++++++-------
> 2 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 3338a1feccf9..23df035b80e8 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -8,6 +8,7 @@
> #include <linux/export.h>
> #include <linux/cpu.h>
> #include <linux/debugfs.h>
> +#include <linux/sched/mm.h>
>
> #include <asm/tlbflush.h>
> #include <asm/mmu_context.h>
> @@ -496,6 +497,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
> mm_struct *next,
> * from one thread in a process to another thread in the same
> * process. No TLB flush required.
> */
> +
> + // XXX: why is this okay wrt membarrier?
> if (!was_lazy)
> return;
As I recall, when the scheduler switches between threads which belong to
the same mm, it does not have to provide explicit memory barriers for
membarrier because it does not change the "rq->curr->mm" value which is
used as condition in the membarrier loop to send the IPI.
>
> @@ -508,12 +511,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
> mm_struct *next,
> smp_mb();
> next_tlb_gen = atomic64_read(&next->context.tlb_gen);
> if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> - next_tlb_gen)
> + next_tlb_gen) {
> + /*
> + * We're reactivating an mm, and membarrier might
> + * need to serialize. Tell membarrier.
> + */
> +
> + // XXX: I can't understand the logic in
> + // membarrier_mm_sync_core_before_usermode(). What's
> + // the mm check for?
> + membarrier_mm_sync_core_before_usermode(next);
I think you mean the:
if (current->mm != mm)
return;
check at the beginning.
We have to look at it from the scheduler context from which this function is called
(yeah, I know, it's not so great to mix up scheduler and mm states like this).
in finish_task_switch() we have:
struct mm_struct *mm = rq->prev_mm;
[...]
if (mm) {
membarrier_mm_sync_core_before_usermode(mm);
mmdrop(mm);
}
I recall that this current->mm vs rq->prev_mm check is just there to
figure out whether we are in lazy tlb mode, and don't sync core in lazy
tlb mode. Hopefully I'm not stating anything stupid here, maybe Peter
could enlighten us. And you should definitely be careful when calling this
helper from other contexts, as it was originally crafted only for that
single use in the scheduler.
> return;
> + }
>
> /*
> * TLB contents went out of date while we were in lazy
> * mode. Fall through to the TLB switching code below.
> + * No need for an explicit membarrier invocation -- the CR3
> + * write will serialize.
> */
> new_asid = prev_asid;
> need_flush = true;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d95dc3f4644..6c4b76147166 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3619,22 +3619,22 @@ static struct rq *finish_task_switch(struct task_struct
> *prev)
> kcov_finish_switch(current);
>
> fire_sched_in_preempt_notifiers(current);
> +
> /*
> * When switching through a kernel thread, the loop in
> * membarrier_{private,global}_expedited() may have observed that
> * kernel thread and not issued an IPI. It is therefore possible to
> * schedule between user->kernel->user threads without passing though
> * switch_mm(). Membarrier requires a barrier after storing to
> - * rq->curr, before returning to userspace, so provide them here:
> + * rq->curr, before returning to userspace, and mmdrop() provides
> + * this barrier.
> *
> - * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
> - * provided by mmdrop(),
> - * - a sync_core for SYNC_CORE.
> + * XXX: I don't think mmdrop() actually does this. There's no
> + * smp_mb__before/after_atomic() in there.
I recall mmdrop providing a memory barrier. It looks like I event went though the
trouble of documenting it myself. ;-)
static inline void mmdrop(struct mm_struct *mm)
{
/*
* The implicit full barrier implied by atomic_dec_and_test() is
* required by the membarrier system call before returning to
* user-space, after storing to rq->curr.
*/
if (unlikely(atomic_dec_and_test(&mm->mm_count)))
__mmdrop(mm);
}
> */
> - if (mm) {
> - membarrier_mm_sync_core_before_usermode(mm);
OK so here is the meat. The current code is using the (possibly incomplete)
lazy TLB state known by the scheduler to sync core, and it appears it may be
a bit more heavy that what is strictly needed.
Your change instead rely on the internal knowledge of lazy TLB within x86
switch_mm_irqs_off to achieve this, which overall seems like an improvement.
I agree with Nick's comment that it should go on top of his exit_lazy_mm
patches.
Thanks,
Mathieu
> + if (mm)
> mmdrop(mm);
> - }
> +
> if (unlikely(prev_state == TASK_DEAD)) {
> if (prev->sched_class->task_dead)
> prev->sched_class->task_dead(prev);
> --
> 2.28.0
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code
From: Mathieu Desnoyers @ 2020-12-04 20:39 UTC (permalink / raw)
To: Nadav Amit, Peter Zijlstra
Cc: linux-arch, Arnd Bergmann, riel, Will Deacon, x86, Dave Hansen,
linux-kernel, Nicholas Piggin, linux-mm, Andy Lutomirski,
Catalin Marinas, Jann Horn, linuxppc-dev
In-Reply-To: <A61977A7-F0B2-4492-AB6D-06E24417FA59@gmail.com>
----- On Dec 4, 2020, at 3:17 AM, Nadav Amit nadav.amit@gmail.com wrote:
> I am not very familiar with membarrier, but here are my 2 cents while trying
> to answer your questions.
>
>> On Dec 3, 2020, at 9:26 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> @@ -496,6 +497,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
>> mm_struct *next,
>> * from one thread in a process to another thread in the same
>> * process. No TLB flush required.
>> */
>> +
>> + // XXX: why is this okay wrt membarrier?
>> if (!was_lazy)
>> return;
>
> I am confused.
>
> On one hand, it seems that membarrier_private_expedited() would issue an IPI
> to that core, as it would find that this core’s cpu_rq(cpu)->curr->mm is the
> same as the one that the membarrier applies to.
If the scheduler switches from one thread to another which both have the same mm,
it means cpu_rq(cpu)->curr->mm is invariant, even though ->curr changes. So there
is no need to issue a memory barrier or sync core for membarrier in this case,
because there is no way the IPI can be missed.
> But… (see below)
>
>
>> @@ -508,12 +511,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
>> mm_struct *next,
>> smp_mb();
>> next_tlb_gen = atomic64_read(&next->context.tlb_gen);
>> if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
>> - next_tlb_gen)
>> + next_tlb_gen) {
>> + /*
>> + * We're reactivating an mm, and membarrier might
>> + * need to serialize. Tell membarrier.
>> + */
>> +
>> + // XXX: I can't understand the logic in
>> + // membarrier_mm_sync_core_before_usermode(). What's
>> + // the mm check for?
>> + membarrier_mm_sync_core_before_usermode(next);
>
> On the other hand the reason for this mm check that you mention contradicts
> my previous understanding as the git log says:
>
> commit 2840cf02fae627860156737e83326df354ee4ec6
> Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date: Thu Sep 19 13:37:01 2019 -0400
>
> sched/membarrier: Call sync_core only before usermode for same mm
>
> When the prev and next task's mm change, switch_mm() provides the core
> serializing guarantees before returning to usermode. The only case
> where an explicit core serialization is needed is when the scheduler
> keeps the same mm for prev and next.
Hrm, so your point here is that if the scheduler keeps the same mm for
prev and next, it means membarrier will have observed the same rq->curr->mm,
and therefore the IPI won't be missed. I wonder if that
membarrier_mm_sync_core_before_usermode is needed at all then or if we
have just been too careful and did not consider that all the scenarios which
need to be core-sync'd are indeed taken care of ?
I see here that my prior commit message indeed discusses prev and next task's
mm, but in reality, we are comparing current->mm with rq->prev_mm. So from
a lazy TLB perspective, this probably matters, and we may still need a core sync
in some lazy TLB scenarios.
>
>> /*
>> * When switching through a kernel thread, the loop in
>> * membarrier_{private,global}_expedited() may have observed that
>> * kernel thread and not issued an IPI. It is therefore possible to
>> * schedule between user->kernel->user threads without passing though
>> * switch_mm(). Membarrier requires a barrier after storing to
>> - * rq->curr, before returning to userspace, so provide them here:
>> + * rq->curr, before returning to userspace, and mmdrop() provides
>> + * this barrier.
>> *
>> - * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
>> - * provided by mmdrop(),
>> - * - a sync_core for SYNC_CORE.
>> + * XXX: I don't think mmdrop() actually does this. There's no
>> + * smp_mb__before/after_atomic() in there.
>
> I presume that since x86 is the only one that needs
> membarrier_mm_sync_core_before_usermode(), nobody noticed the missing
> smp_mb__before/after_atomic(). These are anyhow a compiler barrier in x86,
> and such a barrier would take place before the return to userspace.
mmdrop already provides the memory barriers for membarrer, as I documented within the
function.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards
From: Nathan Lynch @ 2020-12-04 20:37 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201030011805.1224603-2-nathanl@linux.ibm.com>
Nathan Lynch <nathanl@linux.ibm.com> writes:
> rtas_call_reentrant() isn't platform-dependent; move it out of
> CONFIG_PPC_PSERIES-guarded code.
As reported by the 0-day CI, this is mistaken, and it breaks non-pseries
builds:
>> arch/powerpc/kernel/rtas.c:938:21: error: no member named 'rtas_args_reentrant' in 'struct paca_struct'
args = local_paca->rtas_args_reentrant;
~~~~~~~~~~ ^
1 error generated.
https://lore.kernel.org/linuxppc-dev/202012050432.SFbbjWMw-lkp@intel.com/
I'll drop this and reroll the series.
^ permalink raw reply
* Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts
From: kernel test robot @ 2020-12-04 21:02 UTC (permalink / raw)
To: Alexey Kardashevskiy, linuxppc-dev
Cc: Alexey Kardashevskiy, clang-built-linux, kbuild-all,
Nicholas Piggin
In-Reply-To: <20201203054724.44838-1-aik@ozlabs.ru>
[-- Attachment #1: Type: text/plain, Size: 7629 bytes --]
Hi Alexey,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v5.10-rc6 next-20201204]
[cannot apply to powerpc/next scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201203-135010
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 34816d20f173a90389c8a7e641166d8ea9dce70a
config: powerpc64-randconfig-r023-20201204 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 32c501dd88b62787d3a5ffda7aabcf4650dbe3cd)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# https://github.com/0day-ci/linux/commit/d2a99260ef6d241abe6fb961ee3ed84bbc5db7f5
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201203-135010
git checkout d2a99260ef6d241abe6fb961ee3ed84bbc5db7f5
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:17:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:9:
In file included from include/linux/sched/task.h:11:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
>> arch/powerpc/include/asm/kup.h:71:29: error: redefinition of 'get_kuap'
static inline unsigned long get_kuap(void)
^
arch/powerpc/include/asm/book3s/64/kup-radix.h:152:29: note: previous definition is here
static inline unsigned long get_kuap(void)
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:17:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:9:
In file included from include/linux/sched/task.h:11:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
>> arch/powerpc/include/asm/kup.h:76:20: error: redefinition of 'set_kuap'
static inline void set_kuap(unsigned long value) { }
^
arch/powerpc/include/asm/book3s/64/kup-radix.h:157:20: note: previous definition is here
static inline void set_kuap(unsigned long value) { }
^
In file included from arch/powerpc/kernel/asm-offsets.c:21:
include/linux/mman.h:155:9: warning: division by zero is undefined [-Wdivision-by-zero]
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/mman.h:133:21: note: expanded from macro '_calc_vm_trans'
: ((x) & (bit1)) / ((bit1) / (bit2))))
^ ~~~~~~~~~~~~~~~~~
include/linux/mman.h:156:9: warning: division by zero is undefined [-Wdivision-by-zero]
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/mman.h:133:21: note: expanded from macro '_calc_vm_trans'
: ((x) & (bit1)) / ((bit1) / (bit2))))
^ ~~~~~~~~~~~~~~~~~
2 warnings and 2 errors generated.
--
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:17:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:9:
In file included from include/linux/sched/task.h:11:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
>> arch/powerpc/include/asm/kup.h:71:29: error: redefinition of 'get_kuap'
static inline unsigned long get_kuap(void)
^
arch/powerpc/include/asm/book3s/64/kup-radix.h:152:29: note: previous definition is here
static inline unsigned long get_kuap(void)
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:17:
In file included from include/linux/fs.h:33:
In file included from include/linux/percpu-rwsem.h:7:
In file included from include/linux/rcuwait.h:6:
In file included from include/linux/sched/signal.h:9:
In file included from include/linux/sched/task.h:11:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
>> arch/powerpc/include/asm/kup.h:76:20: error: redefinition of 'set_kuap'
static inline void set_kuap(unsigned long value) { }
^
arch/powerpc/include/asm/book3s/64/kup-radix.h:157:20: note: previous definition is here
static inline void set_kuap(unsigned long value) { }
^
In file included from arch/powerpc/kernel/asm-offsets.c:21:
include/linux/mman.h:155:9: warning: division by zero is undefined [-Wdivision-by-zero]
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/mman.h:133:21: note: expanded from macro '_calc_vm_trans'
: ((x) & (bit1)) / ((bit1) / (bit2))))
^ ~~~~~~~~~~~~~~~~~
include/linux/mman.h:156:9: warning: division by zero is undefined [-Wdivision-by-zero]
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/mman.h:133:21: note: expanded from macro '_calc_vm_trans'
: ((x) & (bit1)) / ((bit1) / (bit2))))
^ ~~~~~~~~~~~~~~~~~
2 warnings and 2 errors generated.
make[2]: *** [scripts/Makefile.build:117: arch/powerpc/kernel/asm-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1198: prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:185: __sub-make] Error 2
make: Target 'prepare' not remade because of errors.
vim +/get_kuap +71 arch/powerpc/include/asm/kup.h
69
70 static inline void kuap_check_amr(void) { }
> 71 static inline unsigned long get_kuap(void)
72 {
73 return AMR_KUAP_BLOCKED;
74 }
75
> 76 static inline void set_kuap(unsigned long value) { }
77
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 41007 bytes --]
^ permalink raw reply
* Re: [PATCH v3 15/18] ibmvfc: send Cancel MAD down each hw scsi channel
From: Brian King @ 2020-12-04 21:26 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201203020806.14747-16-tyreld@linux.ibm.com>
On 12/2/20 8:08 PM, Tyrel Datwyler wrote:
> In general the client needs to send Cancel MADs and task management
> commands down the same channel as the command(s) intended to cancel or
> abort. The client assigns cancel keys per LUN and thus must send a
> Cancel down each channel commands were submitted for that LUN. Further,
> the client then must wait for those cancel completions prior to
> submitting a LUN RESET or ABORT TASK SET.
>
> Add a cancel event pointer and cancel rsp iu storage to the
> ibmvfc_sub_queue struct such that the cancel routine can assign a cancel
> event to each applicable queue. When in legacy CRQ mode we fake treating
> it as a subqueue by using a subqueue struct allocated on the stack. Wait
> for completion of each submitted cancel.
>
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 104 ++++++++++++++++++++++-----------
> drivers/scsi/ibmvscsi/ibmvfc.h | 38 ++++++------
> 2 files changed, 90 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index ec3db5a6baf3..e353b9e88104 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2339,67 +2339,103 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
> {
> struct ibmvfc_host *vhost = shost_priv(sdev->host);
> struct ibmvfc_event *evt, *found_evt;
> - union ibmvfc_iu rsp;
> - int rsp_rc = -EBUSY;
> + struct ibmvfc_sub_queue *scrqs;
> + struct ibmvfc_sub_queue legacy_crq;
> + int rsp_rc = 0;
> unsigned long flags;
> u16 status;
> + int cancel_cnt = 0;
> + int num_hwq;
> + int ret = 0;
> + int i;
>
> ENTER;
> spin_lock_irqsave(vhost->host->host_lock, flags);
> - found_evt = NULL;
> - list_for_each_entry(evt, &vhost->sent, queue) {
> - if (evt->cmnd && evt->cmnd->device == sdev) {
> - found_evt = evt;
> + if (vhost->using_channels && vhost->scsi_scrqs.active_queues) {
> + num_hwq = vhost->scsi_scrqs.active_queues;
> + scrqs = vhost->scsi_scrqs.scrqs;
> + } else {
> + /* Use ibmvfc_sub_queue on the stack to fake legacy CRQ as a subqueue */
> + num_hwq = 1;
> + scrqs = &legacy_crq;
> + }
> +
> + for (i = 0; i < num_hwq; i++) {
> + scrqs[i].cancel_event = NULL;
> + found_evt = NULL;
> + list_for_each_entry(evt, &vhost->sent, queue) {
> + if (evt->cmnd && evt->cmnd->device == sdev && evt->hwq == i) {
> + found_evt = evt;
> + cancel_cnt++;
> + break;
> + }
> + }
> +
> + if (!found_evt)
> + continue;
> +
> + if (vhost->logged_in) {
> + scrqs[i].cancel_event = ibmvfc_init_tmf(vhost, sdev, type);
> + scrqs[i].cancel_event->hwq = i;
> + scrqs[i].cancel_event->sync_iu = &scrqs[i].cancel_rsp;
> + rsp_rc = ibmvfc_send_event(scrqs[i].cancel_event, vhost, default_timeout);
> + if (rsp_rc)
> + break;
It looks like if you have two outstanding commands, on two different hwqs, and you succeed
in sending a cancel for the first hwq but fail sending it for the second hwq due to
something happening like a xport event of some sort, then you would end up falling down
into free_events where you'd call ibmvfc_free_event which will do a list_add_tail to add
the event to the free list without having even pulled the event off the sent list, which
will result in list corruption as now the free list and sent list will be intermingled.
It would probably be better to only free the events if you never sent them or if you
are sure they completed. So, you might need to have to wait for the completion of
the cancel events that did get sent, which would likely be completed via purge_all.
> + } else {
> + rsp_rc = -EBUSY;
> break;
> }
> }
>
> - if (!found_evt) {
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> +
> + if (!cancel_cnt) {
> if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL)
> sdev_printk(KERN_INFO, sdev, "No events found to cancel\n");
> - spin_unlock_irqrestore(vhost->host->host_lock, flags);
> return 0;
> }
>
> - if (vhost->logged_in) {
> - evt = ibmvfc_init_tmf(vhost, sdev, type);
> - evt->sync_iu = &rsp;
> - rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout);
> - }
> -
> - spin_unlock_irqrestore(vhost->host->host_lock, flags);
> -
> if (rsp_rc != 0) {
> sdev_printk(KERN_ERR, sdev, "Failed to send cancel event. rc=%d\n", rsp_rc);
> /* If failure is received, the host adapter is most likely going
> through reset, return success so the caller will wait for the command
> being cancelled to get returned */
> - return 0;
> + goto free_events;
> }
>
> sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands.\n");
>
> - wait_for_completion(&evt->comp);
> - status = be16_to_cpu(rsp.mad_common.status);
> - spin_lock_irqsave(vhost->host->host_lock, flags);
> - ibmvfc_free_event(evt);
> - spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + for (i = 0; i < num_hwq; i++) {
> + if (!scrqs[i].cancel_event)
> + continue;
>
> - if (status != IBMVFC_MAD_SUCCESS) {
> - sdev_printk(KERN_WARNING, sdev, "Cancel failed with rc=%x\n", status);
> - switch (status) {
> - case IBMVFC_MAD_DRIVER_FAILED:
> - case IBMVFC_MAD_CRQ_ERROR:
> - /* Host adapter most likely going through reset, return success to
> - the caller will wait for the command being cancelled to get returned */
> - return 0;
> - default:
> - return -EIO;
> - };
> + wait_for_completion(&scrqs[i].cancel_event->comp);
> + status = be16_to_cpu(scrqs[i].cancel_rsp.mad_common.status);
> +
> + if (status != IBMVFC_MAD_SUCCESS) {
> + sdev_printk(KERN_WARNING, sdev, "Cancel failed with rc=%x\n", status);
> + switch (status) {
> + case IBMVFC_MAD_DRIVER_FAILED:
> + case IBMVFC_MAD_CRQ_ERROR:
> + /* Host adapter most likely going through reset, return success to
> + the caller will wait for the command being cancelled to get returned */
> + goto free_events;
Similar comment here... What about the rest of the outstanding cancel commands? Do you need
to wait for those to complete before freeing them?
> + default:
> + ret = -EIO;
> + goto free_events;
> + };
> + }
> }
>
> sdev_printk(KERN_INFO, sdev, "Successfully cancelled outstanding commands\n");
> - return 0;
> +free_events:
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + for (i = 0; i < num_hwq; i++)
> + if (scrqs[i].cancel_event)
> + ibmvfc_free_event(scrqs[i].cancel_event);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> +
> + return ret;
> }
>
> /**
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 17/18] ibmvfc: provide modules parameters for MQ settings
From: Brian King @ 2020-12-04 21:28 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201203020806.14747-18-tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 18/18] ibmvfc: drop host lock when completing commands in CRQ
From: Brian King @ 2020-12-04 21:35 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20201203020806.14747-19-tyreld@linux.ibm.com>
On 12/2/20 8:08 PM, Tyrel Datwyler wrote:
> The legacy CRQ holds the host lock the even while completing commands.
> This presents a problem when in legacy single queue mode and
> nr_hw_queues is greater than one since calling scsi_done() introduces
> the potential for deadlock.
>
> If nr_hw_queues is greater than one drop the hostlock in the legacy CRQ
> path when completing a command.
>
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index e499599662ec..e2200bdff2be 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2969,6 +2969,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
> {
> long rc;
> struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
>
> switch (crq->valid) {
> case IBMVFC_CRQ_INIT_RSP:
> @@ -3039,7 +3040,12 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
> del_timer(&evt->timer);
> list_del(&evt->queue);
> ibmvfc_trc_end(evt);
> - evt->done(evt);
> + if (nr_scsi_hw_queues > 1) {
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + } else
> + evt->done(evt);
Similar comment here as previously. The flags parameter is an output for
spin_lock_irqsave but an input for spin_unlock_irqrestore. You'll need
to rethink the locking here. You could just do a spin_unlock_irq / spin_lock_irq
here and that would probably be OK, but probably isn't the best.
> }
>
> /**
>
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* [powerpc:next] BUILD SUCCESS c9344769e2b46ba28b947bec7a8a8f0a091ecd57
From: kernel test robot @ 2020-12-04 22:26 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: c9344769e2b46ba28b947bec7a8a8f0a091ecd57 selftests/powerpc: Fix uninitialized variable warning
elapsed time: 1926m
configs tested: 161
configs skipped: 4
The following configs have been built successfully.
More configs may be tested in the coming days.
gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
arm shannon_defconfig
mips decstation_defconfig
mips mpc30x_defconfig
mips ci20_defconfig
nios2 alldefconfig
arc vdk_hs38_defconfig
powerpc cm5200_defconfig
arm tango4_defconfig
mips decstation_r4k_defconfig
mips mtx1_defconfig
mips qi_lb60_defconfig
arm omap1_defconfig
arm h5000_defconfig
m68k amiga_defconfig
sh r7785rp_defconfig
sh microdev_defconfig
m68k m5275evb_defconfig
nds32 alldefconfig
mips fuloong2e_defconfig
sh se7724_defconfig
powerpc obs600_defconfig
arm pcm027_defconfig
arm dove_defconfig
mips tb0219_defconfig
m68k m5475evb_defconfig
powerpc ep8248e_defconfig
powerpc mpc512x_defconfig
arm orion5x_defconfig
arm shmobile_defconfig
arm u8500_defconfig
powerpc ppc6xx_defconfig
arm vf610m4_defconfig
sh se7750_defconfig
powerpc mgcoge_defconfig
ia64 tiger_defconfig
mips jmr3927_defconfig
xtensa iss_defconfig
arm ixp4xx_defconfig
powerpc motionpro_defconfig
nds32 defconfig
mips jazz_defconfig
powerpc cell_defconfig
ia64 alldefconfig
powerpc tqm5200_defconfig
arc haps_hs_defconfig
arm spitz_defconfig
arm exynos_defconfig
powerpc asp8347_defconfig
arm nhk8815_defconfig
arm colibri_pxa300_defconfig
powerpc chrp32_defconfig
sh urquell_defconfig
arm netwinder_defconfig
mips gpr_defconfig
powerpc skiroot_defconfig
arm aspeed_g4_defconfig
arm hackkit_defconfig
sparc sparc64_defconfig
powerpc wii_defconfig
arm versatile_defconfig
arc nsimosci_hs_defconfig
mips maltasmvp_eva_defconfig
arm trizeps4_defconfig
arm spear3xx_defconfig
powerpc redwood_defconfig
m68k m5208evb_defconfig
arm stm32_defconfig
powerpc pseries_defconfig
arm prima2_defconfig
sh titan_defconfig
powerpc eiger_defconfig
sh lboxre2_defconfig
mips ip32_defconfig
arm s3c2410_defconfig
xtensa defconfig
c6x evmc6474_defconfig
powerpc amigaone_defconfig
powerpc mpc834x_itxgp_defconfig
arm vt8500_v6_v7_defconfig
parisc defconfig
arm keystone_defconfig
sh se7343_defconfig
powerpc tqm8540_defconfig
arm pxa_defconfig
arm omap2plus_defconfig
powerpc socrates_defconfig
xtensa smp_lx200_defconfig
c6x evmc6472_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
s390 allyesconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a004-20201204
x86_64 randconfig-a006-20201204
x86_64 randconfig-a002-20201204
x86_64 randconfig-a001-20201204
x86_64 randconfig-a005-20201204
x86_64 randconfig-a003-20201204
i386 randconfig-a005-20201204
i386 randconfig-a004-20201204
i386 randconfig-a001-20201204
i386 randconfig-a002-20201204
i386 randconfig-a006-20201204
i386 randconfig-a003-20201204
i386 randconfig-a014-20201204
i386 randconfig-a013-20201204
i386 randconfig-a011-20201204
i386 randconfig-a015-20201204
i386 randconfig-a012-20201204
i386 randconfig-a016-20201204
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv rv32_defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 kexec
x86_64 rhel-8.3
clang tested configs:
x86_64 randconfig-a016-20201204
x86_64 randconfig-a012-20201204
x86_64 randconfig-a014-20201204
x86_64 randconfig-a013-20201204
x86_64 randconfig-a015-20201204
x86_64 randconfig-a011-20201204
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* [PATCH] powerpc: Stop exporting __clear_user which is now inlined.
From: Michal Suchanek @ 2020-12-04 23:28 UTC (permalink / raw)
To: stable; +Cc: linux-kernel, Paul Mackerras, Michal Suchanek, linuxppc-dev
Stable commit 452e2a83ea23 ("powerpc: Fix __clear_user() with KUAP
enabled") redefines __clear_user as inline function but does not remove
the export.
Fixes: 452e2a83ea23 ("powerpc: Fix __clear_user() with KUAP enabled")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
arch/powerpc/lib/ppc_ksyms.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/lib/ppc_ksyms.c b/arch/powerpc/lib/ppc_ksyms.c
index c7f8e9586316..4b81fd96aa3e 100644
--- a/arch/powerpc/lib/ppc_ksyms.c
+++ b/arch/powerpc/lib/ppc_ksyms.c
@@ -24,7 +24,6 @@ EXPORT_SYMBOL(csum_tcpudp_magic);
#endif
EXPORT_SYMBOL(__copy_tofrom_user);
-EXPORT_SYMBOL(__clear_user);
EXPORT_SYMBOL(copy_page);
#ifdef CONFIG_PPC64
--
2.26.2
^ permalink raw reply related
* Re: [PATCH] ASoC: fsl_aud2htx: mark PM functions as __maybe_unused
From: Mark Brown @ 2020-12-04 23:30 UTC (permalink / raw)
To: Arnd Bergmann, Nicolin Chen, Shengjiu Wang, Liam Girdwood,
Timur Tabi, Takashi Iwai, Xiubo Li, Jaroslav Kysela
Cc: alsa-devel, Arnd Bergmann, linuxppc-dev, linux-kernel,
Fabio Estevam, Shengjiu Wang
In-Reply-To: <20201203222900.1042578-1-arnd@kernel.org>
On Thu, 3 Dec 2020 23:28:47 +0100, Arnd Bergmann wrote:
> When CONFIG_PM is disabled, we get a warning for unused functions:
>
> sound/soc/fsl/fsl_aud2htx.c:261:12: error: unused function 'fsl_aud2htx_runtime_suspend' [-Werror,-Wunused-function]
> static int fsl_aud2htx_runtime_suspend(struct device *dev)
> sound/soc/fsl/fsl_aud2htx.c:271:12: error: unused function 'fsl_aud2htx_runtime_resume' [-Werror,-Wunused-function]
> static int fsl_aud2htx_runtime_resume(struct device *dev)
>
> [...]
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[1/1] ASoC: fsl_aud2htx: mark PM functions as __maybe_unused
commit: 7b153760513cee875515398f4a9ba329a8d426e2
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply
* Re: [PATCH v3 04/18] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
From: Tyrel Datwyler @ 2020-12-05 0:15 UTC (permalink / raw)
To: Brian King, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <b372b257-49d8-16ae-2390-9617222e4cd9@linux.vnet.ibm.com>
On 12/4/20 6:47 AM, Brian King wrote:
> On 12/2/20 8:07 PM, Tyrel Datwyler wrote:
>> @@ -4983,6 +4993,118 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
>> return retrc;
>> }
>>
>> +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
>> + int index)
>> +{
>> + struct device *dev = vhost->dev;
>> + struct vio_dev *vdev = to_vio_dev(dev);
>> + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
>> + int rc = -ENOMEM;
>> +
>> + ENTER;
>> +
>> + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
>> + if (!scrq->msgs)
>> + return rc;
>> +
>> + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
>> + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
>> + DMA_BIDIRECTIONAL);
>> +
>> + if (dma_mapping_error(dev, scrq->msg_token))
>> + goto dma_map_failed;
>> +
>> + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
>> + &scrq->cookie, &scrq->hw_irq);
>> +
>> + if (rc) {
>> + dev_warn(dev, "Error registering sub-crq: %d\n", rc);
>> + if (rc == H_PARAMETER)
>> + dev_warn_once(dev, "Firmware may not support MQ\n");
>> + goto reg_failed;
>> + }
>> +
>> + scrq->hwq_id = index;
>> + scrq->vhost = vhost;
>> +
>> + LEAVE;
>> + return 0;
>> +
>> +reg_failed:
>> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
>> +dma_map_failed:
>> + free_page((unsigned long)scrq->msgs);
>> + LEAVE;
>> + return rc;
>> +}
>> +
>> +static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index)
>> +{
>> + struct device *dev = vhost->dev;
>> + struct vio_dev *vdev = to_vio_dev(dev);
>> + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
>> + long rc;
>> +
>> + ENTER;
>> +
>> + do {
>> + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
>> + scrq->cookie);
>> + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
>> +
>> + if (rc)
>> + dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc);
>> +
>> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
>> + free_page((unsigned long)scrq->msgs);
>> + LEAVE;
>> +}
>> +
>> +static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
>> +{
>> + int i, j;
>> +
>> + ENTER;
>> +
>> + vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES,
>> + sizeof(*vhost->scsi_scrqs.scrqs),
>> + GFP_KERNEL);
>> + if (!vhost->scsi_scrqs.scrqs)
>> + return -1;
>> +
>> + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
>> + if (ibmvfc_register_scsi_channel(vhost, i)) {
>> + for (j = i; j > 0; j--)
>> + ibmvfc_deregister_scsi_channel(vhost, j - 1);
>> + kfree(vhost->scsi_scrqs.scrqs);
>> + vhost->scsi_scrqs.scrqs = NULL;
>> + vhost->scsi_scrqs.active_queues = 0;
>> + LEAVE;
>> + return -1;
>> + }
>> + }
>> +
>> + LEAVE;
>> + return 0;
>> +}
>> +
>> +static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
>> +{
>> + int i;
>> +
>> + ENTER;
>> + if (!vhost->scsi_scrqs.scrqs)
>> + return;
>> +
>> + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++)
>> + ibmvfc_deregister_scsi_channel(vhost, i);
>> +
>> + kfree(vhost->scsi_scrqs.scrqs);
>> + vhost->scsi_scrqs.scrqs = NULL;
>> + vhost->scsi_scrqs.active_queues = 0;
>> + LEAVE;
>> +}
>> +
>> /**
>> * ibmvfc_free_mem - Free memory for vhost
>> * @vhost: ibmvfc host struct
>> @@ -5239,6 +5361,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>> goto remove_shost;
>> }
>>
>> + if (vhost->mq_enabled) {
>> + rc = ibmvfc_init_sub_crqs(vhost);
>> + if (rc)
>> + dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", rc);
>
> So, I think if you end up down this path, you will have:
>
> vhost->scsi_scrqs.scrqs == NULL
> vhost->scsi_scrqs.active_queues == 0
>
> And you proceed with discovery. You will proceed with enquiry and channel setup.
> Then, I think you could end up in queuecommand doing this
>
> evt->hwq = hwq % vhost->scsi_scrqs.active_queues;
>
> And that is a divide by zero...
Actually, we would bite the dust earlier than that but it requires the sub-crq
allocation to fail for a reason other than lack of firmware support. In the no
firmware support case the VIOS doesn't report channel support and we skip the
enquiry and setup steps. However, in the case where there is support and
allocation fails we would dereference a NULL pointer trying to write the channel
sub-crq handles into the channel_setup MAD.
>
> I wonder if it would be better in this scenario where registering the sub crqs fails,
> if you just did:
>
> vhost->do_enquiry = 0;
> vhost->mq_enabled = 0;
> vhost->using_channels = 0;
>
> It looks like you only try to allocate the subcrqs in probe, so if that fails, we'd
> never end up using mq, so just disabling in this case seems reasonable.
This breaks migration from legacy to a target with channel support. It appears
that migration for that case is already broken anyways. Need to rethink sub-crq
setup. Maybe best to actually do it during the negoation steps instead of in probe.
-Tyrel
>
> Thanks,
>
> Brian
>
^ permalink raw reply
* Re: [PATCH v3 06/18] ibmvfc: add handlers to drain and complete Sub-CRQ responses
From: Tyrel Datwyler @ 2020-12-05 0:16 UTC (permalink / raw)
To: Brian King, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <3fe8683a-47f6-8713-762a-02c57c2e4ec2@linux.vnet.ibm.com>
On 12/4/20 6:51 AM, Brian King wrote:
> On 12/2/20 8:07 PM, Tyrel Datwyler wrote:
>> The logic for iterating over the Sub-CRQ responses is similiar to that
>> of the primary CRQ. Add the necessary handlers for processing those
>> responses.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>> ---
>> drivers/scsi/ibmvscsi/ibmvfc.c | 80 ++++++++++++++++++++++++++++++++++
>> 1 file changed, 80 insertions(+)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index e082935f56cf..b61ae1df21e5 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -3381,6 +3381,86 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable)
>> return rc;
>> }
>>
>> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
>> +{
>> + struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
>> + unsigned long flags;
>> +
>> + switch (crq->valid) {
>> + case IBMVFC_CRQ_CMD_RSP:
>> + break;
>> + case IBMVFC_CRQ_XPORT_EVENT:
>> + return;
>> + default:
>> + dev_err(vhost->dev, "Got and invalid message type 0x%02x\n", crq->valid);
>> + return;
>> + }
>> +
>> + /* The only kind of payload CRQs we should get are responses to
>> + * things we send. Make sure this response is to something we
>> + * actually sent
>> + */
>> + if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
>> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n",
>> + crq->ioba);
>> + return;
>> + }
>> +
>> + if (unlikely(atomic_read(&evt->free))) {
>> + dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n",
>> + crq->ioba);
>> + return;
>> + }
>> +
>> + del_timer(&evt->timer);
>> + list_del(&evt->queue);
>> + ibmvfc_trc_end(evt);> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
>
> You can't do this here... You are grabbing the host lock in ibmvfc_drain_sub_crq
> and saving the irqflags to a local in that function, then doing a spin_unlock_irqrestore
> and restoring irqflags using an uninitialized local in this function...
>
> I'm assuming this will get sorted out with the locking changes we've been discussing off-list...
Correct, moving to per-queue locks and flags stored in the queue struct.
-Tyrel
>
>
>> + evt->done(evt);
>> + spin_lock_irqsave(vhost->host->host_lock, flags);
>> +}
>> +
>> +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
>> +{
>> + struct ibmvfc_crq *crq;
>> +
>> + crq = &scrq->msgs[scrq->cur].crq;
>> + if (crq->valid & 0x80) {
>> + if (++scrq->cur == scrq->size)
>> + scrq->cur = 0;
>> + rmb();
>> + } else
>> + crq = NULL;
>> +
>> + return crq;
>> +}
>> +
>> +static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq)
>> +{
>> + struct ibmvfc_crq *crq;
>> + unsigned long flags;
>> + int done = 0;
>> +
>> + spin_lock_irqsave(scrq->vhost->host->host_lock, flags);
>> + while (!done) {
>> + while ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
>> + ibmvfc_handle_scrq(crq, scrq->vhost);
>> + crq->valid = 0;
>> + wmb();
>> + }
>> +
>> + ibmvfc_toggle_scrq_irq(scrq, 1);
>> + if ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
>> + ibmvfc_toggle_scrq_irq(scrq, 0);
>> + ibmvfc_handle_scrq(crq, scrq->vhost);
>> + crq->valid = 0;
>> + wmb();
>> + } else
>> + done = 1;
>> + }
>> + spin_unlock_irqrestore(scrq->vhost->host->host_lock, flags);
>> +}
>> +
>> /**
>> * ibmvfc_init_tgt - Set the next init job step for the target
>> * @tgt: ibmvfc target struct
>>
>
>
^ permalink raw reply
* Re: [PATCH v3 18/18] ibmvfc: drop host lock when completing commands in CRQ
From: Tyrel Datwyler @ 2020-12-05 0:20 UTC (permalink / raw)
To: Brian King, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <b048ede5-e673-4ba9-3c28-df077aa4467a@linux.vnet.ibm.com>
On 12/4/20 1:35 PM, Brian King wrote:
> On 12/2/20 8:08 PM, Tyrel Datwyler wrote:
>> The legacy CRQ holds the host lock the even while completing commands.
>> This presents a problem when in legacy single queue mode and
>> nr_hw_queues is greater than one since calling scsi_done() introduces
>> the potential for deadlock.
>>
>> If nr_hw_queues is greater than one drop the hostlock in the legacy CRQ
>> path when completing a command.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>> ---
>> drivers/scsi/ibmvscsi/ibmvfc.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index e499599662ec..e2200bdff2be 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -2969,6 +2969,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
>> {
>> long rc;
>> struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
>> + unsigned long flags;
>>
>> switch (crq->valid) {
>> case IBMVFC_CRQ_INIT_RSP:
>> @@ -3039,7 +3040,12 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
>> del_timer(&evt->timer);
>> list_del(&evt->queue);
>> ibmvfc_trc_end(evt);
>> - evt->done(evt);
>> + if (nr_scsi_hw_queues > 1) {
>> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
>> + evt->done(evt);
>> + spin_lock_irqsave(vhost->host->host_lock, flags);
>> + } else
>> + evt->done(evt);
>
> Similar comment here as previously. The flags parameter is an output for
> spin_lock_irqsave but an input for spin_unlock_irqrestore. You'll need
> to rethink the locking here. You could just do a spin_unlock_irq / spin_lock_irq
> here and that would probably be OK, but probably isn't the best.
>
Yeah, this will also get its own lock and flags saved in the per-queue struct in
the next spin.
-Tyrel
>> }
>>
>> /**
>>
>
>
^ permalink raw reply
* [powerpc:next-test] BUILD REGRESSION 4e4ed87981c764498942c52004c620bb8f104eac
From: kernel test robot @ 2020-12-05 1:23 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 4e4ed87981c764498942c52004c620bb8f104eac powerpc/pseries/mobility: refactor node lookup during DT update
Error/Warning reports:
https://lore.kernel.org/linuxppc-dev/202012042220.zO7hSFT2-lkp@intel.com
https://lore.kernel.org/linuxppc-dev/202012050432.SFbbjWMw-lkp@intel.com
Error/Warning in current branch:
arch/powerpc/kernel/rtas.c:938:21: error: no member named 'rtas_args_reentrant' in 'struct paca_struct'
arch/powerpc/kernel/vdso32/vgettimeofday.c:13:5: warning: no previous prototype for function '__c_kernel_clock_gettime64' [-Wmissing-prototypes]
arch/powerpc/kernel/vdso32/vgettimeofday.c:19:5: error: conflicting types for '__c_kernel_clock_getres'
arch/powerpc/kernel/vdso32/vgettimeofday.c:7:5: error: conflicting types for '__c_kernel_clock_gettime'
Error/Warning ids grouped by kconfigs:
clang_recent_errors
|-- powerpc-randconfig-r016-20201204
| `-- arch-powerpc-kernel-rtas.c:error:no-member-named-rtas_args_reentrant-in-struct-paca_struct
`-- powerpc64-randconfig-r011-20201204
|-- arch-powerpc-kernel-vdso32-vgettimeofday.c:error:conflicting-types-for-__c_kernel_clock_getres
|-- arch-powerpc-kernel-vdso32-vgettimeofday.c:error:conflicting-types-for-__c_kernel_clock_gettime
`-- arch-powerpc-kernel-vdso32-vgettimeofday.c:warning:no-previous-prototype-for-function-__c_kernel_clock_gettime64
elapsed time: 726m
configs tested: 101
configs skipped: 2
gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
m68k amiga_defconfig
sh r7785rp_defconfig
sh microdev_defconfig
m68k m5275evb_defconfig
arm shmobile_defconfig
um x86_64_defconfig
ia64 tiger_defconfig
arm mv78xx0_defconfig
ia64 zx1_defconfig
nios2 allyesconfig
parisc generic-64bit_defconfig
powerpc sbc8548_defconfig
powerpc skiroot_defconfig
sh lboxre2_defconfig
powerpc motionpro_defconfig
mips jazz_defconfig
powerpc cell_defconfig
ia64 alldefconfig
mips maltasmvp_eva_defconfig
arm trizeps4_defconfig
arm spear3xx_defconfig
powerpc redwood_defconfig
m68k m5208evb_defconfig
arm stm32_defconfig
powerpc pseries_defconfig
powerpc amigaone_defconfig
powerpc mpc834x_itxgp_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a004-20201204
x86_64 randconfig-a006-20201204
x86_64 randconfig-a002-20201204
x86_64 randconfig-a001-20201204
x86_64 randconfig-a005-20201204
x86_64 randconfig-a003-20201204
i386 randconfig-a005-20201204
i386 randconfig-a004-20201204
i386 randconfig-a001-20201204
i386 randconfig-a002-20201204
i386 randconfig-a006-20201204
i386 randconfig-a003-20201204
i386 randconfig-a014-20201204
i386 randconfig-a013-20201204
i386 randconfig-a011-20201204
i386 randconfig-a015-20201204
i386 randconfig-a012-20201204
i386 randconfig-a016-20201204
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv rv32_defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 kexec
clang tested configs:
x86_64 randconfig-a016-20201204
x86_64 randconfig-a012-20201204
x86_64 randconfig-a014-20201204
x86_64 randconfig-a013-20201204
x86_64 randconfig-a015-20201204
x86_64 randconfig-a011-20201204
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* [powerpc:merge] BUILD SUCCESS 466556ab940c1dfc7aee8db02a8329f1f3efed3d
From: kernel test robot @ 2020-12-05 1:23 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 466556ab940c1dfc7aee8db02a8329f1f3efed3d Automatic merge of 'next' into merge (2020-12-05 00:01)
elapsed time: 727m
configs tested: 115
configs skipped: 3
The following configs have been built successfully.
More configs may be tested in the coming days.
gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
m68k amiga_defconfig
sh r7785rp_defconfig
sh microdev_defconfig
m68k m5275evb_defconfig
arm shmobile_defconfig
um x86_64_defconfig
ia64 tiger_defconfig
arm mv78xx0_defconfig
ia64 zx1_defconfig
arm lpc18xx_defconfig
sh titan_defconfig
mips tb0226_defconfig
nios2 allyesconfig
parisc generic-64bit_defconfig
powerpc sbc8548_defconfig
powerpc skiroot_defconfig
sh lboxre2_defconfig
powerpc motionpro_defconfig
mips jazz_defconfig
powerpc cell_defconfig
ia64 alldefconfig
powerpc tqm5200_defconfig
arc haps_hs_defconfig
arm spitz_defconfig
arm exynos_defconfig
arm aspeed_g4_defconfig
arm hackkit_defconfig
sparc sparc64_defconfig
powerpc wii_defconfig
arm versatile_defconfig
arc nsimosci_hs_defconfig
mips maltasmvp_eva_defconfig
arm trizeps4_defconfig
arm spear3xx_defconfig
nds32 alldefconfig
powerpc fsp2_defconfig
arm ep93xx_defconfig
xtensa audio_kc705_defconfig
sh hp6xx_defconfig
powerpc amigaone_defconfig
powerpc mpc834x_itxgp_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
csky defconfig
alpha defconfig
alpha allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
xtensa allyesconfig
parisc defconfig
s390 allyesconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a004-20201204
x86_64 randconfig-a006-20201204
x86_64 randconfig-a002-20201204
x86_64 randconfig-a001-20201204
x86_64 randconfig-a005-20201204
x86_64 randconfig-a003-20201204
i386 randconfig-a005-20201204
i386 randconfig-a004-20201204
i386 randconfig-a001-20201204
i386 randconfig-a002-20201204
i386 randconfig-a006-20201204
i386 randconfig-a003-20201204
i386 randconfig-a014-20201204
i386 randconfig-a013-20201204
i386 randconfig-a011-20201204
i386 randconfig-a015-20201204
i386 randconfig-a012-20201204
i386 randconfig-a016-20201204
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv rv32_defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 kexec
clang tested configs:
x86_64 randconfig-a016-20201204
x86_64 randconfig-a012-20201204
x86_64 randconfig-a014-20201204
x86_64 randconfig-a013-20201204
x86_64 randconfig-a015-20201204
x86_64 randconfig-a011-20201204
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* Re: [PATCH] macintosh/adb-iop: Always wait for reply message from IOP
From: Finn Thain @ 2020-12-05 3:09 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: linuxppc-dev, Linux Kernel Mailing List, Joshua Thompson
In-Reply-To: <CAMuHMdX5yUaCWYsM7WgatYSDLZMcSckugOQxBBnBZOB_eJm=1g@mail.gmail.com>
On Fri, 4 Dec 2020, Geert Uytterhoeven wrote:
> Hi Finn,
>
> On Fri, Nov 20, 2020 at 5:54 AM Finn Thain <fthain@telegraphics.com.au> wrote:
> > A recent patch incorrectly altered the adb-iop state machine behaviour
> > and introduced a regression that can appear intermittently as a
> > malfunctioning ADB input device. This seems to be caused when reply
> > packets from different ADB commands become mixed up, especially during
> > the adb bus scan. Fix this by unconditionally entering the awaiting_reply
> > state after sending an explicit command, even when the ADB command won't
> > generate a reply from the ADB device.
> >
> > Cc: Joshua Thompson <funaho@jurai.org>
> > Fixes: e2954e5f727f ("macintosh/adb-iop: Implement sending -> idle state transition")
> > Tested-by: Stan Johnson <userm57@yahoo.com>
> > Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
>
> Thanks for your patch!
>
> > --- a/drivers/macintosh/adb-iop.c
> > +++ b/drivers/macintosh/adb-iop.c
> > @@ -84,10 +84,7 @@ static void adb_iop_complete(struct iop_msg *msg)
> >
> > local_irq_save(flags);
> >
> > - if (current_req->reply_expected)
> > - adb_iop_state = awaiting_reply;
> > - else
> > - adb_iop_done();
> > + adb_iop_state = awaiting_reply;
> >
> > local_irq_restore(flags);
> > }
> > @@ -95,8 +92,9 @@ static void adb_iop_complete(struct iop_msg *msg)
> > /*
> > * Listen for ADB messages from the IOP.
> > *
> > - * This will be called when unsolicited messages (usually replies to TALK
> > - * commands or autopoll packets) are received.
> > + * This will be called when unsolicited IOP messages are received.
> > + * These IOP messages can carry ADB autopoll responses and also occur
> > + * after explicit ADB commands.
> > */
> >
> > static void adb_iop_listen(struct iop_msg *msg)
> > @@ -123,8 +121,10 @@ static void adb_iop_listen(struct iop_msg *msg)
> > if (adb_iop_state == awaiting_reply) {
> > struct adb_request *req = current_req;
> >
> > - req->reply_len = amsg->count + 1;
> > - memcpy(req->reply, &amsg->cmd, req->reply_len);
> > + if (req->reply_expected) {
> > + req->reply_len = amsg->count + 1;
> > + memcpy(req->reply, &amsg->cmd, req->reply_len);
> > + }
>
> So if we're not expecting a reply. It's ignored.
> Just wondering: what kind of messages are being dropped?
I believe they were empty, with flags == ADB_IOP_EXPLICIT|ADB_IOP_TIMEOUT.
> If reply packets from different ADB commands become mixed up, they are
> still (expected?) replies to messages we sent before. Why shouldn't we
> depend on receiving the replies?
>
It turns out that the IOP always generates reply messages, even when the
ADB command does not produce a reply packet (e.g. ADB Listen command).
The commit being fixed got that wrong.
So it's not really the ADB reply packets that are being mixed up, it's the
IOP messages that enclose them. The bug goes like this:
1. CPU sends a message to the IOP, expecting no response because this
message contains an ADB Listen command. The ADB command is now considered
complete.
2. CPU sends a second message to the IOP, this time expecting a response
because this message contains an ADB Talk command. This ADB command needs
a reply before it can be completed.
3. adb-iop driver receives an IOP message and assumes that it relates to
the Talk command. It's actually for the previous command. The Talk command
is now considered complete but it gets the wrong reply data.
4. adb-iop driver gets another IOP response message, which contains the
actual reply data for the Talk command, but this is dropped (the driver is
no longer in awaiting_reply state).
Please go ahead and add this analysis to the commit log if you think it
would help.
> >
> > req_done = true;
> > }
>
> Gr{oetje,eeting}s,
>
> Geert
>
>
^ permalink raw reply
* Re: [PATCH] macintosh/adb-iop: Send correct poll command
From: Finn Thain @ 2020-12-05 3:10 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: linuxppc-dev, Linux Kernel Mailing List, Joshua Thompson
In-Reply-To: <CAMuHMdVYf83+y1aUR6HqCgr-CLfWYvbuynpfogLrt3cXA-9_aA@mail.gmail.com>
On Fri, 4 Dec 2020, Geert Uytterhoeven wrote:
> Hi Finn,
>
> On Fri, Nov 20, 2020 at 5:54 AM Finn Thain <fthain@telegraphics.com.au> wrote:
> > The behaviour of the IOP firmware is not well documented but we do know
> > that IOP message reply data can be used to issue new ADB commands.
> > Use the message reply to better control autopoll behaviour by sending
> > a Talk Register 0 command after every ADB response, not unlike the
> > algorithm in the via-macii driver. This poll command is addressed to
> > that device which last received a Talk command (explicit or otherwise).
> >
> > Cc: Joshua Thompson <funaho@jurai.org>
> > Fixes: fa3b5a9929fc ("macintosh/adb-iop: Implement idle -> sending state transition")
>
> WARNING: Unknown commit id 'fa3b5a9929fc', maybe rebased or not pulled?
>
> 32226e817043?
>
Yes, that's the one. I accidentally gave a commit id from one of my
backport branches.
> > Tested-by: Stan Johnson <userm57@yahoo.com>
> > Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
>
> Thanks, will queue in the m68k for-v5.11 branch.
>
Thanks.
> Gr{oetje,eeting}s,
>
> Geert
>
>
^ permalink raw reply
* [PATCH] MAINTAINERS: Update 68k Mac entry
From: Finn Thain @ 2020-12-05 3:46 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: linux-kernel, linux-m68k, linuxppc-dev, Joshua Thompson
Two files under drivers/macintosh are actually m68k-only. I think that
patches for these files should be reviewed in the appropriate forum and
merged via the appropriate tree, rather than falling to the powerpc
maintainers to deal with. Update the "M68K ON APPLE MACINTOSH" section
accordingly.
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Joshua Thompson <funaho@jurai.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m68k@lists.linux-m68k.org
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 867157311dc8..e8fa0c9645d6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10322,6 +10322,8 @@ L: linux-m68k@lists.linux-m68k.org
S: Maintained
W: http://www.mac.linux-m68k.org/
F: arch/m68k/mac/
+F: drivers/macintosh/adb-iop.c
+F: drivers/macintosh/via-macii.c
M68K ON HP9000/300
M: Philip Blundell <philb@gnu.org>
--
2.26.2
^ permalink raw reply related
* Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings
From: Nicholas Piggin @ 2020-12-05 4:49 UTC (permalink / raw)
To: akpm@linux-foundation.org, linux-mm@kvack.org, Edgecombe, Rick P
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
hch@infradead.org, lizefan@huawei.com,
Jonathan.Cameron@Huawei.com, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2e8e1f3e47736e8f5e749cee85b7036cbf9cb1b5.camel@intel.com>
Excerpts from Edgecombe, Rick P's message of December 5, 2020 4:33 am:
> On Fri, 2020-12-04 at 18:12 +1000, Nicholas Piggin wrote:
>> Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21
>> am:
>> > On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:
>> > > Support huge page vmalloc mappings. Config option
>> > > HAVE_ARCH_HUGE_VMALLOC
>> > > enables support on architectures that define HAVE_ARCH_HUGE_VMAP
>> > > and
>> > > supports PMD sized vmap mappings.
>> > >
>> > > vmalloc will attempt to allocate PMD-sized pages if allocating
>> > > PMD
>> > > size
>> > > or larger, and fall back to small pages if that was unsuccessful.
>> > >
>> > > Allocations that do not use PAGE_KERNEL prot are not permitted to
>> > > use
>> > > huge pages, because not all callers expect this (e.g., module
>> > > allocations vs strict module rwx).
>> >
>> > Several architectures (x86, arm64, others?) allocate modules
>> > initially
>> > with PAGE_KERNEL and so I think this test will not exclude module
>> > allocations in those cases.
>>
>> Ah, thanks. I guess archs must additionally ensure that their
>> PAGE_KERNEL allocations are suitable for huge page mappings before
>> enabling the option.
>>
>> If there is interest from those archs to support this, I have an
>> early (un-posted) patch that adds an explicit VM_HUGE flag that could
>> override the pessemistic arch default. It's not much trouble to add
>> this
>> to the large system hash allocations. It's very out of date now but
>> I
>> can at least give what I have to anyone doing an arch support that
>> wants it.
>
> Ahh, sorry, I totally missed that this was only enabled for powerpc.
>
> That patch might be useful for me actually. Or maybe a VM_NOHUGE, since
> there are only a few places where executable vmallocs are created? I'm
> not sure what the other issues are.
Yeah good question, VM_HUGE might be safer but maybe it would be
possible there's only a few problems that have to be annotated with
VM_NOHUGE, good point. I'll dig it out and see.
> I am endeavoring to have small module allocations share large pages, so
> this infrastructure is a big help already.
> https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/
Oh nice that's what I wanted to do next! We should try get this work
for x86 as well then.
Thanks,
Nick
^ permalink raw reply
* Re: [RFC v2 2/2] [MOCKUP] sched/mm: Lightweight lazy mm refcounting
From: Nicholas Piggin @ 2020-12-05 4:49 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-arch, Nadav Amit, X86 ML, Arnd Bergmann, Jann Horn,
Catalin Marinas, Rik van Riel, LKML, Linux-MM, Dave Hansen,
Will Deacon, Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <D9715BFE-744E-49B4-A10B-32735123BE6D@amacapital.net>
Excerpts from Andy Lutomirski's message of December 5, 2020 12:37 am:
>
>
>> On Dec 3, 2020, at 11:54 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm:
>>> This is a mockup. It's designed to illustrate the algorithm and how the
>>> code might be structured. There are several things blatantly wrong with
>>> it:
>>>
>>> The coding stype is not up to kernel standards. I have prototypes in the
>>> wrong places and other hacks.
>>>
>>> There's a problem with mm_cpumask() not being reliable.
>>
>> Interesting, this might be a way to reduce those IPIs with fairly
>> minimal fast path cost. Would be interesting to see how much performance
>> advantage it has over my dumb simple shoot-lazies.
>
> My real motivation isn’t really performance per se. I think there’s considerable value in keeping the core algorithms the same across all architectures, and I think my approach can manage that with only a single hint from the architecture as to which CPUs to scan.
>
> With shoot-lazies, in contrast, enabling it everywhere would either malfunction or have very poor performance or even DoS issues on arches like arm64 and s390x that don’t track mm_cpumask at all. I’m sure we could come up with some way to mitigate that, but I think that my approach may be better overall for keeping the core code uniform and relatively straightforward.
I'd go the other way. The mm_cpumark, TLB, and lazy maintainence is
different between architectures anyway. I'd keep the simple refcount,
and the pretty simple shoot-lazies approaches for now at least until
a bit more is done on other fronts. If x86 is shooting down lazies on
the final TLB flush as well, then I might be inclined to think that's
the better way to go in the long term. Shoot-lazies would be a bit of
a bolted on hack for powerpc/hash but it has ~zero impact to core code
really.
Thanks,
Nick
^ permalink raw reply
* Re: [PATCH 1/2] powerpc: Retire e200 core (mpc555x processor)
From: Scott Wood @ 2020-12-05 5:17 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <34ebc3ba2c768d97f363bd5f2deea2356e9ae127.1605589460.git.christophe.leroy@csgroup.eu>
On Tue, 2020-11-17 at 05:07 +0000, Christophe Leroy wrote:
> There is no defconfig selecting CONFIG_E200, and no platform.
>
> e200 is an earlier version of booke, a predecessor of e500,
> with some particularities like an unified cache instead of both an
> instruction cache and a data cache.
>
> Remove it.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> arch/powerpc/Makefile | 1 -
> arch/powerpc/include/asm/cputable.h | 11 -----
> arch/powerpc/include/asm/mmu.h | 2 +-
> arch/powerpc/include/asm/reg.h | 5 --
> arch/powerpc/include/asm/reg_booke.h | 12 -----
> arch/powerpc/kernel/cpu_setup_fsl_booke.S | 9 ----
> arch/powerpc/kernel/cputable.c | 46 ------------------
> arch/powerpc/kernel/head_booke.h | 3 +-
> arch/powerpc/kernel/head_fsl_booke.S | 57 +----------------------
> arch/powerpc/kernel/setup_32.c | 2 -
> arch/powerpc/kernel/traps.c | 25 ----------
> arch/powerpc/mm/nohash/fsl_booke.c | 12 ++---
> arch/powerpc/platforms/Kconfig.cputype | 13 ++----
> 13 files changed, 11 insertions(+), 187 deletions(-)
Acked-by: Scott Wood <oss@buserror.net>
-Scott
^ permalink raw reply
* Re: [PATCH v2] clk: renesas: r9a06g032: Drop __packed for portability
From: Stephen Boyd @ 2020-12-05 6:24 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Geert Uytterhoeven, Michael Ellerman,
Michael Turquette, Paul Mackerras, Stephen Rothwell
Cc: Geert Uytterhoeven, linux-kernel, Gareth Williams,
linux-renesas-soc, linuxppc-dev, linux-clk
In-Reply-To: <20201130085743.1656317-1-geert+renesas@glider.be>
Quoting Geert Uytterhoeven (2020-11-30 00:57:43)
> The R9A06G032 clock driver uses an array of packed structures to reduce
> kernel size. However, this array contains pointers, which are no longer
> aligned naturally, and cannot be relocated on PPC64. Hence when
> compile-testing this driver on PPC64 with CONFIG_RELOCATABLE=y (e.g.
> PowerPC allyesconfig), the following warnings are produced:
>
> WARNING: 136 bad relocations
> c000000000616be3 R_PPC64_UADDR64 .rodata+0x00000000000cf338
> c000000000616bfe R_PPC64_UADDR64 .rodata+0x00000000000cf370
> ...
>
> Fix this by dropping the __packed attribute from the r9a06g032_clkdesc
> definition, trading a small size increase for portability.
>
> This increases the 156-entry clock table by 1 byte per entry, but due to
> the compiler generating more efficient code for unpacked accesses, the
> net size increase is only 76 bytes (gcc 9.3.0 on arm32).
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Fixes: 4c3d88526eba2143 ("clk: renesas: Renesas R9A06G032 clock driver")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
Acked-by: Stephen Boyd <sboyd@kernel.org>
Unless you want me to pick this up for clk-fixes?
^ permalink raw reply
* [PATCH v9 00/12] huge vmalloc mappings
From: Nicholas Piggin @ 2020-12-05 6:57 UTC (permalink / raw)
To: linux-mm, Andrew Morton
Cc: linux-arch, linux-kernel, Nicholas Piggin, Christoph Hellwig,
Zefan Li, Jonathan Cameron, Rick Edgecombe, linuxppc-dev
Hi Andrew,
A couple of things Rick noticed, he's working on huge module mappings
to help iTLB pressure and seems to think this series will be useful
infrastructure for his work.
I think it finally should be just about ready.
Thanks,
Nick
Since v8:
- Fixed nommu compile.
- Added Kconfig option help text
- Added VM_NOHUGE which should help archs implement it [suggested by Rick]
Since v7:
- Rebase, added some acks, compile fix
- Removed "order=" from vmallocinfo, it's a bit confusing (nr_pages
is in small page size for compatibility).
- Added arch_vmap_pmd_supported() test before starting to allocate
the large page, rather than only testing it when doing the map, to
avoid unsupported configs trying to allocate huge pages for no
reason.
Since v6:
- Fixed a false positive warning introduced in patch 2, found by
kbuild test robot.
Since v5:
- Split arch changes out better and make the constant folding work
- Avoid most of the 80 column wrap, fix a reference to lib/ioremap.c
- Fix compile error on some archs
Since v4:
- Fixed an off-by-page-order bug in v4
- Several minor cleanups.
- Added page order to /proc/vmallocinfo
- Added hugepage to alloc_large_system_hage output.
- Made an architecture config option, powerpc only for now.
Since v3:
- Fixed an off-by-one bug in a loop
- Fix !CONFIG_HAVE_ARCH_HUGE_VMAP build fail
- Hopefully this time fix the arm64 vmap stack bug, thanks Jonathan
Cameron for debugging the cause of this (hopefully).
Since v2:
- Rebased on vmalloc cleanups, split series into simpler pieces.
- Fixed several compile errors and warnings
- Keep the page array and accounting in small page units because
struct vm_struct is an interface (this should fix x86 vmap stack debug
assert). [Thanks Zefan]
Nicholas Piggin (12):
mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
mm: apply_to_pte_range warn and fail if a large pte is encountered
mm/vmalloc: rename vmap_*_range vmap_pages_*_range
mm/ioremap: rename ioremap_*_range to vmap_*_range
mm: HUGE_VMAP arch support cleanup
powerpc: inline huge vmap supported functions
arm64: inline huge vmap supported functions
x86: inline huge vmap supported functions
mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c
mm/vmalloc: add vmap_range_noflush variant
mm/vmalloc: Hugepage vmalloc mappings
powerpc/64s/radix: Enable huge vmalloc mappings
.../admin-guide/kernel-parameters.txt | 2 +
arch/Kconfig | 10 +
arch/arm64/include/asm/vmalloc.h | 25 +
arch/arm64/mm/mmu.c | 26 -
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/vmalloc.h | 21 +
arch/powerpc/kernel/module.c | 13 +-
arch/powerpc/mm/book3s64/radix_pgtable.c | 21 -
arch/x86/include/asm/vmalloc.h | 23 +
arch/x86/mm/ioremap.c | 19 -
arch/x86/mm/pgtable.c | 13 -
include/linux/io.h | 9 -
include/linux/vmalloc.h | 27 ++
init/main.c | 1 -
mm/ioremap.c | 225 +--------
mm/memory.c | 66 ++-
mm/page_alloc.c | 5 +-
mm/vmalloc.c | 454 +++++++++++++++---
18 files changed, 564 insertions(+), 397 deletions(-)
--
2.23.0
^ permalink raw reply
* [PATCH v9 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
From: Nicholas Piggin @ 2020-12-05 6:57 UTC (permalink / raw)
To: linux-mm, Andrew Morton
Cc: linux-arch, linux-kernel, Nicholas Piggin, Christoph Hellwig,
Zefan Li, Jonathan Cameron, Rick Edgecombe, linuxppc-dev
In-Reply-To: <20201205065725.1286370-1-npiggin@gmail.com>
vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on the architecture details,
alignments, boot options, etc., which the caller can not be expected
to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
This change teaches vmalloc_to_page about larger pages, and returns
the struct page that corresponds to the offset within the large page.
This makes the API agnostic to mapping implementation details.
[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
fail gracefully on unexpected huge vmap mappings")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
mm/vmalloc.c | 41 ++++++++++++++++++++++++++---------------
1 file changed, 26 insertions(+), 15 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6ae491a8b210..f85124e88bdb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -34,7 +34,7 @@
#include <linux/bitops.h>
#include <linux/rbtree_augmented.h>
#include <linux/overflow.h>
-
+#include <linux/pgtable.h>
#include <linux/uaccess.h>
#include <asm/tlbflush.h>
#include <asm/shmparam.h>
@@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
}
/*
- * Walk a vmap address to the struct page it maps.
+ * Walk a vmap address to the struct page it maps. Huge vmap mappings will
+ * return the tail page that corresponds to the base page address, which
+ * matches small vmap mappings.
*/
struct page *vmalloc_to_page(const void *vmalloc_addr)
{
@@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
if (pgd_none(*pgd))
return NULL;
+ if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+ return NULL; /* XXX: no allowance for huge pgd */
+ if (WARN_ON_ONCE(pgd_bad(*pgd)))
+ return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
- pud = pud_offset(p4d, addr);
+ if (p4d_leaf(*p4d))
+ return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+ if (WARN_ON_ONCE(p4d_bad(*p4d)))
+ return NULL;
- /*
- * Don't dereference bad PUD or PMD (below) entries. This will also
- * identify huge mappings, which we may encounter on architectures
- * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
- * identified as vmalloc addresses by is_vmalloc_addr(), but are
- * not [unambiguously] associated with a struct page, so there is
- * no correct value to return for them.
- */
- WARN_ON_ONCE(pud_bad(*pud));
- if (pud_none(*pud) || pud_bad(*pud))
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud))
+ return NULL;
+ if (pud_leaf(*pud))
+ return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+ if (WARN_ON_ONCE(pud_bad(*pud)))
return NULL;
+
pmd = pmd_offset(pud, addr);
- WARN_ON_ONCE(pmd_bad(*pmd));
- if (pmd_none(*pmd) || pmd_bad(*pmd))
+ if (pmd_none(*pmd))
+ return NULL;
+ if (pmd_leaf(*pmd))
+ return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ if (WARN_ON_ONCE(pmd_bad(*pmd)))
return NULL;
ptep = pte_offset_map(pmd, addr);
@@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
if (pte_present(pte))
page = pte_page(pte);
pte_unmap(ptep);
+
return page;
}
EXPORT_SYMBOL(vmalloc_to_page);
--
2.23.0
^ permalink raw reply related
* [PATCH v9 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered
From: Nicholas Piggin @ 2020-12-05 6:57 UTC (permalink / raw)
To: linux-mm, Andrew Morton
Cc: linux-arch, linux-kernel, Nicholas Piggin, Christoph Hellwig,
Zefan Li, Jonathan Cameron, Rick Edgecombe, linuxppc-dev
In-Reply-To: <20201205065725.1286370-1-npiggin@gmail.com>
apply_to_pte_range might mistake a large pte for bad, or treat it as a
page table, resulting in a crash or corruption. Add a test to warn and
return error if large entries are found.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
mm/memory.c | 66 +++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 49 insertions(+), 17 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index c48f8df6e502..3d0f0bc5d573 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2429,13 +2429,21 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
}
do {
next = pmd_addr_end(addr, end);
- if (create || !pmd_none_or_clear_bad(pmd)) {
- err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
- create, mask);
- if (err)
- break;
+ if (pmd_none(*pmd) && !create)
+ continue;
+ if (WARN_ON_ONCE(pmd_leaf(*pmd)))
+ return -EINVAL;
+ if (!pmd_none(*pmd) && WARN_ON_ONCE(pmd_bad(*pmd))) {
+ if (!create)
+ continue;
+ pmd_clear_bad(pmd);
}
+ err = apply_to_pte_range(mm, pmd, addr, next,
+ fn, data, create, mask);
+ if (err)
+ break;
} while (pmd++, addr = next, addr != end);
+
return err;
}
@@ -2457,13 +2465,21 @@ static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
}
do {
next = pud_addr_end(addr, end);
- if (create || !pud_none_or_clear_bad(pud)) {
- err = apply_to_pmd_range(mm, pud, addr, next, fn, data,
- create, mask);
- if (err)
- break;
+ if (pud_none(*pud) && !create)
+ continue;
+ if (WARN_ON_ONCE(pud_leaf(*pud)))
+ return -EINVAL;
+ if (!pud_none(*pud) && WARN_ON_ONCE(pud_bad(*pud))) {
+ if (!create)
+ continue;
+ pud_clear_bad(pud);
}
+ err = apply_to_pmd_range(mm, pud, addr, next,
+ fn, data, create, mask);
+ if (err)
+ break;
} while (pud++, addr = next, addr != end);
+
return err;
}
@@ -2485,13 +2501,21 @@ static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
}
do {
next = p4d_addr_end(addr, end);
- if (create || !p4d_none_or_clear_bad(p4d)) {
- err = apply_to_pud_range(mm, p4d, addr, next, fn, data,
- create, mask);
- if (err)
- break;
+ if (p4d_none(*p4d) && !create)
+ continue;
+ if (WARN_ON_ONCE(p4d_leaf(*p4d)))
+ return -EINVAL;
+ if (!p4d_none(*p4d) && WARN_ON_ONCE(p4d_bad(*p4d))) {
+ if (!create)
+ continue;
+ p4d_clear_bad(p4d);
}
+ err = apply_to_pud_range(mm, p4d, addr, next,
+ fn, data, create, mask);
+ if (err)
+ break;
} while (p4d++, addr = next, addr != end);
+
return err;
}
@@ -2511,9 +2535,17 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
pgd = pgd_offset(mm, addr);
do {
next = pgd_addr_end(addr, end);
- if (!create && pgd_none_or_clear_bad(pgd))
+ if (pgd_none(*pgd) && !create)
continue;
- err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create, &mask);
+ if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+ return -EINVAL;
+ if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) {
+ if (!create)
+ continue;
+ pgd_clear_bad(pgd);
+ }
+ err = apply_to_p4d_range(mm, pgd, addr, next,
+ fn, data, create, &mask);
if (err)
break;
} while (pgd++, addr = next, addr != end);
--
2.23.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox