LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
From: Christophe Leroy @ 2021-01-14 12:25 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <1610625340.tqgg2ar1jg.astroid@bobo.none>



Le 14/01/2021 à 13:09, Nicholas Piggin a écrit :
> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm:
>> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
>>>
>>>
>>> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
>>>> The page fault handling still has some complex logic particularly around
>>>> hash table handling, in asm. Implement this in C instead.
>>>>
>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>> ---
>>>>    arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>>>>    arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>>>>    arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>>>>    arch/powerpc/mm/fault.c                       |  46 ++++--
>>>>    4 files changed, 107 insertions(+), 148 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>> index 066b1d34c7bc..60a669379aa0 100644
>>>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>>>    #define HPTE_NOHPTE_UPDATE	0x2
>>>>    #define HPTE_USE_KERNEL_KEY	0x4
>>>>    
>>>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>>>    extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>>>    			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>>>    			  unsigned long flags, int ssize, int subpage_prot);
>>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>>>> index 6e53f7638737..bcb5e81d2088 100644
>>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>>>     *
>>>>     * Handling:
>>>>     * - Hash MMU
>>>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>>>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>>>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>>>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>>>     *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>>>     *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>>>> - *   backed by Linux page tables.
>>>> + *   backed by Linux page table entries.
>>>>     *
>>>> - *   If none is found, do a Linux page fault. Linux page faults can happen in
>>>> - *   kernel mode due to user copy operations of course.
>>>> + *   If no entry is found the Linux page fault handler is invoked (by
>>>> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
>>>> + *   copy operations of course.
>>>>     *
>>>>     *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>>>     *   MMU context, which may cause a DSI in the host, which must go to the
>>>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>>>    	GEN_COMMON data_access
>>>>    	ld	r4,_DAR(r1)
>>>>    	ld	r5,_DSISR(r1)
>>>
>>> We have DSISR here. I think the dispatch between page fault or do_break() should be done here:
>>> - It would be more similar to other arches
>>
>> Other sub-archs?
>>
>>> - Would avoid doing it also in instruction fault
>>
>> True but it's hidden under an unlikely branch so won't really help
>> instruction fault.
>>
>>> - Would avoid that -1 return which looks more like a hack.
>>
>> I don't really see it as a hack, we return a code to asm caller to
>> direct whether to restore registers or not, we alrady have this
>> pattern.
>>
>> (I'm hoping all that might be go away one day by conrolling NV
>> regs from C if we can get good code generation but even if not we
>> still have it in the interrupt returns).
>>
>> That said I will give it a try here. At very least it might be a
>> better intermediate step.
> 
> Ah yes, this way doesn't work well for later patches because you end
> e.g., with the do_break call having to call the interrupt handler
> wrappers again when they actually expect to be in the asm entry state
> (e.g., irq soft-mask state) when called, and return via interrupt_return
> after the exit wrapper runs (which 64s uses to implement better context
> tracking for example).
> 
> That could possibly be hacked up to deal with multiple interrupt
> wrappers per interrupt, but I'd rather not go backwards.
> 
> That does leave the other sub archs as having this issue, but they don't
> do so much in their handlers. 32 doesn't have soft-mask or context
> tracking to deal with for example. We will need to fix this up though
> and unify things more.
> 

Not sure I understand what you mean exactly.

On the 8xx, do_break() is called by totally different exceptions:
- Exception 0x1c00 Data breakpoint ==> do_break()
- Exception 0x1300 Instruction TLB error ==> handle_page_fault()
- Exception 0x1400 Data TLB error ==> handle_page_fault()

On book3s/32, we now (after my patch ie patch 1 in your series ) have either do_break() or 
handle_page_fault() being called from very early in ASM.

If you do the same in book3s/64, then there is no issue with interrupt wrappers being called twice, 
is it ?

Christophe

^ permalink raw reply

* Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
From: Christophe Leroy @ 2021-01-14 12:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: linuxppc-dev@ozlabs.org, Sven Van Asbroeck, Linus Walleij,
	linux-kernel@vger.kernel.org, linux-spi
In-Reply-To: <20210114115958.GB4854@sirena.org.uk>



Le 14/01/2021 à 12:59, Mark Brown a écrit :
> On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote:
> 
>> Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW.
> 
>> If I declare them as ACTIVE_HIGH instead, then I also have to set
>> spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and
>> forces the GPIO ACTIVE LOW.
> 
>> When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode.
> 
> OK, so it sounds like you want SPI_CS_HIGH and that is being set
> correctly?
> 
>> In fsl_spi_chipselect(), we have
>>
>> 	bool pol = spi->mode & SPI_CS_HIGH
>>
>> Then
>> 	pdata->cs_control(spi, pol);
> 
>> So changing the board config is compensated by the above, and at the end it still doesn't work.
> 
> This is a driver bug, the driver set_cs() operation should not be
> modifying the value it is told to set.
> 

A driver bug ? Or maybe a change forgotten in commit  766c6b63aa04 ("spi: fix client driver 
breakages when using GPIO descriptors") ?

I'm almost sure it was not a bug, it is in line which what is said in the comment removed by the 
above mentionned commit.

I'll send a fix.

Christophe

^ permalink raw reply

* Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
From: Michal Suchánek @ 2021-01-14 12:40 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: ro, linuxppc-dev, Hari Bathini
In-Reply-To: <1603082970.5545yt7raj.astroid@bobo.none>

On Mon, Oct 19, 2020 at 02:50:51PM +1000, Nicholas Piggin wrote:
> Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am:
> > Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am:
> >> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
> >>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
> >>> > Michal Suchánek <msuchanek@suse.de> writes:
> >>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
> >>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
> >>> >>> > Hello,
> >>> >>> > 
> >>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
> >>> >>> > Reimplement book3s idle code in C").
> >>> >>> > 
> >>> >>> > The symptom is host locking up completely after some hours of KVM
> >>> >>> > workload with messages like
> >>> >>> > 
> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >>> >>> > 
> >>> >>> > printed before the host locks up.
> >>> >>> > 
> >>> >>> > The machines run sandboxed builds which is a mixed workload resulting in
> >>> >>> > IO/single core/mutiple core load over time and there are periods of no
> >>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM
> >>> >>> > setup/terdown is somewhat excercised as well.
> >>> >>> > 
> >>> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
> >>> >>> > 
> >>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
> >>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
> >>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
> >>> >>> > stable.
> >>> >>> > 
> >>> >>> > Config is attached.
> >>> >>> > 
> >>> >>> > I cannot easily revert this commit, especially if I want to use the same
> >>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
> >>> >>> > only to the new idle code.
> >>> >>> > 
> >>> >>> > Any idea what can be the problem?
> >>> >>> 
> >>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on
> >>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
> >>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
> >>> >>> BMC unfortunately.
> >>> >>
> >>> >> It may be possible to set up fadump with a later kernel version that
> >>> >> supports it on powernv and dump the whole kernel.
> >>> > 
> >>> > Your firmware won't support it AFAIK.
> >>> > 
> >>> > You could try kdump, but if we have CPUs stuck in KVM then there's a
> >>> > good chance it won't work :/
> >>> 
> >>> I haven't had any luck yet reproducing this still. Testing with sub 
> >>> cores of various different combinations, etc. I'll keep trying though.
> >> 
> >> Hello,
> >> 
> >> I tried running some KVM guests to simulate the workload and what I get
> >> is guests failing to start with a rcu stall. Tried both 5.3 and 5.9
> >> kernel and qemu 4.2.1 and 5.1.0
> >> 
> >> To start some guests I run
> >> 
> >> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults -nographic -serial mon:telnet::444$i,server,wait & done
> >> 
> >> To simulate some workload I run
> >> 
> >> xz -zc9T0 < /dev/zero > /dev/null &
> >> while true; do
> >>     killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
> >> done &
> >> 
> >> on the host and add a job that executes this to the ramdisk. However, most
> >> guests never get to the point where the job is executed.
> >> 
> >> Any idea what might be the problem?
> > 
> > I would say try without pv queued spin locks (but if the same thing is 
> > happening with 5.3 then it must be something else I guess). 
> > 
> > I'll try to test a similar setup on a POWER8 here.
> 
> Couldn't reproduce the guest hang, they seem to run fine even with 
> queued spinlocks. Might have a different .config.
> 
> I might have got a lockup in the host (although different symptoms than 
> the original report). I'll look into that a bit further.

Hello,

any progress on this?

I considered reinstating the old assembly code for POWER[78] but even
the way it's called has changed slightly.

Thanks

Michal

^ permalink raw reply

* Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
From: Nicholas Piggin @ 2021-01-14 13:08 UTC (permalink / raw)
  To: Michal Suchánek; +Cc: ro, linuxppc-dev, kvm-ppc, Hari Bathini
In-Reply-To: <20210114124023.GL6564@kitsune.suse.cz>

Excerpts from Michal Suchánek's message of January 14, 2021 10:40 pm:
> On Mon, Oct 19, 2020 at 02:50:51PM +1000, Nicholas Piggin wrote:
>> Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am:
>> > Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am:
>> >> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
>> >>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
>> >>> > Michal Suchánek <msuchanek@suse.de> writes:
>> >>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
>> >>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
>> >>> >>> > Hello,
>> >>> >>> > 
>> >>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
>> >>> >>> > Reimplement book3s idle code in C").
>> >>> >>> > 
>> >>> >>> > The symptom is host locking up completely after some hours of KVM
>> >>> >>> > workload with messages like
>> >>> >>> > 
>> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>> >>> >>> > 
>> >>> >>> > printed before the host locks up.
>> >>> >>> > 
>> >>> >>> > The machines run sandboxed builds which is a mixed workload resulting in
>> >>> >>> > IO/single core/mutiple core load over time and there are periods of no
>> >>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM
>> >>> >>> > setup/terdown is somewhat excercised as well.
>> >>> >>> > 
>> >>> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
>> >>> >>> > 
>> >>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
>> >>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
>> >>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
>> >>> >>> > stable.
>> >>> >>> > 
>> >>> >>> > Config is attached.
>> >>> >>> > 
>> >>> >>> > I cannot easily revert this commit, especially if I want to use the same
>> >>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
>> >>> >>> > only to the new idle code.
>> >>> >>> > 
>> >>> >>> > Any idea what can be the problem?
>> >>> >>> 
>> >>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on
>> >>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
>> >>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
>> >>> >>> BMC unfortunately.
>> >>> >>
>> >>> >> It may be possible to set up fadump with a later kernel version that
>> >>> >> supports it on powernv and dump the whole kernel.
>> >>> > 
>> >>> > Your firmware won't support it AFAIK.
>> >>> > 
>> >>> > You could try kdump, but if we have CPUs stuck in KVM then there's a
>> >>> > good chance it won't work :/
>> >>> 
>> >>> I haven't had any luck yet reproducing this still. Testing with sub 
>> >>> cores of various different combinations, etc. I'll keep trying though.
>> >> 
>> >> Hello,
>> >> 
>> >> I tried running some KVM guests to simulate the workload and what I get
>> >> is guests failing to start with a rcu stall. Tried both 5.3 and 5.9
>> >> kernel and qemu 4.2.1 and 5.1.0
>> >> 
>> >> To start some guests I run
>> >> 
>> >> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults -nographic -serial mon:telnet::444$i,server,wait & done
>> >> 
>> >> To simulate some workload I run
>> >> 
>> >> xz -zc9T0 < /dev/zero > /dev/null &
>> >> while true; do
>> >>     killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
>> >> done &
>> >> 
>> >> on the host and add a job that executes this to the ramdisk. However, most
>> >> guests never get to the point where the job is executed.
>> >> 
>> >> Any idea what might be the problem?
>> > 
>> > I would say try without pv queued spin locks (but if the same thing is 
>> > happening with 5.3 then it must be something else I guess). 
>> > 
>> > I'll try to test a similar setup on a POWER8 here.
>> 
>> Couldn't reproduce the guest hang, they seem to run fine even with 
>> queued spinlocks. Might have a different .config.
>> 
>> I might have got a lockup in the host (although different symptoms than 
>> the original report). I'll look into that a bit further.
> 
> Hello,
> 
> any progress on this?

No progress, I still wasn't able to reproduce, and it fell off the 
radar sorry.

I expect hwthred_state must be getting corrupted somewhere or a
secondary thread getting stuck but I couldn't see where. I try pick
it up again thanks for the reminder.

Thanks,
Nick

^ permalink raw reply

* [PATCH] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
From: Christophe Leroy @ 2021-01-14 13:09 UTC (permalink / raw)
  To: Mark Brown, Linus Walleij, Sven Van Asbroeck
  Cc: linuxppc-dev, linux-kernel, linux-spi

Commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO
descriptors") broke fsl spi driver.

As now we fully rely on gpiolib for handling the polarity of
chip selects, the driver shall not alter the GPIO value anymore
when SPI_CS_HIGH is not set in spi->mode.

Fixes: 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 drivers/spi/spi-fsl-spi.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 649766808caf..9e297b2715f6 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -115,14 +115,13 @@ static void fsl_spi_chipselect(struct spi_device *spi, int value)
 {
 	struct mpc8xxx_spi *mpc8xxx_spi = spi_master_get_devdata(spi->master);
 	struct fsl_spi_platform_data *pdata;
-	bool pol = spi->mode & SPI_CS_HIGH;
 	struct spi_mpc8xxx_cs	*cs = spi->controller_state;
 
 	pdata = spi->dev.parent->parent->platform_data;
 
 	if (value == BITBANG_CS_INACTIVE) {
 		if (pdata->cs_control)
-			pdata->cs_control(spi, !pol);
+			pdata->cs_control(spi, false);
 	}
 
 	if (value == BITBANG_CS_ACTIVE) {
@@ -134,7 +133,7 @@ static void fsl_spi_chipselect(struct spi_device *spi, int value)
 		fsl_spi_change_mode(spi);
 
 		if (pdata->cs_control)
-			pdata->cs_control(spi, pol);
+			pdata->cs_control(spi, true);
 	}
 }
 
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
From: Nicholas Piggin @ 2021-01-14 13:17 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <c07a91b3-561e-7197-1498-874281059e20@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm:
> 
> 
> Le 14/01/2021 à 13:09, Nicholas Piggin a écrit :
>> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm:
>>> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
>>>>
>>>>
>>>> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
>>>>> The page fault handling still has some complex logic particularly around
>>>>> hash table handling, in asm. Implement this in C instead.
>>>>>
>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>> ---
>>>>>    arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>>>>>    arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>>>>>    arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>>>>>    arch/powerpc/mm/fault.c                       |  46 ++++--
>>>>>    4 files changed, 107 insertions(+), 148 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>> index 066b1d34c7bc..60a669379aa0 100644
>>>>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>>>>    #define HPTE_NOHPTE_UPDATE	0x2
>>>>>    #define HPTE_USE_KERNEL_KEY	0x4
>>>>>    
>>>>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>>>>    extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>>>>    			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>>>>    			  unsigned long flags, int ssize, int subpage_prot);
>>>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>>>>> index 6e53f7638737..bcb5e81d2088 100644
>>>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>>>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>>>>     *
>>>>>     * Handling:
>>>>>     * - Hash MMU
>>>>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>>>>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>>>>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>>>>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>>>>     *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>>>>     *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>>>>> - *   backed by Linux page tables.
>>>>> + *   backed by Linux page table entries.
>>>>>     *
>>>>> - *   If none is found, do a Linux page fault. Linux page faults can happen in
>>>>> - *   kernel mode due to user copy operations of course.
>>>>> + *   If no entry is found the Linux page fault handler is invoked (by
>>>>> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
>>>>> + *   copy operations of course.
>>>>>     *
>>>>>     *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>>>>     *   MMU context, which may cause a DSI in the host, which must go to the
>>>>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>>>>    	GEN_COMMON data_access
>>>>>    	ld	r4,_DAR(r1)
>>>>>    	ld	r5,_DSISR(r1)
>>>>
>>>> We have DSISR here. I think the dispatch between page fault or do_break() should be done here:
>>>> - It would be more similar to other arches
>>>
>>> Other sub-archs?
>>>
>>>> - Would avoid doing it also in instruction fault
>>>
>>> True but it's hidden under an unlikely branch so won't really help
>>> instruction fault.
>>>
>>>> - Would avoid that -1 return which looks more like a hack.
>>>
>>> I don't really see it as a hack, we return a code to asm caller to
>>> direct whether to restore registers or not, we alrady have this
>>> pattern.
>>>
>>> (I'm hoping all that might be go away one day by conrolling NV
>>> regs from C if we can get good code generation but even if not we
>>> still have it in the interrupt returns).
>>>
>>> That said I will give it a try here. At very least it might be a
>>> better intermediate step.
>> 
>> Ah yes, this way doesn't work well for later patches because you end
>> e.g., with the do_break call having to call the interrupt handler
>> wrappers again when they actually expect to be in the asm entry state
>> (e.g., irq soft-mask state) when called, and return via interrupt_return
>> after the exit wrapper runs (which 64s uses to implement better context
>> tracking for example).
>> 
>> That could possibly be hacked up to deal with multiple interrupt
>> wrappers per interrupt, but I'd rather not go backwards.
>> 
>> That does leave the other sub archs as having this issue, but they don't
>> do so much in their handlers. 32 doesn't have soft-mask or context
>> tracking to deal with for example. We will need to fix this up though
>> and unify things more.
>> 
> 
> Not sure I understand what you mean exactly.
> 
> On the 8xx, do_break() is called by totally different exceptions:
> - Exception 0x1c00 Data breakpoint ==> do_break()
> - Exception 0x1300 Instruction TLB error ==> handle_page_fault()
> - Exception 0x1400 Data TLB error ==> handle_page_fault()
> 
> On book3s/32, we now (after my patch ie patch 1 in your series ) have either do_break() or 
> handle_page_fault() being called from very early in ASM.
> 
> If you do the same in book3s/64, then there is no issue with interrupt wrappers being called twice, 
> is it ?

bad_page_fault is the problem, it has to go afterwards.

Once we have the changed 64s behaviour of do_page_fault, I don't know if 
there is any point leaving do_break in asm is there? I guess it is neat 
to treat it quite separately, I might need to count fast path branches...
I have done the split anyway already, so I will post your way first.

Thanks,
Nick

^ permalink raw reply

* Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
From: Mark Brown @ 2021-01-14 13:22 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linuxppc-dev@ozlabs.org, Sven Van Asbroeck, Linus Walleij,
	linux-kernel@vger.kernel.org, linux-spi
In-Reply-To: <006d1594-8eec-3aad-1651-919071e89f3b@csgroup.eu>

[-- Attachment #1: Type: text/plain, Size: 2276 bytes --]

On Thu, Jan 14, 2021 at 01:33:50PM +0100, Christophe Leroy wrote:
> Le 14/01/2021 à 12:59, Mark Brown a écrit :
> > On Thu, Jan 14, 2021 at 12:27:42PM +0100, Christophe Leroy wrote:

> > > Today I have in the DTS the CS GPIOs declared as ACTIVE_LOW.

> > > If I declare them as ACTIVE_HIGH instead, then I also have to set
> > > spi-cs-high property, otherwise of_gpio_flags_quirks() is not happy and
> > > forces the GPIO ACTIVE LOW.

> > > When I set spi-cs-high property, it sets the SPI_CS_HIGH bit in spi->mode.

> > OK, so it sounds like you want SPI_CS_HIGH and that is being set
> > correctly?

> > > In fsl_spi_chipselect(), we have
> > > 
> > > 	bool pol = spi->mode & SPI_CS_HIGH
> > > 
> > > Then
> > > 	pdata->cs_control(spi, pol);

> > > So changing the board config is compensated by the above, and at the end it still doesn't work.

> > This is a driver bug, the driver set_cs() operation should not be
> > modifying the value it is told to set.

> A driver bug ? Or maybe a change forgotten in commit  766c6b63aa04 ("spi:
> fix client driver breakages when using GPIO descriptors") ?

The expectation that the driver will be using the chip select exactly as
passed in and not attempting to implement SPI_CS_HIGH itself when it has
set_cs() has been there for a considerable time now, that's not new with
the cleanup.  Drivers should only be paying attention to SPI_CS_HIGH in
cases where the hardware controls the chip select autonomously and so
set_cs() can't be provided.

> I'm almost sure it was not a bug, it is in line which what is said in
> the comment removed by the above mentionned commit.

Please take a look at the list archive discussions around this - there's
been a lot of confusion with GPIO descriptors in particular due to there
being multiple places where you can set the inversion.  Note that the
situation you describe above is that you end up with all the various
places other than your driver agreeing that the chip select is active
high as it (AFAICT from what you're saying) actually is.  

For GPIO chipselects you should really fix the driver to just hand the
GPIO off to the core rather than trying to implement this itself, that
will avoid driver specific differences like this.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
From: Christophe Leroy @ 2021-01-14 13:28 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <1610629691.3ut6cgruc6.astroid@bobo.none>



Le 14/01/2021 à 14:17, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm:
>>
>>
>> Le 14/01/2021 à 13:09, Nicholas Piggin a écrit :
>>> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm:
>>>> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
>>>>>
>>>>>
>>>>> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
>>>>>> The page fault handling still has some complex logic particularly around
>>>>>> hash table handling, in asm. Implement this in C instead.
>>>>>>
>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>> ---
>>>>>>     arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>>>>>>     arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>>>>>>     arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>>>>>>     arch/powerpc/mm/fault.c                       |  46 ++++--
>>>>>>     4 files changed, 107 insertions(+), 148 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>>> index 066b1d34c7bc..60a669379aa0 100644
>>>>>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>>>>>     #define HPTE_NOHPTE_UPDATE	0x2
>>>>>>     #define HPTE_USE_KERNEL_KEY	0x4
>>>>>>     
>>>>>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>>>>>     extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>>>>>     			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>>>>>     			  unsigned long flags, int ssize, int subpage_prot);
>>>>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>>>>>> index 6e53f7638737..bcb5e81d2088 100644
>>>>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>>>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>>>>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>>>>>      *
>>>>>>      * Handling:
>>>>>>      * - Hash MMU
>>>>>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>>>>>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>>>>>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>>>>>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>>>>>      *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>>>>>      *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>>>>>> - *   backed by Linux page tables.
>>>>>> + *   backed by Linux page table entries.
>>>>>>      *
>>>>>> - *   If none is found, do a Linux page fault. Linux page faults can happen in
>>>>>> - *   kernel mode due to user copy operations of course.
>>>>>> + *   If no entry is found the Linux page fault handler is invoked (by
>>>>>> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
>>>>>> + *   copy operations of course.
>>>>>>      *
>>>>>>      *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>>>>>      *   MMU context, which may cause a DSI in the host, which must go to the
>>>>>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>>>>>     	GEN_COMMON data_access
>>>>>>     	ld	r4,_DAR(r1)
>>>>>>     	ld	r5,_DSISR(r1)
>>>>>
>>>>> We have DSISR here. I think the dispatch between page fault or do_break() should be done here:
>>>>> - It would be more similar to other arches
>>>>
>>>> Other sub-archs?
>>>>
>>>>> - Would avoid doing it also in instruction fault
>>>>
>>>> True but it's hidden under an unlikely branch so won't really help
>>>> instruction fault.
>>>>
>>>>> - Would avoid that -1 return which looks more like a hack.
>>>>
>>>> I don't really see it as a hack, we return a code to asm caller to
>>>> direct whether to restore registers or not, we alrady have this
>>>> pattern.
>>>>
>>>> (I'm hoping all that might be go away one day by conrolling NV
>>>> regs from C if we can get good code generation but even if not we
>>>> still have it in the interrupt returns).
>>>>
>>>> That said I will give it a try here. At very least it might be a
>>>> better intermediate step.
>>>
>>> Ah yes, this way doesn't work well for later patches because you end
>>> e.g., with the do_break call having to call the interrupt handler
>>> wrappers again when they actually expect to be in the asm entry state
>>> (e.g., irq soft-mask state) when called, and return via interrupt_return
>>> after the exit wrapper runs (which 64s uses to implement better context
>>> tracking for example).
>>>
>>> That could possibly be hacked up to deal with multiple interrupt
>>> wrappers per interrupt, but I'd rather not go backwards.
>>>
>>> That does leave the other sub archs as having this issue, but they don't
>>> do so much in their handlers. 32 doesn't have soft-mask or context
>>> tracking to deal with for example. We will need to fix this up though
>>> and unify things more.
>>>
>>
>> Not sure I understand what you mean exactly.
>>
>> On the 8xx, do_break() is called by totally different exceptions:
>> - Exception 0x1c00 Data breakpoint ==> do_break()
>> - Exception 0x1300 Instruction TLB error ==> handle_page_fault()
>> - Exception 0x1400 Data TLB error ==> handle_page_fault()
>>
>> On book3s/32, we now (after my patch ie patch 1 in your series ) have either do_break() or
>> handle_page_fault() being called from very early in ASM.
>>
>> If you do the same in book3s/64, then there is no issue with interrupt wrappers being called twice,
>> is it ?
> 
> bad_page_fault is the problem, it has to go afterwards.
> 
> Once we have the changed 64s behaviour of do_page_fault, I don't know if
> there is any point leaving do_break in asm is there? I guess it is neat
> to treat it quite separately, I might need to count fast path branches...
> I have done the split anyway already, so I will post your way first.
> 

As far as I understand, not taken unlikely branches are costless (at least on book3s/32), so you 
would only suffer the cost of the logical 'and.' on the value of DSISR that you already have in a 
register. Should be in the noise.

bad_page_fault() is not in the fast path anymore since we now handle the exception fixup at the end 
of do_page_fault(). So I think it shouldn't be a concern to call the wrapper again for bad_page_fault()

Christophe

^ permalink raw reply

* Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
From: Christophe Leroy @ 2021-01-14 13:42 UTC (permalink / raw)
  To: Mark Brown
  Cc: linuxppc-dev@ozlabs.org, Sven Van Asbroeck, Linus Walleij,
	linux-kernel@vger.kernel.org, linux-spi
In-Reply-To: <20210114132258.GD4854@sirena.org.uk>



Le 14/01/2021 à 14:22, Mark Brown a écrit :
> 
> For GPIO chipselects you should really fix the driver to just hand the
> GPIO off to the core rather than trying to implement this itself, that
> will avoid driver specific differences like this.
> 

IIUC, it is not trivial as it requires implementing transfer_one() instead of the existing 
transfer_one_message() in the driver. Am I right ?

What's the difference/benefit of transfer_one() compared to the existing transfer_one_message() ?

^ permalink raw reply

* Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
From: Mark Brown @ 2021-01-14 13:59 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linuxppc-dev@ozlabs.org, Sven Van Asbroeck, Linus Walleij,
	linux-kernel@vger.kernel.org, linux-spi
In-Reply-To: <adbf508d-ed5a-e06a-4a59-98df0229d7b4@csgroup.eu>

[-- Attachment #1: Type: text/plain, Size: 1099 bytes --]

On Thu, Jan 14, 2021 at 02:42:26PM +0100, Christophe Leroy wrote:
> Le 14/01/2021 à 14:22, Mark Brown a écrit :

> > For GPIO chipselects you should really fix the driver to just hand the
> > GPIO off to the core rather than trying to implement this itself, that
> > will avoid driver specific differences like this.

> IIUC, it is not trivial as it requires implementing transfer_one() instead
> of the existing transfer_one_message() in the driver. Am I right ?

Yes, that's a good idea in general though.  It should normally be pretty
simple since the conversion is mostly just deleting code doing things
which will be handled by the core.

> What's the difference/benefit of transfer_one() compared to the existing transfer_one_message() ?

It factors out all the handling of chip selects, including per-transfer
chip select inversion and so on, and also factors out all the handling
of in-message delays.  If nothing else it reduces the amount of
duplicated code that might require maintainance can have issues with
misaligned expectations from the core or client drivers.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v5 03/21] powerpc: remove arguments from fault handler functions
From: Christophe Leroy @ 2021-01-14 14:12 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-4-npiggin@gmail.com>



Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
> Make mm fault handlers all just take the pt_regs * argument and load
> DAR/DSISR from that. Make those that return a value return long.
> 
> This is done to make the function signatures match other handlers, which
> will help with a future patch to add wrappers. Explicit arguments could
> be added for performance but that would require more wrapper macro
> variants.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/asm-prototypes.h     |  4 ++--
>   arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
>   arch/powerpc/include/asm/bug.h                |  2 +-
>   arch/powerpc/kernel/entry_32.S                |  6 +-----
>   arch/powerpc/kernel/exceptions-64e.S          |  2 --
>   arch/powerpc/kernel/exceptions-64s.S          | 14 ++------------
>   arch/powerpc/kernel/head_40x.S                | 10 +++++-----
>   arch/powerpc/kernel/head_8xx.S                |  6 +++---
>   arch/powerpc/kernel/head_book3s_32.S          |  5 ++---
>   arch/powerpc/kernel/head_booke.h              |  4 +---
>   arch/powerpc/mm/book3s64/hash_utils.c         |  8 +++++---
>   arch/powerpc/mm/book3s64/slb.c                | 11 +++++++----
>   arch/powerpc/mm/fault.c                       |  7 ++++---
>   13 files changed, 34 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 238eacfda7b0..a32157ce0551 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -277,7 +277,7 @@ reenable_mmu:
>   	 * r3 can be different from GPR3(r1) at this point, r9 and r11
>   	 * contains the old MSR and handler address respectively,
>   	 * r4 & r5 can contain page fault arguments that need to be passed

The line above should be dropped as well (its end on the line below is dropped already)


> -	 * along as well. r0, r6-r8, r12, CCR, CTR, XER etc... are left
> +	 * r0, r4-r8, r12, CCR, CTR, XER etc... are left
>   	 * clobbered as they aren't useful past this point.
>   	 */
>   

Christophe

^ permalink raw reply

* Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
From: Florian Fainelli @ 2021-01-14 18:52 UTC (permalink / raw)
  To: Claire Chang
  Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
	mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
	Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
	Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
	will, Konrad Rzeszutek Wilk, dan.j.williams, linuxppc-dev,
	Rob Herring, boris.ostrovsky, Andy Shevchenko, jgross,
	Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa, iommu,
	xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <CALiNf29+8Yi93RacsZHr=qYBhQRwqujW6KZVVD=9xPMhpLH5pA@mail.gmail.com>

On 1/14/21 1:08 AM, Claire Chang wrote:
> On Wed, Jan 13, 2021 at 7:48 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>> If a device is not behind an IOMMU, we look up the device node and set
>>> up the restricted DMA when the restricted-dma-pool is presented.
>>>
>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>> ---
>>
>> [snip]
>>
>>> +int of_dma_set_restricted_buffer(struct device *dev)
>>> +{
>>> +     struct device_node *node;
>>> +     int count, i;
>>> +
>>> +     if (!dev->of_node)
>>> +             return 0;
>>> +
>>> +     count = of_property_count_elems_of_size(dev->of_node, "memory-region",
>>> +                                             sizeof(phandle));
>>
>> You could have an early check for count < 0, along with an error
>> message, if that is deemed useful.
>>
>>> +     for (i = 0; i < count; i++) {
>>> +             node = of_parse_phandle(dev->of_node, "memory-region", i);
>>> +             if (of_device_is_compatible(node, "restricted-dma-pool"))
>>
>> And you may want to add here an of_device_is_available(node). A platform
>> that provides the Device Tree firmware and try to support multiple
>> different SoCs may try to determine if an IOMMU is present, and if it
>> is, it could be marking the restriced-dma-pool region with a 'status =
>> "disabled"' property, or any variant of that scheme.
> 
> This function is called only when there is no IOMMU present (check in
> drivers/of/device.c). I can still add of_device_is_available(node)
> here if you think it's helpful.

I believe it is, since boot loader can have a shared Device Tree blob
skeleton and do various adaptations based on the chip (that's what we
do) and adding a status property is much simpler than insertion new
nodes are run time.

> 
>>
>>> +                     return of_reserved_mem_device_init_by_idx(
>>> +                             dev, dev->of_node, i);
>>
>> This does not seem to be supporting more than one memory region, did not
>> you want something like instead:
>>
>>                 ret = of_reserved_mem_device_init_by_idx(...);
>>                 if (ret)
>>                         return ret;
>>
> 
> Yes. This implement only supports one restriced-dma-pool memory region
> with the assumption that there is only one memory region with the
> compatible string, restricted-dma-pool, in the dts. IIUC, it's similar
> to shared-dma-pool.

Then if here is such a known limitation it should be both documented and
enforced here, you shouldn ot be iterating over all of the phandles that
you find, stop at the first one and issue a warning if count > 1?
-- 
Florian

^ permalink raw reply

* Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
From: Brian King @ 2021-01-14 17:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	james.smart, james.bottomley, brking, linuxppc-dev
In-Reply-To: <20210114012738.GA237540@T590>

On 1/13/21 7:27 PM, Ming Lei wrote:
> On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
>> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
>>> On 1/12/21 2:54 PM, Brian King wrote:
>>>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>>>> Introduce several new vhost fields for managing MQ state of the adapter
>>>>> as well as initial defaults for MQ enablement.
>>>>>
>>>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>>>> ---
>>>>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>>>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>>>>>  2 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> index ba95438a8912..9200fe49c57e 100644
>>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>>>>  	.max_sectors = IBMVFC_MAX_SECTORS,
>>>>>  	.shost_attrs = ibmvfc_attrs,
>>>>>  	.track_queue_depth = 1,
>>>>> +	.host_tagset = 1,
>>>>
>>>> This doesn't seem right. You are setting host_tagset, which means you want a
>>>> shared, host wide, tag set for commands. It also means that the total
>>>> queue depth for the host is can_queue. However, it looks like you are allocating
>>>> max_requests events for each sub crq, which means you are over allocating memory.
>>>
>>> With the shared tagset yes the queue depth for the host is can_queue, but this
>>> also implies that the max queue depth for each hw queue is also can_queue. So,
>>> in the worst case that all commands are queued down the same hw queue we need an
>>> event pool with can_queue commands.
>>>
>>>>
>>>> Looking at this closer, we might have bigger problems. There is a host wide
>>>> max number of commands that the VFC host supports, which gets returned on
>>>> NPIV Login. This value can change across a live migration event.
>>>
>>> From what I understand the max commands can only become less.
>>>
>>>>
>>>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>>>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
>>>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
>>>> we look at can_queue once the tag set is setup, and I'm not seeing a good way
>>>> to dynamically change the host queue depth once the tag set is setup. 
>>>>
>>>> Unless I'm missing something, our best options appear to either be to implement
>>>> our own host wide busy reference counting, which doesn't sound very good, or
>>>> we need to add some API to block / scsi that allows us to dynamically change
>>>> can_queue.
>>>
>>> Changing can_queue won't do use any good with the shared tagset becasue each
>>> queue still needs to be able to queue can_queue number of commands in the worst
>>> case.
>>
>> The issue I'm trying to highlight here is the following scenario:
>>
>> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.
>>
>> 2. On our NPIV login response from the VIOS, we might get a lower value than we
>> initially set in shost->can_queue, so we update it, but nobody ever looks at it
>> again, and we don't have any protection against sending too many commands to the host.
>>
>>
>> Basically, we no longer have any code that ensures we don't send more
>> commands to the VIOS than we are told it supports. According to the architecture,
>> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
>> of a bug on our part.
>>
>> I don't think it was ever clearly defined in the API that a driver can
>> change shost->can_queue after calling scsi_add_host, but up until
>> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
>> it doesn't. 
> 
> Actually it isn't related with commit 6eb045e092ef, because blk_mq_alloc_tag_set()
> uses .can_queue to create driver tag sbitmap and request pool.
> 
> So even thought without 6eb045e092ef, the updated .can_queue can't work
> as expected because the max driver tag depth has been fixed by blk-mq already.

There are two scenarios here. In the scenario of someone increasing can_queue
after the tag set is allocated, I agree, blk-mq will never take advantage
of this. However, in the scenario of someone *decreasing* can_queue after the
tag set is allocated, prior to 6eb045e092ef, the shost->host_busy code provided
this protection.

> 
> What 6eb045e092ef does is just to remove the double check on max
> host-wide allowed commands because that has been respected by blk-mq
> driver tag allocation already.
> 
>>
>> I started looking through drivers that do this, and so far, it looks like the
>> following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...
>>
>> We probably need an API that lets us change shost->can_queue dynamically.
> 
> I'd suggest to confirm changing .can_queue is one real usecase.

For ibmvfc, the total number of commands that the scsi host supports is very
much a dynamic value. It can increase and it can decrease. Live migrating
a logical partition from one system to another is the usual cause of
such a capability change. For ibmvfc, at least, this only ever happens
when we've self blocked the host and have sent back all outstanding I/O.

However, looking at other drivers that modify can_queue dynamically, this
doesn't always hold true. Looking at libfc, it looks to dynamically ramp
up and ramp down can_queue based on its ability to handle requests.

There are certainly a number of other drivers that change can_queue
after the tag set has been allocated. Some of these drivers could
likely be changed to avoid doing this, but changing them all will likely
be difficult.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply

* [PATCH 2/3] tty: vcc: Drop unnecessary if block
From: Uwe Kleine-König @ 2021-01-14 17:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby
  Cc: sparclinux, linuxppc-dev, David S . Miller, linux-kernel
In-Reply-To: <20210114175718.137483-1-u.kleine-koenig@pengutronix.de>

If vcc_probe() succeeded dev_set_drvdata() is called with a non-NULL
value, and if vcc_probe() failed vcc_remove() isn't called.

So there is no way dev_get_drvdata() can return NULL in vcc_remove() and
the check can just go away.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/tty/vcc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c
index 9ffd42e333b8..d9b0dc6deae9 100644
--- a/drivers/tty/vcc.c
+++ b/drivers/tty/vcc.c
@@ -681,9 +681,6 @@ static int vcc_remove(struct vio_dev *vdev)
 {
 	struct vcc_port *port = dev_get_drvdata(&vdev->dev);
 
-	if (!port)
-		return -ENODEV;
-
 	del_timer_sync(&port->rx_timer);
 	del_timer_sync(&port->tx_timer);
 
-- 
2.29.2


^ permalink raw reply related

* [PATCH 0/3] tty: some cleanups in remove functions
From: Uwe Kleine-König @ 2021-01-14 17:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby
  Cc: sparclinux, linuxppc-dev, David S . Miller, linux-kernel

Hello,

while working on changing the prototype of struct vio_driver::remove to
return void I noticed a few exit paths in such callbacks that return an
error code.

This is a bad thing because the return value is ignored (which is the
motivation to make it void) and the corresponding device then ends in
some limbo state.

Luckily for the three offenders here these cases cannot happen and are
simplified accordingly. This then makes the patch that changes the
remove callback's prototype simpler because it only changes prototypes
and drops "return 0"s.

Best regards and thanks for considering,
Uwe Kleine-König

Uwe Kleine-König (3):
  tty: hvcs: Drop unnecessary if block
  tty: vcc: Drop unnecessary if block
  tty: vcc: Drop impossible to hit WARN_ON

 drivers/tty/hvc/hvcs.c |  3 ---
 drivers/tty/vcc.c      | 10 ++--------
 2 files changed, 2 insertions(+), 11 deletions(-)

-- 
2.29.2


^ permalink raw reply

* [PATCH 3/3] tty: vcc: Drop impossible to hit WARN_ON
From: Uwe Kleine-König @ 2021-01-14 17:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby
  Cc: sparclinux, linuxppc-dev, David S . Miller, linux-kernel
In-Reply-To: <20210114175718.137483-1-u.kleine-koenig@pengutronix.de>

vcc_get() returns the port that has provided port->index. As the port that
is about to be removed isn't removed yet this trivially will find this
port. So simplify the call to not assign an identical value to the port
pointer and drop the warning that is never hit.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/tty/vcc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/vcc.c b/drivers/tty/vcc.c
index d9b0dc6deae9..e2d6205f83ce 100644
--- a/drivers/tty/vcc.c
+++ b/drivers/tty/vcc.c
@@ -692,12 +692,9 @@ static int vcc_remove(struct vio_dev *vdev)
 		tty_vhangup(port->tty);
 
 	/* Get exclusive reference to VCC, ensures that there are no other
-	 * clients to this port
+	 * clients to this port. This cannot fail.
 	 */
-	port = vcc_get(port->index, true);
-
-	if (WARN_ON(!port))
-		return -ENODEV;
+	vcc_get(port->index, true);
 
 	tty_unregister_device(vcc_tty_driver, port->index);
 
-- 
2.29.2


^ permalink raw reply related

* Re: [PATCH] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
From: Mark Brown @ 2021-01-14 16:48 UTC (permalink / raw)
  To: Linus Walleij, Christophe Leroy, Sven Van Asbroeck
  Cc: linuxppc-dev, linux-kernel, linux-spi
In-Reply-To: <6b51cc2bfbca70d3e9b9da7b7aa4c7a9d793ca0e.1610629002.git.christophe.leroy@csgroup.eu>

On Thu, 14 Jan 2021 13:09:37 +0000 (UTC), Christophe Leroy wrote:
> Commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO
> descriptors") broke fsl spi driver.
> 
> As now we fully rely on gpiolib for handling the polarity of
> chip selects, the driver shall not alter the GPIO value anymore
> when SPI_CS_HIGH is not set in spi->mode.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/1] spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
      commit: 7a2da5d7960a64ee923fe3e31f01a1101052c66f

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply

* Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
From: Lee Jones @ 2021-01-14 19:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paul Durrant, Kurt Kanzenbach, Alexei Starovoitov,
	Gustavo A. R. Silva, Peter Cammaert, Paul Mackerras,
	Sukadev Bhattiprolu, Wei Liu, Daniel Borkmann, John Fastabend,
	Santiago Leon, xen-devel, Grygorii Strashko, Thomas Falcon,
	Jesper Dangaard Brouer, Jens Osterkamp, Rusty Russell,
	Daris A Nevil, Lijun Pan, Ivan Khoronzhuk, Nicolas Pitre,
	Geoff Levand, netdev, linux-kernel, Erik Stahlman, John Allen,
	Utz Bacher, Dany Madden, bpf, linuxppc-dev, David S. Miller,
	Russell King
In-Reply-To: <20210114091453.30177d20@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Thu, 14 Jan 2021, Jakub Kicinski wrote:

> On Thu, 14 Jan 2021 08:33:49 +0000 Lee Jones wrote:
> > On Wed, 13 Jan 2021, Jakub Kicinski wrote:
> > 
> > > On Wed, 13 Jan 2021 16:41:16 +0000 Lee Jones wrote:  
> > > > Resending the stragglers again.                                                                                  
> > > > 
> > > > This set is part of a larger effort attempting to clean-up W=1                                                   
> > > > kernel builds, which are currently overwhelmingly riddled with                                                   
> > > > niggly little warnings.                                                                                          
> > > >                                                                                                                  
> > > > v2:                                                                                                              
> > > >  - Squashed IBM patches                                                                                      
> > > >  - Fixed real issue in SMSC
> > > >  - Added Andrew's Reviewed-by tags on remainder  
> > > 
> > > Does not apply, please rebase on net-next/master.  
> > 
> > These are based on Tuesday's next/master.
> 
> What's next/master?

I'm not sure if this is a joke, or not? :)

next/master == Linux Next.  The daily merged repo where all of the
*-next branches end up to ensure interoperability.  It's also the
branch that is most heavily tested by the auto-builders to ensure the
vast majority of issues are ironed out before hitting Mainline.

> This is net-next:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/

Looks like net-next gets merged into next/master:

commit 452958f1f3d1c8980a8414f9c37c8c6de24c7d32
Merge: 1eabba209a17a f50e2f9f79164
Author: Stephen Rothwell <sfr@canb.auug.org.au>
Date:   Thu Jan 14 10:35:40 2021 +1100

    Merge remote-tracking branch 'net-next/master'

So I'm not sure what it's conflicting with.

Do you have patches in net-next that didn't make it into next/master
for some reason?

I'll try to rebase again tomorrow.

Hopefully I am able to reproduce your issue by then.

> > I just rebased them now with no issue.
> > 
> > What conflict are you seeing?
> 
> Applying: net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
> error: patch failed: drivers/net/ethernet/smsc/smc91x.c:2192
> error: drivers/net/ethernet/smsc/smc91x.c: patch does not apply
> Patch failed at 0001 net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* [PATCH v5 02/21] ibmvfc: move event pool init/free routines
From: Tyrel Datwyler @ 2021-01-14 20:31 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	Brian King, brking, linuxppc-dev
In-Reply-To: <20210114203148.246656-1-tyreld@linux.ibm.com>

The next patch in this series reworks the event pool allocation calls to
happen within the individual queue allocation routines instead of as
independent calls.

Move the init/free routines earlier in ibmvfc.c to prevent undefined
reference errors when calling these functions from the queue allocation
code. No functional change.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 151 +++++++++++++++++----------------
 1 file changed, 76 insertions(+), 75 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 9200fe49c57e..cd9273a5fadb 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -716,6 +716,82 @@ static int ibmvfc_send_crq_init_complete(struct ibmvfc_host *vhost)
 	return ibmvfc_send_crq(vhost, 0xC002000000000000LL, 0);
 }
 
+/**
+ * ibmvfc_init_event_pool - Allocates and initializes the event pool for a host
+ * @vhost:	ibmvfc host who owns the event pool
+ *
+ * Returns zero on success.
+ **/
+static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost,
+				  struct ibmvfc_queue *queue)
+{
+	int i;
+	struct ibmvfc_event_pool *pool = &queue->evt_pool;
+
+	ENTER;
+	pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ;
+	pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL);
+	if (!pool->events)
+		return -ENOMEM;
+
+	pool->iu_storage = dma_alloc_coherent(vhost->dev,
+					      pool->size * sizeof(*pool->iu_storage),
+					      &pool->iu_token, 0);
+
+	if (!pool->iu_storage) {
+		kfree(pool->events);
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&queue->sent);
+	INIT_LIST_HEAD(&queue->free);
+	spin_lock_init(&queue->l_lock);
+
+	for (i = 0; i < pool->size; ++i) {
+		struct ibmvfc_event *evt = &pool->events[i];
+
+		atomic_set(&evt->free, 1);
+		evt->crq.valid = 0x80;
+		evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i));
+		evt->xfer_iu = pool->iu_storage + i;
+		evt->vhost = vhost;
+		evt->queue = queue;
+		evt->ext_list = NULL;
+		list_add_tail(&evt->queue_list, &queue->free);
+	}
+
+	LEAVE;
+	return 0;
+}
+
+/**
+ * ibmvfc_free_event_pool - Frees memory of the event pool of a host
+ * @vhost:	ibmvfc host who owns the event pool
+ *
+ **/
+static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost,
+				   struct ibmvfc_queue *queue)
+{
+	int i;
+	struct ibmvfc_event_pool *pool = &queue->evt_pool;
+
+	ENTER;
+	for (i = 0; i < pool->size; ++i) {
+		list_del(&pool->events[i].queue_list);
+		BUG_ON(atomic_read(&pool->events[i].free) != 1);
+		if (pool->events[i].ext_list)
+			dma_pool_free(vhost->sg_pool,
+				      pool->events[i].ext_list,
+				      pool->events[i].ext_list_token);
+	}
+
+	kfree(pool->events);
+	dma_free_coherent(vhost->dev,
+			  pool->size * sizeof(*pool->iu_storage),
+			  pool->iu_storage, pool->iu_token);
+	LEAVE;
+}
+
 /**
  * ibmvfc_free_queue - Deallocate queue
  * @vhost:	ibmvfc host struct
@@ -1312,81 +1388,6 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost)
 	strncpy(login_info->drc_name, location, IBMVFC_MAX_NAME);
 }
 
-/**
- * ibmvfc_init_event_pool - Allocates and initializes the event pool for a host
- * @vhost:	ibmvfc host who owns the event pool
- *
- * Returns zero on success.
- **/
-static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost,
-				  struct ibmvfc_queue *queue)
-{
-	int i;
-	struct ibmvfc_event_pool *pool = &queue->evt_pool;
-
-	ENTER;
-	pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ;
-	pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL);
-	if (!pool->events)
-		return -ENOMEM;
-
-	pool->iu_storage = dma_alloc_coherent(vhost->dev,
-					      pool->size * sizeof(*pool->iu_storage),
-					      &pool->iu_token, 0);
-
-	if (!pool->iu_storage) {
-		kfree(pool->events);
-		return -ENOMEM;
-	}
-
-	INIT_LIST_HEAD(&queue->sent);
-	INIT_LIST_HEAD(&queue->free);
-	spin_lock_init(&queue->l_lock);
-
-	for (i = 0; i < pool->size; ++i) {
-		struct ibmvfc_event *evt = &pool->events[i];
-		atomic_set(&evt->free, 1);
-		evt->crq.valid = 0x80;
-		evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i));
-		evt->xfer_iu = pool->iu_storage + i;
-		evt->vhost = vhost;
-		evt->queue = queue;
-		evt->ext_list = NULL;
-		list_add_tail(&evt->queue_list, &queue->free);
-	}
-
-	LEAVE;
-	return 0;
-}
-
-/**
- * ibmvfc_free_event_pool - Frees memory of the event pool of a host
- * @vhost:	ibmvfc host who owns the event pool
- *
- **/
-static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost,
-				   struct ibmvfc_queue *queue)
-{
-	int i;
-	struct ibmvfc_event_pool *pool = &queue->evt_pool;
-
-	ENTER;
-	for (i = 0; i < pool->size; ++i) {
-		list_del(&pool->events[i].queue_list);
-		BUG_ON(atomic_read(&pool->events[i].free) != 1);
-		if (pool->events[i].ext_list)
-			dma_pool_free(vhost->sg_pool,
-				      pool->events[i].ext_list,
-				      pool->events[i].ext_list_token);
-	}
-
-	kfree(pool->events);
-	dma_free_coherent(vhost->dev,
-			  pool->size * sizeof(*pool->iu_storage),
-			  pool->iu_storage, pool->iu_token);
-	LEAVE;
-}
-
 /**
  * ibmvfc_get_event - Gets the next free event in pool
  * @vhost:	ibmvfc host struct
-- 
2.27.0


^ permalink raw reply related

* [PATCH v5 00/21] ibmvfc: initial MQ development/enablement
From: Tyrel Datwyler @ 2021-01-14 20:31 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel, brking,
	linuxppc-dev

Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.

This initial implementation adds the necessary Sub-CRQ framework and implements
the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
HW backed channels.

This latest series is completely rebased and reimplemented on top of the recent
("ibmvfc: MQ prepartory locking work") series. [1]

[1] https://lore.kernel.org/linux-scsi/20210106201835.1053593-1-tyreld@linux.ibm.com/

changes in v5:
* Addressed comments from brking in following patches:
* Patch 18: Drop queue lock in loop after sending cancel event
	    Remove cancel event from list after completion
	    Return -EIO on unknown failure
* Patch 21: Removed can_queue rebase artifact and range check user supplied
            nr_scsi_hw_queue value

changes in v4:
* Series rebased and reworked on top of previous ibmvfc locking series
* Dropped all previous Reviewed-by tags

changes in v3:
* Patch 4: changed firmware support logging to dev_warn_once
* Patch 6: adjusted locking
* Patch 15: dropped logging verbosity, moved cancel event tracking into subqueue
* Patch 17: removed write permission for migration module parameters
	    drive hard reset after update to num of scsi channels

changes in v2:
* Patch 4: NULL'd scsi_scrq reference after deallocation
* Patch 6: Added switch case to handle XPORT event
* Patch 9: fixed ibmvfc_event leak and double free
* added support for cancel command with MQ
* added parameter toggles for MQ settings

Tyrel Datwyler (21):
  ibmvfc: add vhost fields and defaults for MQ enablement
  ibmvfc: move event pool init/free routines
  ibmvfc: init/free event pool during queue allocation/free
  ibmvfc: add size parameter to ibmvfc_init_event_pool
  ibmvfc: define hcall wrapper for registering a Sub-CRQ
  ibmvfc: add Subordinate CRQ definitions
  ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
  ibmvfc: add Sub-CRQ IRQ enable/disable routine
  ibmvfc: add handlers to drain and complete Sub-CRQ responses
  ibmvfc: define Sub-CRQ interrupt handler routine
  ibmvfc: map/request irq and register Sub-CRQ interrupt handler
  ibmvfc: implement channel enquiry and setup commands
  ibmvfc: advertise client support for using hardware channels
  ibmvfc: set and track hw queue in ibmvfc_event struct
  ibmvfc: send commands down HW Sub-CRQ when channelized
  ibmvfc: register Sub-CRQ handles with VIOS during channel setup
  ibmvfc: add cancel mad initialization helper
  ibmvfc: send Cancel MAD down each hw scsi channel
  ibmvfc: purge scsi channels after transport loss/reset
  ibmvfc: enable MQ and set reasonable defaults
  ibmvfc: provide modules parameters for MQ settings

 drivers/scsi/ibmvscsi/ibmvfc.c | 917 ++++++++++++++++++++++++++++-----
 drivers/scsi/ibmvscsi/ibmvfc.h |  39 ++
 2 files changed, 828 insertions(+), 128 deletions(-)

-- 
2.27.0


^ permalink raw reply

* [PATCH v5 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
From: Tyrel Datwyler @ 2021-01-14 20:31 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	Brian King, brking, linuxppc-dev
In-Reply-To: <20210114203148.246656-1-tyreld@linux.ibm.com>

Introduce several new vhost fields for managing MQ state of the adapter
as well as initial defaults for MQ enablement.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
 drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ba95438a8912..9200fe49c57e 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
 	.max_sectors = IBMVFC_MAX_SECTORS,
 	.shost_attrs = ibmvfc_attrs,
 	.track_queue_depth = 1,
+	.host_tagset = 1,
 };
 
 /**
@@ -5290,6 +5291,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	shost->max_sectors = IBMVFC_MAX_SECTORS;
 	shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
 	shost->unique_id = shost->host_no;
+	shost->nr_hw_queues = IBMVFC_MQ ? IBMVFC_SCSI_HW_QUEUES : 1;
 
 	vhost = shost_priv(shost);
 	INIT_LIST_HEAD(&vhost->targets);
@@ -5300,6 +5302,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	vhost->partition_number = -1;
 	vhost->log_level = log_level;
 	vhost->task_set = 1;
+
+	vhost->mq_enabled = IBMVFC_MQ;
+	vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS;
+	vhost->using_channels = 0;
+	vhost->do_enquiry = 1;
+
 	strcpy(vhost->partition_name, "UNKNOWN");
 	init_waitqueue_head(&vhost->work_wait_q);
 	init_waitqueue_head(&vhost->init_wait_q);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 632e977449c5..dd6d89292867 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -41,6 +41,11 @@
 #define IBMVFC_DEFAULT_LOG_LEVEL	2
 #define IBMVFC_MAX_CDB_LEN		16
 #define IBMVFC_CLS3_ERROR		0
+#define IBMVFC_MQ			0
+#define IBMVFC_SCSI_CHANNELS		0
+#define IBMVFC_SCSI_HW_QUEUES		1
+#define IBMVFC_MIG_NO_SUB_TO_CRQ	0
+#define IBMVFC_MIG_NO_N_TO_M		0
 
 /*
  * Ensure we have resources for ERP and initialization:
@@ -840,6 +845,10 @@ struct ibmvfc_host {
 	int delay_init;
 	int scan_complete;
 	int logged_in;
+	int mq_enabled;
+	int using_channels;
+	int do_enquiry;
+	int client_scsi_channels;
 	int aborting_passthru;
 	int events_to_log;
 #define IBMVFC_AE_LINKUP	0x0001
-- 
2.27.0


^ permalink raw reply related

* [PATCH v5 06/21] ibmvfc: add Subordinate CRQ definitions
From: Tyrel Datwyler @ 2021-01-14 20:31 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	Brian King, brking, linuxppc-dev
In-Reply-To: <20210114203148.246656-1-tyreld@linux.ibm.com>

Subordinate Command Response Queues (Sub CRQ) are used in conjunction
with the primary CRQ when more than one queue is needed by the virtual
IO adapter. Recent phyp firmware versions support Sub CRQ's with ibmvfc
adapters. This feature is a prerequisite for supporting multiple
hardware backed submission queues in the vfc adapter.

The Sub CRQ command element differs from the standard CRQ in that it is
32bytes long as opposed to 16bytes for the latter. Despite this extra
16bytes the ibmvfc protocol will use the original CRQ command element
mapped to the first 16bytes of the Sub CRQ element initially.

Add definitions for the Sub CRQ command element and queue.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index dd6d89292867..b9eed05c165f 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -650,6 +650,11 @@ struct ibmvfc_crq {
 	volatile __be64 ioba;
 } __packed __aligned(8);
 
+struct ibmvfc_sub_crq {
+	struct ibmvfc_crq crq;
+	__be64 reserved[2];
+} __packed __aligned(8);
+
 enum ibmvfc_ae_link_state {
 	IBMVFC_AE_LS_LINK_UP		= 0x01,
 	IBMVFC_AE_LS_LINK_BOUNCED	= 0x02,
@@ -761,12 +766,14 @@ struct ibmvfc_event_pool {
 enum ibmvfc_msg_fmt {
 	IBMVFC_CRQ_FMT = 0,
 	IBMVFC_ASYNC_FMT,
+	IBMVFC_SUB_CRQ_FMT,
 };
 
 union ibmvfc_msgs {
 	void *handle;
 	struct ibmvfc_crq *crq;
 	struct ibmvfc_async_crq *async;
+	struct ibmvfc_sub_crq *scrq;
 };
 
 struct ibmvfc_queue {
@@ -781,6 +788,20 @@ struct ibmvfc_queue {
 	struct list_head sent;
 	struct list_head free;
 	spinlock_t l_lock;
+
+	/* Sub-CRQ fields */
+	struct ibmvfc_host *vhost;
+	unsigned long cookie;
+	unsigned long vios_cookie;
+	unsigned long hw_irq;
+	unsigned long irq;
+	unsigned long hwq_id;
+	char name[32];
+};
+
+struct ibmvfc_scsi_channels {
+	struct ibmvfc_queue *scrqs;
+	unsigned int active_queues;
 };
 
 enum ibmvfc_host_action {
-- 
2.27.0


^ permalink raw reply related

* [PATCH v5 04/21] ibmvfc: add size parameter to ibmvfc_init_event_pool
From: Tyrel Datwyler @ 2021-01-14 20:31 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	Brian King, brking, linuxppc-dev
In-Reply-To: <20210114203148.246656-1-tyreld@linux.ibm.com>

With the upcoming addition of Sub-CRQs the event pool size may vary
per-queue.

Add a size parameter to ibmvfc_init_event_pool such that different size
event pools can be requested by ibmvfc_alloc_queue.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 9330f5a65a7e..524e81164d70 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -723,19 +723,23 @@ static int ibmvfc_send_crq_init_complete(struct ibmvfc_host *vhost)
  * Returns zero on success.
  **/
 static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost,
-				  struct ibmvfc_queue *queue)
+				  struct ibmvfc_queue *queue,
+				  unsigned int size)
 {
 	int i;
 	struct ibmvfc_event_pool *pool = &queue->evt_pool;
 
 	ENTER;
-	pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ;
-	pool->events = kcalloc(pool->size, sizeof(*pool->events), GFP_KERNEL);
+	if (!size)
+		return 0;
+
+	pool->size = size;
+	pool->events = kcalloc(size, sizeof(*pool->events), GFP_KERNEL);
 	if (!pool->events)
 		return -ENOMEM;
 
 	pool->iu_storage = dma_alloc_coherent(vhost->dev,
-					      pool->size * sizeof(*pool->iu_storage),
+					      size * sizeof(*pool->iu_storage),
 					      &pool->iu_token, 0);
 
 	if (!pool->iu_storage) {
@@ -747,7 +751,7 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost,
 	INIT_LIST_HEAD(&queue->free);
 	spin_lock_init(&queue->l_lock);
 
-	for (i = 0; i < pool->size; ++i) {
+	for (i = 0; i < size; ++i) {
 		struct ibmvfc_event *evt = &pool->events[i];
 
 		atomic_set(&evt->free, 1);
@@ -5013,6 +5017,7 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
 {
 	struct device *dev = vhost->dev;
 	size_t fmt_size;
+	unsigned int pool_size = 0;
 
 	ENTER;
 	spin_lock_init(&queue->_lock);
@@ -5021,10 +5026,7 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
 	switch (fmt) {
 	case IBMVFC_CRQ_FMT:
 		fmt_size = sizeof(*queue->msgs.crq);
-		if (ibmvfc_init_event_pool(vhost, queue)) {
-			dev_err(dev, "Couldn't initialize event pool.\n");
-			return -ENOMEM;
-		}
+		pool_size = max_requests + IBMVFC_NUM_INTERNAL_REQ;
 		break;
 	case IBMVFC_ASYNC_FMT:
 		fmt_size = sizeof(*queue->msgs.async);
@@ -5034,6 +5036,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
 		return -EINVAL;
 	}
 
+	if (ibmvfc_init_event_pool(vhost, queue, pool_size)) {
+		dev_err(dev, "Couldn't initialize event pool.\n");
+		return -ENOMEM;
+	}
+
 	queue->msgs.handle = (void *)get_zeroed_page(GFP_KERNEL);
 	if (!queue->msgs.handle)
 		return -ENOMEM;
-- 
2.27.0


^ permalink raw reply related

* [PATCH v5 07/21] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
From: Tyrel Datwyler @ 2021-01-14 20:31 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	Brian King, brking, linuxppc-dev
In-Reply-To: <20210114203148.246656-1-tyreld@linux.ibm.com>

Allocate a set of Sub-CRQs in advance. During channel setup the client
and VIOS negotiate the number of queues the VIOS supports and the number
that the client desires to request. Its possible that the final channel
resources allocated is less than requested, but the client is still
responsible for sending handles for every queue it is hoping for.

Also, provide deallocation cleanup routines.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 125 +++++++++++++++++++++++++++++++++
 drivers/scsi/ibmvscsi/ibmvfc.h |   1 +
 2 files changed, 126 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 612c7f3d7bd3..a198e118887d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -895,6 +895,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
 	unsigned long flags;
 	struct vio_dev *vdev = to_vio_dev(vhost->dev);
 	struct ibmvfc_queue *crq = &vhost->crq;
+	struct ibmvfc_queue *scrq;
+	int i;
 
 	/* Close the CRQ */
 	do {
@@ -912,6 +914,16 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
 	memset(crq->msgs.crq, 0, PAGE_SIZE);
 	crq->cur = 0;
 
+	if (vhost->scsi_scrqs.scrqs) {
+		for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+			scrq = &vhost->scsi_scrqs.scrqs[i];
+			spin_lock(scrq->q_lock);
+			memset(scrq->msgs.scrq, 0, PAGE_SIZE);
+			scrq->cur = 0;
+			spin_unlock(scrq->q_lock);
+		}
+	}
+
 	/* And re-open it again */
 	rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
 				crq->msg_token, PAGE_SIZE);
@@ -5045,6 +5057,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
 	case IBMVFC_ASYNC_FMT:
 		fmt_size = sizeof(*queue->msgs.async);
 		break;
+	case IBMVFC_SUB_CRQ_FMT:
+		fmt_size = sizeof(*queue->msgs.scrq);
+		/* We need one extra event for Cancel Commands */
+		pool_size = max_requests + 1;
+		break;
 	default:
 		dev_warn(dev, "Unknown command/response queue message format: %d\n", fmt);
 		return -EINVAL;
@@ -5136,6 +5153,107 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
 	return retrc;
 }
 
+static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
+				  int index)
+{
+	struct device *dev = vhost->dev;
+	struct vio_dev *vdev = to_vio_dev(dev);
+	struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+	int rc = -ENOMEM;
+
+	ENTER;
+
+	if (ibmvfc_alloc_queue(vhost, scrq, IBMVFC_SUB_CRQ_FMT))
+		return -ENOMEM;
+
+	rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
+			   &scrq->cookie, &scrq->hw_irq);
+
+	if (rc) {
+		dev_warn(dev, "Error registering sub-crq: %d\n", rc);
+		if (rc == H_PARAMETER)
+			dev_warn_once(dev, "Firmware may not support MQ\n");
+		goto reg_failed;
+	}
+
+	scrq->hwq_id = index;
+	scrq->vhost = vhost;
+
+	LEAVE;
+	return 0;
+
+reg_failed:
+	ibmvfc_free_queue(vhost, scrq);
+	LEAVE;
+	return rc;
+}
+
+static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index)
+{
+	struct device *dev = vhost->dev;
+	struct vio_dev *vdev = to_vio_dev(dev);
+	struct ibmvfc_queue *scrq = &vhost->scsi_scrqs.scrqs[index];
+	long rc;
+
+	ENTER;
+
+	do {
+		rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
+					scrq->cookie);
+	} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+
+	if (rc)
+		dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc);
+
+	ibmvfc_free_queue(vhost, scrq);
+	LEAVE;
+}
+
+static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+{
+	int i, j;
+
+	ENTER;
+
+	vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES,
+					  sizeof(*vhost->scsi_scrqs.scrqs),
+					  GFP_KERNEL);
+	if (!vhost->scsi_scrqs.scrqs)
+		return -1;
+
+	for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
+		if (ibmvfc_register_scsi_channel(vhost, i)) {
+			for (j = i; j > 0; j--)
+				ibmvfc_deregister_scsi_channel(vhost, j - 1);
+			kfree(vhost->scsi_scrqs.scrqs);
+			vhost->scsi_scrqs.scrqs = NULL;
+			vhost->scsi_scrqs.active_queues = 0;
+			LEAVE;
+			return -1;
+		}
+	}
+
+	LEAVE;
+	return 0;
+}
+
+static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
+{
+	int i;
+
+	ENTER;
+	if (!vhost->scsi_scrqs.scrqs)
+		return;
+
+	for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++)
+		ibmvfc_deregister_scsi_channel(vhost, i);
+
+	kfree(vhost->scsi_scrqs.scrqs);
+	vhost->scsi_scrqs.scrqs = NULL;
+	vhost->scsi_scrqs.active_queues = 0;
+	LEAVE;
+}
+
 /**
  * ibmvfc_free_mem - Free memory for vhost
  * @vhost:	ibmvfc host struct
@@ -5371,6 +5489,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 		goto remove_shost;
 	}
 
+	if (vhost->mq_enabled) {
+		rc = ibmvfc_init_sub_crqs(vhost);
+		if (rc)
+			dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", rc);
+	}
+
 	if (shost_to_fc_host(shost)->rqst_q)
 		blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1);
 	dev_set_drvdata(dev, vhost);
@@ -5427,6 +5551,7 @@ static int ibmvfc_remove(struct vio_dev *vdev)
 	list_splice_init(&vhost->purge, &purge);
 	spin_unlock_irqrestore(vhost->host->host_lock, flags);
 	ibmvfc_complete_purge(&purge);
+	ibmvfc_release_sub_crqs(vhost);
 	ibmvfc_release_crq_queue(vhost);
 
 	ibmvfc_free_mem(vhost);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index b9eed05c165f..bdafe9956649 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -850,6 +850,7 @@ struct ibmvfc_host {
 	mempool_t *tgt_pool;
 	struct ibmvfc_queue crq;
 	struct ibmvfc_queue async_crq;
+	struct ibmvfc_scsi_channels scsi_scrqs;
 	struct ibmvfc_npiv_login login_info;
 	union ibmvfc_npiv_login_data *login_buf;
 	dma_addr_t login_buf_dma;
-- 
2.27.0


^ permalink raw reply related

* [PATCH v5 08/21] ibmvfc: add Sub-CRQ IRQ enable/disable routine
From: Tyrel Datwyler @ 2021-01-14 20:31 UTC (permalink / raw)
  To: james.bottomley
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	Brian King, brking, linuxppc-dev
In-Reply-To: <20210114203148.246656-1-tyreld@linux.ibm.com>

Each Sub-CRQ has its own interrupt. A hypercall is required to toggle
the IRQ state. Provide the necessary mechanism via a helper function.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index a198e118887d..5d7ada0ed0d6 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3465,6 +3465,26 @@ static void ibmvfc_tasklet(void *data)
 	}
 }
 
+static int ibmvfc_toggle_scrq_irq(struct ibmvfc_queue *scrq, int enable)
+{
+	struct device *dev = scrq->vhost->dev;
+	struct vio_dev *vdev = to_vio_dev(dev);
+	unsigned long rc;
+	int irq_action = H_ENABLE_VIO_INTERRUPT;
+
+	if (!enable)
+		irq_action = H_DISABLE_VIO_INTERRUPT;
+
+	rc = plpar_hcall_norets(H_VIOCTL, vdev->unit_address, irq_action,
+				scrq->hw_irq, 0, 0);
+
+	if (rc)
+		dev_err(dev, "Couldn't %s sub-crq[%lu] irq. rc=%ld\n",
+			enable ? "enable" : "disable", scrq->hwq_id, rc);
+
+	return rc;
+}
+
 /**
  * ibmvfc_init_tgt - Set the next init job step for the target
  * @tgt:		ibmvfc target struct
-- 
2.27.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox