* Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Christoph Hellwig @ 2019-05-31 16:33 UTC (permalink / raw)
To: David Miller
Cc: madalin.bucur, netdev, roy.pledge, linux-kernel, leoyang.li,
iommu, camelia.groza, linuxppc-dev, linux-arm-kernel,
laurentiu.tudor
In-Reply-To: <20190530.150844.1826796344374758568.davem@davemloft.net>
On Thu, May 30, 2019 at 03:08:44PM -0700, David Miller wrote:
> From: laurentiu.tudor@nxp.com
> Date: Thu, 30 May 2019 17:19:45 +0300
>
> > Depends on this pull request:
> >
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2019-May/653554.html
>
> I'm not sure how you want me to handle this.
The thing needs to be completely redone as it abuses parts of the
iommu API in a completely unacceptable way.
^ permalink raw reply
* RE: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Laurentiu Tudor @ 2019-05-31 16:46 UTC (permalink / raw)
To: Andreas Färber
Cc: Madalin-cristian Bucur, netdev@vger.kernel.org, Roy Pledge,
linux-kernel@vger.kernel.org, Leo Li,
iommu@lists.linux-foundation.org, Camelia Alexandra Groza,
Mian Yousaf Kaukab, linuxppc-dev@lists.ozlabs.org,
davem@davemloft.net, linux-arm-kernel@lists.infradead.org
In-Reply-To: <d086216f-f3fc-c88a-3891-81e84e8bdb01@suse.de>
Hello Andreas,
> -----Original Message-----
> From: Andreas Färber <afaerber@suse.de>
> Sent: Friday, May 31, 2019 7:15 PM
>
> Hi Laurentiu,
>
> Am 30.05.19 um 16:19 schrieb laurentiu.tudor@nxp.com:
> > This patch series contains several fixes in preparation for SMMU
> > support on NXP LS1043A and LS1046A chips. Once these get picked up,
> > I'll submit the actual SMMU enablement patches consisting in the
> > required device tree changes.
>
> Have you thought through what will happen if this patch ordering is not
> preserved? In particular, a user installing a future U-Boot update with
> the DTB bits but booting a stable kernel without this patch series -
> wouldn't that regress dpaa then for our customers?
>
These are fixes for issues that popped out after enabling SMMU.
I do not expect them to break anything.
---
Best Regards, Laurentiu
^ permalink raw reply
* RE: [PATCH v3 5/6] dpaa_eth: fix iova handling for contiguous frames
From: Laurentiu Tudor @ 2019-05-31 16:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Madalin-cristian Bucur, netdev@vger.kernel.org, Roy Pledge,
linux-kernel@vger.kernel.org, Leo Li,
iommu@lists.linux-foundation.org, Camelia Alexandra Groza,
linuxppc-dev@lists.ozlabs.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190531163229.GA8708@infradead.org>
Hi Christoph,
> -----Original Message-----
> From: Christoph Hellwig <hch@infradead.org>
> Sent: Friday, May 31, 2019 7:32 PM
>
> On Thu, May 30, 2019 at 05:19:50PM +0300, laurentiu.tudor@nxp.com wrote:
> > +static phys_addr_t dpaa_iova_to_phys(const struct dpaa_priv *priv,
> > + dma_addr_t addr)
> > +{
> > + return priv->domain ? iommu_iova_to_phys(priv->domain, addr) : addr;
> > +}
>
> Again, a driver using the iommu API must not call iommu_* APIs.
>
> This chane is not acceptable.
Unfortunately due to our hardware particularities we do not have alternatives. This is also the case for our next generation of ethernet drivers [1]. I'll let my colleagues that work on the ethernet drivers to comment more on this.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c#n37
---
Best Regards, Laurentiu
^ permalink raw reply
* Re: [PATCH v3 5/6] dpaa_eth: fix iova handling for contiguous frames
From: Christoph Hellwig @ 2019-05-31 16:55 UTC (permalink / raw)
To: Laurentiu Tudor
Cc: Madalin-cristian Bucur, netdev@vger.kernel.org, Roy Pledge,
linux-kernel@vger.kernel.org, Leo Li, Christoph Hellwig,
iommu@lists.linux-foundation.org, Camelia Alexandra Groza,
linuxppc-dev@lists.ozlabs.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <VI1PR04MB5134F5E31B993B2DC5275BB3EC190@VI1PR04MB5134.eurprd04.prod.outlook.com>
On Fri, May 31, 2019 at 04:53:16PM +0000, Laurentiu Tudor wrote:
> Unfortunately due to our hardware particularities we do not have alternatives. This is also the case for our next generation of ethernet drivers [1]. I'll let my colleagues that work on the ethernet drivers to comment more on this.
Then you need to enhance the DMA API to support your use case instead
of using an API only supported for two specific IOMMU implementations.
Remember in Linux you can should improve core code and not hack around
it in crappy ways making lots of assumptions in your drivers.
^ permalink raw reply
* RE: [PATCH v3 5/6] dpaa_eth: fix iova handling for contiguous frames
From: Laurentiu Tudor @ 2019-05-31 17:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Madalin-cristian Bucur, netdev@vger.kernel.org, Roy Pledge,
linux-kernel@vger.kernel.org, Leo Li,
iommu@lists.linux-foundation.org, Camelia Alexandra Groza,
linuxppc-dev@lists.ozlabs.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190531165530.GA16487@infradead.org>
> -----Original Message-----
> From: Christoph Hellwig <hch@infradead.org>
> Sent: Friday, May 31, 2019 7:56 PM
>
> On Fri, May 31, 2019 at 04:53:16PM +0000, Laurentiu Tudor wrote:
> > Unfortunately due to our hardware particularities we do not have
> alternatives. This is also the case for our next generation of ethernet
> drivers [1]. I'll let my colleagues that work on the ethernet drivers to
> comment more on this.
>
> Then you need to enhance the DMA API to support your use case instead
> of using an API only supported for two specific IOMMU implementations.
>
> Remember in Linux you can should improve core code and not hack around
> it in crappy ways making lots of assumptions in your drivers.
Alright, I'm ok with that. I'll try to come up with something, will keep you in the loop.
---
Best Regards, Laurentiu
^ permalink raw reply
* Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Robin Murphy @ 2019-05-31 17:03 UTC (permalink / raw)
To: Christoph Hellwig, David Miller
Cc: madalin.bucur, netdev, roy.pledge, linux-kernel, leoyang.li,
iommu, camelia.groza, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20190531163350.GB8708@infradead.org>
On 31/05/2019 17:33, Christoph Hellwig wrote:
> On Thu, May 30, 2019 at 03:08:44PM -0700, David Miller wrote:
>> From: laurentiu.tudor@nxp.com
>> Date: Thu, 30 May 2019 17:19:45 +0300
>>
>>> Depends on this pull request:
>>>
>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2019-May/653554.html
>>
>> I'm not sure how you want me to handle this.
>
> The thing needs to be completely redone as it abuses parts of the
> iommu API in a completely unacceptable way.
`git grep iommu_iova_to_phys drivers/{crypto,gpu,net}`
:(
I guess one alternative is for the offending drivers to maintain their
own lookup tables of mapped DMA addresses - I think at least some of
these things allow storing some kind of token in a descriptor, which
even if it's not big enough for a virtual address might be sufficient
for an index.
Robin.
^ permalink raw reply
* Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Andreas Färber @ 2019-05-31 17:03 UTC (permalink / raw)
To: Laurentiu Tudor
Cc: Madalin-cristian Bucur, netdev@vger.kernel.org, Roy Pledge,
linux-kernel@vger.kernel.org, Leo Li,
iommu@lists.linux-foundation.org, Camelia Alexandra Groza,
Mian Yousaf Kaukab, linuxppc-dev@lists.ozlabs.org,
davem@davemloft.net, linux-arm-kernel@lists.infradead.org
In-Reply-To: <VI1PR04MB5134BFA391D8FF013762882FEC190@VI1PR04MB5134.eurprd04.prod.outlook.com>
Hello Laurentiu,
Am 31.05.19 um 18:46 schrieb Laurentiu Tudor:
>> -----Original Message-----
>> From: Andreas Färber <afaerber@suse.de>
>> Sent: Friday, May 31, 2019 7:15 PM
>>
>> Hi Laurentiu,
>>
>> Am 30.05.19 um 16:19 schrieb laurentiu.tudor@nxp.com:
>>> This patch series contains several fixes in preparation for SMMU
>>> support on NXP LS1043A and LS1046A chips. Once these get picked up,
>>> I'll submit the actual SMMU enablement patches consisting in the
>>> required device tree changes.
>>
>> Have you thought through what will happen if this patch ordering is not
>> preserved? In particular, a user installing a future U-Boot update with
>> the DTB bits but booting a stable kernel without this patch series -
>> wouldn't that regress dpaa then for our customers?
>>
>
> These are fixes for issues that popped out after enabling SMMU.
> I do not expect them to break anything.
That was not my question! You're missing my point: All your patches are
lacking a Fixes header in their commit message, for backporting them, to
avoid _your DT patches_ breaking the driver on stable branches!
Regards,
Andreas
--
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
^ permalink raw reply
* Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Christoph Hellwig @ 2019-05-31 17:08 UTC (permalink / raw)
To: Robin Murphy
Cc: madalin.bucur, netdev, roy.pledge, linux-kernel, leoyang.li,
Christoph Hellwig, iommu, camelia.groza, linuxppc-dev,
David Miller, linux-arm-kernel
In-Reply-To: <37406608-df48-c7a0-6975-4b4ad408ba36@arm.com>
On Fri, May 31, 2019 at 06:03:30PM +0100, Robin Murphy wrote:
> > The thing needs to be completely redone as it abuses parts of the
> > iommu API in a completely unacceptable way.
>
> `git grep iommu_iova_to_phys drivers/{crypto,gpu,net}`
>
> :(
>
> I guess one alternative is for the offending drivers to maintain their own
> lookup tables of mapped DMA addresses - I think at least some of these
> things allow storing some kind of token in a descriptor, which even if it's
> not big enough for a virtual address might be sufficient for an index.
Well, we'll at least need DMA API wrappers that work on the dma addr
only and hide this madness underneath. And then tell if an given device
supports this and fail the probe otherwise.
^ permalink raw reply
* RE: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Laurentiu Tudor @ 2019-05-31 17:32 UTC (permalink / raw)
To: Andreas Färber
Cc: Madalin-cristian Bucur, netdev@vger.kernel.org, Roy Pledge,
linux-kernel@vger.kernel.org, Leo Li,
iommu@lists.linux-foundation.org, Camelia Alexandra Groza,
Mian Yousaf Kaukab, linuxppc-dev@lists.ozlabs.org,
davem@davemloft.net, linux-arm-kernel@lists.infradead.org
In-Reply-To: <19cc3230-33b0-e465-6317-590780b33efa@suse.de>
> -----Original Message-----
> From: Andreas Färber <afaerber@suse.de>
> Sent: Friday, May 31, 2019 8:04 PM
>
> Hello Laurentiu,
>
> Am 31.05.19 um 18:46 schrieb Laurentiu Tudor:
> >> -----Original Message-----
> >> From: Andreas Färber <afaerber@suse.de>
> >> Sent: Friday, May 31, 2019 7:15 PM
> >>
> >> Hi Laurentiu,
> >>
> >> Am 30.05.19 um 16:19 schrieb laurentiu.tudor@nxp.com:
> >>> This patch series contains several fixes in preparation for SMMU
> >>> support on NXP LS1043A and LS1046A chips. Once these get picked up,
> >>> I'll submit the actual SMMU enablement patches consisting in the
> >>> required device tree changes.
> >>
> >> Have you thought through what will happen if this patch ordering is not
> >> preserved? In particular, a user installing a future U-Boot update with
> >> the DTB bits but booting a stable kernel without this patch series -
> >> wouldn't that regress dpaa then for our customers?
> >>
> >
> > These are fixes for issues that popped out after enabling SMMU.
> > I do not expect them to break anything.
>
> That was not my question! You're missing my point: All your patches are
> lacking a Fixes header in their commit message, for backporting them, to
> avoid _your DT patches_ breaking the driver on stable branches!
It does appear that I'm missing your point. For sure, the DT updates solely will
break the kernel without these fixes but I'm not sure I understand how this
could happen. My plan was to share the kernel dts patches sometime after this series
makes it through.
---
Best Regards, Laurentiu
^ permalink raw reply
* Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Robin Murphy @ 2019-05-31 17:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: madalin.bucur, netdev, roy.pledge, linux-kernel, leoyang.li,
iommu, camelia.groza, linuxppc-dev, David Miller,
linux-arm-kernel
In-Reply-To: <20190531170804.GA12211@infradead.org>
On 31/05/2019 18:08, Christoph Hellwig wrote:
> On Fri, May 31, 2019 at 06:03:30PM +0100, Robin Murphy wrote:
>>> The thing needs to be completely redone as it abuses parts of the
>>> iommu API in a completely unacceptable way.
>>
>> `git grep iommu_iova_to_phys drivers/{crypto,gpu,net}`
>>
>> :(
>>
>> I guess one alternative is for the offending drivers to maintain their own
>> lookup tables of mapped DMA addresses - I think at least some of these
>> things allow storing some kind of token in a descriptor, which even if it's
>> not big enough for a virtual address might be sufficient for an index.
>
> Well, we'll at least need DMA API wrappers that work on the dma addr
> only and hide this madness underneath. And then tell if an given device
> supports this and fail the probe otherwise.
Bleh, I'm certainly not keen on formalising any kind of
dma_to_phys()/dma_to_virt() interface for this. Or are you just
proposing something like dma_unmap_sorry_sir_the_dog_ate_my_homework()
for drivers which have 'lost' the original VA they mapped?
Robin.
^ permalink raw reply
* Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
From: Christoph Hellwig @ 2019-05-31 17:46 UTC (permalink / raw)
To: Robin Murphy
Cc: madalin.bucur, netdev, roy.pledge, linux-kernel, leoyang.li,
Christoph Hellwig, iommu, camelia.groza, linuxppc-dev,
David Miller, linux-arm-kernel
In-Reply-To: <1b81c168-f5e0-f86a-f90e-22e8c3f2a602@arm.com>
On Fri, May 31, 2019 at 06:45:00PM +0100, Robin Murphy wrote:
> Bleh, I'm certainly not keen on formalising any kind of
> dma_to_phys()/dma_to_virt() interface for this. Or are you just proposing
> something like dma_unmap_sorry_sir_the_dog_ate_my_homework() for drivers
> which have 'lost' the original VA they mapped?
Yes, I guess we need that in some form. I've heard a report the IBM
emca ethernet driver has the same issue, and any SOC with it this
totally blows up dma-debug as they just never properly unmap.
^ permalink raw reply
* Re: [RFC] mm: Generalize notify_page_fault()
From: Matthew Wilcox @ 2019-05-31 17:48 UTC (permalink / raw)
To: Anshuman Khandual
Cc: Mark Rutland, Michal Hocko, linux-ia64, linux-sh, Catalin Marinas,
Will Deacon, linux-mm, Paul Mackerras, sparclinux, linux-s390,
Yoshinori Sato, Russell King, Fenghua Yu, Stephen Rothwell,
Andrey Konovalov, linux-arm-kernel, Tony Luck, Heiko Carstens,
linux-kernel, Martin Schwidefsky, Andrew Morton, linuxppc-dev,
David S. Miller
In-Reply-To: <f1995445-d5ab-f292-d26c-809581002184@arm.com>
On Fri, May 31, 2019 at 02:17:43PM +0530, Anshuman Khandual wrote:
> On 05/30/2019 07:09 PM, Matthew Wilcox wrote:
> > On Thu, May 30, 2019 at 05:31:15PM +0530, Anshuman Khandual wrote:
> >> On 05/30/2019 04:36 PM, Matthew Wilcox wrote:
> >>> The two handle preemption differently. Why is x86 wrong and this one
> >>> correct?
> >>
> >> Here it expects context to be already non-preemptible where as the proposed
> >> generic function makes it non-preemptible with a preempt_[disable|enable]()
> >> pair for the required code section, irrespective of it's present state. Is
> >> not this better ?
> >
> > git log -p arch/x86/mm/fault.c
> >
> > search for 'kprobes'.
> >
> > tell me what you think.
>
> Are you referring to these following commits
>
> a980c0ef9f6d ("x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify()")
> b506a9d08bae ("x86: code clarification patch to Kprobes arch code")
>
> In particular the later one (b506a9d08bae). It explains how the invoking context
> in itself should be non-preemptible for the kprobes processing context irrespective
> of whether kprobe_running() or perhaps smp_processor_id() is safe or not. Hence it
> does not make much sense to continue when original invoking context is preemptible.
> Instead just bail out earlier. This seems to be making more sense than preempt
> disable-enable pair. If there are no concerns about this change from other platforms,
> I will change the preemption behavior in proposed generic function next time around.
Exactly.
So, any of the arch maintainers know of a reason they behave differently
from x86 in this regard? Or can Anshuman use the x86 implementation
for all the architectures supporting kprobes?
^ permalink raw reply
* [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
From: Nathan Chancellor @ 2019-05-31 18:53 UTC (permalink / raw)
To: Tyrel Datwyler, James E.J. Bottomley, Martin K. Petersen
Cc: clang-built-linux, Nathan Chancellor, linuxppc-dev, linux-kernel,
linux-scsi
clang warns:
drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case IBMVSCSI_HOST_ACTION_NONE:
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
here
if (rc) {
^~
Initialize rc to zero so that the atomic_set and dev_err statement don't
trigger for the cases that just break.
Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum action states")
Link: https://github.com/ClangBuiltLinux/linux/issues/502
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 727c31dc11a0..6714d8043e62 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2118,7 +2118,7 @@ static unsigned long ibmvscsi_get_desired_dma(struct vio_dev *vdev)
static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
{
unsigned long flags;
- int rc;
+ int rc = 0;
char *action = "reset";
spin_lock_irqsave(hostdata->host->host_lock, flags);
--
2.22.0.rc2
^ permalink raw reply related
* [Bug 203517] WARNING: inconsistent lock state. inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
From: bugzilla-daemon @ 2019-05-31 21:27 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <bug-203517-206035@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=203517
--- Comment #9 from Erhard F. (erhard_f@mailbox.org) ---
Has it already landed in 5.1 stable? Have not seen it yet.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply
* Re: [next-20190530] Boot failure on PowerPC
From: Dexuan-Linux Cui @ 2019-05-31 18:13 UTC (permalink / raw)
To: Michael Ellerman
Cc: Sachin Sant, Dexuan Cui, linux-next, linuxppc-dev, v-lide
In-Reply-To: <87h89aohnu.fsf@concordia.ellerman.id.au>
On Fri, May 31, 2019 at 6:52 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
> > Latest next fails to boot with a kernel panic on POWER9.
> >
> > [ 33.689332] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: write_irq_affinity.isra.5+0x15c/0x160
> > [ 33.689346] CPU: 35 PID: 4907 Comm: irqbalance Not tainted 5.2.0-rc2-next-20190530-autotest-autotest #1
> > [ 33.689352] Call Trace:
> > [ 33.689356] [c0000018d974bab0] [c000000000b5328c] dump_stack+0xb0/0xf4 (unreliable)
> > [ 33.689364] [c0000018d974baf0] [c000000000120694] panic+0x16c/0x408
> > [ 33.689370] [c0000018d974bb80] [c00000000012010c] __stack_chk_fail+0x2c/0x30
> > [ 33.689376] [c0000018d974bbe0] [c0000000001b859c] write_irq_affinity.isra.5+0x15c/0x160
> > [ 33.689383] [c0000018d974bd30] [c0000000004d6f30] proc_reg_write+0x90/0x110
> > [ 33.689388] [c0000018d974bd60] [c00000000041453c] __vfs_write+0x3c/0x70
> > [ 33.689394] [c0000018d974bd80] [c000000000418650] vfs_write+0xd0/0x250
> > [ 33.689399] [c0000018d974bdd0] [c000000000418a2c] ksys_write+0x7c/0x130
> > [ 33.689405] [c0000018d974be20] [c00000000000b688] system_call+0x5c/0x70
> >
> > Machine boots till login prompt and then panics few seconds later.
> >
> > Last known next build was May 24th. Will attempt few builds till May 30 to
> > narrow down this problem.
>
> My CI was fine with next-20190529 (9a15d2e3fd03e3).
>
> cheers
Hi Sachin,
It looks this patch may fix the issue:
https://lkml.org/lkml/2019/5/30/1630 , but I'm not sure.
Thanks
-- Dexuan
^ permalink raw reply
* [PATCH 02/16] mm: use untagged_addr() for get_user_pages_fast addresses
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
This will allow sparc64 to override its ADI tags for
get_user_pages and get_user_pages_fast.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/gup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index f173fcbaf1b2..9775f7675653 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2145,7 +2145,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
unsigned long flags;
int nr = 0;
- start &= PAGE_MASK;
+ start = untagged_addr(start) & PAGE_MASK;
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
@@ -2218,7 +2218,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
unsigned long addr, len, end;
int nr = 0, ret = 0;
- start &= PAGE_MASK;
+ start = untagged_addr(start) & PAGE_MASK;
addr = start;
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
--
2.20.1
^ permalink raw reply related
* [PATCH 01/16] uaccess: add untagged_addr definition for other arches
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras,
Catalin Marinas, sparclinux, linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
From: Andrey Konovalov <andreyknvl@google.com>
To allow arm64 syscalls to accept tagged pointers from userspace, we must
untag them when they are passed to the kernel. Since untagging is done in
generic parts of the kernel, the untagged_addr macro needs to be defined
for all architectures.
Define it as a noop for architectures other than arm64.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/mm.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..949d43e9c0b6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -99,6 +99,10 @@ extern int mmap_rnd_compat_bits __read_mostly;
#include <asm/pgtable.h>
#include <asm/processor.h>
+#ifndef untagged_addr
+#define untagged_addr(addr) (addr)
+#endif
+
#ifndef __pa_symbol
#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x), 0))
#endif
--
2.20.1
^ permalink raw reply related
* RFC: switch the remaining architectures to use generic GUP v2
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
Hi Linus and maintainers,
below is a series to switch mips, sh and sparc64 to use the generic
GUP code so that we only have one codebase to touch for further
improvements to this code. I don't have hardware for any of these
architectures, and generally no clue about their page table
management, so handle with care.
Changes since v1:
- fix various issues found by the build bot
- cherry pick and use the untagged_addr helper form Andrey
- add various refactoring patches to share more code over architectures
- move the powerpc hugepd code to mm/gup.c and sync it with the generic
hup semantics
^ permalink raw reply
* [PATCH 06/16] sh: add the missing pud_page definition
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
sh only had pud_page_vaddr, but not pud_page.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/sh/include/asm/pgtable-3level.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/sh/include/asm/pgtable-3level.h b/arch/sh/include/asm/pgtable-3level.h
index 7d8587eb65ff..8ff6fb6b4d19 100644
--- a/arch/sh/include/asm/pgtable-3level.h
+++ b/arch/sh/include/asm/pgtable-3level.h
@@ -37,6 +37,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
{
return pud_val(pud);
}
+#define pud_page(pud) virt_to_page((void *)pud_page_vaddr(pud))
#define pmd_index(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1))
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
--
2.20.1
^ permalink raw reply related
* [PATCH 03/16] mm: simplify gup_fast_permitted
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
Pass in the already calculated end value instead of recomputing it, and
leave the end > start check in the callers instead of duplicating them
in the arch code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/s390/include/asm/pgtable.h | 8 +-------
arch/x86/include/asm/pgtable_64.h | 8 +-------
mm/gup.c | 17 +++++++----------
3 files changed, 9 insertions(+), 24 deletions(-)
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 9f0195d5fa16..9b274fcaacb6 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1270,14 +1270,8 @@ static inline pte_t *pte_offset(pmd_t *pmd, unsigned long address)
#define pte_offset_map(pmd, address) pte_offset_kernel(pmd, address)
#define pte_unmap(pte) do { } while (0)
-static inline bool gup_fast_permitted(unsigned long start, int nr_pages)
+static inline bool gup_fast_permitted(unsigned long start, unsigned long end)
{
- unsigned long len, end;
-
- len = (unsigned long) nr_pages << PAGE_SHIFT;
- end = start + len;
- if (end < start)
- return false;
return end <= current->mm->context.asce_limit;
}
#define gup_fast_permitted gup_fast_permitted
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 0bb566315621..4990d26dfc73 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -259,14 +259,8 @@ extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);
#define gup_fast_permitted gup_fast_permitted
-static inline bool gup_fast_permitted(unsigned long start, int nr_pages)
+static inline bool gup_fast_permitted(unsigned long start, unsigned long end)
{
- unsigned long len, end;
-
- len = (unsigned long)nr_pages << PAGE_SHIFT;
- end = start + len;
- if (end < start)
- return false;
if (end >> __VIRTUAL_MASK_SHIFT)
return false;
return true;
diff --git a/mm/gup.c b/mm/gup.c
index 9775f7675653..e7566f5ff9cf 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2122,13 +2122,9 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
* Check if it's allowed to use __get_user_pages_fast() for the range, or
* we need to fall back to the slow version:
*/
-bool gup_fast_permitted(unsigned long start, int nr_pages)
+static bool gup_fast_permitted(unsigned long start, unsigned long end)
{
- unsigned long len, end;
-
- len = (unsigned long) nr_pages << PAGE_SHIFT;
- end = start + len;
- return end >= start;
+ return true;
}
#endif
@@ -2149,6 +2145,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
+ if (end < start)
+ return 0;
if (unlikely(!access_ok((void __user *)start, len)))
return 0;
@@ -2164,7 +2162,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
* block IPIs that come from THPs splitting.
*/
- if (gup_fast_permitted(start, nr_pages)) {
+ if (gup_fast_permitted(start, end)) {
local_irq_save(flags);
gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr);
local_irq_restore(flags);
@@ -2223,13 +2221,12 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
- if (nr_pages <= 0)
+ if (end < start)
return 0;
-
if (unlikely(!access_ok((void __user *)start, len)))
return -EFAULT;
- if (gup_fast_permitted(start, nr_pages)) {
+ if (gup_fast_permitted(start, end)) {
local_irq_disable();
gup_pgd_range(addr, end, gup_flags, pages, &nr);
local_irq_enable();
--
2.20.1
^ permalink raw reply related
* [PATCH 05/16] MIPS: use the generic get_user_pages_fast code
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
The mips code is mostly equivalent to the generic one, minus various
bugfixes and an arch override for gup_fast_permitted.
Note that this defines ARCH_HAS_PTE_SPECIAL for mips as mips has
pte_special and pte_mkspecial implemented and used in the existing
gup code. They are no-op stubs, though which makes me a little unsure
if this is really right thing to do.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/mips/Kconfig | 3 +
arch/mips/include/asm/pgtable.h | 3 +
arch/mips/mm/Makefile | 1 -
arch/mips/mm/gup.c | 303 --------------------------------
4 files changed, 6 insertions(+), 304 deletions(-)
delete mode 100644 arch/mips/mm/gup.c
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 70d3200476bf..64108a2a16d4 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -6,6 +6,7 @@ config MIPS
select ARCH_BINFMT_ELF_STATE if MIPS_FP_SUPPORT
select ARCH_CLOCKSOURCE_DATA
select ARCH_HAS_ELF_RANDOMIZE
+ select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_SUPPORTS_UPROBES
@@ -34,6 +35,7 @@ config MIPS
select GENERIC_SCHED_CLOCK if !CAVIUM_OCTEON_SOC
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
+ select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT
select HANDLE_DOMAIN_IRQ
select HAVE_ARCH_COMPILER_H
select HAVE_ARCH_JUMP_LABEL
@@ -55,6 +57,7 @@ config MIPS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
+ select HAVE_GENERIC_GUP
select HAVE_IDE
select HAVE_IOREMAP_PROT
select HAVE_IRQ_EXIT_ON_IRQ_STACK
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 4ccb465ef3f2..7d27194e3b45 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -20,6 +20,7 @@
#include <asm/cmpxchg.h>
#include <asm/io.h>
#include <asm/pgtable-bits.h>
+#include <asm/cpu-features.h>
struct mm_struct;
struct vm_area_struct;
@@ -626,6 +627,8 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#define gup_fast_permitted(start, end) (!cpu_has_dc_aliases)
+
#include <asm-generic/pgtable.h>
/*
diff --git a/arch/mips/mm/Makefile b/arch/mips/mm/Makefile
index f34d7ff5eb60..1e8d335025d7 100644
--- a/arch/mips/mm/Makefile
+++ b/arch/mips/mm/Makefile
@@ -7,7 +7,6 @@ obj-y += cache.o
obj-y += context.o
obj-y += extable.o
obj-y += fault.o
-obj-y += gup.o
obj-y += init.o
obj-y += mmap.o
obj-y += page.o
diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
deleted file mode 100644
index 4c2b4483683c..000000000000
--- a/arch/mips/mm/gup.c
+++ /dev/null
@@ -1,303 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Lockless get_user_pages_fast for MIPS
- *
- * Copyright (C) 2008 Nick Piggin
- * Copyright (C) 2008 Novell Inc.
- * Copyright (C) 2011 Ralf Baechle
- */
-#include <linux/sched.h>
-#include <linux/mm.h>
-#include <linux/vmstat.h>
-#include <linux/highmem.h>
-#include <linux/swap.h>
-#include <linux/hugetlb.h>
-
-#include <asm/cpu-features.h>
-#include <asm/pgtable.h>
-
-static inline pte_t gup_get_pte(pte_t *ptep)
-{
-#if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
- pte_t pte;
-
-retry:
- pte.pte_low = ptep->pte_low;
- smp_rmb();
- pte.pte_high = ptep->pte_high;
- smp_rmb();
- if (unlikely(pte.pte_low != ptep->pte_low))
- goto retry;
-
- return pte;
-#else
- return READ_ONCE(*ptep);
-#endif
-}
-
-static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
- int write, struct page **pages, int *nr)
-{
- pte_t *ptep = pte_offset_map(&pmd, addr);
- do {
- pte_t pte = gup_get_pte(ptep);
- struct page *page;
-
- if (!pte_present(pte) ||
- pte_special(pte) || (write && !pte_write(pte))) {
- pte_unmap(ptep);
- return 0;
- }
- VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
- page = pte_page(pte);
- get_page(page);
- SetPageReferenced(page);
- pages[*nr] = page;
- (*nr)++;
-
- } while (ptep++, addr += PAGE_SIZE, addr != end);
-
- pte_unmap(ptep - 1);
- return 1;
-}
-
-static inline void get_head_page_multiple(struct page *page, int nr)
-{
- VM_BUG_ON(page != compound_head(page));
- VM_BUG_ON(page_count(page) == 0);
- page_ref_add(page, nr);
- SetPageReferenced(page);
-}
-
-static int gup_huge_pmd(pmd_t pmd, unsigned long addr, unsigned long end,
- int write, struct page **pages, int *nr)
-{
- pte_t pte = *(pte_t *)&pmd;
- struct page *head, *page;
- int refs;
-
- if (write && !pte_write(pte))
- return 0;
- /* hugepages are never "special" */
- VM_BUG_ON(pte_special(pte));
- VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-
- refs = 0;
- head = pte_page(pte);
- page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
- do {
- VM_BUG_ON(compound_head(page) != head);
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
-
- get_head_page_multiple(head, refs);
- return 1;
-}
-
-static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
- int write, struct page **pages, int *nr)
-{
- unsigned long next;
- pmd_t *pmdp;
-
- pmdp = pmd_offset(&pud, addr);
- do {
- pmd_t pmd = *pmdp;
-
- next = pmd_addr_end(addr, end);
- if (pmd_none(pmd))
- return 0;
- if (unlikely(pmd_huge(pmd))) {
- if (!gup_huge_pmd(pmd, addr, next, write, pages,nr))
- return 0;
- } else {
- if (!gup_pte_range(pmd, addr, next, write, pages,nr))
- return 0;
- }
- } while (pmdp++, addr = next, addr != end);
-
- return 1;
-}
-
-static int gup_huge_pud(pud_t pud, unsigned long addr, unsigned long end,
- int write, struct page **pages, int *nr)
-{
- pte_t pte = *(pte_t *)&pud;
- struct page *head, *page;
- int refs;
-
- if (write && !pte_write(pte))
- return 0;
- /* hugepages are never "special" */
- VM_BUG_ON(pte_special(pte));
- VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-
- refs = 0;
- head = pte_page(pte);
- page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
- do {
- VM_BUG_ON(compound_head(page) != head);
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
-
- get_head_page_multiple(head, refs);
- return 1;
-}
-
-static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
- int write, struct page **pages, int *nr)
-{
- unsigned long next;
- pud_t *pudp;
-
- pudp = pud_offset(&pgd, addr);
- do {
- pud_t pud = *pudp;
-
- next = pud_addr_end(addr, end);
- if (pud_none(pud))
- return 0;
- if (unlikely(pud_huge(pud))) {
- if (!gup_huge_pud(pud, addr, next, write, pages,nr))
- return 0;
- } else {
- if (!gup_pmd_range(pud, addr, next, write, pages,nr))
- return 0;
- }
- } while (pudp++, addr = next, addr != end);
-
- return 1;
-}
-
-/*
- * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
- * back to the regular GUP.
- * Note a difference with get_user_pages_fast: this always returns the
- * number of pages pinned, 0 if no pages were pinned.
- */
-int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
- struct page **pages)
-{
- struct mm_struct *mm = current->mm;
- unsigned long addr, len, end;
- unsigned long next;
- unsigned long flags;
- pgd_t *pgdp;
- int nr = 0;
-
- start &= PAGE_MASK;
- addr = start;
- len = (unsigned long) nr_pages << PAGE_SHIFT;
- end = start + len;
- if (unlikely(!access_ok((void __user *)start, len)))
- return 0;
-
- /*
- * XXX: batch / limit 'nr', to avoid large irq off latency
- * needs some instrumenting to determine the common sizes used by
- * important workloads (eg. DB2), and whether limiting the batch
- * size will decrease performance.
- *
- * It seems like we're in the clear for the moment. Direct-IO is
- * the main guy that batches up lots of get_user_pages, and even
- * they are limited to 64-at-a-time which is not so many.
- */
- /*
- * This doesn't prevent pagetable teardown, but does prevent
- * the pagetables and pages from being freed.
- *
- * So long as we atomically load page table pointers versus teardown,
- * we can follow the address down to the page and take a ref on it.
- */
- local_irq_save(flags);
- pgdp = pgd_offset(mm, addr);
- do {
- pgd_t pgd = *pgdp;
-
- next = pgd_addr_end(addr, end);
- if (pgd_none(pgd))
- break;
- if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
- break;
- } while (pgdp++, addr = next, addr != end);
- local_irq_restore(flags);
-
- return nr;
-}
-
-/**
- * get_user_pages_fast() - pin user pages in memory
- * @start: starting user address
- * @nr_pages: number of pages from start to pin
- * @gup_flags: flags modifying pin behaviour
- * @pages: array that receives pointers to the pages pinned.
- * Should be at least nr_pages long.
- *
- * Attempt to pin user pages in memory without taking mm->mmap_sem.
- * If not successful, it will fall back to taking the lock and
- * calling get_user_pages().
- *
- * Returns number of pages pinned. This may be fewer than the number
- * requested. If nr_pages is 0 or negative, returns 0. If no pages
- * were pinned, returns -errno.
- */
-int get_user_pages_fast(unsigned long start, int nr_pages,
- unsigned int gup_flags, struct page **pages)
-{
- struct mm_struct *mm = current->mm;
- unsigned long addr, len, end;
- unsigned long next;
- pgd_t *pgdp;
- int ret, nr = 0;
-
- start &= PAGE_MASK;
- addr = start;
- len = (unsigned long) nr_pages << PAGE_SHIFT;
-
- end = start + len;
- if (end < start || cpu_has_dc_aliases)
- goto slow_irqon;
-
- /* XXX: batch / limit 'nr' */
- local_irq_disable();
- pgdp = pgd_offset(mm, addr);
- do {
- pgd_t pgd = *pgdp;
-
- next = pgd_addr_end(addr, end);
- if (pgd_none(pgd))
- goto slow;
- if (!gup_pud_range(pgd, addr, next, gup_flags & FOLL_WRITE,
- pages, &nr))
- goto slow;
- } while (pgdp++, addr = next, addr != end);
- local_irq_enable();
-
- VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
- return nr;
-slow:
- local_irq_enable();
-
-slow_irqon:
- /* Try to get the remaining pages with get_user_pages */
- start += nr << PAGE_SHIFT;
- pages += nr;
-
- ret = get_user_pages_unlocked(start, (end - start) >> PAGE_SHIFT,
- pages, gup_flags);
-
- /* Have to be a bit careful with return values */
- if (nr > 0) {
- if (ret < 0)
- ret = nr;
- else
- ret += nr;
- }
- return ret;
-}
--
2.20.1
^ permalink raw reply related
* [PATCH 08/16] sparc64: add the missing pgd_page definition
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
sparc64 only had pgd_page_vaddr, but not pgd_page.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/sparc/include/asm/pgtable_64.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 22500c3be7a9..dcf970e82262 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -861,6 +861,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
#define pud_clear(pudp) (pud_val(*(pudp)) = 0UL)
#define pgd_page_vaddr(pgd) \
((unsigned long) __va(pgd_val(pgd)))
+#define pgd_page(pgd) virt_to_page(__va(pgd_val(pgd)))
#define pgd_present(pgd) (pgd_val(pgd) != 0U)
#define pgd_clear(pgdp) (pgd_val(*(pgdp)) = 0UL)
--
2.20.1
^ permalink raw reply related
* [PATCH 07/16] sh: use the generic get_user_pages_fast code
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
The sh code is mostly equivalent to the generic one, minus various
bugfixes and two arch overrides that this patch adds to pgtable.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/sh/Kconfig | 2 +
arch/sh/include/asm/pgtable.h | 37 +++++
arch/sh/mm/Makefile | 2 +-
arch/sh/mm/gup.c | 277 ----------------------------------
4 files changed, 40 insertions(+), 278 deletions(-)
delete mode 100644 arch/sh/mm/gup.c
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index b77f512bb176..6fddfc3c9710 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -14,6 +14,7 @@ config SUPERH
select HAVE_ARCH_TRACEHOOK
select HAVE_PERF_EVENTS
select HAVE_DEBUG_BUGVERBOSE
+ select HAVE_GENERIC_GUP
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
select ARCH_HAS_GCOV_PROFILE_ALL
@@ -63,6 +64,7 @@ config SUPERH
config SUPERH32
def_bool "$(ARCH)" = "sh"
select ARCH_32BIT_OFF_T
+ select GUP_GET_PTE_LOW_HIGH if X2TLB
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_IOREMAP_PROT if MMU && !X2TLB
diff --git a/arch/sh/include/asm/pgtable.h b/arch/sh/include/asm/pgtable.h
index 3587103afe59..9085d1142fa3 100644
--- a/arch/sh/include/asm/pgtable.h
+++ b/arch/sh/include/asm/pgtable.h
@@ -149,6 +149,43 @@ extern void paging_init(void);
extern void page_table_range_init(unsigned long start, unsigned long end,
pgd_t *pgd);
+static inline bool __pte_access_permitted(pte_t pte, u64 prot)
+{
+ return (pte_val(pte) & (prot | _PAGE_SPECIAL)) == prot;
+}
+
+#ifdef CONFIG_X2TLB
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+ u64 prot = _PAGE_PRESENT;
+
+ prot |= _PAGE_EXT(_PAGE_EXT_KERN_READ | _PAGE_EXT_USER_READ);
+ if (write)
+ prot |= _PAGE_EXT(_PAGE_EXT_KERN_WRITE | _PAGE_EXT_USER_WRITE);
+ return __pte_access_permitted(pte, prot);
+}
+#elif defined(CONFIG_SUPERH64)
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+ u64 prot = _PAGE_PRESENT | _PAGE_USER | _PAGE_READ;
+
+ if (write)
+ prot |= _PAGE_WRITE;
+ return __pte_access_permitted(pte, prot);
+}
+#else
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+ u64 prot = _PAGE_PRESENT | _PAGE_USER;
+
+ if (write)
+ prot |= _PAGE_RW;
+ return __pte_access_permitted(pte, prot);
+}
+#endif
+
+#define pte_access_permitted pte_access_permitted
+
/* arch/sh/mm/mmap.c */
#define HAVE_ARCH_UNMAPPED_AREA
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
index fbe5e79751b3..5051b38fd5b6 100644
--- a/arch/sh/mm/Makefile
+++ b/arch/sh/mm/Makefile
@@ -17,7 +17,7 @@ cacheops-$(CONFIG_CPU_SHX3) += cache-shx3.o
obj-y += $(cacheops-y)
mmu-y := nommu.o extable_32.o
-mmu-$(CONFIG_MMU) := extable_$(BITS).o fault.o gup.o ioremap.o kmap.o \
+mmu-$(CONFIG_MMU) := extable_$(BITS).o fault.o ioremap.o kmap.o \
pgtable.o tlbex_$(BITS).o tlbflush_$(BITS).o
obj-y += $(mmu-y)
diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
deleted file mode 100644
index 277c882f7489..000000000000
--- a/arch/sh/mm/gup.c
+++ /dev/null
@@ -1,277 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Lockless get_user_pages_fast for SuperH
- *
- * Copyright (C) 2009 - 2010 Paul Mundt
- *
- * Cloned from the x86 and PowerPC versions, by:
- *
- * Copyright (C) 2008 Nick Piggin
- * Copyright (C) 2008 Novell Inc.
- */
-#include <linux/sched.h>
-#include <linux/mm.h>
-#include <linux/vmstat.h>
-#include <linux/highmem.h>
-#include <asm/pgtable.h>
-
-static inline pte_t gup_get_pte(pte_t *ptep)
-{
-#ifndef CONFIG_X2TLB
- return READ_ONCE(*ptep);
-#else
- /*
- * With get_user_pages_fast, we walk down the pagetables without
- * taking any locks. For this we would like to load the pointers
- * atomically, but that is not possible with 64-bit PTEs. What
- * we do have is the guarantee that a pte will only either go
- * from not present to present, or present to not present or both
- * -- it will not switch to a completely different present page
- * without a TLB flush in between; something that we are blocking
- * by holding interrupts off.
- *
- * Setting ptes from not present to present goes:
- * ptep->pte_high = h;
- * smp_wmb();
- * ptep->pte_low = l;
- *
- * And present to not present goes:
- * ptep->pte_low = 0;
- * smp_wmb();
- * ptep->pte_high = 0;
- *
- * We must ensure here that the load of pte_low sees l iff pte_high
- * sees h. We load pte_high *after* loading pte_low, which ensures we
- * don't see an older value of pte_high. *Then* we recheck pte_low,
- * which ensures that we haven't picked up a changed pte high. We might
- * have got rubbish values from pte_low and pte_high, but we are
- * guaranteed that pte_low will not have the present bit set *unless*
- * it is 'l'. And get_user_pages_fast only operates on present ptes, so
- * we're safe.
- *
- * gup_get_pte should not be used or copied outside gup.c without being
- * very careful -- it does not atomically load the pte or anything that
- * is likely to be useful for you.
- */
- pte_t pte;
-
-retry:
- pte.pte_low = ptep->pte_low;
- smp_rmb();
- pte.pte_high = ptep->pte_high;
- smp_rmb();
- if (unlikely(pte.pte_low != ptep->pte_low))
- goto retry;
-
- return pte;
-#endif
-}
-
-/*
- * The performance critical leaf functions are made noinline otherwise gcc
- * inlines everything into a single function which results in too much
- * register pressure.
- */
-static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
- unsigned long end, int write, struct page **pages, int *nr)
-{
- u64 mask, result;
- pte_t *ptep;
-
-#ifdef CONFIG_X2TLB
- result = _PAGE_PRESENT | _PAGE_EXT(_PAGE_EXT_KERN_READ | _PAGE_EXT_USER_READ);
- if (write)
- result |= _PAGE_EXT(_PAGE_EXT_KERN_WRITE | _PAGE_EXT_USER_WRITE);
-#elif defined(CONFIG_SUPERH64)
- result = _PAGE_PRESENT | _PAGE_USER | _PAGE_READ;
- if (write)
- result |= _PAGE_WRITE;
-#else
- result = _PAGE_PRESENT | _PAGE_USER;
- if (write)
- result |= _PAGE_RW;
-#endif
-
- mask = result | _PAGE_SPECIAL;
-
- ptep = pte_offset_map(&pmd, addr);
- do {
- pte_t pte = gup_get_pte(ptep);
- struct page *page;
-
- if ((pte_val(pte) & mask) != result) {
- pte_unmap(ptep);
- return 0;
- }
- VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
- page = pte_page(pte);
- get_page(page);
- __flush_anon_page(page, addr);
- flush_dcache_page(page);
- pages[*nr] = page;
- (*nr)++;
-
- } while (ptep++, addr += PAGE_SIZE, addr != end);
- pte_unmap(ptep - 1);
-
- return 1;
-}
-
-static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
- int write, struct page **pages, int *nr)
-{
- unsigned long next;
- pmd_t *pmdp;
-
- pmdp = pmd_offset(&pud, addr);
- do {
- pmd_t pmd = *pmdp;
-
- next = pmd_addr_end(addr, end);
- if (pmd_none(pmd))
- return 0;
- if (!gup_pte_range(pmd, addr, next, write, pages, nr))
- return 0;
- } while (pmdp++, addr = next, addr != end);
-
- return 1;
-}
-
-static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
- int write, struct page **pages, int *nr)
-{
- unsigned long next;
- pud_t *pudp;
-
- pudp = pud_offset(&pgd, addr);
- do {
- pud_t pud = *pudp;
-
- next = pud_addr_end(addr, end);
- if (pud_none(pud))
- return 0;
- if (!gup_pmd_range(pud, addr, next, write, pages, nr))
- return 0;
- } while (pudp++, addr = next, addr != end);
-
- return 1;
-}
-
-/*
- * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
- * back to the regular GUP.
- * Note a difference with get_user_pages_fast: this always returns the
- * number of pages pinned, 0 if no pages were pinned.
- */
-int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
- struct page **pages)
-{
- struct mm_struct *mm = current->mm;
- unsigned long addr, len, end;
- unsigned long next;
- unsigned long flags;
- pgd_t *pgdp;
- int nr = 0;
-
- start &= PAGE_MASK;
- addr = start;
- len = (unsigned long) nr_pages << PAGE_SHIFT;
- end = start + len;
- if (unlikely(!access_ok((void __user *)start, len)))
- return 0;
-
- /*
- * This doesn't prevent pagetable teardown, but does prevent
- * the pagetables and pages from being freed.
- */
- local_irq_save(flags);
- pgdp = pgd_offset(mm, addr);
- do {
- pgd_t pgd = *pgdp;
-
- next = pgd_addr_end(addr, end);
- if (pgd_none(pgd))
- break;
- if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
- break;
- } while (pgdp++, addr = next, addr != end);
- local_irq_restore(flags);
-
- return nr;
-}
-
-/**
- * get_user_pages_fast() - pin user pages in memory
- * @start: starting user address
- * @nr_pages: number of pages from start to pin
- * @gup_flags: flags modifying pin behaviour
- * @pages: array that receives pointers to the pages pinned.
- * Should be at least nr_pages long.
- *
- * Attempt to pin user pages in memory without taking mm->mmap_sem.
- * If not successful, it will fall back to taking the lock and
- * calling get_user_pages().
- *
- * Returns number of pages pinned. This may be fewer than the number
- * requested. If nr_pages is 0 or negative, returns 0. If no pages
- * were pinned, returns -errno.
- */
-int get_user_pages_fast(unsigned long start, int nr_pages,
- unsigned int gup_flags, struct page **pages)
-{
- struct mm_struct *mm = current->mm;
- unsigned long addr, len, end;
- unsigned long next;
- pgd_t *pgdp;
- int nr = 0;
-
- start &= PAGE_MASK;
- addr = start;
- len = (unsigned long) nr_pages << PAGE_SHIFT;
-
- end = start + len;
- if (end < start)
- goto slow_irqon;
-
- local_irq_disable();
- pgdp = pgd_offset(mm, addr);
- do {
- pgd_t pgd = *pgdp;
-
- next = pgd_addr_end(addr, end);
- if (pgd_none(pgd))
- goto slow;
- if (!gup_pud_range(pgd, addr, next, gup_flags & FOLL_WRITE,
- pages, &nr))
- goto slow;
- } while (pgdp++, addr = next, addr != end);
- local_irq_enable();
-
- VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
- return nr;
-
- {
- int ret;
-
-slow:
- local_irq_enable();
-slow_irqon:
- /* Try to get the remaining pages with get_user_pages */
- start += nr << PAGE_SHIFT;
- pages += nr;
-
- ret = get_user_pages_unlocked(start,
- (end - start) >> PAGE_SHIFT, pages,
- gup_flags);
-
- /* Have to be a bit careful with return values */
- if (nr > 0) {
- if (ret < 0)
- ret = nr;
- else
- ret += nr;
- }
-
- return ret;
- }
-}
--
2.20.1
^ permalink raw reply related
* [PATCH 04/16] mm: lift the x86_32 PAE version of gup_get_pte to common code
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
The split low/high access is the only non-READ_ONCE version of
gup_get_pte that did show up in the various arch implemenations.
Lift it to common code and drop the ifdef based arch override.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/pgtable-3level.h | 47 ------------------------
arch/x86/kvm/mmu.c | 2 +-
mm/Kconfig | 3 ++
mm/gup.c | 51 ++++++++++++++++++++++++---
5 files changed, 52 insertions(+), 52 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2bbbd4d1ba31..7cd53cc59f0f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -121,6 +121,7 @@ config X86
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
+ select GUP_GET_PTE_LOW_HIGH if X86_PAE
select HARDLOCKUP_CHECK_TIMESTAMP if X86_64
select HAVE_ACPI_APEI if ACPI
select HAVE_ACPI_APEI_NMI if ACPI
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index f8b1ad2c3828..e3633795fb22 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -285,53 +285,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *pudp)
#define __pte_to_swp_entry(pte) (__swp_entry(__pteval_swp_type(pte), \
__pteval_swp_offset(pte)))
-#define gup_get_pte gup_get_pte
-/*
- * WARNING: only to be used in the get_user_pages_fast() implementation.
- *
- * With get_user_pages_fast(), we walk down the pagetables without taking
- * any locks. For this we would like to load the pointers atomically,
- * but that is not possible (without expensive cmpxchg8b) on PAE. What
- * we do have is the guarantee that a PTE will only either go from not
- * present to present, or present to not present or both -- it will not
- * switch to a completely different present page without a TLB flush in
- * between; something that we are blocking by holding interrupts off.
- *
- * Setting ptes from not present to present goes:
- *
- * ptep->pte_high = h;
- * smp_wmb();
- * ptep->pte_low = l;
- *
- * And present to not present goes:
- *
- * ptep->pte_low = 0;
- * smp_wmb();
- * ptep->pte_high = 0;
- *
- * We must ensure here that the load of pte_low sees 'l' iff pte_high
- * sees 'h'. We load pte_high *after* loading pte_low, which ensures we
- * don't see an older value of pte_high. *Then* we recheck pte_low,
- * which ensures that we haven't picked up a changed pte high. We might
- * have gotten rubbish values from pte_low and pte_high, but we are
- * guaranteed that pte_low will not have the present bit set *unless*
- * it is 'l'. Because get_user_pages_fast() only operates on present ptes
- * we're safe.
- */
-static inline pte_t gup_get_pte(pte_t *ptep)
-{
- pte_t pte;
-
- do {
- pte.pte_low = ptep->pte_low;
- smp_rmb();
- pte.pte_high = ptep->pte_high;
- smp_rmb();
- } while (unlikely(pte.pte_low != ptep->pte_low));
-
- return pte;
-}
-
#include <asm/pgtable-invert.h>
#endif /* _ASM_X86_PGTABLE_3LEVEL_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1e9ba81accba..3f7cd11168f9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -653,7 +653,7 @@ static u64 __update_clear_spte_slow(u64 *sptep, u64 spte)
/*
* The idea using the light way get the spte on x86_32 guest is from
- * gup_get_pte(arch/x86/mm/gup.c).
+ * gup_get_pte (mm/gup.c).
*
* An spte tlb flush may be pending, because kvm_set_pte_rmapp
* coalesces them and we are running out of the MMU lock. Therefore
diff --git a/mm/Kconfig b/mm/Kconfig
index f0c76ba47695..fe51f104a9e0 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -762,6 +762,9 @@ config GUP_BENCHMARK
See tools/testing/selftests/vm/gup_benchmark.c
+config GUP_GET_PTE_LOW_HIGH
+ bool
+
config ARCH_HAS_PTE_SPECIAL
bool
diff --git a/mm/gup.c b/mm/gup.c
index e7566f5ff9cf..a86d65cd7051 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1683,17 +1683,60 @@ struct page *get_dump_page(unsigned long addr)
* This code is based heavily on the PowerPC implementation by Nick Piggin.
*/
#ifdef CONFIG_HAVE_GENERIC_GUP
+#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
+/*
+ * WARNING: only to be used in the get_user_pages_fast() implementation.
+ *
+ * With get_user_pages_fast(), we walk down the pagetables without taking any
+ * locks. For this we would like to load the pointers atomically, but sometimes
+ * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE). What
+ * we do have is the guarantee that a PTE will only either go from not present
+ * to present, or present to not present or both -- it will not switch to a
+ * completely different present page without a TLB flush in between; something
+ * that we are blocking by holding interrupts off.
+ *
+ * Setting ptes from not present to present goes:
+ *
+ * ptep->pte_high = h;
+ * smp_wmb();
+ * ptep->pte_low = l;
+ *
+ * And present to not present goes:
+ *
+ * ptep->pte_low = 0;
+ * smp_wmb();
+ * ptep->pte_high = 0;
+ *
+ * We must ensure here that the load of pte_low sees 'l' IFF pte_high sees 'h'.
+ * We load pte_high *after* loading pte_low, which ensures we don't see an older
+ * value of pte_high. *Then* we recheck pte_low, which ensures that we haven't
+ * picked up a changed pte high. We might have gotten rubbish values from
+ * pte_low and pte_high, but we are guaranteed that pte_low will not have the
+ * present bit set *unless* it is 'l'. Because get_user_pages_fast() only
+ * operates on present ptes we're safe.
+ */
+static inline pte_t gup_get_pte(pte_t *ptep)
+{
+ pte_t pte;
-#ifndef gup_get_pte
+ do {
+ pte.pte_low = ptep->pte_low;
+ smp_rmb();
+ pte.pte_high = ptep->pte_high;
+ smp_rmb();
+ } while (unlikely(pte.pte_low != ptep->pte_low));
+
+ return pte;
+}
+#else /* CONFIG_GUP_GET_PTE_LOW_HIGH */
/*
- * We assume that the PTE can be read atomically. If this is not the case for
- * your architecture, please provide the helper.
+ * We require that the PTE can be read atomically.
*/
static inline pte_t gup_get_pte(pte_t *ptep)
{
return READ_ONCE(*ptep);
}
-#endif
+#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
static void undo_dev_pagemap(int *nr, int nr_start, struct page **pages)
{
--
2.20.1
^ permalink raw reply related
* [PATCH 09/16] sparc64: define untagged_addr()
From: Christoph Hellwig @ 2019-06-01 7:49 UTC (permalink / raw)
To: Linus Torvalds, Paul Burton, James Hogan, Yoshinori Sato,
Rich Felker, David S. Miller
Cc: linux-sh, Andrey Konovalov, x86, linux-mips, Nicholas Piggin,
linux-kernel, linux-mm, Khalid Aziz, Paul Mackerras, sparclinux,
linuxppc-dev
In-Reply-To: <20190601074959.14036-1-hch@lst.de>
Add a helper to untag a user pointer. This is needed for ADI support
in get_user_pages_fast.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/sparc/include/asm/pgtable_64.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index dcf970e82262..a93eca29e85a 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1076,6 +1076,28 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
}
#define io_remap_pfn_range io_remap_pfn_range
+static inline unsigned long untagged_addr(unsigned long start)
+{
+ if (adi_capable()) {
+ long addr = start;
+
+ /* If userspace has passed a versioned address, kernel
+ * will not find it in the VMAs since it does not store
+ * the version tags in the list of VMAs. Storing version
+ * tags in list of VMAs is impractical since they can be
+ * changed any time from userspace without dropping into
+ * kernel. Any address search in VMAs will be done with
+ * non-versioned addresses. Ensure the ADI version bits
+ * are dropped here by sign extending the last bit before
+ * ADI bits. IOMMU does not implement version tags.
+ */
+ return (addr << (long)adi_nbits()) >> (long)adi_nbits();
+ }
+
+ return start;
+}
+#define untagged_addr untagged_addr
+
#include <asm/tlbflush.h>
#include <asm-generic/pgtable.h>
--
2.20.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox