* Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks
From: Waiman Long @ 2020-07-08 23:58 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
Cc: linux-arch, Peter Zijlstra, Boqun Feng, linux-kernel, kvm-ppc,
virtualization, Ingo Molnar, Will Deacon
In-Reply-To: <62fa6343-e084-75c3-01c9-349a4617e67c@redhat.com>
On 7/8/20 7:50 PM, Waiman Long wrote:
> On 7/8/20 1:10 AM, Nicholas Piggin wrote:
>> Excerpts from Waiman Long's message of July 8, 2020 1:33 pm:
>>> On 7/7/20 1:57 AM, Nicholas Piggin wrote:
>>>> Yes, powerpc could certainly get more performance out of the slow
>>>> paths, and then there are a few parameters to tune.
>>>>
>>>> We don't have a good alternate patching for function calls yet, but
>>>> that would be something to do for native vs pv.
>>>>
>>>> And then there seem to be one or two tunable parameters we could
>>>> experiment with.
>>>>
>>>> The paravirt locks may need a bit more tuning. Some simple testing
>>>> under KVM shows we might be a bit slower in some cases. Whether this
>>>> is fairness or something else I'm not sure. The current simple pv
>>>> spinlock code can do a directed yield to the lock holder CPU, whereas
>>>> the pv qspl here just does a general yield. I think we might actually
>>>> be able to change that to also support directed yield. Though I'm
>>>> not sure if this is actually the cause of the slowdown yet.
>>> Regarding the paravirt lock, I have taken a further look into the
>>> current PPC spinlock code. There is an equivalent of pv_wait() but no
>>> pv_kick(). Maybe PPC doesn't really need that.
>> So powerpc has two types of wait, either undirected "all processors" or
>> directed to a specific processor which has been preempted by the
>> hypervisor.
>>
>> The simple spinlock code does a directed wait, because it knows the CPU
>> which is holding the lock. In this case, there is a sequence that is
>> used to ensure we don't wait if the condition has become true, and the
>> target CPU does not need to kick the waiter it will happen automatically
>> (see splpar_spin_yield). This is preferable because we only wait as
>> needed and don't require the kick operation.
> Thanks for the explanation.
>>
>> The pv spinlock code I did uses the undirected wait, because we don't
>> know the CPU number which we are waiting on. This is undesirable because
>> it's higher overhead and the wait is not so accurate.
>>
>> I think perhaps we could change things so we wait on the correct CPU
>> when queued, which might be good enough (we could also put the lock
>> owner CPU in the spinlock word, if we add another format).
>
> The LS byte of the lock word is used to indicate locking status. If we
> have less than 255 cpus, we can put the (cpu_nr + 1) into the lock
> byte. The special 0xff value can be used to indicate a cpu number >=
> 255 for indirect yield. The required change to the qspinlock code will
> be minimal, I think.
BTW, we can also keep track of the previous cpu in the waiting queue.
Due to lock stealing, that may not be the cpu that is holding the lock.
Maybe we can use this, if available, in case the cpu number is >= 255.
Regards,
Longman
^ permalink raw reply
* Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks
From: Waiman Long @ 2020-07-08 23:54 UTC (permalink / raw)
To: Peter Zijlstra, Nicholas Piggin
Cc: linux-arch, Will Deacon, Boqun Feng, linux-kernel, kvm-ppc,
virtualization, Ingo Molnar, linuxppc-dev
In-Reply-To: <20200708084106.GE597537@hirez.programming.kicks-ass.net>
On 7/8/20 4:41 AM, Peter Zijlstra wrote:
> On Tue, Jul 07, 2020 at 03:57:06PM +1000, Nicholas Piggin wrote:
>> Yes, powerpc could certainly get more performance out of the slow
>> paths, and then there are a few parameters to tune.
> Can you clarify? The slow path is already in use on ARM64 which is weak,
> so I doubt there's superfluous serialization present. And Will spend a
> fair amount of time on making that thing guarantee forward progressm, so
> there just isn't too much room to play.
>
>> We don't have a good alternate patching for function calls yet, but
>> that would be something to do for native vs pv.
> Going by your jump_label implementation, support for static_call should
> be fairly straight forward too, no?
>
> https://lkml.kernel.org/r/20200624153024.794671356@infradead.org
>
Speaking of static_call, I am also looking forward to it. Do you have an
idea when that will be merged?
Cheers,
Longman
^ permalink raw reply
* Re: Failure to build librseq on ppc
From: Segher Boessenkool @ 2020-07-08 23:53 UTC (permalink / raw)
To: Michael Ellerman
Cc: linuxppc-dev, Boqun Feng, Mathieu Desnoyers, Michael Jeanson
In-Reply-To: <87k0ze2nv4.fsf@mpe.ellerman.id.au>
Hi!
On Wed, Jul 08, 2020 at 10:27:27PM +1000, Michael Ellerman wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > You'll have to show the actual failing machine code, and with enough
> > context that we can relate this to the source code.
> >
> > -save-temps helps, or use -S instead of -c, etc.
>
> Attached below.
Thanks!
> I think that's from:
>
> #define LOAD_WORD "ld "
>
> #define RSEQ_ASM_OP_CMPEQ(var, expect, label) \
> LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \
The way this hardcodes r17 *will* break, btw. The compiler will not
likely want to use r17 as long as your code (after inlining etc.!) stays
small, but there is Murphy's law.
Anyway... something in rseq_str is wrong, missing %X<n>. This may
have to do with the abuse of inline asm here, making a fix harder :-(
Segher
^ permalink raw reply
* Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks
From: Waiman Long @ 2020-07-08 23:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-arch, Will Deacon, Boqun Feng, linux-kernel, kvm-ppc,
virtualization, Ingo Molnar, Nicholas Piggin, linuxppc-dev
In-Reply-To: <20200708083210.GD597537@hirez.programming.kicks-ass.net>
On 7/8/20 4:32 AM, Peter Zijlstra wrote:
> On Tue, Jul 07, 2020 at 11:33:45PM -0400, Waiman Long wrote:
>> From 5d7941a498935fb225b2c7a3108cbf590114c3db Mon Sep 17 00:00:00 2001
>> From: Waiman Long <longman@redhat.com>
>> Date: Tue, 7 Jul 2020 22:29:16 -0400
>> Subject: [PATCH 2/9] locking/pvqspinlock: Introduce
>> CONFIG_PARAVIRT_QSPINLOCKS_LITE
>>
>> Add a new PARAVIRT_QSPINLOCKS_LITE config option that allows
>> architectures to use the PV qspinlock code without the need to use or
>> implement a pv_kick() function, thus eliminating the atomic unlock
>> overhead. The non-atomic queued_spin_unlock() can be used instead.
>> The pv_wait() function will still be needed, but it can be a dummy
>> function.
>>
>> With that option set, the hybrid PV queued/unfair locking code should
>> still be able to make it performant enough in a paravirtualized
> How is this supposed to work? If there is no kick, you have no control
> over who wakes up and fairness goes out the window entirely.
>
> You don't even begin to explain...
>
I don't have a full understanding of how the PPC hypervisor work myself.
Apparently, a cpu kick may not be needed.
This is just a test patch to see if it yields better result. It is
subjected to further modifcation.
Cheers,
Longman
^ permalink raw reply
* Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks
From: Waiman Long @ 2020-07-08 23:50 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
Cc: linux-arch, Peter Zijlstra, Boqun Feng, linux-kernel, kvm-ppc,
virtualization, Ingo Molnar, Will Deacon
In-Reply-To: <1594184204.ncuq7vstsz.astroid@bobo.none>
On 7/8/20 1:10 AM, Nicholas Piggin wrote:
> Excerpts from Waiman Long's message of July 8, 2020 1:33 pm:
>> On 7/7/20 1:57 AM, Nicholas Piggin wrote:
>>> Yes, powerpc could certainly get more performance out of the slow
>>> paths, and then there are a few parameters to tune.
>>>
>>> We don't have a good alternate patching for function calls yet, but
>>> that would be something to do for native vs pv.
>>>
>>> And then there seem to be one or two tunable parameters we could
>>> experiment with.
>>>
>>> The paravirt locks may need a bit more tuning. Some simple testing
>>> under KVM shows we might be a bit slower in some cases. Whether this
>>> is fairness or something else I'm not sure. The current simple pv
>>> spinlock code can do a directed yield to the lock holder CPU, whereas
>>> the pv qspl here just does a general yield. I think we might actually
>>> be able to change that to also support directed yield. Though I'm
>>> not sure if this is actually the cause of the slowdown yet.
>> Regarding the paravirt lock, I have taken a further look into the
>> current PPC spinlock code. There is an equivalent of pv_wait() but no
>> pv_kick(). Maybe PPC doesn't really need that.
> So powerpc has two types of wait, either undirected "all processors" or
> directed to a specific processor which has been preempted by the
> hypervisor.
>
> The simple spinlock code does a directed wait, because it knows the CPU
> which is holding the lock. In this case, there is a sequence that is
> used to ensure we don't wait if the condition has become true, and the
> target CPU does not need to kick the waiter it will happen automatically
> (see splpar_spin_yield). This is preferable because we only wait as
> needed and don't require the kick operation.
Thanks for the explanation.
>
> The pv spinlock code I did uses the undirected wait, because we don't
> know the CPU number which we are waiting on. This is undesirable because
> it's higher overhead and the wait is not so accurate.
>
> I think perhaps we could change things so we wait on the correct CPU
> when queued, which might be good enough (we could also put the lock
> owner CPU in the spinlock word, if we add another format).
The LS byte of the lock word is used to indicate locking status. If we
have less than 255 cpus, we can put the (cpu_nr + 1) into the lock byte.
The special 0xff value can be used to indicate a cpu number >= 255 for
indirect yield. The required change to the qspinlock code will be
minimal, I think.
>> Attached are two
>> additional qspinlock patches that adds a CONFIG_PARAVIRT_QSPINLOCKS_LITE
>> option to not require pv_kick(). There is also a fixup patch to be
>> applied after your patchset.
>>
>> I don't have access to a PPC LPAR with shared processor at the moment,
>> so I can't test the performance of the paravirt code. Would you mind
>> adding my patches and do some performance test on your end to see if it
>> gives better result?
> Great, I'll do some tests. Any suggestions for what to try?
I will just like to see if it will produce some better performance
result compared with your current version.
Cheers,
Longman
^ permalink raw reply
* [PATCH 2/2] selftests/powerpc: Use proper error code to check fault address
From: Haren Myneni @ 2020-07-08 23:28 UTC (permalink / raw)
To: mpe; +Cc: tulioqm, linuxppc-dev, abali, rzinsly
In-Reply-To: <f8af60fd4167c9c04ee5ab47147b9e95bcb3b9ff.camel@linux.ibm.com>
ERR_NX_TRANSLATION (CSB.CC=5) is for internal to VAS for fault
handling and should not used by OS. ERR_NX_AT_FAULT(CSB.CC=250) is the
proper error code reported by OS when NX encounters address translation
failure.
This patch uses ERR_NX_AT_FAULT (CSB.CC=250) to determine the fault
address when the request is not successful.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
---
tools/testing/selftests/powerpc/nx-gzip/gunz_test.c | 4 ++--
tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
index 6ee0fde..7c23d3d 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
@@ -698,13 +698,13 @@ int decompress_file(int argc, char **argv, void *devhandle)
switch (cc) {
- case ERR_NX_TRANSLATION:
+ case ERR_NX_AT_FAULT:
/* We touched the pages ahead of time. In the most common case
* we shouldn't be here. But may be some pages were paged out.
* Kernel should have placed the faulting address to fsaddr.
*/
- NXPRT(fprintf(stderr, "ERR_NX_TRANSLATION %p\n",
+ NXPRT(fprintf(stderr, "ERR_NX_AT_FAULT %p\n",
(void *)cmdp->crb.csb.fsaddr));
if (pgfault_retries == NX_MAX_FAULTS) {
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
index 7496a83..02dffb6 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
@@ -306,13 +306,13 @@ int compress_file(int argc, char **argv, void *handle)
lzcounts, cmdp, handle);
if (cc != ERR_NX_OK && cc != ERR_NX_TPBC_GT_SPBC &&
- cc != ERR_NX_TRANSLATION) {
+ cc != ERR_NX_AT_FAULT) {
fprintf(stderr, "nx error: cc= %d\n", cc);
exit(-1);
}
/* Page faults are handled by the user code */
- if (cc == ERR_NX_TRANSLATION) {
+ if (cc == ERR_NX_AT_FAULT) {
NXPRT(fprintf(stderr, "page fault: cc= %d, ", cc));
NXPRT(fprintf(stderr, "try= %d, fsa= %08llx\n",
fault_tries,
--
1.8.3.1
^ permalink raw reply related
* [PATCH 1/2] powerpc/vas: Report proper error for address translation failure
From: Haren Myneni @ 2020-07-08 23:19 UTC (permalink / raw)
To: mpe; +Cc: tulioqm, abali, linuxppc-dev, rzinsly
DMA controller uses CC=5 internally for translation fault handling. So
OS should be using CC=250 and should report this error to the user space
when NX encounters address translation failure on the request buffer.
Not an issue in earlier releases as NX does not get faults on
kernel addresses.
This patch defines CSB_CC_ADDRESS_TRANSLATION(250) and updates
CSB.CC with this proper error code for user space.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
---
Documentation/powerpc/vas-api.rst | 2 +-
arch/powerpc/include/asm/icswx.h | 2 ++
arch/powerpc/platforms/powernv/vas-fault.c | 2 +-
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/Documentation/powerpc/vas-api.rst b/Documentation/powerpc/vas-api.rst
index 1217c2f..78627cc 100644
--- a/Documentation/powerpc/vas-api.rst
+++ b/Documentation/powerpc/vas-api.rst
@@ -213,7 +213,7 @@ request buffers are not in memory. The operating system handles the fault by
updating CSB with the following data:
csb.flags = CSB_V;
- csb.cc = CSB_CC_TRANSLATION;
+ csb.cc = CSB_CC_ADDRESS_TRANSLATION;
csb.ce = CSB_CE_TERMINATION;
csb.address = fault_address;
diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
index 965b1f3..b1c9a57 100644
--- a/arch/powerpc/include/asm/icswx.h
+++ b/arch/powerpc/include/asm/icswx.h
@@ -77,6 +77,8 @@ struct coprocessor_completion_block {
#define CSB_CC_CHAIN (37)
#define CSB_CC_SEQUENCE (38)
#define CSB_CC_HW (39)
+/* User space address traslation failure */
+#define CSB_CC_ADDRESS_TRANSLATION (250)
#define CSB_SIZE (0x10)
#define CSB_ALIGN CSB_SIZE
diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
index 266a6ca..33e89d4 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -79,7 +79,7 @@ static void update_csb(struct vas_window *window,
csb_addr = (void __user *)be64_to_cpu(crb->csb_addr);
memset(&csb, 0, sizeof(csb));
- csb.cc = CSB_CC_TRANSLATION;
+ csb.cc = CSB_CC_ADDRESS_TRANSLATION;
csb.ce = CSB_CE_TERMINATION;
csb.cs = 0;
csb.count = 0;
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v2 2/4] powerpc/mm/radix: Free PUD table when freeing pagetable
From: Reza Arbab @ 2020-07-08 20:30 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Bharata B Rao
In-Reply-To: <20200625064547.228448-3-aneesh.kumar@linux.ibm.com>
On Thu, Jun 25, 2020 at 12:15:45PM +0530, Aneesh Kumar K.V wrote:
>remove_pagetable() isn't freeing PUD table. This causes memory
>leak during memory unplug. Fix this.
This has come up before:
https://lore.kernel.org/linuxppc-dev/20190731061920.GA18807@in.ibm.com/
tl;dr, x86 intentionally does not free, and it wasn't quite clear if
their motivation also applies to us. Probably not, but I thought it was
worth mentioning again.
--
Reza Arbab
^ permalink raw reply
* Re: kernel since 5.6 do not boot anymore on Apple PowerBook
From: Christophe Leroy @ 2020-07-08 18:44 UTC (permalink / raw)
To: Giuseppe Sacco, linuxppc-dev
In-Reply-To: <e6878657490aa34b54b3daf0430073078a9840e7.camel@sguazz.it>
Le 08/07/2020 à 19:36, Giuseppe Sacco a écrit :
> Hi Cristophe,
>
> Il giorno mer, 08/07/2020 alle 19.09 +0200, Christophe Leroy ha
> scritto:
>> Hi
>>
>> Le 08/07/2020 à 19:00, Giuseppe Sacco a écrit :
>>> Hello,
>>> while trying to debug a problem using git bisect, I am now at a point
>>> where I cannot build the kernel at all. This is the error message I
>>> get:
>>>
>>> $ LANG=C make ARCH=powerpc \
>>> CROSS_COMPILE=powerpc-linux- \
>>> CONFIG_MODULE_COMPRESS_GZIP=true \
>>> INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
>>> -j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
>>> CONFIG_DEBUG_INFO_COMPRESSED=1 \
>>> install modules_install
>>> make[2]: *** No rule to make target 'vmlinux', needed by
>>
>> Surprising.
>>
>> Did you make any change to Makefiles ?
>
> No
>
>> Are you in the middle of a bisect ? If so, if the previous builds
>> worked, I'd do 'git bisect skip'
>
> Yes, the previous one worked.
>
>> What's the result with:
>>
>> LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
>
> $ LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
> CALL scripts/checksyscalls.sh
> CALL scripts/atomic/check-atomics.sh
> CHK include/generated/compile.h
> CC kernel/module.o
> kernel/module.c: In function 'do_init_module':
> kernel/module.c:3593:2: error: implicit declaration of function
> 'module_enable_ro'; did you mean 'module_enable_x'? [-Werror=implicit-
> function-declaration]
> 3593 | module_enable_ro(mod, true);
> | ^~~~~~~~~~~~~~~~
> | module_enable_x
> cc1: some warnings being treated as errors
> make[1]: *** [scripts/Makefile.build:267: kernel/module.o] Error 1
> make: *** [Makefile:1735: kernel] Error 2
>
> So, should I 'git bisect skip'?
Ah yes, I had the exact same problem last time I bisected.
So yes do 'git bisect skip'. You'll probably hit this problem half a
dozen of times, but at the end you should get a usefull bisect anyway.
Christophe
^ permalink raw reply
* Re: [PATCH 3/3] misc: cxl: flash: Remove unused variable 'drc_index'
From: kernel test robot @ 2020-07-08 18:07 UTC (permalink / raw)
To: Lee Jones, arnd, gregkh
Cc: kbuild-all, Andrew Donnellan, linuxppc-dev, linux-kernel,
Frederic Barrat, Lee Jones
In-Reply-To: <20200708125711.3443569-4-lee.jones@linaro.org>
[-- Attachment #1: Type: text/plain, Size: 3138 bytes --]
Hi Lee,
I love your patch! Yet something to improve:
[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on soc/for-next linux/master linus/master v5.8-rc4 next-20200708]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Lee-Jones/Mop-up-last-remaining-patches-for-Misc/20200708-205913
base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git 8ab11d705c3b33ae4c6ca05eefaf025b7c5dbeaf
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
drivers/misc/cxl/flash.c: In function 'update_devicetree':
>> drivers/misc/cxl/flash.c:216:6: error: value computed is not used [-Werror=unused-value]
216 | *data++;
| ^~~~~~~
cc1: all warnings being treated as errors
vim +216 drivers/misc/cxl/flash.c
172
173 static int update_devicetree(struct cxl *adapter, s32 scope)
174 {
175 struct update_nodes_workarea *unwa;
176 u32 action, node_count;
177 int token, rc, i;
178 __be32 *data, phandle;
179 char *buf;
180
181 token = rtas_token("ibm,update-nodes");
182 if (token == RTAS_UNKNOWN_SERVICE)
183 return -EINVAL;
184
185 buf = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
186 if (!buf)
187 return -ENOMEM;
188
189 unwa = (struct update_nodes_workarea *)&buf[0];
190 unwa->unit_address = cpu_to_be64(adapter->guest->handle);
191 do {
192 rc = rcall(token, buf, scope);
193 if (rc && rc != 1)
194 break;
195
196 data = (__be32 *)buf + 4;
197 while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
198 action = be32_to_cpu(*data) & NODE_ACTION_MASK;
199 node_count = be32_to_cpu(*data) & NODE_COUNT_MASK;
200 pr_devel("device reconfiguration - action: %#x, nodes: %#x\n",
201 action, node_count);
202 data++;
203
204 for (i = 0; i < node_count; i++) {
205 phandle = *data++;
206
207 switch (action) {
208 case OPCODE_DELETE:
209 /* nothing to do */
210 break;
211 case OPCODE_UPDATE:
212 update_node(phandle, scope);
213 break;
214 case OPCODE_ADD:
215 /* nothing to do, just move pointer */
> 216 *data++;
217 break;
218 }
219 }
220 }
221 } while (rc == 1);
222
223 kfree(buf);
224 return 0;
225 }
226
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26292 bytes --]
^ permalink raw reply
* [PATCH 1/1] powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
From: Mathieu Desnoyers @ 2020-07-08 17:54 UTC (permalink / raw)
To: Christophe Leroy
Cc: linux-kernel, # v2 . 6 . 28+, Mathieu Desnoyers, Paul Mackerras,
linuxppc-dev
The placeholder for instruction selection should use the second
argument's operand, which is %1, not %0. This could generate incorrect
assembly code if the instruction selection for argument %0 ever differs
from argument %1.
Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: <stable@vger.kernel.org> # v2.6.28+
---
arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +-
arch/powerpc/include/asm/nohash/pgtable.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 224912432821..f1467b3c417a 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -529,7 +529,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
- stw%U0%X0 %L2,%1"
+ stw%U1%X1 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..a00e4c1746d6 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -199,7 +199,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
- stw%U0%X0 %L2,%1"
+ stw%U1%X1 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");
return;
--
2.11.0
^ permalink raw reply related
* Re: kernel since 5.6 do not boot anymore on Apple PowerBook
From: Giuseppe Sacco @ 2020-07-08 17:36 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <c2a89243-6135-4edd-2c1c-42c2159b5a1e@csgroup.eu>
Hi Cristophe,
Il giorno mer, 08/07/2020 alle 19.09 +0200, Christophe Leroy ha
scritto:
> Hi
>
> Le 08/07/2020 à 19:00, Giuseppe Sacco a écrit :
> > Hello,
> > while trying to debug a problem using git bisect, I am now at a point
> > where I cannot build the kernel at all. This is the error message I
> > get:
> >
> > $ LANG=C make ARCH=powerpc \
> > CROSS_COMPILE=powerpc-linux- \
> > CONFIG_MODULE_COMPRESS_GZIP=true \
> > INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
> > -j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
> > CONFIG_DEBUG_INFO_COMPRESSED=1 \
> > install modules_install
> > make[2]: *** No rule to make target 'vmlinux', needed by
>
> Surprising.
>
> Did you make any change to Makefiles ?
No
> Are you in the middle of a bisect ? If so, if the previous builds
> worked, I'd do 'git bisect skip'
Yes, the previous one worked.
> What's the result with:
>
> LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
$ LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
CHK include/generated/compile.h
CC kernel/module.o
kernel/module.c: In function 'do_init_module':
kernel/module.c:3593:2: error: implicit declaration of function
'module_enable_ro'; did you mean 'module_enable_x'? [-Werror=implicit-
function-declaration]
3593 | module_enable_ro(mod, true);
| ^~~~~~~~~~~~~~~~
| module_enable_x
cc1: some warnings being treated as errors
make[1]: *** [scripts/Makefile.build:267: kernel/module.o] Error 1
make: *** [Makefile:1735: kernel] Error 2
So, should I 'git bisect skip'?
Thank you,
Giuseppe
^ permalink raw reply
* Re: kernel since 5.6 do not boot anymore on Apple PowerBook
From: Christophe Leroy @ 2020-07-08 17:09 UTC (permalink / raw)
To: Giuseppe Sacco, linuxppc-dev
In-Reply-To: <498426507489f2c8e32daaf7af1105b5adba552f.camel@sguazz.it>
Hi
Le 08/07/2020 à 19:00, Giuseppe Sacco a écrit :
> Hello,
> while trying to debug a problem using git bisect, I am now at a point
> where I cannot build the kernel at all. This is the error message I
> get:
>
> $ LANG=C make ARCH=powerpc \
> CROSS_COMPILE=powerpc-linux- \
> CONFIG_MODULE_COMPRESS_GZIP=true \
> INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
> -j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
> CONFIG_DEBUG_INFO_COMPRESSED=1 \
> install modules_install
> make[2]: *** No rule to make target 'vmlinux', needed by
Surprising.
Did you make any change to Makefiles ?
Are you in the middle of a bisect ? If so, if the previous builds
worked, I'd do 'git bisect skip'
What's the result with:
LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
Christophe
> 'arch/powerpc/boot/zImage.pmac'. Stop.
> make[1]: *** [arch/powerpc/Makefile:407: install] Error 2
> make: *** [Makefile:328: __build_one_by_one] Error 2
>
> How can I continue?
>
> Thank you,
> Giuseppe
>
^ permalink raw reply
* Re: kernel since 5.6 do not boot anymore on Apple PowerBook
From: Giuseppe Sacco @ 2020-07-08 17:00 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <aab7a9fefe9ccfa272fbc45eeaa8228fced14d3b.camel@sguazz.it>
Hello,
while trying to debug a problem using git bisect, I am now at a point
where I cannot build the kernel at all. This is the error message I
get:
$ LANG=C make ARCH=powerpc \
CROSS_COMPILE=powerpc-linux- \
CONFIG_MODULE_COMPRESS_GZIP=true \
INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
-j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
CONFIG_DEBUG_INFO_COMPRESSED=1 \
install modules_install
make[2]: *** No rule to make target 'vmlinux', needed by
'arch/powerpc/boot/zImage.pmac'. Stop.
make[1]: *** [arch/powerpc/Makefile:407: install] Error 2
make: *** [Makefile:328: __build_one_by_one] Error 2
How can I continue?
Thank you,
Giuseppe
^ permalink raw reply
* Re: powerpc: Incorrect stw operand modifier in __set_pte_at
From: Christophe Leroy @ 2020-07-08 16:16 UTC (permalink / raw)
To: Mathieu Desnoyers, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Kumar Gala
Cc: linuxppc-dev
In-Reply-To: <873469922.2744.1594219513228.JavaMail.zimbra@efficios.com>
Le 08/07/2020 à 16:45, Mathieu Desnoyers a écrit :
> Hi,
>
> Reviewing use of the patterns "Un%Xn" with lwz and stw instructions
> (where n should be the operand number) within the Linux kernel led
> me to spot those 2 weird cases:
>
> arch/powerpc/include/asm/nohash/pgtable.h:__set_pte_at()
>
> __asm__ __volatile__("\
> stw%U0%X0 %2,%0\n\
> eieio\n\
> stw%U0%X0 %L2,%1"
> : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> : "r" (pte) : "memory");
>
> I would have expected the stw to be:
>
> stw%U1%X1 %L2,%1"
>
> and:
> arch/powerpc/include/asm/book3s/32/pgtable.h:__set_pte_at()
>
> __asm__ __volatile__("\
> stw%U0%X0 %2,%0\n\
> eieio\n\
> stw%U0%X0 %L2,%1"
> : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> : "r" (pte) : "memory");
>
> where I would have expected:
>
> stw%U1%X1 %L2,%1"
>
> Is it a bug or am I missing something ?
Well spotted. I guess it's definitly a bug.
Introduced 12 years ago by commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bf2b5cd
("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support").
It's gone unnoticed until now it seems.
Can you submit a patch for it ?
Christophe
^ permalink raw reply
* Re: Failure to build librseq on ppc
From: Christophe Leroy @ 2020-07-08 16:11 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Boqun Feng, linuxppc-dev, Michael Jeanson
In-Reply-To: <1137155888.2676.1594218740683.JavaMail.zimbra@efficios.com>
Le 08/07/2020 à 16:32, Mathieu Desnoyers a écrit :
> ----- On Jul 8, 2020, at 10:21 AM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
>
>> Le 08/07/2020 à 16:00, Mathieu Desnoyers a écrit :
>>> ----- On Jul 8, 2020, at 8:33 AM, Mathieu Desnoyers
>>> mathieu.desnoyers@efficios.com wrote:
>>>
>>>> ----- On Jul 7, 2020, at 8:59 PM, Segher Boessenkool segher@kernel.crashing.org
>>>> wrote:
>>> [...]
>>>>>
>>>>> So perhaps you have code like
>>>>>
>>>>> int *p;
>>>>> int x;
>>>>> ...
>>>>> asm ("lwz %0,%1" : "=r"(x) : "m"(*p));
>>>>
>>>> We indeed have explicit "lwz" and "stw" instructions in there.
>>>>
>>>>>
>>>>> where that last line should actually read
>>>>>
>>>>> asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
>>>>
>>>> Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.
>>>>
>>>> There has been some level of extra CPP macro coating around those instructions
>>>> to
>>>> support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is not
>>>> trivial.
>>>> Let me see what can be done here.
>>>
>>> I did the following changes which appear to generate valid asm.
>>> See attached corresponding .S output.
>>>
>>> I grepped for uses of "m" asm operand in Linux powerpc code and noticed it's
>>> pretty much
>>> always used with e.g. "lwz%U1%X1". I could find one blog post discussing that %U
>>> is about
>>> update flag, and nothing about %X. Are those documented ?
>>
>> As far as I can see, %U is mentioned in
>> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the
>> powerpc subpart, at the "m" constraint.
>
> Yep, I did notice it, but mistakenly thought it was only needed for "m<>" operand,
> not "m".
You are right, AFAIU on recent versions of GCC, %U has no effect without m<>
Christophe
>
> Thanks,
>
> Mathieu
>
>>
>> For the %X I don't know.
>>
>> Christophe
>>
>>>
>>> Although it appears to generate valid asm, I have the feeling I'm relying on
>>> undocumented
>>> features here. :-/
>
^ permalink raw reply
* [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode
From: Christoph Hellwig @ 2020-07-08 15:24 UTC (permalink / raw)
To: iommu, Alexey Kardashevskiy
Cc: Björn Töpel, Daniel Borkmann, Greg Kroah-Hartman,
Joerg Roedel, Robin Murphy, linux-kernel, Jesper Dangaard Brouer,
linuxppc-dev, Lu Baolu
In-Reply-To: <20200708152449.316476-1-hch@lst.de>
Use the DMA API bypass mechanism for direct window mappings. This uses
common code and speed up the direct mapping case by avoiding indirect
calls just when not using dma ops at all. It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.
Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined, as is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.
Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/device.h | 5 --
arch/powerpc/kernel/dma-iommu.c | 90 ++++---------------------------
3 files changed, 10 insertions(+), 86 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9b091d3587222..be868bfbe76ecf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -152,6 +152,7 @@ config PPC
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
select DMA_OPS if PPC64
+ select DMA_OPS_BYPASS if PPC64
select DYNAMIC_FTRACE if FUNCTION_TRACER
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 266542769e4bd1..452402215e1210 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -18,11 +18,6 @@ struct iommu_table;
* drivers/macintosh/macio_asic.c
*/
struct dev_archdata {
- /*
- * Set to %true if the dma_iommu_ops are requested to use a direct
- * window instead of dynamically mapping memory.
- */
- bool iommu_bypass : 1;
/*
* These two used to be a union. However, with the hybrid ops we need
* both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de288..569fecd7b5b234 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,23 +14,6 @@
* Generic iommu implementation
*/
-/*
- * The coherent mask may be smaller than the real mask, check if we can
- * really use a direct window.
- */
-static inline bool dma_iommu_alloc_bypass(struct device *dev)
-{
- return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
- dma_direct_supported(dev, dev->coherent_dma_mask);
-}
-
-static inline bool dma_iommu_map_bypass(struct device *dev,
- unsigned long attrs)
-{
- return dev->archdata.iommu_bypass &&
- (!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
-}
-
/* Allocates a contiguous real buffer and creates mappings over it.
* Returns the virtual address of the buffer and sets dma_handle
* to the dma address (mapping) of the first page.
@@ -39,8 +22,6 @@ static void *dma_iommu_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag,
unsigned long attrs)
{
- if (dma_iommu_alloc_bypass(dev))
- return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
dma_handle, dev->coherent_dma_mask, flag,
dev_to_node(dev));
@@ -50,11 +31,7 @@ static void dma_iommu_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs)
{
- if (dma_iommu_alloc_bypass(dev))
- dma_direct_free(dev, size, vaddr, dma_handle, attrs);
- else
- iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
- dma_handle);
+ iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
}
/* Creates TCEs for a user provided buffer. The user buffer must be
@@ -67,9 +44,6 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
enum dma_data_direction direction,
unsigned long attrs)
{
- if (dma_iommu_map_bypass(dev, attrs))
- return dma_direct_map_page(dev, page, offset, size, direction,
- attrs);
return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
size, dma_get_mask(dev), direction, attrs);
}
@@ -79,11 +53,8 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction direction,
unsigned long attrs)
{
- if (!dma_iommu_map_bypass(dev, attrs))
- iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
- direction, attrs);
- else
- dma_direct_unmap_page(dev, dma_handle, size, direction, attrs);
+ iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size, direction,
+ attrs);
}
@@ -91,8 +62,6 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
int nelems, enum dma_data_direction direction,
unsigned long attrs)
{
- if (dma_iommu_map_bypass(dev, attrs))
- return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
dma_get_mask(dev), direction, attrs);
}
@@ -101,11 +70,8 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
int nelems, enum dma_data_direction direction,
unsigned long attrs)
{
- if (!dma_iommu_map_bypass(dev, attrs))
- ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
+ ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
direction, attrs);
- else
- dma_direct_unmap_sg(dev, sglist, nelems, direction, attrs);
}
static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
@@ -113,8 +79,9 @@ static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
struct pci_dev *pdev = to_pci_dev(dev);
struct pci_controller *phb = pci_bus_to_host(pdev->bus);
- return phb->controller_ops.iommu_bypass_supported &&
- phb->controller_ops.iommu_bypass_supported(pdev, mask);
+ if (iommu_fixed_is_weak || !phb->controller_ops.iommu_bypass_supported)
+ return false;
+ return phb->controller_ops.iommu_bypass_supported(pdev, mask);
}
/* We support DMA to/from any memory page via the iommu */
@@ -123,7 +90,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
struct iommu_table *tbl = get_iommu_table_base(dev);
if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
- dev->archdata.iommu_bypass = true;
+ dev->dma_ops_bypass = true;
dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
return 1;
}
@@ -141,7 +108,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
}
dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
- dev->archdata.iommu_bypass = false;
+ dev->dma_ops_bypass = false;
return 1;
}
@@ -153,47 +120,12 @@ u64 dma_iommu_get_required_mask(struct device *dev)
if (!tbl)
return 0;
- if (dev_is_pci(dev)) {
- u64 bypass_mask = dma_direct_get_required_mask(dev);
-
- if (dma_iommu_bypass_supported(dev, bypass_mask))
- return bypass_mask;
- }
-
mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1);
mask += mask - 1;
return mask;
}
-static void dma_iommu_sync_for_cpu(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir)
-{
- if (dma_iommu_alloc_bypass(dev))
- dma_direct_sync_single_for_cpu(dev, addr, size, dir);
-}
-
-static void dma_iommu_sync_for_device(struct device *dev, dma_addr_t addr,
- size_t sz, enum dma_data_direction dir)
-{
- if (dma_iommu_alloc_bypass(dev))
- dma_direct_sync_single_for_device(dev, addr, sz, dir);
-}
-
-extern void dma_iommu_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
- if (dma_iommu_alloc_bypass(dev))
- dma_direct_sync_sg_for_cpu(dev, sgl, nents, dir);
-}
-
-extern void dma_iommu_sync_sg_for_device(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
- if (dma_iommu_alloc_bypass(dev))
- dma_direct_sync_sg_for_device(dev, sgl, nents, dir);
-}
-
const struct dma_map_ops dma_iommu_ops = {
.alloc = dma_iommu_alloc_coherent,
.free = dma_iommu_free_coherent,
@@ -203,10 +135,6 @@ const struct dma_map_ops dma_iommu_ops = {
.map_page = dma_iommu_map_page,
.unmap_page = dma_iommu_unmap_page,
.get_required_mask = dma_iommu_get_required_mask,
- .sync_single_for_cpu = dma_iommu_sync_for_cpu,
- .sync_single_for_device = dma_iommu_sync_for_device,
- .sync_sg_for_cpu = dma_iommu_sync_sg_for_cpu,
- .sync_sg_for_device = dma_iommu_sync_sg_for_device,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
};
--
2.26.2
^ permalink raw reply related
* [PATCH 4/5] dma-mapping: add a dma_ops_bypass flag to struct device
From: Christoph Hellwig @ 2020-07-08 15:24 UTC (permalink / raw)
To: iommu, Alexey Kardashevskiy
Cc: Björn Töpel, Daniel Borkmann, Greg Kroah-Hartman,
Joerg Roedel, Robin Murphy, linux-kernel, Jesper Dangaard Brouer,
linuxppc-dev, Lu Baolu
In-Reply-To: <20200708152449.316476-1-hch@lst.de>
Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the devices DMA mask is large enough. Add generic support
to the core dma-mapping code to do that to switch those drivers to
a common solution.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/device.h | 8 +++++
kernel/dma/Kconfig | 8 +++++
kernel/dma/mapping.c | 74 +++++++++++++++++++++++++++++-------------
3 files changed, 68 insertions(+), 22 deletions(-)
diff --git a/include/linux/device.h b/include/linux/device.h
index 4c4af98321ebd6..1f71acf37f78d7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -523,6 +523,11 @@ struct dev_links_info {
* sync_state() callback.
* @dma_coherent: this particular device is dma coherent, even if the
* architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ * streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
+ * and optionall (if the coherent mask is large enough) also
+ * for dma allocations. This flag is managed by the dma ops
+ * instance from ->dma_supported.
*
* At the lowest level, every device in a Linux system is represented by an
* instance of struct device. The device structure contains the information
@@ -623,6 +628,9 @@ struct device {
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
bool dma_coherent:1;
#endif
+#ifdef CONFIG_DMA_OPS_BYPASS
+ bool dma_ops_bypass : 1;
+#endif
};
static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 5cfb2428593ac7..f4770fcfa62bb3 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -8,6 +8,14 @@ config HAS_DMA
config DMA_OPS
bool
+#
+# IOMMU drivers that can bypass the IOMMU code and optionally use the direct
+# mapping fast path should select this option and set the dma_ops_bypass
+# flag in struct device where applicable
+#
+config DMA_OPS_BYPASS
+ bool
+
config NEED_SG_DMA_LENGTH
bool
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b53953024512fe..0d129421e75fc8 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,9 +105,35 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
}
EXPORT_SYMBOL(dmam_alloc_attrs);
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+static bool dma_go_direct(struct device *dev, dma_addr_t mask,
+ const struct dma_map_ops *ops)
{
- return likely(!ops);
+ if (likely(!ops))
+ return true;
+#ifdef CONFIG_DMA_OPS_BYPASS
+ if (dev->dma_ops_bypass)
+ return min_not_zero(mask, dev->bus_dma_limit) >=
+ dma_direct_get_required_mask(dev);
+#endif
+ return false;
+}
+
+
+/*
+ * Check if the devices uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_alloc_direct(struct device *dev,
+ const struct dma_map_ops *ops)
+{
+ return dma_go_direct(dev, dev->coherent_dma_mask, ops);
+}
+
+static inline bool dma_map_direct(struct device *dev,
+ const struct dma_map_ops *ops)
+{
+ return dma_go_direct(dev, *dev->dma_mask, ops);
}
dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
@@ -118,7 +144,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
dma_addr_t addr;
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -134,7 +160,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (ops->unmap_page)
ops->unmap_page(dev, addr, size, dir, attrs);
@@ -153,7 +179,7 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
int ents;
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -172,7 +198,7 @@ void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
BUG_ON(!valid_dma_direction(dir));
debug_dma_unmap_sg(dev, sg, nents, dir);
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
else if (ops->unmap_sg)
ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -191,7 +217,7 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
return DMA_MAPPING_ERROR;
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
else if (ops->map_resource)
addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
@@ -207,7 +233,7 @@ void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
- if (!dma_is_direct(ops) && ops->unmap_resource)
+ if (!dma_map_direct(dev, ops) && ops->unmap_resource)
ops->unmap_resource(dev, addr, size, dir, attrs);
debug_dma_unmap_resource(dev, addr, size, dir);
}
@@ -219,7 +245,7 @@ void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);
else if (ops->sync_single_for_cpu)
ops->sync_single_for_cpu(dev, addr, size, dir);
@@ -233,7 +259,7 @@ void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
dma_direct_sync_single_for_device(dev, addr, size, dir);
else if (ops->sync_single_for_device)
ops->sync_single_for_device(dev, addr, size, dir);
@@ -247,7 +273,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
else if (ops->sync_sg_for_cpu)
ops->sync_sg_for_cpu(dev, sg, nelems, dir);
@@ -261,7 +287,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
else if (ops->sync_sg_for_device)
ops->sync_sg_for_device(dev, sg, nelems, dir);
@@ -302,7 +328,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- if (dma_is_direct(ops))
+ if (dma_alloc_direct(dev, ops))
return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr,
size, attrs);
if (!ops->get_sgtable)
@@ -372,7 +398,7 @@ bool dma_can_mmap(struct device *dev)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- if (dma_is_direct(ops))
+ if (dma_alloc_direct(dev, ops))
return dma_direct_can_mmap(dev);
return ops->mmap != NULL;
}
@@ -397,7 +423,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- if (dma_is_direct(ops))
+ if (dma_alloc_direct(dev, ops))
return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
attrs);
if (!ops->mmap)
@@ -410,7 +436,7 @@ u64 dma_get_required_mask(struct device *dev)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- if (dma_is_direct(ops))
+ if (dma_alloc_direct(dev, ops))
return dma_direct_get_required_mask(dev);
if (ops->get_required_mask)
return ops->get_required_mask(dev);
@@ -441,7 +467,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
/* let the implementation decide on the zone to allocate from: */
flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
- if (dma_is_direct(ops))
+ if (dma_alloc_direct(dev, ops))
cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
else if (ops->alloc)
cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
@@ -473,7 +499,7 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
return;
debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
- if (dma_is_direct(ops))
+ if (dma_alloc_direct(dev, ops))
dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
else if (ops->free)
ops->free(dev, size, cpu_addr, dma_handle, attrs);
@@ -484,7 +510,11 @@ int dma_supported(struct device *dev, u64 mask)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- if (dma_is_direct(ops))
+ /*
+ * ->dma_supported sets the bypass flag, so we must always call
+ * into the method here unless the device is truly direct mapped.
+ */
+ if (!ops)
return dma_direct_supported(dev, mask);
if (!ops->dma_supported)
return 1;
@@ -540,7 +570,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
+ if (dma_alloc_direct(dev, ops))
arch_dma_cache_sync(dev, vaddr, size, dir);
else if (ops->cache_sync)
ops->cache_sync(dev, vaddr, size, dir);
@@ -552,7 +582,7 @@ size_t dma_max_mapping_size(struct device *dev)
const struct dma_map_ops *ops = get_dma_ops(dev);
size_t size = SIZE_MAX;
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
size = dma_direct_max_mapping_size(dev);
else if (ops && ops->max_mapping_size)
size = ops->max_mapping_size(dev);
@@ -565,7 +595,7 @@ bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- if (dma_is_direct(ops))
+ if (dma_map_direct(dev, ops))
return dma_direct_need_sync(dev, dma_addr);
return ops->sync_single_for_cpu || ops->sync_single_for_device;
}
--
2.26.2
^ permalink raw reply related
* [PATCH 3/5] dma-mapping: make support for dma ops optional
From: Christoph Hellwig @ 2020-07-08 15:24 UTC (permalink / raw)
To: iommu, Alexey Kardashevskiy
Cc: Björn Töpel, Daniel Borkmann, Greg Kroah-Hartman,
Joerg Roedel, Robin Murphy, linux-kernel, Jesper Dangaard Brouer,
linuxppc-dev, Lu Baolu
In-Reply-To: <20200708152449.316476-1-hch@lst.de>
Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/alpha/Kconfig | 1 +
arch/arm/Kconfig | 1 +
arch/ia64/Kconfig | 1 +
arch/mips/Kconfig | 1 +
arch/parisc/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/x86/Kconfig | 1 +
drivers/iommu/Kconfig | 2 ++
drivers/misc/mic/Kconfig | 1 +
drivers/vdpa/Kconfig | 1 +
drivers/xen/Kconfig | 1 +
include/linux/device.h | 3 ++-
include/linux/dma-mapping.h | 12 +++++++++++-
kernel/dma/Kconfig | 4 ++++
kernel/dma/Makefile | 3 ++-
17 files changed, 33 insertions(+), 3 deletions(-)
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 10862c5a8c7682..9c5f06e8eb9bc0 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -7,6 +7,7 @@ config ALPHA
select ARCH_NO_PREEMPT
select ARCH_NO_SG_CHAIN
select ARCH_USE_CMPXCHG_LOCKREF
+ select DMA_OPS if PCI
select FORCE_PCI if !ALPHA_JENSEN
select PCI_DOMAINS if PCI
select PCI_SYSCALL if PCI
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2ac74904a3ce58..bee35b0187e452 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -41,6 +41,7 @@ config ARM
select CPU_PM if SUSPEND || CPU_IDLE
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
select DMA_DECLARE_COHERENT
+ select DMA_OPS
select DMA_REMAP if MMU
select EDAC_SUPPORT
select EDAC_ATOMIC_SCRUB
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 1fa2fe2ef053f8..5b4ec80bf5863a 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -192,6 +192,7 @@ config IA64_SGI_UV
config IA64_HP_SBA_IOMMU
bool "HP SBA IOMMU support"
+ select DMA_OPS
default y
help
Say Y here to add support for the SBA IOMMU found on HP zx1 and
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 6fee1a133e9d6a..8a458105e445b6 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -367,6 +367,7 @@ config MACH_JAZZ
select ARC_PROMLIB
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
+ select DMA_OPS
select FW_ARC
select FW_ARC32
select ARCH_MAY_HAVE_PC_FDC
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 8e4c3708773d08..38c1eafc1f1ae9 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -14,6 +14,7 @@ config PARISC
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_NO_SG_CHAIN
select ARCH_SUPPORTS_MEMORY_FAILURE
+ select DMA_OPS
select RTC_CLASS
select RTC_DRV_GENERIC
select INIT_ALL_POSSIBLE
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff5a..e9b091d3587222 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -151,6 +151,7 @@ config PPC
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
+ select DMA_OPS if PPC64
select DYNAMIC_FTRACE if FUNCTION_TRACER
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c7d7ede6300c59..687fe23f61cc8d 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -113,6 +113,7 @@ config S390
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS2
+ select DMA_OPS if PCI
select DYNAMIC_FTRACE if FUNCTION_TRACER
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_AUTOPROBE
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 5bf2dc163540fc..5db1faaaee31c8 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -15,6 +15,7 @@ config SPARC
default y
select ARCH_MIGHT_HAVE_PC_PARPORT if SPARC64 && PCI
select ARCH_MIGHT_HAVE_PC_SERIO
+ select DMA_OPS
select OF
select OF_PROMTREE
select HAVE_ASM_MODVERSIONS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 883da0abf7790c..96ab92754158dd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -909,6 +909,7 @@ config DMI
config GART_IOMMU
bool "Old AMD GART IOMMU support"
+ select DMA_OPS
select IOMMU_HELPER
select SWIOTLB
depends on X86_64 && PCI && AMD_NB
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6dc49ed8377a5c..d6ce878a7e8684 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -97,6 +97,7 @@ config OF_IOMMU
# IOMMU-agnostic DMA-mapping layer
config IOMMU_DMA
bool
+ select DMA_OPS
select IOMMU_API
select IOMMU_IOVA
select IRQ_MSI_IOMMU
@@ -183,6 +184,7 @@ config DMAR_TABLE
config INTEL_IOMMU
bool "Support for Intel IOMMU using DMA Remapping Devices"
depends on PCI_MSI && ACPI && (X86 || IA64)
+ select DMA_OPS
select IOMMU_API
select IOMMU_IOVA
select NEED_DMA_MAP_STATE
diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 8f201d019f5a4d..a9ec0b25ac40c9 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -49,6 +49,7 @@ config INTEL_MIC_HOST
tristate "Intel MIC Host Driver"
depends on 64BIT && PCI && X86
depends on INTEL_MIC_BUS && SCIF_BUS && MIC_COSM && VOP_BUS
+ select DMA_OPS
help
This enables Host Driver support for the Intel Many Integrated
Core (MIC) family of PCIe form factor coprocessor devices that
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3e1ceb8e9f2b52..d93a69b12f81e3 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -11,6 +11,7 @@ if VDPA
config VDPA_SIM
tristate "vDPA device simulator"
depends on RUNTIME_TESTING_MENU && HAS_DMA
+ select DMA_OPS
select VHOST_RING
default n
help
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 727f11eb46b2bf..1d339ef924228c 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -179,6 +179,7 @@ config XEN_GRANT_DMA_ALLOC
config SWIOTLB_XEN
def_bool y
+ select DMA_OPS
select SWIOTLB
config XEN_PCIDEV_BACKEND
diff --git a/include/linux/device.h b/include/linux/device.h
index 15460a5ac024a1..4c4af98321ebd6 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -568,8 +568,9 @@ struct device {
#ifdef CONFIG_GENERIC_MSI_IRQ
struct list_head msi_list;
#endif
-
+#ifdef CONFIG_DMA_OPS
const struct dma_map_ops *dma_ops;
+#endif
u64 *dma_mask; /* dma mask (if dma'able device) */
u64 coherent_dma_mask;/* Like dma_mask, but for
alloc_coherent mappings as
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index bd0a6f5ee44581..39da883c861954 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -191,6 +191,7 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
#ifdef CONFIG_HAS_DMA
#include <asm/dma-mapping.h>
+#ifdef CONFIG_DMA_OPS
static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
{
if (dev->dma_ops)
@@ -203,7 +204,16 @@ static inline void set_dma_ops(struct device *dev,
{
dev->dma_ops = dma_ops;
}
-
+#else /* CONFIG_DMA_OPS */
+static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+ return NULL;
+}
+static inline void set_dma_ops(struct device *dev,
+ const struct dma_map_ops *dma_ops)
+{
+}
+#endif /* CONFIG_DMA_OPS */
static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
{
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 1da3f44f2565b4..5cfb2428593ac7 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -5,6 +5,9 @@ config HAS_DMA
depends on !NO_DMA
default y
+config DMA_OPS
+ bool
+
config NEED_SG_DMA_LENGTH
bool
@@ -60,6 +63,7 @@ config DMA_NONCOHERENT_CACHE_SYNC
config DMA_VIRT_OPS
bool
depends on HAS_DMA
+ select DMA_OPS
config SWIOTLB
bool
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index 370f63344e9cd9..32c7c1942bbd6c 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_HAS_DMA) += mapping.o direct.o dummy.o
+obj-$(CONFIG_HAS_DMA) += mapping.o direct.o
+obj-$(CONFIG_DMA_OPS) += dummy.o
obj-$(CONFIG_DMA_CMA) += contiguous.o
obj-$(CONFIG_DMA_DECLARE_COHERENT) += coherent.o
obj-$(CONFIG_DMA_VIRT_OPS) += virt.o
--
2.26.2
^ permalink raw reply related
* [PATCH 2/5] dma-mapping: inline the fast path dma-direct calls
From: Christoph Hellwig @ 2020-07-08 15:24 UTC (permalink / raw)
To: iommu, Alexey Kardashevskiy
Cc: Björn Töpel, Daniel Borkmann, Greg Kroah-Hartman,
Joerg Roedel, Robin Murphy, linux-kernel, Jesper Dangaard Brouer,
linuxppc-dev, Lu Baolu
In-Reply-To: <20200708152449.316476-1-hch@lst.de>
Inline the single page map/unmap/sync dma-direct calls into the now
out of line generic wrappers. This restores the behavior of a single
function call that we had before moving the generic calls out of line.
Besides the dma-mapping callers there are just a few callers in IOMMU
drivers that have a bypass mode, and more of those are going to be
switched to the generic bypass soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/dma-direct.h | 92 ++++++++++++++++++++++++++++----------
kernel/dma/direct.c | 65 ---------------------------
2 files changed, 69 insertions(+), 88 deletions(-)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 78dc3524adf880..dbb19dd9869054 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -1,10 +1,16 @@
/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Internals of the DMA direct mapping implementation. Only for use by the
+ * DMA mapping code and IOMMU drivers.
+ */
#ifndef _LINUX_DMA_DIRECT_H
#define _LINUX_DMA_DIRECT_H 1
#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/memblock.h> /* for min_low_pfn */
#include <linux/mem_encrypt.h>
+#include <linux/swiotlb.h>
extern unsigned int zone_dma_bits;
@@ -86,25 +92,17 @@ int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
unsigned long attrs);
int dma_direct_supported(struct device *dev, u64 mask);
bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs);
int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs);
dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir, unsigned long attrs);
+size_t dma_direct_max_mapping_size(struct device *dev);
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_device(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction dir);
#else
-static inline void dma_direct_sync_single_for_device(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir)
-{
-}
static inline void dma_direct_sync_sg_for_device(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
{
@@ -114,34 +112,82 @@ static inline void dma_direct_sync_sg_for_device(struct device *dev,
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
defined(CONFIG_SWIOTLB)
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_cpu(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir);
void dma_direct_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir);
#else
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-}
static inline void dma_direct_unmap_sg(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir,
unsigned long attrs)
{
}
+static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+ dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+ phys_addr_t paddr = dma_to_phys(dev, addr);
+
+ if (unlikely(is_swiotlb_buffer(paddr)))
+ swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);
+
+ if (!dev_is_dma_coherent(dev))
+ arch_sync_dma_for_device(paddr, size, dir);
+}
+
static inline void dma_direct_sync_single_for_cpu(struct device *dev,
dma_addr_t addr, size_t size, enum dma_data_direction dir)
{
+ phys_addr_t paddr = dma_to_phys(dev, addr);
+
+ if (!dev_is_dma_coherent(dev)) {
+ arch_sync_dma_for_cpu(paddr, size, dir);
+ arch_sync_dma_for_cpu_all();
+ }
+
+ if (unlikely(is_swiotlb_buffer(paddr)))
+ swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
}
-static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+
+static inline dma_addr_t dma_direct_map_page(struct device *dev,
+ struct page *page, unsigned long offset, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
+ phys_addr_t phys = page_to_phys(page) + offset;
+ dma_addr_t dma_addr = phys_to_dma(dev, phys);
+
+ if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+ return swiotlb_map(dev, phys, size, dir, attrs);
+
+ if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
+ if (swiotlb_force != SWIOTLB_NO_FORCE)
+ return swiotlb_map(dev, phys, size, dir, attrs);
+
+ dev_WARN_ONCE(dev, 1,
+ "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+ &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+ return DMA_MAPPING_ERROR;
+ }
+
+ if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ arch_sync_dma_for_device(phys, size, dir);
+ return dma_addr;
}
-#endif
-size_t dma_direct_max_mapping_size(struct device *dev);
+static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ phys_addr_t phys = dma_to_phys(dev, addr);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ dma_direct_sync_single_for_cpu(dev, addr, size, dir);
+
+ if (unlikely(is_swiotlb_buffer(phys)))
+ swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
+}
#endif /* _LINUX_DMA_DIRECT_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6d1975c4a26873..3078e36941e6d4 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -10,11 +10,9 @@
#include <linux/dma-direct.h>
#include <linux/scatterlist.h>
#include <linux/dma-contiguous.h>
-#include <linux/dma-noncoherent.h>
#include <linux/pfn.h>
#include <linux/vmalloc.h>
#include <linux/set_memory.h>
-#include <linux/swiotlb.h>
/*
* Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it
@@ -304,18 +302,6 @@ void dma_direct_free(struct device *dev, size_t size,
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_device(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir)
-{
- phys_addr_t paddr = dma_to_phys(dev, addr);
-
- if (unlikely(is_swiotlb_buffer(paddr)))
- swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);
-
- if (!dev_is_dma_coherent(dev))
- arch_sync_dma_for_device(paddr, size, dir);
-}
-
void dma_direct_sync_sg_for_device(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
{
@@ -339,20 +325,6 @@ void dma_direct_sync_sg_for_device(struct device *dev,
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_cpu(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir)
-{
- phys_addr_t paddr = dma_to_phys(dev, addr);
-
- if (!dev_is_dma_coherent(dev)) {
- arch_sync_dma_for_cpu(paddr, size, dir);
- arch_sync_dma_for_cpu_all();
- }
-
- if (unlikely(is_swiotlb_buffer(paddr)))
- swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
-}
-
void dma_direct_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
{
@@ -374,18 +346,6 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
arch_sync_dma_for_cpu_all();
}
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- phys_addr_t phys = dma_to_phys(dev, addr);
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- dma_direct_sync_single_for_cpu(dev, addr, size, dir);
-
- if (unlikely(is_swiotlb_buffer(phys)))
- swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
-}
-
void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs)
{
@@ -398,31 +358,6 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
}
#endif
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- phys_addr_t phys = page_to_phys(page) + offset;
- dma_addr_t dma_addr = phys_to_dma(dev, phys);
-
- if (unlikely(swiotlb_force == SWIOTLB_FORCE))
- return swiotlb_map(dev, phys, size, dir, attrs);
-
- if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
- if (swiotlb_force != SWIOTLB_NO_FORCE)
- return swiotlb_map(dev, phys, size, dir, attrs);
-
- dev_WARN_ONCE(dev, 1,
- "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
- &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
- return DMA_MAPPING_ERROR;
- }
-
- if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- arch_sync_dma_for_device(phys, size, dir);
- return dma_addr;
-}
-
int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs)
{
--
2.26.2
^ permalink raw reply related
* [PATCH 1/5] dma-mapping: move the remaining DMA API calls out of line
From: Christoph Hellwig @ 2020-07-08 15:24 UTC (permalink / raw)
To: iommu, Alexey Kardashevskiy
Cc: Björn Töpel, Daniel Borkmann, Greg Kroah-Hartman,
Joerg Roedel, Robin Murphy, linux-kernel, Jesper Dangaard Brouer,
linuxppc-dev, Lu Baolu
In-Reply-To: <20200708152449.316476-1-hch@lst.de>
For a long time the DMA API has been implemented inline in dma-mapping.h,
but the function bodies can be quite large. Move them all out of line.
This also removes all the dma_direct_* exports as those are just
implementation details and should never be used by drivers directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/dma-direct.h | 58 +++++++++
include/linux/dma-mapping.h | 247 ++++--------------------------------
kernel/dma/direct.c | 9 --
kernel/dma/mapping.c | 164 ++++++++++++++++++++++++
4 files changed, 244 insertions(+), 234 deletions(-)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5184735a0fe8eb..78dc3524adf880 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -86,4 +86,62 @@ int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
unsigned long attrs);
int dma_direct_supported(struct device *dev, u64 mask);
bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+ enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs);
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+ defined(CONFIG_SWIOTLB)
+void dma_direct_sync_single_for_device(struct device *dev,
+ dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_device(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir);
+#else
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+ dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_device(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
+ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
+ defined(CONFIG_SWIOTLB)
+void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_sync_single_for_cpu(struct device *dev,
+ dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_cpu(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir);
+#else
+static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
+static inline void dma_direct_unmap_sg(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+}
+static inline void dma_direct_sync_single_for_cpu(struct device *dev,
+ dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
+ struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+size_t dma_direct_max_mapping_size(struct device *dev);
+
#endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a33ed3954ed465..bd0a6f5ee44581 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -188,73 +188,6 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
}
#endif /* CONFIG_DMA_DECLARE_COHERENT */
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
-{
- return likely(!ops);
-}
-
-/*
- * All the dma_direct_* declarations are here just for the indirect call bypass,
- * and must not be used directly drivers!
- */
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs);
-int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
- enum dma_data_direction dir, unsigned long attrs);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
-
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
- defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_device(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir);
-#else
-static inline void dma_direct_sync_single_for_device(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir)
-{
-}
-static inline void dma_direct_sync_sg_for_device(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-}
-#endif
-
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
- defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
- defined(CONFIG_SWIOTLB)
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
- int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_cpu(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir);
-#else
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-}
-static inline void dma_direct_unmap_sg(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
-}
-static inline void dma_direct_sync_single_for_cpu(struct device *dev,
- dma_addr_t addr, size_t size, enum dma_data_direction dir)
-{
-}
-static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
- struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-}
-#endif
-
-size_t dma_direct_max_mapping_size(struct device *dev);
-
#ifdef CONFIG_HAS_DMA
#include <asm/dma-mapping.h>
@@ -271,164 +204,6 @@ static inline void set_dma_ops(struct device *dev,
dev->dma_ops = dma_ops;
}
-static inline dma_addr_t dma_map_page_attrs(struct device *dev,
- struct page *page, size_t offset, size_t size,
- enum dma_data_direction dir, unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
- dma_addr_t addr;
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
- addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
- else
- addr = ops->map_page(dev, page, offset, size, dir, attrs);
- debug_dma_map_page(dev, page, offset, size, dir, addr);
-
- return addr;
-}
-
-static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
- dma_direct_unmap_page(dev, addr, size, dir, attrs);
- else if (ops->unmap_page)
- ops->unmap_page(dev, addr, size, dir, attrs);
- debug_dma_unmap_page(dev, addr, size, dir);
-}
-
-/*
- * dma_maps_sg_attrs returns 0 on error and > 0 on success.
- * It should never return a value < 0.
- */
-static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
- int ents;
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
- ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
- else
- ents = ops->map_sg(dev, sg, nents, dir, attrs);
- BUG_ON(ents < 0);
- debug_dma_map_sg(dev, sg, nents, ents, dir);
-
- return ents;
-}
-
-static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
- int nents, enum dma_data_direction dir,
- unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- debug_dma_unmap_sg(dev, sg, nents, dir);
- if (dma_is_direct(ops))
- dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
- else if (ops->unmap_sg)
- ops->unmap_sg(dev, sg, nents, dir, attrs);
-}
-
-static inline dma_addr_t dma_map_resource(struct device *dev,
- phys_addr_t phys_addr,
- size_t size,
- enum dma_data_direction dir,
- unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
- dma_addr_t addr = DMA_MAPPING_ERROR;
-
- BUG_ON(!valid_dma_direction(dir));
-
- /* Don't allow RAM to be mapped */
- if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
- return DMA_MAPPING_ERROR;
-
- if (dma_is_direct(ops))
- addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
- else if (ops->map_resource)
- addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
-
- debug_dma_map_resource(dev, phys_addr, size, dir, addr);
- return addr;
-}
-
-static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
- size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (!dma_is_direct(ops) && ops->unmap_resource)
- ops->unmap_resource(dev, addr, size, dir, attrs);
- debug_dma_unmap_resource(dev, addr, size, dir);
-}
-
-static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
- size_t size,
- enum dma_data_direction dir)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
- dma_direct_sync_single_for_cpu(dev, addr, size, dir);
- else if (ops->sync_single_for_cpu)
- ops->sync_single_for_cpu(dev, addr, size, dir);
- debug_dma_sync_single_for_cpu(dev, addr, size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
- dma_addr_t addr, size_t size,
- enum dma_data_direction dir)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
- dma_direct_sync_single_for_device(dev, addr, size, dir);
- else if (ops->sync_single_for_device)
- ops->sync_single_for_device(dev, addr, size, dir);
- debug_dma_sync_single_for_device(dev, addr, size, dir);
-}
-
-static inline void
-dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
- int nelems, enum dma_data_direction dir)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
- dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
- else if (ops->sync_sg_for_cpu)
- ops->sync_sg_for_cpu(dev, sg, nelems, dir);
- debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
-}
-
-static inline void
-dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
- int nelems, enum dma_data_direction dir)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_is_direct(ops))
- dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
- else if (ops->sync_sg_for_device)
- ops->sync_sg_for_device(dev, sg, nelems, dir);
- debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
-
-}
static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
{
@@ -439,6 +214,28 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
return 0;
}
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+ size_t offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs);
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
+ int nents, enum dma_data_direction dir,
+ unsigned long attrs);
+dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir);
+void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir);
+void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+ int nelems, enum dma_data_direction dir);
+void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+ int nelems, enum dma_data_direction dir);
void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t flag, unsigned long attrs);
void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 95866b64758100..6d1975c4a26873 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -315,7 +315,6 @@ void dma_direct_sync_single_for_device(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_device(paddr, size, dir);
}
-EXPORT_SYMBOL(dma_direct_sync_single_for_device);
void dma_direct_sync_sg_for_device(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
@@ -335,7 +334,6 @@ void dma_direct_sync_sg_for_device(struct device *dev,
dir);
}
}
-EXPORT_SYMBOL(dma_direct_sync_sg_for_device);
#endif
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
@@ -354,7 +352,6 @@ void dma_direct_sync_single_for_cpu(struct device *dev,
if (unlikely(is_swiotlb_buffer(paddr)))
swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
}
-EXPORT_SYMBOL(dma_direct_sync_single_for_cpu);
void dma_direct_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
@@ -376,7 +373,6 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu_all();
}
-EXPORT_SYMBOL(dma_direct_sync_sg_for_cpu);
void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
@@ -389,7 +385,6 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
if (unlikely(is_swiotlb_buffer(phys)))
swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
}
-EXPORT_SYMBOL(dma_direct_unmap_page);
void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs)
@@ -401,7 +396,6 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
attrs);
}
-EXPORT_SYMBOL(dma_direct_unmap_sg);
#endif
dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
@@ -428,7 +422,6 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
arch_sync_dma_for_device(phys, size, dir);
return dma_addr;
}
-EXPORT_SYMBOL(dma_direct_map_page);
int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs)
@@ -450,7 +443,6 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
return 0;
}
-EXPORT_SYMBOL(dma_direct_map_sg);
dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
@@ -467,7 +459,6 @@ dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
return dma_addr;
}
-EXPORT_SYMBOL(dma_direct_map_resource);
int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index a8c18c9a796fdc..b53953024512fe 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,6 +105,170 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
}
EXPORT_SYMBOL(dmam_alloc_attrs);
+static inline bool dma_is_direct(const struct dma_map_ops *ops)
+{
+ return likely(!ops);
+}
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+ size_t offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ dma_addr_t addr;
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (dma_is_direct(ops))
+ addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+ else
+ addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ debug_dma_map_page(dev, page, offset, size, dir, addr);
+
+ return addr;
+}
+EXPORT_SYMBOL(dma_map_page_attrs);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (dma_is_direct(ops))
+ dma_direct_unmap_page(dev, addr, size, dir, attrs);
+ else if (ops->unmap_page)
+ ops->unmap_page(dev, addr, size, dir, attrs);
+ debug_dma_unmap_page(dev, addr, size, dir);
+}
+EXPORT_SYMBOL(dma_unmap_page_attrs);
+
+/*
+ * dma_maps_sg_attrs returns 0 on error and > 0 on success.
+ * It should never return a value < 0.
+ */
+int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ int ents;
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (dma_is_direct(ops))
+ ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
+ else
+ ents = ops->map_sg(dev, sg, nents, dir, attrs);
+ BUG_ON(ents < 0);
+ debug_dma_map_sg(dev, sg, nents, ents, dir);
+
+ return ents;
+}
+EXPORT_SYMBOL(dma_map_sg_attrs);
+
+void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
+ int nents, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!valid_dma_direction(dir));
+ debug_dma_unmap_sg(dev, sg, nents, dir);
+ if (dma_is_direct(ops))
+ dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
+ else if (ops->unmap_sg)
+ ops->unmap_sg(dev, sg, nents, dir, attrs);
+}
+EXPORT_SYMBOL(dma_unmap_sg_attrs);
+
+dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ dma_addr_t addr = DMA_MAPPING_ERROR;
+
+ BUG_ON(!valid_dma_direction(dir));
+
+ /* Don't allow RAM to be mapped */
+ if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
+ return DMA_MAPPING_ERROR;
+
+ if (dma_is_direct(ops))
+ addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
+ else if (ops->map_resource)
+ addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
+
+ debug_dma_map_resource(dev, phys_addr, size, dir, addr);
+ return addr;
+}
+EXPORT_SYMBOL(dma_map_resource);
+
+void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (!dma_is_direct(ops) && ops->unmap_resource)
+ ops->unmap_resource(dev, addr, size, dir, attrs);
+ debug_dma_unmap_resource(dev, addr, size, dir);
+}
+EXPORT_SYMBOL(dma_unmap_resource);
+
+void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (dma_is_direct(ops))
+ dma_direct_sync_single_for_cpu(dev, addr, size, dir);
+ else if (ops->sync_single_for_cpu)
+ ops->sync_single_for_cpu(dev, addr, size, dir);
+ debug_dma_sync_single_for_cpu(dev, addr, size, dir);
+}
+EXPORT_SYMBOL(dma_sync_single_for_cpu);
+
+void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (dma_is_direct(ops))
+ dma_direct_sync_single_for_device(dev, addr, size, dir);
+ else if (ops->sync_single_for_device)
+ ops->sync_single_for_device(dev, addr, size, dir);
+ debug_dma_sync_single_for_device(dev, addr, size, dir);
+}
+EXPORT_SYMBOL(dma_sync_single_for_device);
+
+void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+ int nelems, enum dma_data_direction dir)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (dma_is_direct(ops))
+ dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
+ else if (ops->sync_sg_for_cpu)
+ ops->sync_sg_for_cpu(dev, sg, nelems, dir);
+ debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
+}
+EXPORT_SYMBOL(dma_sync_sg_for_cpu);
+
+void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+ int nelems, enum dma_data_direction dir)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!valid_dma_direction(dir));
+ if (dma_is_direct(ops))
+ dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
+ else if (ops->sync_sg_for_device)
+ ops->sync_sg_for_device(dev, sg, nelems, dir);
+ debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
+}
+EXPORT_SYMBOL(dma_sync_sg_for_device);
+
/*
* Create scatter-list for the already allocated DMA buffer.
*/
--
2.26.2
^ permalink raw reply related
* generic DMA bypass flag v4
From: Christoph Hellwig @ 2020-07-08 15:24 UTC (permalink / raw)
To: iommu, Alexey Kardashevskiy
Cc: Björn Töpel, Daniel Borkmann, Greg Kroah-Hartman,
Joerg Roedel, Robin Murphy, linux-kernel, Jesper Dangaard Brouer,
linuxppc-dev, Lu Baolu
Hi all,
I've recently beeing chatting with Lu about using dma-iommu and
per-device DMA ops in the intel IOMMU driver, and one missing feature
in dma-iommu is a bypass mode where the direct mapping is used even
when an iommu is attached to improve performance. The powerpc
code already has a similar mode, so I'd like to move it to the core
DMA mapping code. As part of that I noticed that the current
powerpc code has a little bug in that it used the wrong check in the
dma_sync_* routines to see if the direct mapping code is used.
These two patches just add the generic code and move powerpc over,
the intel IOMMU bits will require a separate discussion.
The x86 AMD Gart code also has a bypass mode, but it is a lot
strange, so I'm not going to touch it for now.
Note that as-is this breaks the XSK buffer pool, which unfortunately
poked directly into DMA internals. A fix for that is already queued
up in the netdev tree.
Jesper and XDP gang: this should not regress any performance as
the dma-direct calls are now inlined into the out of line DMA mapping
calls. But if you can verify the performance numbers that would be
greatly appreciated.
A git tree is available here:
git://git.infradead.org/users/hch/misc.git dma-bypass.4
Gitweb:
git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.4
Changes since v3:
- add config options for the dma ops bypass and dma ops themselves
to not increase the size of tinyconfig builds
Changes since v2:
- move the dma mapping helpers out of line
- check for possible direct mappings using the dma mask
Changes since v1:
- rebased to the current dma-mapping-for-next tree
Diffstat:
arch/alpha/Kconfig | 1
arch/arm/Kconfig | 1
arch/ia64/Kconfig | 1
arch/mips/Kconfig | 1
arch/parisc/Kconfig | 1
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/device.h | 5
arch/powerpc/kernel/dma-iommu.c | 90 +------------
arch/s390/Kconfig | 1
arch/sparc/Kconfig | 1
arch/x86/Kconfig | 1
drivers/iommu/Kconfig | 2
drivers/misc/mic/Kconfig | 1
drivers/vdpa/Kconfig | 1
drivers/xen/Kconfig | 1
include/linux/device.h | 11 +
include/linux/dma-direct.h | 104 +++++++++++++++
include/linux/dma-mapping.h | 251 ++++----------------------------------
kernel/dma/Kconfig | 12 +
kernel/dma/Makefile | 3
kernel/dma/direct.c | 74 -----------
kernel/dma/mapping.c | 214 ++++++++++++++++++++++++++++++--
22 files changed, 385 insertions(+), 394 deletions(-)
^ permalink raw reply
* powerpc: Incorrect stw operand modifier in __set_pte_at
From: Mathieu Desnoyers @ 2020-07-08 14:45 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras; +Cc: linuxppc-dev
Hi,
Reviewing use of the patterns "Un%Xn" with lwz and stw instructions
(where n should be the operand number) within the Linux kernel led
me to spot those 2 weird cases:
arch/powerpc/include/asm/nohash/pgtable.h:__set_pte_at()
__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
stw%U0%X0 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");
I would have expected the stw to be:
stw%U1%X1 %L2,%1"
and:
arch/powerpc/include/asm/book3s/32/pgtable.h:__set_pte_at()
__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
stw%U0%X0 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");
where I would have expected:
stw%U1%X1 %L2,%1"
Is it a bug or am I missing something ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: Failure to build librseq on ppc
From: Mathieu Desnoyers @ 2020-07-08 14:32 UTC (permalink / raw)
To: Christophe Leroy; +Cc: Boqun Feng, linuxppc-dev, Michael Jeanson
In-Reply-To: <96994487-ae4a-3bfb-b0f1-34228e51bea2@csgroup.eu>
----- On Jul 8, 2020, at 10:21 AM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
> Le 08/07/2020 à 16:00, Mathieu Desnoyers a écrit :
>> ----- On Jul 8, 2020, at 8:33 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>>
>>> ----- On Jul 7, 2020, at 8:59 PM, Segher Boessenkool segher@kernel.crashing.org
>>> wrote:
>> [...]
>>>>
>>>> So perhaps you have code like
>>>>
>>>> int *p;
>>>> int x;
>>>> ...
>>>> asm ("lwz %0,%1" : "=r"(x) : "m"(*p));
>>>
>>> We indeed have explicit "lwz" and "stw" instructions in there.
>>>
>>>>
>>>> where that last line should actually read
>>>>
>>>> asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
>>>
>>> Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.
>>>
>>> There has been some level of extra CPP macro coating around those instructions
>>> to
>>> support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is not
>>> trivial.
>>> Let me see what can be done here.
>>
>> I did the following changes which appear to generate valid asm.
>> See attached corresponding .S output.
>>
>> I grepped for uses of "m" asm operand in Linux powerpc code and noticed it's
>> pretty much
>> always used with e.g. "lwz%U1%X1". I could find one blog post discussing that %U
>> is about
>> update flag, and nothing about %X. Are those documented ?
>
> As far as I can see, %U is mentioned in
> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the
> powerpc subpart, at the "m" constraint.
Yep, I did notice it, but mistakenly thought it was only needed for "m<>" operand,
not "m".
Thanks,
Mathieu
>
> For the %X I don't know.
>
> Christophe
>
>>
>> Although it appears to generate valid asm, I have the feeling I'm relying on
>> undocumented
>> features here. :-/
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: Failure to build librseq on ppc
From: Christophe Leroy @ 2020-07-08 14:21 UTC (permalink / raw)
To: Mathieu Desnoyers, Segher Boessenkool
Cc: Boqun Feng, linuxppc-dev, Michael Jeanson
In-Reply-To: <1623833219.1877.1594216801865.JavaMail.zimbra@efficios.com>
Le 08/07/2020 à 16:00, Mathieu Desnoyers a écrit :
> ----- On Jul 8, 2020, at 8:33 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
>
>> ----- On Jul 7, 2020, at 8:59 PM, Segher Boessenkool segher@kernel.crashing.org
>> wrote:
> [...]
>>>
>>> So perhaps you have code like
>>>
>>> int *p;
>>> int x;
>>> ...
>>> asm ("lwz %0,%1" : "=r"(x) : "m"(*p));
>>
>> We indeed have explicit "lwz" and "stw" instructions in there.
>>
>>>
>>> where that last line should actually read
>>>
>>> asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
>>
>> Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.
>>
>> There has been some level of extra CPP macro coating around those instructions
>> to
>> support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is not
>> trivial.
>> Let me see what can be done here.
>
> I did the following changes which appear to generate valid asm.
> See attached corresponding .S output.
>
> I grepped for uses of "m" asm operand in Linux powerpc code and noticed it's pretty much
> always used with e.g. "lwz%U1%X1". I could find one blog post discussing that %U is about
> update flag, and nothing about %X. Are those documented ?
As far as I can see, %U is mentioned in
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the
powerpc subpart, at the "m" constraint.
For the %X I don't know.
Christophe
>
> Although it appears to generate valid asm, I have the feeling I'm relying on undocumented
> features here. :-/
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox