* Re: [PATCH RFC PKS/PMEM 48/58] drivers/md: Utilize new kmap_thread()
From: Coly Li @ 2020-10-10 2:20 UTC (permalink / raw)
To: ira.weiny
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, x86, ceph-devel, amd-gfx, io-uring, cluster-devel,
Ingo Molnar, intel-wired-lan, xen-devel, linux-ext4,
Kent Overstreet, Fenghua Yu, linux-afs, linux-um, intel-gfx,
ecryptfs, linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Dan Williams, Andrew Morton,
linux-cachefs, linux-nfs, linux-ntfs-dev, netdev, kexec,
linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev,
linux-btrfs
In-Reply-To: <20201009195033.3208459-49-ira.weiny@intel.com>
On 2020/10/10 03:50, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> These kmap() calls are localized to a single thread. To avoid the over
> head of global PKRS updates use the new kmap_thread() call.
>
Hi Ira,
There were a number of options considered.
1) Attempt to change all the thread local kmap() calls to kmap_atomic()
2) Introduce a flags parameter to kmap() to indicate if the mapping
should be global or not
3) Change ~20-30 call sites to 'kmap_global()' to indicate that they
require a global mapping of the pages
4) Change ~209 call sites to 'kmap_thread()' to indicate that the
mapping is to be used within that thread of execution only
I copied the above information from patch 00/58 to this message. The
idea behind kmap_thread() is fine to me, but as you said the new api is
very easy to be missed in new code (even for me). I would like to be
supportive to option 2) introduce a flag to kmap(), then we won't forget
the new thread-localized kmap method, and people won't ask why a
_thread() function is called but no kthread created.
Thanks.
Coly Li
> Cc: Coly Li <colyli@suse.de> (maintainer:BCACHE (BLOCK LAYER CACHE))
> Cc: Kent Overstreet <kent.overstreet@gmail.com> (maintainer:BCACHE (BLOCK LAYER CACHE))
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
> drivers/md/bcache/request.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index c7cadaafa947..a4571f6d09dd 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -44,10 +44,10 @@ static void bio_csum(struct bio *bio, struct bkey *k)
> uint64_t csum = 0;
>
> bio_for_each_segment(bv, bio, iter) {
> - void *d = kmap(bv.bv_page) + bv.bv_offset;
> + void *d = kmap_thread(bv.bv_page) + bv.bv_offset;
>
> csum = bch_crc64_update(csum, d, bv.bv_len);
> - kunmap(bv.bv_page);
> + kunmap_thread(bv.bv_page);
> }
>
> k->ptr[KEY_PTRS(k)] = csum & (~0ULL >> 1);
>
^ permalink raw reply
* Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops
From: Alexander Viro @ 2020-10-10 1:55 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-arch, Kees Cook, Alexander Viro, the arch/x86 maintainers,
Linux Kernel Mailing List, Alexey Dobriyan, Eric Biggers,
Luis Chamberlain, Al Viro, linux-fsdevel, linuxppc-dev,
Christoph Hellwig
In-Reply-To: <CAHk-=wigvcmp-jcgoNCbx45W7j3=0jA320CfpskwuoEjefM7nQ@mail.gmail.com>
On Fri, Oct 09, 2020 at 06:29:13PM -0700, Linus Torvalds wrote:
> On Fri, Oct 9, 2020 at 6:19 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Okay, that makes more sense. So the patchset from Matthew
> > https://lkml.kernel.org/linux-fsdevel/20201003025534.21045-1-willy@infradead.org/T/#u
> > isn't what you had in mind.
>
> No.
>
> That first patch makes sense - it's just the "ppos can be NULL" patch.
>
> But as mentioned, NULL isn't "shorthand for zero". It's just "pipes
> don't _have_ a pos, trying to pass in some explicit position is
> crazy".
>
> So no, the other patches in that set are a bit odd, I think.
>
> SOME of them look potentially fine - the bpfilter one seems to be
> valid, for example, because it's literally about reading/writing a
> pipe. And maybe the sysctl one is similarly sensible - I didn't check
> the context of that one.
FWIW, I hadn't pushed that branch out (or merged it into #for-next yet);
for one thing, uml part (mconsole) is simply broken, for another...
IMO ##5--8 are asking for kernel_pread() and if you look at binfmt_elf.c,
you'll see elf_read() being pretty much that. acct.c, keys and usermode
parts are asking for kernel_pwrite() as well.
I've got stuck looking through the drivers/target stuff - it would've
been another kernel_pwrite() candidate, but it smells like its use of
filp_open() is really asking for trouble, starting with symlink attacks.
Not sure - I'm not familiar with the area, but...
^ permalink raw reply
* RE: [PATCH v3 1/5] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr1-alt-reg' property
From: Ran Wang @ 2020-10-10 1:57 UTC (permalink / raw)
To: Rob Herring
Cc: devicetree@vger.kernel.org, Biwen Li, Shawn Guo,
linux-kernel@vger.kernel.org, Leo Li, Ran Wang,
linuxppc-dev@lists.ozlabs.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200929082234.36619-1-ran.wang_1@nxp.com>
Hi Rob,
On Tuesday, September 29, 2020 4:23 PM, Ran Wang wrote:
>
> From: Biwen Li <biwen.li@nxp.com>
>
> The 'fsl,ippdexpcr1-alt-reg' property is used to handle an errata A-008646 on
> LS1021A.
>
> Signed-off-by: Biwen Li <biwen.li@nxp.com>
> Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> ---
> Change in v3:
> - Simplize related proterty definition and rename it.
Could you please comment for this version? Thanks in advance.
Regards,
Ran
> Change in v2:
> - None
>
> Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> index 5a33619..62a22fc 100644
> --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> @@ -34,6 +34,9 @@ Chassis Version Example Chips
> Optional properties:
> - little-endian : RCPM register block is Little Endian. Without it RCPM
> will be Big Endian (default case).
> + - fsl,ippdexpcr1-alt-reg : The property is trying to workaround a
> + hardware issue (found on SoC LS1021A only), if pressent, RCPM driver
> + will use SCFG_SPARECR8 as a shadow register for RCPM_IPPDEXPCR1.
>
> Example:
> The RCPM node for T4240:
> @@ -43,6 +46,15 @@ The RCPM node for T4240:
> #fsl,rcpm-wakeup-cells = <2>;
> };
>
> +The RCPM node for LS1021A:
> + rcpm: rcpm@1ee2140 {
> + compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+";
> + reg = <0x0 0x1ee2140 0x0 0x8>;
> + #fsl,rcpm-wakeup-cells = <2>;
> + fsl,ippdexpcr1-alt-reg;
> + };
> +
> +
> * Freescale RCPM Wakeup Source Device Tree Bindings
> -------------------------------------------
> Required fsl,rcpm-wakeup property should be added to a device node if the
> device
> --
> 2.7.4
^ permalink raw reply
* [PATCH RFC PKS/PMEM 53/58] lib: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, Song Liu, linux-efi, kvm, linux-doc, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, samba-technical, Ira Weiny, ceph-devel, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, x86, John Fastabend, amd-gfx, linux-afs,
cluster-devel, linux-cachefs, intel-wired-lan, Yonghong Song,
linux-ext4, Andrii Nakryiko, Fenghua Yu, linux-um, intel-gfx,
ecryptfs, linux-erofs, reiserfs-devel, linux-block,
Jérôme Glisse, Alexander Viro, xen-devel, KP Singh,
Dan Williams, io-uring, linux-bcache, linux-nfs, linux-ntfs-dev,
netdev, kexec, linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf,
linuxppc-dev, Martin KaFai Lau, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
These kmap() calls are localized to a single thread. To avoid the over
head of global PKRS updates use the new kmap_thread() call.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
lib/iov_iter.c | 12 ++++++------
lib/test_bpf.c | 4 ++--
lib/test_hmm.c | 8 ++++----
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f12..1d47f957cf95 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -208,7 +208,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
}
/* Too bad - revert to non-atomic kmap */
- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
from = kaddr + offset;
left = copyout(buf, from, copy);
copy -= left;
@@ -225,7 +225,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
from += copy;
bytes -= copy;
}
- kunmap(page);
+ kunmap_thread(page);
done:
if (skip == iov->iov_len) {
@@ -292,7 +292,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
}
/* Too bad - revert to non-atomic kmap */
- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
to = kaddr + offset;
left = copyin(to, buf, copy);
copy -= left;
@@ -309,7 +309,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
to += copy;
bytes -= copy;
}
- kunmap(page);
+ kunmap_thread(page);
done:
if (skip == iov->iov_len) {
@@ -1742,10 +1742,10 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
return 0;
iterate_all_kinds(i, bytes, v, -EINVAL, ({
- w.iov_base = kmap(v.bv_page) + v.bv_offset;
+ w.iov_base = kmap_thread(v.bv_page) + v.bv_offset;
w.iov_len = v.bv_len;
err = f(&w, context);
- kunmap(v.bv_page);
+ kunmap_thread(v.bv_page);
err;}), ({
w = v;
err = f(&w, context);})
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index ca7d635bccd9..441f822f56ba 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6506,11 +6506,11 @@ static void *generate_test_data(struct bpf_test *test, int sub)
if (!page)
goto err_kfree_skb;
- ptr = kmap(page);
+ ptr = kmap_thread(page);
if (!ptr)
goto err_free_page;
memcpy(ptr, test->frag_data, MAX_DATA);
- kunmap(page);
+ kunmap_thread(page);
skb_add_rx_frag(skb, 0, page, 0, MAX_DATA, MAX_DATA);
}
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index e7dc3de355b7..e40d26f97f45 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -329,9 +329,9 @@ static int dmirror_do_read(struct dmirror *dmirror, unsigned long start,
if (!page)
return -ENOENT;
- tmp = kmap(page);
+ tmp = kmap_thread(page);
memcpy(ptr, tmp, PAGE_SIZE);
- kunmap(page);
+ kunmap_thread(page);
ptr += PAGE_SIZE;
bounce->cpages++;
@@ -398,9 +398,9 @@ static int dmirror_do_write(struct dmirror *dmirror, unsigned long start,
if (!page || xa_pointer_tag(entry) != DPT_XA_TAG_WRITE)
return -ENOENT;
- tmp = kmap(page);
+ tmp = kmap_thread(page);
memcpy(tmp, ptr, PAGE_SIZE);
- kunmap(page);
+ kunmap_thread(page);
ptr += PAGE_SIZE;
bounce->cpages++;
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 40/58] net: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, Aviad Yehezkel, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, linux-efi, linux-mtd, linux-kselftest,
samba-technical, linux-f2fs-devel, Ira Weiny, ceph-devel,
drbd-dev, devel, linux-cifs, linux-nilfs, xen-devel,
Daniel Borkmann, linux-nvdimm, Boris Pismenny, x86,
John Fastabend, amd-gfx, io-uring, cluster-devel, linux-cachefs,
intel-wired-lan, Alexey Kuznetsov, linux-ext4, linux-rdma,
Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs, linux-erofs,
reiserfs-devel, linux-block, linux-bcache, Jakub Kicinski,
Dan Williams, Trond Myklebust, linux-nfs, linux-scsi,
Hideaki YOSHIFUJI, netdev, kexec, linux-kernel, David S. Miller,
linux-ntfs-dev, target-devel, linux-fsdevel, bpf, linuxppc-dev,
Anna Schumaker, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
These kmap() calls in these drivers are localized to a single thread.
To avoid the over head of global PKRS updates use the new kmap_thread()
call.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Boris Pismenny <borisp@nvidia.com>
Cc: Aviad Yehezkel <aviadye@nvidia.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
net/ceph/messenger.c | 4 ++--
net/core/datagram.c | 4 ++--
net/core/sock.c | 8 ++++----
net/ipv4/ip_output.c | 4 ++--
net/sunrpc/cache.c | 4 ++--
net/sunrpc/xdr.c | 8 ++++----
net/tls/tls_device.c | 4 ++--
7 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index d4d7a0e52491..0c49b8e333da 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1535,10 +1535,10 @@ static u32 ceph_crc32c_page(u32 crc, struct page *page,
{
char *kaddr;
- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
BUG_ON(kaddr == NULL);
crc = crc32c(crc, kaddr + page_offset, length);
- kunmap(page);
+ kunmap_thread(page);
return crc;
}
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 639745d4f3b9..cbd0a343074a 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -441,14 +441,14 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset,
end = start + skb_frag_size(frag);
if ((copy = end - offset) > 0) {
struct page *page = skb_frag_page(frag);
- u8 *vaddr = kmap(page);
+ u8 *vaddr = kmap_thread(page);
if (copy > len)
copy = len;
n = INDIRECT_CALL_1(cb, simple_copy_to_iter,
vaddr + skb_frag_off(frag) + offset - start,
copy, data, to);
- kunmap(page);
+ kunmap_thread(page);
offset += n;
if (n != copy)
goto short_copy;
diff --git a/net/core/sock.c b/net/core/sock.c
index 6c5c6b18eff4..9b46a75cd8c1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2846,11 +2846,11 @@ ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset, siz
ssize_t res;
struct msghdr msg = {.msg_flags = flags};
struct kvec iov;
- char *kaddr = kmap(page);
+ char *kaddr = kmap_thread(page);
iov.iov_base = kaddr + offset;
iov.iov_len = size;
res = kernel_sendmsg(sock, &msg, &iov, 1, size);
- kunmap(page);
+ kunmap_thread(page);
return res;
}
EXPORT_SYMBOL(sock_no_sendpage);
@@ -2861,12 +2861,12 @@ ssize_t sock_no_sendpage_locked(struct sock *sk, struct page *page,
ssize_t res;
struct msghdr msg = {.msg_flags = flags};
struct kvec iov;
- char *kaddr = kmap(page);
+ char *kaddr = kmap_thread(page);
iov.iov_base = kaddr + offset;
iov.iov_len = size;
res = kernel_sendmsg_locked(sk, &msg, &iov, 1, size);
- kunmap(page);
+ kunmap_thread(page);
return res;
}
EXPORT_SYMBOL(sock_no_sendpage_locked);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e6f2ada9e7d5..05304fb251a4 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -949,9 +949,9 @@ csum_page(struct page *page, int offset, int copy)
{
char *kaddr;
__wsum csum;
- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
csum = csum_partial(kaddr + offset, copy, 0);
- kunmap(page);
+ kunmap_thread(page);
return csum;
}
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index baef5ee43dbb..88193f2a8e6f 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -935,9 +935,9 @@ static ssize_t cache_downcall(struct address_space *mapping,
if (!page)
goto out_slow;
- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
ret = cache_do_downcall(kaddr, buf, count, cd);
- kunmap(page);
+ kunmap_thread(page);
unlock_page(page);
put_page(page);
return ret;
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index be11d672b5b9..00afbb48fb0a 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1353,7 +1353,7 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base,
base &= ~PAGE_MASK;
avail_page = min_t(unsigned int, PAGE_SIZE - base,
avail_here);
- c = kmap(*ppages) + base;
+ c = kmap_thread(*ppages) + base;
while (avail_here) {
avail_here -= avail_page;
@@ -1429,9 +1429,9 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base,
}
}
if (avail_here) {
- kunmap(*ppages);
+ kunmap_thread(*ppages);
ppages++;
- c = kmap(*ppages);
+ c = kmap_thread(*ppages);
}
avail_page = min(avail_here,
@@ -1471,7 +1471,7 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base,
out:
kfree(elem);
if (ppages)
- kunmap(*ppages);
+ kunmap_thread(*ppages);
return err;
}
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index b74e2741f74f..ead5b1c485f8 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -576,13 +576,13 @@ int tls_device_sendpage(struct sock *sk, struct page *page,
goto out;
}
- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
iov.iov_base = kaddr + offset;
iov.iov_len = size;
iov_iter_kvec(&msg_iter, WRITE, &iov, 1, size);
rc = tls_push_data(sk, &msg_iter, size,
flags, TLS_RECORD_TYPE_DATA);
- kunmap(page);
+ kunmap_thread(page);
out:
release_sock(sk);
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 10/58] drivers/rdma: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:49 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, Jason Gunthorpe,
Doug Ledford, linux-cachefs, intel-wired-lan, Bernard Metzler,
linux-ext4, Fenghua Yu, Faisal Latif, linux-um, intel-gfx,
ecryptfs, linux-erofs, linux-nfs, reiserfs-devel, linux-block,
linux-bcache, xen-devel, Dan Williams, Shiraz Saleem, io-uring,
Mike Marciniszyn, linux-ntfs-dev, netdev, Dennis Dalessandro,
kexec, linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf,
linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
The kmap() calls in these drivers are localized to a single thread. To
avoid the over head of global PKRS updates use the new kmap_thread()
call.
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
drivers/infiniband/hw/hfi1/sdma.c | 4 ++--
drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +++++-----
drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +++++++-------
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c
index 04575c9afd61..09d206e3229a 100644
--- a/drivers/infiniband/hw/hfi1/sdma.c
+++ b/drivers/infiniband/hw/hfi1/sdma.c
@@ -3130,7 +3130,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx,
}
if (type == SDMA_MAP_PAGE) {
- kvaddr = kmap(page);
+ kvaddr = kmap_thread(page);
kvaddr += offset;
} else if (WARN_ON(!kvaddr)) {
__sdma_txclean(dd, tx);
@@ -3140,7 +3140,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx,
memcpy(tx->coalesce_buf + tx->coalesce_idx, kvaddr, len);
tx->coalesce_idx += len;
if (type == SDMA_MAP_PAGE)
- kunmap(page);
+ kunmap_thread(page);
/* If there is more data, return */
if (tx->tlen - tx->coalesce_idx)
diff --git a/drivers/infiniband/hw/i40iw/i40iw_cm.c b/drivers/infiniband/hw/i40iw/i40iw_cm.c
index a3b95805c154..122d7a5642a1 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_cm.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_cm.c
@@ -3721,7 +3721,7 @@ int i40iw_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
ibmr->device = iwpd->ibpd.device;
iwqp->lsmm_mr = ibmr;
if (iwqp->page)
- iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
+ iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp,
iwqp->ietf_mem.va,
(accept.size + conn_param->private_data_len),
@@ -3729,12 +3729,12 @@ int i40iw_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
} else {
if (iwqp->page)
- iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
+ iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp, NULL, 0, 0);
}
if (iwqp->page)
- kunmap(iwqp->page);
+ kunmap_thread(iwqp->page);
iwqp->cm_id = cm_id;
cm_node->cm_id = cm_id;
@@ -4102,10 +4102,10 @@ static void i40iw_cm_event_connected(struct i40iw_cm_event *event)
i40iw_cm_init_tsa_conn(iwqp, cm_node);
read0 = (cm_node->send_rdma0_op == SEND_RDMA_READ_ZERO);
if (iwqp->page)
- iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page);
+ iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page);
dev->iw_priv_qp_ops->qp_send_rtt(&iwqp->sc_qp, read0);
if (iwqp->page)
- kunmap(iwqp->page);
+ kunmap_thread(iwqp->page);
memset(&attr, 0, sizeof(attr));
attr.qp_state = IB_QPS_RTS;
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index d19d8325588b..4ed37c328d02 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -76,7 +76,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void *paddr)
if (unlikely(!p))
return -EFAULT;
- buffer = kmap(p);
+ buffer = kmap_thread(p);
if (likely(PAGE_SIZE - off >= bytes)) {
memcpy(paddr, buffer + off, bytes);
@@ -84,7 +84,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void *paddr)
unsigned long part = bytes - (PAGE_SIZE - off);
memcpy(paddr, buffer + off, part);
- kunmap(p);
+ kunmap_thread(p);
if (!mem->is_pbl)
p = siw_get_upage(mem->umem,
@@ -96,10 +96,10 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void *paddr)
if (unlikely(!p))
return -EFAULT;
- buffer = kmap(p);
+ buffer = kmap_thread(p);
memcpy(paddr + part, buffer, bytes - part);
}
- kunmap(p);
+ kunmap_thread(p);
}
}
return (int)bytes;
@@ -505,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
page_array[seg] = p;
if (!c_tx->use_sendpage) {
- iov[seg].iov_base = kmap(p) + fp_off;
+ iov[seg].iov_base = kmap_thread(p) + fp_off;
iov[seg].iov_len = plen;
/* Remember for later kunmap() */
@@ -518,9 +518,9 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
plen);
} else if (do_crc) {
crypto_shash_update(c_tx->mpa_crc_hd,
- kmap(p) + fp_off,
+ kmap_thread(p) + fp_off,
plen);
- kunmap(p);
+ kunmap_thread(p);
}
} else {
u64 va = sge->laddr + sge_off;
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 06/58] kmap: Introduce k[un]map_thread debugging
From: ira.weiny @ 2020-10-09 19:49 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: Juri Lelli, linux-aio, linux-efi, kvm, linux-doc, linux-mmc,
Dave Hansen, dri-devel, Ben Segall, linux-mm, target-devel,
linux-mtd, linux-kselftest, samba-technical, Ira Weiny,
ceph-devel, drbd-dev, devel, linux-cifs, linux-nilfs,
Vincent Guittot, linux-scsi, linux-nvdimm, linux-rdma, x86,
amd-gfx, io-uring, cluster-devel, linux-cachefs, intel-wired-lan,
Mel Gorman, xen-devel, linux-ext4, Fenghua Yu, linux-afs,
linux-um, intel-gfx, ecryptfs, linux-erofs, reiserfs-devel,
Steven Rostedt, linux-block, linux-bcache, Dan Williams,
Dietmar Eggemann, linux-nfs, linux-ntfs-dev, netdev, kexec,
linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev,
linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
Most kmap() callers use the map within a single thread and have no need
for the protection domain to be enabled globally.
To differentiate these kmap users, new k[un]map_thread() calls were
introduced which are thread local.
To aid in debugging the new use of kmap_thread(), add a reference count,
a check on that count, and tracing to ID where mapping errors occur.
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
include/linux/highmem.h | 5 +++
include/linux/sched.h | 5 +++
include/trace/events/kmap_thread.h | 56 ++++++++++++++++++++++++++++++
init/init_task.c | 3 ++
kernel/fork.c | 15 ++++++++
lib/Kconfig.debug | 8 +++++
mm/debug.c | 23 ++++++++++++
7 files changed, 115 insertions(+)
create mode 100644 include/trace/events/kmap_thread.h
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ef7813544719..22d1c000802e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -247,6 +247,10 @@ static inline void kunmap(struct page *page)
__kunmap(page, true);
}
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+void *kmap_thread(struct page *page);
+void kunmap_thread(struct page *page);
+#else
static inline void *kmap_thread(struct page *page)
{
return __kmap(page, false);
@@ -255,6 +259,7 @@ static inline void kunmap_thread(struct page *page)
{
__kunmap(page, false);
}
+#endif
/*
* Prevent people trying to call kunmap_atomic() as if it were kunmap()
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 25d97ab6c757..4627ea4a49e6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1318,6 +1318,11 @@ struct task_struct {
#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
unsigned int dev_page_access_ref;
#endif
+
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+ unsigned int kmap_thread_cnt;
+#endif
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
diff --git a/include/trace/events/kmap_thread.h b/include/trace/events/kmap_thread.h
new file mode 100644
index 000000000000..e7143cfe0daf
--- /dev/null
+++ b/include/trace/events/kmap_thread.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ *
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kmap_thread
+
+#if !defined(_TRACE_KMAP_THREAD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KMAP_THREAD_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(kmap_thread_template,
+ TP_PROTO(struct task_struct *tsk, struct page *page,
+ void *caller_addr, int cnt),
+ TP_ARGS(tsk, page, caller_addr, cnt),
+
+ TP_STRUCT__entry(
+ __field(int, pid)
+ __field(struct page *, page)
+ __field(void *, caller_addr)
+ __field(int, cnt)
+ ),
+
+ TP_fast_assign(
+ __entry->pid = tsk->pid;
+ __entry->page = page;
+ __entry->caller_addr = caller_addr;
+ __entry->cnt = cnt;
+ ),
+
+ TP_printk("PID %d; (%d) %pS %p",
+ __entry->pid,
+ __entry->cnt,
+ __entry->caller_addr,
+ __entry->page
+ )
+);
+
+DEFINE_EVENT(kmap_thread_template, kmap_thread,
+ TP_PROTO(struct task_struct *tsk, struct page *page,
+ void *caller_addr, int cnt),
+ TP_ARGS(tsk, page, caller_addr, cnt));
+
+DEFINE_EVENT(kmap_thread_template, kunmap_thread,
+ TP_PROTO(struct task_struct *tsk, struct page *page,
+ void *caller_addr, int cnt),
+ TP_ARGS(tsk, page, caller_addr, cnt));
+
+
+#endif /* _TRACE_KMAP_THREAD_H */
+
+#include <trace/define_trace.h>
diff --git a/init/init_task.c b/init/init_task.c
index 9b39f25de59b..19f09965eb34 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -212,6 +212,9 @@ struct task_struct init_task
#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
.dev_page_access_ref = 0,
#endif
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+ .kmap_thread_cnt = 0,
+#endif
};
EXPORT_SYMBOL(init_task);
diff --git a/kernel/fork.c b/kernel/fork.c
index b6a3ee328a89..2c66e49b7614 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -722,6 +722,17 @@ static inline void put_signal_struct(struct signal_struct *sig)
free_signal_struct(sig);
}
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+static void check_outstanding_kmap_thread(struct task_struct *tsk)
+{
+ if (tsk->kmap_thread_cnt)
+ pr_warn(KERN_ERR "WARNING: PID %d; Failed to kunmap_thread() [cnt %d]\n",
+ tsk->pid, tsk->kmap_thread_cnt);
+}
+#else
+static void check_outstanding_kmap_thread(struct task_struct *tsk) { }
+#endif
+
void __put_task_struct(struct task_struct *tsk)
{
WARN_ON(!tsk->exit_state);
@@ -734,6 +745,7 @@ void __put_task_struct(struct task_struct *tsk)
exit_creds(tsk);
delayacct_tsk_free(tsk);
put_signal_struct(tsk->signal);
+ check_outstanding_kmap_thread(tsk);
if (!profile_handoff_task(tsk))
free_task(tsk);
@@ -943,6 +955,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
#endif
#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
tsk->dev_page_access_ref = 0;
+#endif
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+ tsk->kmap_thread_cnt = 0;
#endif
return tsk;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f015c09ba5a1..6507b43d5b0c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -858,6 +858,14 @@ config DEBUG_HIGHMEM
This option enables additional error checking for high memory
systems. Disable for production systems.
+config DEBUG_KMAP_THREAD
+ bool "Kmap debugging"
+ depends on DEBUG_KERNEL
+ help
+ This option enables additional error checking for kernel mapping code
+ specifically the k[un]map_thread() calls. Disable for production
+ systems.
+
config HAVE_DEBUG_STACKOVERFLOW
bool
diff --git a/mm/debug.c b/mm/debug.c
index ca8d1cacdecc..68d186f3570e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -320,3 +320,26 @@ void page_init_poison(struct page *page, size_t size)
}
EXPORT_SYMBOL_GPL(page_init_poison);
#endif /* CONFIG_DEBUG_VM */
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/kmap_thread.h>
+
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+void *kmap_thread(struct page *page)
+{
+ trace_kmap_thread(current, page, __builtin_return_address(0),
+ current->kmap_thread_cnt);
+ current->kmap_thread_cnt++;
+ return __kmap(page, false);
+}
+EXPORT_SYMBOL_GPL(kmap_thread);
+
+void kunmap_thread(struct page *page)
+{
+ __kunmap(page, false);
+ current->kmap_thread_cnt--;
+ trace_kunmap_thread(current, page, __builtin_return_address(0),
+ current->kmap_thread_cnt);
+}
+EXPORT_SYMBOL_GPL(kunmap_thread);
+#endif
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 03/58] memremap: Add zone device access protection
From: ira.weiny @ 2020-10-09 19:49 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: Juri Lelli, linux-aio, linux-efi, kvm, linux-doc, linux-mmc,
Dave Hansen, dri-devel, Ben Segall, linux-mm, target-devel,
linux-mtd, linux-kselftest, samba-technical, Ira Weiny,
ceph-devel, drbd-dev, devel, linux-cifs, linux-nilfs,
Vincent Guittot, linux-scsi, linux-nvdimm, linux-rdma, x86,
amd-gfx, io-uring, cluster-devel, linux-cachefs, intel-wired-lan,
Mel Gorman, xen-devel, linux-ext4, Fenghua Yu, linux-afs,
linux-um, intel-gfx, ecryptfs, linux-erofs, reiserfs-devel,
Steven Rostedt, linux-block, linux-bcache, Dan Williams,
Dietmar Eggemann, linux-nfs, linux-ntfs-dev, netdev, kexec,
linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev,
linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
Device managed memory exposes itself to the kernel direct map which
allows stray pointers to access these device memories.
Stray pointers to normal memory may result in a crash or other
undesirable behavior which, while unfortunate, are usually recoverable
with a reboot. Stray access, specifically stray writes, to areas such
as non-volatile memory are permanent in nature and thus are more likely
to result in permanent user data loss vs stray access to other memory
areas.
Furthermore, we protect against reads which can help with speculative
reads to poison areas as well. But this is a secondary reason.
Set up an infrastructure for extra device access protection. Then
implement the new protection using the new Protection Keys Supervisor
(PKS) on architectures which support it.
To enable this extra protection devices specify a flag in the pgmap to
indicate that these areas wish to use additional protection.
Kernel code which intends to access this memory can do so automatically
through the use of the kmap infrastructure calling into
dev_access_[enable|disable]() described here. The kmap infrastructure
is implemented in a follow on patch.
In addition, users can directly enable/disable the access through
dev_access_[enable|disable]() if they have a priori knowledge of the
type of pages they are accessing.
All calls to enable/disable protection flow through
dev_access_[enable|disable]() and are nestable by the use of a per task
reference count. This reference count does 2 things.
1) Allows a thread to nest calls to disable protection such that the
first call to re-enable protection does not 'break' the last access of
the pmem device memory.
2) Provides faster performance by avoiding lots of MSR writes. For
example, looping over a sequence of pmem pages.
In addition, we must ensure the reference count is preserved through an
exception so we add the count to irqentry_state_t and save/restore the
reference count while giving exceptions their own count should they use
a kmap call.
The following shows how this works through an exception:
...
// ref == 0
dev_access_enable() // ref += 1 ==> disable protection
irq()
// enable protection
// ref = 0
_handler()
dev_access_enable() // ref += 1 ==> disable protection
dev_access_disable() // ref -= 1 ==> enable protection
// WARN_ON(ref != 0)
// disable protection
do_pmem_thing() // all good here
dev_access_disable() // ref -= 1 ==> 0 ==> enable protection
...
Nested exceptions operate the same way with each exception storing the
interrupted exception state all the way down.
The pkey value is never free'ed as this optimizes the implementation to
be either on or off using a static branch conditional in the fast paths.
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
arch/x86/entry/common.c | 21 +++++++++
include/linux/entry-common.h | 3 ++
include/linux/memremap.h | 1 +
include/linux/mm.h | 43 +++++++++++++++++
include/linux/sched.h | 3 ++
init/init_task.c | 3 ++
kernel/fork.c | 3 ++
mm/Kconfig | 13 ++++++
mm/memremap.c | 90 ++++++++++++++++++++++++++++++++++++
9 files changed, 180 insertions(+)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 86ad32e0095e..3680724c1a4d 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -264,12 +264,27 @@ noinstr void idtentry_exit_nmi(struct pt_regs *regs, irqentry_state_t *irq_state
*
* NOTE That the thread saved PKRS must be preserved separately to ensure
* global overrides do not 'stick' on a thread.
+ *
+ * Furthermore, Zone Device Access Protection maintains access in a re-entrant
+ * manner through a reference count which also needs to be maintained should
+ * exception handlers use those interfaces for memory access. Here we start
+ * off the exception handler ref count to 0 and ensure it is 0 when the
+ * exception is done. Then restore it for the interrupted task.
*/
noinstr void irq_save_pkrs(irqentry_state_t *state)
{
if (!cpu_feature_enabled(X86_FEATURE_PKS))
return;
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+ /*
+ * Save the ref count of the current running process and set it to 0
+ * for any irq users to properly track re-entrance
+ */
+ state->pkrs_ref = current->dev_page_access_ref;
+ current->dev_page_access_ref = 0;
+#endif
+
/*
* The thread_pkrs must be maintained separately to prevent global
* overrides from 'sticking' on a thread.
@@ -286,6 +301,12 @@ noinstr void irq_restore_pkrs(irqentry_state_t *state)
write_pkrs(state->pkrs);
current->thread.saved_pkrs = state->thread_pkrs;
+
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+ WARN_ON_ONCE(current->dev_page_access_ref != 0);
+ /* Restore the interrupted process reference */
+ current->dev_page_access_ref = state->pkrs_ref;
+#endif
}
#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index c3b361ffa059..06743cce2dbf 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -343,6 +343,9 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs);
#ifndef irqentry_state
typedef struct irqentry_state {
#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+ unsigned int pkrs_ref;
+#endif
u32 pkrs;
u32 thread_pkrs;
#endif
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index e5862746751b..b6713ee7b218 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -89,6 +89,7 @@ struct dev_pagemap_ops {
};
#define PGMAP_ALTMAP_VALID (1 << 0)
+#define PGMAP_PROT_ENABLED (1 << 1)
/**
* struct dev_pagemap - metadata for ZONE_DEVICE mappings
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..9e845515ff15 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1141,6 +1141,49 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
}
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+DECLARE_STATIC_KEY_FALSE(dev_protection_static_key);
+
+/*
+ * We make page_is_access_protected() as quick as possible.
+ * 1) If no mappings have been enabled with extra protection we skip this
+ * entirely
+ * 2) Skip pages which are not ZONE_DEVICE
+ * 3) Only then check if this particular page was mapped with extra
+ * protections.
+ */
+static inline bool page_is_access_protected(struct page *page)
+{
+ if (!static_branch_unlikely(&dev_protection_static_key))
+ return false;
+ if (!is_zone_device_page(page))
+ return false;
+ if (page->pgmap->flags & PGMAP_PROT_ENABLED)
+ return true;
+ return false;
+}
+
+void __dev_access_enable(bool global);
+void __dev_access_disable(bool global);
+static __always_inline void dev_access_enable(bool global)
+{
+ if (static_branch_unlikely(&dev_protection_static_key))
+ __dev_access_enable(global);
+}
+static __always_inline void dev_access_disable(bool global)
+{
+ if (static_branch_unlikely(&dev_protection_static_key))
+ __dev_access_disable(global);
+}
+#else
+static inline bool page_is_access_protected(struct page *page)
+{
+ return false;
+}
+static inline void dev_access_enable(bool global) { }
+static inline void dev_access_disable(bool global) { }
+#endif /* CONFIG_ZONE_DEVICE_ACCESS_PROTECTION */
+
/* 127: arbitrary random number, small enough to assemble well */
#define page_ref_zero_or_close_to_overflow(page) \
((unsigned int) page_ref_count(page) + 127u <= 127u)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935..25d97ab6c757 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1315,6 +1315,9 @@ struct task_struct {
struct callback_head mce_kill_me;
#endif
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+ unsigned int dev_page_access_ref;
+#endif
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
diff --git a/init/init_task.c b/init/init_task.c
index f6889fce64af..9b39f25de59b 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -209,6 +209,9 @@ struct task_struct init_task
#ifdef CONFIG_SECCOMP
.seccomp = { .filter_count = ATOMIC_INIT(0) },
#endif
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+ .dev_page_access_ref = 0,
+#endif
};
EXPORT_SYMBOL(init_task);
diff --git a/kernel/fork.c b/kernel/fork.c
index da8d360fb032..b6a3ee328a89 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -940,6 +940,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
#ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
+#endif
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+ tsk->dev_page_access_ref = 0;
#endif
return tsk;
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b9bc004d9bc..01dd75720ae6 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -794,6 +794,19 @@ config ZONE_DEVICE
If FS_DAX is enabled, then say Y.
+config ZONE_DEVICE_ACCESS_PROTECTION
+ bool "Device memory access protection"
+ depends on ZONE_DEVICE
+ depends on ARCH_HAS_SUPERVISOR_PKEYS
+
+ help
+ Enable the option of having access protections on device memory
+ areas. This protects against access to device memory which is not
+ intended such as stray writes. This feature is particularly useful
+ to protect against corruption of persistent memory.
+
+ If in doubt, say 'Y'.
+
config DEV_PAGEMAP_OPS
bool
diff --git a/mm/memremap.c b/mm/memremap.c
index fbfc79fd9c24..edad2aa0bd24 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -6,12 +6,16 @@
#include <linux/memory_hotplug.h>
#include <linux/mm.h>
#include <linux/pfn_t.h>
+#include <linux/pkeys.h>
#include <linux/swap.h>
#include <linux/mmzone.h>
#include <linux/swapops.h>
#include <linux/types.h>
#include <linux/wait_bit.h>
#include <linux/xarray.h>
+#include <uapi/asm-generic/mman-common.h>
+
+#define PKEY_INVALID (INT_MIN)
static DEFINE_XARRAY(pgmap_array);
@@ -67,6 +71,89 @@ static void devmap_managed_enable_put(void)
}
#endif /* CONFIG_DEV_PAGEMAP_OPS */
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
+/*
+ * Note; all devices which have asked for protections share the same key. The
+ * key may, or may not, have been provided by the core. If not, protection
+ * will remain disabled. The key acquisition is attempted at init time and
+ * never again. So we don't have to worry about dev_page_pkey changing.
+ */
+static int dev_page_pkey = PKEY_INVALID;
+DEFINE_STATIC_KEY_FALSE(dev_protection_static_key);
+EXPORT_SYMBOL(dev_protection_static_key);
+
+static pgprot_t dev_pgprot_get(struct dev_pagemap *pgmap, pgprot_t prot)
+{
+ if (pgmap->flags & PGMAP_PROT_ENABLED && dev_page_pkey != PKEY_INVALID) {
+ pgprotval_t val = pgprot_val(prot);
+
+ static_branch_inc(&dev_protection_static_key);
+ prot = __pgprot(val | _PAGE_PKEY(dev_page_pkey));
+ }
+ return prot;
+}
+
+static void dev_pgprot_put(struct dev_pagemap *pgmap)
+{
+ if (pgmap->flags & PGMAP_PROT_ENABLED && dev_page_pkey != PKEY_INVALID)
+ static_branch_dec(&dev_protection_static_key);
+}
+
+void __dev_access_disable(bool global)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ if (!--current->dev_page_access_ref)
+ pks_mknoaccess(dev_page_pkey, global);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(__dev_access_disable);
+
+void __dev_access_enable(bool global)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ /* 0 clears the PKEY_DISABLE_ACCESS bit, allowing access */
+ if (!current->dev_page_access_ref++)
+ pks_mkrdwr(dev_page_pkey, global);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(__dev_access_enable);
+
+/**
+ * dev_access_protection_init: Configure a PKS key domain for device pages
+ *
+ * The domain defaults to the protected state. Device page mappings should set
+ * the PGMAP_PROT_ENABLED flag when mapping pages.
+ *
+ * Note the pkey is never free'ed. This is run at init time and we either get
+ * the key or we do not. We need to do this to maintian a constant key (or
+ * not) as device memory is added or removed.
+ */
+static int __init __dev_access_protection_init(void)
+{
+ int pkey = pks_key_alloc("Device Memory");
+
+ if (pkey < 0)
+ return 0;
+
+ dev_page_pkey = pkey;
+
+ return 0;
+}
+subsys_initcall(__dev_access_protection_init);
+#else
+static pgprot_t dev_pgprot_get(struct dev_pagemap *pgmap, pgprot_t prot)
+{
+ return prot;
+}
+static void dev_pgprot_put(struct dev_pagemap *pgmap)
+{
+}
+#endif /* CONFIG_ZONE_DEVICE_ACCESS_PROTECTION */
+
static void pgmap_array_delete(struct resource *res)
{
xa_store_range(&pgmap_array, PHYS_PFN(res->start), PHYS_PFN(res->end),
@@ -156,6 +243,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
pgmap_array_delete(res);
WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
devmap_managed_enable_put();
+ dev_pgprot_put(pgmap);
}
EXPORT_SYMBOL_GPL(memunmap_pages);
@@ -191,6 +279,8 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
int error, is_ram;
bool need_devmap_managed = true;
+ params.pgprot = dev_pgprot_get(pgmap, params.pgprot);
+
switch (pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()
From: Eric Biggers @ 2020-10-10 1:30 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, ira.weiny, Thomas Gleixner, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, Ingo Molnar,
intel-wired-lan, kexec, xen-devel, linux-ext4, bpf, Dan Williams,
Fenghua Yu, intel-gfx, ecryptfs, linux-um, reiserfs-devel,
linux-block, linux-bcache, Borislav Petkov, Andy Lutomirski,
Jaegeuk Kim, ceph-devel, io-uring, linux-cachefs, linux-nfs,
linux-ntfs-dev, netdev, linuxppc-dev, samba-technical,
linux-kernel, linux-f2fs-devel, linux-fsdevel, Andrew Morton,
linux-erofs, linux-btrfs
In-Reply-To: <20201010003954.GW20115@casper.infradead.org>
On Sat, Oct 10, 2020 at 01:39:54AM +0100, Matthew Wilcox wrote:
> On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> > On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
> > > The kmap() calls in this FS are localized to a single thread. To avoid
> > > the over head of global PKRS updates use the new kmap_thread() call.
> > >
> > > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> > >
> > > static inline void f2fs_copy_page(struct page *src, struct page *dst)
> > > {
> > > - char *src_kaddr = kmap(src);
> > > - char *dst_kaddr = kmap(dst);
> > > + char *src_kaddr = kmap_thread(src);
> > > + char *dst_kaddr = kmap_thread(dst);
> > >
> > > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > > - kunmap(dst);
> > > - kunmap(src);
> > > + kunmap_thread(dst);
> > > + kunmap_thread(src);
> > > }
> >
> > Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> > The pages are only mapped to do a memcpy(), then they're immediately unmapped.
>
> Maybe you missed the earlier thread from Thomas trying to do something
> similar for rather different reasons ...
>
> https://lore.kernel.org/lkml/20200919091751.011116649@linutronix.de/
I did miss it. I'm not subscribed to any of the mailing lists it was sent to.
Anyway, it shouldn't matter. Patchsets should be standalone, and not require
reading random prior threads on linux-kernel to understand.
And I still don't really understand. After this patchset, there is still code
nearly identical to the above (doing a temporary mapping just for a memcpy) that
would still be using kmap_atomic(). Is the idea that later, such code will be
converted to use kmap_thread() instead? If not, why use one over the other?
- Eric
^ permalink raw reply
* Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops
From: Linus Torvalds @ 2020-10-10 1:29 UTC (permalink / raw)
To: Eric Biggers, Alexander Viro
Cc: linux-arch, Kees Cook, the arch/x86 maintainers,
Linux Kernel Mailing List, Christoph Hellwig, Luis Chamberlain,
Al Viro, linux-fsdevel, linuxppc-dev, Alexey Dobriyan
In-Reply-To: <20201010011919.GC1122@sol.localdomain>
On Fri, Oct 9, 2020 at 6:19 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Okay, that makes more sense. So the patchset from Matthew
> https://lkml.kernel.org/linux-fsdevel/20201003025534.21045-1-willy@infradead.org/T/#u
> isn't what you had in mind.
No.
That first patch makes sense - it's just the "ppos can be NULL" patch.
But as mentioned, NULL isn't "shorthand for zero". It's just "pipes
don't _have_ a pos, trying to pass in some explicit position is
crazy".
So no, the other patches in that set are a bit odd, I think.
SOME of them look potentially fine - the bpfilter one seems to be
valid, for example, because it's literally about reading/writing a
pipe. And maybe the sysctl one is similarly sensible - I didn't check
the context of that one.
But no, NULL shouldn't mean "start at position zero, and we don't care
about the result".
Linus
^ permalink raw reply
* Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops
From: Eric Biggers @ 2020-10-10 1:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-arch, Kees Cook, the arch/x86 maintainers,
Linux Kernel Mailing List, Christoph Hellwig, Luis Chamberlain,
Al Viro, linux-fsdevel, linuxppc-dev, Alexey Dobriyan
In-Reply-To: <CAHk-=whcEzYjkqdpZciHh+iAdUttvfWZYoiHiF67XuTXB1YJLw@mail.gmail.com>
On Fri, Oct 09, 2020 at 06:03:31PM -0700, Linus Torvalds wrote:
> On Fri, Oct 9, 2020 at 3:06 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > It's a bit unintuitive that ppos=NULL means "use pos 0", not "use file->f_pos".
>
> That's not at all what it means.
>
> A NULL ppos means "this has no position at all", and is what we use
> for FMODE_STREAM file descriptors (ie sockets, pipes, etc).
>
> It also means that we don't do the locking for position updates.
>
> The fact that "ki_pos" gets set to zero is just because it needs to be
> _something_. It shouldn't actually ever be used for stream devices.
>
Okay, that makes more sense. So the patchset from Matthew
https://lkml.kernel.org/linux-fsdevel/20201003025534.21045-1-willy@infradead.org/T/#u
isn't what you had in mind.
- Eric
^ permalink raw reply
* Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops
From: Linus Torvalds @ 2020-10-10 1:03 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-arch, Kees Cook, the arch/x86 maintainers,
Linux Kernel Mailing List, Christoph Hellwig, Luis Chamberlain,
Al Viro, linux-fsdevel, linuxppc-dev, Alexey Dobriyan
In-Reply-To: <20201009220633.GA1122@sol.localdomain>
On Fri, Oct 9, 2020 at 3:06 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> It's a bit unintuitive that ppos=NULL means "use pos 0", not "use file->f_pos".
That's not at all what it means.
A NULL ppos means "this has no position at all", and is what we use
for FMODE_STREAM file descriptors (ie sockets, pipes, etc).
It also means that we don't do the locking for position updates.
The fact that "ki_pos" gets set to zero is just because it needs to be
_something_. It shouldn't actually ever be used for stream devices.
Linus
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()
From: Matthew Wilcox @ 2020-10-10 0:39 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, ira.weiny, Thomas Gleixner, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, Ingo Molnar,
intel-wired-lan, kexec, xen-devel, linux-ext4, bpf, Dan Williams,
Fenghua Yu, intel-gfx, ecryptfs, linux-um, reiserfs-devel,
linux-block, linux-bcache, Borislav Petkov, Andy Lutomirski,
Jaegeuk Kim, ceph-devel, io-uring, linux-cachefs, linux-nfs,
linux-ntfs-dev, netdev, linuxppc-dev, samba-technical,
linux-kernel, linux-f2fs-devel, linux-fsdevel, Andrew Morton,
linux-erofs, linux-btrfs
In-Reply-To: <20201009213434.GA839@sol.localdomain>
On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
> > The kmap() calls in this FS are localized to a single thread. To avoid
> > the over head of global PKRS updates use the new kmap_thread() call.
> >
> > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> >
> > static inline void f2fs_copy_page(struct page *src, struct page *dst)
> > {
> > - char *src_kaddr = kmap(src);
> > - char *dst_kaddr = kmap(dst);
> > + char *src_kaddr = kmap_thread(src);
> > + char *dst_kaddr = kmap_thread(dst);
> >
> > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > - kunmap(dst);
> > - kunmap(src);
> > + kunmap_thread(dst);
> > + kunmap_thread(src);
> > }
>
> Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately unmapped.
Maybe you missed the earlier thread from Thomas trying to do something
similar for rather different reasons ...
https://lore.kernel.org/lkml/20200919091751.011116649@linutronix.de/
^ permalink raw reply
* Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops
From: Eric Biggers @ 2020-10-09 22:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-arch, Kees Cook, the arch/x86 maintainers,
Linux Kernel Mailing List, Christoph Hellwig, Luis Chamberlain,
Al Viro, linux-fsdevel, linuxppc-dev, Alexey Dobriyan
In-Reply-To: <CAHk-=wgj=mKeN-EfV5tKwJNeHPLG0dybq+R5ZyGuc4WeUnqcmA@mail.gmail.com>
On Fri, Oct 02, 2020 at 09:27:09AM -0700, Linus Torvalds wrote:
> On Thu, Oct 1, 2020 at 3:41 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Better
> > loff_t dummy = 0;
> > ...
> > wr = __kernel_write(file, data, bytes, &dummy);
>
> No, just fix __kernel_write() to work correctly.
>
> The fact is, NULL _is_ the right pointer for ppos these days.
>
> That commit by Christoph is buggy: it replaces new_sync_write() with a
> buggy open-coded version.
>
> Notice how new_sync_write does
>
> kiocb.ki_pos = (ppos ? *ppos : 0);
> ,,,
> if (ret > 0 && ppos)
> *ppos = kiocb.ki_pos;
>
> but the open-coded version doesn't.
>
> So just fix that in linux-next. The *last* thing we want is to have
> different semantics for the "same" kernel functions.
It's a bit unintuitive that ppos=NULL means "use pos 0", not "use file->f_pos".
Anyway, it works. The important thing is, this is still broken in linux-next...
- Eric
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 09/58] drivers/gpu: Utilize new kmap_thread()
From: Daniel Vetter @ 2020-10-09 22:03 UTC (permalink / raw)
To: ira.weiny
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
amd-gfx, linux-kselftest, samba-technical, Thomas Gleixner,
drbd-dev, devel, linux-cifs, linux-nilfs, linux-scsi,
linux-nvdimm, linux-rdma, x86, ceph-devel, Patrik Jakobsson,
io-uring, cluster-devel, David Airlie, Ingo Molnar,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-afs,
linux-um, intel-gfx, ecryptfs, linux-erofs, reiserfs-devel,
linux-block, linux-bcache, Borislav Petkov, Andy Lutomirski,
Dan Williams, bpf, linux-cachefs, linux-nfs, linux-ntfs-dev,
netdev, kexec, linux-kernel, linux-f2fs-devel, Daniel Vetter,
linux-fsdevel, Andrew Morton, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-10-ira.weiny@intel.com>
On Fri, Oct 09, 2020 at 12:49:44PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> These kmap() calls in the gpu stack are localized to a single thread.
> To avoid the over head of global PKRS updates use the new kmap_thread()
> call.
>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
I'm guessing the entire pile goes in through some other tree. If so:
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If you want this to land through maintainer trees, then we need a
per-driver split (since aside from amdgpu and radeon they're all different
subtrees).
btw the two kmap calls in drm you highlight in the cover letter should
also be convertible to kmap_thread. We only hold vmalloc mappings for a
longer time (or it'd be quite a driver bug). So if you want maybe throw
those two as two additional patches on top, and we can do some careful
review & testing for them.
-Daniel
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++++++------
> drivers/gpu/drm/gma500/gma_display.c | 4 ++--
> drivers/gpu/drm/gma500/mmu.c | 10 +++++-----
> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 ++--
> .../gpu/drm/i915/gem/selftests/i915_gem_context.c | 4 ++--
> drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 ++++----
> drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++--
> drivers/gpu/drm/i915/gt/intel_gtt.c | 4 ++--
> drivers/gpu/drm/i915/gt/shmem_utils.c | 4 ++--
> drivers/gpu/drm/i915/i915_gem.c | 8 ++++----
> drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++--
> drivers/gpu/drm/i915/selftests/i915_perf.c | 4 ++--
> drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++--
> 13 files changed, 37 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 978bae731398..bd564bccb7a3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -2437,11 +2437,11 @@ static ssize_t amdgpu_ttm_gtt_read(struct file *f, char __user *buf,
>
> page = adev->gart.pages[p];
> if (page) {
> - ptr = kmap(page);
> + ptr = kmap_thread(page);
> ptr += off;
>
> r = copy_to_user(buf, ptr, cur_size);
> - kunmap(adev->gart.pages[p]);
> + kunmap_thread(adev->gart.pages[p]);
> } else
> r = clear_user(buf, cur_size);
>
> @@ -2507,9 +2507,9 @@ static ssize_t amdgpu_iomem_read(struct file *f, char __user *buf,
> if (p->mapping != adev->mman.bdev.dev_mapping)
> return -EPERM;
>
> - ptr = kmap(p);
> + ptr = kmap_thread(p);
> r = copy_to_user(buf, ptr + off, bytes);
> - kunmap(p);
> + kunmap_thread(p);
> if (r)
> return -EFAULT;
>
> @@ -2558,9 +2558,9 @@ static ssize_t amdgpu_iomem_write(struct file *f, const char __user *buf,
> if (p->mapping != adev->mman.bdev.dev_mapping)
> return -EPERM;
>
> - ptr = kmap(p);
> + ptr = kmap_thread(p);
> r = copy_from_user(ptr + off, buf, bytes);
> - kunmap(p);
> + kunmap_thread(p);
> if (r)
> return -EFAULT;
>
> diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c
> index 3df6d6e850f5..35f4e55c941f 100644
> --- a/drivers/gpu/drm/gma500/gma_display.c
> +++ b/drivers/gpu/drm/gma500/gma_display.c
> @@ -400,9 +400,9 @@ int gma_crtc_cursor_set(struct drm_crtc *crtc,
> /* Copy the cursor to cursor mem */
> tmp_dst = dev_priv->vram_addr + cursor_gt->offset;
> for (i = 0; i < cursor_pages; i++) {
> - tmp_src = kmap(gt->pages[i]);
> + tmp_src = kmap_thread(gt->pages[i]);
> memcpy(tmp_dst, tmp_src, PAGE_SIZE);
> - kunmap(gt->pages[i]);
> + kunmap_thread(gt->pages[i]);
> tmp_dst += PAGE_SIZE;
> }
>
> diff --git a/drivers/gpu/drm/gma500/mmu.c b/drivers/gpu/drm/gma500/mmu.c
> index 505044c9a673..fba7a3a461fd 100644
> --- a/drivers/gpu/drm/gma500/mmu.c
> +++ b/drivers/gpu/drm/gma500/mmu.c
> @@ -192,20 +192,20 @@ struct psb_mmu_pd *psb_mmu_alloc_pd(struct psb_mmu_driver *driver,
> pd->invalid_pte = 0;
> }
>
> - v = kmap(pd->dummy_pt);
> + v = kmap_thread(pd->dummy_pt);
> for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i)
> v[i] = pd->invalid_pte;
>
> - kunmap(pd->dummy_pt);
> + kunmap_thread(pd->dummy_pt);
>
> - v = kmap(pd->p);
> + v = kmap_thread(pd->p);
> for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i)
> v[i] = pd->invalid_pde;
>
> - kunmap(pd->p);
> + kunmap_thread(pd->p);
>
> clear_page(kmap(pd->dummy_page));
> - kunmap(pd->dummy_page);
> + kunmap_thread(pd->dummy_page);
>
> pd->tables = vmalloc_user(sizeof(struct psb_mmu_pt *) * 1024);
> if (!pd->tables)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 38113d3c0138..274424795fb7 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -566,9 +566,9 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
> if (err < 0)
> goto fail;
>
> - vaddr = kmap(page);
> + vaddr = kmap_thread(page);
> memcpy(vaddr, data, len);
> - kunmap(page);
> + kunmap_thread(page);
>
> err = pagecache_write_end(file, file->f_mapping,
> offset, len, len,
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index 7ffc3c751432..b466c677d007 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -1754,7 +1754,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
> return -EINVAL;
> }
>
> - vaddr = kmap(page);
> + vaddr = kmap_thread(page);
> if (!vaddr) {
> pr_err("No (mappable) scratch page!\n");
> return -EINVAL;
> @@ -1765,7 +1765,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
> pr_err("Inconsistent initial state of scratch page!\n");
> err = -EINVAL;
> }
> - kunmap(page);
> + kunmap_thread(page);
>
> return err;
> }
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index 9c7402ce5bf9..447df22e2e06 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -143,7 +143,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
> intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
>
> p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
> - cpu = kmap(p) + offset_in_page(offset);
> + cpu = kmap_thread(p) + offset_in_page(offset);
> drm_clflush_virt_range(cpu, sizeof(*cpu));
> if (*cpu != (u32)page) {
> pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n",
> @@ -161,7 +161,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
> }
> *cpu = 0;
> drm_clflush_virt_range(cpu, sizeof(*cpu));
> - kunmap(p);
> + kunmap_thread(p);
>
> out:
> __i915_vma_put(vma);
> @@ -236,7 +236,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
> intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
>
> p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
> - cpu = kmap(p) + offset_in_page(offset);
> + cpu = kmap_thread(p) + offset_in_page(offset);
> drm_clflush_virt_range(cpu, sizeof(*cpu));
> if (*cpu != (u32)page) {
> pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n",
> @@ -254,7 +254,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
> }
> *cpu = 0;
> drm_clflush_virt_range(cpu, sizeof(*cpu));
> - kunmap(p);
> + kunmap_thread(p);
> if (err)
> return err;
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> index 7fb36b12fe7a..38da348282f1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
> @@ -731,7 +731,7 @@ static void swizzle_page(struct page *page)
> char *vaddr;
> int i;
>
> - vaddr = kmap(page);
> + vaddr = kmap_thread(page);
>
> for (i = 0; i < PAGE_SIZE; i += 128) {
> memcpy(temp, &vaddr[i], 64);
> @@ -739,7 +739,7 @@ static void swizzle_page(struct page *page)
> memcpy(&vaddr[i + 64], temp, 64);
> }
>
> - kunmap(page);
> + kunmap_thread(page);
> }
>
> /**
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 2a72cce63fd9..4cfb24e9ed62 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -312,9 +312,9 @@ static void poison_scratch_page(struct page *page, unsigned long size)
> do {
> void *vaddr;
>
> - vaddr = kmap(page);
> + vaddr = kmap_thread(page);
> memset(vaddr, POISON_FREE, PAGE_SIZE);
> - kunmap(page);
> + kunmap_thread(page);
>
> page = pfn_to_page(page_to_pfn(page) + 1);
> size -= PAGE_SIZE;
> diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
> index 43c7acbdc79d..a40d3130cebf 100644
> --- a/drivers/gpu/drm/i915/gt/shmem_utils.c
> +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
> @@ -142,12 +142,12 @@ static int __shmem_rw(struct file *file, loff_t off,
> if (IS_ERR(page))
> return PTR_ERR(page);
>
> - vaddr = kmap(page);
> + vaddr = kmap_thread(page);
> if (write)
> memcpy(vaddr + offset_in_page(off), ptr, this);
> else
> memcpy(ptr, vaddr + offset_in_page(off), this);
> - kunmap(page);
> + kunmap_thread(page);
> put_page(page);
>
> len -= this;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9aa3066cb75d..cae8300fd224 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -312,14 +312,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data,
> char *vaddr;
> int ret;
>
> - vaddr = kmap(page);
> + vaddr = kmap_thread(page);
>
> if (needs_clflush)
> drm_clflush_virt_range(vaddr + offset, len);
>
> ret = __copy_to_user(user_data, vaddr + offset, len);
>
> - kunmap(page);
> + kunmap_thread(page);
>
> return ret ? -EFAULT : 0;
> }
> @@ -708,7 +708,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
> char *vaddr;
> int ret;
>
> - vaddr = kmap(page);
> + vaddr = kmap_thread(page);
>
> if (needs_clflush_before)
> drm_clflush_virt_range(vaddr + offset, len);
> @@ -717,7 +717,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
> if (!ret && needs_clflush_after)
> drm_clflush_virt_range(vaddr + offset, len);
>
> - kunmap(page);
> + kunmap_thread(page);
>
> return ret ? -EFAULT : 0;
> }
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 3e6cbb0d1150..aecd469b6b6e 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1058,9 +1058,9 @@ i915_vma_coredump_create(const struct intel_gt *gt,
>
> drm_clflush_pages(&page, 1);
>
> - s = kmap(page);
> + s = kmap_thread(page);
> ret = compress_page(compress, s, dst, false);
> - kunmap(page);
> + kunmap_thread(page);
>
> drm_clflush_pages(&page, 1);
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
> index c2d001d9c0ec..7f7ef2d056f4 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_perf.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
> @@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg)
> }
>
> /* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */
> - scratch = kmap(ce->vm->scratch[0].base.page);
> + scratch = kmap_thread(ce->vm->scratch[0].base.page);
> memset(scratch, POISON_FREE, PAGE_SIZE);
>
> rq = intel_context_create_request(ce);
> @@ -405,7 +405,7 @@ static int live_noa_gpr(void *arg)
> out_rq:
> i915_request_put(rq);
> out_ce:
> - kunmap(ce->vm->scratch[0].base.page);
> + kunmap_thread(ce->vm->scratch[0].base.page);
> intel_context_put(ce);
> out:
> stream_destroy(stream);
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 004344dce140..0aba0cac51e1 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -1013,11 +1013,11 @@ static ssize_t radeon_ttm_gtt_read(struct file *f, char __user *buf,
>
> page = rdev->gart.pages[p];
> if (page) {
> - ptr = kmap(page);
> + ptr = kmap_thread(page);
> ptr += off;
>
> r = copy_to_user(buf, ptr, cur_size);
> - kunmap(rdev->gart.pages[p]);
> + kunmap_thread(rdev->gart.pages[p]);
> } else
> r = clear_user(buf, cur_size);
>
> --
> 2.28.0.rc0.12.gb6a658bd00c9
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()
From: Eric Biggers @ 2020-10-09 21:34 UTC (permalink / raw)
To: ira.weiny
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, Thomas Gleixner, drbd-dev, devel, linux-cifs,
linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma, x86, amd-gfx,
linux-afs, cluster-devel, Ingo Molnar, intel-wired-lan, kexec,
xen-devel, linux-ext4, bpf, Dan Williams, Fenghua Yu, intel-gfx,
ecryptfs, linux-um, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Jaegeuk Kim, ceph-devel,
io-uring, linux-cachefs, linux-nfs, linux-ntfs-dev, netdev,
linuxppc-dev, samba-technical, linux-kernel, linux-f2fs-devel,
linux-fsdevel, Andrew Morton, linux-erofs, linux-btrfs
In-Reply-To: <20201009195033.3208459-23-ira.weiny@intel.com>
On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> The kmap() calls in this FS are localized to a single thread. To avoid
> the over head of global PKRS updates use the new kmap_thread() call.
>
> Cc: Jaegeuk Kim <jaegeuk@kernel.org>
> Cc: Chao Yu <chao@kernel.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
> fs/f2fs/f2fs.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index d9e52a7f3702..ff72a45a577e 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
>
> static inline void f2fs_copy_page(struct page *src, struct page *dst)
> {
> - char *src_kaddr = kmap(src);
> - char *dst_kaddr = kmap(dst);
> + char *src_kaddr = kmap_thread(src);
> + char *dst_kaddr = kmap_thread(dst);
>
> memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> - kunmap(dst);
> - kunmap(src);
> + kunmap_thread(dst);
> + kunmap_thread(src);
> }
Wouldn't it make more sense to switch cases like this to kmap_atomic()?
The pages are only mapped to do a memcpy(), then they're immediately unmapped.
- Eric
^ permalink raw reply
* [PATCH RFC PKS/PMEM 58/58] [dax|pmem]: Enable stray access protection
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
netdev, kexec, linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf,
linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
Protecting against stray writes is particularly important for PMEM
because, unlike writes to anonymous memory, writes to PMEM persists
across a reboot. Thus data corruption could result in permanent loss of
data.
While stray writes are more serious than reads, protection is also
enabled for reads. This helps to detect bugs in code which would
incorrectly access device memory and prevents a more serious machine
checks should those bug reads from a poison page.
Enable stray access protection by setting the flag in pgmap which
requests it. There is no option presented to the user. If Zone Device
Access Protection not be supported this flag will have no affect.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
drivers/dax/device.c | 2 ++
drivers/nvdimm/pmem.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 1e89513f3c59..e6fb35b4f0fb 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -430,6 +430,8 @@ int dev_dax_probe(struct device *dev)
}
dev_dax->pgmap.type = MEMORY_DEVICE_GENERIC;
+ dev_dax->pgmap.flags |= PGMAP_PROT_ENABLED;
+
addr = devm_memremap_pages(dev, &dev_dax->pgmap);
if (IS_ERR(addr))
return PTR_ERR(addr);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index e4dc1ae990fc..9fcd8338e23f 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -426,6 +426,8 @@ static int pmem_attach_disk(struct device *dev,
return -EBUSY;
}
+ pmem->pgmap.flags |= PGMAP_PROT_ENABLED;
+
q = blk_alloc_queue(dev_to_node(dev));
if (!q)
return -ENOMEM;
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 57/58] nvdimm/pmem: Stray access protection for pmem->virt_addr
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
netdev, kexec, linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf,
linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
The pmem driver uses a cached virtual address to access its memory
directly. Because the nvdimm driver is well aware of the special
protections it has mapped memory with, we call dev_access_[en|dis]able()
around the direct pmem->virt_addr (pmem_addr) usage instead of the
unnecessary overhead of trying to get a page to kmap.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
drivers/nvdimm/pmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fab29b514372..e4dc1ae990fc 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
return BLK_STS_IOERR;
+ dev_access_enable(false);
rc = read_pmem(page, page_off, pmem_addr, len);
+ dev_access_disable(false);
flush_dcache_page(page);
return rc;
}
@@ -180,11 +182,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
* after clear poison.
*/
flush_dcache_page(page);
+ dev_access_enable(false);
write_pmem(pmem_addr, page, page_off, len);
if (unlikely(bad_pmem)) {
rc = pmem_clear_poison(pmem, pmem_off, len);
write_pmem(pmem_addr, page, page_off, len);
}
+ dev_access_disable(false);
return rc;
}
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 56/58] dax: Stray access protection for dax_direct_access()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
netdev, kexec, linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf,
linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
dax_direct_access() is a special case of accessing pmem via a page
offset and without a struct page.
Because the dax driver is well aware of the special protections it has
mapped memory with, call dev_access_[en|dis]able() directly instead of
the unnecessary overhead of trying to get a page to kmap.
Similar to kmap, we leverage existing functions, dax_read_[un]lock(),
because they are already required to surround the use of the memory
returned from dax_direct_access().
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
drivers/dax/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e84070b55463..0ddb3ee73e36 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -30,6 +30,7 @@ static DEFINE_SPINLOCK(dax_host_lock);
int dax_read_lock(void)
{
+ dev_access_enable(false);
return srcu_read_lock(&dax_srcu);
}
EXPORT_SYMBOL_GPL(dax_read_lock);
@@ -37,6 +38,7 @@ EXPORT_SYMBOL_GPL(dax_read_lock);
void dax_read_unlock(int id)
{
srcu_read_unlock(&dax_srcu, id);
+ dev_access_disable(false);
}
EXPORT_SYMBOL_GPL(dax_read_unlock);
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 55/58] samples: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, kexec, Dave Hansen,
dri-devel, linux-mm, Kirti Wankhede, target-devel, linux-mtd,
linux-kselftest, samba-technical, Ira Weiny, ceph-devel, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
netdev, linux-mmc, linux-kernel, linux-f2fs-devel, linux-fsdevel,
bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
These kmap() calls are localized to a single thread. To avoid the over
head of global PKRS updates use the new kmap_thread() call.
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
samples/vfio-mdev/mbochs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
index 3cc5e5921682..6d95422c0b46 100644
--- a/samples/vfio-mdev/mbochs.c
+++ b/samples/vfio-mdev/mbochs.c
@@ -479,12 +479,12 @@ static ssize_t mdev_access(struct mdev_device *mdev, char *buf, size_t count,
pos -= MBOCHS_MMIO_BAR_OFFSET;
poff = pos & ~PAGE_MASK;
pg = __mbochs_get_page(mdev_state, pos >> PAGE_SHIFT);
- map = kmap(pg);
+ map = kmap_thread(pg);
if (is_write)
memcpy(map + poff, buf, count);
else
memcpy(buf, map + poff, count);
- kunmap(pg);
+ kunmap_thread(pg);
put_page(pg);
} else {
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 54/58] powerpc: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, Dave Hansen, dri-devel,
linux-mm, target-devel, Paul Mackerras, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, linux-mmc, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, xen-devel, Dan Williams, io-uring, linux-nfs,
linux-ntfs-dev, netdev, kexec, linux-kernel, linux-f2fs-devel,
linux-fsdevel, linux-mtd, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
These kmap() calls are localized to a single thread. To avoid the over
head of global PKRS updates use the new kmap_thread() call.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
arch/powerpc/mm/mem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 42e25874f5a8..6ef557b8dda6 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -573,9 +573,9 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
{
unsigned long maddr;
- maddr = (unsigned long) kmap(page) + (addr & ~PAGE_MASK);
+ maddr = (unsigned long) kmap_thread(page) + (addr & ~PAGE_MASK);
flush_icache_range(maddr, maddr + len);
- kunmap(page);
+ kunmap_thread(page);
}
/*
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 52/58] mm: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
netdev, kexec, linux-kernel, linux-f2fs-devel, linux-fsdevel, bpf,
linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
These kmap() calls are localized to a single thread. To avoid the over
head of global PKRS updates use the new kmap_thread() call.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
mm/memory.c | 8 ++++----
mm/swapfile.c | 4 ++--
mm/userfaultfd.c | 4 ++--
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..75a054882d7a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4945,7 +4945,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
if (bytes > PAGE_SIZE-offset)
bytes = PAGE_SIZE-offset;
- maddr = kmap(page);
+ maddr = kmap_thread(page);
if (write) {
copy_to_user_page(vma, page, addr,
maddr + offset, buf, bytes);
@@ -4954,7 +4954,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
copy_from_user_page(vma, page, addr,
buf, maddr + offset, bytes);
}
- kunmap(page);
+ kunmap_thread(page);
put_page(page);
}
len -= bytes;
@@ -5216,14 +5216,14 @@ long copy_huge_page_from_user(struct page *dst_page,
for (i = 0; i < pages_per_huge_page; i++) {
if (allow_pagefault)
- page_kaddr = kmap(dst_page + i);
+ page_kaddr = kmap_thread(dst_page + i);
else
page_kaddr = kmap_atomic(dst_page + i);
rc = copy_from_user(page_kaddr,
(const void __user *)(src + i * PAGE_SIZE),
PAGE_SIZE);
if (allow_pagefault)
- kunmap(dst_page + i);
+ kunmap_thread(dst_page + i);
else
kunmap_atomic(page_kaddr);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index debc94155f74..e3296ff95648 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3219,7 +3219,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
error = PTR_ERR(page);
goto bad_swap_unlock_inode;
}
- swap_header = kmap(page);
+ swap_header = kmap_thread(page);
maxpages = read_swap_header(p, swap_header, inode);
if (unlikely(!maxpages)) {
@@ -3395,7 +3395,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
filp_close(swap_file, NULL);
out:
if (page && !IS_ERR(page)) {
- kunmap(page);
+ kunmap_thread(page);
put_page(page);
}
if (name)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9a3d451402d7..4d38c881bb2d 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -586,11 +586,11 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
mmap_read_unlock(dst_mm);
BUG_ON(!page);
- page_kaddr = kmap(page);
+ page_kaddr = kmap_thread(page);
err = copy_from_user(page_kaddr,
(const void __user *) src_addr,
PAGE_SIZE);
- kunmap(page);
+ kunmap_thread(page);
if (unlikely(err)) {
err = -EFAULT;
goto out;
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 51/58] kernel: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
netdev, kexec, linux-kernel, linux-f2fs-devel, Eric Biederman,
linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
This kmap() call is localized to a single thread. To avoid the over
head of global PKRS updates use the new kmap_thread() call.
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
kernel/kexec_core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..272a9920c0d6 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -815,7 +815,7 @@ static int kimage_load_normal_segment(struct kimage *image,
if (result < 0)
goto out;
- ptr = kmap(page);
+ ptr = kmap_thread(page);
/* Start with a clear page */
clear_page(ptr);
ptr += maddr & ~PAGE_MASK;
@@ -828,7 +828,7 @@ static int kimage_load_normal_segment(struct kimage *image,
memcpy(ptr, kbuf, uchunk);
else
result = copy_from_user(ptr, buf, uchunk);
- kunmap(page);
+ kunmap_thread(page);
if (result) {
result = -EFAULT;
goto out;
@@ -879,7 +879,7 @@ static int kimage_load_crash_segment(struct kimage *image,
goto out;
}
arch_kexec_post_alloc_pages(page_address(page), 1, 0);
- ptr = kmap(page);
+ ptr = kmap_thread(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
PAGE_SIZE - (maddr & ~PAGE_MASK));
@@ -895,7 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
else
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
- kunmap(page);
+ kunmap_thread(page);
arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 50/58] drivers/android: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
Greg Kroah-Hartman, kexec, linux-kernel, linux-f2fs-devel, netdev,
linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
These kmap() calls are localized to a single thread. To avoid the over
head of global PKRS updates use the new kmap_thread() call.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
drivers/android/binder_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 69609696a843..5f50856caad7 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1118,9 +1118,9 @@ binder_alloc_copy_user_to_buffer(struct binder_alloc *alloc,
page = binder_alloc_get_page(alloc, buffer,
buffer_offset, &pgoff);
size = min_t(size_t, bytes, PAGE_SIZE - pgoff);
- kptr = kmap(page) + pgoff;
+ kptr = kmap_thread(page) + pgoff;
ret = copy_from_user(kptr, from, size);
- kunmap(page);
+ kunmap_thread(page);
if (ret)
return bytes - size + ret;
bytes -= size;
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
* [PATCH RFC PKS/PMEM 49/58] drivers/misc: Utilize new kmap_thread()
From: ira.weiny @ 2020-10-09 19:50 UTC (permalink / raw)
To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Andy Lutomirski, Peter Zijlstra
Cc: linux-aio, linux-efi, kvm, linux-doc, linux-mmc, Dave Hansen,
dri-devel, linux-mm, target-devel, linux-mtd, linux-kselftest,
samba-technical, Ira Weiny, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
x86, amd-gfx, linux-afs, cluster-devel, linux-cachefs,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-um,
intel-gfx, ecryptfs, linux-erofs, reiserfs-devel, linux-block,
linux-bcache, Dan Williams, io-uring, linux-nfs, linux-ntfs-dev,
Greg Kroah-Hartman, kexec, linux-kernel, linux-f2fs-devel, netdev,
linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-1-ira.weiny@intel.com>
From: Ira Weiny <ira.weiny@intel.com>
These kmap() calls are localized to a single thread. To avoid the over
head of global PKRS updates use the new kmap_thread() call.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index 8531ae781195..f308abb8ad03 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -343,7 +343,7 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue *queue,
size_t to_copy;
if (kernel_if->host)
- va = kmap(kernel_if->u.h.page[page_index]);
+ va = kmap_thread(kernel_if->u.h.page[page_index]);
else
va = kernel_if->u.g.vas[page_index + 1];
/* Skip header. */
@@ -357,12 +357,12 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue *queue,
if (!copy_from_iter_full((u8 *)va + page_offset, to_copy,
from)) {
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
return VMCI_ERROR_INVALID_ARGS;
}
bytes_copied += to_copy;
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
}
return VMCI_SUCCESS;
@@ -391,7 +391,7 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to,
int err;
if (kernel_if->host)
- va = kmap(kernel_if->u.h.page[page_index]);
+ va = kmap_thread(kernel_if->u.h.page[page_index]);
else
va = kernel_if->u.g.vas[page_index + 1];
/* Skip header. */
@@ -405,12 +405,12 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to,
err = copy_to_iter((u8 *)va + page_offset, to_copy, to);
if (err != to_copy) {
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
return VMCI_ERROR_INVALID_ARGS;
}
bytes_copied += to_copy;
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
}
return VMCI_SUCCESS;
--
2.28.0.rc0.12.gb6a658bd00c9
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox