* Re: [RFC PATCH v2 30/32] kvm: svm: Add support for SEV DEBUG_ENCRYPT command
From: Brijesh Singh @ 2017-03-16 18:34 UTC (permalink / raw)
To: Paolo Bonzini, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd
Cc: brijesh.singh
In-Reply-To: <05dbd756-e8e9-9384-2759-898e230a6909@redhat.com>
On 03/16/2017 06:03 AM, Paolo Bonzini wrote:
>
>
> On 02/03/2017 16:18, Brijesh Singh wrote:
>> + data = (void *) get_zeroed_page(GFP_KERNEL);
>
> The page does not need to be zeroed, does it?
>
No, we don't have to zero it. I will fix it.
>> +
>> + if ((len & 15) || (dst_addr & 15)) {
>> + /* if destination address and length are not 16-byte
>> + * aligned then:
>> + * a) decrypt destination page into temporary buffer
>> + * b) copy source data into temporary buffer at correct offset
>> + * c) encrypt temporary buffer
>> + */
>> + ret = __sev_dbg_decrypt_page(kvm, dst_addr, data, &argp->error);
>
> Ah, I see now you're using this function here for read-modify-write.
> data is already pinned here, so even if you keep the function it makes
> sense to push pinning out of __sev_dbg_decrypt_page and into
> sev_dbg_decrypt.
I can push out pinning part outside __sev_dbg_decrypt_page
>
>> + if (ret)
>> + goto err_3;
>> + d_off = dst_addr & (PAGE_SIZE - 1);
>> +
>> + if (copy_from_user(data + d_off,
>> + (uint8_t *)debug.src_addr, len)) {
>> + ret = -EFAULT;
>> + goto err_3;
>> + }
>> +
>> + encrypt->length = PAGE_SIZE;
>
> Why decrypt/re-encrypt all the page instead of just the 16 byte area
> around the [dst_addr, dst_addr+len) range?
>
good catch, I should be fine just decrypting a 16 byte area. Will fix in next rev
>> + encrypt->src_addr = __psp_pa(data);
>> + encrypt->dst_addr = __sev_page_pa(inpages[0]);
>> + } else {
>> + if (copy_from_user(data, (uint8_t *)debug.src_addr, len)) {
>> + ret = -EFAULT;
>> + goto err_3;
>> + }
>
> Do you need copy_from_user, or can you just pin/unpin memory as for
> DEBUG_DECRYPT?
>
We can work either with pin/unpin or copy_from_user. I think I choose copy_from_user because
in most of time ENCRYPT path was used when I set breakpoint through gdb which basically
requires copying pretty small data into guest memory. It may be very much possible that
someone can try to copy lot more data and then pin/unpin can speedup the things.
-Brijesh
^ permalink raw reply
* Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages
From: Borislav Petkov @ 2017-03-16 18:28 UTC (permalink / raw)
To: Brijesh Singh, Paolo Bonzini
Cc: simon.guinot, linux-efi, kvm, rkrcmar, matt, linux-pci,
linus.walleij, gary.hook, linux-mm, paul.gortmaker, hpa, cl,
dan.j.williams, aarcange, sfr, andriy.shevchenko, herbert, bhe,
xemul, joro, x86, peterz, piotr.luc, mingo, msalter, ross.zwisler,
dyoung, thomas.lendacky, jroedel, keescook, arnd, toshi.kani,
mathieu.desnoyers, luto, devel, bhelgaas, tglx, mchehab,
iamjoonsoo.kim, labbott
In-Reply-To: <cb6a9a56-2c52-d98d-3ff6-3b61d0e5875e@amd.com>
On Fri, Mar 10, 2017 at 04:41:56PM -0600, Brijesh Singh wrote:
> I can take a look at fixing those warning. In my initial attempt was to create
> a new function to clear encryption bit but it ended up looking very similar to
> __change_page_attr_set_clr() hence decided to extend the exiting function to
> use memblock_alloc().
... except that having all that SEV-specific code in main code paths is
yucky and I'd like to avoid it, if possible.
> Early in boot process, guest kernel allocates some structure (its either
> statically allocated or dynamic allocated via memblock_alloc). And shares the physical
> address of these structure with hypervisor. Since entire guest memory area is mapped
> as encrypted hence those structure's are mapped as encrypted memory range. We need
> a method to clear the encryption bit. Sometime these structure maybe part of 2M pages
> and need to split into smaller pages.
So how hard would it be if the hypervisor allocated that memory for the
guest instead? It would allocate it decrypted and guest would need to
access it decrypted too. All in preparation for SEV-ES which will need a
block of unencrypted memory for the guest anyway...
> In most cases, guest and hypervisor communication starts as soon as guest provides
> the physical address to hypervisor. So we must map the pages as decrypted before
> sharing the physical address to hypervisor.
See above: so purely theoretically speaking, the hypervisor could prep
that decrypted range for the guest. I'd look in Paolo's direction,
though, for the feasibility of something like that.
Thanks.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH v3 1/3] clk: meson-gxbb: expose clock CLKID_RNG0
From: Kevin Hilman @ 2017-03-16 18:24 UTC (permalink / raw)
To: Herbert Xu
Cc: Heiner Kallweit, Jerome Brunet, Neil Armstrong, linux-amlogic,
linux-crypto, Stephen Boyd, Michael Turquette, linux-clk,
devicetree
In-Reply-To: <20170316100743.GA12137@gondor.apana.org.au>
Hi Herbert,
Herbert Xu <herbert@gondor.apana.org.au> writes:
> On Wed, Feb 22, 2017 at 07:55:24AM +0100, Heiner Kallweit wrote:
>> Expose clock CLKID_RNG0 which is needed for the HW random number generator.
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>
> All patches applied. Thanks.
Actually, can you just apply [PATCH 4/4] to your tree?
The clock and DT patches need to go through their respective trees or
will otherwise have conflicts with other things going in via those
trees.
Thanks,
Kevin
^ permalink raw reply
* Re: [RFC PATCH v2 26/32] kvm: svm: Add support for SEV LAUNCH_UPDATE_DATA command
From: Brijesh Singh @ 2017-03-16 18:20 UTC (permalink / raw)
To: Paolo Bonzini, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd
Cc: brijesh.singh
In-Reply-To: <14021d2a-2a94-a0c8-88db-acbc04b4daac@redhat.com>
On 03/16/2017 05:48 AM, Paolo Bonzini wrote:
>
>
> On 02/03/2017 16:17, Brijesh Singh wrote:
>> +static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen,
>> + unsigned long *n)
>> +{
>> + struct page **pages;
>> + int first, last;
>> + unsigned long npages, pinned;
>> +
>> + /* Get number of pages */
>> + first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
>> + last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
>> + npages = (last - first + 1);
>> +
>> + pages = kzalloc(npages * sizeof(struct page *), GFP_KERNEL);
>> + if (!pages)
>> + return NULL;
>> +
>> + /* pin the user virtual address */
>> + down_read(¤t->mm->mmap_sem);
>> + pinned = get_user_pages_fast(uaddr, npages, 1, pages);
>> + up_read(¤t->mm->mmap_sem);
>
> get_user_pages_fast, like get_user_pages_unlocked, must be called
> without mmap_sem held.
Sure.
>
>> + if (pinned != npages) {
>> + printk(KERN_ERR "SEV: failed to pin %ld pages (got %ld)\n",
>> + npages, pinned);
>> + goto err;
>> + }
>> +
>> + *n = npages;
>> + return pages;
>> +err:
>> + if (pinned > 0)
>> + release_pages(pages, pinned, 0);
>> + kfree(pages);
>> +
>> + return NULL;
>> +}
>>
>> + /* the array of pages returned by get_user_pages() is a page-aligned
>> + * memory. Since the user buffer is probably not page-aligned, we need
>> + * to calculate the offset within a page for first update entry.
>> + */
>> + offset = uaddr & (PAGE_SIZE - 1);
>> + len = min_t(size_t, (PAGE_SIZE - offset), ulen);
>> + ulen -= len;
>> +
>> + /* update first page -
>> + * special care need to be taken for the first page because we might
>> + * be dealing with offset within the page
>> + */
>
> No need to special case the first page; just set "offset = 0" inside the
> loop after the first iteration.
>
Will do.
-Brijesh
^ permalink raw reply
* Re: [RFC PATCH v2 32/32] x86: kvm: Pin the guest memory when SEV is active
From: Brijesh Singh @ 2017-03-16 18:17 UTC (permalink / raw)
To: Paolo Bonzini, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd
Cc: brijesh.singh
In-Reply-To: <453770c9-f9d7-4806-dbae-d19876f2a22e@redhat.com>
On 03/16/2017 05:38 AM, Paolo Bonzini wrote:
>
>
> On 02/03/2017 16:18, Brijesh Singh wrote:
>> The SEV memory encryption engine uses a tweak such that two identical
>> plaintexts at different location will have a different ciphertexts.
>> So swapping or moving ciphertexts of two pages will not result in
>> plaintexts being swapped. Relocating (or migrating) a physical backing pages
>> for SEV guest will require some additional steps. The current SEV key
>> management spec [1] does not provide commands to swap or migrate (move)
>> ciphertexts. For now we pin the memory allocated for the SEV guest. In
>> future when SEV key management spec provides the commands to support the
>> page migration we can update the KVM code to remove the pinning logical
>> without making any changes into userspace (qemu).
>>
>> The patch pins userspace memory when a new slot is created and unpin the
>> memory when slot is removed.
>>
>> [1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf
>
> This is not enough, because memory can be hidden temporarily from the
> guest and remapped later. Think of a PCI BAR that is backed by RAM, or
> also SMRAM. The pinning must be kept even in that case.
>
> You need to add a pair of KVM_MEMORY_ENCRYPT_OPs (one that doesn't map
> to a PSP operation), such as KVM_REGISTER/UNREGISTER_ENCRYPTED_RAM. In
> QEMU you can use a RAMBlockNotifier to invoke the ioctls.
>
I was hoping to avoid adding new ioctl, but I see your point. Will add a pair of ioctl's
and use RAMBlocNotifier to invoke those ioctls.
-Brijesh
^ permalink raw reply
* Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
From: Borislav Petkov @ 2017-03-16 16:29 UTC (permalink / raw)
To: Tom Lendacky
Cc: Brijesh Singh, Paolo Bonzini, simon.guinot, linux-efi, kvm,
rkrcmar, matt, linux-pci, linus.walleij, gary.hook, linux-mm,
paul.gortmaker, hpa, cl, dan.j.williams, aarcange, sfr,
andriy.shevchenko, herbert, bhe, xemul, joro, x86, peterz,
piotr.luc, mingo, msalter, ross.zwisler, dyoung, jroedel,
keescook, arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <cb1693fc-6876-a448-f485-1a6c70aa6ff5@amd.com>
On Thu, Mar 16, 2017 at 11:11:26AM -0500, Tom Lendacky wrote:
> Not quite. The guest still needs to understand about the encryption mask
> so that it can protect memory by setting the encryption mask in the
> pagetable entries. It can also decide when to share memory with the
> hypervisor by not setting the encryption mask in the pagetable entries.
Ok, so the kernel - by that I mean both the baremetal and guest kernel -
needs to know whether we're encrypting stuff. So it needs to know about
SME.
> "Instruction fetches are always decrypted under SEV" means that,
> regardless of how a virtual address is mapped, encrypted or decrypted,
> if an instruction fetch is performed by the CPU from that address it
> will always be decrypted. This is to prevent the hypervisor from
> injecting executable code into the guest since it would have to be
> valid encrypted instructions.
Ok, so the guest needs to map its pages encrypted.
Which reminds me, KSM might be a PITA to enable with SEV but that's a
different story. :)
> There are many areas that use the same logic, but there are certain
> situations where we need to check between SME vs SEV (e.g. DMA operation
> setup or decrypting the trampoline area) and act accordingly.
Right, and I'd like to keep those areas where it differs at minimum and
nicely cordoned off from the main paths.
So looking back at the current patch in this subthread:
we do check
* CPUID 0x40000000
* 8000_001F[EAX] for SME
* 8000_001F[EBX][5:0] for the encryption bits.
So how about we generate the following CPUID picture for the guest:
CPUID_Fn8000001F_EAX = ...10b
That is, SME bit is cleared, SEV is set. This will mean for the guest
kernel that SEV is enabled and you can avoid yourself the 0x40000000
leaf check and the additional KVM feature bit glue.
10b configuration will be invalid for baremetal as - I'm assuming - you
can't have SEV=1b with SME=0b. It will be a virt-only configuration and
this way you can even avoid the hypervisor-specific detection but do
that for all.
Hmmm?
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
^ permalink raw reply
* Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
From: Tom Lendacky @ 2017-03-16 16:11 UTC (permalink / raw)
To: Borislav Petkov
Cc: Brijesh Singh, Paolo Bonzini, simon.guinot, linux-efi, kvm,
rkrcmar, matt, linux-pci, linus.walleij, gary.hook, linux-mm,
paul.gortmaker, hpa, cl, dan.j.williams, aarcange, sfr,
andriy.shevchenko, herbert, bhe, xemul, joro, x86, peterz,
piotr.luc, mingo, msalter, ross.zwisler, dyoung, jroedel,
keescook, arnd, toshi.kani, mathieu.desnoyers, luto, devel
In-Reply-To: <20170316150957.dos6wp3pmhos4hkj@pd.tnic>
On 3/16/2017 10:09 AM, Borislav Petkov wrote:
> On Thu, Mar 16, 2017 at 09:28:58AM -0500, Tom Lendacky wrote:
>> Because there are differences between how SME and SEV behave
>> (instruction fetches are always decrypted under SEV, DMA to an
>> encrypted location is not supported under SEV, etc.) we need to
>> determine which mode we are in so that things can be setup properly
>> during boot. For example, if SEV is active the kernel will already
>> be encrypted and so we don't perform that step or the trampoline area
>> for bringing up an AP must be decrypted for SME but encrypted for SEV.
>
> So with SEV enabled, it seems to me a guest doesn't know anything about
> encryption and can run as if SME is disabled. So sme_active() will be
> false. And then the kernel can bypass all that code dealing with SME.
>
> So a guest should simply run like on baremetal with no SME, IMHO.
>
Not quite. The guest still needs to understand about the encryption mask
so that it can protect memory by setting the encryption mask in the
pagetable entries. It can also decide when to share memory with the
hypervisor by not setting the encryption mask in the pagetable entries.
> But then there's that part: "instruction fetches are always decrypted
> under SEV". What does that mean exactly? And how much of that code can
"Instruction fetches are always decrypted under SEV" means that,
regardless of how a virtual address is mapped, encrypted or decrypted,
if an instruction fetch is performed by the CPU from that address it
will always be decrypted. This is to prevent the hypervisor from
injecting executable code into the guest since it would have to be
valid encrypted instructions.
> be reused so that
>
> * SME on baremetal
> * SEV on guest
>
> use the same logic?
There are many areas that use the same logic, but there are certain
situations where we need to check between SME vs SEV (e.g. DMA operation
setup or decrypting the trampoline area) and act accordingly.
Thanks,
Tom
>
> Having the larger SEV preparation part on the kvm host side is perfectly
> fine. But I'd like to keep kernel initialization paths clean.
>
> Thanks.
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
From: Borislav Petkov @ 2017-03-16 15:09 UTC (permalink / raw)
To: Tom Lendacky
Cc: Brijesh Singh, Paolo Bonzini, simon.guinot, linux-efi, kvm,
rkrcmar, matt, linux-pci, linus.walleij, gary.hook, linux-mm,
paul.gortmaker, hpa, cl, dan.j.williams, aarcange, sfr,
andriy.shevchenko, herbert, bhe, xemul, joro, x86, peterz,
piotr.luc, mingo, msalter, ross.zwisler, dyoung, jroedel,
keescook, arnd, toshi.kani, mathieu.desnoyers, luto, devel,
bhelgaas, tglx, mcheha
In-Reply-To: <b27126ee-aff0-ab11-706b-fc6d8d4901db@amd.com>
On Thu, Mar 16, 2017 at 09:28:58AM -0500, Tom Lendacky wrote:
> Because there are differences between how SME and SEV behave
> (instruction fetches are always decrypted under SEV, DMA to an
> encrypted location is not supported under SEV, etc.) we need to
> determine which mode we are in so that things can be setup properly
> during boot. For example, if SEV is active the kernel will already
> be encrypted and so we don't perform that step or the trampoline area
> for bringing up an AP must be decrypted for SME but encrypted for SEV.
So with SEV enabled, it seems to me a guest doesn't know anything about
encryption and can run as if SME is disabled. So sme_active() will be
false. And then the kernel can bypass all that code dealing with SME.
So a guest should simply run like on baremetal with no SME, IMHO.
But then there's that part: "instruction fetches are always decrypted
under SEV". What does that mean exactly? And how much of that code can
be reused so that
* SME on baremetal
* SEV on guest
use the same logic?
Having the larger SEV preparation part on the kvm host side is perfectly
fine. But I'd like to keep kernel initialization paths clean.
Thanks.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
From: Tom Lendacky @ 2017-03-16 14:28 UTC (permalink / raw)
To: Borislav Petkov, Brijesh Singh
Cc: Paolo Bonzini, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, dyoung, jroedel, keescook, arnd, toshi.kani,
<math
In-Reply-To: <20170316101656.dcwgtn4qdtyp5hip@pd.tnic>
On 3/16/2017 5:16 AM, Borislav Petkov wrote:
> On Fri, Mar 10, 2017 at 10:35:30AM -0600, Brijesh Singh wrote:
>> We could update this patch to use the below logic:
>>
>> * CPUID(0) - Check for AuthenticAMD
>> * CPID(1) - Check if under hypervisor
>> * CPUID(0x80000000) - Check for highest supported leaf
>> * CPUID(0x8000001F).EAX - Check for SME and SEV support
>> * rdmsr (MSR_K8_SYSCFG)[MemEncryptionModeEnc] - Check if SMEE is set
>
> Actually, it is still not clear to me *why* we need to do anything
> special wrt SEV in the guest.
>
> Lemme clarify: why can't the guest boot just like a normal Linux on
> baremetal and use the SME(!) detection code to set sme_enable and so
> on? IOW, I'd like to avoid all those checks whether we're running under
> hypervisor and handle all that like we're running on baremetal.
Because there are differences between how SME and SEV behave
(instruction fetches are always decrypted under SEV, DMA to an
encrypted location is not supported under SEV, etc.) we need to
determine which mode we are in so that things can be setup properly
during boot. For example, if SEV is active the kernel will already
be encrypted and so we don't perform that step or the trampoline area
for bringing up an AP must be decrypted for SME but encrypted for SEV.
The hypervisor check will provide that ability to determine how we
handle things.
Thanks,
Tom
>
^ permalink raw reply
* [PATCH] md5: remove from lib and only live in crypto
From: Jason A. Donenfeld @ 2017-03-16 14:18 UTC (permalink / raw)
To: herbert, linux-crypto, linux-kernel; +Cc: Jason A. Donenfeld
The md5_transform function is no longer used any where in the tree,
except for the crypto api's actual implementation of md5, so we can drop
the function from lib and put it as a static function of the crypto
file, where it belongs. There should be no new users of md5_transform,
anyway, since there are more modern ways of doing what it once achieved.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
In the last patch like this, we managed to get rid of halfmd4 from this
file. In this series we get rid of md5, now that the patches have landed
that remove such improper md5 usage from the kernel. When a final
dependency on the (dead) sha1 is removed, then cryptohash.h will be removed
all together. This patch is for the md5 removal.
crypto/md5.c | 95 +++++++++++++++++++++++++++++++++++++++++++++-
include/linux/cryptohash.h | 5 ---
lib/Makefile | 2 +-
lib/md5.c | 95 ----------------------------------------------
4 files changed, 95 insertions(+), 102 deletions(-)
delete mode 100644 lib/md5.c
diff --git a/crypto/md5.c b/crypto/md5.c
index 2355a7c25c45..f7ae1a48225b 100644
--- a/crypto/md5.c
+++ b/crypto/md5.c
@@ -21,9 +21,11 @@
#include <linux/module.h>
#include <linux/string.h>
#include <linux/types.h>
-#include <linux/cryptohash.h>
#include <asm/byteorder.h>
+#define MD5_DIGEST_WORDS 4
+#define MD5_MESSAGE_BYTES 64
+
const u8 md5_zero_message_hash[MD5_DIGEST_SIZE] = {
0xd4, 0x1d, 0x8c, 0xd9, 0x8f, 0x00, 0xb2, 0x04,
0xe9, 0x80, 0x09, 0x98, 0xec, 0xf8, 0x42, 0x7e,
@@ -47,6 +49,97 @@ static inline void cpu_to_le32_array(u32 *buf, unsigned int words)
}
}
+#define F1(x, y, z) (z ^ (x & (y ^ z)))
+#define F2(x, y, z) F1(z, x, y)
+#define F3(x, y, z) (x ^ y ^ z)
+#define F4(x, y, z) (y ^ (x | ~z))
+
+#define MD5STEP(f, w, x, y, z, in, s) \
+ (w += f(x, y, z) + in, w = (w<<s | w>>(32-s)) + x)
+
+static void md5_transform(__u32 *hash, __u32 const *in)
+{
+ u32 a, b, c, d;
+
+ a = hash[0];
+ b = hash[1];
+ c = hash[2];
+ d = hash[3];
+
+ MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7);
+ MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12);
+ MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17);
+ MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22);
+ MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7);
+ MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12);
+ MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17);
+ MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22);
+ MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7);
+ MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12);
+ MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17);
+ MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22);
+ MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7);
+ MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12);
+ MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17);
+ MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22);
+
+ MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5);
+ MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9);
+ MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14);
+ MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20);
+ MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5);
+ MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9);
+ MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14);
+ MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20);
+ MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5);
+ MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9);
+ MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14);
+ MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20);
+ MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5);
+ MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9);
+ MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14);
+ MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20);
+
+ MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4);
+ MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11);
+ MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16);
+ MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23);
+ MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4);
+ MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11);
+ MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16);
+ MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23);
+ MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4);
+ MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11);
+ MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16);
+ MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23);
+ MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4);
+ MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11);
+ MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16);
+ MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23);
+
+ MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6);
+ MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10);
+ MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15);
+ MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21);
+ MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6);
+ MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10);
+ MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15);
+ MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21);
+ MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6);
+ MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10);
+ MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15);
+ MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21);
+ MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6);
+ MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10);
+ MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15);
+ MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21);
+
+ hash[0] += a;
+ hash[1] += b;
+ hash[2] += c;
+ hash[3] += d;
+}
+
static inline void md5_transform_helper(struct md5_state *ctx)
{
le32_to_cpu_array(ctx->block, sizeof(ctx->block) / sizeof(u32));
diff --git a/include/linux/cryptohash.h b/include/linux/cryptohash.h
index 3252799832cf..df4d3e943d28 100644
--- a/include/linux/cryptohash.h
+++ b/include/linux/cryptohash.h
@@ -10,9 +10,4 @@
void sha_init(__u32 *buf);
void sha_transform(__u32 *digest, const char *data, __u32 *W);
-#define MD5_DIGEST_WORDS 4
-#define MD5_MESSAGE_BYTES 64
-
-void md5_transform(__u32 *hash, __u32 const *in);
-
#endif
diff --git a/lib/Makefile b/lib/Makefile
index 320ac46a8725..acbc16bed9af 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -19,7 +19,7 @@ KCOV_INSTRUMENT_dynamic_debug.o := n
lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o dump_stack.o timerqueue.o\
idr.o int_sqrt.o extable.o \
- sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
+ sha1.o chacha20.o irq_regs.o argv_split.o \
flex_proportions.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o kobject_uevent.o \
earlycpio.o seq_buf.o siphash.o \
diff --git a/lib/md5.c b/lib/md5.c
deleted file mode 100644
index bb0cd01d356d..000000000000
--- a/lib/md5.c
+++ /dev/null
@@ -1,95 +0,0 @@
-#include <linux/compiler.h>
-#include <linux/export.h>
-#include <linux/cryptohash.h>
-
-#define F1(x, y, z) (z ^ (x & (y ^ z)))
-#define F2(x, y, z) F1(z, x, y)
-#define F3(x, y, z) (x ^ y ^ z)
-#define F4(x, y, z) (y ^ (x | ~z))
-
-#define MD5STEP(f, w, x, y, z, in, s) \
- (w += f(x, y, z) + in, w = (w<<s | w>>(32-s)) + x)
-
-void md5_transform(__u32 *hash, __u32 const *in)
-{
- u32 a, b, c, d;
-
- a = hash[0];
- b = hash[1];
- c = hash[2];
- d = hash[3];
-
- MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7);
- MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12);
- MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17);
- MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22);
- MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7);
- MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12);
- MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17);
- MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22);
- MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7);
- MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12);
- MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17);
- MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22);
- MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7);
- MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12);
- MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17);
- MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22);
-
- MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5);
- MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9);
- MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14);
- MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20);
- MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5);
- MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9);
- MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14);
- MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20);
- MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5);
- MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9);
- MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14);
- MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20);
- MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5);
- MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9);
- MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14);
- MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20);
-
- MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4);
- MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11);
- MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16);
- MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23);
- MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4);
- MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11);
- MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16);
- MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23);
- MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4);
- MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11);
- MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16);
- MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23);
- MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4);
- MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11);
- MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16);
- MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23);
-
- MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6);
- MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10);
- MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15);
- MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21);
- MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6);
- MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10);
- MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15);
- MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21);
- MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6);
- MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10);
- MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15);
- MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21);
- MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6);
- MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10);
- MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15);
- MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21);
-
- hash[0] += a;
- hash[1] += b;
- hash[2] += c;
- hash[3] += d;
-}
-EXPORT_SYMBOL(md5_transform);
--
2.11.1
^ permalink raw reply related
* Re: CRYPTO_MAX_ALG_NAME is too low
From: Alexander Sverdlin @ 2017-03-16 14:16 UTC (permalink / raw)
To: Herbert Xu, David S. Miller; +Cc: linux-crypto
In-Reply-To: <e96825bd-4e1d-c6c1-160e-0dc10f7f3981@nokia.com>
Hello!
On 10/03/17 12:55, Alexander Sverdlin wrote:
> Hello crypto maintainers!
>
> We've found and example of the ipsec algorithm combination, which doesn't fit
> into CRYPTO_MAX_ALG_NAME long buffers:
>
> ip x s add src 1.1.1.1 dst 1.1.1.2 proto esp spi 0 mode tunnel enc des3_ede 0x0 auth sha256 0x0 flag esn replay-window 256
>
> produces "echainiv(authencesn(hmac(sha256-generic),cbc(des3_ede-generic)))"
> on the machines without optimized crypto drivers, which doesn't fit into current
> 64-bytes buffers.
>
> I see two possible options:
>
> a) split CRYPTO_MAX_ALG_NAME into CRYPTO_MAX_ALG_NAME + CRYPTO_MAX_DRV_NAME pair
> and make later, say, 96, because the former probably cannot be changed because of
> numerous user-space exports. And change half of the code to use new define.
>
> b) rename *-generic algorithms to *-gen, so that cra_driver_name will be shortened,
> while MODULE_ALIAS_CRYPTO() could still be maintained in old and new form.
>
> What are your thoughts?
Any?
This is a regression caused by 856e3f4092
("crypto: seqiv - Add support for new AEAD interface")
As I've said above, I can offer one of the two solutions, which patch should I send?
Or do you see any better alternatives?
--
Best regards,
Alexander Sverdlin.
^ permalink raw reply
* Re: [RFC 0/7] crypto: caam - add Queue Interface (QI) support
From: Horia Geantă @ 2017-03-16 14:13 UTC (permalink / raw)
To: Herbert Xu, Scott Wood, Roy Pledge
Cc: linux-arm-kernel@lists.infradead.org, Claudiu Manoil,
Cristian Stoica, Dan Douglass, linux-crypto@vger.kernel.org,
Vakul Garg, Alexandru Porosanu
In-Reply-To: <20170316084249.GB11653@gondor.apana.org.au>
On 3/16/2017 10:43 AM, Herbert Xu wrote:
> On Fri, Mar 03, 2017 at 04:52:06PM +0200, Horia Geantă wrote:
>> The patchset adds support for CAAM Queue Interface (QI), the additional
>> interface (besides job ring) available for submitting jobs to the engine
>> on platforms having DPAA (Datapath Acceleration Architecture).
>>
>> Patches 1-4 are QMan dependencies.
>> I would prefer to take them through the crypto tree,
>> but I am open to suggestions.
>>
>> Patch 5 adds a missing double inclusion guard in desc_constr.h.
>>
>> Patch 6 adds the caam/qi job submission backend.
>>
>> Patch 7 adds algorithms (ablkcipher and authenc) that run on top
>> of caam/qi. For now, their priority is set lower than caam/jr.
>
> I'm fine with the crypto bits.
>
Herbert,
Thanks for the review.
Should I submit formally, i.e. without the [RFC] prefix?
Scott, Roy,
Do you have any comments wrt. soc/qman patches or with them going
through the crypto tree?
Thanks,
Horia
^ permalink raw reply
* Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages
From: Paolo Bonzini @ 2017-03-16 13:15 UTC (permalink / raw)
To: Brijesh Singh, Borislav Petkov
Cc: simon.guinot, linux-efi, kvm, rkrcmar, matt, linux-pci,
linus.walleij, gary.hook, linux-mm, paul.gortmaker, hpa, cl,
dan.j.williams, aarcange, sfr, andriy.shevchenko, herbert, bhe,
xemul, joro, x86, peterz, piotr.luc, mingo, msalter, ross.zwisler,
dyoung, thomas.lendacky, jroedel, keescook, arnd, toshi.kani,
mathieu.desnoyers, luto, devel, bhelgaas
In-Reply-To: <cb6a9a56-2c52-d98d-3ff6-3b61d0e5875e@amd.com>
[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]
On 10/03/2017 23:41, Brijesh Singh wrote:
>> Maybe there's a reason this fires:
>>
>> WARNING: modpost: Found 2 section mismatch(es).
>> To see full details build your kernel with:
>> 'make CONFIG_DEBUG_SECTION_MISMATCH=y'
>>
>> WARNING: vmlinux.o(.text+0x48edc): Section mismatch in reference from
>> the function __change_page_attr() to the function
>> .init.text:memblock_alloc()
>> The function __change_page_attr() references
>> the function __init memblock_alloc().
>> This is often because __change_page_attr lacks a __init
>> annotation or the annotation of memblock_alloc is wrong.
>>
>> WARNING: vmlinux.o(.text+0x491d1): Section mismatch in reference from
>> the function __change_page_attr() to the function
>> .meminit.text:memblock_free()
>> The function __change_page_attr() references
>> the function __meminit memblock_free().
>> This is often because __change_page_attr lacks a __meminit
>> annotation or the annotation of memblock_free is wrong.
>>
>> But maybe Paolo might have an even better idea...
>
> I am sure he will have better idea :)
Not sure if it's better or worse, but an alternative idea is to turn
__change_page_attr and __change_page_attr_set_clr inside out, so that:
1) the alloc_pages/__free_page happens in __change_page_attr_set_clr;
2) __change_page_attr_set_clr overall does not beocome more complex.
Then you can introduce __early_change_page_attr_set_clr and/or
early_kernel_map_pages_in_pgd, for use in your next patches. They use
the memblock allocator instead of alloc/free_page
The attached patch is compile-tested only and almost certainly has some
thinko in it. But it even skims a few lines from the code so the idea
might have some merit.
Paolo
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: alloc-in-cpa-set-clr.patch --]
[-- Type: text/x-patch; name="alloc-in-cpa-set-clr.patch", Size: 10219 bytes --]
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 28d42130243c..953c8e697562 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -490,11 +490,12 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
}
static int
-try_preserve_large_page(pte_t *kpte, unsigned long address,
+try_preserve_large_page(pte_t **p_kpte, unsigned long address,
struct cpa_data *cpa)
{
unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn;
- pte_t new_pte, old_pte, *tmp;
+ pte_t *kpte = *p_kpte;
+ pte_t new_pte, old_pte;
pgprot_t old_prot, new_prot, req_prot;
int i, do_split = 1;
enum pg_level level;
@@ -507,8 +508,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
* Check for races, another CPU might have split this page
* up already:
*/
- tmp = _lookup_address_cpa(cpa, address, &level);
- if (tmp != kpte)
+ *p_kpte = _lookup_address_cpa(cpa, address, &level);
+ if (*p_kpte != kpte)
goto out_unlock;
switch (level) {
@@ -634,17 +635,18 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
unsigned int i, level;
pte_t *tmp;
pgprot_t ref_prot;
+ int retry = 1;
+ if (!debug_pagealloc_enabled())
+ spin_lock(&cpa_lock);
spin_lock(&pgd_lock);
/*
* Check for races, another CPU might have split this page
* up for us already:
*/
tmp = _lookup_address_cpa(cpa, address, &level);
- if (tmp != kpte) {
- spin_unlock(&pgd_lock);
- return 1;
- }
+ if (tmp != kpte)
+ goto out;
paravirt_alloc_pte(&init_mm, page_to_pfn(base));
@@ -671,10 +673,11 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
break;
default:
- spin_unlock(&pgd_lock);
- return 1;
+ goto out;
}
+ retry = 0;
+
/*
* Set the GLOBAL flags only if the PRESENT flag is set
* otherwise pmd/pte_present will return true even on a non
@@ -718,28 +721,34 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
* going on.
*/
__flush_tlb_all();
- spin_unlock(&pgd_lock);
- return 0;
-}
-
-static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
- unsigned long address)
-{
- struct page *base;
+out:
+ spin_unlock(&pgd_lock);
+ /*
+ * Do a global flush tlb after splitting the large page
+ * and before we do the actual change page attribute in the PTE.
+ *
+ * With out this, we violate the TLB application note, that says
+ * "The TLBs may contain both ordinary and large-page
+ * translations for a 4-KByte range of linear addresses. This
+ * may occur if software modifies the paging structures so that
+ * the page size used for the address range changes. If the two
+ * translations differ with respect to page frame or attributes
+ * (e.g., permissions), processor behavior is undefined and may
+ * be implementation-specific."
+ *
+ * We do this global tlb flush inside the cpa_lock, so that we
+ * don't allow any other cpu, with stale tlb entries change the
+ * page attribute in parallel, that also falls into the
+ * just split large page entry.
+ */
+ if (!retry)
+ flush_tlb_all();
if (!debug_pagealloc_enabled())
spin_unlock(&cpa_lock);
- base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
- if (!debug_pagealloc_enabled())
- spin_lock(&cpa_lock);
- if (!base)
- return -ENOMEM;
-
- if (__split_large_page(cpa, kpte, address, base))
- __free_page(base);
- return 0;
+ return retry;
}
static bool try_to_free_pte_page(pte_t *pte)
@@ -1166,30 +1175,26 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
}
}
-static int __change_page_attr(struct cpa_data *cpa, int primary)
+static int __change_page_attr(struct cpa_data *cpa, pte_t **p_kpte, unsigned long address,
+ int primary)
{
- unsigned long address;
- int do_split, err;
unsigned int level;
pte_t *kpte, old_pte;
+ int err = 0;
- if (cpa->flags & CPA_PAGES_ARRAY) {
- struct page *page = cpa->pages[cpa->curpage];
- if (unlikely(PageHighMem(page)))
- return 0;
- address = (unsigned long)page_address(page);
- } else if (cpa->flags & CPA_ARRAY)
- address = cpa->vaddr[cpa->curpage];
- else
- address = *cpa->vaddr;
-repeat:
- kpte = _lookup_address_cpa(cpa, address, &level);
- if (!kpte)
- return __cpa_process_fault(cpa, address, primary);
+ if (!debug_pagealloc_enabled())
+ spin_lock(&cpa_lock);
+ *p_kpte = kpte = _lookup_address_cpa(cpa, address, &level);
+ if (!kpte) {
+ err = __cpa_process_fault(cpa, address, primary);
+ goto out;
+ }
old_pte = *kpte;
- if (pte_none(old_pte))
- return __cpa_process_fault(cpa, address, primary);
+ if (pte_none(old_pte)) {
+ err = __cpa_process_fault(cpa, address, primary);
+ goto out;
+ }
if (level == PG_LEVEL_4K) {
pte_t new_pte;
@@ -1228,59 +1233,27 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
cpa->flags |= CPA_FLUSHTLB;
}
cpa->numpages = 1;
- return 0;
+ goto out;
}
/*
* Check, whether we can keep the large page intact
* and just change the pte:
*/
- do_split = try_preserve_large_page(kpte, address, cpa);
- /*
- * When the range fits into the existing large page,
- * return. cp->numpages and cpa->tlbflush have been updated in
- * try_large_page:
- */
- if (do_split <= 0)
- return do_split;
-
- /*
- * We have to split the large page:
- */
- err = split_large_page(cpa, kpte, address);
- if (!err) {
- /*
- * Do a global flush tlb after splitting the large page
- * and before we do the actual change page attribute in the PTE.
- *
- * With out this, we violate the TLB application note, that says
- * "The TLBs may contain both ordinary and large-page
- * translations for a 4-KByte range of linear addresses. This
- * may occur if software modifies the paging structures so that
- * the page size used for the address range changes. If the two
- * translations differ with respect to page frame or attributes
- * (e.g., permissions), processor behavior is undefined and may
- * be implementation-specific."
- *
- * We do this global tlb flush inside the cpa_lock, so that we
- * don't allow any other cpu, with stale tlb entries change the
- * page attribute in parallel, that also falls into the
- * just split large page entry.
- */
- flush_tlb_all();
- goto repeat;
- }
+ err = try_preserve_large_page(p_kpte, address, cpa);
+out:
+ if (!debug_pagealloc_enabled())
+ spin_unlock(&cpa_lock);
return err;
}
static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias);
-static int cpa_process_alias(struct cpa_data *cpa)
+static int cpa_process_alias(struct cpa_data *cpa, unsigned long vaddr)
{
struct cpa_data alias_cpa;
unsigned long laddr = (unsigned long)__va(cpa->pfn << PAGE_SHIFT);
- unsigned long vaddr;
int ret;
if (!pfn_range_is_mapped(cpa->pfn, cpa->pfn + 1))
@@ -1290,16 +1263,6 @@ static int cpa_process_alias(struct cpa_data *cpa)
* No need to redo, when the primary call touched the direct
* mapping already:
*/
- if (cpa->flags & CPA_PAGES_ARRAY) {
- struct page *page = cpa->pages[cpa->curpage];
- if (unlikely(PageHighMem(page)))
- return 0;
- vaddr = (unsigned long)page_address(page);
- } else if (cpa->flags & CPA_ARRAY)
- vaddr = cpa->vaddr[cpa->curpage];
- else
- vaddr = *cpa->vaddr;
-
if (!(within(vaddr, PAGE_OFFSET,
PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT)))) {
@@ -1338,33 +1301,64 @@ static int cpa_process_alias(struct cpa_data *cpa)
return 0;
}
+static unsigned long cpa_address(struct cpa_data *cpa, unsigned long numpages)
+{
+ /*
+ * Store the remaining nr of pages for the large page
+ * preservation check.
+ */
+ /* for array changes, we can't use large page */
+ cpa->numpages = 1;
+ if (cpa->flags & CPA_PAGES_ARRAY) {
+ struct page *page = cpa->pages[cpa->curpage];
+ if (unlikely(PageHighMem(page)))
+ return -EINVAL;
+ return (unsigned long)page_address(page);
+ } else if (cpa->flags & CPA_ARRAY) {
+ return cpa->vaddr[cpa->curpage];
+ } else {
+ cpa->numpages = numpages;
+ return *cpa->vaddr;
+ }
+}
+
+static void cpa_advance(struct cpa_data *cpa)
+{
+ if (cpa->flags & (CPA_PAGES_ARRAY | CPA_ARRAY))
+ cpa->curpage++;
+ else
+ *cpa->vaddr += cpa->numpages * PAGE_SIZE;
+}
+
static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
{
unsigned long numpages = cpa->numpages;
+ unsigned long vaddr;
+ struct page *base;
+ pte_t *kpte;
int ret;
while (numpages) {
- /*
- * Store the remaining nr of pages for the large page
- * preservation check.
- */
- cpa->numpages = numpages;
- /* for array changes, we can't use large page */
- if (cpa->flags & (CPA_ARRAY | CPA_PAGES_ARRAY))
- cpa->numpages = 1;
-
- if (!debug_pagealloc_enabled())
- spin_lock(&cpa_lock);
- ret = __change_page_attr(cpa, checkalias);
- if (!debug_pagealloc_enabled())
- spin_unlock(&cpa_lock);
- if (ret)
- return ret;
-
- if (checkalias) {
- ret = cpa_process_alias(cpa);
- if (ret)
+ vaddr = cpa_address(cpa, numpages);
+ if (!IS_ERR_VALUE(vaddr)) {
+repeat:
+ ret = __change_page_attr(cpa, &kpte, vaddr, checkalias);
+ if (ret < 0)
return ret;
+ if (ret) {
+ base = alloc_page(GFP_KERNEL|__GFP_NOTRACK);
+ if (!base)
+ return -ENOMEM;
+ if (__split_large_page(cpa, kpte, vaddr, base))
+ __free_page(base);
+ goto repeat;
+ }
+
+ if (checkalias) {
+ ret = cpa_process_alias(cpa, vaddr);
+ if (ret < 0)
+ return ret;
+ }
}
/*
@@ -1374,11 +1368,7 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
*/
BUG_ON(cpa->numpages > numpages || !cpa->numpages);
numpages -= cpa->numpages;
- if (cpa->flags & (CPA_PAGES_ARRAY | CPA_ARRAY))
- cpa->curpage++;
- else
- *cpa->vaddr += cpa->numpages * PAGE_SIZE;
-
+ cpa_advance(cpa);
}
return 0;
}
^ permalink raw reply related
* Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
From: Daniel Axtens @ 2017-03-16 12:54 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev, linux-crypto; +Cc: anton
In-Reply-To: <87y3w5jy8y.fsf@concordia.ellerman.id.au>
> So although this sits in arch/powerpc, it's heavy on the crypto which is
> not my area of expertise (to say the least!), so I think it should
> probably go via Herbert and the crypto tree?
That was my thought as well. Sorry - probably should have put that in
the comments somewhere.
Regards,
Daniel
^ permalink raw reply
* Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
From: Stephan Müller @ 2017-03-16 12:20 UTC (permalink / raw)
To: Stephan Müller; +Cc: Herbert Xu, linux-crypto
In-Reply-To: <4808097.uXnqYIEybL@tauon.atsec.com>
Am Donnerstag, 16. März 2017, 11:18:33 CET schrieb Stephan Müller:
Hi Stephan,
> Am Donnerstag, 16. März 2017, 10:52:48 CET schrieb Herbert Xu:
>
> Hi Herbert,
>
> > First of all you're only limiting the amount of memory occupied
> > by the SG list which is not the same thing as the memory pinned
> > down by the actual recvmsg.
>
> I am fully aware of that. As this was present in the code, I thought I could
> reuse that approach.
>
> Are you saying that you want to stop this approach?
I would like to bring another point to the table for this discussion relating
to the maximum amount of memory to be used as well as for the size of the RX-
SGL: allow us please to consider skcipher_sendmsg (sendpage works conceptually
similar, so it should be covered with this discussion as well).
skcipher_alloc_sgl used in the sendmsg code path is conceptually identical to
the RX-SGL allocation that I propose here: it allocates a new SGL with
MAX_SGL_ENTS entries as long as the caller sends data. It limits the amount of
memory to be allocated based on the maximum size of the SG management data and
not the actual plaintext/ciphertext data sent by user space. I again was
therefore under the impression that my suggested approach in the recvmsg code
path regarding allocation of memory would be allowed.
Regarding the amount of data processed with the RX-SGL, I would like to
consider the size of the TX-SGL. In the skcipher case, the RX-SGL (or the
anticipated RX data to be processed) does not need to be larger than the TX-
SGL. Hence the currently existing while() loop of the upstream code that we
discuss here will only be executed as often to fulfill the available TX-SGL
data size. Given that both are limited on the sock_kmalloc memory limitation,
I would imply that using a conceptually identical allocation approach for the
TX SGL and RX SGL would be a sufficient replacement for the while-loop without
being visible to user space (i.e. without causing a limit that was not there
before).
Ciao
Stephan
^ permalink raw reply
* Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages
From: Paolo Bonzini @ 2017-03-16 12:28 UTC (permalink / raw)
To: Brijesh Singh, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <148846771545.2349.9373586041426414252.stgit@brijesh-build-machine>
On 02/03/2017 16:15, Brijesh Singh wrote:
>
> __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
> - struct page *base)
> + pte_t *pbase, unsigned long new_pfn)
> {
> - pte_t *pbase = (pte_t *)page_address(base);
Just one comment and I'll reply to Boris, I think you can compute pbase
with pfn_to_kaddr, and avoid adding a new argument.
> */
> - __set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));
> + __set_pmd_pte(kpte, address,
> + native_make_pte((new_pfn << PAGE_SHIFT) + _KERNPG_TABLE));
And this probably is better written as:
__set_pmd_pte(kpte, address, pfn_pte(new_pfn, __pgprot(_KERNPG_TABLE));
Paolo
^ permalink raw reply
* Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
From: Anton Blanchard @ 2017-03-16 11:13 UTC (permalink / raw)
To: David Laight
Cc: linuxppc-dev@lists.ozlabs.org, linux-crypto@vger.kernel.org,
'Daniel Axtens'
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DCFFB1A81@AcuExch.aculab.com>
Hi David,
> While not part of this change, the unrolled loops look as though
> they just destroy the cpu cache.
> I'd like be convinced that anything does CRC over long enough buffers
> to make it a gain at all.
btrfs data checksumming is one area.
> With modern (not that modern now) superscalar cpus you can often
> get the loop instructions 'for free'.
A branch on POWER8 is a three cycle redirect. The vpmsum instructions
are 6 cycles.
> Sometimes pipelining the loop is needed to get full throughput.
> Unlike the IP checksum, you don't even have to 'loop carry' the
> cpu carry flag.
It went through quite a lot of simulation to reach peak performance.
The loop is quite delicate, we have to pace it just right to avoid
some pipeline reject conditions.
Note also that we already modulo schedule the loop across three
iterations, required to hide the latency of the vpmsum instructions.
Anton
^ permalink raw reply
* Re: [RFC PATCH v2 16/32] x86: kvm: Provide support to create Guest and HV shared per-CPU variables
From: Paolo Bonzini @ 2017-03-16 11:06 UTC (permalink / raw)
To: Brijesh Singh, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <148846773666.2349.9492983018843773590.stgit@brijesh-build-machine>
On 02/03/2017 16:15, Brijesh Singh wrote:
> Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU
> variable at compile time and share its physical address with hypervisor.
> It presents a challege when SEV is active in guest OS. When SEV is active,
> guest memory is encrypted with guest key and hypervisor will no longer able
> to modify the guest memory. When SEV is active, we need to clear the
> encryption attribute of shared physical addresses so that both guest and
> hypervisor can access the data.
>
> To solve this problem, I have tried these three options:
>
> 1) Convert the static per-CPU to dynamic per-CPU allocation. When SEV is
> detected then clear the encryption attribute. But while doing so I found
> that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init was
> called.
>
> 2) Since the encryption attributes works on PAGE_SIZE hence add some extra
> padding to 'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime
> clear the encryption attribute of the full PAGE. The downside of this was
> now we need to modify structure which may break the compatibility.
>
> 3) Define a new per-CPU section (.data..percpu.hv_shared) which will be
> used to hold the compile time shared per-CPU variables. When SEV is
> detected we map this section with encryption attribute cleared.
>
> This patch implements #3. It introduces a new DEFINE_PER_CPU_HV_SHAHRED
> macro to create a compile time per-CPU variable. When SEV is detected we
> map the per-CPU variable as decrypted (i.e with encryption attribute cleared).
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Looks good to me.
Paolo
> ---
> arch/x86/kernel/kvm.c | 43 +++++++++++++++++++++++++++++++------
> include/asm-generic/vmlinux.lds.h | 3 +++
> include/linux/percpu-defs.h | 9 ++++++++
> 3 files changed, 48 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 099fcba..706a08e 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
>
> early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
>
> -static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> -static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) __aligned(64);
> static int has_steal_clock = 0;
>
> /*
> @@ -290,6 +290,22 @@ static void __init paravirt_ops_setup(void)
> #endif
> }
>
> +static int kvm_map_percpu_hv_shared(void *addr, unsigned long size)
> +{
> + /* When SEV is active, the percpu static variables initialized
> + * in data section will contain the encrypted data so we first
> + * need to decrypt it and then map it as decrypted.
> + */
> + if (sev_active()) {
> + unsigned long pa = slow_virt_to_phys(addr);
> +
> + sme_early_decrypt(pa, size);
> + return early_set_memory_decrypted(addr, size);
> + }
> +
> + return 0;
> +}
> +
> static void kvm_register_steal_time(void)
> {
> int cpu = smp_processor_id();
> @@ -298,12 +314,17 @@ static void kvm_register_steal_time(void)
> if (!has_steal_clock)
> return;
>
> + if (kvm_map_percpu_hv_shared(st, sizeof(*st))) {
> + pr_err("kvm-stealtime: failed to map hv_shared percpu\n");
> + return;
> + }
> +
> wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
> pr_info("kvm-stealtime: cpu %d, msr %llx\n",
> cpu, (unsigned long long) slow_virt_to_phys(st));
> }
>
> -static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> +static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
>
> static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
> {
> @@ -327,25 +348,33 @@ static void kvm_guest_cpu_init(void)
> if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
> u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
>
> + if (kvm_map_percpu_hv_shared(this_cpu_ptr(&apf_reason),
> + sizeof(struct kvm_vcpu_pv_apf_data)))
> + goto skip_asyncpf;
> #ifdef CONFIG_PREEMPT
> pa |= KVM_ASYNC_PF_SEND_ALWAYS;
> #endif
> wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
> __this_cpu_write(apf_reason.enabled, 1);
> - printk(KERN_INFO"KVM setup async PF for cpu %d\n",
> - smp_processor_id());
> + printk(KERN_INFO"KVM setup async PF for cpu %d msr %llx\n",
> + smp_processor_id(), pa);
> }
> -
> +skip_asyncpf:
> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
> unsigned long pa;
> /* Size alignment is implied but just to make it explicit. */
> BUILD_BUG_ON(__alignof__(kvm_apic_eoi) < 4);
> + if (kvm_map_percpu_hv_shared(this_cpu_ptr(&kvm_apic_eoi),
> + sizeof(unsigned long)))
> + goto skip_pv_eoi;
> __this_cpu_write(kvm_apic_eoi, 0);
> pa = slow_virt_to_phys(this_cpu_ptr(&kvm_apic_eoi))
> | KVM_MSR_ENABLED;
> wrmsrl(MSR_KVM_PV_EOI_EN, pa);
> + printk(KERN_INFO"KVM setup PV EOI for cpu %d msr %lx\n",
> + smp_processor_id(), pa);
> }
> -
> +skip_pv_eoi:
> if (has_steal_clock)
> kvm_register_steal_time();
> }
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 0968d13..8d29910 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -773,6 +773,9 @@
> . = ALIGN(cacheline); \
> *(.data..percpu) \
> *(.data..percpu..shared_aligned) \
> + . = ALIGN(PAGE_SIZE); \
> + *(.data..percpu..hv_shared) \
> + . = ALIGN(PAGE_SIZE); \
> VMLINUX_SYMBOL(__per_cpu_end) = .;
>
> /**
> diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
> index 8f16299..5af366e 100644
> --- a/include/linux/percpu-defs.h
> +++ b/include/linux/percpu-defs.h
> @@ -172,6 +172,15 @@
> #define DEFINE_PER_CPU_READ_MOSTLY(type, name) \
> DEFINE_PER_CPU_SECTION(type, name, "..read_mostly")
>
> +/* Declaration/definition used for per-CPU variables that must be shared
> + * between hypervisor and guest OS.
> + */
> +#define DECLARE_PER_CPU_HV_SHARED(type, name) \
> + DECLARE_PER_CPU_SECTION(type, name, "..hv_shared")
> +
> +#define DEFINE_PER_CPU_HV_SHARED(type, name) \
> + DEFINE_PER_CPU_SECTION(type, name, "..hv_shared")
> +
> /*
> * Intermodule exports for per-CPU variables. sparse forgets about
> * address space across EXPORT_SYMBOL(), change EXPORT_SYMBOL() to
>
^ permalink raw reply
* Re: [RFC PATCH v2 30/32] kvm: svm: Add support for SEV DEBUG_ENCRYPT command
From: Paolo Bonzini @ 2017-03-16 11:03 UTC (permalink / raw)
To: Brijesh Singh, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <148846790758.2349.16768762953657853550.stgit@brijesh-build-machine>
On 02/03/2017 16:18, Brijesh Singh wrote:
> + data = (void *) get_zeroed_page(GFP_KERNEL);
The page does not need to be zeroed, does it?
> +
> + if ((len & 15) || (dst_addr & 15)) {
> + /* if destination address and length are not 16-byte
> + * aligned then:
> + * a) decrypt destination page into temporary buffer
> + * b) copy source data into temporary buffer at correct offset
> + * c) encrypt temporary buffer
> + */
> + ret = __sev_dbg_decrypt_page(kvm, dst_addr, data, &argp->error);
Ah, I see now you're using this function here for read-modify-write.
data is already pinned here, so even if you keep the function it makes
sense to push pinning out of __sev_dbg_decrypt_page and into
sev_dbg_decrypt.
> + if (ret)
> + goto err_3;
> + d_off = dst_addr & (PAGE_SIZE - 1);
> +
> + if (copy_from_user(data + d_off,
> + (uint8_t *)debug.src_addr, len)) {
> + ret = -EFAULT;
> + goto err_3;
> + }
> +
> + encrypt->length = PAGE_SIZE;
Why decrypt/re-encrypt all the page instead of just the 16 byte area
around the [dst_addr, dst_addr+len) range?
> + encrypt->src_addr = __psp_pa(data);
> + encrypt->dst_addr = __sev_page_pa(inpages[0]);
> + } else {
> + if (copy_from_user(data, (uint8_t *)debug.src_addr, len)) {
> + ret = -EFAULT;
> + goto err_3;
> + }
Do you need copy_from_user, or can you just pin/unpin memory as for
DEBUG_DECRYPT?
Paolo
> + d_off = dst_addr & (PAGE_SIZE - 1);
> + encrypt->length = len;
> + encrypt->src_addr = __psp_pa(data);
> + encrypt->dst_addr = __sev_page_pa(inpages[0]);
> + encrypt->dst_addr += d_off;
> + }
> +
> + encrypt->handle = sev_get_handle(kvm);
> + ret = sev_issue_cmd(kvm, SEV_CMD_DBG_ENCRYPT, encrypt, &argp->error);
^ permalink raw reply
* Re: [RFC PATCH v2 29/32] kvm: svm: Add support for SEV DEBUG_DECRYPT command
From: Paolo Bonzini @ 2017-03-16 10:54 UTC (permalink / raw)
To: Brijesh Singh, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <148846789744.2349.167641684941925238.stgit@brijesh-build-machine>
On 02/03/2017 16:18, Brijesh Singh wrote:
> +static int __sev_dbg_decrypt_page(struct kvm *kvm, unsigned long src,
> + void *dst, int *error)
> +{
> + inpages = sev_pin_memory(src, PAGE_SIZE, &npages);
> + if (!inpages) {
> + ret = -ENOMEM;
> + goto err_1;
> + }
> +
> + data->handle = sev_get_handle(kvm);
> + data->dst_addr = __psp_pa(dst);
> + data->src_addr = __sev_page_pa(inpages[0]);
> + data->length = PAGE_SIZE;
> +
> + ret = sev_issue_cmd(kvm, SEV_CMD_DBG_DECRYPT, data, error);
> + if (ret)
> + printk(KERN_ERR "SEV: DEBUG_DECRYPT %d (%#010x)\n",
> + ret, *error);
> + sev_unpin_memory(inpages, npages);
> +err_1:
> + kfree(data);
> + return ret;
> +}
> +
> +static int sev_dbg_decrypt(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + void *data;
> + int ret, offset, len;
> + struct kvm_sev_dbg debug;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + if (copy_from_user(&debug, (void *)argp->data,
> + sizeof(struct kvm_sev_dbg)))
> + return -EFAULT;
> + /*
> + * TODO: add support for decrypting length which crosses the
> + * page boundary.
> + */
> + offset = debug.src_addr & (PAGE_SIZE - 1);
> + if (offset + debug.length > PAGE_SIZE)
> + return -EINVAL;
> +
Please do add it, it doesn't seem very different from what you're doing
in LAUNCH_UPDATE_DATA. There's no need for a separate
__sev_dbg_decrypt_page function, you can just pin/unpin here and do a
per-page loop as in LAUNCH_UPDATE_DATA.
Paolo
^ permalink raw reply
* Re: [RFC PATCH v2 26/32] kvm: svm: Add support for SEV LAUNCH_UPDATE_DATA command
From: Paolo Bonzini @ 2017-03-16 10:48 UTC (permalink / raw)
To: Brijesh Singh, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <148846786714.2349.17724971671841396908.stgit@brijesh-build-machine>
On 02/03/2017 16:17, Brijesh Singh wrote:
> +static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen,
> + unsigned long *n)
> +{
> + struct page **pages;
> + int first, last;
> + unsigned long npages, pinned;
> +
> + /* Get number of pages */
> + first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
> + last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
> + npages = (last - first + 1);
> +
> + pages = kzalloc(npages * sizeof(struct page *), GFP_KERNEL);
> + if (!pages)
> + return NULL;
> +
> + /* pin the user virtual address */
> + down_read(¤t->mm->mmap_sem);
> + pinned = get_user_pages_fast(uaddr, npages, 1, pages);
> + up_read(¤t->mm->mmap_sem);
get_user_pages_fast, like get_user_pages_unlocked, must be called
without mmap_sem held.
> + if (pinned != npages) {
> + printk(KERN_ERR "SEV: failed to pin %ld pages (got %ld)\n",
> + npages, pinned);
> + goto err;
> + }
> +
> + *n = npages;
> + return pages;
> +err:
> + if (pinned > 0)
> + release_pages(pages, pinned, 0);
> + kfree(pages);
> +
> + return NULL;
> +}
>
> + /* the array of pages returned by get_user_pages() is a page-aligned
> + * memory. Since the user buffer is probably not page-aligned, we need
> + * to calculate the offset within a page for first update entry.
> + */
> + offset = uaddr & (PAGE_SIZE - 1);
> + len = min_t(size_t, (PAGE_SIZE - offset), ulen);
> + ulen -= len;
> +
> + /* update first page -
> + * special care need to be taken for the first page because we might
> + * be dealing with offset within the page
> + */
No need to special case the first page; just set "offset = 0" inside the
loop after the first iteration.
Paolo
> + data->handle = sev_get_handle(kvm);
> + data->length = len;
> + data->address = __sev_page_pa(inpages[0]) + offset;
> + ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA,
> + data, &argp->error);
> + if (ret)
> + goto err_3;
> +
> + /* update remaining pages */
> + for (i = 1; i < nr_pages; i++) {
> +
> + len = min_t(size_t, PAGE_SIZE, ulen);
> + ulen -= len;
> + data->length = len;
> + data->address = __sev_page_pa(inpages[i]);
> + ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA,
> + data, &argp->error);
> + if (ret)
> + goto err_3;
> + }
^ permalink raw reply
* Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
From: Michael Ellerman @ 2017-03-16 10:45 UTC (permalink / raw)
To: Daniel Axtens, linuxppc-dev, linux-crypto; +Cc: anton, Daniel Axtens
In-Reply-To: <20170315123737.20234-1-dja@axtens.net>
Daniel Axtens <dja@axtens.net> writes:
> The core nuts and bolts of the crc32c vpmsum algorithm will
> also work for a number of other CRC algorithms with different
> polynomials. Factor out the function into a new asm file.
>
> To handle multiple users of the function, a user simply
> provides constants, defines the name of their CRC function,
> and then #includes the core algorithm file.
>
> Cc: Anton Blanchard <anton@samba.org>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> --
>
> It's possible at this point to argue that the address
> of the constant tables should be passed in to the function,
> rather than doing this somewhat unconventional #include.
>
> However, we're about to add further #ifdef's back into the core
> that will be provided by the encapsulaing code, and which couldn't
> be done as a variable without performance loss.
> ---
> arch/powerpc/crypto/crc32-vpmsum_core.S | 726 ++++++++++++++++++++++++++++++++
> arch/powerpc/crypto/crc32c-vpmsum_asm.S | 714 +------------------------------
> 2 files changed, 729 insertions(+), 711 deletions(-)
> create mode 100644 arch/powerpc/crypto/crc32-vpmsum_core.S
So although this sits in arch/powerpc, it's heavy on the crypto which is
not my area of expertise (to say the least!), so I think it should
probably go via Herbert and the crypto tree?
cheers
^ permalink raw reply
* Re: [PATCH 2/4] crypto: powerpc - Re-enable non-REFLECTed CRCs
From: Michael Ellerman @ 2017-03-16 10:44 UTC (permalink / raw)
To: Daniel Axtens, linuxppc-dev, linux-crypto; +Cc: anton, Daniel Axtens
In-Reply-To: <20170315123737.20234-2-dja@axtens.net>
Daniel Axtens <dja@axtens.net> writes:
> When CRC32c was included in the kernel, Anton ripped out
> the #ifdefs around reflected polynomials, because CRC32c
> is always reflected. However, not all CRCs use reflection
> so we'd like to make it optional.
>
> Restore the REFLECT parts from Anton's original CRC32
> implementation (https://github.com/antonblanchard/crc32-vpmsum)
>
> That implementation is available under GPLv2+, so we're OK
> from a licensing point of view:
> https://github.com/antonblanchard/crc32-vpmsum/blob/master/LICENSE.TXT
It's also written by Anton and copyright IBM, so you (we (IBM)) could
always just relicense it anyway.
So doubly OK IMO.
cheers
^ permalink raw reply
* Re: [RFC PATCH v2 32/32] x86: kvm: Pin the guest memory when SEV is active
From: Paolo Bonzini @ 2017-03-16 10:38 UTC (permalink / raw)
To: Brijesh Singh, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <148846793743.2349.8478208161427437950.stgit@brijesh-build-machine>
On 02/03/2017 16:18, Brijesh Singh wrote:
> The SEV memory encryption engine uses a tweak such that two identical
> plaintexts at different location will have a different ciphertexts.
> So swapping or moving ciphertexts of two pages will not result in
> plaintexts being swapped. Relocating (or migrating) a physical backing pages
> for SEV guest will require some additional steps. The current SEV key
> management spec [1] does not provide commands to swap or migrate (move)
> ciphertexts. For now we pin the memory allocated for the SEV guest. In
> future when SEV key management spec provides the commands to support the
> page migration we can update the KVM code to remove the pinning logical
> without making any changes into userspace (qemu).
>
> The patch pins userspace memory when a new slot is created and unpin the
> memory when slot is removed.
>
> [1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf
This is not enough, because memory can be hidden temporarily from the
guest and remapped later. Think of a PCI BAR that is backed by RAM, or
also SMRAM. The pinning must be kept even in that case.
You need to add a pair of KVM_MEMORY_ENCRYPT_OPs (one that doesn't map
to a PSP operation), such as KVM_REGISTER/UNREGISTER_ENCRYPTED_RAM. In
QEMU you can use a RAMBlockNotifier to invoke the ioctls.
Paolo
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
> arch/x86/include/asm/kvm_host.h | 6 +++
> arch/x86/kvm/svm.c | 93 +++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 3 +
> 3 files changed, 102 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index fcc4710..9dc59f0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -723,6 +723,7 @@ struct kvm_sev_info {
> unsigned int handle; /* firmware handle */
> unsigned int asid; /* asid for this guest */
> int sev_fd; /* SEV device fd */
> + struct list_head pinned_memory_slot;
> };
>
> struct kvm_arch {
> @@ -1043,6 +1044,11 @@ struct kvm_x86_ops {
> void (*setup_mce)(struct kvm_vcpu *vcpu);
>
> int (*memory_encryption_op)(struct kvm *kvm, void __user *argp);
> +
> + void (*prepare_memory_region)(struct kvm *kvm,
> + struct kvm_memory_slot *memslot,
> + const struct kvm_userspace_memory_region *mem,
> + enum kvm_mr_change change);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 13996d6..ab973f9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -498,12 +498,21 @@ static inline bool gif_set(struct vcpu_svm *svm)
> }
>
> /* Secure Encrypted Virtualization */
> +struct kvm_sev_pinned_memory_slot {
> + struct list_head list;
> + unsigned long npages;
> + struct page **pages;
> + unsigned long userspace_addr;
> + short id;
> +};
> +
> static unsigned int max_sev_asid;
> static unsigned long *sev_asid_bitmap;
> static void sev_deactivate_handle(struct kvm *kvm);
> static void sev_decommission_handle(struct kvm *kvm);
> static int sev_asid_new(void);
> static void sev_asid_free(int asid);
> +static void sev_unpin_memory(struct page **pages, unsigned long npages);
> #define __sev_page_pa(x) ((page_to_pfn(x) << PAGE_SHIFT) | sme_me_mask)
>
> static bool kvm_sev_enabled(void)
> @@ -1544,9 +1553,25 @@ static inline int avic_free_vm_id(int id)
>
> static void sev_vm_destroy(struct kvm *kvm)
> {
> + struct list_head *pos, *q;
> + struct kvm_sev_pinned_memory_slot *pinned_slot;
> + struct list_head *head = &kvm->arch.sev_info.pinned_memory_slot;
> +
> if (!sev_guest(kvm))
> return;
>
> + /* if guest memory is pinned then unpin it now */
> + if (!list_empty(head)) {
> + list_for_each_safe(pos, q, head) {
> + pinned_slot = list_entry(pos,
> + struct kvm_sev_pinned_memory_slot, list);
> + sev_unpin_memory(pinned_slot->pages,
> + pinned_slot->npages);
> + list_del(pos);
> + kfree(pinned_slot);
> + }
> + }
> +
> /* release the firmware resources */
> sev_deactivate_handle(kvm);
> sev_decommission_handle(kvm);
> @@ -5663,6 +5688,8 @@ static int sev_pre_start(struct kvm *kvm, int *asid)
> }
> *asid = ret;
> ret = 0;
> +
> + INIT_LIST_HEAD(&kvm->arch.sev_info.pinned_memory_slot);
> }
>
> return ret;
> @@ -6189,6 +6216,71 @@ static int sev_launch_measure(struct kvm *kvm, struct kvm_sev_cmd *argp)
> return ret;
> }
>
> +static struct kvm_sev_pinned_memory_slot *sev_find_pinned_memory_slot(
> + struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> + struct kvm_sev_pinned_memory_slot *i;
> + struct list_head *head = &kvm->arch.sev_info.pinned_memory_slot;
> +
> + list_for_each_entry(i, head, list) {
> + if (i->userspace_addr == slot->userspace_addr &&
> + i->id == slot->id)
> + return i;
> + }
> +
> + return NULL;
> +}
> +
> +static void amd_prepare_memory_region(struct kvm *kvm,
> + struct kvm_memory_slot *memslot,
> + const struct kvm_userspace_memory_region *mem,
> + enum kvm_mr_change change)
> +{
> + struct kvm_sev_pinned_memory_slot *pinned_slot;
> + struct list_head *head = &kvm->arch.sev_info.pinned_memory_slot;
> +
> + mutex_lock(&kvm->lock);
> +
> + if (!sev_guest(kvm))
> + goto unlock;
> +
> + if (change == KVM_MR_CREATE) {
> +
> + if (!mem->memory_size)
> + goto unlock;
> +
> + pinned_slot = kmalloc(sizeof(*pinned_slot), GFP_KERNEL);
> + if (pinned_slot == NULL)
> + goto unlock;
> +
> + pinned_slot->pages = sev_pin_memory(mem->userspace_addr,
> + mem->memory_size, &pinned_slot->npages);
> + if (pinned_slot->pages == NULL) {
> + kfree(pinned_slot);
> + goto unlock;
> + }
> +
> + sev_clflush_pages(pinned_slot->pages, pinned_slot->npages);
> +
> + pinned_slot->id = memslot->id;
> + pinned_slot->userspace_addr = mem->userspace_addr;
> + list_add_tail(&pinned_slot->list, head);
> +
> + } else if (change == KVM_MR_DELETE) {
> +
> + pinned_slot = sev_find_pinned_memory_slot(kvm, memslot);
> + if (!pinned_slot)
> + goto unlock;
> +
> + sev_unpin_memory(pinned_slot->pages, pinned_slot->npages);
> + list_del(&pinned_slot->list);
> + kfree(pinned_slot);
> + }
> +
> +unlock:
> + mutex_unlock(&kvm->lock);
> +}
> +
> static int amd_memory_encryption_cmd(struct kvm *kvm, void __user *argp)
> {
> int r = -ENOTTY;
> @@ -6355,6 +6447,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .update_pi_irte = svm_update_pi_irte,
>
> .memory_encryption_op = amd_memory_encryption_cmd,
> + .prepare_memory_region = amd_prepare_memory_region,
> };
>
> static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6a737e9..e05069d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8195,6 +8195,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> const struct kvm_userspace_memory_region *mem,
> enum kvm_mr_change change)
> {
> + if (kvm_x86_ops->prepare_memory_region)
> + kvm_x86_ops->prepare_memory_region(kvm, memslot, mem, change);
> +
> return 0;
> }
>
>
^ permalink raw reply
* Re: [RFC PATCH v2 24/32] kvm: x86: prepare for SEV guest management API support
From: Paolo Bonzini @ 2017-03-16 10:33 UTC (permalink / raw)
To: Brijesh Singh, simon.guinot, linux-efi, kvm, rkrcmar, matt,
linux-pci, linus.walleij, gary.hook, linux-mm, paul.gortmaker,
hpa, cl, dan.j.williams, aarcange, sfr, andriy.shevchenko,
herbert, bhe, xemul, joro, x86, peterz, piotr.luc, mingo, msalter,
ross.zwisler, bp, dyoung, thomas.lendacky, jroedel, keescook,
arnd, toshi.kani, mathieu.desnoyers, luto
In-Reply-To: <148846784278.2349.17771314083820274411.stgit@brijesh-build-machine>
On 02/03/2017 16:17, Brijesh Singh wrote:
> ASID management:
> - Reserve asid range for SEV guest, SEV asid range is obtained through
> CPUID Fn8000_001f[ECX]. A non-SEV guest can use any asid outside the SEV
> asid range.
How is backwards compatibility handled?
> - SEV guest must have asid value within asid range obtained through CPUID.
> - SEV guest must have the same asid for all vcpu's. A TLB flush is required
> if different vcpu for the same ASID is to be run on the same host CPU.
[...]
> +
> + /* which host cpu was used for running this vcpu */
> + bool last_cpuid;
Should be unsigned int.
>
> + /* Assign the asid allocated for this SEV guest */
> + svm->vmcb->control.asid = asid;
> +
> + /* Flush guest TLB:
> + * - when different VMCB for the same ASID is to be run on the
> + * same host CPU
> + * or
> + * - this VMCB was executed on different host cpu in previous VMRUNs.
> + */
> + if (sd->sev_vmcbs[asid] != (void *)svm->vmcb ||
Why the cast?
> + svm->last_cpuid != cpu)
> + svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
If there is a match, you don't need to do anything else (neither reset
the asid, nor mark it as dirty, nor update the fields), so:
if (sd->sev_vmcbs[asid] == svm->vmcb &&
svm->last_cpuid == cpu)
return;
svm->last_cpuid = cpu;
sd->sev_vmcbs[asid] = svm->vmcb;
svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
svm->vmcb->control.asid = asid;
mark_dirty(svm->vmcb, VMCB_ASID);
(plus comments ;)).
Also, why not TLB_CONTROL_FLUSH_ASID if possible?
> + svm->last_cpuid = cpu;
> + sd->sev_vmcbs[asid] = (void *)svm->vmcb;
> +
> + mark_dirty(svm->vmcb, VMCB_ASID);
[...]
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index fef7d83..9df37a2 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1284,6 +1284,104 @@ struct kvm_s390_ucas_mapping {
> /* Memory Encryption Commands */
> #define KVM_MEMORY_ENCRYPT_OP _IOWR(KVMIO, 0xb8, unsigned long)
>
> +/* Secure Encrypted Virtualization mode */
> +enum sev_cmd_id {
Please add documentation in Documentation/virtual/kvm/memory_encrypt.txt.
Paolo
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox