* [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock
2016-01-04 11:19 [PATCH v3 0/2] Allow gmap fault to retry Dominik Dingel
@ 2016-01-04 11:19 ` Dominik Dingel
2016-01-04 11:19 ` [PATCH 2/2] s390/mm: enable fixup_user_fault retrying Dominik Dingel
2016-01-04 17:08 ` [PATCH v3 0/2] Allow gmap fault to retry Andrea Arcangeli
2 siblings, 0 replies; 4+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
linux-kernel
With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to
pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we
ever unlocked mmap_sem.
This patch brings in the logic to handle retries as well as it cleans up
the current documentation. fixup_user_fault was not having the same
semantics as filemap_fault. It never indicated if a retry happened and so
a caller wasn't able to handle that case. So we now changed the behaviour
to always retry a locked mmap_sem.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
arch/s390/mm/pgtable.c | 8 +++++---
include/linux/mm.h | 5 +++--
kernel/futex.c | 2 +-
mm/gup.c | 30 +++++++++++++++++++++++++-----
4 files changed, 34 insertions(+), 11 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 54ef3bc..b15759c 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -585,7 +585,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
rc = vmaddr;
goto out_up;
}
- if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags)) {
+ if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
rc = -EFAULT;
goto out_up;
}
@@ -730,7 +730,8 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
break;
}
/* Get the page mapped */
- if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE)) {
+ if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
+ NULL)) {
rc = -EFAULT;
break;
}
@@ -805,7 +806,8 @@ retry:
if (!(pte_val(*ptep) & _PAGE_INVALID) &&
(pte_val(*ptep) & _PAGE_PROTECT)) {
pte_unmap_unlock(ptep, ptl);
- if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
+ if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
+ NULL)) {
up_read(&mm->mmap_sem);
return -EFAULT;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad77..7783073 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1163,7 +1163,8 @@ int invalidate_inode_page(struct page *page);
extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
- unsigned long address, unsigned int fault_flags);
+ unsigned long address, unsigned int fault_flags,
+ bool *unlocked);
#else
static inline int handle_mm_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
@@ -1175,7 +1176,7 @@ static inline int handle_mm_fault(struct mm_struct *mm,
}
static inline int fixup_user_fault(struct task_struct *tsk,
struct mm_struct *mm, unsigned long address,
- unsigned int fault_flags)
+ unsigned int fault_flags, bool *unlocked)
{
/* should never happen if there's no MMU */
BUG();
diff --git a/kernel/futex.c b/kernel/futex.c
index 684d754..fb640c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -639,7 +639,7 @@ static int fault_in_user_writeable(u32 __user *uaddr)
down_read(&mm->mmap_sem);
ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
- FAULT_FLAG_WRITE);
+ FAULT_FLAG_WRITE, NULL);
up_read(&mm->mmap_sem);
return ret < 0 ? ret : 0;
diff --git a/mm/gup.c b/mm/gup.c
index deafa2c..493d543 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -564,6 +564,8 @@ EXPORT_SYMBOL(__get_user_pages);
* @mm: mm_struct of target mm
* @address: user address
* @fault_flags:flags to pass down to handle_mm_fault()
+ * @unlocked: did we unlock the mmap_sem while retrying, maybe NULL if caller
+ * does not allow retry
*
* This is meant to be called in the specific scenario where for locking reasons
* we try to access user memory in atomic context (within a pagefault_disable()
@@ -575,22 +577,28 @@ EXPORT_SYMBOL(__get_user_pages);
* The main difference with get_user_pages() is that this function will
* unconditionally call handle_mm_fault() which will in turn perform all the
* necessary SW fixup of the dirty and young bits in the PTE, while
- * handle_mm_fault() only guarantees to update these in the struct page.
+ * get_user_pages() only guarantees to update these in the struct page.
*
* This is important for some architectures where those bits also gate the
* access permission to the page because they are maintained in software. On
* such architectures, gup() will not be enough to make a subsequent access
* succeed.
*
- * This has the same semantics wrt the @mm->mmap_sem as does filemap_fault().
+ * This function will not return with an unlocked mmap_sem. So it has not the
+ * same semantics wrt the @mm->mmap_sem as does filemap_fault().
*/
int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
- unsigned long address, unsigned int fault_flags)
+ unsigned long address, unsigned int fault_flags,
+ bool *unlocked)
{
struct vm_area_struct *vma;
vm_flags_t vm_flags;
- int ret;
+ int ret, major = 0;
+
+ if (unlocked)
+ fault_flags |= FAULT_FLAG_ALLOW_RETRY;
+retry:
vma = find_extend_vma(mm, address);
if (!vma || address < vma->vm_start)
return -EFAULT;
@@ -600,6 +608,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
return -EFAULT;
ret = handle_mm_fault(mm, vma, address, fault_flags);
+ major |= ret & VM_FAULT_MAJOR;
if (ret & VM_FAULT_ERROR) {
if (ret & VM_FAULT_OOM)
return -ENOMEM;
@@ -609,8 +618,19 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
return -EFAULT;
BUG();
}
+
+ if (ret & VM_FAULT_RETRY) {
+ down_read(&mm->mmap_sem);
+ if (!(fault_flags & FAULT_FLAG_TRIED)) {
+ *unlocked = true;
+ fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
+ fault_flags |= FAULT_FLAG_TRIED;
+ goto retry;
+ }
+ }
+
if (tsk) {
- if (ret & VM_FAULT_MAJOR)
+ if (major)
tsk->maj_flt++;
else
tsk->min_flt++;
--
2.3.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH 2/2] s390/mm: enable fixup_user_fault retrying
2016-01-04 11:19 [PATCH v3 0/2] Allow gmap fault to retry Dominik Dingel
2016-01-04 11:19 ` [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock Dominik Dingel
@ 2016-01-04 11:19 ` Dominik Dingel
2016-01-04 17:08 ` [PATCH v3 0/2] Allow gmap fault to retry Andrea Arcangeli
2 siblings, 0 replies; 4+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
linux-kernel
By passing a non-null flag we allow fixup_user_fault to retry, which
enables userfaultfd. As during these retries we might drop the mmap_sem we
need to check if that happened and redo the complete chain of actions.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
arch/s390/mm/pgtable.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index b15759c..3c5456d 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -578,17 +578,29 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
{
unsigned long vmaddr;
int rc;
+ bool unlocked;
down_read(&gmap->mm->mmap_sem);
+
+retry:
+ unlocked = false;
vmaddr = __gmap_translate(gmap, gaddr);
if (IS_ERR_VALUE(vmaddr)) {
rc = vmaddr;
goto out_up;
}
- if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
+ if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags,
+ &unlocked)) {
rc = -EFAULT;
goto out_up;
}
+ /*
+ * In the case that fixup_user_fault unlocked the mmap_sem during
+ * faultin redo __gmap_translate to not race with a map/unmap_segment.
+ */
+ if (unlocked)
+ goto retry;
+
rc = __gmap_link(gmap, gaddr, vmaddr);
out_up:
up_read(&gmap->mm->mmap_sem);
@@ -717,12 +729,14 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
spinlock_t *ptl;
pte_t *ptep, entry;
pgste_t pgste;
+ bool unlocked;
int rc = 0;
if ((gaddr & ~PAGE_MASK) || (len & ~PAGE_MASK))
return -EINVAL;
down_read(&gmap->mm->mmap_sem);
while (len) {
+ unlocked = false;
/* Convert gmap address and connect the page tables */
addr = __gmap_translate(gmap, gaddr);
if (IS_ERR_VALUE(addr)) {
@@ -731,10 +745,13 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
}
/* Get the page mapped */
if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
- NULL)) {
+ &unlocked)) {
rc = -EFAULT;
break;
}
+ /* While trying to map mmap_sem got unlocked. Let us retry */
+ if (unlocked)
+ continue;
rc = __gmap_link(gmap, gaddr, addr);
if (rc)
break;
@@ -795,9 +812,11 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
spinlock_t *ptl;
pgste_t old, new;
pte_t *ptep;
+ bool unlocked;
down_read(&mm->mmap_sem);
retry:
+ unlocked = false;
ptep = get_locked_pte(mm, addr, &ptl);
if (unlikely(!ptep)) {
up_read(&mm->mmap_sem);
@@ -806,8 +825,12 @@ retry:
if (!(pte_val(*ptep) & _PAGE_INVALID) &&
(pte_val(*ptep) & _PAGE_PROTECT)) {
pte_unmap_unlock(ptep, ptl);
+ /*
+ * We do not really care about unlocked. We will retry either
+ * way. But this allows fixup_user_fault to enable userfaultfd.
+ */
if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
- NULL)) {
+ &unlocked)) {
up_read(&mm->mmap_sem);
return -EFAULT;
}
--
2.3.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH v3 0/2] Allow gmap fault to retry
2016-01-04 11:19 [PATCH v3 0/2] Allow gmap fault to retry Dominik Dingel
2016-01-04 11:19 ` [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock Dominik Dingel
2016-01-04 11:19 ` [PATCH 2/2] s390/mm: enable fixup_user_fault retrying Dominik Dingel
@ 2016-01-04 17:08 ` Andrea Arcangeli
2 siblings, 0 replies; 4+ messages in thread
From: Andrea Arcangeli @ 2016-01-04 17:08 UTC (permalink / raw)
To: Dominik Dingel
Cc: Kirill A. Shutemov, Martin Schwidefsky, Christian Borntraeger,
Jason J. Herne, linux-s390, linux-mm, Andrew Morton,
David Rientjes, Eric B Munson, Naoya Horiguchi, Mel Gorman,
Heiko Carstens, Paolo Bonzini, linux-kernel
On Mon, Jan 04, 2016 at 12:19:53PM +0100, Dominik Dingel wrote:
> Hello,
>
> sorry for the delay since the last version.
>
> During Jasons work with postcopy migration support for s390 a problem regarding
> gmap faults was discovered.
>
> The gmap code will call fixup_user_fault which will end up always in
> handle_mm_fault. Till now we never cared about retries, but as the userfaultfd
> code kind of relies on it. this needs some fix.
>
> This patchset does not take care of the futex code. I will now look closer at
> this.
>
> Thanks,
> Dominik
>
> v2 -> v3:
> - In case of retrying check vma again
> - Do the accounting of major/minor faults once
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
>
> v1 -> v2:
> - Instread of passing the VM_FAULT_RETRY from fixup_user_fault we do retries
> within fixup_user_fault, like get_user_pages_locked do.
> - gmap code will now take retry if fixup_user_fault drops the lock.
>
> Dominik Dingel (2):
> mm: bring in additional flag for fixup_user_fault to signal unlock
> s390/mm: enable fixup_user_fault retrying
>
> arch/s390/mm/pgtable.c | 31 ++++++++++++++++++++++++++++---
> include/linux/mm.h | 5 +++--
> kernel/futex.c | 2 +-
> mm/gup.c | 30 +++++++++++++++++++++++++-----
> 4 files changed, 57 insertions(+), 11 deletions(-)
>
> --
> 2.3.0
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread