From: Robin Holt <holt@sgi.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>,
	Ingo Molnar <mingo@elte.hu>, Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <clameter@sgi.com>,
	Jack Steiner <steiner@sgi.com>,
	linux-mm@kvack.org
Subject: Can get_user_pages( ,write=1, force=1, ) result in a read-only pte and _count=2?
Date: Wed, 18 Jun 2008 11:41:58 -0500
Message-ID: <20080618164158.GC10062@sgi.com>

I am running into a problem where I think a call to get_user_pages(...,
write=1, force=1,...) is returning a page that is mapped by a read-only
pte and has a page reference count of 2.  I have not yet trapped the
event, but I think I see one place where this _may_ be happening.

In the SLES10 kernel source:
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		unsigned long start, int len, int write, int force,
		struct page **pages, struct vm_area_struct **vmas)
{
...
retry:
			cond_resched();
			while (!(page = follow_page(vma, start, foll_flags))) {
				int ret;
				ret = __handle_mm_fault(mm, vma, start,
						foll_flags & FOLL_WRITE);
...
				/*
				 * The VM_FAULT_WRITE bit tells us that do_wp_page has
				 * broken COW when necessary, even if maybe_mkwrite
				 * decided not to set pte_write. We can thus safely do
				 * subsequent page lookups as if they were reads.
				 */
				if (ret & VM_FAULT_WRITE)
					foll_flags &= ~FOLL_WRITE;

				cond_resched();
			}

The case I am seeing is under heavy memory pressure.

I think the first pass at follow_page() failed and we called
__handle_mm_fault().  At the point in __handle_mm_fault() where the page
table is unlocked, there is a writable pte in the process's page table and
a struct page with a reference count of 1.  ret will have VM_FAULT_WRITE
set, so the get_user_pages() code will clear FOLL_WRITE from foll_flags.

Between that point and the second attempt at follow_page(), the page
gets swapped out.  The second attempt, now without FOLL_WRITE (but with
FOLL_GET set), faults the page back in as a read and so ends up with a
read-only pte and a reference count of 2.  Any subsequent write fault by
the process will then result in a COW break, leaving the process pointing
at a different page than the one get_user_pages() returned.
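
To make the suspected ordering concrete, here is a toy user-space model of
the sequence described above.  It is only a sketch and not kernel code:
every toy_* name, the TOY_* flag values, and the two-reference accounting
(one for the mapping, one for FOLL_GET) are assumptions made for
illustration, and the swap-out is forced at exactly the window in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel flags; the values are arbitrary. */
#define TOY_FOLL_WRITE     0x1u
#define TOY_FOLL_GET       0x2u
#define TOY_VM_FAULT_WRITE 0x1u

struct toy_page { int count; };

struct toy_pte {
	struct toy_page *page;	/* NULL means "not present" (e.g. swapped out) */
	bool writable;
};

struct toy_result {
	bool pte_writable_at_return;	/* pte state when the lookup loop exits */
	int count_at_return;		/* page count when the lookup loop exits */
	bool diverged_after_write;	/* did a later write fault move the pte? */
};

/* Lookup: fails if not present, or if a write is wanted on a read-only pte. */
static struct toy_page *toy_follow_page(struct toy_pte *pte, unsigned int flags)
{
	if (!pte->page)
		return NULL;
	if ((flags & TOY_FOLL_WRITE) && !pte->writable)
		return NULL;
	if (flags & TOY_FOLL_GET)
		pte->page->count++;
	return pte->page;
}

/* Fault: map the page if absent; on a write to a shared read-only page, COW. */
static unsigned int toy_handle_fault(struct toy_pte *pte, struct toy_page *backing,
				     struct toy_page *spare, bool write)
{
	if (!pte->page) {
		pte->page = backing;
		backing->count++;	/* reference held by the mapping */
		pte->writable = write;
	} else if (write && !pte->writable) {
		if (pte->page->count > 1) {	/* someone else holds a ref: copy */
			pte->page->count--;
			pte->page = spare;
			spare->count++;
		}
		pte->writable = true;
	}
	return write ? TOY_VM_FAULT_WRITE : 0;
}

/* Reclaim: drop the mapping and its reference. */
static void toy_swap_out(struct toy_pte *pte)
{
	if (pte->page) {
		pte->page->count--;
		pte->page = NULL;
		pte->writable = false;
	}
}

/* Mirrors the retry loop quoted above, optionally swapping out in the window. */
static struct toy_result toy_simulate(bool swap_out_in_window)
{
	struct toy_page original = { 0 }, copy = { 0 };
	struct toy_pte pte = { NULL, false };
	unsigned int foll_flags = TOY_FOLL_WRITE | TOY_FOLL_GET;
	struct toy_page *page;
	struct toy_result res;
	bool swapped = false;

	while (!(page = toy_follow_page(&pte, foll_flags))) {
		unsigned int ret = toy_handle_fault(&pte, &original, &copy,
					(foll_flags & TOY_FOLL_WRITE) != 0);

		if (ret & TOY_VM_FAULT_WRITE)
			foll_flags &= ~TOY_FOLL_WRITE;

		/* The window: page table unlocked, FOLL_WRITE already cleared. */
		if (swap_out_in_window && !swapped) {
			toy_swap_out(&pte);
			swapped = true;
		}
	}
	assert(page == &original);

	res.pte_writable_at_return = pte.writable;
	res.count_at_return = page->count;

	/* Later write by the process to the same address. */
	toy_handle_fault(&pte, &original, &copy, true);
	res.diverged_after_write = (pte.page != page);
	return res;
}
```

In this model, toy_simulate(true) leaves the loop with a read-only pte and
a count of 2, and the later write moves the pte to a different page, while
toy_simulate(false) leaves a writable pte that stays on the returned page.
Whether the real kernel can actually reach this ordering is exactly the
open question.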

Is this sequence plausible or am I missing something key?

If this sequence is plausible, I need to know how to either work around
this problem or if it should really be fixed in the kernel.

Thanks,
Robin Holt


