From: "Jörn Engel" <joern@logfs.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: mgorman@suse.de, linux-mm@kvack.org
Subject: Re: [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec
Date: Mon, 5 Aug 2013 13:21:43 -0400 [thread overview]
Message-ID: <20130805172142.GB470@logfs.org> (raw)
In-Reply-To: <1375713125-18163-4-git-send-email-vbabka@suse.cz>
On Mon, 5 August 2013 16:32:02 +0200, Vlastimil Babka wrote:
>
> /*
> + * Munlock a batch of pages from the same zone
> + *
> + * The work is split to two main phases. First phase clears the Mlocked flag
> + * and attempts to isolate the pages, all under a single zone lru lock.
> + * The second phase finishes the munlock only for pages where isolation
> + * succeeded.
> + */
> +static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
> +{
> + int i;
> + int nr = pagevec_count(pvec);
> +
> + /* Phase 1: page isolation */
> + spin_lock_irq(&zone->lru_lock);
> + for (i = 0; i < nr; i++) {
> + struct page *page = pvec->pages[i];
> +
> + if (TestClearPageMlocked(page)) {
> + struct lruvec *lruvec;
> + int lru;
> +
> + /* we have disabled interrupts */
> + __mod_zone_page_state(zone, NR_MLOCK, -1);
> +
> + switch (__isolate_lru_page(page,
> + ISOLATE_UNEVICTABLE)) {
> + case 0:
> + lruvec = mem_cgroup_page_lruvec(page, zone);
> + lru = page_lru(page);
> + del_page_from_lru_list(page, lruvec, lru);
> + break;
> +
> + case -EINVAL:
> + __munlock_isolation_failed(page);
> + goto skip_munlock;
> +
> + default:
> + BUG();
> + }
On purely aesthetic grounds I don't like the switch too much. A bit
more serious is that you don't handle -EBUSY gracefully. I guess you
would have to mlock() the empty zero page to excercise this code path.
> + } else {
> +skip_munlock:
> + /*
> + * We won't be munlocking this page in the next phase
> + * but we still need to release the follow_page_mask()
> + * pin.
> + */
> + pvec->pages[i] = NULL;
> + put_page(page);
> + }
> + }
> + spin_unlock_irq(&zone->lru_lock);
> +
> + /* Phase 2: page munlock and putback */
> + for (i = 0; i < nr; i++) {
> + struct page *page = pvec->pages[i];
> +
> + if (unlikely(!page))
> + continue;
Whenever I see likely() or unlikely() I wonder whether it really makes
a difference or whether it is just cargo-cult programming. My best
guess is that about half of them are cargo-cult.
> + lock_page(page);
> + __munlock_isolated_page(page);
> + unlock_page(page);
> + put_page(page); /* pin from follow_page_mask() */
> + }
> + pagevec_reinit(pvec);
> +}
> +
> +/*
> * munlock_vma_pages_range() - munlock all pages in the vma range.'
> * @vma - vma containing range to be munlock()ed.
> * @start - start address in @vma of the range
> @@ -230,11 +315,16 @@ static int __mlock_posix_error_return(long retval)
> void munlock_vma_pages_range(struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> + struct pagevec pvec;
> + struct zone *zone = NULL;
> +
> + pagevec_init(&pvec, 0);
> vma->vm_flags &= ~VM_LOCKED;
>
> while (start < end) {
> struct page *page;
> unsigned int page_mask, page_increm;
> + struct zone *pagezone;
>
> /*
> * Although FOLL_DUMP is intended for get_dump_page(),
> @@ -246,20 +336,47 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
> page = follow_page_mask(vma, start, FOLL_GET | FOLL_DUMP,
> &page_mask);
> if (page && !IS_ERR(page)) {
> - lock_page(page);
> - /*
> - * Any THP page found by follow_page_mask() may have
> - * gotten split before reaching munlock_vma_page(),
> - * so we need to recompute the page_mask here.
> - */
> - page_mask = munlock_vma_page(page);
> - unlock_page(page);
> - put_page(page);
> + pagezone = page_zone(page);
> + /* The whole pagevec must be in the same zone */
> + if (pagezone != zone) {
> + if (pagevec_count(&pvec))
> + __munlock_pagevec(&pvec, zone);
> + zone = pagezone;
> + }
> + if (PageTransHuge(page)) {
> + /*
> + * THP pages are not handled by pagevec due
> + * to their possible split (see below).
> + */
> + if (pagevec_count(&pvec))
> + __munlock_pagevec(&pvec, zone);
Should you re-initialize the pvec after this call?
> + lock_page(page);
> + /*
> + * Any THP page found by follow_page_mask() may
> + * have gotten split before reaching
> + * munlock_vma_page(), so we need to recompute
> + * the page_mask here.
> + */
> + page_mask = munlock_vma_page(page);
> + unlock_page(page);
> + put_page(page); /* follow_page_mask() */
> + } else {
> + /*
> + * Non-huge pages are handled in batches
> + * via pagevec. The pin from
> + * follow_page_mask() prevents them from
> + * collapsing by THP.
> + */
> + if (pagevec_add(&pvec, page) == 0)
> + __munlock_pagevec(&pvec, zone);
> + }
> }
> page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask);
> start += page_increm * PAGE_SIZE;
> cond_resched();
> }
> + if (pagevec_count(&pvec))
> + __munlock_pagevec(&pvec, zone);
> }
The rest looks good to my untrained eyes.
JA?rn
--
One of my most productive days was throwing away 1000 lines of code.
-- Ken Thompson.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-08-05 18:54 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 1/6] mm: putback_lru_page: remove unnecessary call to page_lru_base_type() Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 2/6] mm: munlock: remove unnecessary call to lru_add_drain() Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec Vlastimil Babka
2013-08-05 17:21 ` Jörn Engel [this message]
2013-08-06 13:27 ` Vlastimil Babka
2013-08-06 16:21 ` Jörn Engel
2013-08-05 14:32 ` [RFC PATCH 4/6] mm: munlock: batch NR_MLOCK zone state updates Vlastimil Babka
2013-08-05 17:23 ` Jörn Engel
2013-08-05 14:32 ` [RFC PATCH 5/6] mm: munlock: bypass per-cpu pvec for putback_lru_page Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 6/6] mm: munlock: remove redundant get_page/put_page pair on the fast path Vlastimil Babka
2013-08-05 17:31 ` [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Jörn Engel
2013-08-06 16:39 ` Jörn Engel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130805172142.GB470@logfs.org \
--to=joern@logfs.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.