All of lore.kernel.org
 help / color / mirror / Atom feed
From: Haggai Eran <haggaie@mellanox.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>, <linux-mm@kvack.org>,
	<kvm@vger.kernel.org>,
	Alex Fishman <alex.fishman@ravellosystems.com>,
	Mike Rapoport <mike.rapoport@ravellosystems.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Sagi Grimberg <sagig@mellanox.com>
Subject: Re: set_pte_at_notify regression
Date: Sun, 12 Jan 2014 13:56:12 +0200	[thread overview]
Message-ID: <52D282DC.6050902@mellanox.com> (raw)
In-Reply-To: <20140110165705.GE1141@redhat.com>

Hi,

On 10/01/2014 18:57, Andrea Arcangeli wrote:
> Hi!
>
> On Fri, Jan 10, 2014 at 06:38:06PM +0200, Izik Eidus wrote:
>> It look like commit 6bdb913f0a70a4dfb7f066fb15e2d6f960701d00 break the 
>> semantic of set_pte_at_notify.
>> The change of calling first to mmu_notifier_invalidate_range_start, then 
>> to set_pte_at_notify, and then to mmu_notifier_invalidate_range_end
>> not only increase the amount of locks kvm have to take and release by 
>> factor of 3, but in addition mmu_notifier_invalidate_range_start is zapping
>> the pte entry from kvm, so when set_pte_at_notify get called, it doesn`t 
>> have any spte to set and it acctuly get called for nothing, the result is
>> increasing of vmexits for kvm from both do_wp_page and replace_page, and 
>> broken semantic of set_pte_at_notify.
>
> Agreed.
>
> I would suggest to change set_pte_at_notify to return if change_pte
> was missing in some mmu notifier attached to this mm, so we can do
> something like:
>
>    ptep = page_check_address(page, mm, addr, &ptl, 0);
>    [..]
>    notify_missing = false;
>    if (... ) {
>       	entry = ptep_clear_flush(...);
>         [..]
> 	notify_missing = set_pte_at_notify(mm, addr, ptep, entry);
>    }
>    pte_unmap_unlock(ptep, ptl);
>    if (notify_missing)
>    	mmu_notifier_invalidate_page_if_missing_change_pte(mm, addr);
>
> and drop the range calls. This will provide sleepability and at the
> same time it won't screw the ability of change_pte to update sptes (by
> leaving those established by the time change_pte runs).

I think it would be better for notifiers that do not support change_pte
to keep getting both range_start and range_end notifiers. Otherwise, the
invalidate_page notifier might end up marking the old page as dirty
after it was already replaced in the primary page table.

Perhaps we can have a flag in the mmu_notifier, similar to the
notify_missing returned value here, that determines in these cases
whether to call the invalidate_range_start/end pair, or just the
set_pte_at_notify.

Thanks,
Haggai

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Haggai Eran <haggaie@mellanox.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>,
	linux-mm@kvack.org, kvm@vger.kernel.org,
	Alex Fishman <alex.fishman@ravellosystems.com>,
	Mike Rapoport <mike.rapoport@ravellosystems.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Sagi Grimberg <sagig@mellanox.com>
Subject: Re: set_pte_at_notify regression
Date: Sun, 12 Jan 2014 13:56:12 +0200	[thread overview]
Message-ID: <52D282DC.6050902@mellanox.com> (raw)
In-Reply-To: <20140110165705.GE1141@redhat.com>

Hi,

On 10/01/2014 18:57, Andrea Arcangeli wrote:
> Hi!
>
> On Fri, Jan 10, 2014 at 06:38:06PM +0200, Izik Eidus wrote:
>> It look like commit 6bdb913f0a70a4dfb7f066fb15e2d6f960701d00 break the 
>> semantic of set_pte_at_notify.
>> The change of calling first to mmu_notifier_invalidate_range_start, then 
>> to set_pte_at_notify, and then to mmu_notifier_invalidate_range_end
>> not only increase the amount of locks kvm have to take and release by 
>> factor of 3, but in addition mmu_notifier_invalidate_range_start is zapping
>> the pte entry from kvm, so when set_pte_at_notify get called, it doesn`t 
>> have any spte to set and it acctuly get called for nothing, the result is
>> increasing of vmexits for kvm from both do_wp_page and replace_page, and 
>> broken semantic of set_pte_at_notify.
>
> Agreed.
>
> I would suggest to change set_pte_at_notify to return if change_pte
> was missing in some mmu notifier attached to this mm, so we can do
> something like:
>
>    ptep = page_check_address(page, mm, addr, &ptl, 0);
>    [..]
>    notify_missing = false;
>    if (... ) {
>       	entry = ptep_clear_flush(...);
>         [..]
> 	notify_missing = set_pte_at_notify(mm, addr, ptep, entry);
>    }
>    pte_unmap_unlock(ptep, ptl);
>    if (notify_missing)
>    	mmu_notifier_invalidate_page_if_missing_change_pte(mm, addr);
>
> and drop the range calls. This will provide sleepability and at the
> same time it won't screw the ability of change_pte to update sptes (by
> leaving those established by the time change_pte runs).

I think it would be better for notifiers that do not support change_pte
to keep getting both range_start and range_end notifiers. Otherwise, the
invalidate_page notifier might end up marking the old page as dirty
after it was already replaced in the primary page table.

Perhaps we can have a flag in the mmu_notifier, similar to the
notify_missing returned value here, that determines in these cases
whether to call the invalidate_range_start/end pair, or just the
set_pte_at_notify.

Thanks,
Haggai

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-01-12 11:56 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-10 16:38 set_pte_at_notify regression Izik Eidus
2014-01-10 16:57 ` Andrea Arcangeli
2014-01-12 11:56   ` Haggai Eran [this message]
2014-01-12 11:56     ` Haggai Eran
2014-01-12 17:50     ` Andrea Arcangeli
2014-01-16  7:52       ` Haggai Eran
2014-01-16  7:52         ` Haggai Eran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52D282DC.6050902@mellanox.com \
    --to=haggaie@mellanox.com \
    --cc=aarcange@redhat.com \
    --cc=alex.fishman@ravellosystems.com \
    --cc=izik.eidus@ravellosystems.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.rapoport@ravellosystems.com \
    --cc=ogerlitz@mellanox.com \
    --cc=sagig@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.