Date: Mon, 14 Dec 2015 19:20:14 +0200
From: "Michael S. Tsirkin"
To: Alexander Duyck
Cc: Lan Tianyu, Yang Zhang, Alex Williamson, kvm@vger.kernel.org,
	konrad.wilk@oracle.com, linux-pci@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	Alexander Graf, Alexander Duyck, "Dr. David Alan Gilbert"
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] x86: Create dma_mark_dirty to dirty pages used for DMA by VM guest
Message-ID: <20151214191303-mutt-send-email-mst@redhat.com>
References: <20151213212557.5410.48577.stgit@localhost.localdomain>
	<20151213212831.5410.84365.stgit@localhost.localdomain>
	<20151214113016-mutt-send-email-mst@redhat.com>

On Mon, Dec 14, 2015 at 08:34:00AM -0800, Alexander Duyck wrote:
> > This way distro can use a guest agent to disable
> > dirtying until before migration starts.
>
> Right.  For a v2 version I would definitely want to have some way to
> limit the scope of this.  My main reason for putting this out here is
> to start altering the course of discussions, since it seems like we
> weren't getting anywhere with the ixgbevf migration changes that were
> being proposed.

Absolutely, thanks for working on this.

> >> +	unsigned long pg_addr, start;
> >> +
> >> +	start = (unsigned long)addr;
> >> +	pg_addr = PAGE_ALIGN(start + size);
> >> +	start &= ~(sizeof(atomic_t) - 1);
> >> +
> >> +	/* trigger a write fault on each page, excluding first page */
> >> +	while ((pg_addr -= PAGE_SIZE) > start)
> >> +		atomic_add(0, (atomic_t *)pg_addr);
> >> +
> >> +	/* trigger a write fault on first word of DMA */
> >> +	atomic_add(0, (atomic_t *)start);
> >
> > start might not be aligned correctly for a cast to atomic_t.
> > It's harmless to do this for any memory, so I think you should
> > just do this for the 1st byte of all pages, including the first one.
>
> You may not have noticed it, but I actually aligned start in the line
> after pg_addr.

Yes, you did.  Using alignof would make it a bit more noticeable.

> However, instead of aligning to the start of the next
> atomic_t, I just masked off the lower bits so that we start at the
> DWORD that contains the first byte of the starting address.  The
> assumption here is that I cannot trigger any sort of fault, since if
> I have access to a given byte within a DWORD I will have access to
> the entire DWORD.

I'm curious where this assumption comes from.  Isn't access normally
controlled at page granularity, so that you can touch the beginning
of the page just as well?

> I coded this up so that the spots where we touch the
> memory should match up with addresses provided by the hardware to
> perform the DMA over the PCI bus.

Yes, but from the virt point of view there's no requirement to do it
like this.  You just need to touch each page.
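Something like the following untested sketch is all I have in mind
(kernel context assumed, PAGE_MASK/PAGE_ALIGN/atomic_add as usual;
the page start is always aligned well enough for the atomic_t cast):

static void dma_mark_dirty(void *addr, size_t size)
{
	unsigned long pg = (unsigned long)addr & PAGE_MASK;
	unsigned long end = PAGE_ALIGN((unsigned long)addr + size);

	/* one write fault per page, first page included */
	for (; pg < end; pg += PAGE_SIZE)
		atomic_add(0, (atomic_t *)pg);
}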
> Also, I intentionally ran from highest address to lowest, since that
> way we don't risk pushing the first cache line of the DMA buffer out
> of the L1 cache due to the PAGE_SIZE stride.
>
> - Alex

Interesting.  How does the order of access help with this?

By the way, if you are into these micro-optimizations, you might also
want to limit prefetch; to this end, you want to access the last
cache line of each page (see the sketch below).  And it's probably
worth benchmarking a bit rather than doing it all based on theory;
otherwise, keep the code simple in v1.

--
MST
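Untested sketch of that variant, for illustration only (SMP_CACHE_BYTES
stands in for the actual cache line size here):

static void dma_mark_dirty(void *addr, size_t size)
{
	unsigned long first = (unsigned long)addr & PAGE_MASK;
	unsigned long pg = PAGE_ALIGN((unsigned long)addr + size);

	if (!size)
		return;

	/* highest to lowest, write fault on the last line of each page */
	do {
		pg -= PAGE_SIZE;
		atomic_add(0, (atomic_t *)(pg + PAGE_SIZE - SMP_CACHE_BYTES));
	} while (pg > first);
}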