All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Wang <wei.w.wang@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, david@redhat.com,
	dave.hansen@intel.com, cornelia.huck@de.ibm.com,
	mgorman@techsingularity.net, aarcange@redhat.com,
	amit.shah@redhat.com, pbonzini@redhat.com,
	liliang.opensource@gmail.com
Subject: Re: [PATCH kernel v8 3/4] mm: add inerface to offer info about unused pages
Date: Fri, 31 Mar 2017 17:53:00 +0800	[thread overview]
Message-ID: <58DE26FC.7090403@intel.com> (raw)
In-Reply-To: <20170329204418-mutt-send-email-mst@kernel.org>

On 03/30/2017 01:48 AM, Michael S. Tsirkin wrote:
> On Fri, Mar 17, 2017 at 02:55:33PM +0800, Wei Wang wrote:
>> On 03/17/2017 05:28 AM, Andrew Morton wrote:
>>> On Thu, 16 Mar 2017 15:08:46 +0800 Wei Wang <wei.w.wang@intel.com> wrote:
>>>
>>>> From: Liang Li <liang.z.li@intel.com>
>>>>
>>>> This patch adds a function to provides a snapshot of the present system
>>>> unused pages. An important usage of this function is to provide the
>>>> unsused pages to the Live migration thread, which skips the transfer of
>>>> thoses unused pages. Newly used pages can be re-tracked by the dirty
>>>> page logging mechanisms.
>>> I don't think this will be useful for anything other than
>>> virtio-balloon.  I guess it would be better to keep this code in the
>>> virtio-balloon driver if possible, even though that's rather a layering
>>> violation :( What would have to be done to make that possible?  Perhaps
>>> we can put some *small* helpers into page_alloc.c to prevent things
>>> from becoming too ugly.
>> The patch description was too narrowed and may have caused some
>> confusion, sorry about that. This function is aimed to be generic. I
>> agree with the description suggested by Michael.
>>
>> Since the main body of the function is related to operating on the
>> free_list. I think it is better to have them located here.
>> Small helpers may be less efficient and thereby causing some
>> performance loss as well.
>> I think one improvement we can make is to remove the "chunk format"
>> related things from this function. The function can generally offer the
>> base pfn to the caller's recording buffer. Then it will be the caller's
>> responsibility to format the pfn if they need.
> Sounds good at a high level, but we'd have to see the implementation
> to judge it properly.
>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -4498,6 +4498,120 @@ void show_free_areas(unsigned int filter)
>>>>    	show_swap_cache_info();
>>>>    }
>>>> +static int __record_unused_pages(struct zone *zone, int order,
>>>> +				 __le64 *buf, unsigned int size,
>>>> +				 unsigned int *offset, bool part_fill)
>>>> +{
>>>> +	unsigned long pfn, flags;
>>>> +	int t, ret = 0;
>>>> +	struct list_head *curr;
>>>> +	__le64 *chunk;
>>>> +
>>>> +	if (zone_is_empty(zone))
>>>> +		return 0;
>>>> +
>>>> +	spin_lock_irqsave(&zone->lock, flags);
>>>> +
>>>> +	if (*offset + zone->free_area[order].nr_free > size && !part_fill) {
>>>> +		ret = -ENOSPC;
>>>> +		goto out;
>>>> +	}
>>>> +	for (t = 0; t < MIGRATE_TYPES; t++) {
>>>> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
>>>> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
>>>> +			chunk = buf + *offset;
>>>> +			if (*offset + 2 > size) {
>>>> +				ret = -ENOSPC;
>>>> +				goto out;
>>>> +			}
>>>> +			/* Align to the chunk format used in virtio-balloon */
>>>> +			*chunk = cpu_to_le64(pfn << 12);
>>>> +			*(chunk + 1) = cpu_to_le64((1 << order) << 12);
>>>> +			*offset += 2;
>>>> +		}
>>>> +	}
>>>> +
>>>> +out:
>>>> +	spin_unlock_irqrestore(&zone->lock, flags);
>>>> +
>>>> +	return ret;
>>>> +}
>>> This looks like it could disable interrupts for a long time.  Too long?
>> What do you think if we give "budgets" to the above function?
>> For example, budget=1000, and there are 2000 nodes on the list.
>> record() returns with "incomplete" status in the first round, along with the
>> status info, "*continue_node".
>>
>> *continue_node: pointer to the starting node of the leftover. If
>> *continue_node
>> has been used at the time of the second call (i.e. continue_node->next ==
>> NULL),
>> which implies that the previous 1000 nodes have been used, then the record()
>> function can simply start from the head of the list.
>>
>> It is up to the caller whether it needs to continue the second round
>> when getting "incomplete".
> It might be cleaner to add APIs to
> 	- start iteration
> 	- do one step
> 	- end iteration
>
> caller can then iterate without too many issues
>

OK. I will re-implement it with this simple one - get only one node(page 
block) from the list in each call, and check if the time would increase 
a lot in comparison to v8.

Best,
Wei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Wei Wang <wei.w.wang@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, david@redhat.com,
	dave.hansen@intel.com, cornelia.huck@de.ibm.com,
	mgorman@techsingularity.net, aarcange@redhat.com,
	amit.shah@redhat.com, pbonzini@redhat.com,
	liliang.opensource@gmail.com
Subject: Re: [PATCH kernel v8 3/4] mm: add inerface to offer info about unused pages
Date: Fri, 31 Mar 2017 17:53:00 +0800	[thread overview]
Message-ID: <58DE26FC.7090403@intel.com> (raw)
In-Reply-To: <20170329204418-mutt-send-email-mst@kernel.org>

On 03/30/2017 01:48 AM, Michael S. Tsirkin wrote:
> On Fri, Mar 17, 2017 at 02:55:33PM +0800, Wei Wang wrote:
>> On 03/17/2017 05:28 AM, Andrew Morton wrote:
>>> On Thu, 16 Mar 2017 15:08:46 +0800 Wei Wang <wei.w.wang@intel.com> wrote:
>>>
>>>> From: Liang Li <liang.z.li@intel.com>
>>>>
>>>> This patch adds a function to provides a snapshot of the present system
>>>> unused pages. An important usage of this function is to provide the
>>>> unsused pages to the Live migration thread, which skips the transfer of
>>>> thoses unused pages. Newly used pages can be re-tracked by the dirty
>>>> page logging mechanisms.
>>> I don't think this will be useful for anything other than
>>> virtio-balloon.  I guess it would be better to keep this code in the
>>> virtio-balloon driver if possible, even though that's rather a layering
>>> violation :( What would have to be done to make that possible?  Perhaps
>>> we can put some *small* helpers into page_alloc.c to prevent things
>>> from becoming too ugly.
>> The patch description was too narrowed and may have caused some
>> confusion, sorry about that. This function is aimed to be generic. I
>> agree with the description suggested by Michael.
>>
>> Since the main body of the function is related to operating on the
>> free_list. I think it is better to have them located here.
>> Small helpers may be less efficient and thereby causing some
>> performance loss as well.
>> I think one improvement we can make is to remove the "chunk format"
>> related things from this function. The function can generally offer the
>> base pfn to the caller's recording buffer. Then it will be the caller's
>> responsibility to format the pfn if they need.
> Sounds good at a high level, but we'd have to see the implementation
> to judge it properly.
>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -4498,6 +4498,120 @@ void show_free_areas(unsigned int filter)
>>>>    	show_swap_cache_info();
>>>>    }
>>>> +static int __record_unused_pages(struct zone *zone, int order,
>>>> +				 __le64 *buf, unsigned int size,
>>>> +				 unsigned int *offset, bool part_fill)
>>>> +{
>>>> +	unsigned long pfn, flags;
>>>> +	int t, ret = 0;
>>>> +	struct list_head *curr;
>>>> +	__le64 *chunk;
>>>> +
>>>> +	if (zone_is_empty(zone))
>>>> +		return 0;
>>>> +
>>>> +	spin_lock_irqsave(&zone->lock, flags);
>>>> +
>>>> +	if (*offset + zone->free_area[order].nr_free > size && !part_fill) {
>>>> +		ret = -ENOSPC;
>>>> +		goto out;
>>>> +	}
>>>> +	for (t = 0; t < MIGRATE_TYPES; t++) {
>>>> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
>>>> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
>>>> +			chunk = buf + *offset;
>>>> +			if (*offset + 2 > size) {
>>>> +				ret = -ENOSPC;
>>>> +				goto out;
>>>> +			}
>>>> +			/* Align to the chunk format used in virtio-balloon */
>>>> +			*chunk = cpu_to_le64(pfn << 12);
>>>> +			*(chunk + 1) = cpu_to_le64((1 << order) << 12);
>>>> +			*offset += 2;
>>>> +		}
>>>> +	}
>>>> +
>>>> +out:
>>>> +	spin_unlock_irqrestore(&zone->lock, flags);
>>>> +
>>>> +	return ret;
>>>> +}
>>> This looks like it could disable interrupts for a long time.  Too long?
>> What do you think if we give "budgets" to the above function?
>> For example, budget=1000, and there are 2000 nodes on the list.
>> record() returns with "incomplete" status in the first round, along with the
>> status info, "*continue_node".
>>
>> *continue_node: pointer to the starting node of the leftover. If
>> *continue_node
>> has been used at the time of the second call (i.e. continue_node->next ==
>> NULL),
>> which implies that the previous 1000 nodes have been used, then the record()
>> function can simply start from the head of the list.
>>
>> It is up to the caller whether it needs to continue the second round
>> when getting "incomplete".
> It might be cleaner to add APIs to
> 	- start iteration
> 	- do one step
> 	- end iteration
>
> caller can then iterate without too many issues
>

OK. I will re-implement it with this simple one - get only one node(page 
block) from the list in each call, and check if the time would increase 
a lot in comparison to v8.

Best,
Wei

WARNING: multiple messages have this Message-ID (diff)
From: Wei Wang <wei.w.wang@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, david@redhat.com,
	dave.hansen@intel.com, cornelia.huck@de.ibm.com,
	mgorman@techsingularity.net, aarcange@redhat.com,
	amit.shah@redhat.com, pbonzini@redhat.com,
	liliang.opensource@gmail.com
Subject: Re: [Qemu-devel] [PATCH kernel v8 3/4] mm: add inerface to offer info about unused pages
Date: Fri, 31 Mar 2017 17:53:00 +0800	[thread overview]
Message-ID: <58DE26FC.7090403@intel.com> (raw)
In-Reply-To: <20170329204418-mutt-send-email-mst@kernel.org>

On 03/30/2017 01:48 AM, Michael S. Tsirkin wrote:
> On Fri, Mar 17, 2017 at 02:55:33PM +0800, Wei Wang wrote:
>> On 03/17/2017 05:28 AM, Andrew Morton wrote:
>>> On Thu, 16 Mar 2017 15:08:46 +0800 Wei Wang <wei.w.wang@intel.com> wrote:
>>>
>>>> From: Liang Li <liang.z.li@intel.com>
>>>>
>>>> This patch adds a function to provides a snapshot of the present system
>>>> unused pages. An important usage of this function is to provide the
>>>> unsused pages to the Live migration thread, which skips the transfer of
>>>> thoses unused pages. Newly used pages can be re-tracked by the dirty
>>>> page logging mechanisms.
>>> I don't think this will be useful for anything other than
>>> virtio-balloon.  I guess it would be better to keep this code in the
>>> virtio-balloon driver if possible, even though that's rather a layering
>>> violation :( What would have to be done to make that possible?  Perhaps
>>> we can put some *small* helpers into page_alloc.c to prevent things
>>> from becoming too ugly.
>> The patch description was too narrowed and may have caused some
>> confusion, sorry about that. This function is aimed to be generic. I
>> agree with the description suggested by Michael.
>>
>> Since the main body of the function is related to operating on the
>> free_list. I think it is better to have them located here.
>> Small helpers may be less efficient and thereby causing some
>> performance loss as well.
>> I think one improvement we can make is to remove the "chunk format"
>> related things from this function. The function can generally offer the
>> base pfn to the caller's recording buffer. Then it will be the caller's
>> responsibility to format the pfn if they need.
> Sounds good at a high level, but we'd have to see the implementation
> to judge it properly.
>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -4498,6 +4498,120 @@ void show_free_areas(unsigned int filter)
>>>>    	show_swap_cache_info();
>>>>    }
>>>> +static int __record_unused_pages(struct zone *zone, int order,
>>>> +				 __le64 *buf, unsigned int size,
>>>> +				 unsigned int *offset, bool part_fill)
>>>> +{
>>>> +	unsigned long pfn, flags;
>>>> +	int t, ret = 0;
>>>> +	struct list_head *curr;
>>>> +	__le64 *chunk;
>>>> +
>>>> +	if (zone_is_empty(zone))
>>>> +		return 0;
>>>> +
>>>> +	spin_lock_irqsave(&zone->lock, flags);
>>>> +
>>>> +	if (*offset + zone->free_area[order].nr_free > size && !part_fill) {
>>>> +		ret = -ENOSPC;
>>>> +		goto out;
>>>> +	}
>>>> +	for (t = 0; t < MIGRATE_TYPES; t++) {
>>>> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
>>>> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
>>>> +			chunk = buf + *offset;
>>>> +			if (*offset + 2 > size) {
>>>> +				ret = -ENOSPC;
>>>> +				goto out;
>>>> +			}
>>>> +			/* Align to the chunk format used in virtio-balloon */
>>>> +			*chunk = cpu_to_le64(pfn << 12);
>>>> +			*(chunk + 1) = cpu_to_le64((1 << order) << 12);
>>>> +			*offset += 2;
>>>> +		}
>>>> +	}
>>>> +
>>>> +out:
>>>> +	spin_unlock_irqrestore(&zone->lock, flags);
>>>> +
>>>> +	return ret;
>>>> +}
>>> This looks like it could disable interrupts for a long time.  Too long?
>> What do you think if we give "budgets" to the above function?
>> For example, budget=1000, and there are 2000 nodes on the list.
>> record() returns with "incomplete" status in the first round, along with the
>> status info, "*continue_node".
>>
>> *continue_node: pointer to the starting node of the leftover. If
>> *continue_node
>> has been used at the time of the second call (i.e. continue_node->next ==
>> NULL),
>> which implies that the previous 1000 nodes have been used, then the record()
>> function can simply start from the head of the list.
>>
>> It is up to the caller whether it needs to continue the second round
>> when getting "incomplete".
> It might be cleaner to add APIs to
> 	- start iteration
> 	- do one step
> 	- end iteration
>
> caller can then iterate without too many issues
>

OK. I will re-implement it with this simple one - get only one node(page 
block) from the list in each call, and check if the time would increase 
a lot in comparison to v8.

Best,
Wei

  reply	other threads:[~2017-03-31  9:53 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-16  7:08 [PATCH kernel v8 0/4] Extend virtio-balloon for fast (de)inflating & fast live migration Wei Wang
2017-03-16  7:08 ` [Qemu-devel] " Wei Wang
2017-03-16  7:08 ` Wei Wang
2017-03-16  7:08 ` [PATCH kernel v8 1/4] virtio-balloon: deflate via a page list Wei Wang
2017-03-16  7:08   ` [Qemu-devel] " Wei Wang
2017-03-16  7:08   ` Wei Wang
2017-03-16  7:08 ` Wei Wang
2017-03-16  7:08 ` [PATCH kernel v8 2/4] virtio-balloon: VIRTIO_BALLOON_F_CHUNK_TRANSFER Wei Wang
2017-03-16  7:08   ` [Qemu-devel] " Wei Wang
2017-03-16  7:08   ` Wei Wang
2017-04-05  3:31   ` Wang, Wei W
2017-04-05  3:31     ` [Qemu-devel] " Wang, Wei W
2017-04-05  3:31     ` Wang, Wei W
2017-04-05  3:53     ` Michael S. Tsirkin
2017-04-05  3:53       ` [Qemu-devel] " Michael S. Tsirkin
2017-04-05  3:53       ` Michael S. Tsirkin
2017-04-05  3:53       ` Michael S. Tsirkin
2017-04-05  4:31       ` Wang, Wei W
2017-04-05  4:31         ` [Qemu-devel] " Wang, Wei W
2017-04-05  4:31         ` Wang, Wei W
2017-04-05  7:47         ` Wang, Wei W
2017-04-05  7:47           ` [Qemu-devel] " Wang, Wei W
2017-04-05  7:47           ` Wang, Wei W
2017-04-05  7:47           ` Wang, Wei W
2017-04-05  4:31       ` Wang, Wei W
2017-04-05  3:31   ` Wang, Wei W
2017-03-16  7:08 ` Wei Wang
2017-03-16  7:08 ` [PATCH kernel v8 3/4] mm: add inerface to offer info about unused pages Wei Wang
2017-03-16  7:08 ` Wei Wang
2017-03-16  7:08   ` [Qemu-devel] " Wei Wang
2017-03-16  7:08   ` Wei Wang
2017-03-16 21:28   ` Andrew Morton
2017-03-16 21:28     ` [Qemu-devel] " Andrew Morton
2017-03-16 21:28     ` Andrew Morton
2017-03-16 21:28     ` Andrew Morton
2017-03-17  6:55     ` Wei Wang
2017-03-17  6:55     ` Wei Wang
2017-03-17  6:55       ` [Qemu-devel] " Wei Wang
2017-03-17  6:55       ` Wei Wang
2017-03-22 10:52       ` Wang, Wei W
2017-03-22 10:52         ` [Qemu-devel] " Wang, Wei W
2017-03-22 10:52         ` Wang, Wei W
2017-03-22 10:52       ` Wang, Wei W
2017-03-29 17:48       ` Michael S. Tsirkin
2017-03-29 17:48         ` [Qemu-devel] " Michael S. Tsirkin
2017-03-29 17:48         ` Michael S. Tsirkin
2017-03-29 17:48         ` Michael S. Tsirkin
2017-03-31  9:53         ` Wei Wang [this message]
2017-03-31  9:53           ` [Qemu-devel] " Wei Wang
2017-03-31  9:53           ` Wei Wang
2017-03-31 16:25           ` Michael S. Tsirkin
2017-03-31 16:25             ` [Qemu-devel] " Michael S. Tsirkin
2017-03-31 16:25             ` Michael S. Tsirkin
2017-03-31 16:25             ` Michael S. Tsirkin
2017-03-31  9:53         ` Wei Wang
2017-04-13 11:07     ` Wei Wang
2017-04-13 11:07       ` [Qemu-devel] " Wei Wang
2017-04-13 11:07       ` Wei Wang
2017-04-13 11:07     ` Wei Wang
2017-03-17  1:21   ` Michael S. Tsirkin
2017-03-17  1:21     ` [Qemu-devel] " Michael S. Tsirkin
2017-03-17  1:21     ` Michael S. Tsirkin
2017-03-17  1:21   ` Michael S. Tsirkin
2017-03-16  7:08 ` [PATCH kernel v8 4/4] virtio-balloon: VIRTIO_BALLOON_F_HOST_REQ_VQ Wei Wang
2017-03-16  7:08   ` [Qemu-devel] " Wei Wang
2017-03-16  7:08   ` Wei Wang
2017-03-17  1:39   ` Michael S. Tsirkin
2017-03-17  1:39   ` Michael S. Tsirkin
2017-03-17  1:39     ` [Qemu-devel] " Michael S. Tsirkin
2017-03-17  1:39     ` Michael S. Tsirkin
2017-03-16  7:08 ` Wei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58DE26FC.7090403@intel.com \
    --to=wei.w.wang@intel.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=amit.shah@redhat.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=liliang.opensource@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtio-dev@lists.oasis-open.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.