* Re: [PATCH v4 3/6] smp: add function to execute a function synchronously on a cpu
From: Peter Zijlstra @ 2016-04-05 8:11 UTC (permalink / raw)
To: Juergen Gross
Cc: x86, jeremy, jdelvare, hpa, akataria, linux-kernel,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, xen-devel, boris.ostrovsky, tglx, linux
In-Reply-To: <1459833007-11618-4-git-send-email-jgross@suse.com>
On Tue, Apr 05, 2016 at 07:10:04AM +0200, Juergen Gross wrote:
> +int smp_call_on_cpu(unsigned int cpu, bool pin, int (*func)(void *), void *par)
Why .pin and not .phys? .pin does not (to me) reflect the
hypervisor/physical-cpu thing.
Also, as per smp_call_function_single() would it not be more consistent
to make this the last argument?
> +{
> + struct smp_call_on_cpu_struct sscs = {
> + .work = __WORK_INITIALIZER(sscs.work, smp_call_on_cpu_callback),
> + .done = COMPLETION_INITIALIZER_ONSTACK(sscs.done),
> + .func = func,
> + .data = par,
> + .cpu = pin ? cpu : -1,
> + };
> +
> + if (cpu >= nr_cpu_ids)
You might want to also include cpu_online().
if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> + return -ENXIO;
Seeing how its fairly hard to schedule work on a cpu that's not actually
there.
> +
> + queue_work_on(cpu, system_wq, &sscs.work);
> + wait_for_completion(&sscs.done);
> +
> + return sscs.ret;
> +}
^ permalink raw reply
* Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page
From: Vlastimil Babka @ 2016-04-05 8:20 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: Rik van Riel, Sergey Senozhatsky, YiPing Xu, aquini@redhat.com,
rknize@motorola.com, linux-mm@kvack.org, Chan Gyun Jeong,
Hugh Dickins, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, bfields@fieldses.org,
Minchan Kim, Gioh Kim, Mel Gorman, Sangseok Lee, Joonsoo Kim,
Andrew Morton, jlayton@poochiereds.net, koct9i@gmail.com, Al Viro
In-Reply-To: <20160405015402.GA30962@hori1.linux.bs1.fc.nec.co.jp>
On 04/05/2016 03:54 AM, Naoya Horiguchi wrote:
> On Mon, Apr 04, 2016 at 04:46:31PM +0200, Vlastimil Babka wrote:
>> On 04/04/2016 06:45 AM, Naoya Horiguchi wrote:
>>> On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote:
> ...
>>>>>
>>>>> Also (but not your fault) the put_page() preceding
>>>>> test_set_page_hwpoison(page)) IMHO deserves a comment saying which
>>>>> pin we are releasing and which one we still have (hopefully? if I
>>>>> read description of da1b13ccfbebe right) otherwise it looks like
>>>>> doing something with a page that we just potentially freed.
>>>>
>>>> Yes, while I read the code, I had same question. I think the releasing
>>>> refcount is for get_any_page.
>>>
>>> As the other callers of page migration do, soft_offline_page expects the
>>> migration source page to be freed at this put_page() (no pin remains.)
>>> The refcount released here is from isolate_lru_page() in __soft_offline_page().
>>> (the pin by get_any_page is released by put_hwpoison_page just after it.)
>>>
>>> .. yes, doing something just after freeing page looks weird, but that's
>>> how PageHWPoison flag works. IOW, many other page flags are maintained
>>> only during one "allocate-free" life span, but PageHWPoison still does
>>> its job beyond it.
>>
>> But what prevents the page from being allocated again between put_page()
>> and test_set_page_hwpoison()? In that case we would be marking page
>> poisoned while still in use, which is the same as marking it while still
>> in use after a failed migration?
>
> Actually nothing prevents that race. But I think that the result of the race
> is that the error page can be reused for allocation, which results in killing
> processes at page fault time. Soft offline is kind of mild/precautious thing
> (for correctable errors that don't require immediate handling), so killing
> processes looks to me an overkill. And marking hwpoison means that we can no
> longer do retry from userspace.
So you agree that this race is a bug? It may turn a soft-offline attempt
into a killed process. In that case we should fix it the same as we are
fixing the failed migration case. Maybe it will be just enough to switch
the test_set_page_hwpoison() and put_page() calls?
> And another practical thing is the race with unpoison_memory() as described
> in commit da1b13ccfbebe. unpoison_memory() properly works only for properly
> poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile.
> That's why I'd like to avoid setting PageHWPoison for in-use pages if possible.
>
>> (Also, which part prevents pages with PageHWPoison to be allocated
>> again, anyway? I can't find it and test_set_page_hwpoison() doesn't
>> remove from buddy freelists).
>
> check_new_page() in mm/page_alloc.c should prevent reallocation of PageHWPoison.
> As you pointed out, memory error handler doens't remove it from buddy freelists.
Oh, I see. It's using __PG_HWPOISON wrapper, so I didn't notice it when
searching. In any case that results in a bad_page() warning, right? Is
it desirable for a soft-offlined page? If we didn't free poisoned pages
to buddy system, they wouldn't trigger this warning.
> BTW, it might be a bit off-topic, but recently I felt that check_new_page()
> might be improvable, because when check_new_page() returns 1, the whole buddy
> block (not only the bad page) seems to be leaked from buddy freelist.
> For example, if thp (order 9) is requested, and PageHWPoison (or any other
> types of bad pages) is found in an order 9 block, all 512 page are discarded.
> Unpoison can't bring it back to buddy.
> So, some code to split buddy block including bad page (and recovering code from
> unpoison) might be helpful, although that's another story ...
Hm sounds like another argument for not freeing the page to buddy lists
in the first place. Maybe a hook in free_pages_check()?
> Thanks,
> Naoya Horiguchi
>
^ permalink raw reply
* [PATCH] VSOCK: Detach QP check should filter out non matching QPs.
From: Jorgen Hansen @ 2016-04-05 8:59 UTC (permalink / raw)
To: netdev, linux-kernel, virtualization
Cc: pv-drivers, gregkh, davem, Jorgen Hansen
The check in vmci_transport_peer_detach_cb should only allow a
detach when the qp handle of the transport matches the one in
the detach message.
Testing: Before this change, a detach from a peer on a different
socket would cause an active stream socket to register a detach.
Reviewed-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
---
net/vmw_vsock/vmci_transport.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 0a369bb..662bdd2 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -842,7 +842,7 @@ static void vmci_transport_peer_detach_cb(u32 sub_id,
* qp_handle.
*/
if (vmci_handle_is_invalid(e_payload->handle) ||
- vmci_handle_is_equal(trans->qp_handle, e_payload->handle))
+ !vmci_handle_is_equal(trans->qp_handle, e_payload->handle))
return;
/* We don't ask for delayed CBs when we subscribe to this event (we
@@ -2154,7 +2154,7 @@ module_exit(vmci_transport_exit);
MODULE_AUTHOR("VMware, Inc.");
MODULE_DESCRIPTION("VMCI transport for Virtual Sockets");
-MODULE_VERSION("1.0.2.0-k");
+MODULE_VERSION("1.0.3.0-k");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("vmware_vsock");
MODULE_ALIAS_NETPROTO(PF_VSOCK);
--
1.7.0
^ permalink raw reply related
* Re: [Xen-devel] [PATCH v4 1/6] xen: sync xen header
From: David Vrabel @ 2016-04-05 9:34 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, xen-devel
Cc: jeremy, jdelvare, peterz, x86, pali.rohar, virtualization, chrisw,
mingo, tglx, david.vrabel, Douglas_Warzecha, hpa, akataria,
boris.ostrovsky, linux
In-Reply-To: <1459833007-11618-2-git-send-email-jgross@suse.com>
On 05/04/16 06:10, Juergen Gross wrote:
> Import the actual version of include/xen/interface/sched.h from Xen.
Acked-by: David Vrabel <david.vrabel@citrix.com>
David
^ permalink raw reply
* Re: [Xen-devel] [PATCH v4 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu
From: David Vrabel @ 2016-04-05 9:45 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, xen-devel
Cc: jeremy, jdelvare, peterz, x86, pali.rohar, virtualization, chrisw,
mingo, tglx, david.vrabel, Douglas_Warzecha, hpa, akataria,
boris.ostrovsky, linux
In-Reply-To: <1459833007-11618-5-git-send-email-jgross@suse.com>
On 05/04/16 06:10, Juergen Gross wrote:
> Some hardware models (e.g. Dell Studio 1555 laptops) require calls to
> the firmware to be issued on cpu 0 only. As Dom0 might have to use
> these calls, add xen_pin_vcpu() to achieve this functionality.
>
> In case either the domain doesn't have the privilege to make the
> related hypercall or the hypervisor isn't supporting it, issue a
> warning once and disable further pinning attempts.
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1885,6 +1885,45 @@ static void xen_set_cpu_features(struct cpuinfo_x86 *c)
> }
> }
>
> +static void xen_pin_vcpu(int cpu)
> +{
> + static bool disable_pinning;
> + struct sched_pin_override pin_override;
> + int ret;
> +
> + if (disable_pinning)
> + return;
> +
> + pin_override.pcpu = cpu;
> + ret = HYPERVISOR_sched_op(SCHEDOP_pin_override, &pin_override);
/* Ignore errors when removing override. */
> + if (cpu < 0)
> + return;
> +
> + switch (ret) {
> + case -ENOSYS:
> + pr_warn("The kernel tried to call a function on physical cpu %d, but Xen isn't\n"
> + "supporting this. In case of problems you might consider vcpu pinning.\n",
> + cpu);
> + disable_pinning = true;
> + break;
> + case -EPERM:
> + WARN(1, "Trying to pin vcpu without having privilege to do so\n");
> + disable_pinning = true;
> + break;
> + case -EINVAL:
> + case -EBUSY:
> + pr_warn("The kernel tried to call a function on physical cpu %d, but this cpu\n"
> + "seems not to be available. Please check your Xen cpu configuration.\n",
> + cpu);
> + break;
> + case 0:
> + break;
> + default:
> + WARN(1, "rc %d while trying to pin vcpu\n", ret);
> + disable_pinning = true;
> + }
These messages are a bit wordy for my taste and since they don't say
what function failed or what driver is affected they're not as useful as
they could be. I'd probably turn these all into:
if (cpu >= 0 && ret < 0) {
pr_warn("Failed to pin VCPU %d to physical CPU %d: %d",
smp_processor_id(), cpu, ret);
disable_pinning = true;
}
And look at getting the user of this API to print a more useful error.
"i8k: unable to call SMM BIOS on physical CPU %d: %d"
Or whatever.
David
^ permalink raw reply
* Re: [Xen-devel] [PATCH v4 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu
From: Juergen Gross @ 2016-04-05 10:01 UTC (permalink / raw)
To: David Vrabel, linux-kernel, xen-devel
Cc: jeremy, jdelvare, peterz, x86, pali.rohar, virtualization, chrisw,
mingo, tglx, Douglas_Warzecha, hpa, akataria, boris.ostrovsky,
linux
In-Reply-To: <57038927.1030205@citrix.com>
On 05/04/16 11:45, David Vrabel wrote:
> On 05/04/16 06:10, Juergen Gross wrote:
>> Some hardware models (e.g. Dell Studio 1555 laptops) require calls to
>> the firmware to be issued on cpu 0 only. As Dom0 might have to use
>> these calls, add xen_pin_vcpu() to achieve this functionality.
>>
>> In case either the domain doesn't have the privilege to make the
>> related hypercall or the hypervisor isn't supporting it, issue a
>> warning once and disable further pinning attempts.
> [...]
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1885,6 +1885,45 @@ static void xen_set_cpu_features(struct cpuinfo_x86 *c)
>> }
>> }
>>
>> +static void xen_pin_vcpu(int cpu)
>> +{
>> + static bool disable_pinning;
>> + struct sched_pin_override pin_override;
>> + int ret;
>> +
>> + if (disable_pinning)
>> + return;
>> +
>> + pin_override.pcpu = cpu;
>> + ret = HYPERVISOR_sched_op(SCHEDOP_pin_override, &pin_override);
>
> /* Ignore errors when removing override. */
Okay.
>> + if (cpu < 0)
>> + return;
>> +
>> + switch (ret) {
>> + case -ENOSYS:
>> + pr_warn("The kernel tried to call a function on physical cpu %d, but Xen isn't\n"
>> + "supporting this. In case of problems you might consider vcpu pinning.\n",
>> + cpu);
>> + disable_pinning = true;
>> + break;
>> + case -EPERM:
>> + WARN(1, "Trying to pin vcpu without having privilege to do so\n");
>> + disable_pinning = true;
>> + break;
>> + case -EINVAL:
>> + case -EBUSY:
>> + pr_warn("The kernel tried to call a function on physical cpu %d, but this cpu\n"
>> + "seems not to be available. Please check your Xen cpu configuration.\n",
>> + cpu);
>> + break;
>> + case 0:
>> + break;
>> + default:
>> + WARN(1, "rc %d while trying to pin vcpu\n", ret);
>> + disable_pinning = true;
>> + }
>
> These messages are a bit wordy for my taste and since they don't say
> what function failed or what driver is affected they're not as useful as
Did you notice I used WARN() for the cases where a usage error is to
be suspected? This will print a stack backtrace helping to identify the
driver.
I can work on the message text, of course.
> they could be. I'd probably turn these all into:
>
> if (cpu >= 0 && ret < 0) {
> pr_warn("Failed to pin VCPU %d to physical CPU %d: %d",
> smp_processor_id(), cpu, ret);
> disable_pinning = true;
> }
No, I don't think this is a good idea. In the EINVAL or EBUSY case a
simple Xen admin command might be enough to make the next call succeed.
I don't want to disable pinning in this case.
> And look at getting the user of this API to print a more useful error.
>
> "i8k: unable to call SMM BIOS on physical CPU %d: %d"
TBH: I think this should be done by another patch. This is something
the maintainers of the callers' code should decide.
Juergen
^ permalink raw reply
* Re: [Xen-devel] [PATCH v4 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu
From: David Vrabel @ 2016-04-05 10:03 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, xen-devel
Cc: jeremy, jdelvare, peterz, x86, pali.rohar, virtualization, chrisw,
mingo, tglx, Douglas_Warzecha, hpa, akataria, boris.ostrovsky,
linux
In-Reply-To: <57038D0E.1040708@suse.com>
On 05/04/16 11:01, Juergen Gross wrote:
>
> No, I don't think this is a good idea. In the EINVAL or EBUSY case a
> simple Xen admin command might be enough to make the next call succeed.
> I don't want to disable pinning in this case.
Ok.
Acked-by: David Vrabel <david.vrabel@citrix.com>
David
^ permalink raw reply
* Re: [PATCH v4 3/6] smp: add function to execute a function synchronously on a cpu
From: Juergen Gross @ 2016-04-05 10:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, jeremy, jdelvare, hpa, akataria, linux-kernel,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, xen-devel, boris.ostrovsky, tglx, linux
In-Reply-To: <20160405081107.GH3448@twins.programming.kicks-ass.net>
On 05/04/16 10:11, Peter Zijlstra wrote:
> On Tue, Apr 05, 2016 at 07:10:04AM +0200, Juergen Gross wrote:
>> +int smp_call_on_cpu(unsigned int cpu, bool pin, int (*func)(void *), void *par)
>
> Why .pin and not .phys? .pin does not (to me) reflect the
> hypervisor/physical-cpu thing.
I don't mind either way. As you don't like .pin, lets name it .phys.
> Also, as per smp_call_function_single() would it not be more consistent
> to make this the last argument?
Okay, I'll change it.
>
>> +{
>> + struct smp_call_on_cpu_struct sscs = {
>> + .work = __WORK_INITIALIZER(sscs.work, smp_call_on_cpu_callback),
>> + .done = COMPLETION_INITIALIZER_ONSTACK(sscs.done),
>> + .func = func,
>> + .data = par,
>> + .cpu = pin ? cpu : -1,
>> + };
>> +
>> + if (cpu >= nr_cpu_ids)
>
> You might want to also include cpu_online().
>
> if (cpu >= nr_cpu_ids || !cpu_online(cpu))
Indeed, good idea.
>> + return -ENXIO;
>
> Seeing how its fairly hard to schedule work on a cpu that's not actually
> there.
Really? ;-)
Juergen
^ permalink raw reply
* Re: [PATCH v3 04/16] mm/balloon: use general movable page feature into balloon
From: Vlastimil Babka @ 2016-04-05 12:03 UTC (permalink / raw)
To: Minchan Kim, Andrew Morton
Cc: Rik van Riel, YiPing Xu, aquini, rknize, Sergey Senozhatsky,
Chan Gyun Jeong, Hugh Dickins, linux-kernel, virtualization,
bfields, linux-mm, Gioh Kim, Gioh Kim, Mel Gorman, Sangseok Lee,
jlayton, Joonsoo Kim, koct9i, Al Viro
In-Reply-To: <1459321935-3655-5-git-send-email-minchan@kernel.org>
On 03/30/2016 09:12 AM, Minchan Kim wrote:
> Now, VM has a feature to migrate non-lru movable pages so
> balloon doesn't need custom migration hooks in migrate.c
> and compact.c. Instead, this patch implements page->mapping
> ->{isolate|migrate|putback} functions.
>
> With that, we could remove hooks for ballooning in general
> migration functions and make balloon compaction simple.
>
> Cc: virtualization@lists.linux-foundation.org
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: Konstantin Khlebnikov <koct9i@gmail.com>
> Signed-off-by: Gioh Kim <gurugio@hanmail.net>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
I'm not familiar with the inode and pseudofs stuff, so just some things
I noticed:
> -#define PAGE_MOVABLE_MAPCOUNT_VALUE (-255)
> +#define PAGE_MOVABLE_MAPCOUNT_VALUE (-256)
> +#define PAGE_BALLOON_MAPCOUNT_VALUE PAGE_MOVABLE_MAPCOUNT_VALUE
>
> static inline int PageMovable(struct page *page)
> {
> - return ((test_bit(PG_movable, &(page)->flags) &&
> - atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE)
> - || PageBalloon(page));
> + return (test_bit(PG_movable, &(page)->flags) &&
> + atomic_read(&page->_mapcount) == PAGE_MOVABLE_MAPCOUNT_VALUE);
> }
>
> /* Caller should hold a PG_lock */
> @@ -645,6 +626,35 @@ static inline void __ClearPageMovable(struct page *page)
>
> PAGEFLAG(Isolated, isolated, PF_ANY);
>
> +static inline int PageBalloon(struct page *page)
> +{
> + return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE
> + && PagePrivate2(page);
> +}
Hmm so you are now using PG_private_2 flag here, but it's not
documented. Also the only caller of PageBalloon() seems to be
stable_page_flags(). Which will now report all movable pages with
PG_private_2 as KPF_BALOON. Seems like an overkill and also not
reliable. Could it test e.g. page->mapping instead?
Or maybe if we manage to get rid of PAGE_MOVABLE_MAPCOUNT_VALUE, we can
keep PAGE_BALLOON_MAPCOUNT_VALUE to simply distinguish balloon pages for
stable_page_flags().
> @@ -1033,7 +1019,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> out:
> /* If migration is successful, move newpage to right list */
> if (rc == MIGRATEPAGE_SUCCESS) {
> - if (unlikely(__is_movable_balloon_page(newpage)))
> + if (unlikely(PageMovable(newpage)))
> put_page(newpage);
> else
> putback_lru_page(newpage);
Hmm shouldn't the condition have been changed to
if (unlikely(__is_movable_balloon_page(newpage)) || PageMovable(newpage)
by patch 02/16? And this patch should be just removing the
balloon-specific check? Otherwise it seems like between patches 02 and
04, other kinds of PageMovable pages were unnecessarily/wrongly routed
through putback_lru_page()?
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index d82196244340..c7696a2e11c7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1254,7 +1254,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>
> list_for_each_entry_safe(page, next, page_list, lru) {
> if (page_is_file_cache(page) && !PageDirty(page) &&
> - !isolated_balloon_page(page)) {
> + !PageIsolated(page)) {
> ClearPageActive(page);
> list_move(&page->lru, &clean_pages);
> }
This looks like the same comment as above at first glance. But looking
closer, it's even weirder. isolated_balloon_page() was simply
PageBalloon() after d6d86c0a7f8dd... weird already. You replace it with
check for !PageIsolated() which looks like a more correct check, so ok.
Except the potential false positive with PG_owner_priv_1.
Thanks.
^ permalink raw reply
* Re: [PATCH] virtio: fix "warning: ‘queue’ may be used uninitialized"
From: Jeff Mahoney @ 2016-04-05 13:34 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization
In-Reply-To: <20160405110113-mutt-send-email-mst@redhat.com>
[-- Attachment #1.1.1: Type: text/plain, Size: 1283 bytes --]
On 4/5/16 4:04 AM, Michael S. Tsirkin wrote:
> On Mon, Apr 04, 2016 at 02:14:19PM -0400, Jeff Mahoney wrote:
>> This fixes the following warning:
>> drivers/virtio/virtio_ring.c:1032:5: warning: ‘queue’ may be used
>> uninitialized in this function
>>
>> The conditions that govern when queue is set aren't apparent to gcc.
>>
>> Setting queue = NULL clears the warning.
>>
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>
> Which gcc version produces this warning?
> I do not seem to see it with gcc 5.3.1.
gcc version 4.8.5 (SUSE Linux)
> Also - use uninitialized_var then?
If it were a fast path, sure, but otherwise the use of uninitialized_var
just makes similar issues harder to debug if the code changes.
-Jeff
>> ---
>>
>> drivers/virtio/virtio_ring.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -1006,7 +1006,7 @@ struct virtqueue *vring_create_virtqueue
>> const char *name)
>> {
>> struct virtqueue *vq;
>> - void *queue;
>> + void *queue = NULL;
>> dma_addr_t dma_addr;
>> size_t queue_size_in_bytes;
>> struct vring vring;
>>
>> --
>> Jeff Mahoney
>> SUSE Labs
>
--
Jeff Mahoney
SUSE Labs
[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 881 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] virtio: fix "warning: ‘queue’ may be used uninitialized"
From: Michael S. Tsirkin @ 2016-04-05 14:00 UTC (permalink / raw)
To: Jeff Mahoney; +Cc: virtualization
In-Reply-To: <5703BEE0.5020206@suse.com>
On Tue, Apr 05, 2016 at 09:34:24AM -0400, Jeff Mahoney wrote:
> On 4/5/16 4:04 AM, Michael S. Tsirkin wrote:
> > On Mon, Apr 04, 2016 at 02:14:19PM -0400, Jeff Mahoney wrote:
> >> This fixes the following warning:
> >> drivers/virtio/virtio_ring.c:1032:5: warning: ‘queue’ may be used
> >> uninitialized in this function
> >>
> >> The conditions that govern when queue is set aren't apparent to gcc.
> >>
> >> Setting queue = NULL clears the warning.
> >>
> >> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> >
> > Which gcc version produces this warning?
> > I do not seem to see it with gcc 5.3.1.
>
> gcc version 4.8.5 (SUSE Linux)
Looks like a minor compiler bug then, fixed since.
I don't think we need to worry about that this.
> > Also - use uninitialized_var then?
>
> If it were a fast path, sure, but otherwise the use of uninitialized_var
> just makes similar issues harder to debug if the code changes.
>
> -Jeff
Same does = NULL though. It's not like things will work with queue ==
NULL, so it'll still crash if it's not inited properly.
> >> ---
> >>
> >> drivers/virtio/virtio_ring.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -1006,7 +1006,7 @@ struct virtqueue *vring_create_virtqueue
> >> const char *name)
> >> {
> >> struct virtqueue *vq;
> >> - void *queue;
> >> + void *queue = NULL;
> >> dma_addr_t dma_addr;
> >> size_t queue_size_in_bytes;
> >> struct vring vring;
> >>
> >> --
> >> Jeff Mahoney
> >> SUSE Labs
> >
>
>
> --
> Jeff Mahoney
> SUSE Labs
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v4 6/6] hwmon: use smp_call_on_cpu() for dell-smm i8k
From: Guenter Roeck @ 2016-04-05 14:54 UTC (permalink / raw)
To: Juergen Gross
Cc: x86, jeremy, jdelvare, peterz, hpa, akataria, linux-kernel,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, xen-devel, boris.ostrovsky, tglx
In-Reply-To: <1459833007-11618-7-git-send-email-jgross@suse.com>
On Tue, Apr 05, 2016 at 07:10:07AM +0200, Juergen Gross wrote:
> Use the smp_call_on_cpu() function to call system management
> mode on cpu 0.
> Make call secure by adding get_online_cpus() to avoid e.g. suspend
> resume cycles in between.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V4: add call to get_online_cpus()
Pali, any chance to test this ?
Thanks,
Guenter
> ---
> drivers/hwmon/dell-smm-hwmon.c | 35 ++++++++++++++++++++---------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
> index c43318d..643f3a1 100644
> --- a/drivers/hwmon/dell-smm-hwmon.c
> +++ b/drivers/hwmon/dell-smm-hwmon.c
> @@ -21,6 +21,7 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/cpu.h>
> #include <linux/delay.h>
> #include <linux/module.h>
> #include <linux/types.h>
> @@ -35,6 +36,7 @@
> #include <linux/uaccess.h>
> #include <linux/io.h>
> #include <linux/sched.h>
> +#include <linux/smp.h>
>
> #include <linux/i8k.h>
>
> @@ -130,23 +132,15 @@ static inline const char *i8k_get_dmi_data(int field)
> /*
> * Call the System Management Mode BIOS. Code provided by Jonathan Buzzard.
> */
> -static int i8k_smm(struct smm_regs *regs)
> +static int i8k_smm_func(void *par)
> {
> int rc;
> + struct smm_regs *regs = par;
> int eax = regs->eax;
> - cpumask_var_t old_mask;
>
> /* SMM requires CPU 0 */
> - if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
> - return -ENOMEM;
> - cpumask_copy(old_mask, ¤t->cpus_allowed);
> - rc = set_cpus_allowed_ptr(current, cpumask_of(0));
> - if (rc)
> - goto out;
> - if (smp_processor_id() != 0) {
> - rc = -EBUSY;
> - goto out;
> - }
> + if (smp_processor_id() != 0)
> + return -EBUSY;
>
> #if defined(CONFIG_X86_64)
> asm volatile("pushq %%rax\n\t"
> @@ -204,13 +198,24 @@ static int i8k_smm(struct smm_regs *regs)
> if (rc != 0 || (regs->eax & 0xffff) == 0xffff || regs->eax == eax)
> rc = -EINVAL;
>
> -out:
> - set_cpus_allowed_ptr(current, old_mask);
> - free_cpumask_var(old_mask);
> return rc;
> }
>
> /*
> + * Call the System Management Mode BIOS.
> + */
> +static int i8k_smm(struct smm_regs *regs)
> +{
> + int ret;
> +
> + get_online_cpus();
> + ret = smp_call_on_cpu(0, true, i8k_smm_func, regs);
> + put_online_cpus();
> +
> + return ret;
> +}
> +
> +/*
> * Read the fan status.
> */
> static int i8k_get_fan_status(int fan)
> --
> 2.6.2
>
^ permalink raw reply
* Re: [PATCH v2] drm/virtio: send vblank event after crtc updates
From: Gustavo Padovan @ 2016-04-05 18:21 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: David Airlie, Gustavo Padovan, open list,
open list:VIRTIO GPU DRIVER, open list:VIRTIO GPU DRIVER
In-Reply-To: <1458677872-6024-1-git-send-email-gustavo@padovan.org>
Hi,
Any comment on this?
Gustavo
2016-03-22 Gustavo Padovan <gustavo@padovan.org>:
> From: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
>
> virtio_gpu was failing to send vblank events when using the atomic IOCTL
> with the DRM_MODE_PAGE_FLIP_EVENT flag set. This patch fixes each and
> enables atomic pageflips updates.
>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> ---
> drivers/gpu/drm/virtio/virtgpu_display.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
> index b70bb8b..4f372e0 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_display.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_display.c
> @@ -274,12 +274,24 @@ static int virtio_gpu_crtc_atomic_check(struct drm_crtc *crtc,
> return 0;
> }
>
> +static void virtio_gpu_crtc_atomic_flush(struct drm_crtc *crtc,
> + struct drm_crtc_state *old_state)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&crtc->dev->event_lock, flags);
> + if (crtc->state->event)
> + drm_crtc_send_vblank_event(crtc, crtc->state->event);
> + spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
> +}
> +
> static const struct drm_crtc_helper_funcs virtio_gpu_crtc_helper_funcs = {
> .enable = virtio_gpu_crtc_enable,
> .disable = virtio_gpu_crtc_disable,
> .mode_fixup = virtio_gpu_crtc_mode_fixup,
> .mode_set_nofb = virtio_gpu_crtc_mode_set_nofb,
> .atomic_check = virtio_gpu_crtc_atomic_check,
> + .atomic_flush = virtio_gpu_crtc_atomic_flush,
> };
>
> static void virtio_gpu_enc_mode_set(struct drm_encoder *encoder,
> --
> 2.5.0
>
Gustavo
^ permalink raw reply
* Re: [PATCH v4 6/6] hwmon: use smp_call_on_cpu() for dell-smm i8k
From: Pali Rohár @ 2016-04-05 19:31 UTC (permalink / raw)
To: Guenter Roeck
Cc: Juergen Gross, x86, jeremy, jdelvare, peterz, akataria,
linux-kernel, virtualization, chrisw, mingo, david.vrabel,
Douglas_Warzecha, hpa, xen-devel, boris.ostrovsky, tglx
In-Reply-To: <20160405145414.GB27359@roeck-us.net>
[-- Attachment #1.1: Type: Text/Plain, Size: 674 bytes --]
On Tuesday 05 April 2016 16:54:14 Guenter Roeck wrote:
> On Tue, Apr 05, 2016 at 07:10:07AM +0200, Juergen Gross wrote:
> > Use the smp_call_on_cpu() function to call system management
> > mode on cpu 0.
> > Make call secure by adding get_online_cpus() to avoid e.g. suspend
> > resume cycles in between.
> >
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> > ---
> > V4: add call to get_online_cpus()
>
> Pali, any chance to test this ?
I can test it, but just on machine where (probably) smm calls can be
send from any cpu... Need some time for testing and I believe I can do
that at the end of the week.
--
Pali Rohár
pali.rohar@gmail.com
[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page
From: Naoya Horiguchi @ 2016-04-06 0:54 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Rik van Riel, Sergey Senozhatsky, YiPing Xu, aquini@redhat.com,
rknize@motorola.com, linux-mm@kvack.org, Chan Gyun Jeong,
Hugh Dickins, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, bfields@fieldses.org,
Minchan Kim, Gioh Kim, Mel Gorman, Sangseok Lee, Joonsoo Kim,
Andrew Morton, jlayton@poochiereds.net, koct9i@gmail.com, Al Viro
In-Reply-To: <57037562.3040203@suse.cz>
On Tue, Apr 05, 2016 at 10:20:50AM +0200, Vlastimil Babka wrote:
> On 04/05/2016 03:54 AM, Naoya Horiguchi wrote:
> > On Mon, Apr 04, 2016 at 04:46:31PM +0200, Vlastimil Babka wrote:
> >> On 04/04/2016 06:45 AM, Naoya Horiguchi wrote:
> >>> On Mon, Apr 04, 2016 at 10:39:17AM +0900, Minchan Kim wrote:
> > ...
> >>>>>
> >>>>> Also (but not your fault) the put_page() preceding
> >>>>> test_set_page_hwpoison(page)) IMHO deserves a comment saying which
> >>>>> pin we are releasing and which one we still have (hopefully? if I
> >>>>> read description of da1b13ccfbebe right) otherwise it looks like
> >>>>> doing something with a page that we just potentially freed.
> >>>>
> >>>> Yes, while I read the code, I had same question. I think the releasing
> >>>> refcount is for get_any_page.
> >>>
> >>> As the other callers of page migration do, soft_offline_page expects the
> >>> migration source page to be freed at this put_page() (no pin remains.)
> >>> The refcount released here is from isolate_lru_page() in __soft_offline_page().
> >>> (the pin by get_any_page is released by put_hwpoison_page just after it.)
> >>>
> >>> .. yes, doing something just after freeing page looks weird, but that's
> >>> how PageHWPoison flag works. IOW, many other page flags are maintained
> >>> only during one "allocate-free" life span, but PageHWPoison still does
> >>> its job beyond it.
> >>
> >> But what prevents the page from being allocated again between put_page()
> >> and test_set_page_hwpoison()? In that case we would be marking page
> >> poisoned while still in use, which is the same as marking it while still
> >> in use after a failed migration?
> >
> > Actually nothing prevents that race. But I think that the result of the race
> > is that the error page can be reused for allocation, which results in killing
> > processes at page fault time. Soft offline is kind of mild/precautious thing
> > (for correctable errors that don't require immediate handling), so killing
> > processes looks to me an overkill. And marking hwpoison means that we can no
> > longer do retry from userspace.
>
> So you agree that this race is a bug? It may turn a soft-offline attempt
> into a killed process. In that case we should fix it the same as we are
> fixing the failed migration case.
I agree, it's a bug, although rare and non-critical.
> Maybe it will be just enough to switch
> the test_set_page_hwpoison() and put_page() calls?
Unfortunately that restores the other race with unpoison (described below.)
Sorry for my bad/unclear statements, these races seems exclusive and a compatible
solution is not found, so I prioritized fixing the latter one by comparing
severity (the latter causes kernel crash,) which led to the current code.
> > And another practical thing is the race with unpoison_memory() as described
> > in commit da1b13ccfbebe. unpoison_memory() properly works only for properly
> > poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile.
> > That's why I'd like to avoid setting PageHWPoison for in-use pages if possible.
> >
> >> (Also, which part prevents pages with PageHWPoison to be allocated
> >> again, anyway? I can't find it and test_set_page_hwpoison() doesn't
> >> remove from buddy freelists).
> >
> > check_new_page() in mm/page_alloc.c should prevent reallocation of PageHWPoison.
> > As you pointed out, memory error handler doens't remove it from buddy freelists.
>
> Oh, I see. It's using __PG_HWPOISON wrapper, so I didn't notice it when
> searching. In any case that results in a bad_page() warning, right? Is
> it desirable for a soft-offlined page?
That's right, and the bad_page warning might be too strong for soft offlining.
We can't tell which of memory_failure/soft_offline_page a PageHWPoison came
from, but users can find other lines in dmesg which should tell that.
And memory error events can hit buddy pages directly, in that case we still
need the check in check_new_page().
> If we didn't free poisoned pages
> to buddy system, they wouldn't trigger this warning.
Actually, we didn't free at commit add05cecef80 ("mm: soft-offline: don't free
target page in successful page migration"), but that's was reverted in
commit f4c18e6f7b5b ("mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*").
Now I start thinking the revert was a bad decision, so I'll dig this problem again.
> > BTW, it might be a bit off-topic, but recently I felt that check_new_page()
> > might be improvable, because when check_new_page() returns 1, the whole buddy
> > block (not only the bad page) seems to be leaked from buddy freelist.
> > For example, if thp (order 9) is requested, and PageHWPoison (or any other
> > types of bad pages) is found in an order 9 block, all 512 page are discarded.
> > Unpoison can't bring it back to buddy.
> > So, some code to split buddy block including bad page (and recovering code from
> > unpoison) might be helpful, although that's another story ...
>
> Hm sounds like another argument for not freeing the page to buddy lists
> in the first place. Maybe a hook in free_pages_check()?
Sounds a good idea. I'll try it, too.
Thanks,
Naoya Horiguchi
^ permalink raw reply
* Re: [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page
From: Vlastimil Babka @ 2016-04-06 7:57 UTC (permalink / raw)
To: Naoya Horiguchi
Cc: Rik van Riel, Sergey Senozhatsky, YiPing Xu, aquini@redhat.com,
rknize@motorola.com, linux-mm@kvack.org, Chan Gyun Jeong,
Hugh Dickins, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, bfields@fieldses.org,
Minchan Kim, Gioh Kim, Mel Gorman, Sangseok Lee, Joonsoo Kim,
Andrew Morton, jlayton@poochiereds.net, koct9i@gmail.com, Al Viro
In-Reply-To: <20160406005403.GA29576@hori1.linux.bs1.fc.nec.co.jp>
On 04/06/2016 02:54 AM, Naoya Horiguchi wrote:
> On Tue, Apr 05, 2016 at 10:20:50AM +0200, Vlastimil Babka wrote:
>>
>> So you agree that this race is a bug? It may turn a soft-offline attempt
>> into a killed process. In that case we should fix it the same as we are
>> fixing the failed migration case.
>
> I agree, it's a bug, although rare and non-critical.
>
>> Maybe it will be just enough to switch
>> the test_set_page_hwpoison() and put_page() calls?
>
> Unfortunately that restores the other race with unpoison (described below.)
> Sorry for my bad/unclear statements, these races seems exclusive and a compatible
> solution is not found, so I prioritized fixing the latter one by comparing
> severity (the latter causes kernel crash,) which led to the current code.
Ah, I see. However unpoison is a functionality just for stress-testing,
and not expected to be used in production, right? So it's somewhat
unfortunate trade-off with danger of soft-offlining killing an unrelated
process.
>>> And another practical thing is the race with unpoison_memory() as described
>>> in commit da1b13ccfbebe. unpoison_memory() properly works only for properly
>>> poisoned pages, so doing unpoison for in-use hwpoisoned pages is fragile.
>>> That's why I'd like to avoid setting PageHWPoison for in-use pages if possible.
>>>
>>>> (Also, which part prevents pages with PageHWPoison to be allocated
>>>> again, anyway? I can't find it and test_set_page_hwpoison() doesn't
>>>> remove from buddy freelists).
>>>
>>> check_new_page() in mm/page_alloc.c should prevent reallocation of PageHWPoison.
>>> As you pointed out, memory error handler doens't remove it from buddy freelists.
>>
>> Oh, I see. It's using __PG_HWPOISON wrapper, so I didn't notice it when
>> searching. In any case that results in a bad_page() warning, right? Is
>> it desirable for a soft-offlined page?
>
> That's right, and the bad_page warning might be too strong for soft offlining.
> We can't tell which of memory_failure/soft_offline_page a PageHWPoison came
> from, but users can find other lines in dmesg which should tell that.
> And memory error events can hit buddy pages directly, in that case we still
> need the check in check_new_page().
Ah, ok.
>> If we didn't free poisoned pages
>> to buddy system, they wouldn't trigger this warning.
>
> Actually, we didn't free at commit add05cecef80 ("mm: soft-offline: don't free
> target page in successful page migration"), but that's was reverted in
> commit f4c18e6f7b5b ("mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*").
> Now I start thinking the revert was a bad decision, so I'll dig this problem again.
Good.
>>> BTW, it might be a bit off-topic, but recently I felt that check_new_page()
>>> might be improvable, because when check_new_page() returns 1, the whole buddy
>>> block (not only the bad page) seems to be leaked from buddy freelist.
>>> For example, if thp (order 9) is requested, and PageHWPoison (or any other
>>> types of bad pages) is found in an order 9 block, all 512 page are discarded.
>>> Unpoison can't bring it back to buddy.
>>> So, some code to split buddy block including bad page (and recovering code from
>>> unpoison) might be helpful, although that's another story ...
>>
>> Hm sounds like another argument for not freeing the page to buddy lists
>> in the first place. Maybe a hook in free_pages_check()?
>
> Sounds a good idea. I'll try it, too.
So what I think could hopefully work is to replace the put_page() after
migration with a hwpoison-specific construct that does something like:
if (put_page_testzero(page))
if (test_set_page_hwpoison()) ...
__put_page()
With some more thought about what other parts of put_page() apply - how
to handle compound pages and zone-device pages.
That should hopefully be the safest course. When put_page_testzero()
succeeds, there should be no other (current of near-future) users of the
page, and we can still do whatever we need before releasing to
__put_page(). I.e. set the HWPoison flag, and maybe combine this with
modification to free_pages_check() to divert it from becoming a buddy page.
It should be even safer than the current "put_page();
test_set_page_hwpoison();" approach in that we are currently not
guaranteed that the put_page() is indeed releasing the last pin, but we
set HWPoison in any case. Although we have just migrated the page away,
there might be a pfn scanner holding its pin and checking the page.
Hopefully no such scanner has a path that would break on HWPoison flag,
but I don't know. By not setting the HWpoison when we don't succeed
put_page_testzero(), we are safer. It's true the page might stay
unpoisoned due to a temporary pin, but the process data was migrated
away which is the important part, and userspace can retry anyway?
^ permalink raw reply
* Re: [virtio-dev] virtio-vsock live migration
From: Stefan Hajnoczi @ 2016-04-06 12:55 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
Matt Benjamin, virtualization, Christoffer Dall
In-Reply-To: <20160316163344-mutt-send-email-mst@redhat.com>
[-- Attachment #1.1: Type: text/plain, Size: 2243 bytes --]
On Wed, Mar 16, 2016 at 05:05:19PM +0200, Michael S. Tsirkin wrote:
> > > > NFS and netperf are the first two protocols I looked
> > > > at and both transmit address information across the connection...
> > >
> > >
> > > Does netperf really attempt to get local IP
> > > and then send that inline within the connection?
> >
> > Yes, netperf has separate control and data sockets. I think part of the
> > reason for this split is that the control connection can communicate the
> > address details for the data connection over a different protocol (TCP +
> > RDMA?), but I'm not sure.
> >
> > Stefan
>
> Thinking about it, netperf does not survive disconnects.
> So the current protocol would be useless for it.
> I am not sure about NFS but from (long past) experience it did not
> attempt to re-resolve the name to address, so changing an
> address would break it as well.
>
> So I think these applications would have to use a 64 bit CID.
>
> Why, then, do we care about one aspect of these applications
> (creating connections) and not another (not breaking them)?
I care about mapping the semantics of AF_VSOCK to virtio-vsock.
AF_VSOCK was implemented with the ability to plug in additional
transports (like virtio). This allows guest agents and other
applications to compile once and run on any transport.
If we change virtio-vsock to rely on unique addresses across migration
then we lose zero-configuration. AF_VSOCK applications use the
VMADDR_CID_HOST (2) constant to communicate with the host. After live
migration this well-known CID refers to the new host. Applications
would need to know a unique host CID in order to work correctly across
live migration.
Although I appreciate your drive to make the device as flexible as
possible, if we want to do this we are totally beyond AF_VSOCK semantics
and would be better served by a separate effort that avoids confusion
between class AF_VSOCK semantics and virtio socket semantics.
Can we please treat AF_VSOCK semantics as the requirements we're trying
to implement? It supports qemu-guest-agent and as I described in a
previous mail could also support transparent connection migration a la
CRIU sockets.
Stefan
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] virtio-vsock live migration
From: Michael S. Tsirkin @ 2016-04-06 13:17 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
Matt Benjamin, virtualization, Christoffer Dall
In-Reply-To: <20160406125550.GB17538@stefanha-x1.localdomain>
On Wed, Apr 06, 2016 at 01:55:50PM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 16, 2016 at 05:05:19PM +0200, Michael S. Tsirkin wrote:
> > > > > NFS and netperf are the first two protocols I looked
> > > > > at and both transmit address information across the connection...
> > > >
> > > >
> > > > Does netperf really attempt to get local IP
> > > > and then send that inline within the connection?
> > >
> > > Yes, netperf has separate control and data sockets. I think part of the
> > > reason for this split is that the control connection can communicate the
> > > address details for the data connection over a different protocol (TCP +
> > > RDMA?), but I'm not sure.
> > >
> > > Stefan
> >
> > Thinking about it, netperf does not survive disconnects.
> > So the current protocol would be useless for it.
> > I am not sure about NFS but from (long past) experience it did not
> > attempt to re-resolve the name to address, so changing an
> > address would break it as well.
> >
> > So I think these applications would have to use a 64 bit CID.
> >
> > Why, then, do we care about one aspect of these applications
> > (creating connections) and not another (not breaking them)?
>
> I care about mapping the semantics of AF_VSOCK to virtio-vsock.
> AF_VSOCK was implemented with the ability to plug in additional
> transports (like virtio). This allows guest agents and other
> applications to compile once and run on any transport.
>
> If we change virtio-vsock to rely on unique addresses across migration
> then we lose zero-configuration.
Could be an option. management has very little trouble configuring
unique CIDs and it does care about migration. Desktop users don't
migrate and they want zero configuration.
> AF_VSOCK applications use the
> VMADDR_CID_HOST (2) constant to communicate with the host. After live
> migration this well-known CID refers to the new host. Applications
> would need to know a unique host CID in order to work correctly across
> live migration.
>
> Although I appreciate your drive to make the device as flexible as
> possible, if we want to do this we are totally beyond AF_VSOCK semantics
> and would be better served by a separate effort that avoids confusion
> between class AF_VSOCK semantics and virtio socket semantics.
Maybe, though I merely proposed reserving some space so we can extend
CIDs to 64 bit (or 128 bit like hyperv guys would like to?)
in the future without too much pain.
To me this doesn't look like worth starting a completely separate effort for.
> Can we please treat AF_VSOCK semantics as the requirements we're trying
> to implement? It supports qemu-guest-agent and as I described in a
> previous mail could also support transparent connection migration a la
> CRIU sockets.
>
> Stefan
I only wish the semantics were better documented somewhere :)
--
MST
^ permalink raw reply
* [PATCH v5 0/6] Support calling functions on dedicated physical cpu
From: Juergen Gross @ 2016-04-06 14:17 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, jeremy, jdelvare, peterz, hpa, akataria, x86,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, boris.ostrovsky, tglx, linux
Some hardware (e.g. Dell Studio laptops) require special functions to
be called on physical cpu 0 in order to avoid occasional hangs. When
running as dom0 under Xen this could be achieved only via special boot
parameters (vcpu pinning) limiting the hypervisor in it's scheduling
decisions.
This patch series is adding a generic function to be able to temporarily
pin a (virtual) cpu to a dedicated physical cpu for executing above
mentioned functions on that specific cpu. The drivers (dcdbas and i8k)
requiring this functionality are modified accordingly.
Changes in V5:
- patch 3: rename and reshuffle parameters of smp_call_on_cpu() as requested
by Peter Zijlstra
- patch 3: test target cpu to be online as requested by Peter Zijlstra
- patch 4: less wordy messages as requested by David Vrabel
Changes in V4:
- move patches 5 and 6 further up in the series
- patch 2 (was 5): WARN_ONCE in case platform doesn't support pinning
as requested by Peter Zijlstra
- patch 3 (was 2): change return value in case of illegal cpu as
requested by Peter Zijlstra
- patch 3 (was 2): make pinning of vcpu an option as suggested by
Peter Zijlstra
- patches 5 and 6 (were 3 and 4): add call to get_online_cpus()
Changes in V3:
- use get_cpu()/put_cpu() as suggested by David Vrabel
Changes in V2:
- instead of manipulating the allowed set of cpus use cpu specific
workqueue as requested by Peter Zijlstra
- add include/linux/hypervisor.h to hide architecture specific stuff
from generic kernel code
Juergen Gross (6):
xen: sync xen header
virt, sched: add generic vcpu pinning support
smp: add function to execute a function synchronously on a cpu
xen: add xen_pin_vcpu() to support calling functions on a dedicated
pcpu
dcdbas: make use of smp_call_on_cpu()
hwmon: use smp_call_on_cpu() for dell-smm i8k
MAINTAINERS | 1 +
arch/x86/include/asm/hypervisor.h | 4 ++
arch/x86/kernel/cpu/hypervisor.c | 11 +++++
arch/x86/xen/enlighten.c | 40 +++++++++++++++
drivers/firmware/dcdbas.c | 51 +++++++++----------
drivers/hwmon/dell-smm-hwmon.c | 35 +++++++------
include/linux/hypervisor.h | 17 +++++++
include/linux/smp.h | 3 ++
include/xen/interface/sched.h | 100 +++++++++++++++++++++++++++++++-------
kernel/smp.c | 51 +++++++++++++++++++
kernel/up.c | 18 +++++++
11 files changed, 273 insertions(+), 58 deletions(-)
create mode 100644 include/linux/hypervisor.h
--
2.6.6
^ permalink raw reply
* [PATCH v5 1/6] xen: sync xen header
From: Juergen Gross @ 2016-04-06 14:17 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, jeremy, jdelvare, peterz, hpa, akataria, x86,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, boris.ostrovsky, tglx, linux
In-Reply-To: <1459952266-3687-1-git-send-email-jgross@suse.com>
Import the actual version of include/xen/interface/sched.h from Xen.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
---
include/xen/interface/sched.h | 100 ++++++++++++++++++++++++++++++++++--------
1 file changed, 82 insertions(+), 18 deletions(-)
diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
index f184909..a4c4d73 100644
--- a/include/xen/interface/sched.h
+++ b/include/xen/interface/sched.h
@@ -3,6 +3,24 @@
*
* Scheduler state interactions
*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
* Copyright (c) 2005, Keir Fraser <keir@xensource.com>
*/
@@ -12,18 +30,30 @@
#include <xen/interface/event_channel.h>
/*
+ * Guest Scheduler Operations
+ *
+ * The SCHEDOP interface provides mechanisms for a guest to interact
+ * with the scheduler, including yield, blocking and shutting itself
+ * down.
+ */
+
+/*
* The prototype for this hypercall is:
- * long sched_op_new(int cmd, void *arg)
+ * long HYPERVISOR_sched_op(enum sched_op cmd, void *arg, ...)
+ *
* @cmd == SCHEDOP_??? (scheduler operation).
* @arg == Operation-specific extra argument(s), as described below.
+ * ... == Additional Operation-specific extra arguments, described below.
*
- * **NOTE**:
- * Versions of Xen prior to 3.0.2 provide only the following legacy version
+ * Versions of Xen prior to 3.0.2 provided only the following legacy version
* of this hypercall, supporting only the commands yield, block and shutdown:
* long sched_op(int cmd, unsigned long arg)
* @cmd == SCHEDOP_??? (scheduler operation).
* @arg == 0 (SCHEDOP_yield and SCHEDOP_block)
* == SHUTDOWN_* code (SCHEDOP_shutdown)
+ *
+ * This legacy version is available to new guests as:
+ * long HYPERVISOR_sched_op_compat(enum sched_op cmd, unsigned long arg)
*/
/*
@@ -44,12 +74,17 @@
/*
* Halt execution of this domain (all VCPUs) and notify the system controller.
* @arg == pointer to sched_shutdown structure.
+ *
+ * If the sched_shutdown_t reason is SHUTDOWN_suspend then
+ * x86 PV guests must also set RDX (EDX for 32-bit guests) to the MFN
+ * of the guest's start info page. RDX/EDX is the third hypercall
+ * argument.
+ *
+ * In addition, which reason is SHUTDOWN_suspend this hypercall
+ * returns 1 if suspend was cancelled or the domain was merely
+ * checkpointed, and 0 if it is resuming in a new domain.
*/
#define SCHEDOP_shutdown 2
-struct sched_shutdown {
- unsigned int reason; /* SHUTDOWN_* */
-};
-DEFINE_GUEST_HANDLE_STRUCT(sched_shutdown);
/*
* Poll a set of event-channel ports. Return when one or more are pending. An
@@ -57,12 +92,6 @@ DEFINE_GUEST_HANDLE_STRUCT(sched_shutdown);
* @arg == pointer to sched_poll structure.
*/
#define SCHEDOP_poll 3
-struct sched_poll {
- GUEST_HANDLE(evtchn_port_t) ports;
- unsigned int nr_ports;
- uint64_t timeout;
-};
-DEFINE_GUEST_HANDLE_STRUCT(sched_poll);
/*
* Declare a shutdown for another domain. The main use of this function is
@@ -71,15 +100,11 @@ DEFINE_GUEST_HANDLE_STRUCT(sched_poll);
* @arg == pointer to sched_remote_shutdown structure.
*/
#define SCHEDOP_remote_shutdown 4
-struct sched_remote_shutdown {
- domid_t domain_id; /* Remote domain ID */
- unsigned int reason; /* SHUTDOWN_xxx reason */
-};
/*
* Latch a shutdown code, so that when the domain later shuts down it
* reports this code to the control tools.
- * @arg == as for SCHEDOP_shutdown.
+ * @arg == sched_shutdown, as for SCHEDOP_shutdown.
*/
#define SCHEDOP_shutdown_code 5
@@ -92,10 +117,47 @@ struct sched_remote_shutdown {
* With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
*/
#define SCHEDOP_watchdog 6
+
+/*
+ * Override the current vcpu affinity by pinning it to one physical cpu or
+ * undo this override restoring the previous affinity.
+ * @arg == pointer to sched_pin_override structure.
+ *
+ * A negative pcpu value will undo a previous pin override and restore the
+ * previous cpu affinity.
+ * This call is allowed for the hardware domain only and requires the cpu
+ * to be part of the domain's cpupool.
+ */
+#define SCHEDOP_pin_override 7
+
+struct sched_shutdown {
+ unsigned int reason; /* SHUTDOWN_* => shutdown reason */
+};
+DEFINE_GUEST_HANDLE_STRUCT(sched_shutdown);
+
+struct sched_poll {
+ GUEST_HANDLE(evtchn_port_t) ports;
+ unsigned int nr_ports;
+ uint64_t timeout;
+};
+DEFINE_GUEST_HANDLE_STRUCT(sched_poll);
+
+struct sched_remote_shutdown {
+ domid_t domain_id; /* Remote domain ID */
+ unsigned int reason; /* SHUTDOWN_* => shutdown reason */
+};
+DEFINE_GUEST_HANDLE_STRUCT(sched_remote_shutdown);
+
struct sched_watchdog {
uint32_t id; /* watchdog ID */
uint32_t timeout; /* timeout */
};
+DEFINE_GUEST_HANDLE_STRUCT(sched_watchdog);
+
+struct sched_pin_override {
+ int32_t pcpu;
+};
+DEFINE_GUEST_HANDLE_STRUCT(sched_pin_override);
/*
* Reason codes for SCHEDOP_shutdown. These may be interpreted by control
@@ -107,6 +169,7 @@ struct sched_watchdog {
#define SHUTDOWN_suspend 2 /* Clean up, save suspend info, kill. */
#define SHUTDOWN_crash 3 /* Tell controller we've crashed. */
#define SHUTDOWN_watchdog 4 /* Restart because watchdog time expired. */
+
/*
* Domain asked to perform 'soft reset' for it. The expected behavior is to
* reset internal Xen state for the domain returning it to the point where it
@@ -115,5 +178,6 @@ struct sched_watchdog {
* interfaces again.
*/
#define SHUTDOWN_soft_reset 5
+#define SHUTDOWN_MAX 5 /* Maximum valid shutdown reason. */
#endif /* __XEN_PUBLIC_SCHED_H__ */
--
2.6.6
^ permalink raw reply related
* [PATCH v5 2/6] virt, sched: add generic vcpu pinning support
From: Juergen Gross @ 2016-04-06 14:17 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, jeremy, jdelvare, peterz, hpa, akataria, x86,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, boris.ostrovsky, tglx, linux
In-Reply-To: <1459952266-3687-1-git-send-email-jgross@suse.com>
Add generic virtualization support for pinning the current vcpu to a
specified physical cpu. As this operation isn't performance critical
(a very limited set of operations like BIOS calls and SMIs is expected
to need this) just add a hypervisor specific indirection.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V4: move this patch some places up in the series
WARN_ONCE in case platform doesn't support pinning as requested by
Peter Zijlstra
V3: use getc_cpu()/put_cpu() as suggested by David Vrabel
V2: adapt to using workqueues
add include/linux/hypervisor.h to hide architecture specific stuff
from generic kernel code
In case paravirt maintainers don't want to be responsible for
include/linux/hypervisor.h I could take it.
---
MAINTAINERS | 1 +
arch/x86/include/asm/hypervisor.h | 4 ++++
arch/x86/kernel/cpu/hypervisor.c | 11 +++++++++++
include/linux/hypervisor.h | 17 +++++++++++++++++
kernel/smp.c | 1 +
kernel/up.c | 1 +
6 files changed, 35 insertions(+)
create mode 100644 include/linux/hypervisor.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 40eb1db..d3bde4f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8330,6 +8330,7 @@ S: Supported
F: Documentation/virtual/paravirt_ops.txt
F: arch/*/kernel/paravirt*
F: arch/*/include/asm/paravirt.h
+F: include/linux/hypervisor.h
PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
M: Tim Waugh <tim@cyberelk.net>
diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h
index 055ea99..67942b6 100644
--- a/arch/x86/include/asm/hypervisor.h
+++ b/arch/x86/include/asm/hypervisor.h
@@ -43,6 +43,9 @@ struct hypervisor_x86 {
/* X2APIC detection (run once per boot) */
bool (*x2apic_available)(void);
+
+ /* pin current vcpu to specified physical cpu (run rarely) */
+ void (*pin_vcpu)(int);
};
extern const struct hypervisor_x86 *x86_hyper;
@@ -56,6 +59,7 @@ extern const struct hypervisor_x86 x86_hyper_kvm;
extern void init_hypervisor(struct cpuinfo_x86 *c);
extern void init_hypervisor_platform(void);
extern bool hypervisor_x2apic_available(void);
+extern void hypervisor_pin_vcpu(int cpu);
#else
static inline void init_hypervisor(struct cpuinfo_x86 *c) { }
static inline void init_hypervisor_platform(void) { }
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 73d391a..ff108f8 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -85,3 +85,14 @@ bool __init hypervisor_x2apic_available(void)
x86_hyper->x2apic_available &&
x86_hyper->x2apic_available();
}
+
+void hypervisor_pin_vcpu(int cpu)
+{
+ if (!x86_hyper)
+ return;
+
+ if (x86_hyper->pin_vcpu)
+ x86_hyper->pin_vcpu(cpu);
+ else
+ WARN_ONCE(1, "vcpu pinning requested but not supported!\n");
+}
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
new file mode 100644
index 0000000..3fa5ef2
--- /dev/null
+++ b/include/linux/hypervisor.h
@@ -0,0 +1,17 @@
+#ifndef __LINUX_HYPEVISOR_H
+#define __LINUX_HYPEVISOR_H
+
+/*
+ * Generic Hypervisor support
+ * Juergen Gross <jgross@suse.com>
+ */
+
+#ifdef CONFIG_HYPERVISOR_GUEST
+#include <asm/hypervisor.h>
+#else
+static inline void hypervisor_pin_vcpu(int cpu)
+{
+}
+#endif
+
+#endif /* __LINUX_HYPEVISOR_H */
diff --git a/kernel/smp.c b/kernel/smp.c
index 7416544..9388064 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -14,6 +14,7 @@
#include <linux/smp.h>
#include <linux/cpu.h>
#include <linux/sched.h>
+#include <linux/hypervisor.h>
#include "smpboot.h"
diff --git a/kernel/up.c b/kernel/up.c
index 1760bf3..3ccee2b 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/smp.h>
+#include <linux/hypervisor.h>
int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
int wait)
--
2.6.6
^ permalink raw reply related
* [PATCH v5 3/6] smp: add function to execute a function synchronously on a cpu
From: Juergen Gross @ 2016-04-06 14:17 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, jeremy, jdelvare, peterz, hpa, akataria, x86,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, boris.ostrovsky, tglx, linux
In-Reply-To: <1459952266-3687-1-git-send-email-jgross@suse.com>
On some hardware models (e.g. Dell Studio 1555 laptop) some hardware
related functions (e.g. SMIs) are to be executed on physical cpu 0
only. Instead of open coding such a functionality multiple times in
the kernel add a service function for this purpose. This will enable
the possibility to take special measures in virtualized environments
like Xen, too.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V5: rename and reshuffle parameters of smp_call_on_cpu() as requested by
Peter Zijlstra
test target cpu to be online as requested by Peter Zijlstra
V4: change return value in case of illegal cpu as requested by Peter Zijlstra
make pinning of vcpu an option as suggested by Peter Zijlstra
V2: instead of manipulating the allowed set of cpus use cpu specific
workqueue as requested by Peter Zijlstra
---
include/linux/smp.h | 3 +++
kernel/smp.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/up.c | 17 +++++++++++++++++
3 files changed, 70 insertions(+)
diff --git a/include/linux/smp.h b/include/linux/smp.h
index c441407..4f1c475 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -196,4 +196,7 @@ extern void arch_enable_nonboot_cpus_end(void);
void smp_setup_processor_id(void);
+int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par,
+ bool phys);
+
#endif /* __LINUX_SMP_H */
diff --git a/kernel/smp.c b/kernel/smp.c
index 9388064..f671948 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -740,3 +740,53 @@ void wake_up_all_idle_cpus(void)
preempt_enable();
}
EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);
+
+/**
+ * smp_call_on_cpu - Call a function on a specific cpu
+ *
+ * Used to call a function on a specific cpu and wait for it to return.
+ * Optionally make sure the call is done on a specified physical cpu via vcpu
+ * pinning in order to support virtualized environments.
+ */
+struct smp_call_on_cpu_struct {
+ struct work_struct work;
+ struct completion done;
+ int (*func)(void *);
+ void *data;
+ int ret;
+ int cpu;
+};
+
+static void smp_call_on_cpu_callback(struct work_struct *work)
+{
+ struct smp_call_on_cpu_struct *sscs;
+
+ sscs = container_of(work, struct smp_call_on_cpu_struct, work);
+ if (sscs->cpu >= 0)
+ hypervisor_pin_vcpu(sscs->cpu);
+ sscs->ret = sscs->func(sscs->data);
+ if (sscs->cpu >= 0)
+ hypervisor_pin_vcpu(-1);
+
+ complete(&sscs->done);
+}
+
+int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
+{
+ struct smp_call_on_cpu_struct sscs = {
+ .work = __WORK_INITIALIZER(sscs.work, smp_call_on_cpu_callback),
+ .done = COMPLETION_INITIALIZER_ONSTACK(sscs.done),
+ .func = func,
+ .data = par,
+ .cpu = phys ? cpu : -1,
+ };
+
+ if (cpu >= nr_cpu_ids || !cpu_online(cpu))
+ return -ENXIO;
+
+ queue_work_on(cpu, system_wq, &sscs.work);
+ wait_for_completion(&sscs.done);
+
+ return sscs.ret;
+}
+EXPORT_SYMBOL_GPL(smp_call_on_cpu);
diff --git a/kernel/up.c b/kernel/up.c
index 3ccee2b..ee81ac9 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -83,3 +83,20 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
preempt_enable();
}
EXPORT_SYMBOL(on_each_cpu_cond);
+
+int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
+{
+ int ret;
+
+ if (cpu != 0)
+ return -ENXIO;
+
+ if (phys)
+ hypervisor_pin_vcpu(0);
+ ret = func(par);
+ if (phys)
+ hypervisor_pin_vcpu(-1);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(smp_call_on_cpu);
--
2.6.6
^ permalink raw reply related
* [PATCH v5 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu
From: Juergen Gross @ 2016-04-06 14:17 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, jeremy, jdelvare, peterz, hpa, akataria, x86,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, boris.ostrovsky, tglx, linux
In-Reply-To: <1459952266-3687-1-git-send-email-jgross@suse.com>
Some hardware models (e.g. Dell Studio 1555 laptops) require calls to
the firmware to be issued on cpu 0 only. As Dom0 might have to use
these calls, add xen_pin_vcpu() to achieve this functionality.
In case either the domain doesn't have the privilege to make the
related hypercall or the hypervisor isn't supporting it, issue a
warning once and disable further pinning attempts.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
---
V5: less wordy messages as requested by David Vrabel
---
arch/x86/xen/enlighten.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 880862c..bfa2f21 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1885,6 +1885,45 @@ static void xen_set_cpu_features(struct cpuinfo_x86 *c)
}
}
+static void xen_pin_vcpu(int cpu)
+{
+ static bool disable_pinning;
+ struct sched_pin_override pin_override;
+ int ret;
+
+ if (disable_pinning)
+ return;
+
+ pin_override.pcpu = cpu;
+ ret = HYPERVISOR_sched_op(SCHEDOP_pin_override, &pin_override);
+
+ /* Ignore errors when removing override. */
+ if (cpu < 0)
+ return;
+
+ switch (ret) {
+ case -ENOSYS:
+ pr_warn("Unable to pin on physical cpu %d. In case of problems consider vcpu pinning.\n",
+ cpu);
+ disable_pinning = true;
+ break;
+ case -EPERM:
+ WARN(1, "Trying to pin vcpu without having privilege to do so\n");
+ disable_pinning = true;
+ break;
+ case -EINVAL:
+ case -EBUSY:
+ pr_warn("Physical cpu %d not available for pinning. Check Xen cpu configuration.\n",
+ cpu);
+ break;
+ case 0:
+ break;
+ default:
+ WARN(1, "rc %d while trying to pin vcpu\n", ret);
+ disable_pinning = true;
+ }
+}
+
const struct hypervisor_x86 x86_hyper_xen = {
.name = "Xen",
.detect = xen_platform,
@@ -1893,6 +1932,7 @@ const struct hypervisor_x86 x86_hyper_xen = {
#endif
.x2apic_available = xen_x2apic_para_available,
.set_cpu_features = xen_set_cpu_features,
+ .pin_vcpu = xen_pin_vcpu,
};
EXPORT_SYMBOL(x86_hyper_xen);
--
2.6.6
^ permalink raw reply related
* [PATCH v5 5/6] dcdbas: make use of smp_call_on_cpu()
From: Juergen Gross @ 2016-04-06 14:17 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, jeremy, jdelvare, peterz, hpa, akataria, x86,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, boris.ostrovsky, tglx, linux
In-Reply-To: <1459952266-3687-1-git-send-email-jgross@suse.com>
Use smp_call_on_cpu() to raise SMI on cpu 0.
Make call secure by adding get_online_cpus() to avoid e.g. suspend
resume cycles in between.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V4: add call to get_online_cpus()
---
drivers/firmware/dcdbas.c | 51 ++++++++++++++++++++++++-----------------------
1 file changed, 26 insertions(+), 25 deletions(-)
diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
index 829eec8..2fe1a13 100644
--- a/drivers/firmware/dcdbas.c
+++ b/drivers/firmware/dcdbas.c
@@ -23,6 +23,7 @@
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>
+#include <linux/cpu.h>
#include <linux/gfp.h>
#include <linux/init.h>
#include <linux/kernel.h>
@@ -238,33 +239,14 @@ static ssize_t host_control_on_shutdown_store(struct device *dev,
return count;
}
-/**
- * dcdbas_smi_request: generate SMI request
- *
- * Called with smi_data_lock.
- */
-int dcdbas_smi_request(struct smi_cmd *smi_cmd)
+static int raise_smi(void *par)
{
- cpumask_var_t old_mask;
- int ret = 0;
+ struct smi_cmd *smi_cmd = par;
- if (smi_cmd->magic != SMI_CMD_MAGIC) {
- dev_info(&dcdbas_pdev->dev, "%s: invalid magic value\n",
- __func__);
- return -EBADR;
- }
-
- /* SMI requires CPU 0 */
- if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
- return -ENOMEM;
-
- cpumask_copy(old_mask, ¤t->cpus_allowed);
- set_cpus_allowed_ptr(current, cpumask_of(0));
if (smp_processor_id() != 0) {
dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
__func__);
- ret = -EBUSY;
- goto out;
+ return -EBUSY;
}
/* generate SMI */
@@ -280,9 +262,28 @@ int dcdbas_smi_request(struct smi_cmd *smi_cmd)
: "memory"
);
-out:
- set_cpus_allowed_ptr(current, old_mask);
- free_cpumask_var(old_mask);
+ return 0;
+}
+/**
+ * dcdbas_smi_request: generate SMI request
+ *
+ * Called with smi_data_lock.
+ */
+int dcdbas_smi_request(struct smi_cmd *smi_cmd)
+{
+ int ret;
+
+ if (smi_cmd->magic != SMI_CMD_MAGIC) {
+ dev_info(&dcdbas_pdev->dev, "%s: invalid magic value\n",
+ __func__);
+ return -EBADR;
+ }
+
+ /* SMI requires CPU 0 */
+ get_online_cpus();
+ ret = smp_call_on_cpu(0, raise_smi, smi_cmd, true);
+ put_online_cpus();
+
return ret;
}
--
2.6.6
^ permalink raw reply related
* [PATCH v5 6/6] hwmon: use smp_call_on_cpu() for dell-smm i8k
From: Juergen Gross @ 2016-04-06 14:17 UTC (permalink / raw)
To: linux-kernel, xen-devel
Cc: Juergen Gross, jeremy, jdelvare, peterz, hpa, akataria, x86,
virtualization, chrisw, mingo, david.vrabel, Douglas_Warzecha,
pali.rohar, boris.ostrovsky, tglx, linux
In-Reply-To: <1459952266-3687-1-git-send-email-jgross@suse.com>
Use the smp_call_on_cpu() function to call system management
mode on cpu 0.
Make call secure by adding get_online_cpus() to avoid e.g. suspend
resume cycles in between.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V4: add call to get_online_cpus()
---
drivers/hwmon/dell-smm-hwmon.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
index c43318d..c91bf35 100644
--- a/drivers/hwmon/dell-smm-hwmon.c
+++ b/drivers/hwmon/dell-smm-hwmon.c
@@ -21,6 +21,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/cpu.h>
#include <linux/delay.h>
#include <linux/module.h>
#include <linux/types.h>
@@ -35,6 +36,7 @@
#include <linux/uaccess.h>
#include <linux/io.h>
#include <linux/sched.h>
+#include <linux/smp.h>
#include <linux/i8k.h>
@@ -130,23 +132,15 @@ static inline const char *i8k_get_dmi_data(int field)
/*
* Call the System Management Mode BIOS. Code provided by Jonathan Buzzard.
*/
-static int i8k_smm(struct smm_regs *regs)
+static int i8k_smm_func(void *par)
{
int rc;
+ struct smm_regs *regs = par;
int eax = regs->eax;
- cpumask_var_t old_mask;
/* SMM requires CPU 0 */
- if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
- return -ENOMEM;
- cpumask_copy(old_mask, ¤t->cpus_allowed);
- rc = set_cpus_allowed_ptr(current, cpumask_of(0));
- if (rc)
- goto out;
- if (smp_processor_id() != 0) {
- rc = -EBUSY;
- goto out;
- }
+ if (smp_processor_id() != 0)
+ return -EBUSY;
#if defined(CONFIG_X86_64)
asm volatile("pushq %%rax\n\t"
@@ -204,13 +198,24 @@ static int i8k_smm(struct smm_regs *regs)
if (rc != 0 || (regs->eax & 0xffff) == 0xffff || regs->eax == eax)
rc = -EINVAL;
-out:
- set_cpus_allowed_ptr(current, old_mask);
- free_cpumask_var(old_mask);
return rc;
}
/*
+ * Call the System Management Mode BIOS.
+ */
+static int i8k_smm(struct smm_regs *regs)
+{
+ int ret;
+
+ get_online_cpus();
+ ret = smp_call_on_cpu(0, i8k_smm_func, regs, true);
+ put_online_cpus();
+
+ return ret;
+}
+
+/*
* Read the fan status.
*/
static int i8k_get_fan_status(int fan)
--
2.6.6
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox