* Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
From: Andy Lutomirski @ 2018-10-03 14:20 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: Wanpeng Li, Florian Weimer, X86 ML, Juergen Gross, Arnd Bergmann,
Radim Krcmar, Peter Zijlstra, LKML, Linux Virtualization,
Stephen Boyd, John Stultz, Andy Lutomirski, devel, Paolo Bonzini,
Thomas Gleixner, Matt Rickard
In-Reply-To: <87murvdysd.fsf@vitty.brq.redhat.com>
> On Oct 3, 2018, at 5:01 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Andy Lutomirski <luto@amacapital.net> writes:
>
>>> On Oct 3, 2018, at 2:22 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>
>>> Andy Lutomirski <luto@kernel.org> writes:
>>>
>>>> Hi Vitaly, Paolo, Radim, etc.,
>>>>
>>> The notification you're talking about exists, it is called
>>> Reenligntenment, see 0092e4346f49 "x86/kvm: Support Hyper-V
>>> reenlightenment"). When TSC page changes (and this only happens when L1
>>> is migrated to a different host with a different TSC frequency and TSC
>>> scaling is not supported by the CPU) we receive an interrupt in L1 (at
>>> this moment all TSC accesses are emulated which guarantees the
>>> correctness of the readings), pause all L2 guests, update their kvmclock
>>> structures with new data (we already know the new TSC frequency) and
>>> then tell L0 that we're done and it can stop emulating TSC accesses.
>>
>> That’s delightful! Does the emulation magic also work for L1 user
>> mode?
>
> As far as I understand - yes, all rdtsc* calls will trap into L0.
>
>> If so, couldn’t we drop the HyperV vclock entirely and just
>> fold the adjustment into the core timekeeping data? (Preferably the
>> actual core data, which would require core changes, but it could
>> plausibly be done in arch code, too.)
>
> Not all Hyper-V hosts support reenlightenment notifications (and, if I'm
> not mistaken, you need to enable nesting for the VM to get the feature -
> and most VMs don't have this) so I think we'll have to keep Hyper-V
> vclock for the time being.
>
>
But this does suggest that the correct way to pass a clock through to an L2 guest where L0 is HV is to make L1 use the “tsc” clock and L2 use kvmclock (or something newer and better). This would require adding support for atomic frequency changes all the way through the timekeeping and arch code.
John, tglx, would that be okay or crazy?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
From: Vitaly Kuznetsov @ 2018-10-03 12:01 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Wanpeng Li, Florian Weimer, X86 ML, Juergen Gross, Arnd Bergmann,
Radim Krcmar, Peter Zijlstra, LKML, Linux Virtualization,
Stephen Boyd, John Stultz, Andy Lutomirski, devel, Paolo Bonzini,
Thomas Gleixner, Matt Rickard
In-Reply-To: <4B6A97E1-17E6-40F2-A7A0-87731668A07C@amacapital.net>
Andy Lutomirski <luto@amacapital.net> writes:
>> On Oct 3, 2018, at 2:22 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>
>> Andy Lutomirski <luto@kernel.org> writes:
>>
>>> Hi Vitaly, Paolo, Radim, etc.,
>>>
>> The notification you're talking about exists, it is called
>> Reenligntenment, see 0092e4346f49 "x86/kvm: Support Hyper-V
>> reenlightenment"). When TSC page changes (and this only happens when L1
>> is migrated to a different host with a different TSC frequency and TSC
>> scaling is not supported by the CPU) we receive an interrupt in L1 (at
>> this moment all TSC accesses are emulated which guarantees the
>> correctness of the readings), pause all L2 guests, update their kvmclock
>> structures with new data (we already know the new TSC frequency) and
>> then tell L0 that we're done and it can stop emulating TSC accesses.
>
> That’s delightful! Does the emulation magic also work for L1 user
> mode?
As far as I understand - yes, all rdtsc* calls will trap into L0.
> If so, couldn’t we drop the HyperV vclock entirely and just
> fold the adjustment into the core timekeeping data? (Preferably the
> actual core data, which would require core changes, but it could
> plausibly be done in arch code, too.)
Not all Hyper-V hosts support reenlightenment notifications (and, if I'm
not mistaken, you need to enable nesting for the VM to get the feature -
and most VMs don't have this) so I think we'll have to keep Hyper-V
vclock for the time being.
--
Vitaly
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
From: Andy Lutomirski @ 2018-10-03 10:20 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: Wanpeng Li, Florian Weimer, X86 ML, Juergen Gross, Arnd Bergmann,
Radim Krcmar, Peter Zijlstra, LKML, Linux Virtualization,
Stephen Boyd, John Stultz, Andy Lutomirski, devel, Paolo Bonzini,
Thomas Gleixner, Matt Rickard
In-Reply-To: <87sh1ne64t.fsf@vitty.brq.redhat.com>
> On Oct 3, 2018, at 2:22 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Andy Lutomirski <luto@kernel.org> writes:
>
>> Hi Vitaly, Paolo, Radim, etc.,
>>
>>> On Fri, Sep 14, 2018 at 5:52 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Matt attempted to add CLOCK_TAI support to the VDSO clock_gettime()
>>> implementation, which extended the clockid switch case and added yet
>>> another slightly different copy of the same code.
>>>
>>> Especially the extended switch case is problematic as the compiler tends to
>>> generate a jump table which then requires to use retpolines. If jump tables
>>> are disabled it adds yet another conditional to the existing maze.
>>>
>>> This series takes a different approach by consolidating the almost
>>> identical functions into one implementation for high resolution clocks and
>>> one for the coarse grained clock ids by storing the base data for each
>>> clock id in an array which is indexed by the clock id.
>>>
>>
>> I was trying to understand more of the implications of this patch
>> series, and I was again reminded that there is an entire extra copy of
>> the vclock reading code in arch/x86/kvm/x86.c. And the purpose of
>> that code is very, very opaque.
>>
>> Can one of you explain what the code is even doing? From a couple of
>> attempts to read through it, it's a whole bunch of
>> probably-extremely-buggy code that, drumroll please, tries to
>> atomically read the TSC value and the time. And decide whether the
>> result is "based on the TSC". And then synthesizes a TSC-to-ns
>> multiplier and shift, based on *something other than the actual
>> multiply and shift used*.
>>
>> IOW, unless I'm totally misunderstanding it, the code digs into the
>> private arch clocksource data intended for the vDSO, uses a poorly
>> maintained copy of the vDSO code to read the time (instead of doing
>> the sane thing and using the kernel interfaces for this), and
>> propagates a totally made up copy to the guest. And gets it entirely
>> wrong when doing nested virt, since, unless there's some secret in
>> this maze, it doesn't acutlaly use the scaling factor from the host
>> when it tells the guest what to do.
>>
>> I am really, seriously tempted to send a patch to simply delete all
>> this code. The correct way to do it is to hook
>
> "I have discovered a truly marvelous proof of this, which this margin is
> too narrow to contain" :-)
>
> There is a very long history of different (hardware) issues Marcelo was
> fighting with and the current code is the survived Frankenstein. E.g. it
> is very, very unclear what "catchup", "always catchup" and
> masterclock-less mode in general are and if we still need them.
>
> That said I'm all for simplification. I'm not sure if we still need to
> care about buggy hardware though.
>
>>
>> And I don't see how it's even possible to pass kvmclock correctly to
>> the L2 guest when L0 is hyperv. KVM could pass *hyperv's* clock, but
>> L1 isn't notified when the data structure changes, so how the heck is
>> it supposed to update the kvmclock structure?
>
> Well, this kind of works in the the followin way:
> L1's clocksource is 'tsc_page' which is, basically, a compliment to TSC:
> two numbers provided by L0: offset and scale and KVM was tought to treat
> this clocksource as a good one (see b0c39dc68e3b "x86/kvm: Pass stable
> clocksource to guests when running nested on Hyper-V").
>
> The notification you're talking about exists, it is called
> Reenligntenment, see 0092e4346f49 "x86/kvm: Support Hyper-V
> reenlightenment"). When TSC page changes (and this only happens when L1
> is migrated to a different host with a different TSC frequency and TSC
> scaling is not supported by the CPU) we receive an interrupt in L1 (at
> this moment all TSC accesses are emulated which guarantees the
> correctness of the readings), pause all L2 guests, update their kvmclock
> structures with new data (we already know the new TSC frequency) and
> then tell L0 that we're done and it can stop emulating TSC accesses.
That’s delightful! Does the emulation magic also work for L1 user mode? If so, couldn’t we drop the HyperV vclock entirely and just fold the adjustment into the core timekeeping data? (Preferably the actual core data, which would require core changes, but it could plausibly be done in arch code, too.)
>
> (Nothing like this exists for KVM-on-KVM, by the way, when L1's
> clocksource is 'kvmclock' L2s won't get a stable kvmclock clocksource.)
>
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
From: Vitaly Kuznetsov @ 2018-10-03 9:22 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Marcelo Tosatti, Paolo Bonzini,
Radim Krcmar, Wanpeng Li
Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
devel, Matt Rickard
In-Reply-To: <CALCETrU+jP2hPLoJWTYJKTvr7-=YLwtk1=cZ_uizHYQZ1z4P-w@mail.gmail.com>
Andy Lutomirski <luto@kernel.org> writes:
> Hi Vitaly, Paolo, Radim, etc.,
>
> On Fri, Sep 14, 2018 at 5:52 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Matt attempted to add CLOCK_TAI support to the VDSO clock_gettime()
>> implementation, which extended the clockid switch case and added yet
>> another slightly different copy of the same code.
>>
>> Especially the extended switch case is problematic as the compiler tends to
>> generate a jump table which then requires to use retpolines. If jump tables
>> are disabled it adds yet another conditional to the existing maze.
>>
>> This series takes a different approach by consolidating the almost
>> identical functions into one implementation for high resolution clocks and
>> one for the coarse grained clock ids by storing the base data for each
>> clock id in an array which is indexed by the clock id.
>>
>
> I was trying to understand more of the implications of this patch
> series, and I was again reminded that there is an entire extra copy of
> the vclock reading code in arch/x86/kvm/x86.c. And the purpose of
> that code is very, very opaque.
>
> Can one of you explain what the code is even doing? From a couple of
> attempts to read through it, it's a whole bunch of
> probably-extremely-buggy code that, drumroll please, tries to
> atomically read the TSC value and the time. And decide whether the
> result is "based on the TSC". And then synthesizes a TSC-to-ns
> multiplier and shift, based on *something other than the actual
> multiply and shift used*.
>
> IOW, unless I'm totally misunderstanding it, the code digs into the
> private arch clocksource data intended for the vDSO, uses a poorly
> maintained copy of the vDSO code to read the time (instead of doing
> the sane thing and using the kernel interfaces for this), and
> propagates a totally made up copy to the guest. And gets it entirely
> wrong when doing nested virt, since, unless there's some secret in
> this maze, it doesn't acutlaly use the scaling factor from the host
> when it tells the guest what to do.
>
> I am really, seriously tempted to send a patch to simply delete all
> this code. The correct way to do it is to hook
"I have discovered a truly marvelous proof of this, which this margin is
too narrow to contain" :-)
There is a very long history of different (hardware) issues Marcelo was
fighting with and the current code is the survived Frankenstein. E.g. it
is very, very unclear what "catchup", "always catchup" and
masterclock-less mode in general are and if we still need them.
That said I'm all for simplification. I'm not sure if we still need to
care about buggy hardware though.
>
> And I don't see how it's even possible to pass kvmclock correctly to
> the L2 guest when L0 is hyperv. KVM could pass *hyperv's* clock, but
> L1 isn't notified when the data structure changes, so how the heck is
> it supposed to update the kvmclock structure?
Well, this kind of works in the the followin way:
L1's clocksource is 'tsc_page' which is, basically, a compliment to TSC:
two numbers provided by L0: offset and scale and KVM was tought to treat
this clocksource as a good one (see b0c39dc68e3b "x86/kvm: Pass stable
clocksource to guests when running nested on Hyper-V").
The notification you're talking about exists, it is called
Reenligntenment, see 0092e4346f49 "x86/kvm: Support Hyper-V
reenlightenment"). When TSC page changes (and this only happens when L1
is migrated to a different host with a different TSC frequency and TSC
scaling is not supported by the CPU) we receive an interrupt in L1 (at
this moment all TSC accesses are emulated which guarantees the
correctness of the readings), pause all L2 guests, update their kvmclock
structures with new data (we already know the new TSC frequency) and
then tell L0 that we're done and it can stop emulating TSC accesses.
(Nothing like this exists for KVM-on-KVM, by the way, when L1's
clocksource is 'kvmclock' L2s won't get a stable kvmclock clocksource.)
--
Vitaly
^ permalink raw reply
* Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
From: Andy Lutomirski @ 2018-10-03 5:15 UTC (permalink / raw)
To: Thomas Gleixner, Marcelo Tosatti, Paolo Bonzini, Radim Krcmar,
Wanpeng Li
Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
Andrew Lutomirski, devel, Matt Rickard
In-Reply-To: <20180914125006.349747096@linutronix.de>
Hi Vitaly, Paolo, Radim, etc.,
On Fri, Sep 14, 2018 at 5:52 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Matt attempted to add CLOCK_TAI support to the VDSO clock_gettime()
> implementation, which extended the clockid switch case and added yet
> another slightly different copy of the same code.
>
> Especially the extended switch case is problematic as the compiler tends to
> generate a jump table which then requires to use retpolines. If jump tables
> are disabled it adds yet another conditional to the existing maze.
>
> This series takes a different approach by consolidating the almost
> identical functions into one implementation for high resolution clocks and
> one for the coarse grained clock ids by storing the base data for each
> clock id in an array which is indexed by the clock id.
>
I was trying to understand more of the implications of this patch
series, and I was again reminded that there is an entire extra copy of
the vclock reading code in arch/x86/kvm/x86.c. And the purpose of
that code is very, very opaque.
Can one of you explain what the code is even doing? From a couple of
attempts to read through it, it's a whole bunch of
probably-extremely-buggy code that, drumroll please, tries to
atomically read the TSC value and the time. And decide whether the
result is "based on the TSC". And then synthesizes a TSC-to-ns
multiplier and shift, based on *something other than the actual
multiply and shift used*.
IOW, unless I'm totally misunderstanding it, the code digs into the
private arch clocksource data intended for the vDSO, uses a poorly
maintained copy of the vDSO code to read the time (instead of doing
the sane thing and using the kernel interfaces for this), and
propagates a totally made up copy to the guest. And gets it entirely
wrong when doing nested virt, since, unless there's some secret in
this maze, it doesn't acutlaly use the scaling factor from the host
when it tells the guest what to do.
I am really, seriously tempted to send a patch to simply delete all
this code. The correct way to do it is to hook
And I don't see how it's even possible to pass kvmclock correctly to
the L2 guest when L0 is hyperv. KVM could pass *hyperv's* clock, but
L1 isn't notified when the data structure changes, so how the heck is
it supposed to update the kvmclock structure?
--Andy
^ permalink raw reply
* Re: [PATCH] VMCI: Resource wildcard match fixed
From: Greg KH @ 2018-10-02 22:35 UTC (permalink / raw)
To: Jorgen Hansen; +Cc: pv-drivers, linux-kernel, virtualization
In-Reply-To: <20180921073105.5758-1-jhansen@vmware.com>
On Fri, Sep 21, 2018 at 12:31:05AM -0700, Jorgen Hansen wrote:
> When adding a VMCI resource, the check for an existing entry
> would ignore that the new entry could be a wildcard. This could
> result in multiple resource entries that would match a given
> handle. One disastrous outcome of this is that the
> refcounting used to ensure that delayed callbacks for VMCI
> datagrams have run before the datagram is destroyed can be
> wrong, since the refcount could be increased on the duplicate
> entry. This in turn leads to a use after free bug. This issue
> was discovered by Hangbin Liu using KASAN and syzkaller.
>
> Fixes: bc63dedb7d46 ("VMCI: resource object implementation")
> Reported-by: Hangbin Liu <liuhangbin@gmail.com>
> Reviewed-by: Adit Ranadive <aditr@vmware.com>
> Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
> Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
> ---
> drivers/misc/vmw_vmci/vmci_driver.c | 2 +-
> drivers/misc/vmw_vmci/vmci_resource.c | 3 ++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c
> index d7eaf1eb11e7..003bfba40758 100644
> --- a/drivers/misc/vmw_vmci/vmci_driver.c
> +++ b/drivers/misc/vmw_vmci/vmci_driver.c
> @@ -113,5 +113,5 @@ module_exit(vmci_drv_exit);
>
> MODULE_AUTHOR("VMware, Inc.");
> MODULE_DESCRIPTION("VMware Virtual Machine Communication Interface.");
> -MODULE_VERSION("1.1.5.0-k");
> +MODULE_VERSION("1.1.6.0-k");
> MODULE_LICENSE("GPL v2");
You do know MODULE_VERSION means nothing, right? Please just remove it.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH 4/4] drm/virtio: Use IDAs more efficiently
From: Matthew Wilcox @ 2018-10-02 12:55 UTC (permalink / raw)
To: Gerd Hoffmann; +Cc: David Airlie, linux-kernel, dri-devel, virtualization
In-Reply-To: <20181002114328.kltyrjlojzrhzwgl@sirius.home.kraxel.org>
On Tue, Oct 02, 2018 at 01:43:28PM +0200, Gerd Hoffmann wrote:
> On Wed, Sep 26, 2018 at 09:04:55AM -0700, Matthew Wilcox wrote:
> > On Wed, Sep 26, 2018 at 09:00:31AM -0700, Matthew Wilcox wrote:
> > > @@ -59,6 +59,7 @@ static int virtio_gpu_context_create(struct virtio_gpu_device *vgdev,
> > >
> > > if (handle < 0)
> > > return handle;
> > > + handle++;
> > > virtio_gpu_cmd_context_create(vgdev, handle, nlen, name);
> > > return handle;
> > > }
> >
> > Uh. This line is missing.
> >
> > - int handle = ida_alloc_min(&vgdev->ctx_id_ida, 1, GFP_KERNEL);
> > + int handle = ida_alloc(&vgdev->ctx_id_ida, GFP_KERNEL);
> >
> > It'll be there in v2 ;-)
>
> I've touched the resource/object id handling too, see my "drm/virtio:
> rework ttm resource handling" patch series
> (https://patchwork.freedesktop.org/series/50382/). Which still needs a
> review btw.
Um, according to patchwork, you only posted it yesterday. Does DRM
normally expect a review within 24 hours?
> I think that series obsoletes patch 3/4 (object id fixes) of your
> series. The other patches should rebase without too much trouble, you
> could do that as well when preparing v2 ...
It seems a little odd to me to expect a drive-by contributor (ie me) to
rebase their patches on top of a patch series which wasn't even posted
at the time they contributed their original patch. If it was already
in -next, that'd be a reasonable request.
^ permalink raw reply
* Re: [PATCH 4/4] drm/virtio: Use IDAs more efficiently
From: Gerd Hoffmann @ 2018-10-02 11:43 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: David Airlie, linux-kernel, dri-devel, virtualization
In-Reply-To: <20180926160455.GA20337@bombadil.infradead.org>
On Wed, Sep 26, 2018 at 09:04:55AM -0700, Matthew Wilcox wrote:
> On Wed, Sep 26, 2018 at 09:00:31AM -0700, Matthew Wilcox wrote:
> > @@ -59,6 +59,7 @@ static int virtio_gpu_context_create(struct virtio_gpu_device *vgdev,
> >
> > if (handle < 0)
> > return handle;
> > + handle++;
> > virtio_gpu_cmd_context_create(vgdev, handle, nlen, name);
> > return handle;
> > }
>
> Uh. This line is missing.
>
> - int handle = ida_alloc_min(&vgdev->ctx_id_ida, 1, GFP_KERNEL);
> + int handle = ida_alloc(&vgdev->ctx_id_ida, GFP_KERNEL);
>
> It'll be there in v2 ;-)
I've touched the resource/object id handling too, see my "drm/virtio:
rework ttm resource handling" patch series
(https://patchwork.freedesktop.org/series/50382/). Which still needs a
review btw.
I think that series obsoletes patch 3/4 (object id fixes) of your
series. The other patches should rebase without too much trouble, you
could do that as well when preparing v2 ...
cheers,
Gerd
^ permalink raw reply
* [PATCH v3 2/2] drm/bochs: add edid support.
From: Gerd Hoffmann @ 2018-10-02 11:10 UTC (permalink / raw)
To: dri-devel
Cc: David Airlie, open list,
open list:DRM DRIVER FOR BOCHS VIRTUAL GPU
In-Reply-To: <20181002111041.17053-1-kraxel@redhat.com>
Recent qemu (latest master branch, upcoming 3.1 release) got support
for EDID data. This patch adds guest driver support.
EDID support in qemu is not (yet) enabled by default, so please use
'qemu -device VGA,edid=on' for testing.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
drivers/gpu/drm/bochs/bochs.h | 1 +
drivers/gpu/drm/bochs/bochs_hw.c | 38 ++++++++++++++++++++++++++++++++++++++
drivers/gpu/drm/bochs/bochs_kms.c | 18 +++++++++++++++---
3 files changed, 54 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/bochs/bochs.h b/drivers/gpu/drm/bochs/bochs.h
index 375bf92cd0..055d7b437a 100644
--- a/drivers/gpu/drm/bochs/bochs.h
+++ b/drivers/gpu/drm/bochs/bochs.h
@@ -70,6 +70,7 @@ struct bochs_device {
u16 yres_virtual;
u32 stride;
u32 bpp;
+ struct edid *edid;
/* drm */
struct drm_device *dev;
diff --git a/drivers/gpu/drm/bochs/bochs_hw.c b/drivers/gpu/drm/bochs/bochs_hw.c
index a39b0343c1..21a769e48c 100644
--- a/drivers/gpu/drm/bochs/bochs_hw.c
+++ b/drivers/gpu/drm/bochs/bochs_hw.c
@@ -47,6 +47,41 @@ static void bochs_dispi_write(struct bochs_device *bochs, u16 reg, u16 val)
}
}
+static int bochs_load_edid(struct bochs_device *bochs)
+{
+ uint8_t *blob;
+ size_t i, len;
+ uint8_t num_exts;
+
+ if (!bochs->mmio)
+ return -1;
+
+ if ((readb(bochs->mmio+0) != 0x00 ||
+ readb(bochs->mmio+1) != 0xff))
+ return -1;
+
+ num_exts = readb(bochs->mmio + 126);
+ len = EDID_LENGTH * (1 + num_exts);
+ if (len > 0x400 /* vga register offset */)
+ return -1;
+
+ kfree(bochs->edid);
+ bochs->edid = kmalloc(len, GFP_KERNEL);
+ blob = (void *)bochs->edid;
+ for (i = 0; i < len; i++) {
+ blob[i] = readb(bochs->mmio+i);
+ }
+
+ if (!drm_edid_is_valid(bochs->edid)) {
+ DRM_ERROR("EDID is not valid, ignoring.\n");
+ kfree(bochs->edid);
+ bochs->edid = NULL;
+ return -1;
+ }
+
+ return 0;
+}
+
int bochs_hw_init(struct drm_device *dev, uint32_t flags)
{
struct bochs_device *bochs = dev->dev_private;
@@ -133,6 +168,9 @@ int bochs_hw_init(struct drm_device *dev, uint32_t flags)
}
noext:
+ if (bochs_load_edid(bochs) == 0)
+ DRM_INFO("Found EDID data blob.\n");
+
return 0;
}
diff --git a/drivers/gpu/drm/bochs/bochs_kms.c b/drivers/gpu/drm/bochs/bochs_kms.c
index ca5a9afdd5..5b3afe9364 100644
--- a/drivers/gpu/drm/bochs/bochs_kms.c
+++ b/drivers/gpu/drm/bochs/bochs_kms.c
@@ -181,10 +181,17 @@ static void bochs_encoder_init(struct drm_device *dev)
static int bochs_connector_get_modes(struct drm_connector *connector)
{
- int count;
+ struct bochs_device *bochs =
+ container_of(connector, struct bochs_device, connector);
+ int count = 0;
- count = drm_add_modes_noedid(connector, 8192, 8192);
- drm_set_preferred_mode(connector, defx, defy);
+ if (bochs->edid)
+ count = drm_add_edid_modes(connector, bochs->edid);
+
+ if (!count) {
+ count = drm_add_modes_noedid(connector, 8192, 8192);
+ drm_set_preferred_mode(connector, defx, defy);
+ }
return count;
}
@@ -239,6 +246,11 @@ static void bochs_connector_init(struct drm_device *dev)
drm_connector_helper_add(connector,
&bochs_connector_connector_helper_funcs);
drm_connector_register(connector);
+
+ if (bochs->edid) {
+ drm_connector_attach_edid_property(connector);
+ drm_connector_update_edid_property(connector, bochs->edid);
+ }
}
--
2.9.3
^ permalink raw reply related
* Re: [PATCH] qxl: fix null-pointer crash during suspend
From: Laurent Pinchart @ 2018-10-02 10:05 UTC (permalink / raw)
To: Daniel Vetter
Cc: Laurent Pinchart, linux-kernel, dri-devel, virtualization,
Alan Jenkins, Peter Wu, Dave Airlie
In-Reply-To: <20181002081422.GH11082@phenom.ffwll.local>
Hello,
On Tuesday, 2 October 2018 11:14:22 EEST Daniel Vetter wrote:
> On Tue, Sep 04, 2018 at 10:27:47PM +0200, Peter Wu wrote:
> > "crtc->helper_private" is not initialized by the QXL driver and thus the
>
> This is still initialized, it's the ->disable that goes boom. At least the
> call to drm_crtc_helper_add is still there. The ->disable was removed in:
>
> commit 64581714b58bc3e16ede8dc37a025c3aa0e0eef1
> Author: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
> Date: Fri Jun 30 12:36:45 2017 +0300
>
> drm: Convert atomic drivers from CRTC .disable() to .atomic_disable()
>
> Fixes: 64581714b58b ("drm: Convert atomic drivers from CRTC .disable() to
> .atomic_disable()") Cc: <stable@vger.kernel.org> # v4.14+
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> I'll let Gerd pick this one up, after some testing. Also adding Laurent.
Sorry for breaking it :-( Please let me know if there's something I can do to
help.
> > "crtc_funcs->disable" call would crash (resulting in suspend failure).
> > Fix this by converting the suspend/resume functions to use the
> > drm_mode_config_helper_* helpers.
> >
> > Tested system sleep with QEMU 3.0 using "echo mem > /sys/power/state".
> >
> > During suspend the following message is visible from QEMU:
> > spice/server/display-channel.c:2425:display_channel_validate_surface:
> > canvas address is 0x7fd05da68308 for 0 (and is NULL)
> > spice/server/display-channel.c:2426:display_channel_validate_surface:
> > failed on 0>
> > This seems to be triggered by QXL_IO_NOTIFY_CMD after
> > QXL_IO_DESTROY_PRIMARY_ASYNC, but aside from the warning things still
> > seem to work (tested with both the GTK and -spice options).
> >
> > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > ---
> > Hi,
> >
> > I found this issue while trying to suspend a VM that uses QXL. In order to
> > see the stack trace over serial, boot with no_console_suspend. Searching
> > for "qxl_drm_freeze" showed one recent report from Alan:
> > https://lkml.kernel.org/r/891e334c-cf19-032c-b996-59ac166fcde1@gmail.com
> >
> > Kind regards,
> > Peter
> > ---
> >
> > drivers/gpu/drm/qxl/qxl_drv.c | 26 +++++---------------------
> > 1 file changed, 5 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
> > index 2445e75cf7ea..d00f45eed03c 100644
> > --- a/drivers/gpu/drm/qxl/qxl_drv.c
> > +++ b/drivers/gpu/drm/qxl/qxl_drv.c
> > @@ -136,20 +136,11 @@ static int qxl_drm_freeze(struct drm_device *dev)
> >
> > {
> >
> > struct pci_dev *pdev = dev->pdev;
> > struct qxl_device *qdev = dev->dev_private;
> >
> > - struct drm_crtc *crtc;
> > -
> > - drm_kms_helper_poll_disable(dev);
> > -
> > - console_lock();
> > - qxl_fbdev_set_suspend(qdev, 1);
> > - console_unlock();
> > + int ret;
> >
> > - /* unpin the front buffers */
> > - list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
> > - const struct drm_crtc_helper_funcs *crtc_funcs = crtc-
>helper_private;
> > - if (crtc->enabled)
> > - (*crtc_funcs->disable)(crtc);
> > - }
> > + ret = drm_mode_config_helper_suspend(dev);
> > + if (ret)
> > + return ret;
> >
> > qxl_destroy_monitors_object(qdev);
> > qxl_surf_evict(qdev);
> >
> > @@ -175,14 +166,7 @@ static int qxl_drm_resume(struct drm_device *dev,
> > bool thaw)>
> > }
> >
> > qxl_create_monitors_object(qdev);
> >
> > - drm_helper_resume_force_mode(dev);
> > -
> > - console_lock();
> > - qxl_fbdev_set_suspend(qdev, 0);
> > - console_unlock();
> > -
> > - drm_kms_helper_poll_enable(dev);
> > - return 0;
> > + return drm_mode_config_helper_resume(dev);
> >
> > }
> >
> > static int qxl_pm_suspend(struct device *dev)
--
Regards,
Laurent Pinchart
^ permalink raw reply
* Re: [PATCH] qxl: fix null-pointer crash during suspend
From: Daniel Vetter @ 2018-10-02 8:14 UTC (permalink / raw)
To: Peter Wu, Laurent Pinchart
Cc: linux-kernel, dri-devel, virtualization, Alan Jenkins,
Dave Airlie
In-Reply-To: <20180904202747.14968-1-peter@lekensteyn.nl>
On Tue, Sep 04, 2018 at 10:27:47PM +0200, Peter Wu wrote:
> "crtc->helper_private" is not initialized by the QXL driver and thus the
This is still initialized, it's the ->disable that goes boom. At least the
call to drm_crtc_helper_add is still there. The ->disable was removed in:
commit 64581714b58bc3e16ede8dc37a025c3aa0e0eef1
Author: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Date: Fri Jun 30 12:36:45 2017 +0300
drm: Convert atomic drivers from CRTC .disable() to .atomic_disable()
Fixes: 64581714b58b ("drm: Convert atomic drivers from CRTC .disable() to .atomic_disable()")
Cc: <stable@vger.kernel.org> # v4.14+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I'll let Gerd pick this one up, after some testing. Also adding Laurent.
-Daniel
> "crtc_funcs->disable" call would crash (resulting in suspend failure).
> Fix this by converting the suspend/resume functions to use the
> drm_mode_config_helper_* helpers.
>
> Tested system sleep with QEMU 3.0 using "echo mem > /sys/power/state".
> During suspend the following message is visible from QEMU:
>
> spice/server/display-channel.c:2425:display_channel_validate_surface: canvas address is 0x7fd05da68308 for 0 (and is NULL)
> spice/server/display-channel.c:2426:display_channel_validate_surface: failed on 0
>
> This seems to be triggered by QXL_IO_NOTIFY_CMD after
> QXL_IO_DESTROY_PRIMARY_ASYNC, but aside from the warning things still
> seem to work (tested with both the GTK and -spice options).
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
> Hi,
>
> I found this issue while trying to suspend a VM that uses QXL. In order to see
> the stack trace over serial, boot with no_console_suspend. Searching for
> "qxl_drm_freeze" showed one recent report from Alan:
> https://lkml.kernel.org/r/891e334c-cf19-032c-b996-59ac166fcde1@gmail.com
>
> Kind regards,
> Peter
> ---
> drivers/gpu/drm/qxl/qxl_drv.c | 26 +++++---------------------
> 1 file changed, 5 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
> index 2445e75cf7ea..d00f45eed03c 100644
> --- a/drivers/gpu/drm/qxl/qxl_drv.c
> +++ b/drivers/gpu/drm/qxl/qxl_drv.c
> @@ -136,20 +136,11 @@ static int qxl_drm_freeze(struct drm_device *dev)
> {
> struct pci_dev *pdev = dev->pdev;
> struct qxl_device *qdev = dev->dev_private;
> - struct drm_crtc *crtc;
> -
> - drm_kms_helper_poll_disable(dev);
> -
> - console_lock();
> - qxl_fbdev_set_suspend(qdev, 1);
> - console_unlock();
> + int ret;
>
> - /* unpin the front buffers */
> - list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
> - const struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private;
> - if (crtc->enabled)
> - (*crtc_funcs->disable)(crtc);
> - }
> + ret = drm_mode_config_helper_suspend(dev);
> + if (ret)
> + return ret;
>
> qxl_destroy_monitors_object(qdev);
> qxl_surf_evict(qdev);
> @@ -175,14 +166,7 @@ static int qxl_drm_resume(struct drm_device *dev, bool thaw)
> }
>
> qxl_create_monitors_object(qdev);
> - drm_helper_resume_force_mode(dev);
> -
> - console_lock();
> - qxl_fbdev_set_suspend(qdev, 0);
> - console_unlock();
> -
> - drm_kms_helper_poll_enable(dev);
> - return 0;
> + return drm_mode_config_helper_resume(dev);
> }
>
> static int qxl_pm_suspend(struct device *dev)
> --
> 2.18.0
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply
* [PATCH v2 2/2] drm/bochs: add edid support.
From: Gerd Hoffmann @ 2018-10-02 6:19 UTC (permalink / raw)
To: dri-devel
Cc: David Airlie, open list,
open list:DRM DRIVER FOR BOCHS VIRTUAL GPU
In-Reply-To: <20181002061905.10496-1-kraxel@redhat.com>
Recent qemu (latest master branch, upcoming 3.1 release) got support
for EDID data. This patch adds guest driver support.
EDID support in qemu is not (yet) enabled by default, so please use
'qemu -device VGA,edid=on' for testing.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
drivers/gpu/drm/bochs/bochs.h | 1 +
drivers/gpu/drm/bochs/bochs_hw.c | 38 ++++++++++++++++++++++++++++++++++++++
drivers/gpu/drm/bochs/bochs_kms.c | 18 +++++++++++++++---
3 files changed, 54 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/bochs/bochs.h b/drivers/gpu/drm/bochs/bochs.h
index 375bf92cd0..055d7b437a 100644
--- a/drivers/gpu/drm/bochs/bochs.h
+++ b/drivers/gpu/drm/bochs/bochs.h
@@ -70,6 +70,7 @@ struct bochs_device {
u16 yres_virtual;
u32 stride;
u32 bpp;
+ struct edid *edid;
/* drm */
struct drm_device *dev;
diff --git a/drivers/gpu/drm/bochs/bochs_hw.c b/drivers/gpu/drm/bochs/bochs_hw.c
index a39b0343c1..21a769e48c 100644
--- a/drivers/gpu/drm/bochs/bochs_hw.c
+++ b/drivers/gpu/drm/bochs/bochs_hw.c
@@ -47,6 +47,41 @@ static void bochs_dispi_write(struct bochs_device *bochs, u16 reg, u16 val)
}
}
+static int bochs_load_edid(struct bochs_device *bochs)
+{
+ uint8_t *blob;
+ size_t i, len;
+ uint8_t num_exts;
+
+ if (!bochs->mmio)
+ return -1;
+
+ if ((readb(bochs->mmio+0) != 0x00 ||
+ readb(bochs->mmio+1) != 0xff))
+ return -1;
+
+ num_exts = readb(bochs->mmio + 126);
+ len = EDID_LENGTH * (1 + num_exts);
+ if (len > 0x400 /* vga register offset */)
+ return -1;
+
+ kfree(bochs->edid);
+ bochs->edid = kmalloc(len, GFP_KERNEL);
+ blob = (void *)bochs->edid;
+ for (i = 0; i < len; i++) {
+ blob[i] = readb(bochs->mmio+i);
+ }
+
+ if (!drm_edid_is_valid(bochs->edid)) {
+ DRM_ERROR("EDID is not valid, ignoring.\n");
+ kfree(bochs->edid);
+ bochs->edid = NULL;
+ return -1;
+ }
+
+ return 0;
+}
+
int bochs_hw_init(struct drm_device *dev, uint32_t flags)
{
struct bochs_device *bochs = dev->dev_private;
@@ -133,6 +168,9 @@ int bochs_hw_init(struct drm_device *dev, uint32_t flags)
}
noext:
+ if (bochs_load_edid(bochs) == 0)
+ DRM_INFO("Found EDID data blob.\n");
+
return 0;
}
diff --git a/drivers/gpu/drm/bochs/bochs_kms.c b/drivers/gpu/drm/bochs/bochs_kms.c
index ca5a9afdd5..5b3afe9364 100644
--- a/drivers/gpu/drm/bochs/bochs_kms.c
+++ b/drivers/gpu/drm/bochs/bochs_kms.c
@@ -181,10 +181,17 @@ static void bochs_encoder_init(struct drm_device *dev)
static int bochs_connector_get_modes(struct drm_connector *connector)
{
- int count;
+ struct bochs_device *bochs =
+ container_of(connector, struct bochs_device, connector);
+ int count = 0;
- count = drm_add_modes_noedid(connector, 8192, 8192);
- drm_set_preferred_mode(connector, defx, defy);
+ if (bochs->edid)
+ count = drm_add_edid_modes(connector, bochs->edid);
+
+ if (!count) {
+ count = drm_add_modes_noedid(connector, 8192, 8192);
+ drm_set_preferred_mode(connector, defx, defy);
+ }
return count;
}
@@ -239,6 +246,11 @@ static void bochs_connector_init(struct drm_device *dev)
drm_connector_helper_add(connector,
&bochs_connector_connector_helper_funcs);
drm_connector_register(connector);
+
+ if (bochs->edid) {
+ drm_connector_attach_edid_property(connector);
+ drm_connector_update_edid_property(connector, bochs->edid);
+ }
}
--
2.9.3
^ permalink raw reply related
* Re: [PATCH net V2] vhost-vsock: fix use after free
From: Michael S. Tsirkin @ 2018-09-27 23:50 UTC (permalink / raw)
To: Jason Wang
Cc: kvm, sergei.shtylyov, netdev, linux-kernel, virtualization,
stefanha
In-Reply-To: <588ed28b-7e4b-9dc8-92ce-d75692836c9e@redhat.com>
On Fri, Sep 28, 2018 at 07:37:37AM +0800, Jason Wang wrote:
>
>
> On 2018年09月28日 01:04, Michael S. Tsirkin wrote:
> > On Thu, Sep 27, 2018 at 08:22:04PM +0800, Jason Wang wrote:
> > > The access of vsock is not protected by vhost_vsock_lock. This may
> > > lead to use after free since vhost_vsock_dev_release() may free the
> > > pointer at the same time.
> > >
> > > Fix this by holding the lock during the access.
> > >
> > > Reported-by:syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
> > > Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
> > > Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> > > Cc: Stefan Hajnoczi<stefanha@redhat.com>
> > > Signed-off-by: Jason Wang<jasowang@redhat.com>
> > Wow is that really the best we can do?
>
> For net/stable, probably yes.
>
> > A global lock on a data path
> > operation?
>
> It's already there,
&vhost_vsock_lock? were is it takes on data path?
> and the patch only increase the critical section.
>
> > Granted use after free is nasty but Stefan said he sees
> > a way to fix it using a per socket refcount. He's on vacation
> > until Oct 4 though ...
> >
>
> Stefan has acked the pacth, so I think it's ok? We can do optimization for
> -next on top.
>
> Thanks
Well on high SMP serializing can drop performance as much as x100 so I'm
not sure it's appropriate - seems to fix a bug but can introduce a
regression. Let's see how does a proper fix look first?
--
MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net V2] vhost-vsock: fix use after free
From: Jason Wang @ 2018-09-27 23:37 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: kvm, sergei.shtylyov, netdev, linux-kernel, virtualization,
stefanha
In-Reply-To: <20180927123539-mutt-send-email-mst@kernel.org>
On 2018年09月28日 01:04, Michael S. Tsirkin wrote:
> On Thu, Sep 27, 2018 at 08:22:04PM +0800, Jason Wang wrote:
>> The access of vsock is not protected by vhost_vsock_lock. This may
>> lead to use after free since vhost_vsock_dev_release() may free the
>> pointer at the same time.
>>
>> Fix this by holding the lock during the access.
>>
>> Reported-by:syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
>> Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
>> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
>> Cc: Stefan Hajnoczi<stefanha@redhat.com>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
> Wow is that really the best we can do?
For net/stable, probably yes.
> A global lock on a data path
> operation?
It's already there, and the patch only increase the critical section.
> Granted use after free is nasty but Stefan said he sees
> a way to fix it using a per socket refcount. He's on vacation
> until Oct 4 though ...
>
Stefan has acked the pacth, so I think it's ok? We can do optimization
for -next on top.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net V2] vhost-vsock: fix use after free
From: Michael S. Tsirkin @ 2018-09-27 17:04 UTC (permalink / raw)
To: Jason Wang
Cc: kvm, sergei.shtylyov, netdev, linux-kernel, virtualization,
stefanha
In-Reply-To: <20180927122204.4188-1-jasowang@redhat.com>
On Thu, Sep 27, 2018 at 08:22:04PM +0800, Jason Wang wrote:
> The access of vsock is not protected by vhost_vsock_lock. This may
> lead to use after free since vhost_vsock_dev_release() may free the
> pointer at the same time.
>
> Fix this by holding the lock during the access.
>
> Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
> Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Wow is that really the best we can do? A global lock on a data path
operation? Granted use after free is nasty but Stefan said he sees
a way to fix it using a per socket refcount. He's on vacation
until Oct 4 though ...
> ---
> - V2: fix typos
> - The patch is needed for -stable.
> ---
> drivers/vhost/vsock.c | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 34bc3ab40c6d..7d0b292867fd 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -210,21 +210,27 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> struct vhost_vsock *vsock;
> int len = pkt->len;
>
> + spin_lock_bh(&vhost_vsock_lock);
> +
> /* Find the vhost_vsock according to guest context id */
> - vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
> + vsock = __vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
> if (!vsock) {
> virtio_transport_free_pkt(pkt);
> + spin_unlock_bh(&vhost_vsock_lock);
> return -ENODEV;
> }
>
> if (pkt->reply)
> atomic_inc(&vsock->queued_replies);
>
> - spin_lock_bh(&vsock->send_pkt_list_lock);
> + spin_lock(&vsock->send_pkt_list_lock);
> list_add_tail(&pkt->list, &vsock->send_pkt_list);
> - spin_unlock_bh(&vsock->send_pkt_list_lock);
> + spin_unlock(&vsock->send_pkt_list_lock);
>
> vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
> +
> + spin_unlock_bh(&vhost_vsock_lock);
> +
> return len;
> }
>
> @@ -236,18 +242,22 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> int cnt = 0;
> LIST_HEAD(freeme);
>
> + spin_lock_bh(&vhost_vsock_lock);
> +
> /* Find the vhost_vsock according to guest context id */
> - vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
> - if (!vsock)
> + vsock = __vhost_vsock_get(vsk->remote_addr.svm_cid);
> + if (!vsock) {
> + spin_unlock_bh(&vhost_vsock_lock);
> return -ENODEV;
> + }
>
> - spin_lock_bh(&vsock->send_pkt_list_lock);
> + spin_lock(&vsock->send_pkt_list_lock);
> list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> if (pkt->vsk != vsk)
> continue;
> list_move(&pkt->list, &freeme);
> }
> - spin_unlock_bh(&vsock->send_pkt_list_lock);
> + spin_unlock(&vsock->send_pkt_list_lock);
>
> list_for_each_entry_safe(pkt, n, &freeme, list) {
> if (pkt->reply)
> @@ -265,6 +275,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> vhost_poll_queue(&tx_vq->poll);
> }
>
> + spin_unlock_bh(&vhost_vsock_lock);
> +
> return 0;
> }
>
> --
> 2.17.1
^ permalink raw reply
* Re: [PATCH net V2] vhost-vsock: fix use after free
From: Stefan Hajnoczi @ 2018-09-27 15:33 UTC (permalink / raw)
To: Jason Wang
Cc: kvm, sergei.shtylyov, mst, netdev, linux-kernel, virtualization
In-Reply-To: <20180927122204.4188-1-jasowang@redhat.com>
[-- Attachment #1.1: Type: text/plain, Size: 838 bytes --]
On Thu, Sep 27, 2018 at 08:22:04PM +0800, Jason Wang wrote:
> The access of vsock is not protected by vhost_vsock_lock. This may
> lead to use after free since vhost_vsock_dev_release() may free the
> pointer at the same time.
>
> Fix this by holding the lock during the access.
>
> Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
> Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> - V2: fix typos
> - The patch is needed for -stable.
> ---
> drivers/vhost/vsock.c | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
Thank you, Jason!
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Thomas Gleixner @ 2018-09-27 14:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Stephen Boyd,
X86 ML, LKML, Linux Virtualization, John Stultz, Andy Lutomirski,
Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <alpine.DEB.2.21.1809181731570.7302@nanos.tec.linutronix.de>
On Tue, 18 Sep 2018, Thomas Gleixner wrote:
> On Tue, 18 Sep 2018, Thomas Gleixner wrote:
> > So if the TSC on CPU1 is slightly behind the TSC on CPU0 then now1 can be
> > smaller than cycle_last. The TSC sync stuff does not catch the small delta
> > for unknown raisins. I'll go and find that machine and test that again.
>
> Of course it does not trigger anymore. We accumulated code between the
> point in timekeeping_advance() where the TSC is read and the update of the
> VDSO data.
>
> I'll might have to get an 2.6ish kernel booted on that machine and try with
> that again. /me shudders
Actually it does happen, because the TSC is very slowly drifting apart due
to SMI wreckage trying to hide itself. It just takes a very long time.
Thanks,
tglx
^ permalink raw reply
* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Andy Lutomirski @ 2018-09-27 14:39 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
Andy Lutomirski, Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <alpine.DEB.2.21.1809271630470.8118@nanos.tec.linutronix.de>
> On Sep 27, 2018, at 7:36 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Wed, 19 Sep 2018, Thomas Gleixner wrote:
>> On Tue, 18 Sep 2018, Andy Lutomirski wrote:
>>>> On Sep 18, 2018, at 3:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>> On Tue, 18 Sep 2018, Andy Lutomirski wrote:
>>>>> Do we do better if we use signed arithmetic for the whole calculation?
>>>>> Then a small backwards movement would result in a small backwards result.
>>>>> Or we could offset everything so that we’d have to go back several
>>>>> hundred ms before we cross zero.
>>>>
>>>> That would be probably the better solution as signed math would be
>>>> problematic when the resulting ns value becomes negative. As the delta is
>>>> really small, otherwise the TSC sync check would have caught it, the caller
>>>> should never be able to observe time going backwards.
>>>>
>>>> I'll have a look into that. It needs some thought vs. the fractional part
>>>> of the base time, but it should be not rocket science to get that
>>>> correct. Famous last words...
>>>>
>>>
>>> It’s also fiddly to tune. If you offset it too much, then the fancy
>>> divide-by-repeated-subtraction loop will hurt more than the comparison to
>>> last.
>>
>> Not really. It's sufficient to offset it by at max. 1000 cycles or so. That
>> won't hurt the magic loop, but it will definitely cover that slight offset
>> case.
>
> I got it working, but first of all the gain is close to 0.
>
> There is this other subtle issue that we've seen TSCs slowly drifting apart
> which is caught by the TSC watchdog eventually, but if it exeeds the offset
> _before_ the watchdog triggers, we're back to square one.
>
> So I rather stay on the safe side and just accept that we have to deal with
> that. Sigh.
>
>
Seems okay to me. Oh well.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [patch 09/11] x86/vdso: Simplify the invalid vclock case
From: Thomas Gleixner @ 2018-09-27 14:36 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Florian Weimer, Juergen Gross, Arnd Bergmann, Peter Zijlstra,
X86 ML, LKML, Linux Virtualization, Stephen Boyd, John Stultz,
Andy Lutomirski, Paolo Bonzini, devel, Matt Rickard
In-Reply-To: <alpine.DEB.2.21.1809190112020.1468@nanos.tec.linutronix.de>
[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]
On Wed, 19 Sep 2018, Thomas Gleixner wrote:
> On Tue, 18 Sep 2018, Andy Lutomirski wrote:
> > > On Sep 18, 2018, at 3:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > On Tue, 18 Sep 2018, Andy Lutomirski wrote:
> > >> Do we do better if we use signed arithmetic for the whole calculation?
> > >> Then a small backwards movement would result in a small backwards result.
> > >> Or we could offset everything so that we’d have to go back several
> > >> hundred ms before we cross zero.
> > >
> > > That would be probably the better solution as signed math would be
> > > problematic when the resulting ns value becomes negative. As the delta is
> > > really small, otherwise the TSC sync check would have caught it, the caller
> > > should never be able to observe time going backwards.
> > >
> > > I'll have a look into that. It needs some thought vs. the fractional part
> > > of the base time, but it should be not rocket science to get that
> > > correct. Famous last words...
> > >
> >
> > It’s also fiddly to tune. If you offset it too much, then the fancy
> > divide-by-repeated-subtraction loop will hurt more than the comparison to
> > last.
>
> Not really. It's sufficient to offset it by at max. 1000 cycles or so. That
> won't hurt the magic loop, but it will definitely cover that slight offset
> case.
I got it working, but first of all the gain is close to 0.
There is this other subtle issue that we've seen TSCs slowly drifting apart
which is caught by the TSC watchdog eventually, but if it exeeds the offset
_before_ the watchdog triggers, we're back to square one.
So I rather stay on the safe side and just accept that we have to deal with
that. Sigh.
Thanks,
tglx
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH net V2] vhost-vsock: fix use after free
From: Jason Wang @ 2018-09-27 12:22 UTC (permalink / raw)
To: stefanha, mst; +Cc: kvm, sergei.shtylyov, netdev, linux-kernel, virtualization
The access of vsock is not protected by vhost_vsock_lock. This may
lead to use after free since vhost_vsock_dev_release() may free the
pointer at the same time.
Fix this by holding the lock during the access.
Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
- V2: fix typos
- The patch is needed for -stable.
---
drivers/vhost/vsock.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34bc3ab40c6d..7d0b292867fd 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -210,21 +210,27 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
struct vhost_vsock *vsock;
int len = pkt->len;
+ spin_lock_bh(&vhost_vsock_lock);
+
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
+ vsock = __vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
if (!vsock) {
virtio_transport_free_pkt(pkt);
+ spin_unlock_bh(&vhost_vsock_lock);
return -ENODEV;
}
if (pkt->reply)
atomic_inc(&vsock->queued_replies);
- spin_lock_bh(&vsock->send_pkt_list_lock);
+ spin_lock(&vsock->send_pkt_list_lock);
list_add_tail(&pkt->list, &vsock->send_pkt_list);
- spin_unlock_bh(&vsock->send_pkt_list_lock);
+ spin_unlock(&vsock->send_pkt_list_lock);
vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+
+ spin_unlock_bh(&vhost_vsock_lock);
+
return len;
}
@@ -236,18 +242,22 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
int cnt = 0;
LIST_HEAD(freeme);
+ spin_lock_bh(&vhost_vsock_lock);
+
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
- if (!vsock)
+ vsock = __vhost_vsock_get(vsk->remote_addr.svm_cid);
+ if (!vsock) {
+ spin_unlock_bh(&vhost_vsock_lock);
return -ENODEV;
+ }
- spin_lock_bh(&vsock->send_pkt_list_lock);
+ spin_lock(&vsock->send_pkt_list_lock);
list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
if (pkt->vsk != vsk)
continue;
list_move(&pkt->list, &freeme);
}
- spin_unlock_bh(&vsock->send_pkt_list_lock);
+ spin_unlock(&vsock->send_pkt_list_lock);
list_for_each_entry_safe(pkt, n, &freeme, list) {
if (pkt->reply)
@@ -265,6 +275,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
vhost_poll_queue(&tx_vq->poll);
}
+ spin_unlock_bh(&vhost_vsock_lock);
+
return 0;
}
--
2.17.1
^ permalink raw reply related
* Re: [PATCH net] vhost-vsock: fix use after free
From: Jason Wang @ 2018-09-27 12:21 UTC (permalink / raw)
To: Sergei Shtylyov, stefanha, mst; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <a413a8b4-302d-7fbc-092d-95e8aa421c4c@cogentembedded.com>
On 2018年09月27日 17:52, Sergei Shtylyov wrote:
> Hello!
>
> On 9/27/2018 11:43 AM, Jason Wang wrote:
>
> Just a couple of typos...
>
>> The access of vsock is not protected by vhost_vsock_lock. This may
>> lead use after free since vhost_vsock_dev_release() may free the
>
> Lead to use.
>
>> pointer at the same time.
>>
>> Fix this by holding the lock during the acess.
>
> Access.
>
>> Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
>> Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
>> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> [...]
>
> MBR, Sergei
Let me post V2.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net] vhost-vsock: fix use after free
From: Sergei Shtylyov @ 2018-09-27 9:52 UTC (permalink / raw)
To: Jason Wang, stefanha, mst; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180927084301.573-1-jasowang@redhat.com>
Hello!
On 9/27/2018 11:43 AM, Jason Wang wrote:
Just a couple of typos...
> The access of vsock is not protected by vhost_vsock_lock. This may
> lead use after free since vhost_vsock_dev_release() may free the
Lead to use.
> pointer at the same time.
>
> Fix this by holding the lock during the acess.
Access.
> Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
> Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
[...]
MBR, Sergei
^ permalink raw reply
* [PATCH net] vhost-vsock: fix use after free
From: Jason Wang @ 2018-09-27 8:43 UTC (permalink / raw)
To: stefanha, mst; +Cc: netdev, linux-kernel, kvm, virtualization
The access of vsock is not protected by vhost_vsock_lock. This may
lead use after free since vhost_vsock_dev_release() may free the
pointer at the same time.
Fix this by holding the lock during the acess.
Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
Fixes: 16320f363ae1 ("vhost-vsock: add pkt cancel capability")
Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
- The patch is needed for -stable.
---
drivers/vhost/vsock.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34bc3ab40c6d..7d0b292867fd 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -210,21 +210,27 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
struct vhost_vsock *vsock;
int len = pkt->len;
+ spin_lock_bh(&vhost_vsock_lock);
+
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
+ vsock = __vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
if (!vsock) {
virtio_transport_free_pkt(pkt);
+ spin_unlock_bh(&vhost_vsock_lock);
return -ENODEV;
}
if (pkt->reply)
atomic_inc(&vsock->queued_replies);
- spin_lock_bh(&vsock->send_pkt_list_lock);
+ spin_lock(&vsock->send_pkt_list_lock);
list_add_tail(&pkt->list, &vsock->send_pkt_list);
- spin_unlock_bh(&vsock->send_pkt_list_lock);
+ spin_unlock(&vsock->send_pkt_list_lock);
vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+
+ spin_unlock_bh(&vhost_vsock_lock);
+
return len;
}
@@ -236,18 +242,22 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
int cnt = 0;
LIST_HEAD(freeme);
+ spin_lock_bh(&vhost_vsock_lock);
+
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
- if (!vsock)
+ vsock = __vhost_vsock_get(vsk->remote_addr.svm_cid);
+ if (!vsock) {
+ spin_unlock_bh(&vhost_vsock_lock);
return -ENODEV;
+ }
- spin_lock_bh(&vsock->send_pkt_list_lock);
+ spin_lock(&vsock->send_pkt_list_lock);
list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
if (pkt->vsk != vsk)
continue;
list_move(&pkt->list, &freeme);
}
- spin_unlock_bh(&vsock->send_pkt_list_lock);
+ spin_unlock(&vsock->send_pkt_list_lock);
list_for_each_entry_safe(pkt, n, &freeme, list) {
if (pkt->reply)
@@ -265,6 +275,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
vhost_poll_queue(&tx_vq->poll);
}
+ spin_unlock_bh(&vhost_vsock_lock);
+
return 0;
}
--
2.17.1
^ permalink raw reply related
* Re: [REBASE PATCH net-next v9 0/4] net: vhost: improve performance when enable busyloop
From: David Miller @ 2018-09-27 3:26 UTC (permalink / raw)
To: xiangxia.m.yue; +Cc: netdev, virtualization, mst
In-Reply-To: <1537879012-20859-1-git-send-email-xiangxia.m.yue@gmail.com>
From: xiangxia.m.yue@gmail.com
Date: Tue, 25 Sep 2018 05:36:48 -0700
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patches improve the guest receive performance.
> On the handle_tx side, we poll the sock receive queue
> at the same time. handle_rx do that in the same way.
>
> For more performance report, see patch 4
Series applied, thank you.
^ permalink raw reply
* [PULL 2/2] virtio/s390: fix race in ccw_io_helper()
From: Cornelia Huck @ 2018-09-26 16:48 UTC (permalink / raw)
To: Michael S . Tsirkin
Cc: linux-s390, kvm, Cornelia Huck, stable, virtualization,
Halil Pasic
In-Reply-To: <20180926164830.8435-1-cohuck@redhat.com>
From: Halil Pasic <pasic@linux.ibm.com>
While ccw_io_helper() seems like intended to be exclusive in a sense that
it is supposed to facilitate I/O for at most one thread at any given
time, there is actually nothing ensuring that threads won't pile up at
vcdev->wait_q. If they do, all threads get woken up and see the status
that belongs to some other request than their own. This can lead to bugs.
For an example see:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788432
This race normally does not cause any problems. The operations provided
by struct virtio_config_ops are usually invoked in a well defined
sequence, normally don't fail, and are normally used quite infrequent
too.
Yet, if some of the these operations are directly triggered via sysfs
attributes, like in the case described by the referenced bug, userspace
is given an opportunity to force races by increasing the frequency of the
given operations.
Let us fix the problem by ensuring, that for each device, we finish
processing the previous request before starting with a new one.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Cc: stable@vger.kernel.org
Message-Id: <20180925121309.58524-3-pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
drivers/s390/virtio/virtio_ccw.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index a5e8530a3391..b67dc4974f23 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -56,6 +56,7 @@ struct virtio_ccw_device {
unsigned int revision; /* Transport revision */
wait_queue_head_t wait_q;
spinlock_t lock;
+ struct mutex io_lock; /* Serializes I/O requests */
struct list_head virtqueues;
unsigned long indicators;
unsigned long indicators2;
@@ -296,6 +297,7 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev,
unsigned long flags;
int flag = intparm & VIRTIO_CCW_INTPARM_MASK;
+ mutex_lock(&vcdev->io_lock);
do {
spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0);
@@ -308,7 +310,9 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev,
cpu_relax();
} while (ret == -EBUSY);
wait_event(vcdev->wait_q, doing_io(vcdev, flag) == 0);
- return ret ? ret : vcdev->err;
+ ret = ret ? ret : vcdev->err;
+ mutex_unlock(&vcdev->io_lock);
+ return ret;
}
static void virtio_ccw_drop_indicator(struct virtio_ccw_device *vcdev,
@@ -1253,6 +1257,7 @@ static int virtio_ccw_online(struct ccw_device *cdev)
init_waitqueue_head(&vcdev->wait_q);
INIT_LIST_HEAD(&vcdev->virtqueues);
spin_lock_init(&vcdev->lock);
+ mutex_init(&vcdev->io_lock);
spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
dev_set_drvdata(&cdev->dev, vcdev);
--
2.14.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox