From: Marcelo Tosatti <mtosatti@redhat.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: "David Hildenbrand" <david@redhat.com>,
"KVM list" <kvm@vger.kernel.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
stable <stable@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
"Cornelia Huck" <cornelia.huck@de.ibm.com>
Subject: Re: [PATCH v2] KVM: kvm_io_bus_unregister_dev() should never fail
Date: Fri, 24 Mar 2017 07:33:01 -0300 [thread overview]
Message-ID: <20170324103257.GA5250@amt.cnet> (raw)
In-Reply-To: <CACT4Y+Y9MoCVSf515_btVJn=X8kGbJNQA7ujB+XPk=UdcxKNqQ@mail.gmail.com>
On Fri, Mar 24, 2017 at 09:48:31AM +0100, Dmitry Vyukov wrote:
> On Thu, Mar 23, 2017 at 9:42 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Thu, Mar 23, 2017 at 06:24:19PM +0100, David Hildenbrand wrote:
> >> No caller currently checks the return value of
> >> kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
> >> freeing their device. A stale reference will remain in the io_bus,
> >> getting at least used again, when the iobus gets teared down on
> >> kvm_destroy_vm() - leading to use after free errors.
> >>
> >> There is nothing the callers could do, except retrying over and over
> >> again.
> >>
> >> So let's simply remove the bus altogether, print an error and make
> >> sure no one can access this broken bus again (returning -ENOMEM on any
> >> attempt to access it).
> >>
> >> Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
> >> Cc: stable@vger.kernel.org # 3.4+
> >> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>
> >> Based on kvm/queue, where we just got 2a108a4e7c1 ("KVM: x86: clear bus
> >> pointer when destroyed"), which added a check we need here.
> >>
> >> v1 -> v2:
> >> - added a check in kvm_destroy_vm()
> >> - added a check in virt/kvm/eventfd.c
> >>
> >> Using 'git grep -C 4 "kvm->buses"' should help to find all users. The other
> >> user in virt/kvm/eventfd.c should be fine.
> >>
> >> ---
> >> include/linux/kvm_host.h | 4 ++--
> >> virt/kvm/eventfd.c | 3 ++-
> >> virt/kvm/kvm_main.c | 42 +++++++++++++++++++++++++-----------------
> >> 3 files changed, 29 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index 2c14ad9..d025074 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -162,8 +162,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> >> int len, void *val);
> >> int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> >> int len, struct kvm_io_device *dev);
> >> -int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> >> - struct kvm_io_device *dev);
> >> +void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> >> + struct kvm_io_device *dev);
> >> struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> >> gpa_t addr);
> >>
> >> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> >> index a29786d..4d28a9d 100644
> >> --- a/virt/kvm/eventfd.c
> >> +++ b/virt/kvm/eventfd.c
> >> @@ -870,7 +870,8 @@ kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx,
> >> continue;
> >>
> >> kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
> >> - kvm->buses[bus_idx]->ioeventfd_count--;
> >> + if (kvm->buses[bus_idx])
> >> + kvm->buses[bus_idx]->ioeventfd_count--;
> >> ioeventfd_release(p);
> >> ret = 0;
> >> break;
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index 7445566..ef1aa7f 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -728,7 +728,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
> >> spin_unlock(&kvm_lock);
> >> kvm_free_irq_routing(kvm);
> >> for (i = 0; i < KVM_NR_BUSES; i++) {
> >> - kvm_io_bus_destroy(kvm->buses[i]);
> >> + if (kvm->buses[i])
> >> + kvm_io_bus_destroy(kvm->buses[i]);
> >> kvm->buses[i] = NULL;
> >> }
> >> kvm_coalesced_mmio_free(kvm);
> >> @@ -3476,6 +3477,8 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> >> };
> >>
> >> bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> >> + if (!bus)
> >> + return -ENOMEM;
> >> r = __kvm_io_bus_write(vcpu, bus, &range, val);
> >> return r < 0 ? r : 0;
> >> }
> >> @@ -3493,6 +3496,8 @@ int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
> >> };
> >>
> >> bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> >> + if (!bus)
> >> + return -ENOMEM;
> >>
> >> /* First try the device referenced by cookie. */
> >> if ((cookie >= 0) && (cookie < bus->dev_count) &&
> >> @@ -3543,6 +3548,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> >> };
> >>
> >> bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> >> + if (!bus)
> >> + return -ENOMEM;
> >> r = __kvm_io_bus_read(vcpu, bus, &range, val);
> >> return r < 0 ? r : 0;
> >> }
> >> @@ -3555,6 +3562,9 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> >> struct kvm_io_bus *new_bus, *bus;
> >>
> >> bus = kvm->buses[bus_idx];
> >> + if (!bus)
> >> + return -ENOMEM;
> >> +
> >> /* exclude ioeventfd which is limited by maximum fd */
> >> if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
> >> return -ENOSPC;
> >> @@ -3574,45 +3584,41 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> >> }
> >>
> >> /* Caller must hold slots_lock. */
> >> -int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> >> - struct kvm_io_device *dev)
> >> +void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> >> + struct kvm_io_device *dev)
> >> {
> >> - int i, r;
> >> + int i;
> >> struct kvm_io_bus *new_bus, *bus;
> >>
> >> bus = kvm->buses[bus_idx];
> >> -
> >> - /*
> >> - * It's possible the bus being released before hand. If so,
> >> - * we're done here.
> >> - */
> >> if (!bus)
> >> - return 0;
> >> + return;
> >>
> >> - r = -ENOENT;
> >> for (i = 0; i < bus->dev_count; i++)
> >> if (bus->range[i].dev == dev) {
> >> - r = 0;
> >> break;
> >> }
> >>
> >> - if (r)
> >> - return r;
> >> + if (i == bus->dev_count)
> >> + return;
> >>
> >> new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
> >> sizeof(struct kvm_io_range)), GFP_KERNEL);
> >> - if (!new_bus)
> >> - return -ENOMEM;
> >> + if (!new_bus) {
> >> + pr_err("kvm: failed to shrink bus, removing it completely\n");
> >> + goto broken;
> >
> > The guest will fail in mysterious ways, if you do this (and
> > io_bus_unregister_dev can be called during runtime): in-kernel device
> > accesses will fail with unknown behaviour in the guest.
> >
> > Can't you retry a handful of times with GFP_KERNEL before switching to GFP_ATOMIC?
> > (which in case fails the machine is likely to be crashing soon).
>
> The process can run in a cgroup, then kmalloc failure has nothing to
> do with overall memory consumption. Machine can be perfectly fine.
> Also, this very process can be chosen as an OOM kill target, then it
> needs to gracefully deal with kmalloc failure and proceed to a
> termination point.
> Generally retrying something in a loop does not look like a solid plan
> to deal with errors.
Thats a good point.
Still dislike the "unregister all bus devices" as that renders the guest
unuseable. Can't you fail gracefully? Say force qemu error.
next prev parent reply other threads:[~2017-03-24 14:02 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-23 17:24 [PATCH v2] KVM: kvm_io_bus_unregister_dev() should never fail David Hildenbrand
2017-03-23 17:31 ` Cornelia Huck
2017-03-23 20:42 ` Marcelo Tosatti
2017-03-24 8:48 ` Dmitry Vyukov
2017-03-24 8:55 ` David Hildenbrand
2017-03-24 9:33 ` Cornelia Huck
2017-03-24 10:33 ` Marcelo Tosatti [this message]
2017-03-24 14:40 ` Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170324103257.GA5250@amt.cnet \
--to=mtosatti@redhat.com \
--cc=cornelia.huck@de.ibm.com \
--cc=david@redhat.com \
--cc=dvyukov@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox