* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-08-03 3:07 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Linux Kernel Network Developers, mst, virtualization
In-Reply-To: <CAMDZJNUMBYOVh+pzaykRq23ePqNR7BOA8wP_=i85-PXvwaxuBA@mail.gmail.com>
On 2018年08月03日 10:51, Tonghao Zhang wrote:
> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>>
>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
>>> On 2018/08/02 17:18, Jason Wang wrote:
>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>>>> + struct vhost_virtqueue *rvq,
>>>>>> + struct vhost_virtqueue *tvq,
>>>>>> + bool rx)
>>>>>> +{
>>>>>> + struct socket *sock = rvq->private_data;
>>>>>> +
>>>>>> + if (rx)
>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>>>> + else {
>>>>>> + /* On tx here, sock has no rx data, so we
>>>>>> + * will wait for sock wakeup for rx, and
>>>>>> + * vhost_enable_notify() is not needed. */
>>>>> A possible case is we do have rx data but guest does not refill the rx
>>>>> queue. In this case we may lose notifications from guest.
>>>> Yes, should consider this case. thanks.
>>> I'm a bit confused. Isn't this covered by the previous
>>> "else if (sock && sk_has_rx_data(...))" block?
>> The problem is it does nothing if vhost_vq_avail_empty() is true and
>> vhost_enble_notify() is false.
>>
>>>>>>> +
>>>>>>> + cpu_relax();
>>>>>>> + }
>>>>>>> +
>>>>>>> + preempt_enable();
>>>>>>> +
>>>>>>> + if (!rx)
>>>>>>> + vhost_net_enable_vq(net, vq);
>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>>>> called soon.
>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>>>> so the network is broken.
>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>>>> there's no need to enable it since handle_rx() will do this for us.
>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
>>> need to enable vq since in that case we have no rx data and handle_rx()
>>> is not scheduled.
>>>
>> Yes.
> So we will use the vhost_has_work() to check whether or not the
> handle_rx is scheduled ?
> If we use the vhost_has_work(), the work in the dev work_list may be
> rx work, or tx work, right ?
Yes. We can add a boolean to record whether or not we've called
vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
it was true.
So here's the needed changes:
1) Split the wakeup disabling to another patch
2) Squash the vhost_net_busy_poll_try_queue() and
vhost_net_busy_poll_check() into vhost_net_busy_poll() and reduce
duplicated checks.
3) If possible, rename the boolean rx in vhost_net_busy_poll() to
poll_rx, this makes code more readable.
Thanks
>> Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Tonghao Zhang @ 2018-08-03 3:24 UTC (permalink / raw)
To: jasowang; +Cc: Linux Kernel Network Developers, mst, virtualization
In-Reply-To: <fc4dc137-98de-2cee-4f41-e23ab4eb2492@redhat.com>
On Fri, Aug 3, 2018 at 11:07 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年08月03日 10:51, Tonghao Zhang wrote:
> > On Thu, Aug 2, 2018 at 5:23 PM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >>
> >> On 2018年08月02日 16:41, Toshiaki Makita wrote:
> >>> On 2018/08/02 17:18, Jason Wang wrote:
> >>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
> >>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
> >>>>>> + struct vhost_virtqueue *rvq,
> >>>>>> + struct vhost_virtqueue *tvq,
> >>>>>> + bool rx)
> >>>>>> +{
> >>>>>> + struct socket *sock = rvq->private_data;
> >>>>>> +
> >>>>>> + if (rx)
> >>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
> >>>>>> + else if (sock && sk_has_rx_data(sock->sk))
> >>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
> >>>>>> + else {
> >>>>>> + /* On tx here, sock has no rx data, so we
> >>>>>> + * will wait for sock wakeup for rx, and
> >>>>>> + * vhost_enable_notify() is not needed. */
> >>>>> A possible case is we do have rx data but guest does not refill the rx
> >>>>> queue. In this case we may lose notifications from guest.
> >>>> Yes, should consider this case. thanks.
> >>> I'm a bit confused. Isn't this covered by the previous
> >>> "else if (sock && sk_has_rx_data(...))" block?
> >> The problem is it does nothing if vhost_vq_avail_empty() is true and
> >> vhost_enble_notify() is false.
> >>
> >>>>>>> +
> >>>>>>> + cpu_relax();
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + preempt_enable();
> >>>>>>> +
> >>>>>>> + if (!rx)
> >>>>>>> + vhost_net_enable_vq(net, vq);
> >>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
> >>>>>> called soon.
> >>>>> If we disable rx virtqueue in handle_tx and don't send packets from
> >>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
> >>>>> so the network is broken.
> >>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
> >>>> there's no need to enable it since handle_rx() will do this for us.
> >>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
> >>> need to enable vq since in that case we have no rx data and handle_rx()
> >>> is not scheduled.
> >>>
> >> Yes.
> > So we will use the vhost_has_work() to check whether or not the
> > handle_rx is scheduled ?
> > If we use the vhost_has_work(), the work in the dev work_list may be
> > rx work, or tx work, right ?
>
> Yes. We can add a boolean to record whether or not we've called
> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
> it was true.
so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during busypoll"
may not consider the case: work is tx work in the dev work list.
> So here's the needed changes:
>
> 1) Split the wakeup disabling to another patch
> 2) Squash the vhost_net_busy_poll_try_queue() and
> vhost_net_busy_poll_check() into vhost_net_busy_poll() and reduce
> duplicated checks.
> 3) If possible, rename the boolean rx in vhost_net_busy_poll() to
> poll_rx, this makes code more readable.
OK
> Thanks
>
> >> Thanks
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Toshiaki Makita @ 2018-08-03 3:32 UTC (permalink / raw)
To: Jason Wang, Tonghao Zhang
Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <90c0deec-6a7b-9787-b62f-7ea76a5cbd7c@redhat.com>
On 2018/08/03 12:07, Jason Wang wrote:
> On 2018年08月02日 17:23, Jason Wang wrote:
>>>>>>>
>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>>>> called soon.
>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>>>> so the network is broken.
>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>>>> there's no need to enable it since handle_rx() will do this for us.
>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
>>> need to enable vq since in that case we have no rx data and handle_rx()
>>> is not scheduled.
>>>
> Rethink about this, looks not. We enable rx wakeups in this case, so if
> there's pending data, handle_rx() will be schedule after
> vhost_net_enable_vq().
You are right, but what I wanted to say is vhost_net_enable_vq() should
be needed (I was talking about what would happen if
vhost_net_enable_vq() were removed). Also, I think we should move
vhost_net_enable_vq() from vhost_net_busy_poll() to this last "else"
block because this is the case where rx wakeups is required.
Anyway this part will be refactored so let's see what this code will
look like in next version.
--
Toshiaki Makita
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Toshiaki Makita @ 2018-08-03 3:40 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <CAMDZJNWZ+WH3JhCy81h1VSka7-z2zF1Tw-EnLTr26J_5SkQZTg@mail.gmail.com>
On 2018/08/03 12:24, Tonghao Zhang wrote:
> On Fri, Aug 3, 2018 at 11:07 AM Jason Wang <jasowang@redhat.com> wrote:
>> On 2018年08月03日 10:51, Tonghao Zhang wrote:
>>> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
>>>>> On 2018/08/02 17:18, Jason Wang wrote:
>>>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>>>>>> + struct vhost_virtqueue *rvq,
>>>>>>>> + struct vhost_virtqueue *tvq,
>>>>>>>> + bool rx)
>>>>>>>> +{
>>>>>>>> + struct socket *sock = rvq->private_data;
>>>>>>>> +
>>>>>>>> + if (rx)
>>>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>>>>>> + else {
>>>>>>>> + /* On tx here, sock has no rx data, so we
>>>>>>>> + * will wait for sock wakeup for rx, and
>>>>>>>> + * vhost_enable_notify() is not needed. */
>>>>>>> A possible case is we do have rx data but guest does not refill the rx
>>>>>>> queue. In this case we may lose notifications from guest.
>>>>>> Yes, should consider this case. thanks.
>>>>> I'm a bit confused. Isn't this covered by the previous
>>>>> "else if (sock && sk_has_rx_data(...))" block?
>>>> The problem is it does nothing if vhost_vq_avail_empty() is true and
>>>> vhost_enble_notify() is false.
>>>>
>>>>>>>>> +
>>>>>>>>> + cpu_relax();
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + preempt_enable();
>>>>>>>>> +
>>>>>>>>> + if (!rx)
>>>>>>>>> + vhost_net_enable_vq(net, vq);
>>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>>>>>> called soon.
>>>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>>>>>> so the network is broken.
>>>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>>>>>> there's no need to enable it since handle_rx() will do this for us.
>>>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
>>>>> need to enable vq since in that case we have no rx data and handle_rx()
>>>>> is not scheduled.
>>>>>
>>>> Yes.
>>> So we will use the vhost_has_work() to check whether or not the
>>> handle_rx is scheduled ?
>>> If we use the vhost_has_work(), the work in the dev work_list may be
>>> rx work, or tx work, right ?
>>
>> Yes. We can add a boolean to record whether or not we've called
>> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
>> it was true.
> so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during busypoll"
> may not consider the case: work is tx work in the dev work list.
Not sure what you are concerned but what I can say is that we need to
poll rx work if vhost_has_work() detects tx work in
vhost_net_rx_peek_head_len() since rx busypoll exits prematurely in that
case.
>> So here's the needed changes:
>>
>> 1) Split the wakeup disabling to another patch
>> 2) Squash the vhost_net_busy_poll_try_queue() and
>> vhost_net_busy_poll_check() into vhost_net_busy_poll() and reduce
>> duplicated checks.
>> 3) If possible, rename the boolean rx in vhost_net_busy_poll() to
>> poll_rx, this makes code more readable.
> OK
>> Thanks
>>
>>>> Thanks
>>
>
>
--
Toshiaki Makita
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-08-03 3:43 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Linux Kernel Network Developers, mst, virtualization
In-Reply-To: <CAMDZJNWZ+WH3JhCy81h1VSka7-z2zF1Tw-EnLTr26J_5SkQZTg@mail.gmail.com>
On 2018年08月03日 11:24, Tonghao Zhang wrote:
> On Fri, Aug 3, 2018 at 11:07 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>>
>> On 2018年08月03日 10:51, Tonghao Zhang wrote:
>>> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
>>>>> On 2018/08/02 17:18, Jason Wang wrote:
>>>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>>>>>> + struct vhost_virtqueue *rvq,
>>>>>>>> + struct vhost_virtqueue *tvq,
>>>>>>>> + bool rx)
>>>>>>>> +{
>>>>>>>> + struct socket *sock = rvq->private_data;
>>>>>>>> +
>>>>>>>> + if (rx)
>>>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>>>>>> + else {
>>>>>>>> + /* On tx here, sock has no rx data, so we
>>>>>>>> + * will wait for sock wakeup for rx, and
>>>>>>>> + * vhost_enable_notify() is not needed. */
>>>>>>> A possible case is we do have rx data but guest does not refill the rx
>>>>>>> queue. In this case we may lose notifications from guest.
>>>>>> Yes, should consider this case. thanks.
>>>>> I'm a bit confused. Isn't this covered by the previous
>>>>> "else if (sock && sk_has_rx_data(...))" block?
>>>> The problem is it does nothing if vhost_vq_avail_empty() is true and
>>>> vhost_enble_notify() is false.
>>>>
>>>>>>>>> +
>>>>>>>>> + cpu_relax();
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + preempt_enable();
>>>>>>>>> +
>>>>>>>>> + if (!rx)
>>>>>>>>> + vhost_net_enable_vq(net, vq);
>>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>>>>>> called soon.
>>>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>>>>>> so the network is broken.
>>>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>>>>>> there's no need to enable it since handle_rx() will do this for us.
>>>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
>>>>> need to enable vq since in that case we have no rx data and handle_rx()
>>>>> is not scheduled.
>>>>>
>>>> Yes.
>>> So we will use the vhost_has_work() to check whether or not the
>>> handle_rx is scheduled ?
>>> If we use the vhost_has_work(), the work in the dev work_list may be
>>> rx work, or tx work, right ?
>> Yes. We can add a boolean to record whether or not we've called
>> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
>> it was true.
> so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during busypoll"
> may not consider the case: work is tx work in the dev work list.
So two kinds of work, tx kick or tx wakeup.
For tx kick, we check vhost_vq_avail_empty() and avoid unnecessary kicks
by not enabling kick if we found something is pending on txq. For tx
wakeup, yes, the commit does not consider it. And that's why we want to
disable tx wakeups during busy polling.
Thanks
>
>> So here's the needed changes:
>>
>> 1) Split the wakeup disabling to another patch
>> 2) Squash the vhost_net_busy_poll_try_queue() and
>> vhost_net_busy_poll_check() into vhost_net_busy_poll() and reduce
>> duplicated checks.
>> 3) If possible, rename the boolean rx in vhost_net_busy_poll() to
>> poll_rx, this makes code more readable.
> OK
>> Thanks
>>
>>>> Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-08-03 3:44 UTC (permalink / raw)
To: Toshiaki Makita, Tonghao Zhang
Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <74edb26a-e715-cb49-4e52-62e4b45638d1@lab.ntt.co.jp>
On 2018年08月03日 11:32, Toshiaki Makita wrote:
> On 2018/08/03 12:07, Jason Wang wrote:
>> On 2018年08月02日 17:23, Jason Wang wrote:
>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>>>>> called soon.
>>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>>>>> so the network is broken.
>>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>>>>> there's no need to enable it since handle_rx() will do this for us.
>>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
>>>> need to enable vq since in that case we have no rx data and handle_rx()
>>>> is not scheduled.
>>>>
>> Rethink about this, looks not. We enable rx wakeups in this case, so if
>> there's pending data, handle_rx() will be schedule after
>> vhost_net_enable_vq().
> You are right, but what I wanted to say is vhost_net_enable_vq() should
> be needed (I was talking about what would happen if
> vhost_net_enable_vq() were removed). Also, I think we should move
> vhost_net_enable_vq() from vhost_net_busy_poll() to this last "else"
> block because this is the case where rx wakeups is required.
> Anyway this part will be refactored so let's see what this code will
> look like in next version.
>
I get your point.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Tonghao Zhang @ 2018-08-03 4:04 UTC (permalink / raw)
To: jasowang; +Cc: Linux Kernel Network Developers, mst, virtualization
In-Reply-To: <a900e9f3-3aad-2cfa-da4c-226bf8981059@redhat.com>
On Fri, Aug 3, 2018 at 11:43 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年08月03日 11:24, Tonghao Zhang wrote:
> > On Fri, Aug 3, 2018 at 11:07 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >>
> >> On 2018年08月03日 10:51, Tonghao Zhang wrote:
> >>> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>>
> >>>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
> >>>>> On 2018/08/02 17:18, Jason Wang wrote:
> >>>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
> >>>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
> >>>>>>>> + struct vhost_virtqueue *rvq,
> >>>>>>>> + struct vhost_virtqueue *tvq,
> >>>>>>>> + bool rx)
> >>>>>>>> +{
> >>>>>>>> + struct socket *sock = rvq->private_data;
> >>>>>>>> +
> >>>>>>>> + if (rx)
> >>>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
> >>>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
> >>>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
> >>>>>>>> + else {
> >>>>>>>> + /* On tx here, sock has no rx data, so we
> >>>>>>>> + * will wait for sock wakeup for rx, and
> >>>>>>>> + * vhost_enable_notify() is not needed. */
> >>>>>>> A possible case is we do have rx data but guest does not refill the rx
> >>>>>>> queue. In this case we may lose notifications from guest.
> >>>>>> Yes, should consider this case. thanks.
> >>>>> I'm a bit confused. Isn't this covered by the previous
> >>>>> "else if (sock && sk_has_rx_data(...))" block?
> >>>> The problem is it does nothing if vhost_vq_avail_empty() is true and
> >>>> vhost_enble_notify() is false.
> >>>>
> >>>>>>>>> +
> >>>>>>>>> + cpu_relax();
> >>>>>>>>> + }
> >>>>>>>>> +
> >>>>>>>>> + preempt_enable();
> >>>>>>>>> +
> >>>>>>>>> + if (!rx)
> >>>>>>>>> + vhost_net_enable_vq(net, vq);
> >>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
> >>>>>>>> called soon.
> >>>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
> >>>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
> >>>>>>> so the network is broken.
> >>>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
> >>>>>> there's no need to enable it since handle_rx() will do this for us.
> >>>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
> >>>>> need to enable vq since in that case we have no rx data and handle_rx()
> >>>>> is not scheduled.
> >>>>>
> >>>> Yes.
> >>> So we will use the vhost_has_work() to check whether or not the
> >>> handle_rx is scheduled ?
> >>> If we use the vhost_has_work(), the work in the dev work_list may be
> >>> rx work, or tx work, right ?
> >> Yes. We can add a boolean to record whether or not we've called
> >> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
> >> it was true.
> > so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during busypoll"
> > may not consider the case: work is tx work in the dev work list.
>
> So two kinds of work, tx kick or tx wakeup.
>
> For tx kick, we check vhost_vq_avail_empty() and avoid unnecessary kicks
> by not enabling kick if we found something is pending on txq. For tx
> wakeup, yes, the commit does not consider it. And that's why we want to
> disable tx wakeups during busy polling.
And in the handle_rx but not busy polling, the tx can wakeup anytime
and the tx work will be added to dev work list. In that case, if we
add
the rx poll to the queue, it is necessary ? the commit be294a51a may
check whether the rx work is in the dev work list.
> Thanks
>
> >
> >> So here's the needed changes:
> >>
> >> 1) Split the wakeup disabling to another patch
> >> 2) Squash the vhost_net_busy_poll_try_queue() and
> >> vhost_net_busy_poll_check() into vhost_net_busy_poll() and reduce
> >> duplicated checks.
> >> 3) If possible, rename the boolean rx in vhost_net_busy_poll() to
> >> poll_rx, this makes code more readable.
> > OK
> >> Thanks
> >>
> >>>> Thanks
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Tonghao Zhang @ 2018-08-03 4:14 UTC (permalink / raw)
To: makita.toshiaki; +Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <540b49d5-4496-8d12-5df2-95e7f73cf5c6@lab.ntt.co.jp>
On Fri, Aug 3, 2018 at 11:40 AM Toshiaki Makita
<makita.toshiaki@lab.ntt.co.jp> wrote:
>
> On 2018/08/03 12:24, Tonghao Zhang wrote:
> > On Fri, Aug 3, 2018 at 11:07 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On 2018年08月03日 10:51, Tonghao Zhang wrote:
> >>> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
> >>>>> On 2018/08/02 17:18, Jason Wang wrote:
> >>>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
> >>>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
> >>>>>>>> + struct vhost_virtqueue *rvq,
> >>>>>>>> + struct vhost_virtqueue *tvq,
> >>>>>>>> + bool rx)
> >>>>>>>> +{
> >>>>>>>> + struct socket *sock = rvq->private_data;
> >>>>>>>> +
> >>>>>>>> + if (rx)
> >>>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
> >>>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
> >>>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
> >>>>>>>> + else {
> >>>>>>>> + /* On tx here, sock has no rx data, so we
> >>>>>>>> + * will wait for sock wakeup for rx, and
> >>>>>>>> + * vhost_enable_notify() is not needed. */
> >>>>>>> A possible case is we do have rx data but guest does not refill the rx
> >>>>>>> queue. In this case we may lose notifications from guest.
> >>>>>> Yes, should consider this case. thanks.
> >>>>> I'm a bit confused. Isn't this covered by the previous
> >>>>> "else if (sock && sk_has_rx_data(...))" block?
> >>>> The problem is it does nothing if vhost_vq_avail_empty() is true and
> >>>> vhost_enble_notify() is false.
> >>>>
> >>>>>>>>> +
> >>>>>>>>> + cpu_relax();
> >>>>>>>>> + }
> >>>>>>>>> +
> >>>>>>>>> + preempt_enable();
> >>>>>>>>> +
> >>>>>>>>> + if (!rx)
> >>>>>>>>> + vhost_net_enable_vq(net, vq);
> >>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
> >>>>>>>> called soon.
> >>>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
> >>>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
> >>>>>>> so the network is broken.
> >>>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
> >>>>>> there's no need to enable it since handle_rx() will do this for us.
> >>>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
> >>>>> need to enable vq since in that case we have no rx data and handle_rx()
> >>>>> is not scheduled.
> >>>>>
> >>>> Yes.
> >>> So we will use the vhost_has_work() to check whether or not the
> >>> handle_rx is scheduled ?
> >>> If we use the vhost_has_work(), the work in the dev work_list may be
> >>> rx work, or tx work, right ?
> >>
> >> Yes. We can add a boolean to record whether or not we've called
> >> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
> >> it was true.
> > so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during busypoll"
> > may not consider the case: work is tx work in the dev work list.
>
> Not sure what you are concerned but what I can say is that we need to
> poll rx work if vhost_has_work() detects tx work in
> vhost_net_rx_peek_head_len() since rx busypoll exits prematurely in that
> case.
In the handle_rx, when we busy poll, the vhost_has_work() return true,
because the tx but not rx work is in the dev work list.
and it is the most case, because tx work may be added to dev work list
anytime(not during busy poll) when guest kick the vhost-net.
so it is not necessary to add it., right ?
> >> So here's the needed changes:
> >>
> >> 1) Split the wakeup disabling to another patch
> >> 2) Squash the vhost_net_busy_poll_try_queue() and
> >> vhost_net_busy_poll_check() into vhost_net_busy_poll() and reduce
> >> duplicated checks.
> >> 3) If possible, rename the boolean rx in vhost_net_busy_poll() to
> >> poll_rx, this makes code more readable.
> > OK
> >> Thanks
> >>
> >>>> Thanks
> >>
> >
> >
>
> --
> Toshiaki Makita
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Toshiaki Makita @ 2018-08-03 4:25 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <CAMDZJNUyZwcVpYc-3WGB_0CWSe-ti7c0BOvyFwwTo6PZ4s5V1Q@mail.gmail.com>
On 2018/08/03 13:14, Tonghao Zhang wrote:
> On Fri, Aug 3, 2018 at 11:40 AM Toshiaki Makita
> <makita.toshiaki@lab.ntt.co.jp> wrote:
>>
>> On 2018/08/03 12:24, Tonghao Zhang wrote:
>>> On Fri, Aug 3, 2018 at 11:07 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2018年08月03日 10:51, Tonghao Zhang wrote:
>>>>> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
>>>>>>> On 2018/08/02 17:18, Jason Wang wrote:
>>>>>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>>>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>>>>>>>> + struct vhost_virtqueue *rvq,
>>>>>>>>>> + struct vhost_virtqueue *tvq,
>>>>>>>>>> + bool rx)
>>>>>>>>>> +{
>>>>>>>>>> + struct socket *sock = rvq->private_data;
>>>>>>>>>> +
>>>>>>>>>> + if (rx)
>>>>>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>>>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>>>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>>>>>>>> + else {
>>>>>>>>>> + /* On tx here, sock has no rx data, so we
>>>>>>>>>> + * will wait for sock wakeup for rx, and
>>>>>>>>>> + * vhost_enable_notify() is not needed. */
>>>>>>>>> A possible case is we do have rx data but guest does not refill the rx
>>>>>>>>> queue. In this case we may lose notifications from guest.
>>>>>>>> Yes, should consider this case. thanks.
>>>>>>> I'm a bit confused. Isn't this covered by the previous
>>>>>>> "else if (sock && sk_has_rx_data(...))" block?
>>>>>> The problem is it does nothing if vhost_vq_avail_empty() is true and
>>>>>> vhost_enble_notify() is false.
>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> + cpu_relax();
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + preempt_enable();
>>>>>>>>>>> +
>>>>>>>>>>> + if (!rx)
>>>>>>>>>>> + vhost_net_enable_vq(net, vq);
>>>>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>>>>>>>> called soon.
>>>>>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>>>>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>>>>>>>> so the network is broken.
>>>>>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>>>>>>>> there's no need to enable it since handle_rx() will do this for us.
>>>>>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
>>>>>>> need to enable vq since in that case we have no rx data and handle_rx()
>>>>>>> is not scheduled.
>>>>>>>
>>>>>> Yes.
>>>>> So we will use the vhost_has_work() to check whether or not the
>>>>> handle_rx is scheduled ?
>>>>> If we use the vhost_has_work(), the work in the dev work_list may be
>>>>> rx work, or tx work, right ?
>>>>
>>>> Yes. We can add a boolean to record whether or not we've called
>>>> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
>>>> it was true.
>>> so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during busypoll"
>>> may not consider the case: work is tx work in the dev work list.
>>
>> Not sure what you are concerned but what I can say is that we need to
>> poll rx work if vhost_has_work() detects tx work in
>> vhost_net_rx_peek_head_len() since rx busypoll exits prematurely in that
>> case.
> In the handle_rx, when we busy poll, the vhost_has_work() return true,
> because the tx but not rx work is in the dev work list.
> and it is the most case, because tx work may be added to dev work list
> anytime(not during busy poll) when guest kick the vhost-net.
> so it is not necessary to add it., right ?
I'm lost.
What is the part you think is not needed?
1. When there is a tx work we exit rx busypoll.
2. When we exit rx busypoll by tx work, we poll rx work (so that we can
continue rx busypoll later).
--
Toshiaki Makita
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-08-03 5:07 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Linux Kernel Network Developers, mst, virtualization
In-Reply-To: <CAMDZJNVhmWVEeHbZDa3ooFsFXNTtKSH=ey6izfs8CeakbQLjZQ@mail.gmail.com>
On 2018年08月03日 12:04, Tonghao Zhang wrote:
> On Fri, Aug 3, 2018 at 11:43 AM Jason Wang<jasowang@redhat.com> wrote:
>>
>> On 2018年08月03日 11:24, Tonghao Zhang wrote:
>>> On Fri, Aug 3, 2018 at 11:07 AM Jason Wang<jasowang@redhat.com> wrote:
>>>> On 2018年08月03日 10:51, Tonghao Zhang wrote:
>>>>> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang<jasowang@redhat.com> wrote:
>>>>>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
>>>>>>> On 2018/08/02 17:18, Jason Wang wrote:
>>>>>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>>>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>>>>>>>> + struct vhost_virtqueue *rvq,
>>>>>>>>>> + struct vhost_virtqueue *tvq,
>>>>>>>>>> + bool rx)
>>>>>>>>>> +{
>>>>>>>>>> + struct socket *sock = rvq->private_data;
>>>>>>>>>> +
>>>>>>>>>> + if (rx)
>>>>>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>>>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>>>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>>>>>>>> + else {
>>>>>>>>>> + /* On tx here, sock has no rx data, so we
>>>>>>>>>> + * will wait for sock wakeup for rx, and
>>>>>>>>>> + * vhost_enable_notify() is not needed. */
>>>>>>>>> A possible case is we do have rx data but guest does not refill the rx
>>>>>>>>> queue. In this case we may lose notifications from guest.
>>>>>>>> Yes, should consider this case. thanks.
>>>>>>> I'm a bit confused. Isn't this covered by the previous
>>>>>>> "else if (sock && sk_has_rx_data(...))" block?
>>>>>> The problem is it does nothing if vhost_vq_avail_empty() is true and
>>>>>> vhost_enble_notify() is false.
>>>>>>
>>>>>>>>>>> +
>>>>>>>>>>> + cpu_relax();
>>>>>>>>>>> + }
>>>>>>>>>>> +
>>>>>>>>>>> + preempt_enable();
>>>>>>>>>>> +
>>>>>>>>>>> + if (!rx)
>>>>>>>>>>> + vhost_net_enable_vq(net, vq);
>>>>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>>>>>>>> called soon.
>>>>>>>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>>>>>>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>>>>>>>> so the network is broken.
>>>>>>>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>>>>>>>> there's no need to enable it since handle_rx() will do this for us.
>>>>>>> Looks like in the last "else" block in vhost_net_busy_poll_check() we
>>>>>>> need to enable vq since in that case we have no rx data and handle_rx()
>>>>>>> is not scheduled.
>>>>>>>
>>>>>> Yes.
>>>>> So we will use the vhost_has_work() to check whether or not the
>>>>> handle_rx is scheduled ?
>>>>> If we use the vhost_has_work(), the work in the dev work_list may be
>>>>> rx work, or tx work, right ?
>>>> Yes. We can add a boolean to record whether or not we've called
>>>> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
>>>> it was true.
>>> so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during busypoll"
>>> may not consider the case: work is tx work in the dev work list.
>> So two kinds of work, tx kick or tx wakeup.
>>
>> For tx kick, we check vhost_vq_avail_empty() and avoid unnecessary kicks
>> by not enabling kick if we found something is pending on txq. For tx
>> wakeup, yes, the commit does not consider it. And that's why we want to
>> disable tx wakeups during busy polling.
> And in the handle_rx but not busy polling, the tx can wakeup anytime
> and the tx work will be added to dev work list. In that case, if we
> add
> the rx poll to the queue, it is necessary ? the commit be294a51a may
> check whether the rx work is in the dev work list.
I think the point this we don't poll rx during tx at that time. So if rx
poll is interrupted, we should reschedule handle_rx(). After we poll rx
on handle_tx(), we can try to optimize this on top.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Toshiaki Makita @ 2018-08-03 5:25 UTC (permalink / raw)
To: Jason Wang, Tonghao Zhang
Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <155c4d23-ca90-b8f7-5a1f-bc54a92e922e@redhat.com>
On 2018/08/03 14:07, Jason Wang wrote:
> On 2018年08月03日 12:04, Tonghao Zhang wrote:
>> On Fri, Aug 3, 2018 at 11:43 AM Jason Wang<jasowang@redhat.com> wrote:
>>>
>>> On 2018年08月03日 11:24, Tonghao Zhang wrote:
>>>> On Fri, Aug 3, 2018 at 11:07 AM Jason Wang<jasowang@redhat.com> wrote:
>>>>> On 2018年08月03日 10:51, Tonghao Zhang wrote:
>>>>>> On Thu, Aug 2, 2018 at 5:23 PM Jason Wang<jasowang@redhat.com>
>>>>>> wrote:
>>>>>>> On 2018年08月02日 16:41, Toshiaki Makita wrote:
>>>>>>>> On 2018/08/02 17:18, Jason Wang wrote:
>>>>>>>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>>>>>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>>>>>>>>> + struct vhost_virtqueue *rvq,
>>>>>>>>>>> + struct vhost_virtqueue *tvq,
>>>>>>>>>>> + bool rx)
>>>>>>>>>>> +{
>>>>>>>>>>> + struct socket *sock = rvq->private_data;
>>>>>>>>>>> +
>>>>>>>>>>> + if (rx)
>>>>>>>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>>>>>>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>>>>>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>>>>>>>>> + else {
>>>>>>>>>>> + /* On tx here, sock has no rx data, so we
>>>>>>>>>>> + * will wait for sock wakeup for rx, and
>>>>>>>>>>> + * vhost_enable_notify() is not needed. */
>>>>>>>>>> A possible case is we do have rx data but guest does not
>>>>>>>>>> refill the rx
>>>>>>>>>> queue. In this case we may lose notifications from guest.
>>>>>>>>> Yes, should consider this case. thanks.
>>>>>>>> I'm a bit confused. Isn't this covered by the previous
>>>>>>>> "else if (sock && sk_has_rx_data(...))" block?
>>>>>>> The problem is it does nothing if vhost_vq_avail_empty() is true and
>>>>>>> vhost_enble_notify() is false.
>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> + cpu_relax();
>>>>>>>>>>>> + }
>>>>>>>>>>>> +
>>>>>>>>>>>> + preempt_enable();
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (!rx)
>>>>>>>>>>>> + vhost_net_enable_vq(net, vq);
>>>>>>>>>>> No need to enable rx virtqueue, if we are sure handle_rx()
>>>>>>>>>>> will be
>>>>>>>>>>> called soon.
>>>>>>>>>> If we disable rx virtqueue in handle_tx and don't send packets
>>>>>>>>>> from
>>>>>>>>>> guest anymore(handle_tx is not called), so we can wake up for
>>>>>>>>>> sock rx.
>>>>>>>>>> so the network is broken.
>>>>>>>>> Not sure I understand here. I mean is we schedule work for
>>>>>>>>> handle_rx(),
>>>>>>>>> there's no need to enable it since handle_rx() will do this for
>>>>>>>>> us.
>>>>>>>> Looks like in the last "else" block in
>>>>>>>> vhost_net_busy_poll_check() we
>>>>>>>> need to enable vq since in that case we have no rx data and
>>>>>>>> handle_rx()
>>>>>>>> is not scheduled.
>>>>>>>>
>>>>>>> Yes.
>>>>>> So we will use the vhost_has_work() to check whether or not the
>>>>>> handle_rx is scheduled ?
>>>>>> If we use the vhost_has_work(), the work in the dev work_list may be
>>>>>> rx work, or tx work, right ?
>>>>> Yes. We can add a boolean to record whether or not we've called
>>>>> vhost_poll_queue() for rvq. And avoid calling vhost_net_enable_vq() if
>>>>> it was true.
>>>> so, the commit be294a51a "vhost_net: Avoid rx queue wake-ups during
>>>> busypoll"
>>>> may not consider the case: work is tx work in the dev work list.
>>> So two kinds of work, tx kick or tx wakeup.
>>>
>>> For tx kick, we check vhost_vq_avail_empty() and avoid unnecessary kicks
>>> by not enabling kick if we found something is pending on txq. For tx
>>> wakeup, yes, the commit does not consider it. And that's why we want to
>>> disable tx wakeups during busy polling.
>> And in the handle_rx but not busy polling, the tx can wakeup anytime
>> and the tx work will be added to dev work list. In that case, if we
>> add
>> the rx poll to the queue, it is necessary ? the commit be294a51a may
>> check whether the rx work is in the dev work list.
>
> I think the point this we don't poll rx during tx at that time. So if rx
> poll is interrupted, we should reschedule handle_rx(). After we poll rx
> on handle_tx(), we can try to optimize this on top.
That's true. We may be able to skip poll_queue in handle_rx/tx after
rx/tx busypolling is unified by this patch set.
--
Toshiaki Makita
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Wei Wang @ 2018-08-03 5:35 UTC (permalink / raw)
To: Michal Hocko
Cc: virtio-dev, mst, linux-kernel, virtualization, linux-mm, akpm
In-Reply-To: <20180802114755.GI10808@dhcp22.suse.cz>
On 08/02/2018 07:47 PM, Michal Hocko wrote:
> On Thu 02-08-18 18:32:44, Wei Wang wrote:
>> On 08/01/2018 07:34 PM, Michal Hocko wrote:
>>> On Wed 01-08-18 19:12:25, Wei Wang wrote:
>>>> On 07/30/2018 05:00 PM, Michal Hocko wrote:
>>>>> On Fri 27-07-18 17:24:55, Wei Wang wrote:
>>>>>> The OOM notifier is getting deprecated to use for the reasons mentioned
>>>>>> here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
>>>>>>
>>>>>> This patch replaces the virtio-balloon oom notifier with a shrinker
>>>>>> to release balloon pages on memory pressure.
>>>>> It would be great to document the replacement. This is not a small
>>>>> change...
>>>> OK. I plan to document the following to the commit log:
>>>>
>>>> The OOM notifier is getting deprecated to use for the reasons:
>>>> - As a callout from the oom context, it is too subtle and easy to
>>>> generate bugs and corner cases which are hard to track;
>>>> - It is called too late (after the reclaiming has been performed).
>>>> Drivers with large amuont of reclaimable memory is expected to be
>>>> released them at an early age of memory pressure;
>>>> - The notifier callback isn't aware of the oom contrains;
>>>> Link: https://lkml.org/lkml/2018/7/12/314
>>>>
>>>> This patch replaces the virtio-balloon oom notifier with a shrinker
>>>> to release balloon pages on memory pressure. Users can set the amount of
>>>> memory pages to release each time a shrinker_scan is called via the
>>>> module parameter balloon_pages_to_shrink, and the default amount is 256
>>>> pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
>>>> been used to release balloon pages on OOM. We continue to use this
>>>> feature bit for the shrinker, so the shrinker is only registered when
>>>> this feature bit has been negotiated with host.
>>> Do you have any numbers for how does this work in practice?
>> It works in this way: for example, we can set the parameter,
>> balloon_pages_to_shrink, to shrink 1GB memory once shrink scan is called.
>> Now, we have a 8GB guest, and we balloon out 7GB. When shrink scan is
>> called, the balloon driver will get back 1GB memory and give them back to
>> mm, then the ballooned memory becomes 6GB.
>>
>> When the shrinker scan is called the second time, another 1GB will be given
>> back to mm. So the ballooned pages are given back to mm gradually.
>>
>>> Let's say
>>> you have a medium page cache workload which triggers kswapd to do a
>>> light reclaim? Hardcoded shrinking sounds quite dubious to me but I have
>>> no idea how people expect this to work. Shouldn't this be more
>>> adaptive? How precious are those pages anyway?
>> Those pages are given to host to use usually because the guest has enough
>> free memory, and host doesn't want to waste those pieces of memory as they
>> are not used by this guest. When the guest needs them, it is reasonable that
>> the guest has higher priority to take them back.
>> But I'm not sure if there would be a more adaptive approach than "gradually
>> giving back as the guest wants more".
> I am not sure I follow. Let me be more specific. Say you have a trivial
> stream IO triggering reclaim to recycle clean page cache. This will
> invoke slab shrinkers as well. Do you really want to drop your batch of
> pages on each invocation? Doesn't that remove them very quickly? Just
> try to dd if=large_file of=/dev/null and see how your pages are
> disappearing. Shrinkers usually scale the number of objects they are
> going to reclaim based on the memory pressure (aka targer to be
> reclaimed).
Thanks, I think it looks better to make it more adaptive. I'll send out
a new version for review.
Best,
Wei
^ permalink raw reply
* [PATCH net-next] vhost: switch to use new message format
From: Jason Wang @ 2018-08-03 7:04 UTC (permalink / raw)
To: mst, jasowang; +Cc: netdev, linux-kernel, kvm, virtualization
We use to have message like:
struct vhost_msg {
int type;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};
Unfortunately, there will be a hole of 32bit in 64bit machine because
of the alignment. This leads a different formats between 32bit API and
64bit API. What's more it will break 32bit program running on 64bit
machine.
So fixing this by introducing a new message type with an explicit
32bit reserved field after type like:
struct vhost_msg_v2 {
int type;
__u32 reserved;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};
We will have a consistent ABI after switching to use this. To enable
this capability, introduce a new ioctl (VHOST_SET_BAKCEND_FEATURE) for
userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2).
Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 30 ++++++++++++++++++++
drivers/vhost/vhost.c | 71 ++++++++++++++++++++++++++++++++++------------
drivers/vhost/vhost.h | 11 ++++++-
include/uapi/linux/vhost.h | 18 ++++++++++++
4 files changed, 111 insertions(+), 19 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 367d802..4e656f8 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -78,6 +78,10 @@ enum {
};
enum {
+ VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
+};
+
+enum {
VHOST_NET_VQ_RX = 0,
VHOST_NET_VQ_TX = 1,
VHOST_NET_VQ_MAX = 2,
@@ -1399,6 +1403,21 @@ static long vhost_net_reset_owner(struct vhost_net *n)
return err;
}
+static int vhost_net_set_backend_features(struct vhost_net *n, u64 features)
+{
+ int i;
+
+ mutex_lock(&n->dev.mutex);
+ for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+ mutex_lock(&n->vqs[i].vq.mutex);
+ n->vqs[i].vq.acked_backend_features = features;
+ mutex_unlock(&n->vqs[i].vq.mutex);
+ }
+ mutex_unlock(&n->dev.mutex);
+
+ return 0;
+}
+
static int vhost_net_set_features(struct vhost_net *n, u64 features)
{
size_t vhost_hlen, sock_hlen, hdr_len;
@@ -1489,6 +1508,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
if (features & ~VHOST_NET_FEATURES)
return -EOPNOTSUPP;
return vhost_net_set_features(n, features);
+ case VHOST_GET_BACKEND_FEATURES:
+ features = VHOST_NET_BACKEND_FEATURES;
+ if (copy_to_user(featurep, &features, sizeof(features)))
+ return -EFAULT;
+ return 0;
+ case VHOST_SET_BACKEND_FEATURES:
+ if (copy_from_user(&features, featurep, sizeof(features)))
+ return -EFAULT;
+ if (features & ~VHOST_NET_BACKEND_FEATURES)
+ return -EOPNOTSUPP;
+ return vhost_net_set_backend_features(n, features);
case VHOST_RESET_OWNER:
return vhost_net_reset_owner(n);
case VHOST_SET_OWNER:
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a502f1a..6f6c42d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -315,6 +315,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->log_addr = -1ull;
vq->private_data = NULL;
vq->acked_features = 0;
+ vq->acked_backend_features = 0;
vq->log_base = NULL;
vq->error_ctx = NULL;
vq->kick = NULL;
@@ -1027,28 +1028,40 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
struct iov_iter *from)
{
- struct vhost_msg_node node;
- unsigned size = sizeof(struct vhost_msg);
- size_t ret;
- int err;
+ struct vhost_iotlb_msg msg;
+ size_t offset;
+ int type, ret;
- if (iov_iter_count(from) < size)
- return 0;
- ret = copy_from_iter(&node.msg, size, from);
- if (ret != size)
+ ret = copy_from_iter(&type, sizeof(type), from);
+ if (ret != sizeof(type))
goto done;
- switch (node.msg.type) {
+ switch (type) {
case VHOST_IOTLB_MSG:
- err = vhost_process_iotlb_msg(dev, &node.msg.iotlb);
- if (err)
- ret = err;
+ /* There maybe a hole after type for V1 message type,
+ * so skip it here.
+ */
+ offset = offsetof(struct vhost_msg, iotlb) - sizeof(int);
+ break;
+ case VHOST_IOTLB_MSG_V2:
+ offset = sizeof(__u32);
break;
default:
ret = -EINVAL;
- break;
+ goto done;
+ }
+
+ iov_iter_advance(from, offset);
+ ret = copy_from_iter(&msg, sizeof(msg), from);
+ if (ret != sizeof(msg))
+ goto done;
+ if (vhost_process_iotlb_msg(dev, &msg)) {
+ ret = -EFAULT;
+ goto done;
}
+ ret = (type == VHOST_IOTLB_MSG) ? sizeof(struct vhost_msg) :
+ sizeof(struct vhost_msg_v2);
done:
return ret;
}
@@ -1107,13 +1120,28 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to,
finish_wait(&dev->wait, &wait);
if (node) {
- ret = copy_to_iter(&node->msg, size, to);
+ struct vhost_iotlb_msg *msg;
+ void *start = &node->msg;
+
+ switch (node->msg.type) {
+ case VHOST_IOTLB_MSG:
+ size = sizeof(node->msg);
+ msg = &node->msg.iotlb;
+ break;
+ case VHOST_IOTLB_MSG_V2:
+ size = sizeof(node->msg_v2);
+ msg = &node->msg_v2.iotlb;
+ break;
+ default:
+ BUG();
+ break;
+ }
- if (ret != size || node->msg.type != VHOST_IOTLB_MISS) {
+ ret = copy_to_iter(start, size, to);
+ if (ret != size || msg->type != VHOST_IOTLB_MISS) {
kfree(node);
return ret;
}
-
vhost_enqueue_msg(dev, &dev->pending_list, node);
}
@@ -1126,12 +1154,19 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access)
struct vhost_dev *dev = vq->dev;
struct vhost_msg_node *node;
struct vhost_iotlb_msg *msg;
+ bool v2 = vhost_backend_has_feature(vq, VHOST_BACKEND_F_IOTLB_MSG_V2);
- node = vhost_new_msg(vq, VHOST_IOTLB_MISS);
+ node = vhost_new_msg(vq, v2 ? VHOST_IOTLB_MSG_V2 : VHOST_IOTLB_MSG);
if (!node)
return -ENOMEM;
- msg = &node->msg.iotlb;
+ if (v2) {
+ node->msg_v2.type = VHOST_IOTLB_MSG_V2;
+ msg = &node->msg_v2.iotlb;
+ } else {
+ msg = &node->msg.iotlb;
+ }
+
msg->type = VHOST_IOTLB_MISS;
msg->iova = iova;
msg->perm = access;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 6c844b9..466ef75 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -132,6 +132,7 @@ struct vhost_virtqueue {
struct vhost_umem *iotlb;
void *private_data;
u64 acked_features;
+ u64 acked_backend_features;
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
@@ -147,7 +148,10 @@ struct vhost_virtqueue {
};
struct vhost_msg_node {
- struct vhost_msg msg;
+ union {
+ struct vhost_msg msg;
+ struct vhost_msg_v2 msg_v2;
+ };
struct vhost_virtqueue *vq;
struct list_head node;
};
@@ -238,6 +242,11 @@ static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
return vq->acked_features & (1ULL << bit);
}
+static inline bool vhost_backend_has_feature(struct vhost_virtqueue *vq, int bit)
+{
+ return vq->acked_backend_features & (1ULL << bit);
+}
+
#ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
{
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index c51f8e5..c176e9d 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -65,6 +65,7 @@ struct vhost_iotlb_msg {
};
#define VHOST_IOTLB_MSG 0x1
+#define VHOST_IOTLB_MSG_V2 0x2
struct vhost_msg {
int type;
@@ -74,6 +75,15 @@ struct vhost_msg {
};
};
+struct vhost_msg_v2 {
+ int type;
+ __u32 reserved;
+ union {
+ struct vhost_iotlb_msg iotlb;
+ __u8 padding[64];
+ };
+};
+
struct vhost_memory_region {
__u64 guest_phys_addr;
__u64 memory_size; /* bytes */
@@ -160,6 +170,14 @@ struct vhost_memory {
#define VHOST_GET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x24, \
struct vhost_vring_state)
+/* Set or get vhost backend capability */
+
+/* Use message type V2 */
+#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
+
+#define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
+#define VHOST_GET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x26, __u64)
+
/* VHOST_NET specific defines */
/* Attach virtio net ring to a raw socket, or tap device.
--
2.7.4
^ permalink raw reply related
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-08-03 7:05 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier,
joe, robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <de4888b6457e220776e16a9c8958ff0886ffc66c.camel@kernel.crashing.org>
On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> So let's differenciate the two problems of having an IOMMU (real or
> emulated) which indeeds adds overhead etc... and using the DMA API.
>
> At the moment, virtio does this all over the place:
>
> if (use_dma_api)
> dma_map/alloc_something(...)
> else
> use_pa
>
> The idea of the patch set is to do two, somewhat orthogonal, changes
> that together achieve what we want. Let me know where you think there
> is "a bunch of issues" because I'm missing it:
>
> 1- Replace the above if/else constructs with just calling the DMA API,
> and have virtio, at initialization, hookup its own dma_ops that just
> "return pa" (roughly) when the IOMMU stuff isn't used.
>
> This adds an indirect function call to the path that previously didn't
> have one (the else case above). Is that a significant/measurable
> overhead ?
If you call it often enough it does:
https://www.spinics.net/lists/netdev/msg495413.html
> 2- Make virtio use the DMA API with our custom platform-provided
> swiotlb callbacks when needed, that is when not using IOMMU *and*
> running on a secure VM in our case.
And total NAK the customer platform-provided part of this. We need
a flag passed in from the hypervisor that the device needs all bus
specific dma api treatment, and then just use the normal plaform
dma mapping setup. To get swiotlb you'll need to then use the DT/ACPI
dma-range property to limit the addressable range, and a swiotlb
capable plaform will use swiotlb automatically.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-08-03 7:06 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: robh, srikar, Benjamin Herrenschmidt, Will Deacon, linux-kernel,
linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier,
mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <20180802235233-mutt-send-email-mst@kernel.org>
On Thu, Aug 02, 2018 at 11:53:08PM +0300, Michael S. Tsirkin wrote:
> > We don't need cache flushing tricks.
>
> You don't but do real devices on same platform need them?
IBMs power plaforms are always cache coherent. There are some powerpc
platforms have not cache coherent DMA, but I guess this scheme isn't
intended for them.
^ permalink raw reply
* Re: [PATCH net-next] vhost: switch to use new message format
From: Michael S. Tsirkin @ 2018-08-03 7:59 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1533279891-12249-1-git-send-email-jasowang@redhat.com>
On Fri, Aug 03, 2018 at 03:04:51PM +0800, Jason Wang wrote:
> We use to have message like:
>
> struct vhost_msg {
> int type;
> union {
> struct vhost_iotlb_msg iotlb;
> __u8 padding[64];
> };
> };
>
> Unfortunately, there will be a hole of 32bit in 64bit machine because
> of the alignment. This leads a different formats between 32bit API and
> 64bit API. What's more it will break 32bit program running on 64bit
> machine.
>
> So fixing this by introducing a new message type with an explicit
> 32bit reserved field after type like:
>
> struct vhost_msg_v2 {
> int type;
> __u32 reserved;
> union {
> struct vhost_iotlb_msg iotlb;
> __u8 padding[64];
> };
> };
>
> We will have a consistent ABI after switching to use this. To enable
> this capability, introduce a new ioctl (VHOST_SET_BAKCEND_FEATURE) for
> userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2).
>
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 30 ++++++++++++++++++++
> drivers/vhost/vhost.c | 71 ++++++++++++++++++++++++++++++++++------------
> drivers/vhost/vhost.h | 11 ++++++-
> include/uapi/linux/vhost.h | 18 ++++++++++++
> 4 files changed, 111 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 367d802..4e656f8 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -78,6 +78,10 @@ enum {
> };
>
> enum {
> + VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
> +};
> +
> +enum {
> VHOST_NET_VQ_RX = 0,
> VHOST_NET_VQ_TX = 1,
> VHOST_NET_VQ_MAX = 2,
> @@ -1399,6 +1403,21 @@ static long vhost_net_reset_owner(struct vhost_net *n)
> return err;
> }
>
> +static int vhost_net_set_backend_features(struct vhost_net *n, u64 features)
> +{
> + int i;
> +
> + mutex_lock(&n->dev.mutex);
> + for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
> + mutex_lock(&n->vqs[i].vq.mutex);
> + n->vqs[i].vq.acked_backend_features = features;
> + mutex_unlock(&n->vqs[i].vq.mutex);
> + }
> + mutex_unlock(&n->dev.mutex);
> +
> + return 0;
> +}
> +
> static int vhost_net_set_features(struct vhost_net *n, u64 features)
> {
> size_t vhost_hlen, sock_hlen, hdr_len;
> @@ -1489,6 +1508,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> if (features & ~VHOST_NET_FEATURES)
> return -EOPNOTSUPP;
> return vhost_net_set_features(n, features);
> + case VHOST_GET_BACKEND_FEATURES:
> + features = VHOST_NET_BACKEND_FEATURES;
> + if (copy_to_user(featurep, &features, sizeof(features)))
> + return -EFAULT;
> + return 0;
> + case VHOST_SET_BACKEND_FEATURES:
> + if (copy_from_user(&features, featurep, sizeof(features)))
> + return -EFAULT;
> + if (features & ~VHOST_NET_BACKEND_FEATURES)
> + return -EOPNOTSUPP;
> + return vhost_net_set_backend_features(n, features);
> case VHOST_RESET_OWNER:
> return vhost_net_reset_owner(n);
> case VHOST_SET_OWNER:
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a502f1a..6f6c42d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -315,6 +315,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
> vq->log_addr = -1ull;
> vq->private_data = NULL;
> vq->acked_features = 0;
> + vq->acked_backend_features = 0;
> vq->log_base = NULL;
> vq->error_ctx = NULL;
> vq->kick = NULL;
> @@ -1027,28 +1028,40 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
> ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
> struct iov_iter *from)
> {
> - struct vhost_msg_node node;
> - unsigned size = sizeof(struct vhost_msg);
> - size_t ret;
> - int err;
> + struct vhost_iotlb_msg msg;
> + size_t offset;
> + int type, ret;
>
> - if (iov_iter_count(from) < size)
> - return 0;
> - ret = copy_from_iter(&node.msg, size, from);
> - if (ret != size)
> + ret = copy_from_iter(&type, sizeof(type), from);
> + if (ret != sizeof(type))
> goto done;
>
> - switch (node.msg.type) {
> + switch (type) {
> case VHOST_IOTLB_MSG:
> - err = vhost_process_iotlb_msg(dev, &node.msg.iotlb);
> - if (err)
> - ret = err;
> + /* There maybe a hole after type for V1 message type,
> + * so skip it here.
> + */
> + offset = offsetof(struct vhost_msg, iotlb) - sizeof(int);
> + break;
> + case VHOST_IOTLB_MSG_V2:
> + offset = sizeof(__u32);
> break;
> default:
> ret = -EINVAL;
> - break;
> + goto done;
> + }
> +
> + iov_iter_advance(from, offset);
> + ret = copy_from_iter(&msg, sizeof(msg), from);
> + if (ret != sizeof(msg))
> + goto done;
> + if (vhost_process_iotlb_msg(dev, &msg)) {
> + ret = -EFAULT;
> + goto done;
> }
>
> + ret = (type == VHOST_IOTLB_MSG) ? sizeof(struct vhost_msg) :
> + sizeof(struct vhost_msg_v2);
> done:
> return ret;
> }
We can actually fix 32 bit apps too, checking the mode for v1.
But that can wait for another patch.
> @@ -1107,13 +1120,28 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to,
> finish_wait(&dev->wait, &wait);
>
> if (node) {
> - ret = copy_to_iter(&node->msg, size, to);
> + struct vhost_iotlb_msg *msg;
> + void *start = &node->msg;
> +
> + switch (node->msg.type) {
> + case VHOST_IOTLB_MSG:
> + size = sizeof(node->msg);
> + msg = &node->msg.iotlb;
> + break;
> + case VHOST_IOTLB_MSG_V2:
> + size = sizeof(node->msg_v2);
> + msg = &node->msg_v2.iotlb;
> + break;
> + default:
> + BUG();
> + break;
> + }
>
> - if (ret != size || node->msg.type != VHOST_IOTLB_MISS) {
> + ret = copy_to_iter(start, size, to);
> + if (ret != size || msg->type != VHOST_IOTLB_MISS) {
> kfree(node);
> return ret;
> }
> -
> vhost_enqueue_msg(dev, &dev->pending_list, node);
> }
>
> @@ -1126,12 +1154,19 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access)
> struct vhost_dev *dev = vq->dev;
> struct vhost_msg_node *node;
> struct vhost_iotlb_msg *msg;
> + bool v2 = vhost_backend_has_feature(vq, VHOST_BACKEND_F_IOTLB_MSG_V2);
>
> - node = vhost_new_msg(vq, VHOST_IOTLB_MISS);
> + node = vhost_new_msg(vq, v2 ? VHOST_IOTLB_MSG_V2 : VHOST_IOTLB_MSG);
> if (!node)
> return -ENOMEM;
>
> - msg = &node->msg.iotlb;
> + if (v2) {
> + node->msg_v2.type = VHOST_IOTLB_MSG_V2;
> + msg = &node->msg_v2.iotlb;
> + } else {
> + msg = &node->msg.iotlb;
> + }
> +
> msg->type = VHOST_IOTLB_MISS;
> msg->iova = iova;
> msg->perm = access;
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 6c844b9..466ef75 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -132,6 +132,7 @@ struct vhost_virtqueue {
> struct vhost_umem *iotlb;
> void *private_data;
> u64 acked_features;
> + u64 acked_backend_features;
> /* Log write descriptors */
> void __user *log_base;
> struct vhost_log *log;
> @@ -147,7 +148,10 @@ struct vhost_virtqueue {
> };
>
> struct vhost_msg_node {
> - struct vhost_msg msg;
> + union {
> + struct vhost_msg msg;
> + struct vhost_msg_v2 msg_v2;
> + };
> struct vhost_virtqueue *vq;
> struct list_head node;
> };
> @@ -238,6 +242,11 @@ static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
> return vq->acked_features & (1ULL << bit);
> }
>
> +static inline bool vhost_backend_has_feature(struct vhost_virtqueue *vq, int bit)
> +{
> + return vq->acked_backend_features & (1ULL << bit);
> +}
> +
> #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
> static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
> {
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index c51f8e5..c176e9d 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -65,6 +65,7 @@ struct vhost_iotlb_msg {
> };
>
> #define VHOST_IOTLB_MSG 0x1
> +#define VHOST_IOTLB_MSG_V2 0x2
>
> struct vhost_msg {
> int type;
> @@ -74,6 +75,15 @@ struct vhost_msg {
> };
> };
>
> +struct vhost_msg_v2 {
> + int type;
> + __u32 reserved;
> + union {
> + struct vhost_iotlb_msg iotlb;
> + __u8 padding[64];
> + };
> +};
> +
> struct vhost_memory_region {
> __u64 guest_phys_addr;
> __u64 memory_size; /* bytes */
> @@ -160,6 +170,14 @@ struct vhost_memory {
> #define VHOST_GET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x24, \
> struct vhost_vring_state)
>
> +/* Set or get vhost backend capability */
> +
> +/* Use message type V2 */
> +#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
> +
> +#define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
> +#define VHOST_GET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x26, __u64)
> +
> /* VHOST_NET specific defines */
>
> /* Attach virtio net ring to a raw socket, or tap device.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> --
> 2.7.4
^ permalink raw reply
* [PATCH v3 0/2] virtio-balloon: some improvements
From: Wei Wang @ 2018-08-03 8:32 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, linux-mm, mst, mhocko,
akpm, penguin-kernel
This series is split from the "Virtio-balloon: support free page
reporting" series to make some improvements.
ChangeLog:
v2->v3:
- shrink the balloon pages according to the amount requested by the
claimer, instead of using a user specified number;
v1->v2:
- register the shrinker when VIRTIO_BALLOON_F_DEFLATE_ON_OOM is
negotiated.
Wei Wang (2):
virtio-balloon: remove BUG() in init_vqs
virtio_balloon: replace oom notifier with shrinker
drivers/virtio/virtio_balloon.c | 121 ++++++++++++++++++++++------------------
1 file changed, 67 insertions(+), 54 deletions(-)
--
2.7.4
^ permalink raw reply
* [PATCH v3 1/2] virtio-balloon: remove BUG() in init_vqs
From: Wei Wang @ 2018-08-03 8:32 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, linux-mm, mst, mhocko,
akpm, penguin-kernel
In-Reply-To: <1533285146-25212-1-git-send-email-wei.w.wang@intel.com>
It's a bit overkill to use BUG when failing to add an entry to the
stats_vq in init_vqs. So remove it and just return the error to the
caller to bail out nicely.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
---
drivers/virtio/virtio_balloon.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 3988c09..8100e77 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -455,9 +455,13 @@ static int init_vqs(struct virtio_balloon *vb)
num_stats = update_balloon_stats(vb);
sg_init_one(&sg, vb->stats, sizeof(vb->stats[0]) * num_stats);
- if (virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb, GFP_KERNEL)
- < 0)
- BUG();
+ err = virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb,
+ GFP_KERNEL);
+ if (err) {
+ dev_warn(&vb->vdev->dev, "%s: add stat_vq failed\n",
+ __func__);
+ return err;
+ }
virtqueue_kick(vb->stats_vq);
}
return 0;
--
2.7.4
^ permalink raw reply related
* [PATCH v3 2/2] virtio_balloon: replace oom notifier with shrinker
From: Wei Wang @ 2018-08-03 8:32 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, linux-mm, mst, mhocko,
akpm, penguin-kernel
In-Reply-To: <1533285146-25212-1-git-send-email-wei.w.wang@intel.com>
The OOM notifier is getting deprecated to use for the reasons:
- As a callout from the oom context, it is too subtle and easy to
generate bugs and corner cases which are hard to track;
- It is called too late (after the reclaiming has been performed).
Drivers with large amuont of reclaimable memory is expected to
release them at an early stage of memory pressure;
- The notifier callback isn't aware of oom contrains;
Link: https://lkml.org/lkml/2018/7/12/314
This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure. The balloon pages are
given back to mm adaptively by returning the number of pages that the
reclaimer is asking for (i.e. sc->nr_to_scan).
Currently the max possible value of sc->nr_to_scan passed to the balloon
shrinker is SHRINK_BATCH, which is 128. This is smaller than the
limitation that only VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) pages can be
returned via one invocation of leak_balloon. But this patch still
considers the case that SHRINK_BATCH or shrinker->batch could be changed
to a value larger than VIRTIO_BALLOON_ARRAY_PFNS_MAX, which will need to
do multiple invocations of leak_balloon.
Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been used
to release balloon pages on OOM. We continue to use this feature bit for
the shrinker, so the shrinker is only registered when this feature bit
has been negotiated with host.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
drivers/virtio/virtio_balloon.c | 111 ++++++++++++++++++++++------------------
1 file changed, 60 insertions(+), 51 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8100e77..612a359 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,7 +27,6 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/balloon_compaction.h>
-#include <linux/oom.h>
#include <linux/wait.h>
#include <linux/mm.h>
#include <linux/mount.h>
@@ -40,13 +39,8 @@
*/
#define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define OOM_VBALLOON_DEFAULT_PAGES 256
#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
-static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
-module_param(oom_pages, int, S_IRUSR | S_IWUSR);
-MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
-
#ifdef CONFIG_BALLOON_COMPACTION
static struct vfsmount *balloon_mnt;
#endif
@@ -86,8 +80,8 @@ struct virtio_balloon {
/* Memory statistics */
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
- /* To register callback in oom notifier call chain */
- struct notifier_block nb;
+ /* To register a shrinker to shrink memory upon memory pressure */
+ struct shrinker shrinker;
};
static struct virtio_device_id id_table[] = {
@@ -365,38 +359,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
&actual);
}
-/*
- * virtballoon_oom_notify - release pages when system is under severe
- * memory pressure (called from out_of_memory())
- * @self : notifier block struct
- * @dummy: not used
- * @parm : returned - number of freed pages
- *
- * The balancing of memory by use of the virtio balloon should not cause
- * the termination of processes while there are pages in the balloon.
- * If virtio balloon manages to release some memory, it will make the
- * system return and retry the allocation that forced the OOM killer
- * to run.
- */
-static int virtballoon_oom_notify(struct notifier_block *self,
- unsigned long dummy, void *parm)
-{
- struct virtio_balloon *vb;
- unsigned long *freed;
- unsigned num_freed_pages;
-
- vb = container_of(self, struct virtio_balloon, nb);
- if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
- return NOTIFY_OK;
-
- freed = parm;
- num_freed_pages = leak_balloon(vb, oom_pages);
- update_balloon_size(vb);
- *freed += num_freed_pages;
-
- return NOTIFY_OK;
-}
-
static void update_balloon_stats_func(struct work_struct *work)
{
struct virtio_balloon *vb;
@@ -550,6 +512,53 @@ static struct file_system_type balloon_fs = {
#endif /* CONFIG_BALLOON_COMPACTION */
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ unsigned long pages_to_free, pages_freed = 0;
+ struct virtio_balloon *vb = container_of(shrinker,
+ struct virtio_balloon, shrinker);
+
+ pages_to_free = sc->nr_to_scan * VIRTIO_BALLOON_PAGES_PER_PAGE;
+
+ /*
+ * One invocation of leak_balloon can deflate at most
+ * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+ * multiple times to deflate pages till reaching pages_to_free.
+ */
+ while (vb->num_pages && pages_to_free) {
+ pages_to_free -= pages_freed;
+ pages_freed += leak_balloon(vb, pages_to_free);
+ }
+ update_balloon_size(vb);
+
+ return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ struct virtio_balloon *vb = container_of(shrinker,
+ struct virtio_balloon, shrinker);
+
+ return vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
+{
+ unregister_shrinker(&vb->shrinker);
+}
+
+static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
+{
+ vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
+ vb->shrinker.count_objects = virtio_balloon_shrinker_count;
+ vb->shrinker.batch = 0;
+ vb->shrinker.seeks = DEFAULT_SEEKS;
+
+ return register_shrinker(&vb->shrinker);
+}
+
static int virtballoon_probe(struct virtio_device *vdev)
{
struct virtio_balloon *vb;
@@ -582,17 +591,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb;
- vb->nb.notifier_call = virtballoon_oom_notify;
- vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
- err = register_oom_notifier(&vb->nb);
- if (err < 0)
- goto out_del_vqs;
-
#ifdef CONFIG_BALLOON_COMPACTION
balloon_mnt = kern_mount(&balloon_fs);
if (IS_ERR(balloon_mnt)) {
err = PTR_ERR(balloon_mnt);
- unregister_oom_notifier(&vb->nb);
goto out_del_vqs;
}
@@ -601,13 +603,20 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (IS_ERR(vb->vb_dev_info.inode)) {
err = PTR_ERR(vb->vb_dev_info.inode);
kern_unmount(balloon_mnt);
- unregister_oom_notifier(&vb->nb);
vb->vb_dev_info.inode = NULL;
goto out_del_vqs;
}
vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
#endif
-
+ /*
+ * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
+ * shrinker needs to be registered to relieve memory pressure.
+ */
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
+ err = virtio_balloon_register_shrinker(vb);
+ if (err)
+ goto out_del_vqs;
+ }
virtio_device_ready(vdev);
if (towards_target(vb))
@@ -639,8 +648,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
{
struct virtio_balloon *vb = vdev->priv;
- unregister_oom_notifier(&vb->nb);
-
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+ virtio_balloon_unregister_shrinker(vb);
spin_lock_irq(&vb->stop_update_lock);
vb->stop_update = true;
spin_unlock_irq(&vb->stop_update_lock);
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v3 2/2] virtio_balloon: replace oom notifier with shrinker
From: Tetsuo Handa @ 2018-08-03 12:11 UTC (permalink / raw)
To: Wei Wang, virtio-dev, linux-kernel, virtualization, linux-mm, mst,
mhocko, akpm
In-Reply-To: <1533285146-25212-3-git-send-email-wei.w.wang@intel.com>
On 2018/08/03 17:32, Wei Wang wrote:
> +static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
> +{
> + vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> + vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> + vb->shrinker.batch = 0;
> + vb->shrinker.seeks = DEFAULT_SEEKS;
Why flags field is not set? If vb is allocated by kmalloc(GFP_KERNEL)
and is nowhere zero-cleared, KASAN would complain it.
> +
> + return register_shrinker(&vb->shrinker);
> +}
^ permalink raw reply
* Re: [PATCH] crypto: virtio: Replace GFP_ATOMIC with GFP_KERNEL in __virtio_crypto_ablkcipher_do_req()
From: Herbert Xu @ 2018-08-03 13:58 UTC (permalink / raw)
To: Jia-Ju Bai; +Cc: mst, linux-kernel, virtualization, linux-crypto, davem
In-Reply-To: <20180723084346.16488-1-baijiaju1990@gmail.com>
On Mon, Jul 23, 2018 at 04:43:46PM +0800, Jia-Ju Bai wrote:
> __virtio_crypto_ablkcipher_do_req() is never called in atomic context.
>
> __virtio_crypto_ablkcipher_do_req() is only called by
> virtio_crypto_ablkcipher_crypt_req(), which is only called by
> virtcrypto_find_vqs() that is never called in atomic context.
>
> __virtio_crypto_ablkcipher_do_req() calls kzalloc_node() with GFP_ATOMIC,
> which is not necessary.
> GFP_ATOMIC can be replaced with GFP_KERNEL.
>
> This is found by a static analysis tool named DCNS written by myself.
> I also manually check the kernel code before reporting it.
>
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-03 15:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
linuxram, virtualization, paulus, marc.zyngier, joe, robin.murphy,
david, linuxppc-dev, elfring, haren, Anshuman Khandual
In-Reply-To: <20180803070507.GA1344@infradead.org>
On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > 2- Make virtio use the DMA API with our custom platform-provided
> > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > running on a secure VM in our case.
>
> And total NAK the customer platform-provided part of this. We need
> a flag passed in from the hypervisor that the device needs all bus
> specific dma api treatment, and then just use the normal plaform
> dma mapping setup.
Christoph, as I have explained already, we do NOT have a way to provide
such a flag as neither the hypervisor nor qemu knows anything about
this when the VM is created.
> To get swiotlb you'll need to then use the DT/ACPI
> dma-range property to limit the addressable range, and a swiotlb
> capable plaform will use swiotlb automatically.
This cannot be done as you describe it.
The VM is created as a *normal* VM. The DT stuff is generated by qemu
at a point where it has *no idea* that the VM will later become secure
and thus will have to restrict which pages can be used for "DMA".
The VM will *at runtime* turn itself into a secure VM via interactions
with the security HW and the Ultravisor layer (which sits below the
HV). This happens way after the DT has been created and consumed, the
qemu devices instanciated etc...
Only the guest kernel knows because it initates the transition. When
that happens, the virtio devices have already been used by the guest
firmware, bootloader, possibly another kernel that kexeced the "secure"
one, etc...
So instead of running around saying NAK NAK NAK, please explain how we
can solve that differently.
Ben.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-08-03 16:02 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier,
joe, robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <eb1750e90e4bd45da297fa6f78f8ef93671b7c2f.camel@kernel.crashing.org>
On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > 2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> >
> > And total NAK the customer platform-provided part of this. We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal plaform
> > dma mapping setup.
>
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.
Well, if your setup is so fucked up I see no way to support it in Linux.
Let's end the discussion right now then.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-03 18:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
linuxram, virtualization, paulus, marc.zyngier, joe, robin.murphy,
david, linuxppc-dev, elfring, haren, Anshuman Khandual
In-Reply-To: <20180803160246.GA13794@infradead.org>
On Fri, 2018-08-03 at 09:02 -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > 2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > > running on a secure VM in our case.
> > >
> > > And total NAK the customer platform-provided part of this. We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal plaform
> > > dma mapping setup.
> >
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
>
> Well, if your setup is so fucked up I see no way to support it in Linux.
>
> Let's end the discussion right now then.
You are saying something along the lines of "I don't like an
instruction in your ISA, let's not support your entire CPU architecture
in Linux".
Our setup is not fucked. It makes a LOT of sense and it's a very
sensible design. It's hitting a problem due to a corner case oddity in
virtio bypassing the MMU, we've worked around such corner cases many
times in the past without any problem, I fail to see what the problem
is here.
We aren't going to cancel years of HW and SW development for our
security infrastructure bcs you don't like a 2 lines hook into virtio
to make things work and aren't willing to even consider the options.
Ben.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-03 19:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram,
virtualization, Christoph Hellwig, paulus, marc.zyngier, joe,
robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <eb1750e90e4bd45da297fa6f78f8ef93671b7c2f.camel@kernel.crashing.org>
On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > 2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > > running on a secure VM in our case.
> >
> > And total NAK the customer platform-provided part of this. We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal plaform
> > dma mapping setup.
>
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.
I think the fact you can't add flags from the hypervisor is
a sign of a problematic architecture, you should look at
adding that down the road - you will likely need it at some point.
However in this specific case, the flag does not need to come from the
hypervisor, it can be set by arch boot code I think.
Christoph do you see a problem with that?
> > To get swiotlb you'll need to then use the DT/ACPI
> > dma-range property to limit the addressable range, and a swiotlb
> > capable plaform will use swiotlb automatically.
>
> This cannot be done as you describe it.
>
> The VM is created as a *normal* VM. The DT stuff is generated by qemu
> at a point where it has *no idea* that the VM will later become secure
> and thus will have to restrict which pages can be used for "DMA".
>
> The VM will *at runtime* turn itself into a secure VM via interactions
> with the security HW and the Ultravisor layer (which sits below the
> HV). This happens way after the DT has been created and consumed, the
> qemu devices instanciated etc...
>
> Only the guest kernel knows because it initates the transition. When
> that happens, the virtio devices have already been used by the guest
> firmware, bootloader, possibly another kernel that kexeced the "secure"
> one, etc...
>
> So instead of running around saying NAK NAK NAK, please explain how we
> can solve that differently.
>
> Ben.
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox