public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
       [not found] <1393615016-9187-1-git-send-email-zoltan.kiss@citrix.com>
@ 2014-03-06 17:09 ` Zoltan Kiss
  2014-03-07  4:46   ` Pravin Shelar
  0 siblings, 1 reply; 8+ messages in thread
From: Zoltan Kiss @ 2014-03-06 17:09 UTC (permalink / raw)
  To: Jesse Gross, pshelar, tgraf, dev; +Cc: xen-devel, netdev, linux-kernel, kvm

Do you have any feedback on this? I'm also adding KVM list as they might 
be interested in this.

Zoli

On 28/02/14 19:16, Zoltan Kiss wrote:
> The kernel datapath now switched to zerocopy Netlink messages, but that also
> means that the pages on frags array are sent straight to userspace. If those
> pages came outside the kernel, we have to swap them out with local copies.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>   net/openvswitch/datapath.c |    6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 36f8872..ffb563c 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -464,6 +464,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>   	}
>   	nla->nla_len = nla_attr_size(skb->len);
>
> +	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) {
> +		err = -ENOMEM;
> +		skb_tx_error(skb);
> +		goto out;
> +	}
> +
>   	skb_zerocopy(user_skb, skb, skb->len, hlen);
>
>   	/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
  2014-03-06 17:09 ` [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall Zoltan Kiss
@ 2014-03-07  4:46   ` Pravin Shelar
  2014-03-07 12:29     ` Zoltan Kiss
       [not found]     ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 2 replies; 8+ messages in thread
From: Pravin Shelar @ 2014-03-07  4:46 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Jesse Gross, Thomas Graf, dev@openvswitch.org, xen-devel, netdev,
	LKML, kvm

On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
> Do you have any feedback on this? I'm also adding KVM list as they might be
> interested in this.
>
> Zoli
>
>
> On 28/02/14 19:16, Zoltan Kiss wrote:
>>
>> The kernel datapath now switched to zerocopy Netlink messages, but that
>> also
>> means that the pages on frags array are sent straight to userspace. If
>> those
>> pages came outside the kernel, we have to swap them out with local copies.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

I do not think this is required, netlink zero copy only maps
pre-allocated buffers to user-space.

But I found bug in datapath user-space queue code. I am not sure how
this can work with skb fragments and MMAP-netlink socket.
Here is what happens, OVS allocates netlink skb and adds fragments to
skb using skb_zero_copy(), then calls genlmsg_unicast().
But if netlink sock is mmped then netlink-send queues netlink
allocated skb->head (linear data of skb) and ignore skb frags.

Currently this is not problem with OVS vswitchd since it does not use
netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket,
it can break it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
  2014-03-07  4:46   ` Pravin Shelar
@ 2014-03-07 12:29     ` Zoltan Kiss
       [not found]       ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
       [not found]     ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 8+ messages in thread
From: Zoltan Kiss @ 2014-03-07 12:29 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: Jesse Gross, Thomas Graf, dev@openvswitch.org, xen-devel, netdev,
	LKML, kvm

On 07/03/14 04:46, Pravin Shelar wrote:
> On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>> Do you have any feedback on this? I'm also adding KVM list as they might be
>> interested in this.
>>
>> Zoli
>>
>>
>> On 28/02/14 19:16, Zoltan Kiss wrote:
>>>
>>> The kernel datapath now switched to zerocopy Netlink messages, but that
>>> also
>>> means that the pages on frags array are sent straight to userspace. If
>>> those
>>> pages came outside the kernel, we have to swap them out with local copies.
>>>
>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>
> I do not think this is required, netlink zero copy only maps
> pre-allocated buffers to user-space.
How do you mean "pre-allocated"? By who?

As far as I've seen the skb in this function came straight from the 
device (vif in our case), and skb_zerocopy just copy the frags to 
user_skb, which is sent to the userspace. Those frags contain pages from 
guest, and it's a bad idea to pass them to userspace: e.g if userspace 
dies in the meantime, what happens with them? Also, in Xen's case they 
are actually not mapped to userspace, so accessing them can lead to garbage.

Zoli

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
       [not found]     ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-07 15:58       ` Thomas Graf
       [not found]         ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Graf @ 2014-03-07 15:58 UTC (permalink / raw)
  To: Pravin Shelar, Zoltan Kiss
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML,
	xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b

On 03/07/2014 05:46 AM, Pravin Shelar wrote:
> But I found bug in datapath user-space queue code. I am not sure how
> this can work with skb fragments and MMAP-netlink socket.
> Here is what happens, OVS allocates netlink skb and adds fragments to
> skb using skb_zero_copy(), then calls genlmsg_unicast().
> But if netlink sock is mmped then netlink-send queues netlink
> allocated skb->head (linear data of skb) and ignore skb frags.
>
> Currently this is not problem with OVS vswitchd since it does not use
> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket,
> it can break it.

The secret is out ;-)

I was very surprised too when I noticed that it worked. It's not just
OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is
setup with a giant tailroom in netlink_ring_setup_skb():

	skb->end	= skb->tail + size;

and skb_zerocopy() will consume whatever tailroom is available first:

	/* dont bother with small payloads */
	if (len <= skb_tailroom(to)) {
		skb_copy_bits(from, 0, skb_put(to, len), len);
		return;
	}

I was planning to fix this while adding GSO support to the upcall as
that is the moment when this bug would really surface.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
       [not found]         ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-07 17:19           ` Pravin Shelar
       [not found]             ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Pravin Shelar @ 2014-03-07 17:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML,
	xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b

On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 03/07/2014 05:46 AM, Pravin Shelar wrote:
>>
>> But I found bug in datapath user-space queue code. I am not sure how
>> this can work with skb fragments and MMAP-netlink socket.
>> Here is what happens, OVS allocates netlink skb and adds fragments to
>> skb using skb_zero_copy(), then calls genlmsg_unicast().
>> But if netlink sock is mmped then netlink-send queues netlink
>> allocated skb->head (linear data of skb) and ignore skb frags.
>>
>> Currently this is not problem with OVS vswitchd since it does not use
>> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket,
>> it can break it.
>
>
> The secret is out ;-)
>
> I was very surprised too when I noticed that it worked. It's not just
> OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is
> setup with a giant tailroom in netlink_ring_setup_skb():
>
>         skb->end        = skb->tail + size;
>
For OVS use-case, the size is linear part of skb. so I think for
mmap-netlink socket it will fail.

> and skb_zerocopy() will consume whatever tailroom is available first:
>
>         /* dont bother with small payloads */
>         if (len <= skb_tailroom(to)) {
>                 skb_copy_bits(from, 0, skb_put(to, len), len);
>                 return;
>         }
>
> I was planning to fix this while adding GSO support to the upcall as
> that is the moment when this bug would really surface.

ok.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
       [not found]       ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
@ 2014-03-07 17:38         ` Pravin Shelar
  0 siblings, 0 replies; 8+ messages in thread
From: Pravin Shelar @ 2014-03-07 17:38 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML,
	xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b

On Fri, Mar 7, 2014 at 4:29 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> wrote:
> On 07/03/14 04:46, Pravin Shelar wrote:
>>
>> On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
>> wrote:
>>>
>>> Do you have any feedback on this? I'm also adding KVM list as they might
>>> be
>>> interested in this.
>>>
>>> Zoli
>>>
>>>
>>> On 28/02/14 19:16, Zoltan Kiss wrote:
>>>>
>>>>
>>>> The kernel datapath now switched to zerocopy Netlink messages, but that
>>>> also
>>>> means that the pages on frags array are sent straight to userspace. If
>>>> those
>>>> pages came outside the kernel, we have to swap them out with local
>>>> copies.
>>>>
>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
>>
>>
>> I do not think this is required, netlink zero copy only maps
>> pre-allocated buffers to user-space.
>
> How do you mean "pre-allocated"? By who?
>
by netlink, ref. netlink_alloc_skb().

> As far as I've seen the skb in this function came straight from the device
> (vif in our case), and skb_zerocopy just copy the frags to user_skb, which
> is sent to the userspace. Those frags contain pages from guest, and it's a
> bad idea to pass them to userspace: e.g if userspace dies in the meantime,
> what happens with them? Also, in Xen's case they are actually not mapped to
> userspace, so accessing them can lead to garbage.
>
AFAIK Netlink does not handle skb-frags for zero-copy case.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
       [not found]             ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-07 18:05               ` Thomas Graf
       [not found]                 ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Graf @ 2014-03-07 18:05 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML,
	xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b

On 03/07/2014 06:19 PM, Pravin Shelar wrote:
> On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> On 03/07/2014 05:46 AM, Pravin Shelar wrote:
>>>
>>> But I found bug in datapath user-space queue code. I am not sure how
>>> this can work with skb fragments and MMAP-netlink socket.
>>> Here is what happens, OVS allocates netlink skb and adds fragments to
>>> skb using skb_zero_copy(), then calls genlmsg_unicast().
>>> But if netlink sock is mmped then netlink-send queues netlink
>>> allocated skb->head (linear data of skb) and ignore skb frags.
>>>
>>> Currently this is not problem with OVS vswitchd since it does not use
>>> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket,
>>> it can break it.
>>
>>
>> The secret is out ;-)
>>
>> I was very surprised too when I noticed that it worked. It's not just
>> OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is
>> setup with a giant tailroom in netlink_ring_setup_skb():
>>
>>          skb->end        = skb->tail + size;
>>
> For OVS use-case, the size is linear part of skb. so I think for
> mmap-netlink socket it will fail.

Could you rephrase? I'm not sure I understand correctly.

The tailroom size equals to the configured frame payload size of
the ring buffer. So as long as the frame size chosen is large
enough to hold whatever pieces comes out of skb_gso_segment() we are
fine. That said, I agree that we should fix this properly before we
enable mmap on the OVS user space side.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
       [not found]                 ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-07 18:43                   ` Pravin Shelar
  0 siblings, 0 replies; 8+ messages in thread
From: Pravin Shelar @ 2014-03-07 18:43 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML,
	xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b

On Fri, Mar 7, 2014 at 10:05 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 03/07/2014 06:19 PM, Pravin Shelar wrote:
>>
>> On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>>
>>> On 03/07/2014 05:46 AM, Pravin Shelar wrote:
>>>>
>>>>
>>>> But I found bug in datapath user-space queue code. I am not sure how
>>>> this can work with skb fragments and MMAP-netlink socket.
>>>> Here is what happens, OVS allocates netlink skb and adds fragments to
>>>> skb using skb_zero_copy(), then calls genlmsg_unicast().
>>>> But if netlink sock is mmped then netlink-send queues netlink
>>>> allocated skb->head (linear data of skb) and ignore skb frags.
>>>>
>>>> Currently this is not problem with OVS vswitchd since it does not use
>>>> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket,
>>>> it can break it.
>>>
>>>
>>>
>>> The secret is out ;-)
>>>
>>> I was very surprised too when I noticed that it worked. It's not just
>>> OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is
>>> setup with a giant tailroom in netlink_ring_setup_skb():
>>>
>>>          skb->end        = skb->tail + size;
>>>
>> For OVS use-case, the size is linear part of skb. so I think for
>> mmap-netlink socket it will fail.
>
>
> Could you rephrase? I'm not sure I understand correctly.
>
I meant OVS (queue_userspace_packet) passes skb_zerocopy_headlen() to
genlmsg_new_unicast().

> The tailroom size equals to the configured frame payload size of
> the ring buffer. So as long as the frame size chosen is large
> enough to hold whatever pieces comes out of skb_gso_segment() we are
> fine. That said, I agree that we should fix this properly before we
> enable mmap on the OVS user space side.

I see. So this work assuming that userspace sets
nl_mmap_req.nm_frame_size large enough.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-03-07 18:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1393615016-9187-1-git-send-email-zoltan.kiss@citrix.com>
2014-03-06 17:09 ` [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall Zoltan Kiss
2014-03-07  4:46   ` Pravin Shelar
2014-03-07 12:29     ` Zoltan Kiss
     [not found]       ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-03-07 17:38         ` Pravin Shelar
     [not found]     ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-07 15:58       ` Thomas Graf
     [not found]         ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 17:19           ` Pravin Shelar
     [not found]             ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-07 18:05               ` Thomas Graf
     [not found]                 ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 18:43                   ` Pravin Shelar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox