* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] <1393615016-9187-1-git-send-email-zoltan.kiss@citrix.com> @ 2014-03-06 17:09 ` Zoltan Kiss 2014-03-07 4:46 ` Pravin Shelar 0 siblings, 1 reply; 8+ messages in thread From: Zoltan Kiss @ 2014-03-06 17:09 UTC (permalink / raw) To: Jesse Gross, pshelar, tgraf, dev; +Cc: xen-devel, netdev, linux-kernel, kvm Do you have any feedback on this? I'm also adding KVM list as they might be interested in this. Zoli On 28/02/14 19:16, Zoltan Kiss wrote: > The kernel datapath now switched to zerocopy Netlink messages, but that also > means that the pages on frags array are sent straight to userspace. If those > pages came outside the kernel, we have to swap them out with local copies. > > Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> > --- > net/openvswitch/datapath.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c > index 36f8872..ffb563c 100644 > --- a/net/openvswitch/datapath.c > +++ b/net/openvswitch/datapath.c > @@ -464,6 +464,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, > } > nla->nla_len = nla_attr_size(skb->len); > > + if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) { > + err = -ENOMEM; > + skb_tx_error(skb); > + goto out; > + } > + > skb_zerocopy(user_skb, skb, skb->len, hlen); > > /* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */ > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall 2014-03-06 17:09 ` [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall Zoltan Kiss @ 2014-03-07 4:46 ` Pravin Shelar 2014-03-07 12:29 ` Zoltan Kiss [not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 2 replies; 8+ messages in thread From: Pravin Shelar @ 2014-03-07 4:46 UTC (permalink / raw) To: Zoltan Kiss Cc: Jesse Gross, Thomas Graf, dev@openvswitch.org, xen-devel, netdev, LKML, kvm On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote: > Do you have any feedback on this? I'm also adding KVM list as they might be > interested in this. > > Zoli > > > On 28/02/14 19:16, Zoltan Kiss wrote: >> >> The kernel datapath now switched to zerocopy Netlink messages, but that >> also >> means that the pages on frags array are sent straight to userspace. If >> those >> pages came outside the kernel, we have to swap them out with local copies. >> >> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> I do not think this is required, netlink zero copy only maps pre-allocated buffers to user-space. But I found bug in datapath user-space queue code. I am not sure how this can work with skb fragments and MMAP-netlink socket. Here is what happens, OVS allocates netlink skb and adds fragments to skb using skb_zero_copy(), then calls genlmsg_unicast(). But if netlink sock is mmped then netlink-send queues netlink allocated skb->head (linear data of skb) and ignore skb frags. Currently this is not problem with OVS vswitchd since it does not use netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, it can break it. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall 2014-03-07 4:46 ` Pravin Shelar @ 2014-03-07 12:29 ` Zoltan Kiss [not found] ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> [not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 8+ messages in thread From: Zoltan Kiss @ 2014-03-07 12:29 UTC (permalink / raw) To: Pravin Shelar Cc: Jesse Gross, Thomas Graf, dev@openvswitch.org, xen-devel, netdev, LKML, kvm On 07/03/14 04:46, Pravin Shelar wrote: > On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote: >> Do you have any feedback on this? I'm also adding KVM list as they might be >> interested in this. >> >> Zoli >> >> >> On 28/02/14 19:16, Zoltan Kiss wrote: >>> >>> The kernel datapath now switched to zerocopy Netlink messages, but that >>> also >>> means that the pages on frags array are sent straight to userspace. If >>> those >>> pages came outside the kernel, we have to swap them out with local copies. >>> >>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> > > I do not think this is required, netlink zero copy only maps > pre-allocated buffers to user-space. How do you mean "pre-allocated"? By who? As far as I've seen the skb in this function came straight from the device (vif in our case), and skb_zerocopy just copy the frags to user_skb, which is sent to the userspace. Those frags contain pages from guest, and it's a bad idea to pass them to userspace: e.g if userspace dies in the meantime, what happens with them? Also, in Xen's case they are actually not mapped to userspace, so accessing them can lead to garbage. Zoli ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 17:38 ` Pravin Shelar 0 siblings, 0 replies; 8+ messages in thread From: Pravin Shelar @ 2014-03-07 17:38 UTC (permalink / raw) To: Zoltan Kiss Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 4:29 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> wrote: > On 07/03/14 04:46, Pravin Shelar wrote: >> >> On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> >> wrote: >>> >>> Do you have any feedback on this? I'm also adding KVM list as they might >>> be >>> interested in this. >>> >>> Zoli >>> >>> >>> On 28/02/14 19:16, Zoltan Kiss wrote: >>>> >>>> >>>> The kernel datapath now switched to zerocopy Netlink messages, but that >>>> also >>>> means that the pages on frags array are sent straight to userspace. If >>>> those >>>> pages came outside the kernel, we have to swap them out with local >>>> copies. >>>> >>>> Signed-off-by: Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> >> >> >> I do not think this is required, netlink zero copy only maps >> pre-allocated buffers to user-space. > > How do you mean "pre-allocated"? By who? > by netlink, ref. netlink_alloc_skb(). > As far as I've seen the skb in this function came straight from the device > (vif in our case), and skb_zerocopy just copy the frags to user_skb, which > is sent to the userspace. Those frags contain pages from guest, and it's a > bad idea to pass them to userspace: e.g if userspace dies in the meantime, > what happens with them? Also, in Xen's case they are actually not mapped to > userspace, so accessing them can lead to garbage. > AFAIK Netlink does not handle skb-frags for zero-copy case. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-03-07 15:58 ` Thomas Graf [not found] ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Thomas Graf @ 2014-03-07 15:58 UTC (permalink / raw) To: Pravin Shelar, Zoltan Kiss Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On 03/07/2014 05:46 AM, Pravin Shelar wrote: > But I found bug in datapath user-space queue code. I am not sure how > this can work with skb fragments and MMAP-netlink socket. > Here is what happens, OVS allocates netlink skb and adds fragments to > skb using skb_zero_copy(), then calls genlmsg_unicast(). > But if netlink sock is mmped then netlink-send queues netlink > allocated skb->head (linear data of skb) and ignore skb frags. > > Currently this is not problem with OVS vswitchd since it does not use > netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, > it can break it. The secret is out ;-) I was very surprised too when I noticed that it worked. It's not just OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is setup with a giant tailroom in netlink_ring_setup_skb(): skb->end = skb->tail + size; and skb_zerocopy() will consume whatever tailroom is available first: /* dont bother with small payloads */ if (len <= skb_tailroom(to)) { skb_copy_bits(from, 0, skb_put(to, len), len); return; } I was planning to fix this while adding GSO support to the upcall as that is the moment when this bug would really surface. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 17:19 ` Pravin Shelar [not found] ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Pravin Shelar @ 2014-03-07 17:19 UTC (permalink / raw) To: Thomas Graf Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 03/07/2014 05:46 AM, Pravin Shelar wrote: >> >> But I found bug in datapath user-space queue code. I am not sure how >> this can work with skb fragments and MMAP-netlink socket. >> Here is what happens, OVS allocates netlink skb and adds fragments to >> skb using skb_zero_copy(), then calls genlmsg_unicast(). >> But if netlink sock is mmped then netlink-send queues netlink >> allocated skb->head (linear data of skb) and ignore skb frags. >> >> Currently this is not problem with OVS vswitchd since it does not use >> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, >> it can break it. > > > The secret is out ;-) > > I was very surprised too when I noticed that it worked. It's not just > OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is > setup with a giant tailroom in netlink_ring_setup_skb(): > > skb->end = skb->tail + size; > For OVS use-case, the size is linear part of skb. so I think for mmap-netlink socket it will fail. > and skb_zerocopy() will consume whatever tailroom is available first: > > /* dont bother with small payloads */ > if (len <= skb_tailroom(to)) { > skb_copy_bits(from, 0, skb_put(to, len), len); > return; > } > > I was planning to fix this while adding GSO support to the upcall as > that is the moment when this bug would really surface. ok. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-03-07 18:05 ` Thomas Graf [not found] ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Thomas Graf @ 2014-03-07 18:05 UTC (permalink / raw) To: Pravin Shelar Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On 03/07/2014 06:19 PM, Pravin Shelar wrote: > On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >> On 03/07/2014 05:46 AM, Pravin Shelar wrote: >>> >>> But I found bug in datapath user-space queue code. I am not sure how >>> this can work with skb fragments and MMAP-netlink socket. >>> Here is what happens, OVS allocates netlink skb and adds fragments to >>> skb using skb_zero_copy(), then calls genlmsg_unicast(). >>> But if netlink sock is mmped then netlink-send queues netlink >>> allocated skb->head (linear data of skb) and ignore skb frags. >>> >>> Currently this is not problem with OVS vswitchd since it does not use >>> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, >>> it can break it. >> >> >> The secret is out ;-) >> >> I was very surprised too when I noticed that it worked. It's not just >> OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is >> setup with a giant tailroom in netlink_ring_setup_skb(): >> >> skb->end = skb->tail + size; >> > For OVS use-case, the size is linear part of skb. so I think for > mmap-netlink socket it will fail. Could you rephrase? I'm not sure I understand correctly. The tailroom size equals to the configured frame payload size of the ring buffer. So as long as the frame size chosen is large enough to hold whatever pieces comes out of skb_gso_segment() we are fine. That said, I agree that we should fix this properly before we enable mmap on the OVS user space side. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 18:43 ` Pravin Shelar 0 siblings, 0 replies; 8+ messages in thread From: Pravin Shelar @ 2014-03-07 18:43 UTC (permalink / raw) To: Thomas Graf Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 10:05 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 03/07/2014 06:19 PM, Pravin Shelar wrote: >> >> On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >>> >>> On 03/07/2014 05:46 AM, Pravin Shelar wrote: >>>> >>>> >>>> But I found bug in datapath user-space queue code. I am not sure how >>>> this can work with skb fragments and MMAP-netlink socket. >>>> Here is what happens, OVS allocates netlink skb and adds fragments to >>>> skb using skb_zero_copy(), then calls genlmsg_unicast(). >>>> But if netlink sock is mmped then netlink-send queues netlink >>>> allocated skb->head (linear data of skb) and ignore skb frags. >>>> >>>> Currently this is not problem with OVS vswitchd since it does not use >>>> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, >>>> it can break it. >>> >>> >>> >>> The secret is out ;-) >>> >>> I was very surprised too when I noticed that it worked. It's not just >>> OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is >>> setup with a giant tailroom in netlink_ring_setup_skb(): >>> >>> skb->end = skb->tail + size; >>> >> For OVS use-case, the size is linear part of skb. so I think for >> mmap-netlink socket it will fail. > > > Could you rephrase? I'm not sure I understand correctly. > I meant OVS (queue_userspace_packet) passes skb_zerocopy_headlen() to genlmsg_new_unicast(). > The tailroom size equals to the configured frame payload size of > the ring buffer. So as long as the frame size chosen is large > enough to hold whatever pieces comes out of skb_gso_segment() we are > fine. That said, I agree that we should fix this properly before we > enable mmap on the OVS user space side. I see. So this work assuming that userspace sets nl_mmap_req.nm_frame_size large enough. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-03-07 18:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1393615016-9187-1-git-send-email-zoltan.kiss@citrix.com>
2014-03-06 17:09 ` [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall Zoltan Kiss
2014-03-07 4:46 ` Pravin Shelar
2014-03-07 12:29 ` Zoltan Kiss
[not found] ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-03-07 17:38 ` Pravin Shelar
[not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-07 15:58 ` Thomas Graf
[not found] ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 17:19 ` Pravin Shelar
[not found] ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-07 18:05 ` Thomas Graf
[not found] ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 18:43 ` Pravin Shelar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox