* [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
@ 2014-02-28 19:16 Zoltan Kiss
2014-03-06 17:09 ` Zoltan Kiss
[not found] ` <1393615016-9187-1-git-send-email-zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
0 siblings, 2 replies; 15+ messages in thread
From: Zoltan Kiss @ 2014-02-28 19:16 UTC (permalink / raw)
To: Jesse Gross, pshelar-l0M0P4e3n4LQT0dZR+AlfA,
tgraf-H+wXaHxf7aLQT0dZR+AlfA, dev-yBygre7rU0TnMu66kgdUjQ
Cc: xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
The kernel datapath now switched to zerocopy Netlink messages, but that also
means that the pages on frags array are sent straight to userspace. If those
pages came outside the kernel, we have to swap them out with local copies.
Signed-off-by: Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
---
net/openvswitch/datapath.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 36f8872..ffb563c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -464,6 +464,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
}
nla->nla_len = nla_attr_size(skb->len);
+ if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) {
+ err = -ENOMEM;
+ skb_tx_error(skb);
+ goto out;
+ }
+
skb_zerocopy(user_skb, skb, skb->len, hlen);
/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall 2014-02-28 19:16 [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall Zoltan Kiss @ 2014-03-06 17:09 ` Zoltan Kiss 2014-03-07 4:46 ` Pravin Shelar [not found] ` <1393615016-9187-1-git-send-email-zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 15+ messages in thread From: Zoltan Kiss @ 2014-03-06 17:09 UTC (permalink / raw) To: Jesse Gross, pshelar, tgraf, dev; +Cc: xen-devel, netdev, linux-kernel, kvm Do you have any feedback on this? I'm also adding KVM list as they might be interested in this. Zoli On 28/02/14 19:16, Zoltan Kiss wrote: > The kernel datapath now switched to zerocopy Netlink messages, but that also > means that the pages on frags array are sent straight to userspace. If those > pages came outside the kernel, we have to swap them out with local copies. > > Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> > --- > net/openvswitch/datapath.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c > index 36f8872..ffb563c 100644 > --- a/net/openvswitch/datapath.c > +++ b/net/openvswitch/datapath.c > @@ -464,6 +464,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, > } > nla->nla_len = nla_attr_size(skb->len); > > + if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) { > + err = -ENOMEM; > + skb_tx_error(skb); > + goto out; > + } > + > skb_zerocopy(user_skb, skb, skb->len, hlen); > > /* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */ > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall 2014-03-06 17:09 ` Zoltan Kiss @ 2014-03-07 4:46 ` Pravin Shelar 2014-03-07 12:29 ` Zoltan Kiss [not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 2 replies; 15+ messages in thread From: Pravin Shelar @ 2014-03-07 4:46 UTC (permalink / raw) To: Zoltan Kiss Cc: Jesse Gross, Thomas Graf, dev@openvswitch.org, xen-devel, netdev, LKML, kvm On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote: > Do you have any feedback on this? I'm also adding KVM list as they might be > interested in this. > > Zoli > > > On 28/02/14 19:16, Zoltan Kiss wrote: >> >> The kernel datapath now switched to zerocopy Netlink messages, but that >> also >> means that the pages on frags array are sent straight to userspace. If >> those >> pages came outside the kernel, we have to swap them out with local copies. >> >> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> I do not think this is required, netlink zero copy only maps pre-allocated buffers to user-space. But I found bug in datapath user-space queue code. I am not sure how this can work with skb fragments and MMAP-netlink socket. Here is what happens, OVS allocates netlink skb and adds fragments to skb using skb_zero_copy(), then calls genlmsg_unicast(). But if netlink sock is mmped then netlink-send queues netlink allocated skb->head (linear data of skb) and ignore skb frags. Currently this is not problem with OVS vswitchd since it does not use netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, it can break it. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall 2014-03-07 4:46 ` Pravin Shelar @ 2014-03-07 12:29 ` Zoltan Kiss [not found] ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> [not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 15+ messages in thread From: Zoltan Kiss @ 2014-03-07 12:29 UTC (permalink / raw) To: Pravin Shelar Cc: Jesse Gross, Thomas Graf, dev@openvswitch.org, xen-devel, netdev, LKML, kvm On 07/03/14 04:46, Pravin Shelar wrote: > On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote: >> Do you have any feedback on this? I'm also adding KVM list as they might be >> interested in this. >> >> Zoli >> >> >> On 28/02/14 19:16, Zoltan Kiss wrote: >>> >>> The kernel datapath now switched to zerocopy Netlink messages, but that >>> also >>> means that the pages on frags array are sent straight to userspace. If >>> those >>> pages came outside the kernel, we have to swap them out with local copies. >>> >>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> > > I do not think this is required, netlink zero copy only maps > pre-allocated buffers to user-space. How do you mean "pre-allocated"? By who? As far as I've seen the skb in this function came straight from the device (vif in our case), and skb_zerocopy just copy the frags to user_skb, which is sent to the userspace. Those frags contain pages from guest, and it's a bad idea to pass them to userspace: e.g if userspace dies in the meantime, what happens with them? Also, in Xen's case they are actually not mapped to userspace, so accessing them can lead to garbage. Zoli ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 17:38 ` Pravin Shelar 0 siblings, 0 replies; 15+ messages in thread From: Pravin Shelar @ 2014-03-07 17:38 UTC (permalink / raw) To: Zoltan Kiss Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 4:29 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> wrote: > On 07/03/14 04:46, Pravin Shelar wrote: >> >> On Thu, Mar 6, 2014 at 9:09 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> >> wrote: >>> >>> Do you have any feedback on this? I'm also adding KVM list as they might >>> be >>> interested in this. >>> >>> Zoli >>> >>> >>> On 28/02/14 19:16, Zoltan Kiss wrote: >>>> >>>> >>>> The kernel datapath now switched to zerocopy Netlink messages, but that >>>> also >>>> means that the pages on frags array are sent straight to userspace. If >>>> those >>>> pages came outside the kernel, we have to swap them out with local >>>> copies. >>>> >>>> Signed-off-by: Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> >> >> >> I do not think this is required, netlink zero copy only maps >> pre-allocated buffers to user-space. > > How do you mean "pre-allocated"? By who? > by netlink, ref. netlink_alloc_skb(). > As far as I've seen the skb in this function came straight from the device > (vif in our case), and skb_zerocopy just copy the frags to user_skb, which > is sent to the userspace. Those frags contain pages from guest, and it's a > bad idea to pass them to userspace: e.g if userspace dies in the meantime, > what happens with them? Also, in Xen's case they are actually not mapped to > userspace, so accessing them can lead to garbage. > AFAIK Netlink does not handle skb-frags for zero-copy case. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-03-07 15:58 ` Thomas Graf [not found] ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Thomas Graf @ 2014-03-07 15:58 UTC (permalink / raw) To: Pravin Shelar, Zoltan Kiss Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On 03/07/2014 05:46 AM, Pravin Shelar wrote: > But I found bug in datapath user-space queue code. I am not sure how > this can work with skb fragments and MMAP-netlink socket. > Here is what happens, OVS allocates netlink skb and adds fragments to > skb using skb_zero_copy(), then calls genlmsg_unicast(). > But if netlink sock is mmped then netlink-send queues netlink > allocated skb->head (linear data of skb) and ignore skb frags. > > Currently this is not problem with OVS vswitchd since it does not use > netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, > it can break it. The secret is out ;-) I was very surprised too when I noticed that it worked. It's not just OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is setup with a giant tailroom in netlink_ring_setup_skb(): skb->end = skb->tail + size; and skb_zerocopy() will consume whatever tailroom is available first: /* dont bother with small payloads */ if (len <= skb_tailroom(to)) { skb_copy_bits(from, 0, skb_put(to, len), len); return; } I was planning to fix this while adding GSO support to the upcall as that is the moment when this bug would really surface. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 17:19 ` Pravin Shelar [not found] ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Pravin Shelar @ 2014-03-07 17:19 UTC (permalink / raw) To: Thomas Graf Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 03/07/2014 05:46 AM, Pravin Shelar wrote: >> >> But I found bug in datapath user-space queue code. I am not sure how >> this can work with skb fragments and MMAP-netlink socket. >> Here is what happens, OVS allocates netlink skb and adds fragments to >> skb using skb_zero_copy(), then calls genlmsg_unicast(). >> But if netlink sock is mmped then netlink-send queues netlink >> allocated skb->head (linear data of skb) and ignore skb frags. >> >> Currently this is not problem with OVS vswitchd since it does not use >> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, >> it can break it. > > > The secret is out ;-) > > I was very surprised too when I noticed that it worked. It's not just > OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is > setup with a giant tailroom in netlink_ring_setup_skb(): > > skb->end = skb->tail + size; > For OVS use-case, the size is linear part of skb. so I think for mmap-netlink socket it will fail. > and skb_zerocopy() will consume whatever tailroom is available first: > > /* dont bother with small payloads */ > if (len <= skb_tailroom(to)) { > skb_copy_bits(from, 0, skb_put(to, len), len); > return; > } > > I was planning to fix this while adding GSO support to the upcall as > that is the moment when this bug would really surface. ok. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-03-07 18:05 ` Thomas Graf [not found] ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Thomas Graf @ 2014-03-07 18:05 UTC (permalink / raw) To: Pravin Shelar Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On 03/07/2014 06:19 PM, Pravin Shelar wrote: > On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >> On 03/07/2014 05:46 AM, Pravin Shelar wrote: >>> >>> But I found bug in datapath user-space queue code. I am not sure how >>> this can work with skb fragments and MMAP-netlink socket. >>> Here is what happens, OVS allocates netlink skb and adds fragments to >>> skb using skb_zero_copy(), then calls genlmsg_unicast(). >>> But if netlink sock is mmped then netlink-send queues netlink >>> allocated skb->head (linear data of skb) and ignore skb frags. >>> >>> Currently this is not problem with OVS vswitchd since it does not use >>> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, >>> it can break it. >> >> >> The secret is out ;-) >> >> I was very surprised too when I noticed that it worked. It's not just >> OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is >> setup with a giant tailroom in netlink_ring_setup_skb(): >> >> skb->end = skb->tail + size; >> > For OVS use-case, the size is linear part of skb. so I think for > mmap-netlink socket it will fail. Could you rephrase? I'm not sure I understand correctly. The tailroom size equals to the configured frame payload size of the ring buffer. So as long as the frame size chosen is large enough to hold whatever pieces comes out of skb_gso_segment() we are fine. That said, I agree that we should fix this properly before we enable mmap on the OVS user space side. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 18:43 ` Pravin Shelar 0 siblings, 0 replies; 15+ messages in thread From: Pravin Shelar @ 2014-03-07 18:43 UTC (permalink / raw) To: Thomas Graf Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 10:05 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 03/07/2014 06:19 PM, Pravin Shelar wrote: >> >> On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >>> >>> On 03/07/2014 05:46 AM, Pravin Shelar wrote: >>>> >>>> >>>> But I found bug in datapath user-space queue code. I am not sure how >>>> this can work with skb fragments and MMAP-netlink socket. >>>> Here is what happens, OVS allocates netlink skb and adds fragments to >>>> skb using skb_zero_copy(), then calls genlmsg_unicast(). >>>> But if netlink sock is mmped then netlink-send queues netlink >>>> allocated skb->head (linear data of skb) and ignore skb frags. >>>> >>>> Currently this is not problem with OVS vswitchd since it does not use >>>> netlink MMAP sockets. But if vswitchd stats using MMAP-netlink socket, >>>> it can break it. >>> >>> >>> >>> The secret is out ;-) >>> >>> I was very surprised too when I noticed that it worked. It's not just >>> OVS, it's nfqueue as well. The reason is that an netlink mmaped skb is >>> setup with a giant tailroom in netlink_ring_setup_skb(): >>> >>> skb->end = skb->tail + size; >>> >> For OVS use-case, the size is linear part of skb. so I think for >> mmap-netlink socket it will fail. > > > Could you rephrase? I'm not sure I understand correctly. > I meant OVS (queue_userspace_packet) passes skb_zerocopy_headlen() to genlmsg_new_unicast(). > The tailroom size equals to the configured frame payload size of > the ring buffer. So as long as the frame size chosen is large > enough to hold whatever pieces comes out of skb_gso_segment() we are > fine. That said, I agree that we should fix this properly before we > enable mmap on the OVS user space side. I see. So this work assuming that userspace sets nl_mmap_req.nm_frame_size large enough. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <1393615016-9187-1-git-send-email-zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <1393615016-9187-1-git-send-email-zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 16:23 ` Thomas Graf [not found] ` <5319F272.1070101-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Thomas Graf @ 2014-03-07 16:23 UTC (permalink / raw) To: Zoltan Kiss, Jesse Gross, pshelar-l0M0P4e3n4LQT0dZR+AlfA, dev-yBygre7rU0TnMu66kgdUjQ Cc: xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b, linux-kernel-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA On 02/28/2014 08:16 PM, Zoltan Kiss wrote: > The kernel datapath now switched to zerocopy Netlink messages, but that also > means that the pages on frags array are sent straight to userspace. If those > pages came outside the kernel, we have to swap them out with local copies. > > Signed-off-by: Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> > --- > net/openvswitch/datapath.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c > index 36f8872..ffb563c 100644 > --- a/net/openvswitch/datapath.c > +++ b/net/openvswitch/datapath.c > @@ -464,6 +464,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, > } > nla->nla_len = nla_attr_size(skb->len); > > + if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) { > + err = -ENOMEM; > + skb_tx_error(skb); > + goto out; > + } > + > skb_zerocopy(user_skb, skb, skb->len, hlen); > > /* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */ How about we move the call to skb_orphan_frags() into skb_zerocopy() itself and call it before we actually reference the frags? It's not just OVS affected but nfqueue as well. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <5319F272.1070101-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <5319F272.1070101-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 17:28 ` Pravin Shelar [not found] ` <CALnjE+q=fejHPsjVj9+jnypJCUHfCDFc553U48WR48Tjcf3FZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Pravin Shelar @ 2014-03-07 17:28 UTC (permalink / raw) To: Thomas Graf Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 8:23 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 02/28/2014 08:16 PM, Zoltan Kiss wrote: >> >> The kernel datapath now switched to zerocopy Netlink messages, but that >> also >> means that the pages on frags array are sent straight to userspace. If >> those >> pages came outside the kernel, we have to swap them out with local copies. >> >> Signed-off-by: Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> >> --- >> net/openvswitch/datapath.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c >> index 36f8872..ffb563c 100644 >> --- a/net/openvswitch/datapath.c >> +++ b/net/openvswitch/datapath.c >> @@ -464,6 +464,12 @@ static int queue_userspace_packet(struct datapath >> *dp, struct sk_buff *skb, >> } >> nla->nla_len = nla_attr_size(skb->len); >> >> + if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) { >> + err = -ENOMEM; >> + skb_tx_error(skb); >> + goto out; >> + } >> + >> skb_zerocopy(user_skb, skb, skb->len, hlen); >> >> /* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */ > > > How about we move the call to skb_orphan_frags() into skb_zerocopy() > itself and call it before we actually reference the frags? > Problem is mapping SKBTX_DEV_ZEROCOPY pages to userspace. skb_zerocopy is not doing that. Unless I missing something, Current netlink code can not handle skb-frags with zero copy. So we have to copy skb anyways and no need to orphan-frags here. If you are planning on handling skb-frags without copying then skb_orphan_frags should be done in netlink. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <CALnjE+q=fejHPsjVj9+jnypJCUHfCDFc553U48WR48Tjcf3FZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <CALnjE+q=fejHPsjVj9+jnypJCUHfCDFc553U48WR48Tjcf3FZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-03-07 17:59 ` Thomas Graf [not found] ` <531A0911.4040304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Thomas Graf @ 2014-03-07 17:59 UTC (permalink / raw) To: Pravin Shelar Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On 03/07/2014 06:28 PM, Pravin Shelar wrote: > Problem is mapping SKBTX_DEV_ZEROCOPY pages to userspace. skb_zerocopy > is not doing that. > > Unless I missing something, Current netlink code can not handle > skb-frags with zero copy. So we have to copy skb anyways and no need > to orphan-frags here. > If you are planning on handling skb-frags without copying then > skb_orphan_frags should be done in netlink. If you look at the second part of skb_zerocopy() this is exactly what it is doing unless the target skb has sufficient linear space preallocated. At least unless mmap is enabled in which case we would have to copy again until we have implemented a way to pass page refs via the nl ring buffer. So I think Zoltan is correct in orphaning frags that come from f.e. a tun device via zerocopy_sg_from_iovec(). ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <531A0911.4040304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <531A0911.4040304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2014-03-07 18:41 ` Pravin Shelar 2014-03-11 19:41 ` Zoltan Kiss 1 sibling, 0 replies; 15+ messages in thread From: Pravin Shelar @ 2014-03-07 18:41 UTC (permalink / raw) To: Thomas Graf Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, netdev, LKML, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b On Fri, Mar 7, 2014 at 9:59 AM, Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 03/07/2014 06:28 PM, Pravin Shelar wrote: >> >> Problem is mapping SKBTX_DEV_ZEROCOPY pages to userspace. skb_zerocopy >> is not doing that. >> >> Unless I missing something, Current netlink code can not handle >> skb-frags with zero copy. So we have to copy skb anyways and no need >> to orphan-frags here. >> If you are planning on handling skb-frags without copying then >> skb_orphan_frags should be done in netlink. > > > If you look at the second part of skb_zerocopy() this is exactly what > it is doing unless the target skb has sufficient linear space > preallocated. At least unless mmap is enabled in which case we would > have to copy again until we have implemented a way to pass page refs > via the nl ring buffer. > What is problem with passing page ref with get_page() to new skb as long as it is accessed kernel? I think we should set SKBTX_DEV_ZEROCOPY flag for new skb in skb_zerocopy() to track such pages but no need call skb_orphan_frags(). > So I think Zoltan is correct in orphaning frags that come from f.e. > a tun device via zerocopy_sg_from_iovec(). ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <531A0911.4040304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2014-03-07 18:41 ` Pravin Shelar @ 2014-03-11 19:41 ` Zoltan Kiss [not found] ` <531F66D0.1050000-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 15+ messages in thread From: Zoltan Kiss @ 2014-03-11 19:41 UTC (permalink / raw) To: Thomas Graf, Pravin Shelar Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b, LKML, netdev On 07/03/14 17:59, Thomas Graf wrote: > On 03/07/2014 06:28 PM, Pravin Shelar wrote: >> Problem is mapping SKBTX_DEV_ZEROCOPY pages to userspace. skb_zerocopy >> is not doing that. >> >> Unless I missing something, Current netlink code can not handle >> skb-frags with zero copy. So we have to copy skb anyways and no need >> to orphan-frags here. >> If you are planning on handling skb-frags without copying then >> skb_orphan_frags should be done in netlink. > > If you look at the second part of skb_zerocopy() this is exactly what > it is doing unless the target skb has sufficient linear space > preallocated. At least unless mmap is enabled in which case we would > have to copy again until we have implemented a way to pass page refs > via the nl ring buffer. > > So I think Zoltan is correct in orphaning frags that come from f.e. > a tun device via zerocopy_sg_from_iovec(). Now as I'm checking how Netlink works, I might be wrong at some parts :) skb_zerocopy correctly add the frags to the user_skb we are sending upwards, however when the userspace receive it in netlink_recvmsg(), it gets copied to the supplied buffer anyway. Is that correct? In which case we don't need to worry that userspace will sit on that page indefinitely. However we have to worry about userspace not calling recv on that Netlink socket, so in the end we still need skb_orphan_frags, just for a different reason :) We can put skb_orphan_frags into skb_zerocopy, skb_clone also do that. However with Netlink mmapped IO, we should take a different approach, and instead of calling skb_orphan_frags we should make sure user_skb can hold any skb we get from the kernel, and copy the frags there. Even if we would be able to pass page refs to userspace through the ring buffer (AFAIK currently we can't), it would be fragile to just pass kernel pages directly to userspace, even if they came without the SKBTX_DEV_ZEROCOPY flag. And I think it would be quite rare that we need that copy anyway, because the flow setup usually happens with small packets without frags. If we choose the above approach with Netlink mmap, we don't need skb_orphan_frags, in fact Regards, Zoli ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <531F66D0.1050000-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall [not found] ` <531F66D0.1050000-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> @ 2014-03-14 22:26 ` Zoltan Kiss 0 siblings, 0 replies; 15+ messages in thread From: Zoltan Kiss @ 2014-03-14 22:26 UTC (permalink / raw) To: Thomas Graf, Pravin Shelar Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b, LKML, netdev On 11/03/14 19:41, Zoltan Kiss wrote: > On 07/03/14 17:59, Thomas Graf wrote: >> On 03/07/2014 06:28 PM, Pravin Shelar wrote: >>> Problem is mapping SKBTX_DEV_ZEROCOPY pages to userspace. skb_zerocopy >>> is not doing that. >>> >>> Unless I missing something, Current netlink code can not handle >>> skb-frags with zero copy. So we have to copy skb anyways and no need >>> to orphan-frags here. >>> If you are planning on handling skb-frags without copying then >>> skb_orphan_frags should be done in netlink. >> >> If you look at the second part of skb_zerocopy() this is exactly what >> it is doing unless the target skb has sufficient linear space >> preallocated. At least unless mmap is enabled in which case we would >> have to copy again until we have implemented a way to pass page refs >> via the nl ring buffer. >> >> So I think Zoltan is correct in orphaning frags that come from f.e. >> a tun device via zerocopy_sg_from_iovec(). > > Now as I'm checking how Netlink works, I might be wrong at some parts :) > skb_zerocopy correctly add the frags to the user_skb we are sending > upwards, however when the userspace receive it in netlink_recvmsg(), it > gets copied to the supplied buffer anyway. Is that correct? In which > case we don't need to worry that userspace will sit on that page > indefinitely. However we have to worry about userspace not calling recv > on that Netlink socket, so in the end we still need skb_orphan_frags, > just for a different reason :) > We can put skb_orphan_frags into skb_zerocopy, skb_clone also do that. > > However with Netlink mmapped IO, we should take a different approach, > and instead of calling skb_orphan_frags we should make sure user_skb can > hold any skb we get from the kernel, and copy the frags there. Even if > we would be able to pass page refs to userspace through the ring buffer > (AFAIK currently we can't), it would be fragile to just pass kernel > pages directly to userspace, even if they came without the > SKBTX_DEV_ZEROCOPY flag. And I think it would be quite rare that we need > that copy anyway, because the flow setup usually happens with small > packets without frags. > If we choose the above approach with Netlink mmap, we don't need > skb_orphan_frags, in fact I spent some time to think about this mmaped scenaria, and discussed it with others: the conclusion is that it shouldn't be a big problem to expose local kernel pages through the frags array as I thought before. So OVS can get along with passing refs to those pages in the shared ring. However skb_orphan_frags would be still necessary in skb_zerocopy, for the same reason as now. Should I post a new patch which does calls orphan_frags in zerocopy? Or do you have any other opinion? Zoli ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-03-14 22:26 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-28 19:16 [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall Zoltan Kiss
2014-03-06 17:09 ` Zoltan Kiss
2014-03-07 4:46 ` Pravin Shelar
2014-03-07 12:29 ` Zoltan Kiss
[not found] ` <5319BBAE.7030109-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-03-07 17:38 ` Pravin Shelar
[not found] ` <CALnjE+rWc=n_F+1jSLQtPrgKSvvxONEkkYxWEHon2_KVNG9z3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-07 15:58 ` Thomas Graf
[not found] ` <5319EC8E.2010606-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 17:19 ` Pravin Shelar
[not found] ` <CALnjE+oDM=ga_C6T_-9i2UNwv=K4g-+y-LJA04nh+=WmoeuNXw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-07 18:05 ` Thomas Graf
[not found] ` <531A0A5B.2000104-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 18:43 ` Pravin Shelar
[not found] ` <1393615016-9187-1-git-send-email-zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-03-07 16:23 ` Thomas Graf
[not found] ` <5319F272.1070101-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 17:28 ` Pravin Shelar
[not found] ` <CALnjE+q=fejHPsjVj9+jnypJCUHfCDFc553U48WR48Tjcf3FZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-03-07 17:59 ` Thomas Graf
[not found] ` <531A0911.4040304-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-07 18:41 ` Pravin Shelar
2014-03-11 19:41 ` Zoltan Kiss
[not found] ` <531F66D0.1050000-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-03-14 22:26 ` Zoltan Kiss
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).