From: Zoltan Kiss <zoltan.kiss@citrix.com>
To: Thomas Graf <tgraf@redhat.com>, Pravin Shelar <pshelar@nicira.com>
Cc: "dev@openvswitch.org" <dev@openvswitch.org>,
<xen-devel@lists.xenproject.org>,
LKML <linux-kernel@vger.kernel.org>,
netdev <netdev@vger.kernel.org>
Subject: Re: [ovs-dev] [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall
Date: Fri, 14 Mar 2014 22:26:36 +0000
Message-ID: <5323821C.70001@citrix.com>
In-Reply-To: <531F66D0.1050000@citrix.com>
On 11/03/14 19:41, Zoltan Kiss wrote:
> On 07/03/14 17:59, Thomas Graf wrote:
>> On 03/07/2014 06:28 PM, Pravin Shelar wrote:
>>> Problem is mapping SKBTX_DEV_ZEROCOPY pages to userspace. skb_zerocopy
>>> is not doing that.
>>>
>>> Unless I missing something, Current netlink code can not handle
>>> skb-frags with zero copy. So we have to copy skb anyways and no need
>>> to orphan-frags here.
>>> If you are planning on handling skb-frags without copying then
>>> skb_orphan_frags should be done in netlink.
>>
>> If you look at the second part of skb_zerocopy() this is exactly what
>> it is doing unless the target skb has sufficient linear space
>> preallocated. At least unless mmap is enabled in which case we would
>> have to copy again until we have implemented a way to pass page refs
>> via the nl ring buffer.
>>
>> So I think Zoltan is correct in orphaning frags that come from f.e.
>> a tun device via zerocopy_sg_from_iovec().
>
> Now as I'm checking how Netlink works, I might be wrong at some parts :)
> skb_zerocopy correctly add the frags to the user_skb we are sending
> upwards, however when the userspace receive it in netlink_recvmsg(), it
> gets copied to the supplied buffer anyway. Is that correct? In which
> case we don't need to worry that userspace will sit on that page
> indefinitely. However we have to worry about userspace not calling recv
> on that Netlink socket, so in the end we still need skb_orphan_frags,
> just for a different reason :)
> We can put skb_orphan_frags into skb_zerocopy, skb_clone also do that.
>
> However with Netlink mmapped IO, we should take a different approach,
> and instead of calling skb_orphan_frags we should make sure user_skb can
> hold any skb we get from the kernel, and copy the frags there. Even if
> we would be able to pass page refs to userspace through the ring buffer
> (AFAIK currently we can't), it would be fragile to just pass kernel
> pages directly to userspace, even if they came without the
> SKBTX_DEV_ZEROCOPY flag. And I think it would be quite rare that we need
> that copy anyway, because the flow setup usually happens with small
> packets without frags.
> If we choose the above approach with Netlink mmap, we don't need
> skb_orphan_frags, in fact
I spent some time thinking about this mmapped scenario, and discussed it
with others: the conclusion is that exposing local kernel pages through
the frags array shouldn't be as big a problem as I thought before.
So OVS can get along with passing refs to those pages in the shared
ring. However, skb_orphan_frags would still be necessary in skb_zerocopy,
for the same reason as now.
Should I post a new patch which calls skb_orphan_frags in skb_zerocopy? Or
do you have any other opinion?
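To make the idea concrete, here is a rough userspace model of it. This is
not kernel code: every toy_* name is made up for the illustration, the
dev_zerocopy field only stands in for SKBTX_DEV_ZEROCOPY, and reference
counting is left out. It shows the ordering I have in mind: copy the
guest-owned frags into private memory and notify the owner first, and only
then share the frags with the skb that goes up to userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FRAGS 4

/* One page fragment; `foreign` marks memory lent by a guest (zerocopy). */
struct toy_frag {
    unsigned char *data;
    size_t len;
    bool foreign;
};

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
    struct toy_frag frags[TOY_MAX_FRAGS];
    int nr_frags;
    bool dev_zerocopy;           /* stands in for SKBTX_DEV_ZEROCOPY */
    void (*zc_done)(void *arg);  /* owner's "pages released" callback */
    void *zc_arg;
};

/* Example callback: records that the lender got its pages back. */
static void toy_mark_released(void *arg)
{
    *(bool *)arg = true;
}

/*
 * Replace every frag with a private copy, then notify the owner.
 * Returns 0 on success, -1 on allocation failure (skb left unchanged).
 */
static int toy_orphan_frags(struct toy_skb *skb)
{
    unsigned char *copies[TOY_MAX_FRAGS] = {0};
    int i;

    if (!skb->dev_zerocopy)
        return 0;

    /* Allocate and copy everything first so a failure changes nothing. */
    for (i = 0; i < skb->nr_frags; i++) {
        size_t len = skb->frags[i].len;

        copies[i] = malloc(len ? len : 1);
        if (!copies[i]) {
            while (i--)
                free(copies[i]);
            return -1;
        }
        memcpy(copies[i], skb->frags[i].data, len);
    }

    for (i = 0; i < skb->nr_frags; i++) {
        skb->frags[i].data = copies[i];
        skb->frags[i].foreign = false;
    }
    skb->dev_zerocopy = false;
    if (skb->zc_done)
        skb->zc_done(skb->zc_arg);
    return 0;
}

/*
 * Toy counterpart of the proposal: orphan `from` before sharing its
 * frags with `to`, so `to` never pins guest memory while it is queued.
 */
static int toy_zerocopy(struct toy_skb *to, struct toy_skb *from)
{
    int i;

    if (toy_orphan_frags(from) < 0)
        return -1;

    for (i = 0; i < from->nr_frags; i++)
        to->frags[i] = from->frags[i];
    to->nr_frags = from->nr_frags;
    to->dev_zerocopy = false;
    return 0;
}
```

With this ordering the lender's callback has already run by the time
toy_zerocopy() returns, so it no longer matters how long userspace waits
before calling recv on the socket.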
Zoli