Re: TUN problems (regression?)

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jason Wang <jasowang@redhat.com>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>,
	Paul Moore <pmoore@redhat.com>,
	netdev@vger.kernel.org
Subject: Re: TUN problems (regression?)
Date: Fri, 04 Jan 2013 13:04:21 +0800	[thread overview]
Message-ID: <50E662D5.8010007@redhat.com> (raw)
In-Reply-To: <20121227222513.394d8234@nehalam.linuxnetplumber.net>

On 12/28/2012 02:25 PM, Stephen Hemminger wrote:
> On Fri, 28 Dec 2012 13:43:54 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 12/28/2012 08:41 AM, Stephen Hemminger wrote:
>>> On Fri, 21 Dec 2012 12:26:56 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>
>>>> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
>>>>> On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
>>>>>> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
>>>>>>> On Thu, 20 Dec 2012 15:38:17 -0800
>>>>>>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>>>>>>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>>>>>>>
>>>>>>>>> Hi Jason,
>>>>>>>>>
>>>>>>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>>>>>>>> into a problem when running tunctl: running it once w/o arguments works as 
>>>>>>>>> expected, but running it a second time results in failure and a 
>>>>>>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>>>>>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>>>>>>>
>>>>>>>>> Have you seen this before?
>>>>>>>>>
>>>>>>>> Obviously code in tun_flow_init() is wrong...
>>>>>>>>
>>>>>>>> static int tun_flow_init(struct tun_struct *tun)
>>>>>>>> {
>>>>>>>>         int i;
>>>>>>>>
>>>>>>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>>>>>>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>>>>>>>                                             NULL);
>>>>>>>>         if (!tun->flow_cache)
>>>>>>>>                 return -ENOMEM;
>>>>>>>> ...
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> I have no idea why we would need a kmem_cache per tun_struct,
>>>>>>>> and why we even need a kmem_cache.
>>>>>>> Normally flow malloc/free should be good enough.
>>>>>>> It might make sense to use private kmem_cache if doing hlist_nulls.
>>>>>>>
>>>>>>>
>>>>>>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
>>>>>> Should be at least a global cache, I thought I can get some speed-up by
>>>>>> using kmem_cache.
>>>>>>
>>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>> Was it with SLUB or SLAB ?
>>>>>
>>>>> Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
>>>>> bytes per object, as we guarantee each object is on a single cache line.
>>>>>
>>>>>
>>>> Right, thanks for the explanation.
>>>>
>>> I wonder if TUN would be better if it used a array to translate
>>> receive hash to receive queue. This is how real hardware works with the
>>> indirection table, and it would allow RFS acceleration. The current flow
>>> cache stuff is prone to DoS attack and scaling problems with lots of
>>> short lived flows.
>> The problem of indirection table is hash collision which may even happen
>> when few flows existed.
> Hash collision is fine, as long as the the statistical average of
> hash across queue's is approximately equal it will be faster. A simple
> array indirection is much faster than walking a hash table.

True, but hash collision may cause some negative effects such as losing
the flow affinity and packet re-ordering in guest which does not exist
in a perfect filter. Maybe we can implement them both and let user to
choose.
>
>> For the RFS, we can open a API/ioctl for userspace to add or remove a
>> flow cache.
> RFS acceleration relies on programming the table. It is easier if
> TUN looks more like hardware.
>
>> For the DoS/scaling issue, I have an idea of:
>> - limit the total number of flow entries in tun/tap
>> - only update the flow entry every N (say 20 like ixgbe) packets or the
>> the tcp packet has sync flag
>> - I'm not sure skb_get_rxhash() is lightweight enough, or change to more
>> lightweight one?
> Ideally the hash should be programmable L2 vs L3, but that is splitting
> hairs at this point.
>
> Flow tables are scaling problem, especially on highly loaded servers where
> they are most needed.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2013-01-04  5:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-20 23:16 TUN problems (regression?) Paul Moore
2012-12-20 23:38 ` Eric Dumazet
2012-12-20 23:50   ` Stephen Hemminger
2012-12-21  3:32     ` Jason Wang
2012-12-21  3:39       ` Eric Dumazet
2012-12-21  4:26         ` Jason Wang
2012-12-28  0:41           ` Stephen Hemminger
2012-12-28  5:43             ` Jason Wang
2012-12-28  6:25               ` Stephen Hemminger
2013-01-04  5:04                 ` Jason Wang [this message]
2012-12-21 21:15       ` David Miller
2012-12-21 16:27   ` Paul Moore
2012-12-21 17:17     ` [PATCH] tuntap: dont use a private kmem_cache Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50E662D5.8010007@redhat.com \
    --to=jasowang@redhat.com \
    --cc=erdnetdev@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pmoore@redhat.com \
    --cc=shemminger@vyatta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).