From: Stephen Hemminger <shemminger@vyatta.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>,
	Paul Moore <pmoore@redhat.com>,
	netdev@vger.kernel.org
Subject: Re: TUN problems (regression?)
Date: Thu, 27 Dec 2012 22:25:13 -0800	[thread overview]
Message-ID: <20121227222513.394d8234@nehalam.linuxnetplumber.net> (raw)
In-Reply-To: <50DD319A.5000708@redhat.com>

On Fri, 28 Dec 2012 13:43:54 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 12/28/2012 08:41 AM, Stephen Hemminger wrote:
> > On Fri, 21 Dec 2012 12:26:56 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >
> >> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
> >>> On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
> >>>> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
> >>>>> On Thu, 20 Dec 2012 15:38:17 -0800
> >>>>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>>>
> >>>>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> >>>>>>> [CC'ing netdev in case this is a known problem I just missed ...]
> >>>>>>>
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
> >>>>>>> into a problem when running tunctl: running it once w/o arguments works as 
> >>>>>>> expected, but running it a second time results in failure and a 
> >>>>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> >>>>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
> >>>>>>>
> >>>>>>> Have you seen this before?
> >>>>>>>
> >>>>>> Obviously the code in tun_flow_init() is wrong...
> >>>>>>
> >>>>>> static int tun_flow_init(struct tun_struct *tun)
> >>>>>> {
> >>>>>>         int i;
> >>>>>>
> >>>>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
> >>>>>>                                             sizeof(struct tun_flow_entry), 0, 0,
> >>>>>>                                             NULL);
> >>>>>>         if (!tun->flow_cache)
> >>>>>>                 return -ENOMEM;
> >>>>>> ...
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> I have no idea why we would need a kmem_cache per tun_struct,
> >>>>>> or why we even need a kmem_cache at all.
> >>>>> Normally, per-flow malloc/free should be good enough.
> >>>>> It might make sense to use a private kmem_cache if using hlist_nulls.
> >>>>>
> >>>>>
> >>>>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> >>>> It should at least be a global cache; I thought I could get some
> >>>> speed-up by using kmem_cache.
> >>>>
> >>>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>> Was it with SLUB or SLAB ?
> >>>
> >>> Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
> >>> bytes per object, as we guarantee each object is on a single cache line.
> >>>
> >>>
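To make the cache-line point concrete, here is a small userspace sketch. It assumes the dedicated cache packs 48-byte objects back to back from a 64-byte-aligned slab start (flags of 0, so no SLAB_HWCACHE_ALIGN, and no debug padding), whereas kmalloc-64 hands out 64-byte slots that each start on a line boundary. `count_straddling()` is a hypothetical helper, not a kernel function.

```c
#define CACHE_LINE 64

/* Count how many of nr_objs objects, packed contiguously at obj_size
 * stride from a cache-line-aligned start, span more than one line. */
static int count_straddling(int obj_size, int nr_objs)
{
    int straddling = 0;

    for (int i = 0; i < nr_objs; i++) {
        int first = i * obj_size;          /* offset of first byte */
        int last = first + obj_size - 1;   /* offset of last byte */

        if (first / CACHE_LINE != last / CACHE_LINE)
            straddling++;
    }
    return straddling;
}
```

Under these assumptions a 4096-byte slab holds 85 objects of 48 bytes, and 42 of them cross a line boundary (the second and third object of every group of four), so touching such an entry can cost two cache misses; none of the 64 kmalloc-64 objects do.
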
> >> Right, thanks for the explanation.
> >>
> > I wonder if TUN would be better if it used an array to translate the
> > receive hash to a receive queue. This is how real hardware works with the
> > indirection table, and it would allow RFS acceleration. The current flow
> > cache stuff is prone to DoS attacks and to scaling problems with lots of
> > short-lived flows.
> 
> The problem with an indirection table is hash collision, which may happen
> even when only a few flows exist.

Hash collision is fine: as long as hashes are spread approximately evenly
across the queues on average, it will be faster. A simple array
indirection is much faster than walking a hash table.
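
A minimal userspace sketch of the indirection-table idea. The table size (128), the round-robin fill and the names `indir_table`, `indir_init()` and `indir_lookup()` are assumptions for illustration, loosely modeled on how NIC RSS tables are commonly filled; this is not existing TUN code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size; a power of two so the lookup is a mask. */
#define INDIR_SIZE 128u

struct indir_table {
    uint16_t queue[INDIR_SIZE];   /* hash bucket -> receive queue */
};

/* Spread buckets round-robin over nr_queues (must be > 0). */
static void indir_init(struct indir_table *t, uint16_t nr_queues)
{
    for (size_t i = 0; i < INDIR_SIZE; i++)
        t->queue[i] = (uint16_t)(i % nr_queues);
}

/* O(1) lookup: no chain walk and no per-flow state. Flows whose hashes
 * collide in a bucket simply share that bucket's queue. */
static inline uint16_t indir_lookup(const struct indir_table *t,
                                    uint32_t rxhash)
{
    return t->queue[rxhash & (INDIR_SIZE - 1)];
}
```

With 4 queues, hashes 0..1023 land exactly 256 times on each queue. Real receive hashes only give an approximately even split, which is the statistical argument above, and the per-packet cost stays a single masked array index regardless of how many flows exist.
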

> For RFS, we could open an API/ioctl for userspace to add or remove
> flow cache entries.

RFS acceleration relies on programming the table. It is easier if
TUN looks more like hardware.
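
Programming such a table is simple, which is why it fits accelerated RFS: in the kernel, the core calls a driver's `ndo_rx_flow_steer` hook so the hardware delivers a flow to the queue near the consuming CPU. Below is a hypothetical userspace analogue (`indir_steer()` is not a real API) that rewrites the bucket a flow's hash maps to. Unlike a NIC's exact-match flow filter, this coarser version also moves every other flow sharing that bucket.

```c
#include <stdint.h>

#define INDIR_SIZE 128u   /* hypothetical; power of two */

struct indir_table {
    uint16_t queue[INDIR_SIZE];   /* hash bucket -> receive queue */
};

/* Point the bucket this flow hashes into at 'queue' (e.g. the queue
 * whose reader runs on the CPU consuming the flow). Returns the old
 * queue so the caller could undo or account for the move. */
static uint16_t indir_steer(struct indir_table *t, uint32_t rxhash,
                            uint16_t queue)
{
    uint32_t bucket = rxhash & (INDIR_SIZE - 1);
    uint16_t old = t->queue[bucket];

    t->queue[bucket] = queue;
    return old;
}

static uint16_t indir_lookup(const struct indir_table *t, uint32_t rxhash)
{
    return t->queue[rxhash & (INDIR_SIZE - 1)];
}
```
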

> For the DoS/scaling issue, I have an idea of:
> - limit the total number of flow entries in tun/tap
> - only update the flow entry every N (say 20, like ixgbe) packets, or when
> the TCP packet has the SYN flag set
> - I'm not sure skb_get_rxhash() is lightweight enough; should we change to
> a more lightweight one?
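
The first two points could look roughly like this userspace sketch: a preallocated pool caps the number of entries, and an existing entry's queue is rewritten only on a SYN or every `UPDATE_EVERY` packets. All names and limits (`FLOW_MAX`, `FLOW_BUCKETS`, `flow_update()`) are hypothetical, and there is no aging or eviction, which a real implementation would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_BUCKETS 64u     /* hypothetical; power of two */
#define FLOW_MAX     1024u   /* hypothetical cap on tracked flows */
#define UPDATE_EVERY 20u     /* refresh mapping every N packets */

struct flow_entry {
    uint32_t rxhash;
    uint16_t queue;
    uint32_t pkts;               /* packets since last queue update */
    struct flow_entry *next;
};

struct flow_table {
    struct flow_entry *bucket[FLOW_BUCKETS];
    struct flow_entry pool[FLOW_MAX];   /* fixed storage = hard limit */
    size_t nr_used;
};

static struct flow_entry *flow_find(struct flow_table *ft, uint32_t rxhash)
{
    struct flow_entry *e = ft->bucket[rxhash & (FLOW_BUCKETS - 1)];

    for (; e; e = e->next)
        if (e->rxhash == rxhash)
            return e;
    return NULL;
}

/* Returns false if the flow is new and the table is full. */
static bool flow_update(struct flow_table *ft, uint32_t rxhash,
                        uint16_t queue, bool tcp_syn)
{
    struct flow_entry *e = flow_find(ft, rxhash);

    if (!e) {
        if (ft->nr_used >= FLOW_MAX)
            return false;   /* a flood of new flows cannot grow the table */
        e = &ft->pool[ft->nr_used++];
        e->rxhash = rxhash;
        e->queue = queue;
        e->pkts = 0;
        size_t b = rxhash & (FLOW_BUCKETS - 1);
        e->next = ft->bucket[b];
        ft->bucket[b] = e;
        return true;
    }
    /* Rewrite only on SYN or every UPDATE_EVERY packets, so most
     * packets read the entry without dirtying it. */
    if (tcp_syn || ++e->pkts >= UPDATE_EVERY) {
        e->queue = queue;
        e->pkts = 0;
    }
    return true;
}
```
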

Ideally the hash should be programmable L2 vs L3, but that is splitting
hairs at this point.

Flow tables are a scaling problem, especially on highly loaded servers,
where they are most needed.
