netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: <edumazet@google.com>
Cc: <aams@amazon.com>, <davem@davemloft.net>, <kuba@kernel.org>,
	<kuni1840@gmail.com>, <kuniyu@amazon.com>,
	<netdev@vger.kernel.org>, <pabeni@redhat.com>
Subject: Re: [PATCH v1 net-next 3/6] af_unix: Define a per-netns hash table.
Date: Fri, 17 Jun 2022 10:52:15 -0700	[thread overview]
Message-ID: <20220617175215.1769-1-kuniyu@amazon.com> (raw)
In-Reply-To: <CANn89iJGh452phb9Z9tsOTmesqCaX7LJdTWXbBPF1Ynz-AJMww@mail.gmail.com>

From:   Eric Dumazet <edumazet@google.com>
Date:   Fri, 17 Jun 2022 10:00:25 +0200
> On Fri, Jun 17, 2022 at 8:57 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From:   Eric Dumazet <edumazet@google.com>
> > Date:   Fri, 17 Jun 2022 08:08:32 +0200
> > > On Fri, Jun 17, 2022 at 7:34 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From:   Eric Dumazet <edumazet@google.com>
> > > > Date:   Fri, 17 Jun 2022 06:23:37 +0200
> > > > > On Fri, Jun 17, 2022 at 1:48 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > > >
> > > > > > This commit adds a per netns hash table for AF_UNIX.
> > > > > >
> > > > > > Note that its size is fixed as UNIX_HASH_SIZE for now.
> > > > > >
> > > > >
> > > > > Note: Please include memory costs for this table, including when LOCKDEP is on.
> > > >
> > > > I'm sorry but I'm not quite sure.
> > > > Do you mean the table should have the size as member of its struct?
> > > > Could you elaborate on memory costs and LOCKDEP?
> > >
> > > I am saying that instead of two separate arrays, you are now using one
> > > array, with holes in the structure
> > >
> > > Without LOCKDEP, sizeof(spinlock_t) is 4.
> > > With LOCKDEP, sizeof(spinlock_t) is bigger.
> > >
> > > So we are trading some costs of having two shared dense arrays, and
> > > having per-netns hash tables.
> > >
> > > It would be nice to mention this trade off in the changelog, because
> > > some hosts have thousands of netns and few af_unix sockets :/
> >
> > Thank you for explanation in detail!
> > I'm on the same page.
> > How about having separate arrays like this in per-netns struct?
> >
> > struct unix_table {
> >        spinlock_t *locks;
> >        list_head  *buckets;
> > }
> 
> Are we sure we need per-netns locks ?
> 
> I would think that sharing 256 spinlocks would be just fine, even on
> hosts with more than 256 cpus.

I ran the test written on the last patch with three kernels 10 times
for each:

  1) global locks and hash table
     1m 38s ~ 1m 43s

  2) per-netns locks and hash tables (two dense arrays version)
     11s

  3) global locks and per-netns hash tables
     15s

As you thought, the length of list has larger impact than lock contention.
But on a host with 10 cpus per-netns, per-netns locks are faster than
shared one.

What do you think about this trade-off?


> > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > > > ---
> > > > > >  include/net/af_unix.h    |  5 +++++
> > > > > >  include/net/netns/unix.h |  2 ++
> > > > > >  net/unix/af_unix.c       | 40 ++++++++++++++++++++++++++++++++++------
> > > > > >  3 files changed, 41 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> > > > > > index acb56e463db1..0a17e49af0c9 100644
> > > > > > --- a/include/net/af_unix.h
> > > > > > +++ b/include/net/af_unix.h
> > > > > > @@ -24,6 +24,11 @@ extern unsigned int unix_tot_inflight;
> > > > > >  extern spinlock_t unix_table_locks[UNIX_HASH_SIZE];
> > > > > >  extern struct hlist_head unix_socket_table[UNIX_HASH_SIZE];
> > > > > >
> > > > > > +struct unix_hashbucket {
> > > > > > +       spinlock_t              lock;
> > > > > > +       struct hlist_head       head;
> > > > > > +};
> > > > > > +
> > > > > >  struct unix_address {
> > > > > >         refcount_t      refcnt;
> > > > > >         int             len;
> > > > > > diff --git a/include/net/netns/unix.h b/include/net/netns/unix.h
> > > > > > index 91a3d7e39198..975c4e3f8a5b 100644
> > > > > > --- a/include/net/netns/unix.h
> > > > > > +++ b/include/net/netns/unix.h
> > > > > > @@ -5,8 +5,10 @@
> > > > > >  #ifndef __NETNS_UNIX_H__
> > > > > >  #define __NETNS_UNIX_H__
> > > > > >
> > > > > > +struct unix_hashbucket;
> > > > > >  struct ctl_table_header;
> > > > > >  struct netns_unix {
> > > > > > +       struct unix_hashbucket  *hash;
> > > > > >         int                     sysctl_max_dgram_qlen;
> > > > > >         struct ctl_table_header *ctl;
> > > > > >  };
> > > > > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > > > > > index c0804ae9c96a..3c07702e2349 100644
> > > > > > --- a/net/unix/af_unix.c
> > > > > > +++ b/net/unix/af_unix.c
> > > > > > @@ -3559,7 +3559,7 @@ static const struct net_proto_family unix_family_ops = {
> > > > > >
> > > > > >  static int __net_init unix_net_init(struct net *net)
> > > > > >  {
> > > > > > -       int error = -ENOMEM;
> > > > > > +       int i;
> > > > > >
> > > > > >         net->unx.sysctl_max_dgram_qlen = 10;
> > > > > >         if (unix_sysctl_register(net))
> > > > > > @@ -3567,18 +3567,35 @@ static int __net_init unix_net_init(struct net *net)
> > > > > >
> > > > > >  #ifdef CONFIG_PROC_FS
> > > > > >         if (!proc_create_net("unix", 0, net->proc_net, &unix_seq_ops,
> > > > > > -                       sizeof(struct seq_net_private))) {
> > > > > > -               unix_sysctl_unregister(net);
> > > > > > -               goto out;
> > > > > > +                            sizeof(struct seq_net_private)))
> > > > > > +               goto err_sysctl;
> > > > > > +#endif
> > > > > > +
> > > > > > +       net->unx.hash = kmalloc(sizeof(struct unix_hashbucket) * UNIX_HASH_SIZE,
> > > > > > +                               GFP_KERNEL);
> > > > >
> > > > > This will fail under memory pressure.
> > > > >
> > > > > Prefer kvmalloc_array()
> > > >
> > > > Thank you for feedback!
> > > > I will use it in v2.
> > > >
> > > >
> > > > > > +       if (!net->unx.hash)
> > > > > > +               goto err_proc;
> > > > > > +
> > > > > > +       for (i = 0; i < UNIX_HASH_SIZE; i++) {
> > > > > > +               INIT_HLIST_HEAD(&net->unx.hash[i].head);
> > > > > > +               spin_lock_init(&net->unx.hash[i].lock);
> > > > > >         }
> > > > > > +
> > > > > > +       return 0;
> > > > > > +
> > > > > > +err_proc:
> > > > > > +#ifdef CONFIG_PROC_FS
> > > > > > +       remove_proc_entry("unix", net->proc_net);
> > > > > >  #endif
> > > > > > -       error = 0;
> > > > > > +err_sysctl:
> > > > > > +       unix_sysctl_unregister(net);
> > > > > >  out:
> > > > > > -       return error;
> > > > > > +       return -ENOMEM;
> > > > > >  }
> > > > > >
> > > > > >  static void __net_exit unix_net_exit(struct net *net)
> > > > > >  {
> > > > > > +       kfree(net->unx.hash);
> > > > >
> > > > > kvfree()
> > > > >
> > > > > >         unix_sysctl_unregister(net);
> > > > > >         remove_proc_entry("unix", net->proc_net);
> > > > > >  }
> > > > > > @@ -3666,6 +3683,16 @@ static int __init af_unix_init(void)
> > > > > >
> > > > > >         BUILD_BUG_ON(sizeof(struct unix_skb_parms) > sizeof_field(struct sk_buff, cb));
> > > > > >
> > > > > > +       init_net.unx.hash = kmalloc(sizeof(struct unix_hashbucket) * UNIX_HASH_SIZE,
> > > > > > +                                   GFP_KERNEL);
> > > > >
> > > > > Why are you allocating the hash table twice ? It should be done
> > > > > already in  unix_net_init() ?
> > > >
> > > > Ah sorry, just my mistake.
> > > > I'll remove this alloc/free part.
> > > >
> > > >
> > > > > > +       if (!init_net.unx.hash)
> > > > > > +               goto out;
> > > > > > +
> > > > > > +       for (i = 0; i < UNIX_HASH_SIZE; i++) {
> > > > > > +               INIT_HLIST_HEAD(&init_net.unx.hash[i].head);
> > > > > > +               spin_lock_init(&init_net.unx.hash[i].lock);
> > > > > > +       }
> > > > > > +
> > > > > >         for (i = 0; i < UNIX_HASH_SIZE; i++)
> > > > > >                 spin_lock_init(&unix_table_locks[i]);
> > > > > >
> > > > > > @@ -3699,6 +3726,7 @@ static void __exit af_unix_exit(void)
> > > > > >         proto_unregister(&unix_dgram_proto);
> > > > > >         proto_unregister(&unix_stream_proto);
> > > > > >         unregister_pernet_subsys(&unix_net_ops);
> > > > > > +       kfree(init_net.unx.hash);
> > > > >
> > > > >    Not needed.
> > > > >
> > > > > >  }
> > > > > >
> > > > > >  /* Earlier than device_initcall() so that other drivers invoking
> > > > > > --
> > > > > > 2.30.2
> > > > > >
> > > >
> > > >
> > > > Best regards,
> > > > Kuniyuki

  reply	other threads:[~2022-06-17 17:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-16 23:47 [PATCH v1 net-next 0/6] af_unix: Introduce per-netns socket hash table Kuniyuki Iwashima
2022-06-16 23:47 ` [PATCH v1 net-next 1/6] af_unix: Clean up some sock_net() uses Kuniyuki Iwashima
2022-06-16 23:47 ` [PATCH v1 net-next 2/6] af_unix: Include the whole hash table size in UNIX_HASH_SIZE Kuniyuki Iwashima
2022-06-16 23:47 ` [PATCH v1 net-next 3/6] af_unix: Define a per-netns hash table Kuniyuki Iwashima
2022-06-17  4:23   ` Eric Dumazet
2022-06-17  5:33     ` Kuniyuki Iwashima
2022-06-17  6:08       ` Eric Dumazet
2022-06-17  6:57         ` Kuniyuki Iwashima
2022-06-17  8:00           ` Eric Dumazet
2022-06-17 17:52             ` Kuniyuki Iwashima [this message]
2022-06-17 18:17   ` kernel test robot
2022-06-17 20:44     ` Kuniyuki Iwashima
2022-06-16 23:47 ` [PATCH v1 net-next 4/6] af_unix: Acquire/Release per-netns hash table's locks Kuniyuki Iwashima
2022-06-20  6:10   ` [af_unix] b4813d5914: WARNING:possible_recursive_locking_detected kernel test robot
2022-06-20 16:47     ` Kuniyuki Iwashima
2022-06-16 23:47 ` [PATCH v1 net-next 5/6] af_unix: Put a socket into a per-netns hash table Kuniyuki Iwashima
2022-06-16 23:47 ` [PATCH v1 net-next 6/6] af_unix: Remove unix_table_locks Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220617175215.1769-1-kuniyu@amazon.com \
    --to=kuniyu@amazon.com \
    --cc=aams@amazon.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).