public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Octavian Purdila <tavi@cs.pub.ro>,
	"David S . Miller" <davem@davemloft.net>
Subject: Re: [PATCH 00/69] faster tree-based sysctl implementation
Date: Sun, 01 May 2011 20:06:27 -0700	[thread overview]
Message-ID: <m1aaf5yf64.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <BANLkTinTMckhNYx1OiyH_YR+hr31naPQHw@mail.gmail.com> (Lucian Adrian Grijincu's message of "Mon, 2 May 2011 01:12:41 +0200")

Lucian Adrian Grijincu <lucian.grijincu@gmail.com> writes:

> On Mon, May 2, 2011 at 12:49 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>>> Since we are touching most if not all of the sysctl registrations this
>>>> would also be a good time to pass a path string instead of the weird
>>>> ctl_path data structure.  We needed ctl_path when we had both binary
>>>> and proc paths to worry about but we no longer have that concern.
>>>
>>>
>>> I still find good use for it in the next patches (some optimisations).
>>> Getting rid of it makes some things more difficult:
>>> - I wouldn't like to parse strings into path components at registeration
>>
>> I don't expect '/' being more difficult to deal with than an array.  In
>> general I expect a single string to be more space efficient and easier
>> for human comprehension.
>
>
> We also use the string from ctl_path as a name for the sysctl
> directory. We would need to either:
> * strdup part of the string for each directory, remember to kfree
> * replace '/' with '\0' in the given string (meaning it can't be put
> in a read-only zone)

If we are only registering leaves, we can just deal with the tail of the
path and point just past the final /. There should be no need to
duplicate anything.

> Also I make use of the ctl_path to add some optimisations that deal
> with the case when there are very many known-to-be-uniquely-named
> sub-directories like for /proc/sys/net/ipv4/conf/DEVICE. IXIACOM which
> sponsored this work has usecases where they need to create 10^3..10^5
> virtual network devices and these optimisations really add up for that
> many interfaces.

I am convinced the places where we have network devices in the path are
indeed the pain points for scaling.

My gut feel is that we should use a balanced binary tree instead of a
doubly linked list for the directories.  The space cost of a tree
is just an extra color member instead of two pointers.

For 100000 entries your target of a binary tree should only be 17
entries tall.  Maybe twice that for if the tree is an rbtree.  17 or
even 33 should be a small enough value log(N) to keep the cost from
being painful.  And using a binary tree means fewer special cases
overall.


A binary tree is faster than your special case for lookup.  Which means
it solves the case of actually using the sysctl entries as well as the
case for creating them.

Furthermore to we also need to change sysfs because it also has
directories that will contain all 100000 of the network devices,
and I don't expect simply skipping the duplicate check is going to
fly in sysfs.

We could do something besides a data structure without a logN
insert/remove/lookup cost complexity.  But I think we need numbers
to show that won't scale.  So far all we have are numbers that show
a linked list doesn't scale.

> For details about the optimisation see patches:
>  61/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133694 and
>  62/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133711
>
>
> I will make another function that would take a string, parse it up,
> create a ctl_path array and register it, but I'd really like to keep
> ctl_path both in the implementation and as a means to register a
> table.

Using a string path certainly isn't critical at this point.  But so
far I don't see practical down sides.

>>> - users of the register_sysctl_paths would need to create strings with
>>> dynamic components (for example "net/conf/%s/" - where %s is a
>>> netdevice-name or "kernel/sched_domain/%s/%s" with cpu-name and
>>> domain-name).
>>
>> This is a good point.
>>
>> In the normal proc implementation this is solved by being able to
>> pass the equivalent of a ctl_table_header into the registration
>> function, which allows the use of relative paths in the registration
>> function.
>>
>> In the examples you have given relative paths should also work for
>> sysctl.
>
>
> Hmm, I don't think we're on the same channel here. I don't understand
> what you're trying to say
> - normal proc implementation?
> - the equivalent of a ctl_table_header?
> - relative paths?

I was looking at effectively other virtual filesystems that have
had similar problems and talking about other solutions used.

In particular I was referring to create_proc_entry.  It takes
a path and an optional parent directory.

> I was saying that if we are to *replace* the ctl_path based mechanism
> with a string denoting the path, then some other registrants will need
> to allocate memory for those strings because the paths they register
> are computed at runtime. Then I gave two distinct examples where this
> is done. In both of those cases, ctl_path saves us from allocating a
> string before allocation, only to chop it then back to pieces in the
> __register function.

And I was saying if that string was treated as a relative path.  We
could have:

struct ctl_table_header *register_sysctl_path(struct ctl_table_header *parent,
						const char *path,
						struct ctl_table *table);

The optional parent parameter would save us from the pain of having to
even place the sysctl entry in a ctl_path.  __register_sysctl_paths
already has a very similar interface.

Eric



  reply	other threads:[~2011-05-02  3:06 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-01  1:35 [PATCH 00/69] faster tree-based sysctl implementation Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 01/69] sysctl: remove .child from dev/parport/default Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 02/69] sysctl: parport: reorder .child assignments to simplify review Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 03/69] sysctl: remove .child from dev/parport/PORT/devices/DEVICE Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 04/69] sysctl: remove .child from dev/parport/PORT/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 05/69] sysctl: remove .child from dev/parport/PORT/devices/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 06/69] sysctl: remove .child from kernel/vsyscall64 (x86) Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 07/69] sysctl: remove .child from abi/vsyscall32 (x86) Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 08/69] sysctl: remove .child from crypto/fips_enabled Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 09/69] sysctl: remove .child from dev/cdrom/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 10/69] sysctl: remove .child from dev/hpet/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 11/69] sysctl: remove .child from dev/ipmi/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 12/69] sysctl: remove .child from dev/rtc/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 13/69] sysctl: remove .child from dev/mac_hid/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 14/69] sysctl: remove .child from dev/raid/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 15/69] sysctl: remove .child from xpc/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 16/69] sysctl: remove .child from xpc/hb Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 17/69] sysctl: remove .child from kernel/sclp (s390) Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 18/69] sysctl: remove .child from dev/scsi Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 19/69] sysctl: remove .child from kernel/pty Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 20/69] sysctl: remove .child from coda/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 21/69] sysctl: remove .child from fscache/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 22/69] sysctl: remove .child from fs/nfs/ nlm_table table Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 23/69] sysctl: remove .child from fs/nfs/ nfs_cb_table Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 24/69] sysctl: remove .child from fs/ntfs-debug Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 25/69] sysctl: remove .child from fs/ocfs2/nm/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 26/69] sysctl: remove .child from fs/quota/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 27/69] sysctl: remove .child from fs/xfs/ Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 28/69] sysctl: remove .child from kernel/ (ipc) Lucian Adrian Grijincu
2011-05-01  1:35 ` [PATCH 29/69] sysctl: remove .child from fs/mqueue Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 30/69] sysctl: sched: add sd_table_template Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 31/69] sysctl: remove .child from kernel/sched_domain/cpuX/domainY/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 32/69] sysctl: remove .child from kernel/ (utsname) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 33/69] sysctl: remove .child from sunrpc/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 34/69] sysctl: remove .child from sunrpc/svc_rdma Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 35/69] sysctl: remove .child from sunrpc/ (xprtrdma) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 36/69] sysctl: remove .child from sunrpc/ (xprtsock) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 37/69] sysctl: remove .child from bus/isa/ (arm) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 38/69] sysctl: remove .child from reboot/warm (arm) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 39/69] sysctl: remove .child from lasat/ (mips) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 40/69] sysctl: remove .child from appldata/ (s390) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 41/69] sysctl: remove .child from s390dbf/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 42/69] sysctl: remove .child from vm/ (s390) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 43/69] sysctl: remove .child from kernel/perfmon/ (ia64) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 44/69] sysctl: remove .child from kernel/ (ia64/kdump) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 45/69] sysctl: remove .child from kernel/powersave-nap (powerpc) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 46/69] sysctl: remove .child from pm/ (frv) Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 47/69] sysctl: remove .child from frv/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 48/69] sysctl: remove .child from sh64/unaligned_fixup/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 49/69] sysctl: delete unused register_sysctl_table function Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 50/69] sysctl: remove .child from ax25 table Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 51/69] sysctl: remove .child from net/ipv4/route and net/ipv4/neigh tables Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 52/69] sysctl: remove .child from net/ipv4/neigh table Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 53/69] sysctl: remove .child from net/ipv6/route, net/ipv6/icmp, net/ipv6 tables Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 54/69] sysctl: remove .child from net/llc tables Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 55/69] sysctl: no-child: manually register kernel/random Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 56/69] sysctl: no-child: manually register kernel/keys Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 57/69] sysctl: no-child: manually register fs/inotify Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 58/69] sysctl: no-child: manually register fs/epoll Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 59/69] sysctl: no-child: manually register root tables Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 60/69] sysctl: faster tree-based sysctl implementation Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 61/69] sysctl: single subheader path: optimisation for paths used only once Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 62/69] sysctl: single subheader path: net/ipv4/conf/DEVICE-NAME/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 63/69] sysctl: single subheader path: net/{ipv4|ipv6}/neigh/DEV/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 64/69] sysctl: single subheader path: net/ipv6/conf/DEVICE-NAME/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 65/69] sysctl: single subheader path: dev/parport/PORT/devices/DEVICE/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 66/69] sysctl: single subheader path: net/ax25/DEVICE Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 67/69] sysctl: single subheader path: kernel/sched_domain/CPU/DOMAIN/ Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 68/69] sysctl: single subheader path: net/decnet/conf/DEVNAME Lucian Adrian Grijincu
2011-05-01  1:36 ` [PATCH 69/69] RFC: sysctl: convert read-write lock to RCU Lucian Adrian Grijincu
2011-05-01 11:07 ` [PATCH 00/69] faster tree-based sysctl implementation Lucian Adrian Grijincu
2011-05-01 15:28 ` Eric W. Biederman
2011-05-01 16:20   ` Lucian Adrian Grijincu
2011-05-01 22:49     ` Eric W. Biederman
2011-05-01 23:12       ` Lucian Adrian Grijincu
2011-05-02  3:06         ` Eric W. Biederman [this message]
2011-05-02  7:44           ` Lucian Adrian Grijincu
2011-05-02 19:02             ` Eric W. Biederman
2011-05-02 21:43               ` Lucian Adrian Grijincu
2011-05-01 23:34       ` Lucian Adrian Grijincu
2011-05-02  0:03         ` Lucian Adrian Grijincu
2011-05-02  3:26         ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1aaf5yf64.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=adobriyan@gmail.com \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucian.grijincu@gmail.com \
    --cc=tavi@cs.pub.ro \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox