From: ebiederm@xmission.com (Eric W. Biederman)
To: Benjamin LaHaise <bcrl@lhnet.ca>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Octavian Purdila <opurdila@ixiacom.com>,
netdev@vger.kernel.org, Cosmin Ratiu <cratiu@ixiacom.com>
Subject: Re: [PATCH] net: allow netdev_wait_allrefs() to run faster
Date: Fri, 30 Oct 2009 16:25:52 -0700
Message-ID: <m1my38lb0f.fsf@fess.ebiederm.org>
In-Reply-To: <20091030143527.GA3141@kvack.org> (Benjamin LaHaise's message of "Fri, 30 Oct 2009 10:35:27 -0400")
Benjamin LaHaise <bcrl@lhnet.ca> writes:
> On Thu, Oct 29, 2009 at 06:45:32PM -0700, Eric W. Biederman wrote:
>> The reason sysfs_dirent exists is that, as things grow larger,
>> we want to keep the amount of RAM consumed down, so we don't pin
>> everything in the dcache.
>
> I'm aware of that, but for users running into this sort of scaling issue,
> the amount of RAM required is a non-issue (30,000 interfaces require about
> 1GB of RAM at present), making the question more one of how to avoid the
> overhead for users who don't require it. I'd prefer a config option. The
> only way I can really see saving memory usage is to somehow tie sysfs dirent
> lookups into the network stack's own tables for looking up device entries.
> The network stack already has to cope with this kind of scaling, and that
> would save the RAM.
There is that. I'm trying to figure out how to add the improvements
without making sysfs_dirent larger, which I think is doable.
>> So I would like to see how much we can pare down.
>
>> For dealing with seeks in the middle of readdir, I expect the best way
>> to do that is to take inspiration from the htrees in extN: return a hash
>> of the filename as our position, and keep the filename list sorted by
>> that hash. Since we are optimizing for size, we don't need to store
>> that hash. Then we can turn that list into some flavor of sorted
>> binary tree.
>
> readdir() generally isn't an issue at present.
Supporting seekdir into the middle of a directory is the entire reason
I keep the entries sorted by inode. If we sort by a hash of the name
instead, we can use the hash to support directory positions in readdir
and seekdir, and we can completely remove the linear list when the
rb_tree is introduced.
>> I'm surprised sysfs_count_nlink shows up, as it is not directly on the
>> add or remove path. I think the answer there is to change s_flags
>> into a set of bitfields and make link_count one of them, perhaps
>> 16 bits long. If we ever overflow our bitfield, we can just set the link
>> count to 0, and userspace (e.g. find) will know it can't optimize
>> based on link count.
>
> It shows up because of the bits of userspace (udev) touching the directory
> from things like the hotplug code path.
I realized after sending the message that s_mode in sysfs_dirent is a
real size offense. It is a 16-bit field sandwiched between two longs,
so on 64-bit the padding around it is wasted. In practice it is possible
to move s_mode up next to s_flags and add an s_nlink after it, both
unsigned short, and get a cheap sysfs_nlink.
>> I was expecting someone to run into problems with the linear directory
>> of sysfs someday.
>
> Alas, sysfs isn't the only offender.
Agreed. Sysfs is probably the easiest to untangle.
Since I'm not quite ready to post my patches, I will briefly
mention what I have in my queue and hopefully get things posted.
I have changes to make it so that sysfs never has to go from
the sysfs_dirent to the sysfs inode.
I have changes to sys_sysctl() so that it becomes a filesystem lookup
under /proc/sys, which ultimately makes the code easier to maintain
and debug.
Now back to getting things forward ported and ready to post.
Eric
Thread overview: 41+ messages
2009-10-17 22:18 [PATCH/RFC] make unregister_netdev() delete more than 4 interfaces per second Benjamin LaHaise
2009-10-18 4:26 ` Eric Dumazet
2009-10-18 16:13 ` Benjamin LaHaise
2009-10-18 17:51 ` Eric Dumazet
2009-10-18 18:21 ` Benjamin LaHaise
2009-10-18 19:36 ` Eric Dumazet
2009-10-21 12:39 ` Octavian Purdila
2009-10-21 15:40 ` [PATCH] net: allow netdev_wait_allrefs() to run faster Eric Dumazet
2009-10-21 16:09 ` Eric Dumazet
2009-10-21 16:51 ` Benjamin LaHaise
2009-10-21 19:54 ` Eric Dumazet
2009-10-29 23:07 ` Eric W. Biederman
2009-10-29 23:38 ` Benjamin LaHaise
2009-10-30 1:45 ` Eric W. Biederman
2009-10-30 14:35 ` Benjamin LaHaise
2009-10-30 14:43 ` Eric Dumazet
2009-10-30 23:25 ` Eric W. Biederman [this message]
2009-10-30 23:53 ` Benjamin LaHaise
2009-10-31 0:37 ` Eric W. Biederman
2010-08-09 17:23 ` Ben Greear
2010-08-09 17:34 ` Benjamin LaHaise
2010-08-09 17:44 ` Ben Greear
2010-08-09 17:48 ` Benjamin LaHaise
2010-08-09 18:03 ` Ben Greear
2010-08-09 19:59 ` Eric W. Biederman
2010-08-09 21:03 ` Benjamin LaHaise
2010-08-09 21:17 ` Eric W. Biederman
2009-10-21 16:55 ` Octavian Purdila
2009-10-23 21:13 ` Paul E. McKenney
2009-10-24 4:35 ` Eric Dumazet
2009-10-24 5:49 ` Paul E. McKenney
2009-10-24 8:49 ` Eric Dumazet
2009-10-24 13:52 ` Paul E. McKenney
2009-10-24 14:24 ` Eric Dumazet
2009-10-24 14:46 ` Paul E. McKenney
2009-10-24 23:49 ` Octavian Purdila
2009-10-25 4:47 ` Paul E. McKenney
2009-10-25 8:35 ` Eric Dumazet
2009-10-25 15:19 ` Octavian Purdila
2009-10-25 19:28 ` Eric Dumazet
2009-10-24 20:22 ` Stephen Hemminger