From: Evgeniy Polyakov <s0mbre@tservice.net.ru>
To: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: Netchannles: first stage has been completed. Further ideas.
Date: Thu, 20 Jul 2006 11:32:23 +0400
Message-ID: <20060720073223.GA15567@tservice.net.ru>
In-Reply-To: <20060719131915.GA21942@ms2.inr.ac.ru>
> Hello!
Hello, Alexey.
[ Sorry for the long delay. There are some problems with the mail servers, so I
cannot access them remotely and am composing this mail by hand; hopefully the
thread will not be broken. ]
>> There is no socket spinlock anymore.
>> Above lock is skb_queue lock which is held inside
>> skb_dequeue/skb_queue_tail calls.
> Lock is named differently, but it is still here.
> BTW for UDP even the name is the same.
There is no BH processing; that lock is needed only for the 4 operations
performed when an skb is enqueued or dequeued.
And if I changed skbs to a different structure, there would be no locks
at all. It is extremely lightweight and cannot be compared with the socket
lock at all.
No BH/IRQ processing at all and natural speed management: that is the main
idea behind netchannels.
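To illustrate how small that critical section is, here is a minimal userspace sketch, assuming a plain singly linked FIFO. The names (`struct pkt`, `struct pkt_queue`, `pq_enqueue()`, `pq_dequeue()`, `process_all()`) are hypothetical, and a pthread mutex stands in for the spinlock taken inside skb_queue_tail()/skb_dequeue(). The lock covers only a few pointer updates; protocol processing runs afterwards, in the consumer's context, with no lock held.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical packet descriptor standing in for an skb. */
struct pkt {
	struct pkt *next;
	int id;
};

/* FIFO protected by one lock, held only around list manipulation. */
struct pkt_queue {
	struct pkt *head, *tail;
	pthread_mutex_t lock;
};

static void pq_init(struct pkt_queue *q)
{
	q->head = q->tail = NULL;
	pthread_mutex_init(&q->lock, NULL);
}

/* Producer side (e.g. the driver): a few pointer writes under the lock. */
static void pq_enqueue(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	pthread_mutex_unlock(&q->lock);
}

/* Consumer side: unlink one packet under the lock... */
static struct pkt *pq_dequeue(struct pkt_queue *q)
{
	pthread_mutex_lock(&q->lock);
	struct pkt *p = q->head;
	if (p) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return p;
}

/* ...then do protocol processing with no lock held, in process context. */
static int process_all(struct pkt_queue *q)
{
	int sum = 0;
	struct pkt *p;
	while ((p = pq_dequeue(q)) != NULL)
		sum += p->id; /* placeholder for real TCP/UDP processing */
	return sum;
}
```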
>> > Equivalent of socket user lock.
>>
>> No, it is an equivalent for hash lock in socket table.
>OK. But you have to introduce socket mutex somewhere in any case.
>Even in ATCP.
Actually not: VJ's idea is to have only one consumer and one producer,
so no locks are needed. But I agree that in the general case a lock is
needed, though _only_ to protect against several netchannel userspace
consumers.
There is no BH protocol processing at all, so there is no need to
protect against someone adding data while you are processing your own
chunk.
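The one-producer/one-consumer model can be illustrated with a lock-free ring buffer. This is a generic C11 sketch, not the netchannels code; `struct spsc_ring`, `ring_push()`, `ring_pop()` and `RING_SIZE` are hypothetical. Each index has exactly one writer, so acquire/release ordering suffices and no lock is taken. A second consumer would break that invariant, which is exactly where a lock becomes necessary again.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8 /* must be a power of two */

/* Single-producer/single-consumer ring: head is written only by the
 * producer, tail only by the consumer, so no lock is required. */
struct spsc_ring {
	void *slot[RING_SIZE];
	atomic_size_t head; /* next slot to fill (producer-owned) */
	atomic_size_t tail; /* next slot to drain (consumer-owned) */
};

static void ring_init(struct spsc_ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer: returns false when the ring is full (natural backpressure). */
static bool ring_push(struct spsc_ring *r, void *item)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	if (head - tail == RING_SIZE)
		return false;
	r->slot[head & (RING_SIZE - 1)] = item;
	/* Release: the slot write is visible before the new head value. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer: returns NULL when the ring is empty. */
static void *ring_pop(struct spsc_ring *r)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
	if (head == tail)
		return NULL;
	void *item = r->slot[tail & (RING_SIZE - 1)];
	/* Release: the slot read completes before the producer may reuse it. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return item;
}
```

A full ring simply refuses new items, which is one way to get the "natural speed management" mentioned above without any BH processing.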
>> Just an example - tcp_established() can be called with bh disabled
>> under the socket lock.
> When we have a process context in hands, it is not.
>Did you ask yourself, why do not we put all the packets to backlog/prequeue
>and just wait when user will read the data? It would be 100% equivalent
>to "netchannels".
How many hacks there are just to get a bit closer to the userspace
processing that netchannels implement!
>The answer is simple: because we cannot wait. If user delays for 200msec,
>wait for connection collapse due to retransmissions. If the segment is
>out of order, immediate attention is required. Any scheme, which tries
>to wait for user unconditionally, at least has to run a watchdog timer,
>which fires before sender senses the gap.
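The constraint in the quoted paragraph can be written as a small decision function. This is a sketch under stated assumptions: `struct rx_wait_state`, `kernel_must_process()` and the halving safety margin `SAFETY_DIV` are hypothetical. The 200 ms figure matches Linux's minimum retransmission timeout (`TCP_RTO_MIN`), but a real sender's RTO is estimated from RTT samples and is usually larger.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state for a "wait for the reader" scheme. */
struct rx_wait_state {
	uint64_t oldest_unprocessed_ms; /* arrival time of oldest queued segment */
	bool queue_nonempty;
	bool out_of_order;              /* a gap was seen in the sequence space */
};

/* The watchdog must fire well before the sender's retransmission timer.
 * SAFETY_DIV is an arbitrary, illustrative safety margin. */
#define SAFETY_DIV 2

static bool kernel_must_process(const struct rx_wait_state *s,
                                uint64_t now_ms, uint64_t sender_rto_ms)
{
	if (!s->queue_nonempty)
		return false;
	if (s->out_of_order)
		return true; /* gaps need immediate attention (duplicate ACKs) */
	/* Assumes now_ms >= oldest_unprocessed_ms. */
	return now_ms - s->oldest_unprocessed_ms >= sender_rto_ms / SAFETY_DIV;
}
```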
If userspace is scheduled away for too long, it is bloody wrong to ack
data that cannot be read because the system is busy. It just moves the
work from one end to the other: either ack now and stop when the queue is
full, or postpone ack generation until the segment is really being read.
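The "postponing the work" argument can be checked with a little sequence arithmetic. In a simplified model (no window scaling, no sequence wraparound; all field and function names are hypothetical), acking on receipt and acking on read give the sender the same right window edge, `copied_seq + buf_size`. What differs is only when the ACK is sent, and therefore whether the sender's retransmission timer can fire first.

```c
#include <stdint.h>

/* Simplified receiver model: no window scaling, no sequence wraparound. */
struct rcv_model {
	uint32_t rcv_nxt;    /* next in-order byte expected from the sender */
	uint32_t copied_seq; /* next byte the application will read */
	uint32_t buf_size;   /* receive buffer capacity in bytes */
};

/* Policy A: ack on receipt; the window shrinks by the unread bytes. */
static uint32_t right_edge_ack_on_receipt(const struct rcv_model *m)
{
	uint32_t ack = m->rcv_nxt;
	uint32_t win = m->buf_size - (m->rcv_nxt - m->copied_seq);
	return ack + win;
}

/* Policy B: ack only what was read; the full buffer is advertised. */
static uint32_t right_edge_ack_on_read(const struct rcv_model *m)
{
	uint32_t ack = m->copied_seq;
	uint32_t win = m->buf_size;
	return ack + win;
}
```

For example, with `rcv_nxt = 5000`, `copied_seq = 2000` and `buf_size = 65535`, both functions return 67535.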
>And this is what we do for ages. Grep for "VJ" in sources. :-)
>netchannels have nothing to do with it, it is much elder idea.
And it was Van who decided to move away from BH/IRQ processing.
That path was slow and somewhat painful (just count the hacks with
prequeue and direct processing; it is enough to look at how the TCP
socket lock is taken in different contexts :)
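The context-dependent locking referred to here is, roughly, the owner/backlog pattern: softirq code processes a segment directly unless a process currently owns the socket, in which case the segment is deferred to a backlog that the owner drains on release. The sketch below is a userspace model loosely based on the `sock_owned_by_user()` / `sk_add_backlog()` pattern in the kernel's TCP receive path. All names are hypothetical stand-ins, and unlike this model, the real `lock_sock()` sleeps while another process owns the socket.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical segment and socket model. */
struct seg { struct seg *next; int len; };

struct model_sock {
	pthread_mutex_t slock; /* stands in for the BH-level spinlock */
	bool owned_by_user;    /* stands in for the socket "user lock" */
	struct seg *backlog;   /* deferred segments (a stack, for brevity;
	                          real backlogs preserve arrival order) */
	int processed_bytes;
};

static void do_rcv(struct model_sock *sk, struct seg *s)
{
	sk->processed_bytes += s->len; /* placeholder for protocol processing */
}

/* Softirq path: process now, or defer if a process owns the socket. */
static void rx_softirq(struct model_sock *sk, struct seg *s)
{
	pthread_mutex_lock(&sk->slock);
	if (!sk->owned_by_user) {
		do_rcv(sk, s);
	} else {
		s->next = sk->backlog;
		sk->backlog = s;
	}
	pthread_mutex_unlock(&sk->slock);
}

static void lock_sock_model(struct model_sock *sk)
{
	pthread_mutex_lock(&sk->slock);
	sk->owned_by_user = true;
	pthread_mutex_unlock(&sk->slock);
}

/* On release, the process drains whatever the softirq deferred. */
static void release_sock_model(struct model_sock *sk)
{
	pthread_mutex_lock(&sk->slock);
	while (sk->backlog) {
		struct seg *s = sk->backlog;
		sk->backlog = s->next;
		do_rcv(sk, s);
	}
	sk->owned_by_user = false;
	pthread_mutex_unlock(&sk->slock);
}
```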
>> In that case one copies the whole data into userspace, so access for
>> 20 bytes of headers completely does not matter.
>For short packets it matters.
>But I said not this. I said it looks _worse_. A bit, but worse.
At least for 80-byte packets it does not matter at all.
And it is very likely that the data is misaligned, so half of the
header will be in the same cache line as the data anyway. The socket
code has the same problem: skb->cb can be evicted from the cache, and
tcp_recvmsg() needs to fetch it again.
And honestly, I never understood putting nano-optimisations ahead of
more serious problems (i.e. one cache line vs. 50MB/sec of speed).
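How many cache lines a header costs depends only on its offset and length. The helper below is an illustrative sketch: the function name and the 64-byte line size are assumptions (64 bytes is common on x86, but it is not universal).

```c
#include <stddef.h>

/* Assumed cache line size in bytes. */
#define CACHE_LINE 64

/* Number of cache lines touched by the byte range [offset, offset + len). */
static size_t lines_spanned(size_t offset, size_t len)
{
	if (len == 0)
		return 0;
	size_t first = offset / CACHE_LINE;
	size_t last = (offset + len - 1) / CACHE_LINE;
	return last - first + 1;
}
```

For example, a 20-byte header at offset 50 touches two lines, while the same header at offset 14 fits in one; an 80-byte region starting on a line boundary always needs two.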
>> Hmm, for 80 bytes sized packets win was about 2.5 times. Could you
>> please show me lines inside existing code, which should be commented,
>> so I got 50Mbyte/sec for that?
>If I knew it would be done. :-)
>
>Actually, it is the action, which I would expect. This, but
>not dropping all the TCP stack.
I tried to use the existing stack, and I got a win in speed and CPU usage,
but its magnitude was not what I expected, so I started a userspace
network stack implementation. That succeeded, and there are _very_ major
optimisations over the existing code when processing is fully moved into
userspace, but there are also big problems, like one syscall per ack.
So I decided to use that stack as the base for in-kernel, process-context
protocol processing, and that succeeded too. I will probably return to the
userspace network stack idea when I complete zero-copy networking support.
>> I showed there, that using existing stack it is impossible
>Please, understand, it is such statements that compromise your work.
>If it is impossible then it is not interesting.
Do not mix apples and oranges: I am just posting the facts, namely that
the netchannel TCP implementation works (sometimes much) faster.
It is the socket code that probably has some misoptimisations, and if it
is impossible to fix them (well, at least it is very hard), then it is
not interesting.
I am definitely not saying that it must be removed, replaced, or anything
else; it works perfectly well. But it is possible to get better
performance by changing the architecture, and that was done.
>Alexey
--
Evgeniy Polyakov