Re: [PATCH tty-next 0/4] tty: Fix ^C echo


From: Peter Hurley <peter@hurleysoftware.com>
To: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jslaby@suse.cz>,
	linux-kernel@vger.kernel.org, linux-serial@vger.kernel.org
Subject: Re: [PATCH tty-next 0/4] tty: Fix ^C echo
Date: Wed, 11 Dec 2013 22:59:20 -0500
Message-ID: <52A93498.4030803@hurleysoftware.com>
In-Reply-To: <20131205001315.3ac390d6@alan.etchedpixels.co.uk>

On 12/04/2013 07:13 PM, One Thousand Gnomes wrote:
>> Not so much confused as simply merged. Input processing is inherently
>> single-threaded; it makes sense to rely on that at the highest level
>> possible.
>
> I would disagree entirely. You want to minimise the areas affected by a
> given lock. You also want to lock data not code. Correctness comes before
> speed. You optimise it when it's right, otherwise you end up in a nasty
> mess when you discover you've optimised to assumptions that are flawed.

Sorry for the delayed reply, Alan; what little free time I had was spent
snuffing out regressions :/

Sure, I understand that ideally locks protect data, not operations.
But I think maybe you're missing my point. Almost every lock, even at
inception, is somewhat optimized; otherwise, every datum would have its
own lock. Eliminating overlapping locks is a common optimization in stable
code.

In this case, an already-broken bit of code simply remains broken.
buf->lock is also fairly simple to break apart (although I don't want to
because of the performance hit), which is not characteristic of locks
that protect operations.


>> Firewire, which is capable of sustained throughput in excess of 40MB/sec,
>> struggles to get over 5MB/sec through the tty layer. [And drm output
>> is orders-of-magnitude slower than that, which is just sad...]
>
> And what protocols do you care about 5MB/second - n_tty - no ? For the
> high speed protocols you are trying to fix a lost cause. By the time
> we've gone piddling around with tty buffers and serialized tty queues
> firing bytes through tasks and the like you already lost.
>
> For drm I assume you mean the framebuffer console logic ? Last time I
> benched that except for the Poulsbo it was bottlenecked on the GPU - not
> that I can type at 5MB/second anyway. Not that fixing the performance of
> the various bits wouldn't be a good thing too especially on the output
> end.

For drm, I actually mean GEM object deletion, which is typically fenced
and thus appears to be GPU-bound. What's really needed there is deferred
deletion, like kfree_rcu(), with partial synchronization on allocation
failures only.
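
To make the deferred-deletion idea concrete, here is a rough userspace
sketch, not DRM or GEM code. Everything in it (pool_sketch, pool_alloc(),
pool_free_deferred(), pool_sync_and_reap()) is a hypothetical name, and a
fixed slot array stands in for real objects and fences. Freeing only marks
a slot as pending; the expensive synchronization step runs only when an
allocation would otherwise fail.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4	/* tiny on purpose, so exhaustion is easy to trigger */

/* Hypothetical pool; a slot stays "in use" until its deferred free is reaped. */
struct pool_sketch {
	int in_use[POOL_SLOTS];		/* 1 while allocated or awaiting reclaim */
	int pending[POOL_SLOTS];	/* 1 once deletion has been deferred */
	unsigned int syncs;		/* how often we actually had to synchronize */
};

/* Stand-in for the expensive part (e.g. waiting on fences), then reclaim. */
static void pool_sync_and_reap(struct pool_sketch *p)
{
	int i;

	p->syncs++;
	for (i = 0; i < POOL_SLOTS; i++) {
		if (p->pending[i]) {
			p->pending[i] = 0;
			p->in_use[i] = 0;
		}
	}
}

/* Return the first free slot, or -1 if there is none. */
static int pool_try_alloc(struct pool_sketch *p)
{
	int i;

	for (i = 0; i < POOL_SLOTS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Fast path never synchronizes; only an allocation failure does. */
static int pool_alloc(struct pool_sketch *p)
{
	int slot = pool_try_alloc(p);

	if (slot < 0) {
		pool_sync_and_reap(p);
		slot = pool_try_alloc(p);
	}
	return slot;
}

/* Deletion is just bookkeeping; nothing waits here. */
static void pool_free_deferred(struct pool_sketch *p, int slot)
{
	assert(slot >= 0 && slot < POOL_SLOTS && p->in_use[slot]);
	p->pending[slot] = 1;
}
```

Filling the pool, deferring two frees and allocating again costs exactly
one synchronization in this sketch, however many frees were deferred in
between.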

I mostly care about output speed; unfortunately, that's the input side
at the other end :)

>> While that would work, it's expensive extra locking in a path that 99.999%
>> of the time doesn't need it. I'd rather explore other solutions.
>
> How about getting the high speed paths out of the whole tty buffer
> layer ? Almost every line discipline can be a fastpath directly to the
> network layer. If optimisation is the new obsession then we can cut the
> crap entirely by optimising for networking not making it a slave of n_tty.
>
> Starting at the beginning
>
> we have locks on rx because
> - we want serialized rx
> - we have buffer lifetimes
> - we have buffer queues
> - we have loads of flow control parameters
>
> Only n_tty needs the buffers (maybe some of irda but irda hasn't worked
> for years afaik). IRQ receive paths are serialized (and as a bonus can be
> pinned to a CPU). Flow control is n_tty stuff, everyone else simply fires
> it at their network layer as fast as possible and net already does the
> work.
>
> Keep a single tty_buf in the tty for batching at any given time, and
> private so no locks at all
>
> Have a wrapper via
> ld->receive(tty, buf)
>
> which fires the tty_buf at the ldisc and allocates a new empty one
>
> tty_queue_bytes(tty, buf, flags, len)
>
> which adds to the buffer, and if full calls ld->queue and then carries on
> the copying cycle
>
> and
>
> ld->receive_direct(tty, buf, flags, len)
>
> which allows block mode devices to blast bytes directly at the queue (ie
> all the USB 3G stuff, firewire, etc) without going via any additional
> copies.
>
> For almost all ldiscs
>
> ld->receive would be
>
> ld->receive_direct(tty, buf->buf, buf->flags, buf->len);
> free buffer
>
> For n_tty type stuff
>
> ld->receive is basically much of tty_flip_buffer_push
>
> ld->receive_direct allocates tty_buffers and copies into it
>
> We may even be able to optimise some of the n_tty cases into the
> fastpath afterwards (notably raw, no echo)
>
> For anything receiving in blocks that puts us close to (but not quite at)
> ethernet kinds of cleanness for network buffer delivery.
>
> Worth me looking into ?

I have to give this a lot more thought.

The universality of n_tty is important, and costs real cycles on servers and
such. It's not just about typing speed.
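
For reference while thinking it over, a minimal userspace sketch of the
batching scheme quoted above. It is not kernel code: tty_sketch,
ldisc_sketch, BATCH_SIZE and count_ldisc are made-up names, the flags
argument is left out for brevity, and resetting a fixed array stands in
for "free buffer, allocate a new empty one". Only the shape of
tty_queue_bytes(), ld->receive and ld->receive_direct is taken from the
proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_SIZE 8	/* tiny on purpose, so the "buffer full" path is exercised */

struct tty_sketch;

/* Hypothetical ldisc ops, named after the hooks in the proposal. */
struct ldisc_sketch {
	/* Block-mode entry point: consume len bytes straight from the caller. */
	void (*receive_direct)(struct tty_sketch *tty,
			       const unsigned char *buf, size_t len);
};

struct tty_sketch {
	const struct ldisc_sketch *ld;
	unsigned char batch[BATCH_SIZE];	/* the single, private batching buffer */
	size_t used;
	size_t delivered;	/* bytes the ldisc has seen (bookkeeping only) */
	unsigned int calls;	/* number of receive_direct() invocations */
};

/* ld->receive for "almost all ldiscs": forward the batch, then start afresh. */
static void tty_receive(struct tty_sketch *tty)
{
	if (tty->used == 0)
		return;
	tty->ld->receive_direct(tty, tty->batch, tty->used);
	tty->used = 0;
}

/* Byte-oriented drivers: copy into the batch and push it whenever it fills. */
static void tty_queue_bytes(struct tty_sketch *tty,
			    const unsigned char *buf, size_t len)
{
	while (len) {
		size_t room = BATCH_SIZE - tty->used;
		size_t n = len < room ? len : room;

		memcpy(tty->batch + tty->used, buf, n);
		tty->used += n;
		assert(tty->used <= BATCH_SIZE);
		buf += n;
		len -= n;
		if (tty->used == BATCH_SIZE)
			tty_receive(tty);
	}
}

/* Example ldisc that only counts what it is handed. */
static void count_receive_direct(struct tty_sketch *tty,
				 const unsigned char *buf, size_t len)
{
	(void)buf;
	tty->delivered += len;
	tty->calls++;
}

static const struct ldisc_sketch count_ldisc = {
	.receive_direct = count_receive_direct,
};
```

A block-mode driver would skip tty_queue_bytes() and call
tty->ld->receive_direct() with its own buffer, which is where the saved
copy comes from. Nothing here takes a lock, which only holds as long as
the receive path really is serialized per tty.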

>> The clock/generation method seems like it might yield a lockless solution
>> for this problem, but maybe creates another one because the driver-side
>> would need to stamp the buffer (in essence, a flush could affect data
>> that has not yet been copied from the driver).
>
> But it has arrived in the driver so might not matter. That requires a
> little thought!

This is my next experiment.
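
One possible reading of the generation idea, as a userspace sketch using
C11 atomics rather than kernel primitives. The names (port_sketch,
buf_sketch, driver_stamp(), port_flush(), consume()) are all hypothetical.
A flush bumps a counter instead of taking a lock, the driver side stamps
each buffer with the counter value it observed, and the consumer drops any
buffer whose stamp is out of date. It does not settle the open question
above: a buffer stamped just before a flush is discarded even though its
data may have arrived after the flush was requested.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state: one counter, bumped by every flush. */
struct port_sketch {
	atomic_uint flush_gen;
};

struct buf_sketch {
	unsigned int gen;	/* generation seen when the driver filled the buffer */
	size_t len;		/* payload length; the data itself is omitted */
};

/* Driver side: record which generation this buffer belongs to. */
static void driver_stamp(struct port_sketch *port, struct buf_sketch *b,
			 size_t len)
{
	b->gen = atomic_load_explicit(&port->flush_gen, memory_order_acquire);
	b->len = len;
}

/* Flush: no list walking, no lock; just invalidate everything stamped so far. */
static void port_flush(struct port_sketch *port)
{
	atomic_fetch_add_explicit(&port->flush_gen, 1, memory_order_release);
}

static bool buf_is_stale(struct port_sketch *port, const struct buf_sketch *b)
{
	return b->gen != atomic_load_explicit(&port->flush_gen,
					      memory_order_acquire);
}

/* Consumer side: returns the number of bytes accepted (0 if flushed away). */
static size_t consume(struct port_sketch *port, const struct buf_sketch *b)
{
	return buf_is_stale(port, b) ? 0 : b->len;
}
```

A buffer stamped before port_flush() yields 0 bytes from consume(), while
one stamped afterwards is accepted in full.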

Regards,
Peter Hurley
