From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
To: Henrik Rydberg <rydberg@bitmath.org>
Cc: linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jiri Kosina <jkosina@suse.cz>,
	Mika Kuoppala <mika.kuoppala@nokia.com>,
	Benjamin Tissoires <tissoire@cena.fr>,
	Rafi Rubin <rafi@seas.upenn.edu>, Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism
Date: Fri, 4 Jun 2010 09:36:22 -0700	[thread overview]
Message-ID: <20100604163622.GB16827@core.coreip.homeip.net> (raw)
In-Reply-To: <4C08BCB5.7020201@bitmath.org>

On Fri, Jun 04, 2010 at 10:43:33AM +0200, Henrik Rydberg wrote:
> Hi Dmitry,
> >> +struct buflock_writer {
> >> +	unsigned int head;
> >> +	unsigned int next_head;
> >> +};
> > 
> > Since there can be only one writer thread should we just create "struct
> > buflock" and pull head and next head into it along with the buffer
> > itself and element size?
> 
> It is possible, but there are some arguments against it:
> 
> 1. What type to give the buffer?

u8.

> 2. Static or dynamic buffering?

You mean resizeable?

> 3. Can size be both a compile-time constant and a variable?
> 

Obviously not compile-time only.

> In short, I think that by _not_ including the actual buffer, the method
> ultimately becomes more useful.
> 
> > Also, maybe we could extend kfifo with the notion of multiple readers?
> 
> If we merge the data and the algorithm as you suggest, that would be a
> logical step, yes. To me, the ideal would be to split the kfifo into
> data, writers and readers. But that would require API changes.
> 
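
Something along these lines, perhaps? A rough sketch only - the names
and layout below are mine, not a worked-out proposal:

	/* storage shared by the single writer and all readers */
	struct xfifo_data {
		void		*buffer;	/* element storage */
		unsigned int	esize;		/* element size in bytes */
		unsigned int	size;		/* number of elements, power of two */
	};

	/* exactly one of these per fifo */
	struct xfifo_writer {
		unsigned int	head;
		unsigned int	next_head;
	};

	/* one per consumer, each tracking its own read position */
	struct xfifo_reader {
		unsigned int	head;
		unsigned int	tail;
	};
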
> > 
> > In any case, it should not live in include/linux/input/ as it may be
> > useful outside of the input subsystem.
> 
> Agreed.
> 
> > 
> >> +
> >> +struct buflock_reader {
> >> +	unsigned int head;
> >> +	unsigned int tail;
> >> +};
> >> +
> >> +/*
> >> + * Write to buffer without locking
> > 
> > Implies that there is an option of doing so with locking. Just change
> > it to "write". Also please use standard kerneldoc-style markups.
> 
> Ok.
> 
> > 
> >> + *
> >> + * bw - the buflock_writer keeping track of the write position
> >> + * buf - the buffer to write to (array of item type)
> >> + * size - the size of the circular buffer (must be a power of two)
> >> + * item - the item to write
> >> + *
> >> + * There is no locking involved during write, so this method is
> >> + * suitable to use in interrupt context.
> > 
> > This is a misleading statement. You can say that this operation does not
> > sleep and thus is suitable for use in atomic contexts.
> 
> Ok.
> 
> > 
> >> + */
> >> +#define buflock_write(bw, buf, size, item)				\
> >> +	do {								\
> >> +		bw.next_head = (bw.head + 1) & ((size) - 1);		\
> >> +		smp_wmb();						\
> > 
> > Why do we need the write barrier here?
> 
>    reader loads next_head
>      -> interrupt: writer updates next_head, then the buffer, then head
>    reader loads buffer
>    reader loads head
> 
> In this scenario, the reader ends up seeing next_head < head, but the position
> written was next_head. The reader will get a false picture of which portion of
> the buffer was modified.

I see.
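
To restate that against the read loop (my own annotation, not from the
patch): take a reader whose br.tail is the very slot the writer is
overwriting. It loads _h == br.tail, reads the item mid-overwrite, and
then loads _nh. Without the first barrier, the new buffer contents may
become visible before the new next_head, so the reader can still see
_nh == _h:

	_h   = bw.head;		/* old head == br.tail */
	item = buf[br.tail];	/* torn: being overwritten right now */
	_nh  = bw.next_head;	/* stale, so _nh == _h */

The retry test "br.tail - _h < _nh - _h" then computes 0 < 0, which is
false, and the torn item is handed to the caller as valid. With the
barrier (paired with the reader's smp_rmb()), a reader that saw the new
buffer contents also sees the new next_head, so the test becomes 0 < 1
and it retries.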

> 
> > 
> >> +		buf[bw.head] = item;					\
> >> +		smp_wmb();						\
> > 
> > I think this is the only barrier that is needed. You want to make sure
> > that we advance head only after we complete write. Also, why SMP only?
> > Can't we get into trouble if we rearrange writes and take interrupt and
> > schedule away from this thread?
> 
> This would be true for a single-reader fifo, if we do not care about what
> happens when the buffer wraps around. Regarding reordering, my impression was
> that this cannot happen across smp_wmb(), but I might very well be wrong.
> 
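
For the record, my understanding of the SMP question: on !CONFIG_SMP
builds smp_wmb() degrades to a plain compiler barrier, so the stores
still cannot be reordered at compile time, and a CPU always observes
its own stores in program order, so an interrupt handler or another
thread scheduled on the same CPU cannot see them out of order either.
Roughly (from memory, modulo per-arch details):

	/* on UP builds, approximately: */
	#define smp_wmb()	barrier()

So a full wmb() should not be needed here.
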
> > 
> >> +		bw.head = bw.next_head;					\
> >> +		smp_wmb();						\
> > 
> > Why do we need the write barrier here?
> 
> This follows the pattern of seqlocks. My understanding is that, since
> we will later rely on head having been written, the last smp_wmb() is
> "one for the road".
> 
> > 
> >> +	} while (0)
> >> +
> > 
> > This (and the rest) should be a static inline function so that we have
> > type safety, etc, etc.
> 
> And this is precisely what I wanted to avoid by not including the buffer in the
> buflock structures.
> 
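
The two do not have to be mutually exclusive, though. An inline can
stay generic if it takes the buffer and element size as parameters - a
sketch only, and it does trade away some of the type safety that
motivated inlines in the first place:

	static inline void buflock_write(struct buflock_writer *bw,
					 void *buf, size_t esize,
					 unsigned int size, const void *item)
	{
		bw->next_head = (bw->head + 1) & (size - 1);
		smp_wmb();
		memcpy((char *)buf + bw->head * esize, item, esize);
		smp_wmb();
		bw->head = bw->next_head;
		smp_wmb();
	}
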
> > 
> >> +
> >> +/*
> >> + * Synchronize reader with current writer
> >> + *
> >> + * br - the buflock_reader keeping track of the read position
> >> + * bw - the buflock_writer keeping track of the write position
> >> + *
> >> + * Synchronize the reader head with the writer head, effectively
> >> + * telling the reader thread that there is new data to read.
> >> + *
> >> + * The reader head will always follow the writer head. As a
> >> + * consequence, the number of items stored in the read buffer might
> >> + * decrease during sync, as an effect of wrap-around. To avoid
> >> + * non-deterministic behavior during polls, the read buffer is
> >> + * guaranteed to be non-empty after synchronization.
> >> + *
> >> + */
> >> +#define buflock_sync_reader(br, bw)				\
> >> +	do {							\
> >> +		if (br.tail != bw.head)				\
> >> +			br.head = bw.head;			\
> > 
> > Why condition? Simple assignment is cheaper.
>

Ah, crap, I misread the condition... Anyway, thanks for the
explanation.
 
> The condition takes care of a problem that is present also in the current evdev
> code: when the buffer is very small and wraps around a lot, it may well be that
> a write advances the head so that head == tail. If this happens between the
> point where a poll is triggered and the point where the data is actually read,
> there will be no data to read. In an application like "cat", this will close
> the file and the program will exit.
> 
> By ensuring that the writer never creates a situation where head == tail, this
> problem is avoided.
> 
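
To put numbers on it (my example, not from the patch): size == 4 and a
reader at br.tail == 1 that has been woken by poll but has not read
yet. Three more writes advance bw.head 2 -> 3 -> 0 -> 1. An
unconditional sync after that last write would set br.head = 1 ==
br.tail, making buflock_reader_empty() true, and something like "cat"
would treat the empty read as EOF. With the condition, that one sync
is skipped - br.head keeps the value from the previous sync - so the
woken reader is guaranteed to find at least one item.
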
> > 
> >> +	} while (0)
> >> +
> >> +/*
> >> + * True if reader is empty
> >> + *
> >> + * br - the buflock_reader keeping track of the read position
> >> + *
> >> + */
> >> +#define buflock_reader_empty(br) (br.head == br.tail)
> >> +
> >> +/*
> >> + * Read from buffer, retry during wrap-around
> >> + *
> >> + * br - the buflock_reader keeping track of the read position
> >> + * bw - the buflock_writer keeping track of the write position
> >> + * buf - the buffer to read from (array of item type)
> >> + * size - the size of the circular buffer (must be a power of two)
> >> + * item - the item to read to
> >> + *
> >> + * Read the oldest item available in the buffer.
> >> + *
> >> + * During normal operation, with adequate buffer size, this method will not
> >> + * block, regardless of the number of concurrent readers. The method will
> >> + * only block momentarily during a write to the same position being read
> >> + * from, which happens when the buffer gets full. In such cases, the value
> >> + * eventually read will be the new value written to the buffer.
> >> + *
> >> + */
> >> +#define buflock_read(br, bw, buf, size, item)				\
> >> +	do {								\
> >> +		unsigned int _h, _nh;					\
> >> +		do {							\
> >> +			_h = bw.head;					\
> >> +			smp_rmb();					\
> >> +			item = buf[br.tail];				\
> >> +			smp_rmb();					\
> >> +			_nh = bw.next_head;				\
> >> +			smp_rmb();					\
> >> +		} while (unlikely(br.tail - _h < _nh - _h));		\
> >> +		br.tail = (br.tail + 1) & ((size) - 1);			\
> >> +	} while (0)
> > 
> > Again, are we sure we need all these barriers? Spinlock may end up less
> > expensive... Anyway, Oleg Nesterov knows more than anyone about data
> > coherency issues (CCed).
> 
> These barriers I am less certain of, so additional eyes would be very helpful.
> 
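
FWIW, my reading of the retry test, in case it helps other reviewers
(Henrik, correct me if I am off): with unsigned arithmetic,

	br.tail - _h < _nh - _h

is equivalent to "br.tail lies in the half-open interval [_h, _nh),
taken modulo 2^32". After a completed write head == next_head, so the
interval is empty and no reader ever retries; while a write is in
flight next_head == head + 1, so the interval is exactly {_h}, and a
reader spins only while it is looking at the one slot being
overwritten.
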
> Thanks,
> Henrik
> 

-- 
Dmitry

