* NMIs, Locks, linked lists, barriers, and rculists, oh my!
@ 2013-04-04  2:08 Arlie Stephens
  2013-04-04  4:21 ` Valdis.Kletnieks at vt.edu
  2013-04-04 17:10 ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 2 replies; 6+ messages in thread
From: Arlie Stephens @ 2013-04-04  2:08 UTC (permalink / raw)
  To: kernelnewbies

Hi Folks,

I've got another "what's the natural way to do this in the linux
kernel" question. As always, I'm attempting something bizarre that
doesn't seem to be covered in the newbie examples, and discussion with
coworkers produced no consensus. I got such useful ideas the last time
I asked one of these questions here that I'm coming back for more.

Here's the situation:

- I've got a linked list, with entries occasionally added from normal
contexts. 
- Entries are never deleted from the list.
- I've got data structures which save pointers to various members of
the list.
- I've got routines which want to update a saved pointer to the next
  item in the list, or walk the whole list, or do other simple list
  operations. 

This would be simple, except that the routines which READ the list may
get called from panic(), or inside an oops, or from an NMI. It's
important that they succeed in doing their jobs in that case, even if
they managed to interrupt a list addition in progress. 

Put another way, my problem is that I want the readers to be able to
do their thing regardless of the state of the lock that serializes
writers. 

In all cases, writers are protected from each other with a mutex. 

Here are some choices for making sure readers see something sensible:

1) Use an rculist, and hope that it really does cover this
situation. 

2) Be thankful I'm only running on x86, hope that the linux kernel
isn't built with aggressive optimizations that reorder writes to
pointers, fix __list_add() so it updates pointers in the new entry 
before updating pointers in its neighbours, and do the reads with no
synchronization whatsoever. 

3) Add my own explicit use of compiler directives, either to a copy
of list.h or to a simpler, home-grown singly linked tail queue
implementation (sketched just after this list). Because I'm on x86 all
I really care about is compiler optimizations, but it looks like the
linux way to do this may be with barrier macros.

4) Give up on always getting this to work from an NMI. Use something
like a reader-writer spinlock, but if I'm in an NMI, make only a
conditional attempt to acquire it - and don't try to walk the list if
that acquisition fails (presumed self-deadlock).
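
To make #3 concrete, here's a minimal sketch of what I have in mind -
the names are invented, the writer holds the mutex, and entries are
never freed:

struct entry {
        struct entry *next;
        /* payload ... */
};

/* Writer: the caller has fully initialized the payload; smp_wmb()
 * orders those stores before the entry becomes reachable. */
void append_entry(struct entry *tail, struct entry *new)
{
        new->next = NULL;
        smp_wmb();
        ACCESS_ONCE(tail->next) = new;
}

/* Reader: lock-free, so in principle callable from NMI or panic(). */
struct entry *entry_next(struct entry *e)
{
        struct entry *n = ACCESS_ONCE(e->next);

        smp_read_barrier_depends();     /* only matters on Alpha */
        return n;                       /* NULL means end of list */
}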

As always, my questions are:

- Will these even work? In particular, do I need to deal with compiler
optimizations on x86? And is using RCU semantics sufficient to deal
with interrupting the writing context?

- Which of these make sense from the POV of someone who's been in the
linux kernel for a while? 

As a generic kernel person who's spent a lot of time in industry, my
instincts are to avoid roll-your-own unless no existing primitive can
do what I want. I also have a particular dislike of home-grown
concurrency control; even if the original author gets it right, it's
tricky enough that some maintainer is almost certain to break it. 

And as for my team, one guy favours the rculist (#1), but isn't sure it's
adequate; one guy wants to use explicit barriers (#3); and I've
optimistically implemented #2, which I know would work on FreeBSD, and
which could fairly easily be converted to using rculists. None of us
want #4.

Thanks for any insight,

--
Arlie


* NMIs, Locks, linked lists, barriers, and rculists, oh my!
  2013-04-04  2:08 NMIs, Locks, linked lists, barriers, and rculists, oh my! Arlie Stephens
@ 2013-04-04  4:21 ` Valdis.Kletnieks at vt.edu
  2013-04-04  7:31   ` Arlie Stephens
  2013-04-04 17:10 ` michi1 at michaelblizek.twilightparadox.com
  1 sibling, 1 reply; 6+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-04-04  4:21 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 03 Apr 2013 19:08:45 -0700, Arlie Stephens said:

> - I've got a linked list, with entries occasionally added from normal
> contexts.
> - Entries are never deleted from the list.

This is already busted - you *will* eventually OOM the system this way.

> This would be simple, except that the routines which READ the list may
> get called from panic(), or inside an oops, or from an NMI. It's
> important that they succeed in doing their jobs in that case, even if
> they managed to interrupt a list addition in progress.

At that point, you need to be *really* specific on what the semantics
of "succeed" really are.  In particular, do they merely need to
succeed at reading a single specific entry, or the entire list?

You should also be clear on why panic() and oops need to poke the list
(hint - if you're in panic(), you're there because the kernel is already
convinced things are too screwed up to continue, so why should you
trust your list enough to walk across it during panic()?)

> 1) Use an rculist, and hope that it really does cover this
> situation.

That's probably safe/sufficient, as long as the read semantics of
an RCU-protected region are acceptable. In particular, check the
behavior of when an update happens, compared to when it becomes
visible to readers.


* NMIs, Locks, linked lists, barriers, and rculists, oh my!
  2013-04-04  4:21 ` Valdis.Kletnieks at vt.edu
@ 2013-04-04  7:31   ` Arlie Stephens
  0 siblings, 0 replies; 6+ messages in thread
From: Arlie Stephens @ 2013-04-04  7:31 UTC (permalink / raw)
  To: kernelnewbies

On Apr 04 2013, Valdis.Kletnieks at vt.edu wrote:
> On Wed, 03 Apr 2013 19:08:45 -0700, Arlie Stephens said:
> 
> > - I've got a linked list, with entries occasionally added from normal
> > contexts.
> > - Entries are never deleted from the list.
> 
> This is already busted - you *will* eventually OOM the system this
> way.

Probably not. I'm expecting the list to be very short. 

> > This would be simple, except that the routines which READ the list may
> > get called from panic(), or inside an oops, or from an NMI. It's
> > important that they succeed in doing their jobs in that case, even if
> > they managed to interrupt a list addition in progress.
> 
> At that point, you need to be *really* specific on what the semantics
> of "succeed" really are.  In particular, do they merely need to
> succeed at reading a single specific entry, or the entire list?

I need to be able to move to the 'next' item, perhaps repeatedly,
without missing any items which had already been completely added. 

I generally will not be doing this by starting at the first element
and going on to the end (foreach); instead I'll be starting at some
arbitrary element.

If I'm racing with an update, I don't care whether or not I see the
item that's being added, as long as I don't follow bad pointers. 
If I see the new item, I had better see all of it, including a sane
'next' pointer. 

I can live without following the 'prev' pointer chain. 
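
(To be concrete, the read side I'm picturing looks something like
this, sitting on top of rculist - my_item, link, and cursor_next are
all made-up names:)

/* Advance a saved cursor to the following entry, if any.  Takes no
 * locks, so it should be usable even from NMI/panic context. */
struct my_item *cursor_next(struct my_item *cur, struct list_head *head)
{
        struct list_head *n = rcu_dereference(cur->link.next);

        if (n == head)
                return NULL;    /* reached the end of the list */
        return list_entry(n, struct my_item, link);
}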

> You should also be clear on why panic() and oops need to poke the list
> (hint - if you're in panic(), you're there because the kernel is already
> convinced things are too screwed up to continue, so why should you
> trust your list enough to walk across it during panic()?)

Obviously I can't trust anything 100%.

However, the feature I'm working on is analogous to pstore - its job
is to try to get information out in the event of a failure.

If pstore were a little bit more mature, I'd be using/extending it.
It's not, particularly not in the versions available in RHEL 6.1 and
earlier, which is the environment I'm living in. *sigh* 

> > 1) Use an rculist, and hope that it really does cover this
> > situation.
> 
> That's probably safe/sufficient, as long as the read semantics of
> an RCU-protected region are acceptable. In particular, check the
> behavior of when an update happens, compared to when it becomes
> visible to readers.

I'm still trying to really wrap my mind around what rculist actually
does. I've worked on the Itanium (IA64) with its cache incoherency and
intense use of speculative fetches etc., so the principles seem to
make sense. But I find I'm not entirely confident in my understanding.

Some of that is probably because it's doing things I don't care about, 
in order to handle deletion. (I don't need any of the _copy_
semantics, as far as I can see. I just need barriers in the right
places. Unless of course I ever decide to support deletion, whereupon
leaving deleted elements available for an appropriate time period
becomes important.)

Maybe I'm just dense. I'm certainly feeling that way at the moment.

--
Arlie

(Arlie Stephens					arlie at worldash.org)


* NMIs, Locks, linked lists, barriers, and rculists, oh my!
  2013-04-04  2:08 NMIs, Locks, linked lists, barriers, and rculists, oh my! Arlie Stephens
  2013-04-04  4:21 ` Valdis.Kletnieks at vt.edu
@ 2013-04-04 17:10 ` michi1 at michaelblizek.twilightparadox.com
  2013-04-04 21:03   ` Arlie Stephens
  1 sibling, 1 reply; 6+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2013-04-04 17:10 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 19:08 Wed 03 Apr     , Arlie Stephens wrote:
...
> Put another way, my problem is that I want the readers to be able to
> do their thing regardless of the state of the lock that serializes
> writers. 
> 
> In all cases, writers are protected from each other with a mutex. 
> 
> Here are some choices for making sure readers see something sensible:
> 
> 1) Use an rculist, and hope that it really does cover this
> situation. 

rculist might cover this case. The question is whether you can acquire read
locks inside panic(), NMI, ... However, you should be able to skip the read
lock if you never free the contents.

> 2) Be thankful I'm only running on x86,

Is there any good reason for limiting your code to x86? "My machine is x86 and
I do not care about yours" will probably not convince maintainers.

> hope that the linux kernel isn't built with aggressive optimizations that
> reorder writes to pointers,

There is no guarantee for that.

> fix __list_add() so it updates pointers in the new entry before updating
> pointers in its neighbours,

IMHO, this is like asking for trouble. Somebody else might change the order
in the future without knowing it might break your stuff. Also, if you have bad
luck, somebody else might already depend on the current order without you
knowing.

> and do the reads with no synchronization whatsoever.

You should at least do atomic_read()+atomic_set(). RCU is exactly that plus
a mechanism to free memory after all readers have finished.

> 3) Add my own explicit use of compiler directives, either to a copy 
> of list.h or to a simpler, home-grown singly linked tail queue 
> implementation. Because I'm on x86 all I really care about is compiler 
> optimizations, but it looks like the linux way to do this may be
> with barrier macros. 

You can do that. If you do that, I recommend using atomic operations. Do not
hope that the compiler does the right thing. It almost certainly does not
(atomic operations need the "lock" prefix in assembly and the compiler will
not add it).

> 4) Give up on always getting this to work from an NMI. Use something
> like a reader-writer spinlock, but if I'm in an NMI, make only a
> conditional attempt to acquire it - and don't try to walk the list if
> that acquisition fails (presumed self-deadlock)
> 
> As always, my questions are:
> 
> - Will these even work? In particular, do I need to deal with compiler
> optimizations on x86? And is using RCU semantics sufficient to deal
> with interrupting the writing context?
> 
> - Which of these make sense from the POV of someone who's been in the
> linux kernel for a while? 

Not sure if I qualify (I do not have any code upstream), but I would do
either 1 or 3. Option 2 is broken and 4 looks more like a last resort.

> As a generic kernel person who's spent a lot of time in industry, my
> instincts are to avoid roll-your-own unless no existing primitive can
> do what I want. I also have a particular dislike of home-grown
> concurrency control; even if the original author gets it right, it's
> tricky enough that some maintainer is almost certain to break it. 

In this case you have to deal with concurrency either way. Just do it in
a way which is not tricky or subtle.

	-Michi
-- 
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com


* NMIs, Locks, linked lists, barriers, and rculists, oh my!
  2013-04-04 17:10 ` michi1 at michaelblizek.twilightparadox.com
@ 2013-04-04 21:03   ` Arlie Stephens
  2013-04-05  5:28     ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 1 reply; 6+ messages in thread
From: Arlie Stephens @ 2013-04-04 21:03 UTC (permalink / raw)
  To: kernelnewbies

Hi Michi,

These all look like excellent points. Thank you for responding. 

On Apr 04 2013, michi1 at michaelblizek.twilightparadox.com wrote:
> On 19:08 Wed 03 Apr     , Arlie Stephens wrote:
> ...
> > Put another way, my problem is that I want the readers to be able to
> > do their thing regardless of the state of the lock that serializes
> > writers. 
> > 
> > In all cases, writers are protected from each other with a mutex. 
> > 
> > Here are some choices for making sure readers see something sensible:
> > 
> > 1) Use an rculist, and hope that it really does cover this
> > situation. 
> 
> rculist might cover this case. The question is whether you can acquire read
> locks inside panic(), NMI, ... However, you should be able to skip the read
> lock if you never free the contents.

If I understand correctly, if you want to deal with the NMI case, you
are even more restricted than if you simply want to be called in
interrupt context. The protection from interrupts that comes with
spinlocks doesn't help you in the case of an NMI. So all you can use are
trylock primitives - get the lock if available, otherwise fail. 
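
(Roughly this shape, with a hypothetical list_lock guarding the walk;
in_nmi() comes from <linux/hardirq.h>:)

if (in_nmi()) {
        if (!spin_trylock(&list_lock))
                return;         /* presumed self-deadlock: give up */
} else {
        spin_lock(&list_lock);
}
/* ... walk the list ... */
spin_unlock(&list_lock);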

Fortunately I believe the rcu system doesn't need any locks on the
reader side. (If I'm wrong about that, then I'd better stop reading
the docs and start reading the code :-) I haven't actually looked at
code any lower than rculist.h.)
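
(In other words, I'm hoping the reader can be as simple as this -
made-up list and field names:)

struct my_item *item;

rcu_read_lock();        /* doesn't spin - just marks the read section */
list_for_each_entry_rcu(item, &my_list, link) {
        /* item is safe to read here, even racing with an add */
}
rcu_read_unlock();
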
> 
> > 2) Be thankful I'm only running on x86,
> 
> Is there any good reason for limiting your code to x86? "My machine is x86 and
> I do not care about yours" will probably not convince maintainers.

That's part of my working context. I'm writing code for a family of
network appliances running Scientific Linux 6.1 and earlier
versions. They are all x86. And the code I'm attempting to enhance
addresses a problem that's been addressed upstream with a rather
different design. 

If we didn't already have working code, I'd coax the upstream
patches into applying to 2.6.32 and earlier kernels (possibly painful),
add support for our hardware, fix whichever of its issues bit us, 
add the additional features we need, and then try to upstream the
lot - preferably in small patches. 

And if the upstream code handled NMIs well I'd copy its design. 

> > hope that the linux kernel isn't built with aggressive optimizations that
> > reorder writes to pointers,
> 
> There is no guarantee for that.
> 
> > fix __list_add() so it updates pointers in the new entry before updating
> > pointers in its neighbours,
> 
> IMHO, this is like asking for trouble. Somebody else might change the order
> in the future without knowing it might break your stuff. Also, if you have bad
> luck, somebody else might already depend on the current order without you
> knowing.

It's notable that __list_add_rcu(), from rculist.h, has the order the
way I want it, even in our elderly kernel version. 
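
(For reference, the body looks roughly like this in our tree - quoting
from memory, so check your own rculist.h:)

static inline void __list_add_rcu(struct list_head *new,
                struct list_head *prev, struct list_head *next)
{
        new->next = next;
        new->prev = prev;
        rcu_assign_pointer(prev->next, new);
        next->prev = new;
}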

> You should at least do atomic_read()+atomic_set(). RCU is exactly that plus
> a mechanism to free memory after all readers have finished.

Now I'm confused. It seems to me that part of RCU's contribution is
use of barrier instructions (compiler and run-time) at appropriate
points. Are you saying that atomic_read() and atomic_set() have
implicit barriers? 

--
Arlie


* NMIs, Locks, linked lists, barriers, and rculists, oh my!
  2013-04-04 21:03   ` Arlie Stephens
@ 2013-04-05  5:28     ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 0 replies; 6+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2013-04-05  5:28 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 14:03 Thu 04 Apr     , Arlie Stephens wrote:
...
> On Apr 04 2013, michi1 at michaelblizek.twilightparadox.com wrote:
...
> > On 19:08 Wed 03 Apr     , Arlie Stephens wrote:
...
> > > fix __list_add() so it updates pointers in the new entry before updating
> > > pointers in its neighbours,
> > 
> > IMHO, this is like asking for trouble. Somebody else might change the order
> > in the future without knowing it might break your stuff. Also, if you have bad
> > luck, somebody else might already depend on the current order without you
> > knowing.
> 
> It's notable that __list_add_rcu(), from rculist.h, has the order the
> way I want it, even in our elderly kernel version. 
> 
> > You should at least do atomic_read()+atomic_set(). RCU is exactly that plus
> > a mechanism to free memory after all readers have finished.
> 
> Now I'm confused. It seems to me that part of RCU's contribution is
> use of barrier instructions (compiler and run-time) at appropriate
> points. Are you saying that atomic_read() and atomic_set() have
> implicit barriers? 

You are right, it seems like atomic ops do not imply barriers:
Documentation/atomic_ops.txt

	-Michi
-- 
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com

