netdev.vger.kernel.org archive mirror
From: Thomas Graf <tgraf@suug.ch>
To: jamal <hadi@cyberus.ca>
Cc: Patrick McHardy <kaber@trash.net>,
	Stephen Hemminger <shemminger@osdl.org>,
	netdev@oss.sgi.com, Werner Almesberger <werner@almesberger.net>
Subject: Re: [RFC] batched tc to improve change throughput
Date: Wed, 26 Jan 2005 15:35:45 +0100
Message-ID: <20050126143545.GK31837@postel.suug.ch>
In-Reply-To: <1106747313.1107.7.camel@jzny.localdomain>

* jamal <1106747313.1107.7.camel@jzny.localdomain> 2005-01-26 08:48
> On Mon, 2005-01-24 at 10:06, Thomas Graf wrote:
> 
> > I'm not talking of the nlmsg_seq but rather a sequence number with
> > global or nl_family scope. It gets increased whenever a netlink
> > message of that family is processed and is returned with the ack. If
> > a userspace application wants to enforce atomicity between two requests
> > which cannot be batched because an answer is expected in between, then
> > it could provide the expected sequence number and the request is only
> > fulfilled if it matches the current one. Example:
> > 
> > --> RTM_NEWLINK
> > <-- answer
> > <-- ACK (seq = 222)
> > --> RTM_SETLINK (expect = 222)
> > <-- ACK
> > 
> > Now if another netlink app interferes:
> > 
> > --> RTM_NEWLINK
> > <-- answer
> > <-- ACK (seq = 222)
> > 
> > -- other app --
> > --> RTM_SETLINK
> > <-- ACK (seq = 223)
> > 
> > -- back to first app --
> > --> RTM_SETLINK (expect = 222)
> > <-- ERROR
> > 
> > The application can then retry its operation a few times and
> > finally give up.  The main problem I see is to extend nlmsghdr
> > in a way that stays compatible.
> 
> The best thing you could get out of this is a warning that something
> changed under you, i.e. it doesn't really solve the synchronization issue.

Why? If we do the check while holding the rtnl semaphore, we can guarantee
atomicity. The comparison of the expected seq and the current seq must
be done before any action and within the rtnl semaphore. It is very
unlikely that someone interferes, so strict locking is pretty inefficient.

rtnl_send_atomic(msg, expect_seq)
	retries := 10;
retry:
	res := send_msg(msg, expect_seq);
	if res = -ERETRY and --retries then
		goto retry;
	endif

	if retries = 0 then
		err "Timeout while trying to achieve atomic operation"
	endif
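
For illustration, a minimal Python sketch of the userspace retry loop
above. The names send_atomic, send_msg, refresh_seq and the ERETRY value
are hypothetical stand-ins, not an existing API. One assumption goes
beyond the pseudocode: after a rejection the caller re-reads the current
sequence number before trying again, since resending a stale expected
value would presumably be rejected again.

```python
ERETRY = -1  # hypothetical error code, mirrors -ERETRY in the pseudocode


def send_atomic(send_msg, refresh_seq, msg, retries=10):
    """Try send_msg(msg, expect_seq) at most `retries` times.

    send_msg and refresh_seq are caller-supplied stand-ins: send_msg
    returns 0 on success or ERETRY when the family sequence number has
    moved on; refresh_seq redoes the preceding query and returns the
    sequence number from its ACK.
    """
    expect_seq = refresh_seq()
    for _ in range(retries):
        res = send_msg(msg, expect_seq)
        if res != ERETRY:
            return res  # success, or an unrelated error passed through
        expect_seq = refresh_seq()  # someone interfered: re-sync first
    raise TimeoutError("Timeout while trying to achieve atomic operation")
```

Any result other than ERETRY is returned unchanged, so only the
"state changed under you" case is retried.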

and in the kernel:

rtnl_lock();
if expect_seq != seq then
	rtnl_unlock();
	return -ERETRY;
endif

... atomic action can take place here ...

seq := seq + 1;
rtnl_unlock();

Of course this only works if the netlink requests themselves are
synchronized in the relevant netlink family.
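
A toy Python model of the kernel-side check, replaying the RTM_NEWLINK /
RTM_SETLINK example from earlier in the thread. The Family class is
hypothetical and a threading.Lock stands in for the rtnl semaphore. It
assumes a rejected request leaves the sequence number untouched, which
the proposal does not specify.

```python
import threading

ERETRY = -1  # hypothetical error code


class Family:
    """Toy model of one netlink family guarded by a single lock."""

    def __init__(self, seq=0):
        self.lock = threading.Lock()  # stands in for the rtnl semaphore
        self.seq = seq                # bumped for every processed message

    def process(self, action, expect_seq=None):
        """Run action() under the lock; return (result, seq for the ACK)."""
        with self.lock:
            if expect_seq is not None and expect_seq != self.seq:
                return ERETRY, self.seq  # state changed: refuse, do nothing
            action()
            self.seq += 1
            return 0, self.seq


fam = Family(seq=221)
_, seq = fam.process(lambda: None)        # first app: RTM_NEWLINK, ACK seq=222
fam.process(lambda: None)                 # other app: RTM_SETLINK, ACK seq=223
res, _ = fam.process(lambda: None, seq)   # first app: expect=222, rejected
```

Because the comparison and the action share one critical section, no
other request can slip in between them.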

> [And a lot more complexity is introduced - if you say you want to change
> the netlink header and maintain state in the kernel].

This is the big problem: there is no padding gap common to all rtnl users.

What we can do is to set a flag in nlmsghdr stating that a u32 block of
data follows the nlmsg header before the netlink user specific header,
i.e.

 +---------------------------------+
 | nlmsghdr flags |= NLM_F_EXP_SEQ |
 +---------------------------------+
 | expected_seq (u32)              |
 +---------------------------------+
 | netlink user specific data      |
 +---------------------------------+
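
A rough Python sketch of how such a message could be built and parsed.
The 16-byte header layout follows struct nlmsghdr; NLM_F_EXP_SEQ is only
proposed here, so its value below is an arbitrary placeholder, and the
pack/unpack helpers are hypothetical.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")  # len, type, flags, seq, pid
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04
NLM_F_EXP_SEQ = 0x80  # placeholder value; the flag does not exist


def pack(msg_type, seq, payload, expected_seq=None, pid=0):
    """Build a message; insert the u32 only when expected_seq is given."""
    flags = NLM_F_REQUEST | NLM_F_ACK
    extra = b""
    if expected_seq is not None:
        flags |= NLM_F_EXP_SEQ
        extra = struct.pack("=I", expected_seq)
    length = NLMSG_HDR.size + len(extra) + len(payload)
    return NLMSG_HDR.pack(length, msg_type, flags, seq, pid) + extra + payload


def unpack(buf):
    """Return (type, expected_seq or None, user specific data)."""
    length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(buf, 0)
    offset = NLMSG_HDR.size
    expected_seq = None
    if flags & NLM_F_EXP_SEQ:
        (expected_seq,) = struct.unpack_from("=I", buf, offset)
        offset += 4
    return msg_type, expected_seq, buf[offset:length]
```

Messages without the flag keep today's layout, which is the point of
signalling the extra block through the flags field.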

I'd even go one step further and define a header options chain like in
IPv6 so we can add more header attributes later on, like:

 +--------------------------------+
 | nlmsghdr flags |= NLM_F_OPTS   |
 +--------------------------------+
 | size=4, type=expt_seq, next=0  |
 +- - - - - - - -  - - - - - - - -+
 | expected sequence              |
 +--------------------------------+
 | netlink user specific data     |
 +--------------------------------+
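
A hedged Python sketch of walking such an options chain. The field
widths (u16 size, u8 type, u8 next), the option type number and the
helper names are all assumptions; the proposal above fixes none of them.
Here next=0 ends the chain and any other value means another option
follows. Netlink's usual 4-byte alignment is ignored for brevity.

```python
import struct

OPT_HDR = struct.Struct("=HBB")  # size, type, next: widths are assumed
OPT_EXPECT_SEQ = 1               # hypothetical option type number


def pack_opts(opts):
    """opts: non-empty list of (type, value bytes). Returns the chain."""
    out = b""
    for i, (opt_type, value) in enumerate(opts):
        more = 1 if i + 1 < len(opts) else 0  # next=0 terminates the chain
        out += OPT_HDR.pack(len(value), opt_type, more) + value
    return out


def parse_opts(buf):
    """Walk the chain; return ({type: value}, offset of the user data)."""
    found, offset = {}, 0
    while True:
        size, opt_type, more = OPT_HDR.unpack_from(buf, offset)
        offset += OPT_HDR.size
        found[opt_type] = buf[offset:offset + size]
        offset += size
        if not more:
            return found, offset
```

Since every option carries its own size, a receiver can step over
option types it does not know, which is what makes later additions
possible.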

Thoughts?

> Your call really - you are the one who is going to maintain it;->
> As for ease of use and sparing users from knowing details of how
> tlvs are put together etc - I think it doesn't matter how that's done
> underneath the hood; it is still doable on top of current libnetlink. In
> other words what's required, IMO, is something that hides netlink totally
> so that the programmer/user doesn't even get to see TLVs.

Agreed, I even hide the structs exported to userspace to avoid breakage,
i.e. I don't export tc_stats directly, for example.
