Netdev List
* Re: [PATCH net-2.6][NEIGH] Updating affected neighbours when about MAC address change
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2007-12-23 13:13 UTC
  To: dshwatrz; +Cc: davem, netdev, yoshfuji
In-Reply-To: <31436f4a0712230504x7b0b4f38i8f69a7a825bcc14b@mail.gmail.com>

In article <31436f4a0712230504x7b0b4f38i8f69a7a825bcc14b@mail.gmail.com> (at Sun, 23 Dec 2007 15:04:37 +0200), "David Shwatrz" <dshwatrz@gmail.com> says:

> Hello,
> 
> 
> >You should iterate all of ifa_list (for IPv4) / addr_list (for IPv6).
> > For IPv6, we also have anycast (maintained by ac_list) as well.
> 
> I am not sure that we need to iterate all of ifa_list in IPv4.
> The reason is that we end with arp_send, and it initiates a broadcast.
> So all neighbours will receive it and update their arp tables
> accordingly.
> The dest hw in the arp_send is NULL according to this patch; this means that
> we will assign dev->broadcast to dest_hw in arp_create().
> 
> It seems to me there's no reason to send more than one broadcast.

Urgh? What happens if you have multiple IPv4 addresses on the device?


> In IPv6, I need to check, since it is multicast.

Please read RFC2461 Section 7.2.6.  In short, we should send a few
unsolicited NAs, but I think you can start by sending one per
address.
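
To illustrate the point about multiple addresses: each announcement
names exactly one address, so a device with several addresses needs
one announcement per address. Below is a minimal user-space sketch,
not kernel code. `struct addr_entry` is a hypothetical stand-in for
the device's address list, and `announce_all()` and
`record_announce()` are made-up helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device IPv4 address list (ifa_list). */
struct addr_entry {
	uint32_t addr;                  /* IPv4 address (opaque value here) */
	const struct addr_entry *next;  /* next address on the same device */
};

/* Callback that sends one announcement for one address. */
typedef void (*announce_fn)(uint32_t addr, void *ctx);

/* Example callback: only records which addresses were announced. */
struct announce_log {
	uint32_t addrs[8];
	size_t count;
};

static void record_announce(uint32_t addr, void *ctx)
{
	struct announce_log *log = ctx;

	if (log->count < sizeof(log->addrs) / sizeof(log->addrs[0]))
		log->addrs[log->count++] = addr;
}

/* Walk the whole list and announce every address once; returns the count. */
static size_t announce_all(const struct addr_entry *head,
			   announce_fn fn, void *ctx)
{
	const struct addr_entry *e;
	size_t sent = 0;

	assert(fn != NULL);
	for (e = head; e != NULL; e = e->next) {
		fn(e->addr, ctx);
		sent++;
	}
	return sent;
}
```

In a real implementation the callback would emit the ARP or NA
message; here it only records the address so the loop can be checked
in isolation.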

--yoshfuji


* Re: [PATCH net-2.6][NEIGH] Updating affected neighbours when about MAC address change
From: jamal @ 2007-12-23 13:17 UTC
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: dshwatrz, davem, netdev, kaber
In-Reply-To: <20071223.220447.56355925.yoshfuji@linux-ipv6.org>

On Sun, 2007-23-12 at 22:04 +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:

> If the secondary MACs are used with ARP/NDP, we should take care of
> that, but I think we use the primary MAC for ARP/NDP, no?
> (In other words, we always use primary MAC for ARP reply / NA, no?)

I think it may be a policy decision.
In the IPv4 case, where the system owns the IP addresses, the
classical scenario is a single MAC address per ethX, and we
always respond with the MAC address of ethX regardless of where the ARP
request was received from. I think it is different in the case of IPv6,
where the eth device owns the IP address, no? I.e., is that where you are
drawing the concept of a primary MAC from?
The case of multiple MACs per interface requires further policy
resolution IMO. It would be nice to be able to tell the kernel which MAC
to use when responding for each IP address it owns. This could then be
easily mapped to the routing table's source address selection.
Patrick?

cheers,
jamal





* Re: [PATCH net-2.6][NEIGH] Updating affected neighbours when about MAC address change
From: David Shwatrz @ 2007-12-23 13:21 UTC
  To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: davem, netdev
In-Reply-To: <20071223.221355.40687949.yoshfuji@linux-ipv6.org>

Yoshfuji,

Thanks, you are right! I will send the patches soon.

Thanks!
DS


On Dec 23, 2007 3:13 PM, YOSHIFUJI Hideaki / 吉藤英明
<yoshfuji@linux-ipv6.org> wrote:
> In article <31436f4a0712230504x7b0b4f38i8f69a7a825bcc14b@mail.gmail.com> (at Sun, 23 Dec 2007 15:04:37 +0200), "David Shwatrz" <dshwatrz@gmail.com> says:
>
> > Hello,
> >
> >
> > >You should iterate all of ifa_list (for IPv4) / addr_list (for IPv6).
> > > For IPv6, we also have anycast (maintained by ac_list) as well.
> >
> > I am not sure that we need to iterate all of ifa_list in IPv4.
> > The reason is that we end with arp_send, and it initiates a broadcast.
> > So all neighbours will receive it and update their arp tables
> > accordingly.
> > The dest hw in the arp_send is NULL according to this patch; this means that
> > we will assign dev->broadcast to dest_hw in arp_create().
> >
> > It seems to me there's no reason to send more than one broadcast.
>
> Urgh? What happens if you have multiple IPv4 addresses on the device?
>
>
> > In IPv6, I need to check, since it is multicast.
>
> Please read RFC2461 Section 7.2.6.  In short, we should send a few
> unsolicited NAs, but I think you can start by sending one per
> address.
>
> --yoshfuji
>


* [PATCH net-2.6] [NEIGH] [resend] Updating affected neighbours when about MAC address change in arp_netdev_event()
From: David Shwatrz @ 2007-12-23 13:51 UTC
  To: davem, netdev

[-- Attachment #1: Type: text/plain, Size: 1331 bytes --]

Hello,

This is a resend of the patch, following Yoshifuji's comment. I will
send a patch for IPv6 later.

We know that changes in MAC addresses are not frequent, but
we are working on a special, highly advanced Linux-based networking project
in our labs, where we do change the MAC address of an interface quite frequently.
(We do not go totally wild; the MAC addresses we change to come
from a given set of MAC addresses.)

Normally, when we change the MAC address of an interface, the
relevant neighbours in the LAN that have entries with the previous MAC
address are not sent any update notification; instead, it is their
regular timer mechanisms that eventually update their ARP tables to the
new MAC address.

I have written a small patch against the 2.6 git net tree which, right
after the neigh_changeaddr() call in net/ipv4/arp.c, sends a gratuitous
ARP for each IPv4 address on the device, so that all the affected
neighbours learn the new MAC address.
The patch touches arp_netdev_event() only.
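
For readers unfamiliar with the term, the following user-space sketch
shows what the ARP payload of a gratuitous ARP request looks like: the
sender and target protocol addresses are the same IP. The struct layout
follows the Ethernet/IPv4 ARP format; `struct arp_eth_ipv4` and
`build_gratuitous_arp()` are hypothetical names for illustration, not
kernel API, and leaving the target hardware address as zeros is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ARP payload for Ethernet/IPv4 as it appears on the wire (28 bytes).
 * Byte arrays are used throughout so the layout has no padding and no
 * host byte-order dependence. */
struct arp_eth_ipv4 {
	uint8_t hrd[2];   /* hardware type: 1 = Ethernet */
	uint8_t pro[2];   /* protocol type: 0x0800 = IPv4 */
	uint8_t hln;      /* hardware address length: 6 */
	uint8_t pln;      /* protocol address length: 4 */
	uint8_t op[2];    /* opcode: 1 = request */
	uint8_t sha[6];   /* sender hardware address (the new MAC) */
	uint8_t spa[4];   /* sender protocol address */
	uint8_t tha[6];   /* target hardware address */
	uint8_t tpa[4];   /* target protocol address */
};

/* Hypothetical helper: fill in a gratuitous ARP request for one address. */
static void build_gratuitous_arp(struct arp_eth_ipv4 *p,
				 const uint8_t mac[6], const uint8_t ip[4])
{
	assert(p != NULL && mac != NULL && ip != NULL);
	memset(p, 0, sizeof(*p));  /* target hardware address stays zero */
	p->hrd[1] = 1;             /* 0x0001, big-endian */
	p->pro[0] = 0x08;          /* 0x0800, big-endian */
	p->hln = 6;
	p->pln = 4;
	p->op[1] = 1;              /* request */
	memcpy(p->sha, mac, 6);
	memcpy(p->spa, ip, 4);
	memcpy(p->tpa, ip, 4);     /* same IP as sender: "gratuitous" */
}
```

On the wire this payload would be carried in an Ethernet frame sent to
the broadcast address, which is what passing a NULL dest_hw to
arp_send() is described as doing earlier in this thread.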

This patch was tested and it works in my lab; if such a patch is
not needed, I wonder why.
It seems to me that it cannot cause any trouble.
BTW, I noticed that the irlan driver has such a mechanism:
it sends a gratuitous ARP to update all the
neighbours when a MAC address is changed.

Signed-off-by: David Shwartz <dshwatrz@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 967 bytes --]

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 08174a2..7b1162b 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1185,6 +1185,8 @@ out:
 
 static int arp_netdev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
+	struct in_device *in_dev;
+	struct in_ifaddr *ifa;
 	struct net_device *dev = ptr;
 
 	if (dev->nd_net != &init_net)
@@ -1194,6 +1196,22 @@ static int arp_netdev_event(struct notifier_block *this, unsigned long event, vo
 	case NETDEV_CHANGEADDR:
 		neigh_changeaddr(&arp_tbl, dev);
 		rt_cache_flush(0);
+		
+		/* Send gratuitous ARP to the neighbours to update their arp tables */
+	
+		rcu_read_lock();
+		in_dev = __in_dev_get_rcu(dev);
+		if (in_dev == NULL)
+			goto out;
+		for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next)
+			arp_send(ARPOP_REQUEST, ETH_P_ARP, 
+				ifa->ifa_address,
+				dev, 
+				ifa->ifa_address,
+				NULL, dev->dev_addr, NULL);
+out:
+	rcu_read_unlock();
+
 		break;
 	default:
 		break;


* Re: [PATCH net-2.6][NEIGH] Updating affected neighbours when about MAC address change
From: Herbert Xu @ 2007-12-23 14:02 UTC
  To: David Shwatrz; +Cc: yoshfuji, davem, netdev
In-Reply-To: <31436f4a0712230424n1cfaeb27o463e62093af41090@mail.gmail.com>

David Shwatrz <dshwatrz@gmail.com> wrote:
> 
> Hi,
> Oops, I am TWICE sorry! I mistakenly attached the wrong, empty file.
> Attached here is the patch.
> 
> Regarding your answer: I accept it and I will soon send a revised
> version of this patch (making changes to
> arp_netdev_event() and ndisc_netdev_event()).
> I had IPv4 in mind; there is no reason it should not also apply to IPv6.

Hmm, why can't you do this from user-space?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* ipv4_devconf.arp_accept mystery
From: Ian Brown @ 2007-12-23 16:00 UTC
  To: netdev

Hello,
I have a question regarding unsolicited ARPs in ipv4/arp.c

As I understand it, this feature is disabled by default.
How can one enable it?

ipv4_devconf.arp_accept is 0 by default.
This is with a 2.6.21 kernel.

I wanted ipv4_devconf.arp_accept to be set to 1.

I added:
net.ipv4.conf.default.arp_accept = 1
net.ipv4.conf.eth0.arp_accept = 1
in /etc/sysctl.conf
and rebooted.

I have:
cat /proc/sys/net/ipv4/conf/default/arp_accept
=> 1
cat /proc/sys/net/ipv4/conf/eth0/arp_accept
=>1

Yet the printk() I added shows that ipv4_devconf.arp_accept is still 0!


Could it be that this variable cannot be set? Or does it depend
on other things?

BTW, newer kernel versions use IPV4_DEVCONF_ALL(ARP_ACCEPT)
instead. So if anybody knows how to make this macro evaluate to 1 instead,
that will also be fine.
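
One possible explanation, offered as an assumption rather than a
confirmed answer: the global ipv4_devconf structure may correspond to
the conf/all sysctl directory, while conf/default and conf/eth0 are
stored separately. That would match the macro name
IPV4_DEVCONF_ALL. A quick way to compare the entries is to read all
three files. The sketch below is a user-space helper with hypothetical
function names (`arp_accept_path()`, `read_int_file()`); only the
/proc paths already quoted above, plus the "all" sibling, are used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path of the arp_accept knob for one conf entry
 * ("all", "default", or an interface name). Returns 0 on success,
 * -1 if the entry is empty or the buffer is too small. */
static int arp_accept_path(char *buf, size_t len, const char *entry)
{
	int n;

	assert(buf != NULL && entry != NULL);
	if (strlen(entry) == 0)
		return -1;
	n = snprintf(buf, len, "/proc/sys/net/ipv4/conf/%s/arp_accept", entry);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single integer from a file; returns -1 if it cannot be read. */
static int read_int_file(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Calling `arp_accept_path()` for "all", "default" and "eth0" and then
`read_int_file()` on each result shows whether the "all" entry is
still 0 while the other two are 1.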


Ian


* Re: [PATCH 0/2] netem: trace enhancement
From: Ariane Keller @ 2007-12-23 19:54 UTC
  To: Ben Greear
  Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev,
	Rainer Baumann
In-Reply-To: <4755D2EB.4000807@candelatech.com>

I have added the possibility to configure the number
of buffers used to store the trace data for packet delays.
The complete command to start netem with a trace file is:
tc qdisc add dev eth1 root netem trace path/to/trace/file.bin buf 3 loops 1 0
where:
buf: the number of buffers to be used
loops: how many times to loop through the trace file
The last argument is optional and specifies whether the default is to
drop packets or 0-delay them.

The patches are available at:
http://www.tcn.hypert.net/tcn_kernel_2_6_23_confbuf
http://www.tcn.hypert.net/tcn_iproute2_2_6_23_confbuf

I'm looking forward to your comments!
Thanks!
Ariane


Ben Greear wrote:
> Ariane Keller wrote:
> 
>> Yes, for short-term starvation it helps certainly.
>> But I'm still not convinced that it is really necessary to add more 
>> buffers, because I'm not sure whether the bottleneck is really the 
>> loading of data from user space to kernel space.
>> Some basic tests have shown that the kernel starts losing packets at 
>> approximately the same packet rate regardless whether we use netem, or 
>> netem with the trace extension.
>> But if you have contrary experience I'm happy to add a parameter which 
>> defines the number of buffers.
> 
> I have no numbers, so if you think it works, then that is fine with me.
> 
> If you actually run out of the trace buffers, do you just continue to
> run with the last settings?  If so, that would keep up throughput
> even if you are out of trace buffers...
> 
> What rates do you see, btw?  (pps, bps).
> 
> Thanks,
> Ben
> 

-- 
Ariane Keller
Communication Systems Research Group, ETH Zurich
Web: http://www.csg.ethz.ch/people/arkeller
Office: ETZ G 60.1, Gloriastrasse 35, 8092 Zurich


* Re: [PATCH 0/2] netem: trace enhancement: kernel
From: Ariane Keller @ 2007-12-23 19:54 UTC
  To: Ben Greear
  Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev,
	Rainer Baumann
In-Reply-To: <4755D2EB.4000807@candelatech.com>

This patch applies to kernel 2.6.23.
It enhances the network emulator netem with the possibility
to read all delay/drop/duplicate etc. values from a trace file.
This trace file contains one value for each packet to be processed.
The values are read from the file by a user space process called
flowseed. These values are sent to the netem module via
rtnetlink sockets.
In the netem module the values are "cached" in buffers.
The number of buffers is configurable when netem is started.
When a buffer is empty, the netem module sends an rtnetlink notification
to the flowseed process.
Upon receiving such a notification, this process sends
the next 1000 values to the netem module.
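
As a reading aid, here is a small user-space sketch of how one 32-bit
trace value splits into an action and a delay, using the MASK_*
constants, the tcn_action enum and the DATA_PACKAGE size from the patch
below. `trace_encode()` is a hypothetical helper (the trace generator
is not part of this mail), and the 64-bit intermediate in
`trace_delay()` is a choice made for this sketch; the patch itself uses
plain integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_BITS    29
#define MASK_DELAY   ((UINT32_C(1) << MASK_BITS) - 1)  /* low 29 bits */
#define MASK_HEAD    (~MASK_DELAY)                     /* high 3 bits */
#define DATA_PACKAGE 4000  /* bytes per buffer, as in flowseed.h */

enum tcn_action { FLOW_NORMAL, FLOW_DROP, FLOW_DUP, FLOW_MANGLE };

/* Hypothetical encoder: pack an action and a delay into one value. */
static uint32_t trace_encode(enum tcn_action action, uint32_t delay)
{
	assert(delay <= MASK_DELAY);
	return ((uint32_t)action << MASK_BITS) | (delay & MASK_DELAY);
}

/* Extract the action from the upper bits, as get_next_delay() does. */
static enum tcn_action trace_action(uint32_t value)
{
	return (enum tcn_action)((value & MASK_HEAD) >> MASK_BITS);
}

/* Extract the delay and scale it by ticks / 1000, as in the patch.
 * The 64-bit intermediate avoids overflow in the multiplication. */
static uint32_t trace_delay(uint32_t value, uint32_t ticks)
{
	return (uint32_t)(((uint64_t)(value & MASK_DELAY) * ticks) / 1000);
}
```

With 4-byte values, one DATA_PACKAGE buffer of 4000 bytes holds exactly
the 1000 values mentioned above.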

Signed-off-by: Ariane Keller <ariane.keller@tik.ee.ethz.ch>

---
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/linux/pkt_sched.h linux-2.6.23.8_mod/include/linux/pkt_sched.h
--- linux-2.6.23.8/include/linux/pkt_sched.h	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/include/linux/pkt_sched.h	2007-12-21 19:42:49.000000000 +0100
@@ -439,6 +439,9 @@ enum
  	TCA_NETEM_DELAY_DIST,
  	TCA_NETEM_REORDER,
  	TCA_NETEM_CORRUPT,
+	TCA_NETEM_TRACE,
+	TCA_NETEM_TRACE_DATA,
+	TCA_NETEM_STATS,
  	__TCA_NETEM_MAX,
  };

@@ -454,6 +457,26 @@ struct tc_netem_qopt
  	__u32	jitter;		/* random jitter in latency (us) */
  };

+struct tc_netem_stats
+{
+	int packetcount;
+	int packetok;
+	int normaldelay;
+	int drops;
+	int dupl;
+	int corrupt;
+	int novaliddata;
+	int reloadbuffer;
+};
+
+struct tc_netem_trace
+{
+	__u32   fid;             /*flowid */
+	__u32   def;          	 /* default action 0 = no delay, 1 = drop*/
+	__u32   ticks;	         /* number of ticks corresponding to 1ms */
+	__u32   nr_bufs;	 /* number of buffers to save trace data*/
+};
+
  struct tc_netem_corr
  {
  	__u32	delay_corr;	/* delay correlation */
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/flowseed.h linux-2.6.23.8_mod/include/net/flowseed.h
--- linux-2.6.23.8/include/net/flowseed.h	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.23.8_mod/include/net/flowseed.h	2007-12-21 19:43:24.000000000 +0100
@@ -0,0 +1,34 @@
+/* flowseed.h     header file for the netem trace enhancement
+ */
+
+#ifndef _FLOWSEED_H
+#define _FLOWSEED_H
+#include <net/sch_generic.h>
+
+/* must be divisible by 4 (=#pkts)*/
+#define DATA_PACKAGE 4000
+#define DATA_PACKAGE_ID 4008
+
+/* struct per flow - kernel */
+struct tcn_control
+{
+	struct list_head full_buffer_list;
+	struct list_head empty_buffer_list;
+	struct buflist * buffer_in_use;		
+	int *offsetpos;       /* pointer to actual pos in the buffer in use */
+	int flowid;
+};
+
+struct tcn_statistic
+{
+	int packetcount;
+	int packetok;
+	int normaldelay;
+	int drops;
+	int dupl;
+	int corrupt;
+	int novaliddata;
+	int reloadbuffer;
+};
+
+#endif
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/pkt_sched.h linux-2.6.23.8_mod/include/net/pkt_sched.h
--- linux-2.6.23.8/include/net/pkt_sched.h	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/include/net/pkt_sched.h	2007-12-21 19:42:49.000000000 +0100
@@ -72,6 +72,9 @@ extern void qdisc_watchdog_cancel(struct
  extern struct Qdisc_ops pfifo_qdisc_ops;
  extern struct Qdisc_ops bfifo_qdisc_ops;

+extern int qdisc_notify_pid(int pid, struct nlmsghdr *n, u32 clid,
+			struct Qdisc *old, struct Qdisc *new);
+
  extern int register_qdisc(struct Qdisc_ops *qops);
  extern int unregister_qdisc(struct Qdisc_ops *qops);
  extern struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/core/rtnetlink.c linux-2.6.23.8_mod/net/core/rtnetlink.c
--- linux-2.6.23.8/net/core/rtnetlink.c	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/net/core/rtnetlink.c	2007-12-21 19:42:49.000000000 +0100
@@ -460,7 +460,7 @@ int rtnetlink_send(struct sk_buff *skb,
  	NETLINK_CB(skb).dst_group = group;
  	if (echo)
  		atomic_inc(&skb->users);
-	netlink_broadcast(rtnl, skb, pid, group, GFP_KERNEL);
+	netlink_broadcast(rtnl, skb, pid, group, gfp_any());
  	if (echo)
  		err = netlink_unicast(rtnl, skb, pid, MSG_DONTWAIT);
  	return err;
diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/sched/sch_api.c linux-2.6.23.8_mod/net/sched/sch_api.c
--- linux-2.6.23.8/net/sched/sch_api.c	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/net/sched/sch_api.c	2007-12-21 19:42:49.000000000 +0100
@@ -28,6 +28,7 @@
  #include <linux/list.h>
  #include <linux/hrtimer.h>

+#include <net/sock.h>
  #include <net/netlink.h>
  #include <net/pkt_sched.h>

@@ -841,6 +842,62 @@ rtattr_failure:
  	nlmsg_trim(skb, b);
  	return -1;
  }
+static int tc_fill(struct sk_buff *skb, struct Qdisc *q, u32 clid,
+			 u32 pid, u32 seq, u16 flags, int event)
+{
+	struct tcmsg *tcm;
+	struct nlmsghdr  *nlh;
+	unsigned char *b = skb_tail_pointer(skb);
+
+	nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*tcm), flags);
+	tcm = NLMSG_DATA(nlh);
+	tcm->tcm_family = AF_UNSPEC;
+	tcm->tcm__pad1 = 0;
+	tcm->tcm__pad2 = 0;
+	tcm->tcm_ifindex = q->dev->ifindex;
+	tcm->tcm_parent = clid;
+	tcm->tcm_handle = q->handle;
+	tcm->tcm_info = atomic_read(&q->refcnt);
+	RTA_PUT(skb, TCA_KIND, IFNAMSIZ, q->ops->id);
+	if (q->ops->dump && q->ops->dump(q, skb) < 0)
+		goto rtattr_failure;
+
+	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
+
+	return skb->len;
+
+nlmsg_failure:
+rtattr_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+int qdisc_notify_pid(int pid, struct nlmsghdr *n,
+			u32 clid, struct Qdisc *old, struct Qdisc *new)
+{
+	struct sk_buff *skb;
+	skb = alloc_skb(NLMSG_GOODSIZE, gfp_any());
+	if (!skb)
+		return -ENOBUFS;
+
+	if (old && old->handle) {
+		if (tc_fill(skb, old, clid, pid, n->nlmsg_seq,
+				0, RTM_DELQDISC) < 0)
+			goto err_out;
+	}
+	if (new) {
+		if (tc_fill(skb, new, clid, pid, n->nlmsg_seq,
+				old ? NLM_F_REPLACE : 0, RTM_NEWQDISC) < 0)
+			goto err_out;
+	}
+	if (skb->len)
+		return rtnetlink_send(skb, pid, RTNLGRP_TC, n->nlmsg_flags);
+
+err_out:
+	kfree_skb(skb);
+	return -EINVAL;
+}
+EXPORT_SYMBOL(qdisc_notify_pid);

  static int qdisc_notify(struct sk_buff *oskb, struct nlmsghdr *n,
  			u32 clid, struct Qdisc *old, struct Qdisc *new)
@@ -848,7 +905,7 @@ static int qdisc_notify(struct sk_buff *
  	struct sk_buff *skb;
  	u32 pid = oskb ? NETLINK_CB(oskb).pid : 0;

-	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	skb = alloc_skb(NLMSG_GOODSIZE, gfp_any());
  	if (!skb)
  		return -ENOBUFS;

diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/sched/sch_netem.c linux-2.6.23.8_mod/net/sched/sch_netem.c
--- linux-2.6.23.8/net/sched/sch_netem.c	2007-11-16 19:14:27.000000000 +0100
+++ linux-2.6.23.8_mod/net/sched/sch_netem.c	2007-12-21 19:42:49.000000000 +0100
@@ -11,6 +11,8 @@
   *
   * Authors:	Stephen Hemminger <shemminger@osdl.org>
   *		Catalin(ux aka Dino) BOIE <catab at umbrella dot ro>
+ *              netem trace: Ariane Keller <arkeller at ee.ethz.ch> ETH Zurich
+ *                           Rainer Baumann <baumann at hypert.net> ETH Zurich
   */

  #include <linux/module.h>
@@ -19,11 +21,13 @@
  #include <linux/errno.h>
  #include <linux/skbuff.h>
  #include <linux/rtnetlink.h>
-
+#include <linux/list.h>
  #include <net/netlink.h>
  #include <net/pkt_sched.h>

-#define VERSION "1.2"
+#include "net/flowseed.h"
+
+#define VERSION "1.3"

  /*	Network Emulation Queuing algorithm.
  	====================================
@@ -49,6 +53,11 @@

  	 The simulator is limited by the Linux timer resolution
  	 and will create packet bursts on the HZ boundary (1ms).
+
+	 The trace option allows us to read the values for packet delay,
+	 duplication, loss and corruption from a tracefile. This permits
+	 the modulation of statistical properties such as long-range
+	 dependences. See http://tcn.hypert.net.
  */

  struct netem_sched_data {
@@ -65,7 +74,11 @@ struct netem_sched_data {
  	u32 duplicate;
  	u32 reorder;
  	u32 corrupt;
-
+	u32 trace;
+	u32 ticks;
+	u32 def;
+	u32 flowid;
+	u32 bufnr;
  	struct crndstate {
  		u32 last;
  		u32 rho;
@@ -75,13 +88,29 @@ struct netem_sched_data {
  		u32  size;
  		s16 table[0];
  	} *delay_dist;
+
+	struct tcn_statistic *statistic;
+	struct tcn_control *flowbuffer;
+};
+
+struct  buflist {
+	struct list_head list;
+	char *buf;
  };

+
  /* Time stamp put into socket buffer control block */
  struct netem_skb_cb {
  	psched_time_t	time_to_send;
  };

+
+#define MASK_BITS	29
+#define MASK_DELAY	((1<<MASK_BITS)-1)
+#define MASK_HEAD       ~MASK_DELAY
+
+enum tcn_action { FLOW_NORMAL, FLOW_DROP, FLOW_DUP, FLOW_MANGLE };
+
  /* init_crandom - initialize correlated random number generator
   * Use entropy source for initial seed.
   */
@@ -141,6 +170,72 @@ static psched_tdiff_t tabledist(psched_t
  	return  x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu;
  }

+/* don't call this function directly. It is called after
+ * a packet has been taken out of a buffer and it was the last.
+ */
+static int reload_flowbuffer(struct netem_sched_data *q, struct Qdisc *sch)
+{
+	struct tcn_control *flow = q->flowbuffer;
+	struct nlmsghdr n;
+	struct buflist *element = list_entry(flow->full_buffer_list.next,
+					     struct buflist, list);
+	/* the current buffer is empty */
+	list_add_tail(&flow->buffer_in_use->list, &flow->empty_buffer_list);
+
+	if (list_empty(&q->flowbuffer->full_buffer_list)) {
+		printk(KERN_ERR "netem: reload_flowbuffer, no full buffer\n");
+		return -EFAULT;
+	}
+
+	list_del_init(&element->list);
+	flow->buffer_in_use = element;
+	flow->offsetpos = (int *)element->buf;
+	memset(&n, 0, sizeof(struct nlmsghdr));
+	n.nlmsg_seq = 1;
+	n.nlmsg_flags = NLM_F_REQUEST;
+	if (qdisc_notify_pid(q->flowid, &n, sch->parent, NULL, sch) < 0)
+		printk(KERN_ERR "netem: unable to request for more data\n");
+
+	return 0;
+}
+
+/* return pktdelay with delay and drop/dupl/corrupt option */
+static int get_next_delay(struct netem_sched_data *q, enum tcn_action *head,
+			  struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct tcn_control *flow = q->flowbuffer;
+	u32 variout;
+	/*choose whether to drop or 0 delay packets on default*/
+	*head = q->def;
+
+	if (!flow) {
+		printk(KERN_ERR "netem: read from an uninitialized flow.\n");
+		q->statistic->novaliddata++;
+		return 0;
+	}
+	if (!flow->buffer_in_use) {
+		printk(KERN_ERR "netem: read from uninitialized flow\n");
+		return 0;
+	}
+	if (!flow->buffer_in_use->buf || !flow->offsetpos) {
+		printk(KERN_ERR "netem: buffer empty or offsetpos null\n");
+		return 0;
+	}
+
+	q->statistic->packetcount++;
+	/* check if we have to reload a buffer */
+	if ((void *)flow->offsetpos - (void *)flow->buffer_in_use->buf == DATA_PACKAGE)
+		reload_flowbuffer(q, sch);
+
+	variout = *flow->offsetpos++;
+	*head = (variout & MASK_HEAD) >> MASK_BITS;
+
+	(&q->statistic->normaldelay)[*head] += 1;
+	q->statistic->packetok++;
+
+	return ((variout & MASK_DELAY) * q->ticks) / 1000;
+}
+
  /*
   * Insert one skb into qdisc.
   * Note: parent depends on return value to account for queue length.
@@ -153,17 +248,23 @@ static int netem_enqueue(struct sk_buff
  	/* We don't fill cb now as skb_unshare() may invalidate it */
  	struct netem_skb_cb *cb;
  	struct sk_buff *skb2;
+	enum tcn_action action = FLOW_NORMAL;
+	psched_tdiff_t delay  = -1;
  	int ret;
  	int count = 1;

  	pr_debug("netem_enqueue skb=%p\n", skb);
+	if (q->trace)
+		delay = get_next_delay(q, &action, sch->q.next, sch);

  	/* Random duplication */
-	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
+	if (q->trace ? action == FLOW_DUP :
+	    (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor)))
  		++count;

  	/* Random packet drop 0 => none, ~0 => all */
-	if (q->loss && q->loss >= get_crandom(&q->loss_cor))
+	if (q->trace ? action == FLOW_DROP :
+	    (q->loss && q->loss >= get_crandom(&q->loss_cor)))
  		--count;

  	if (count == 0) {
@@ -194,7 +295,8 @@ static int netem_enqueue(struct sk_buff
  	 * If packet is going to be hardware checksummed, then
  	 * do it now in software before we mangle it.
  	 */
-	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+	if (q->trace ? action == FLOW_MANGLE :
+	    (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor))) {
  		if (!(skb = skb_unshare(skb, GFP_ATOMIC))
  		    || (skb->ip_summed == CHECKSUM_PARTIAL
  			&& skb_checksum_help(skb))) {
@@ -210,10 +312,10 @@ static int netem_enqueue(struct sk_buff
  	    || q->counter < q->gap 	/* inside last reordering gap */
  	    || q->reorder < get_crandom(&q->reorder_cor)) {
  		psched_time_t now;
-		psched_tdiff_t delay;

-		delay = tabledist(q->latency, q->jitter,
-				  &q->delay_cor, q->delay_dist);
+		if (!q->trace)
+			delay = tabledist(q->latency, q->jitter,
+					  &q->delay_cor, q->delay_dist);

  		now = psched_get_time();
  		cb->time_to_send = now + delay;
@@ -332,6 +434,61 @@ static int set_fifo_limit(struct Qdisc *
  	return ret;
  }

+static void reset_stats(struct netem_sched_data *q)
+{
+	if (q->statistic)
+		memset(q->statistic, 0, sizeof(*(q->statistic)));
+	return;
+}
+
+static void free_flowbuffer(struct netem_sched_data *q)
+{
+	struct buflist *cursor;
+	struct buflist *next;
+	list_for_each_entry_safe(cursor, next,
+				 &q->flowbuffer->full_buffer_list, list) {
+		kfree(cursor->buf);
+		list_del(&cursor->list);
+		kfree(cursor);
+	}
+
+	list_for_each_entry_safe(cursor, next,
+				 &q->flowbuffer->empty_buffer_list, list) {
+		kfree(cursor->buf);
+		list_del(&cursor->list);
+		kfree(cursor);
+	}
+
+	kfree(q->flowbuffer->buffer_in_use->buf);
+	kfree(q->flowbuffer->buffer_in_use);
+
+	kfree(q->statistic);
+	kfree(q->flowbuffer);
+	q->statistic = NULL;
+	q->flowbuffer = NULL;
+
+}
+
+static int init_flowbuffer(unsigned int fid, struct netem_sched_data *q)
+{
+	q->statistic = kzalloc(sizeof(*(q->statistic)), GFP_KERNEL);
+	q->flowbuffer = kmalloc(sizeof(*(q->flowbuffer)), GFP_KERNEL);
+
+	INIT_LIST_HEAD(&q->flowbuffer->full_buffer_list);
+	INIT_LIST_HEAD(&q->flowbuffer->empty_buffer_list);
+
+	while (q->bufnr > 0) {
+		int size = sizeof(struct buflist);
+		struct buflist *element = kmalloc(size, GFP_KERNEL);
+		element->buf =  kmalloc(DATA_PACKAGE, GFP_KERNEL);
+		list_add(&element->list, &q->flowbuffer->empty_buffer_list);
+		q->bufnr--;
+	}
+	q->flowbuffer->buffer_in_use = NULL;
+	q->flowbuffer->offsetpos = NULL;
+	return 0;
+}
+
  /*
   * Distribution data is a variable size payload containing
   * signed 16 bit values.
@@ -403,6 +560,87 @@ static int get_corrupt(struct Qdisc *sch
  	return 0;
  }

+static int get_trace(struct Qdisc *sch, const struct rtattr *attr)
+{
+	struct netem_sched_data *q = qdisc_priv(sch);
+	const struct tc_netem_trace *traceopt = RTA_DATA(attr);
+	struct nlmsghdr n;
+	if (RTA_PAYLOAD(attr) != sizeof(*traceopt))
+		return -EINVAL;
+
+	if (traceopt->fid) {
+		q->ticks = traceopt->ticks;
+		q->bufnr = traceopt->nr_bufs;
+		q->trace = 1;
+		init_flowbuffer(traceopt->fid, q);
+	} else {
+		printk(KERN_ERR "netem: invalid flow id\n");
+		q->trace = 0;
+	}
+	q->def = traceopt->def;
+	q->flowid = traceopt->fid;
+
+	memset(&n, 0, sizeof(struct nlmsghdr));
+
+	n.nlmsg_seq = 1;
+	n.nlmsg_flags = NLM_F_REQUEST;
+
+	if (qdisc_notify_pid(traceopt->fid, &n, sch->parent, NULL, sch) < 0) {
+		printk(KERN_ERR "netem: could not send notification");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int get_trace_data(struct Qdisc *sch, const struct rtattr *attr)
+{
+	struct netem_sched_data *q = qdisc_priv(sch);
+	const char *msg = RTA_DATA(attr);
+	int fid, validData;
+	struct buflist *element;
+	struct tcn_control *flow;
+	if (RTA_PAYLOAD(attr) != DATA_PACKAGE_ID) {
+		printk("get_trace_data: invalid size\n");
+		return -EINVAL;
+	}
+	memcpy(&fid, msg + DATA_PACKAGE, sizeof(int));
+	memcpy(&validData, msg + DATA_PACKAGE + sizeof(int), sizeof(int));
+
+	/* check whether this process is allowed to send data */
+	if (fid != q->flowid)
+		return -EPERM;
+
+	/* no empty buffer */
+	if (list_empty(&q->flowbuffer->empty_buffer_list))
+		return -ENOBUFS;
+
+	element = list_entry(q->flowbuffer->empty_buffer_list.next,
+			     struct buflist, list);
+	if (element->buf == NULL)
+		return -ENOBUFS;
+
+	list_del_init(&element->list);
+	memcpy(element->buf, msg, DATA_PACKAGE);
+	flow = q->flowbuffer;
+	if (flow->buffer_in_use == NULL) {
+		flow->buffer_in_use = element;
+		flow->offsetpos = (int *)element->buf;
+	} else
+		list_add_tail(&element->list, &q->flowbuffer->full_buffer_list);
+
+	if (!list_empty(&q->flowbuffer->empty_buffer_list)) {
+		struct nlmsghdr n;
+		memset(&n, 0, sizeof(struct nlmsghdr));
+		n.nlmsg_flags = NLM_F_REQUEST;
+		n.nlmsg_seq = 1;
+		if (qdisc_notify_pid(fid, &n, sch->parent, NULL, sch) < 0)
+			printk(KERN_NOTICE "could not send data "
+				"request for flow %i\n", fid);
+	}
+	q->statistic->reloadbuffer++;
+	return 0;
+}
+
  /* Parse netlink message to set options */
  static int netem_change(struct Qdisc *sch, struct rtattr *opt)
  {
@@ -414,11 +652,6 @@ static int netem_change(struct Qdisc *sc
  		return -EINVAL;

  	qopt = RTA_DATA(opt);
-	ret = set_fifo_limit(q->qdisc, qopt->limit);
-	if (ret) {
-		pr_debug("netem: can't set fifo limit\n");
-		return ret;
-	}

  	q->latency = qopt->latency;
  	q->jitter = qopt->jitter;
@@ -444,6 +677,29 @@ static int netem_change(struct Qdisc *sc
  				 RTA_PAYLOAD(opt) - sizeof(*qopt)))
  			return -EINVAL;

+		/* its a user tc add or tc change command.
+		 * We free the flowbuffer*/
+		if (!tb[TCA_NETEM_TRACE_DATA-1] && q->trace) {
+			struct nlmsghdr n;
+			q->trace = 0;
+			memset(&n, 0, sizeof(struct nlmsghdr));
+			n.nlmsg_flags = NLM_F_REQUEST;
+			n.nlmsg_seq = 1;
+			if (qdisc_notify_pid(q->flowid, &n, sch->parent, sch, NULL) < 0)
+				printk(KERN_NOTICE "netem: cannot send notification\n");
+
+			reset_stats(q);
+			free_flowbuffer(q);
+
+			/* we set the fifo limit: this is done here
+			 * since TRACE_DATA memset qopt to 0 */
+			ret = set_fifo_limit(q->qdisc, qopt->limit);
+			if (ret) {
+				pr_debug("netem: can't set fifo limit\n");
+				return ret;
+			}
+		}
+
  		if (tb[TCA_NETEM_CORR-1]) {
  			ret = get_correlation(sch, tb[TCA_NETEM_CORR-1]);
  			if (ret)
@@ -467,7 +723,40 @@ static int netem_change(struct Qdisc *sc
  			if (ret)
  				return ret;
  		}
+		if (tb[TCA_NETEM_TRACE-1]) {
+			ret = get_trace(sch, tb[TCA_NETEM_TRACE-1]);
+			if (ret)
+				return ret;
+		}
+		if (tb[TCA_NETEM_TRACE_DATA-1]) {
+			ret = get_trace_data(sch, tb[TCA_NETEM_TRACE_DATA-1]);
+			if (ret)
+				return ret;
+		}
+
  	}
+	/* it was a user tc add or tc change request,
+	 * we delete the current flowbuffer*/
+	else {
+		if (q->trace) {
+			struct nlmsghdr n;
+			q->trace = 0;
+			memset(&n, 0, sizeof(struct nlmsghdr));
+			n.nlmsg_flags = NLM_F_REQUEST;
+			n.nlmsg_seq = 1;
+			if (qdisc_notify_pid(q->flowid, &n, sch->parent, sch, NULL) < 0)
+				printk(KERN_NOTICE "netem: could not send notification\n");
+			reset_stats(q);
+			free_flowbuffer(q);
+		}
+		/* we set the fifo limit */
+		ret = set_fifo_limit(q->qdisc, qopt->limit);
+		if (ret) {
+			pr_debug("netem: can't set fifo limit\n");
+			return ret;
+		}
+	}
+

  	return 0;
  }
@@ -567,6 +856,7 @@ static int netem_init(struct Qdisc *sch,

  	qdisc_watchdog_init(&q->watchdog, sch);

+	q->trace = 0;
  	q->qdisc = qdisc_create_dflt(sch->dev, &tfifo_qdisc_ops,
  				     TC_H_MAKE(sch->handle, 1));
  	if (!q->qdisc) {
@@ -585,6 +875,16 @@ static int netem_init(struct Qdisc *sch,
  static void netem_destroy(struct Qdisc *sch)
  {
  	struct netem_sched_data *q = qdisc_priv(sch);
+	if (q->trace) {
+		struct nlmsghdr n;
+		q->trace = 0;
+		memset(&n, 0, sizeof(struct nlmsghdr));
+		n.nlmsg_flags = NLM_F_REQUEST;
+		n.nlmsg_seq = 1;
+		if (qdisc_notify_pid(q->flowid, &n, sch->parent, sch, NULL) < 0)
+			printk(KERN_NOTICE "netem: could not send notification\n");
+		free_flowbuffer(q);
+	}

  	qdisc_watchdog_cancel(&q->watchdog);
  	qdisc_destroy(q->qdisc);
@@ -600,6 +900,7 @@ static int netem_dump(struct Qdisc *sch,
  	struct tc_netem_corr cor;
  	struct tc_netem_reorder reorder;
  	struct tc_netem_corrupt corrupt;
+	struct tc_netem_trace traceopt;

  	qopt.latency = q->latency;
  	qopt.jitter = q->jitter;
@@ -622,6 +923,23 @@ static int netem_dump(struct Qdisc *sch,
  	corrupt.correlation = q->corrupt_cor.rho;
  	RTA_PUT(skb, TCA_NETEM_CORRUPT, sizeof(corrupt), &corrupt);

+	traceopt.fid = q->trace;
+	traceopt.def = q->def;
+	traceopt.ticks = q->ticks;
+	RTA_PUT(skb, TCA_NETEM_TRACE, sizeof(traceopt), &traceopt);
+
+	if (q->trace) {
+		struct tc_netem_stats tstats;
+		tstats.packetcount = q->statistic->packetcount;
+		tstats.packetok = q->statistic->packetok;
+		tstats.normaldelay = q->statistic->normaldelay;
+		tstats.drops = q->statistic->drops;
+		tstats.dupl = q->statistic->dupl;
+		tstats.corrupt = q->statistic->corrupt;
+		tstats.novaliddata = q->statistic->novaliddata;
+		tstats.reloadbuffer = q->statistic->reloadbuffer;
+		RTA_PUT(skb, TCA_NETEM_STATS, sizeof(tstats), &tstats);
+	}
  	rta->rta_len = skb_tail_pointer(skb) - b;

  	return skb->len;


Ben Greear wrote:
> Ariane Keller wrote:
> 
>> Yes, for short-term starvation it helps certainly.
>> But I'm still not convinced that it is really necessary to add more 
>> buffers, because I'm not sure whether the bottleneck is really the 
>> loading of data from user space to kernel space.
>> Some basic tests have shown that the kernel starts losing packets at
>> approximately the same packet rate regardless of whether we use netem or
>> netem with the trace extension.
>> But if you have contrary experience I'm happy to add a parameter which 
>> defines the number of buffers.
> 
> I have no numbers, so if you think it works, then that is fine with me.
> 
> If you actually run out of the trace buffers, do you just continue to
> run with the last settings?  If so, that would keep up throughput
> even if you are out of trace buffers...
> 
> What rates do you see, btw?  (pps, bps).
> 
> Thanks,
> Ben
> 

-- 
Ariane Keller
Communication Systems Research Group, ETH Zurich
Web: http://www.csg.ethz.ch/people/arkeller
Office: ETZ G 60.1, Gloriastrasse 35, 8092 Zurich

^ permalink raw reply

* Re: [PATCH 0/2] netem: trace enhancement: iproute
From: Ariane Keller @ 2007-12-23 19:54 UTC (permalink / raw)
  To: Ben Greear
  Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev,
	Rainer Baumann
In-Reply-To: <4755D2EB.4000807@candelatech.com>

The iproute patch is too big to send to the mailing list,
because the distribution data has moved to a different directory.
For ease of discussion I include the important changes in this mail.

Signed-off-by: Ariane Keller <ariane.keller@tik.ee.ethz.ch>

---

diff -uprN iproute2-2.6.23/netem/trace/flowseed.c iproute2-2.6.23_buf/netem/trace/flowseed.c
--- iproute2-2.6.23/netem/trace/flowseed.c	1970-01-01 01:00:00.000000000 +0100
+++ iproute2-2.6.23_buf/netem/trace/flowseed.c	2007-12-12 08:43:01.000000000 +0100
@@ -0,0 +1,209 @@
+/* flowseed.c    flowseed process to deliver values for packet delay,
+ *               duplication, loss and corruption from userspace to netem
+ *
+ *               This program is free software; you can redistribute it and/or
+ *               modify it under the terms of the GNU General Public License
+ *               as published by the Free Software Foundation; either version
+ *               2 of the License, or (at your option) any later version.
+ *
+ *  Authors:     Ariane Keller <arkeller@ee.ethz.ch> ETH Zurich
+ *               Rainer Baumann <baumann@hypert.net> ETH Zurich
+ */
+
+#include <ctype.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <sys/ipc.h>
+#include <sys/sem.h>
+#include <signal.h>
+
+#include "utils.h"
+#include "linux/pkt_sched.h"
+
+#define DATA_PACKAGE 4000
+#define DATA_PACKAGE_ID (DATA_PACKAGE + sizeof(unsigned int) + sizeof(int))
+#define TCA_BUF_MAX  (64*1024)
+/* maximal amount of parallel flows */
+struct rtnl_handle rth;
+unsigned int loop;
+int infinity = 0;
+int fdflowseed;
+char *sendpkg;
+int fid;
+int initialized = 0;
+int semid;
+int moreData = 1, r = 0, rold = 0;
+FILE * file;
+
+
+int printfct(const struct sockaddr_nl *who,
+		       struct nlmsghdr *n,
+		       void *arg)
+{
+	struct {
+		struct nlmsghdr 	n;
+		struct tcmsg 		t;
+		char   			buf[TCA_BUF_MAX];
+	} req;
+	struct tcmsg *t = NLMSG_DATA(n);
+	struct rtattr *tail = NULL;
+	struct tc_netem_qopt opt;
+	memset(&opt, 0, sizeof(opt));
+
+	if(n->nlmsg_type == RTM_DELQDISC) {
+		goto outerr;
+	}
+	else if(n->nlmsg_type == RTM_NEWQDISC){
+		initialized = 1;
+	
+		memset(&req, 0, sizeof(req));
+		req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+		req.n.nlmsg_flags = NLM_F_REQUEST;
+		req.n.nlmsg_type = RTM_NEWQDISC;
+		req.t.tcm_family = AF_UNSPEC;
+		req.t.tcm_handle = t->tcm_handle;
+		req.t.tcm_parent = t->tcm_parent;
+		req.t.tcm_ifindex = t->tcm_ifindex;
+
+		tail = NLMSG_TAIL(&req.n);
+again:
+		if (loop <= 0 && !infinity){
+			goto out;
+		}
+		if ((r = read(fdflowseed, sendpkg + rold, DATA_PACKAGE - rold)) >= 0) {
+			if (r + rold < DATA_PACKAGE) {
+			/* Tail of input file reached,
+			   set rest at start from next iteration */
+				rold = r;
+				fprintf(file, "flowseed: at end of file.\n");
+
+				if (lseek(fdflowseed, 0L, SEEK_SET) < 0){
+					perror("lseek reset");
+					goto out;
+				}
+				goto again;
+			}
+			r = 0;
+			rold = 0;
+			memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int));
+			memcpy(sendpkg + DATA_PACKAGE + sizeof(int), &moreData, sizeof(int));
+		
+			/* opt has to be added for each netem request */
+			if (addattr_l(&req.n, TCA_BUF_MAX, TCA_OPTIONS, &opt, sizeof(opt)) < 0){
+				perror("add options");
+				return -1;
+			}
+
+			if(addattr_l(&req.n, TCA_BUF_MAX, TCA_NETEM_TRACE_DATA, sendpkg, DATA_PACKAGE_ID) < 0){
+				perror("add data");
+				return -1;
+			}
+
+			tail->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)tail;
+
+			if(rtnl_send(&rth, (char*)&req, req.n.nlmsg_len) < 0){
+				perror("send data");
+				return -1;
+			}
+			return 0;
+		}
+	}
+/* no more data, what to do? we send a notification to the kernel module */
+out:
+	fprintf(stderr, "flowseed: Tail of input file reached. Exit.\n");
+	fprintf(file, "flowseed: Tail of input file reached. Exit.\n");
+	moreData = 0;
+	memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int));
+	memcpy(sendpkg + DATA_PACKAGE + sizeof(int), &moreData, sizeof(int));
+	if (addattr_l(&req.n, TCA_BUF_MAX, TCA_OPTIONS, &opt, sizeof(opt)) < 0){
+		perror("add options");
+		goto outerr;
+	}
+	if(addattr_l(&req.n, TCA_BUF_MAX, TCA_NETEM_TRACE_DATA, sendpkg, DATA_PACKAGE_ID) < 0){
+		perror("add data");
+		goto outerr;
+	}
+	
+	tail->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)tail;
+
+	if(rtnl_send(&rth, (char*)&req, req.n.nlmsg_len) < 0){
+		perror("rtnl_send");
+	}
+outerr:
+	fprintf(file, "flowseed: outerr Exit.\n");
+	fclose(file);
+	close(fdflowseed);
+	free(sendpkg);
+	rtnl_close(&rth);
+	exit(0);
+}
+
+void sigact(int signal){
+	if(initialized){
+		return;
+	}
+	else{
+		fprintf(stderr, "flowseed: not yet initialized. Exit\n");
+		exit(0);
+	}
+}
+int main(int argc, char *argv[])
+{
+	struct sembuf buf;
+	file = fopen("flowseedout.txt", "a+");
+	fprintf(file, "flowseed: initial msg.\n");
+
+	if (argc < 3) {
+		printf("usage: <tracefilename> <loop>\n");
+		return -1;
+	}
+	loop = strtoul(argv[2], NULL, 10);
+	if (loop == 0)
+		infinity = 1;
+
+	if ((fdflowseed = open(argv[1], O_RDONLY, 0)) < 0) {
+		perror("cannot open tracefile");
+		return -1;
+	}
+
+	fid = getpid();
+	sendpkg = malloc(DATA_PACKAGE_ID);
+
+	if (rtnl_open(&rth, 0) < 0) {
+		perror("Cannot open rtnetlink");
+		return -1;
+	}
+	ll_init_map(&rth);
+	/* we are ready to receive notifications */
+	if((semid = semget(0x12345678, 1, IPC_CREAT | 0666))<0){
+		perror("semget");
+		return -1;
+	}
+	buf.sem_num = 0;
+	buf.sem_op = +1;
+	buf.sem_flg = SEM_UNDO;
+	if(semop(semid, &buf, 1) < 0){
+		perror("semop");
+		return -1;
+	}
+	/* if the user typed an invalid command we cannot detect this
+ 	 * therefore we set a timer, if the timer expires before we receive
+ 	 * any message from the kernel module, we assume there was an
+ 	 * error and quit.
+ 	 */
+	signal(SIGALRM, sigact);
+	alarm(3);
+
+	/* listen to notifications from kernel */
+	if (rtnl_listen(&rth, printfct, NULL) < 0) {
+		perror("listen");
+		rtnl_close(&rth);
+		exit(2);
+	}
+	return 0;
+}


diff -uprN iproute2-2.6.23/tc/q_netem.c iproute2-2.6.23_buf/tc/q_netem.c
--- iproute2-2.6.23/tc/q_netem.c	2007-10-16 23:27:42.000000000 +0200
+++ iproute2-2.6.23_buf/tc/q_netem.c	2007-12-21 19:08:19.000000000 +0100
@@ -6,7 +6,12 @@
   *		as published by the Free Software Foundation; either version
   *		2 of the License, or (at your option) any later version.
   *
+ *		README files: 	iproute2/netem/distribution
+ *				iproute2/netem/trace
+ *
   * Authors:	Stephen Hemminger <shemminger@osdl.org>
+ *              netem trace: Ariane Keller <arkeller@ee.ethz.ch> ETH Zurich
+ *                           Rainer Baumann <baumann@hypert.net> ETH Zurich
   *
   */

@@ -20,6 +25,9 @@
  #include <arpa/inet.h>
  #include <string.h>
  #include <errno.h>
+#include <sys/types.h>
+#include <sys/ipc.h>
+#include <sys/sem.h>

  #include "utils.h"
  #include "tc_util.h"
@@ -34,7 +42,8 @@ static void explain(void)
  "                 [ drop PERCENT [CORRELATION]] \n" \
  "                 [ corrupt PERCENT [CORRELATION]] \n" \
  "                 [ duplicate PERCENT [CORRELATION]]\n" \
-"                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]\n");
+"                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]\n" \
+"                 [ trace PATH buf NR_BUFS loops NR_LOOPS [DEFAULT]]\n");
  }

  static void explain1(const char *arg)
@@ -42,6 +51,7 @@ static void explain1(const char *arg)
  	fprintf(stderr, "Illegal \"%s\"\n", arg);
  }

+#define FLOWPATH "/usr/local/bin/flowseed"
  #define usage() return(-1)

  /*
@@ -129,6 +139,7 @@ static int netem_parse_opt(struct qdisc_
  	struct tc_netem_corr cor;
  	struct tc_netem_reorder reorder;
  	struct tc_netem_corrupt corrupt;
+	struct tc_netem_trace traceopt;
  	__s16 *dist_data = NULL;
  	int present[__TCA_NETEM_MAX];

@@ -137,8 +148,12 @@ static int netem_parse_opt(struct qdisc_
  	memset(&cor, 0, sizeof(cor));
  	memset(&reorder, 0, sizeof(reorder));
  	memset(&corrupt, 0, sizeof(corrupt));
+	memset(&traceopt, 0, sizeof(traceopt));
  	memset(present, 0, sizeof(present));
-
+	if (argc == 0) {
+		explain();
+		return -1;
+	}
  	while (argc > 0) {
  		if (matches(*argv, "limit") == 0) {
  			NEXT_ARG();
@@ -164,7 +179,7 @@ static int netem_parse_opt(struct qdisc_
  				if (NEXT_IS_NUMBER()) {
  					NEXT_ARG();
  					++present[TCA_NETEM_CORR];
-					if (get_percent(&cor.delay_corr,							*argv)) {
+					if (get_percent(&cor.delay_corr, *argv)) {
  						explain1("latency");
  						return -1;
  					}
@@ -243,6 +258,75 @@ static int netem_parse_opt(struct qdisc_
  		} else if (strcmp(*argv, "help") == 0) {
  			explain();
  			return -1;
+		} else if (strcmp(*argv, "trace") == 0) {
+			int fd;
+			int execvl;
+			char *filename;
+			int pid;
+		
+			/*get ticks correct since tracefile is in us,
+			 *and ticks may not be equal to us
+			 */
+			get_ticks(&traceopt.ticks, "1000us");
+			NEXT_ARG();
+			filename = *argv;
+			if ((fd = open(filename, O_RDONLY, 0)) < 0) {
+				fprintf(stderr, "Cannot open trace file %s! \n", filename);
+				return -1;
+			}
+			close(fd);
+			NEXT_ARG();
+			if(strcmp(*argv, "buf") == 0) {
+				NEXT_ARG();
+				traceopt.nr_bufs = atoi(*argv);
+			}
+			else{
+				explain();
+				return -1;
+			}
+			NEXT_ARG();
+			if (strcmp(*argv, "loops") == 0 && NEXT_IS_NUMBER()) {
+				NEXT_ARG();
+				/*child will load tracefile to kernel */
+				switch (pid = fork()) {
+				case -1:{
+					fprintf(stderr,
+						"Cannot fork\n");
+					return -1;
+					}
+				case 0:{
+					execvl = execl(FLOWPATH, "flowseed", filename, *argv, NULL);
+					if (execvl < 0) {
+						fprintf(stderr,
+						"starting child failed\n");
+						return -1;
+					}
+					}
+				default:{
+					/* parent has to wait until child has done rtnl_open.
+ 					 * otherwise the kernel module cannot send a notification
+ 					 * to the child
+ 					 */
+					int semid = semget(0x12345678, 1, IPC_CREAT | 0666);
+					struct sembuf buf;
+					buf.sem_num = 0;
+					buf.sem_op = -1;
+					buf.sem_flg = SEM_UNDO;
+					semop(semid, &buf, 1);
+					semctl(semid, 0, IPC_RMID);
+					}
+				}
+			}
+			else {
+				explain();
+				return -1;
+			}
+			traceopt.def = 0;
+			if (NEXT_IS_NUMBER()) {
+				NEXT_ARG();
+				traceopt.def = atoi(*argv);
+			}
+			traceopt.fid = pid;
  		} else {
  			fprintf(stderr, "What is \"%s\"?\n", *argv);
  			explain();
@@ -291,7 +375,13 @@ static int netem_parse_opt(struct qdisc_
  			      dist_data, dist_size*sizeof(dist_data[0])) < 0)
  			return -1;
  	}
-	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	if (traceopt.fid) {
+		if (addattr_l(n, TCA_BUF_MAX, TCA_NETEM_TRACE, &traceopt,
+		     sizeof(traceopt)) < 0)
+			return -1;
+	}
+
+	tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail;
  	return 0;
  }

@@ -300,6 +390,8 @@ static int netem_print_opt(struct qdisc_
  	const struct tc_netem_corr *cor = NULL;
  	const struct tc_netem_reorder *reorder = NULL;
  	const struct tc_netem_corrupt *corrupt = NULL;
+	const struct tc_netem_trace *traceopt = NULL;
+	const struct tc_netem_stats *tracestats = NULL;
  	struct tc_netem_qopt qopt;
  	int len = RTA_PAYLOAD(opt) - sizeof(qopt);
  	SPRINT_BUF(b1);
@@ -333,9 +425,31 @@ static int netem_print_opt(struct qdisc_
  				return -1;
  			corrupt = RTA_DATA(tb[TCA_NETEM_CORRUPT]);
  		}
+		if (tb[TCA_NETEM_TRACE]) {
+			if (RTA_PAYLOAD(tb[TCA_NETEM_TRACE]) < sizeof(*traceopt))
+				return -1;
+			traceopt = RTA_DATA(tb[TCA_NETEM_TRACE]);
+		}
+		if (tb[TCA_NETEM_STATS]) {
+			if (RTA_PAYLOAD(tb[TCA_NETEM_STATS]) < sizeof(*tracestats))
+				return -1;
+			tracestats = RTA_DATA(tb[TCA_NETEM_STATS]);
+		}
  	}

  	fprintf(f, "limit %d", qopt.limit);
+	if (traceopt && traceopt->fid && tracestats) {
+		fprintf(f, " trace\n");
+
+		fprintf(f, "packetcount= %d\n", tracestats->packetcount);
+		fprintf(f, "packetok= %d\n", tracestats->packetok);
+		fprintf(f, "normaldelay= %d\n", tracestats->normaldelay);
+		fprintf(f, "drops= %d\n", tracestats->drops);
+		fprintf(f, "dupl= %d\n", tracestats->dupl);
+		fprintf(f, "corrupt= %d\n", tracestats->corrupt);
+		fprintf(f, "novaliddata= %d\n", tracestats->novaliddata);
+		fprintf(f, "bufferreload= %d\n", tracestats->reloadbuffer);
+	}

  	if (qopt.latency) {
  		fprintf(f, " delay %s", sprint_ticks(qopt.latency, b1));


^ permalink raw reply

* [RFC] potential bugs in netxen
From: Al Viro @ 2007-12-23 20:01 UTC (permalink / raw)
  To: dhananjay; +Cc: jgarzik, netdev


* what are default: doing in netxen_nic_hw_write_wx()/netxen_nic_hw_read_wx()?
Unlike all other cases they do iomem->iomem copying and AFAICS they are never
actually triggered.

* netxen_nic_flash_print() reads the entire user_info from card *in* *host-endian*,
then uses user_info.serial_number[].
	a) do we need to read the rest?
	b) more interesting question, don't we need cpu_to_le32() here?  After
all, that sucker is an array of char, so we want it in the same order regardless
of the host...

* in netxen_nic_xmit_frame() we do
	hw->cmd_desc_head[saved_producer].flags_opcode =
		cpu_to_le16(hw->cmd_desc_head[saved_producer].flags_opcode);
	hw->cmd_desc_head[saved_producer].num_of_buffers_total_length =
	  cpu_to_le32(hw->cmd_desc_head[saved_producer].
			  num_of_buffers_total_length);
Huh?  Everything that modifies either of those does so in little-endian already.
This code appeared in commit 6c80b18df3537d1221ab34555c150bccbfd90260 (NetXen:
Port swap feature for multi port cards); what's going on there?
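
To make the last point concrete, here is a minimal userspace sketch (not the driver code) of what converting an already little-endian value a second time does. The names `bswap16`, `sim_cpu_to_le16` and `first_byte_in_memory` are hypothetical, and the `big_endian` flag simulates the host byte order so that both cases can be checked on one machine.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t bswap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Simulated cpu_to_le16(): no-op on a l-e host, byte swap on a b-e host. */
static uint16_t sim_cpu_to_le16(uint16_t v, int big_endian)
{
	return big_endian ? bswap16(v) : v;
}

/*
 * First byte the simulated host would place in memory when storing v.
 * A correctly converted little-endian field must start with the low
 * byte of the original value.
 */
static uint8_t first_byte_in_memory(uint16_t v, int big_endian)
{
	return big_endian ? (uint8_t)(v >> 8) : (uint8_t)(v & 0xff);
}
```

With 0x1234 as input, one conversion puts 0x34 first in memory on either kind of host. A second conversion changes nothing on l-e, but on b-e it swaps the value back, so 0x12 ends up first and the descriptor field is no longer little-endian.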


^ permalink raw reply

* [RFC] potential bug in cxgb3
From: Al Viro @ 2007-12-23 20:01 UTC (permalink / raw)
  To: divy; +Cc: jgarzik, netdev


int t3_seeprom_wp(struct adapter *adapter, int enable)
{
        return t3_seeprom_write(adapter, EEPROM_STAT_ADDR, enable ? 0xc : 0);
}

looks fishy, since t3_seeprom_write() takes the last argument in little-endian,
converts to host-endian and feeds it to pci_write_config_dword().  Passing it
a host-endian instead will end up with different values seen by the card on
l-e and b-e hosts.  Shouldn't it be s/0xc/cpu_to_le32(0xc) ?
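
A small userspace sketch of the concern, under stated assumptions: `sim_write` is a hypothetical stand-in for a routine that, like t3_seeprom_write() as described above, takes its data in little-endian and converts to host order before the config-space write. `le32_to_host` and `host_to_le32` simulate le32_to_cpu()/cpu_to_le32(), with a `big_endian` flag in place of the real host byte order.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Simulated le32_to_cpu(): no-op on l-e, byte swap on b-e. */
static uint32_t le32_to_host(uint32_t v, int big_endian)
{
	return big_endian ? bswap32(v) : v;
}

/* Simulated cpu_to_le32(): the same operation in the other direction. */
static uint32_t host_to_le32(uint32_t v, int big_endian)
{
	return big_endian ? bswap32(v) : v;
}

/*
 * Stand-in for the write routine: expects little-endian data and
 * returns the host-order value that would reach the card.
 */
static uint32_t sim_write(uint32_t data_le, int big_endian)
{
	return le32_to_host(data_le, big_endian);
}
```

Passing a plain 0xc gives 0xc on the simulated l-e host but 0x0c000000 on the simulated b-e host; wrapping the argument in the conversion first gives 0xc on both.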


^ permalink raw reply

* Re: [ETH]: Combine format_addr() with print_mac().
From: Michael Chan @ 2007-12-24  0:20 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, netdev, anilgv, michaelc, david.somayajulu
In-Reply-To: <1198306965.4895.5.camel@localhost>

On Fri, 2007-12-21 at 23:02 -0800, Joe Perches wrote:
> On Fri, 2007-12-21 at 19:58 -0800, Michael Chan wrote:
> > > ssize_t? shouldn't it be size_t?
> > I'm just keeping the prototype unchanged as originally defined in net-
> > sysfs.c
> 
> It's painless to change the prototype.
> size_t seems more sensible.
> 

I'll change the internal function to use size_t, but the exported
function for sysfs use will be ssize_t since sysfs uses ssize_t.

Here's the revised patch:

[ETH]: Combine format_addr() with print_mac().

print_mac() used by most net drivers and format_addr() used by
net-sysfs.c are very similar and they can be integrated.

format_addr() is also identically redefined in the qla4xxx iscsi
driver.

Export a new function sysfs_format_mac() to be used by net-sysfs,
qla4xxx and others in the future.  Both print_mac() and
sysfs_format_mac() call _format_mac_addr() to do the formatting.

Changed print_mac() to use unsigned char * to be consistent with
net_device struct's dev_addr.  Added buffer length overrun checking
as suggested by Joe Perches.
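
For readers outside the kernel tree, the bounded formatting idea can be sketched in userspace as follows. This is only an illustration, not the code in the patch: `format_mac_bounded` is a hypothetical name, and since scnprintf() is not available in userspace the sketch uses snprintf() and clamps explicitly.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format len address bytes as "xx:xx:...:xx" into buf without ever
 * writing more than buflen bytes.  Returns the number of characters
 * stored, not counting the terminating NUL.
 */
static size_t format_mac_bounded(char *buf, size_t buflen,
				 const unsigned char *addr, size_t len)
{
	size_t used = 0;
	size_t i;

	if (buflen == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < len; i++) {
		/* two hex digits, then ':' unless this is the last byte */
		int n = snprintf(buf + used, buflen - used, "%02x%s",
				 addr[i], i == len - 1 ? "" : ":");
		if (n < 0)
			break;
		if ((size_t)n >= buflen - used) {
			/* truncated: the buffer is full, stop here */
			used = buflen - 1;
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

The explicit clamp matters because snprintf() (like strlcpy()) reports the length it wanted to write, not the length it actually wrote, so advancing a cursor by the raw return value can step past the end of the buffer after a truncation.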

Signed-off-by: Michael Chan <mchan@broadcom.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: David Somayajulu <david.somayajulu@qlogic.com>

diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 89460d2..4e9cf18 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -173,18 +173,6 @@ static void qla4xxx_conn_stop(struct iscsi_cls_conn *conn, int flag)
 		printk(KERN_ERR "iscsi: invalid stop flag %d\n", flag);
 }
 
-static ssize_t format_addr(char *buf, const unsigned char *addr, int len)
-{
-	int i;
-	char *cp = buf;
-
-	for (i = 0; i < len; i++)
-		cp += sprintf(cp, "%02x%c", addr[i],
-			      i == (len - 1) ? '\n' : ':');
-	return cp - buf;
-}
-
-
 static int qla4xxx_host_get_param(struct Scsi_Host *shost,
 				  enum iscsi_host_param param, char *buf)
 {
@@ -193,7 +181,7 @@ static int qla4xxx_host_get_param(struct Scsi_Host *shost,
 
 	switch (param) {
 	case ISCSI_HOST_PARAM_HWADDRESS:
-		len = format_addr(buf, ha->my_mac, MAC_ADDR_LEN);
+		len = sysfs_format_mac(buf, ha->my_mac, MAC_ADDR_LEN);
 		break;
 	case ISCSI_HOST_PARAM_IPADDRESS:
 		len = sprintf(buf, "%d.%d.%d.%d\n", ha->ip_address[0],
diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index cc002cb..7a1e011 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -124,12 +124,14 @@ int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr);
 extern struct ctl_table ether_table[];
 #endif
 
+extern ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len);
+
 /*
  *	Display a 6 byte device address (MAC) in a readable format.
  */
-#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
-extern char *print_mac(char *buf, const u8 *addr);
-#define DECLARE_MAC_BUF(var) char var[18] __maybe_unused
+extern char *print_mac(char *buf, const unsigned char *addr);
+#define MAC_BUF_SIZE	18
+#define DECLARE_MAC_BUF(var) char var[MAC_BUF_SIZE] __maybe_unused
 
 #endif
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index e41f4b9..7635d3f 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -95,17 +95,6 @@ NETDEVICE_SHOW(type, fmt_dec);
 NETDEVICE_SHOW(link_mode, fmt_dec);
 
 /* use same locking rules as GIFHWADDR ioctl's */
-static ssize_t format_addr(char *buf, const unsigned char *addr, int len)
-{
-	int i;
-	char *cp = buf;
-
-	for (i = 0; i < len; i++)
-		cp += sprintf(cp, "%02x%c", addr[i],
-			      i == (len - 1) ? '\n' : ':');
-	return cp - buf;
-}
-
 static ssize_t show_address(struct device *dev, struct device_attribute *attr,
 			    char *buf)
 {
@@ -114,7 +103,7 @@ static ssize_t show_address(struct device *dev, struct device_attribute *attr,
 
 	read_lock(&dev_base_lock);
 	if (dev_isalive(net))
-	    ret = format_addr(buf, net->dev_addr, net->addr_len);
+		ret = sysfs_format_mac(buf, net->dev_addr, net->addr_len);
 	read_unlock(&dev_base_lock);
 	return ret;
 }
@@ -124,7 +113,7 @@ static ssize_t show_broadcast(struct device *dev,
 {
 	struct net_device *net = to_net_dev(dev);
 	if (dev_isalive(net))
-		return format_addr(buf, net->broadcast, net->addr_len);
+		return sysfs_format_mac(buf, net->broadcast, net->addr_len);
 	return -EINVAL;
 }
 
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 6b2e454..a7b4175 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -359,10 +359,34 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count)
 }
 EXPORT_SYMBOL(alloc_etherdev_mq);
 
-char *print_mac(char *buf, const u8 *addr)
+static size_t _format_mac_addr(char *buf, int buflen,
+				const unsigned char *addr, int len)
 {
-	sprintf(buf, MAC_FMT,
-		addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
+	int i;
+	char *cp = buf;
+
+	for (i = 0; i < len; i++) {
+		cp += scnprintf(cp, buflen - (cp - buf), "%02x", addr[i]);
+		if (i == len - 1)
+			break;
+		cp += strlcpy(cp, ":", buflen - (cp - buf));
+	}
+	return cp - buf;
+}
+
+ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len)
+{
+	size_t l;
+
+	l = _format_mac_addr(buf, PAGE_SIZE, addr, len);
+	l += strlcpy(buf + l, "\n", PAGE_SIZE - l);
+	return ((ssize_t) l);
+}
+EXPORT_SYMBOL(sysfs_format_mac);
+
+char *print_mac(char *buf, const unsigned char *addr)
+{
+	_format_mac_addr(buf, MAC_BUF_SIZE, addr, ETH_ALEN);
 	return buf;
 }
 EXPORT_SYMBOL(print_mac);





^ permalink raw reply related

* [PATCH] via-velocity big-endian support
From: Al Viro @ 2007-12-24  5:06 UTC (permalink / raw)
  To: netdev; +Cc: jgarzik

	* killed multibyte bitfields in fixed-endian structs
	* annotated
	* added conversions where needed
	* fixed a couple of obvious brainos in (ifdefed out) zerocopy
stuff

Note that it's absofsckinglutely untested.  It should not give differences
in behaviour on l-e, but that's in the famous last words category...

Review and testing are welcome.
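
As an illustration of the first item, here is a userspace sketch of replacing a 14-bit length / 1-bit owner bitfield pair with masks on a single fixed-endian 16-bit field. The names (`SK_LEN_MASK`, `SK_OWNER`, `get_le16`, `put_le16` and the `desc_*` helpers) are hypothetical; the driver itself keeps a __le16 field and converts the mask constant with cpu_to_le16() instead of going through byte arrays.

```c
#include <stdint.h>

#define SK_LEN_MASK 0x3fffu	/* bits 0-13: length */
#define SK_OWNER    0x8000u	/* bit 15: owner */

/* Store a CPU-order value as two little-endian bytes. */
static void put_le16(uint8_t field[2], uint16_t v)
{
	field[0] = (uint8_t)(v & 0xff);
	field[1] = (uint8_t)(v >> 8);
}

/* Load a little-endian 16-bit field into CPU order. */
static uint16_t get_le16(const uint8_t field[2])
{
	return (uint16_t)(field[0] | (field[1] << 8));
}

/* Length: convert to CPU order first, then mask. */
static unsigned int desc_len(const uint8_t field[2])
{
	return get_le16(field) & SK_LEN_MASK;
}

/* Nonzero if the owner bit is set. */
static int desc_owned_by_nic(const uint8_t field[2])
{
	return (get_le16(field) & SK_OWNER) != 0;
}

/* Set the owner bit without disturbing the length bits. */
static void desc_give_to_nic(uint8_t field[2])
{
	put_le16(field, (uint16_t)(get_le16(field) | SK_OWNER));
}
```

Unlike a multibyte bitfield, whose layout depends on the compiler and on the host byte order, the masked field has the same in-memory layout on every host.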

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index 35cd65d..0fa1e4b 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -681,7 +681,7 @@ static void velocity_rx_reset(struct velocity_info *vptr)
 	 *	Init state, all RD entries belong to the NIC
 	 */
 	for (i = 0; i < vptr->options.numrx; ++i)
-		vptr->rd_ring[i].rdesc0.owner = OWNED_BY_NIC;
+		vptr->rd_ring[i].rdesc0.len |= OWNED_BY_NIC;
 
 	writew(vptr->options.numrx, &regs->RBRDU);
 	writel(vptr->rd_pool_dma, &regs->RDBaseLo);
@@ -777,7 +777,7 @@ static void velocity_init_registers(struct velocity_info *vptr,
 
 		vptr->int_mask = INT_MASK_DEF;
 
-		writel(cpu_to_le32(vptr->rd_pool_dma), &regs->RDBaseLo);
+		writel(vptr->rd_pool_dma, &regs->RDBaseLo);
 		writew(vptr->options.numrx - 1, &regs->RDCSize);
 		mac_rx_queue_run(regs);
 		mac_rx_queue_wake(regs);
@@ -785,7 +785,7 @@ static void velocity_init_registers(struct velocity_info *vptr,
 		writew(vptr->options.numtx - 1, &regs->TDCSize);
 
 		for (i = 0; i < vptr->num_txq; i++) {
-			writel(cpu_to_le32(vptr->td_pool_dma[i]), &(regs->TDBaseLo[i]));
+			writel(vptr->td_pool_dma[i], &regs->TDBaseLo[i]);
 			mac_tx_queue_run(regs, i);
 		}
 
@@ -1195,7 +1195,7 @@ static inline void velocity_give_many_rx_descs(struct velocity_info *vptr)
 	dirty = vptr->rd_dirty - unusable;
 	for (avail = vptr->rd_filled & 0xfffc; avail; avail--) {
 		dirty = (dirty > 0) ? dirty - 1 : vptr->options.numrx - 1;
-		vptr->rd_ring[dirty].rdesc0.owner = OWNED_BY_NIC;
+		vptr->rd_ring[dirty].rdesc0.len |= OWNED_BY_NIC;
 	}
 
 	writew(vptr->rd_filled & 0xfffc, &regs->RBRDU);
@@ -1210,7 +1210,7 @@ static int velocity_rx_refill(struct velocity_info *vptr)
 		struct rx_desc *rd = vptr->rd_ring + dirty;
 
 		/* Fine for an all zero Rx desc at init time as well */
-		if (rd->rdesc0.owner == OWNED_BY_NIC)
+		if (rd->rdesc0.len & OWNED_BY_NIC)
 			break;
 
 		if (!vptr->rd_info[dirty].skb) {
@@ -1409,31 +1409,33 @@ static int velocity_rx_srv(struct velocity_info *vptr, int status)
 
 	do {
 		struct rx_desc *rd = vptr->rd_ring + rd_curr;
+		u16 rsr;
 
 		if (!vptr->rd_info[rd_curr].skb)
 			break;
 
-		if (rd->rdesc0.owner == OWNED_BY_NIC)
+		if (rd->rdesc0.len & OWNED_BY_NIC)
 			break;
 
 		rmb();
 
+		rsr = le16_to_cpu(rd->rdesc0.RSR);
 		/*
 		 *	Don't drop CE or RL error frame although RXOK is off
 		 */
-		if ((rd->rdesc0.RSR & RSR_RXOK) || (!(rd->rdesc0.RSR & RSR_RXOK) && (rd->rdesc0.RSR & (RSR_CE | RSR_RL)))) {
+		if (rsr & (RSR_RXOK | RSR_CE | RSR_RL)) {
 			if (velocity_receive_frame(vptr, rd_curr) < 0)
 				stats->rx_dropped++;
 		} else {
-			if (rd->rdesc0.RSR & RSR_CRC)
+			if (rsr & RSR_CRC)
 				stats->rx_crc_errors++;
-			if (rd->rdesc0.RSR & RSR_FAE)
+			if (rsr & RSR_FAE)
 				stats->rx_frame_errors++;
 
 			stats->rx_dropped++;
 		}
 
-		rd->inten = 1;
+		rd->size |= RX_INTEN;
 
 		vptr->dev->last_rx = jiffies;
 
@@ -1554,16 +1556,17 @@ static int velocity_receive_frame(struct velocity_info *vptr, int idx)
 	struct net_device_stats *stats = &vptr->stats;
 	struct velocity_rd_info *rd_info = &(vptr->rd_info[idx]);
 	struct rx_desc *rd = &(vptr->rd_ring[idx]);
-	int pkt_len = rd->rdesc0.len;
+	int pkt_len = le16_to_cpu(rd->rdesc0.len) & 0x3fff;
+	u16 rsr = le16_to_cpu(rd->rdesc0.RSR);
 	struct sk_buff *skb;
 
-	if (rd->rdesc0.RSR & (RSR_STP | RSR_EDP)) {
+	if (rsr & (RSR_STP | RSR_EDP)) {
 		VELOCITY_PRT(MSG_LEVEL_VERBOSE, KERN_ERR " %s : the received frame span multple RDs.\n", vptr->dev->name);
 		stats->rx_length_errors++;
 		return -EINVAL;
 	}
 
-	if (rd->rdesc0.RSR & RSR_MAR)
+	if (rsr & RSR_MAR)
 		vptr->stats.multicast++;
 
 	skb = rd_info->skb;
@@ -1576,7 +1579,7 @@ static int velocity_receive_frame(struct velocity_info *vptr, int idx)
 	 */
 
 	if (vptr->flags & VELOCITY_FLAGS_VAL_PKT_LEN) {
-		if (rd->rdesc0.RSR & RSR_RL) {
+		if (rsr & RSR_RL) {
 			stats->rx_length_errors++;
 			return -EINVAL;
 		}
@@ -1637,8 +1640,7 @@ static int velocity_alloc_rx_buf(struct velocity_info *vptr, int idx)
  	 */
 
 	*((u32 *) & (rd->rdesc0)) = 0;
-	rd->len = cpu_to_le32(vptr->rx_buf_sz);
-	rd->inten = 1;
+	rd->size = cpu_to_le16(vptr->rx_buf_sz) | RX_INTEN;
 	rd->pa_low = cpu_to_le32(rd_info->skb_dma);
 	rd->pa_high = 0;
 	return 0;
@@ -1663,6 +1665,7 @@ static int velocity_tx_srv(struct velocity_info *vptr, u32 status)
 	int works = 0;
 	struct velocity_td_info *tdinfo;
 	struct net_device_stats *stats = &vptr->stats;
+	u16 tsr;
 
 	for (qnum = 0; qnum < vptr->num_txq; qnum++) {
 		for (idx = vptr->td_tail[qnum]; vptr->td_used[qnum] > 0;
@@ -1674,22 +1677,24 @@ static int velocity_tx_srv(struct velocity_info *vptr, u32 status)
 			td = &(vptr->td_rings[qnum][idx]);
 			tdinfo = &(vptr->td_infos[qnum][idx]);
 
-			if (td->tdesc0.owner == OWNED_BY_NIC)
+			if (td->tdesc0.len & OWNED_BY_NIC)
 				break;
 
 			if ((works++ > 15))
 				break;
+			
+			tsr = le16_to_cpu(td->tdesc0.TSR);
 
-			if (td->tdesc0.TSR & TSR0_TERR) {
+			if (tsr & TSR0_TERR) {
 				stats->tx_errors++;
 				stats->tx_dropped++;
-				if (td->tdesc0.TSR & TSR0_CDH)
+				if (tsr & TSR0_CDH)
 					stats->tx_heartbeat_errors++;
-				if (td->tdesc0.TSR & TSR0_CRS)
+				if (tsr & TSR0_CRS)
 					stats->tx_carrier_errors++;
-				if (td->tdesc0.TSR & TSR0_ABT)
+				if (tsr & TSR0_ABT)
 					stats->tx_aborted_errors++;
-				if (td->tdesc0.TSR & TSR0_OWC)
+				if (tsr & TSR0_OWC)
 					stats->tx_window_errors++;
 			} else {
 				stats->tx_packets++;
@@ -1874,7 +1879,7 @@ static void velocity_free_tx_buf(struct velocity_info *vptr, struct velocity_td_
 
 		for (i = 0; i < tdinfo->nskb_dma; i++) {
 #ifdef VELOCITY_ZERO_COPY_SUPPORT
-			pci_unmap_single(vptr->pdev, tdinfo->skb_dma[i], td->tdesc1.len, PCI_DMA_TODEVICE);
+			pci_unmap_single(vptr->pdev, tdinfo->skb_dma[i], le16_to_cpu(td->tdesc1.len), PCI_DMA_TODEVICE);
 #else
 			pci_unmap_single(vptr->pdev, tdinfo->skb_dma[i], skb->len, PCI_DMA_TODEVICE);
 #endif
@@ -2067,8 +2072,8 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct velocity_td_info *tdinfo;
 	unsigned long flags;
 	int index;
-
 	int pktlen = skb->len;
+	__le16 len = cpu_to_le16(pktlen);
 
 #ifdef VELOCITY_ZERO_COPY_SUPPORT
 	if (skb_shinfo(skb)->nr_frags > 6 && __skb_linearize(skb)) {
@@ -2085,7 +2090,7 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	td_ptr->tdesc1.TCPLS = TCPLS_NORMAL;
 	td_ptr->tdesc1.TCR = TCR0_TIC;
-	td_ptr->td_buf[0].queue = 0;
+	td_ptr->td_buf[0].size &= ~TD_QUEUE;
 
 	/*
 	 *	Pad short frames.
@@ -2093,14 +2098,15 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (pktlen < ETH_ZLEN) {
 		/* Cannot occur until ZC support */
 		pktlen = ETH_ZLEN;
+		len = cpu_to_le16(ETH_ZLEN);
 		skb_copy_from_linear_data(skb, tdinfo->buf, skb->len);
 		memset(tdinfo->buf + skb->len, 0, ETH_ZLEN - skb->len);
 		tdinfo->skb = skb;
 		tdinfo->skb_dma[0] = tdinfo->buf_dma;
-		td_ptr->tdesc0.pktsize = pktlen;
+		td_ptr->tdesc0.len = len;
 		td_ptr->td_buf[0].pa_low = cpu_to_le32(tdinfo->skb_dma[0]);
 		td_ptr->td_buf[0].pa_high = 0;
-		td_ptr->td_buf[0].bufsize = td_ptr->tdesc0.pktsize;
+		td_ptr->td_buf[0].size = len;	/* queue is 0 anyway */
 		tdinfo->nskb_dma = 1;
 		td_ptr->tdesc1.CMDZ = 2;
 	} else
@@ -2111,10 +2117,10 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (nfrags > 6) {
 			skb_copy_from_linear_data(skb, tdinfo->buf, skb->len);
 			tdinfo->skb_dma[0] = tdinfo->buf_dma;
-			td_ptr->tdesc0.pktsize =
+			td_ptr->tdesc0.len = len;
 			td_ptr->td_buf[0].pa_low = cpu_to_le32(tdinfo->skb_dma[0]);
 			td_ptr->td_buf[0].pa_high = 0;
-			td_ptr->td_buf[0].bufsize = td_ptr->tdesc0.pktsize;
+			td_ptr->td_buf[0].size = len;	/* queue is 0 anyway */
 			tdinfo->nskb_dma = 1;
 			td_ptr->tdesc1.CMDZ = 2;
 		} else {
@@ -2122,22 +2128,23 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 			tdinfo->nskb_dma = 0;
 			tdinfo->skb_dma[i] = pci_map_single(vptr->pdev, skb->data, skb->len - skb->data_len, PCI_DMA_TODEVICE);
 
-			td_ptr->tdesc0.pktsize = pktlen;
+			td_ptr->tdesc0.len = len;
 
 			/* FIXME: support 48bit DMA later */
 			td_ptr->td_buf[i].pa_low = cpu_to_le32(tdinfo->skb_dma);
 			td_ptr->td_buf[i].pa_high = 0;
-			td_ptr->td_buf[i].bufsize = skb->len->skb->data_len;
+			td_ptr->td_buf[i].size =
+				cpu_to_le16(skb->len->skb->data_len);
 
 			for (i = 0; i < nfrags; i++) {
 				skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-				void *addr = ((void *) page_address(frag->page + frag->page_offset));
+				void *addr = (void *)page_address(frag->page) + frag->page_offset;
 
 				tdinfo->skb_dma[i + 1] = pci_map_single(vptr->pdev, addr, frag->size, PCI_DMA_TODEVICE);
 
 				td_ptr->td_buf[i + 1].pa_low = cpu_to_le32(tdinfo->skb_dma[i + 1]);
 				td_ptr->td_buf[i + 1].pa_high = 0;
-				td_ptr->td_buf[i + 1].bufsize = frag->size;
+				td_ptr->td_buf[i + 1].size = cpu_to_le16(frag->size);
 			}
 			tdinfo->nskb_dma = i - 1;
 			td_ptr->tdesc1.CMDZ = i;
@@ -2152,18 +2159,16 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 		 */
 		tdinfo->skb = skb;
 		tdinfo->skb_dma[0] = pci_map_single(vptr->pdev, skb->data, pktlen, PCI_DMA_TODEVICE);
-		td_ptr->tdesc0.pktsize = pktlen;
+		td_ptr->tdesc0.len = len;
 		td_ptr->td_buf[0].pa_low = cpu_to_le32(tdinfo->skb_dma[0]);
 		td_ptr->td_buf[0].pa_high = 0;
-		td_ptr->td_buf[0].bufsize = td_ptr->tdesc0.pktsize;
+		td_ptr->td_buf[0].size = len;
 		tdinfo->nskb_dma = 1;
 		td_ptr->tdesc1.CMDZ = 2;
 	}
 
 	if (vptr->vlgrp && vlan_tx_tag_present(skb)) {
-		td_ptr->tdesc1.pqinf.VID = vlan_tx_tag_get(skb);
-		td_ptr->tdesc1.pqinf.priority = 0;
-		td_ptr->tdesc1.pqinf.CFI = 0;
+		td_ptr->tdesc1.vlan = cpu_to_le16(vlan_tx_tag_get(skb));
 		td_ptr->tdesc1.TCR |= TCR0_VETAG;
 	}
 
@@ -2185,7 +2190,7 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		if (prev < 0)
 			prev = vptr->options.numtx - 1;
-		td_ptr->tdesc0.owner = OWNED_BY_NIC;
+		td_ptr->tdesc0.len |= OWNED_BY_NIC;
 		vptr->td_used[qnum]++;
 		vptr->td_curr[qnum] = (index + 1) % vptr->options.numtx;
 
@@ -2193,7 +2198,7 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
 			netif_stop_queue(dev);
 
 		td_ptr = &(vptr->td_rings[qnum][prev]);
-		td_ptr->td_buf[0].queue = 1;
+		td_ptr->td_buf[0].size |= TD_QUEUE;
 		mac_tx_queue_wake(vptr->mac_regs, qnum);
 	}
 	dev->trans_start = jiffies;
@@ -3410,7 +3415,7 @@ static int velocity_suspend(struct pci_dev *pdev, pm_message_t state)
 		velocity_save_context(vptr, &vptr->context);
 		velocity_shutdown(vptr);
 		velocity_set_wol(vptr);
-		pci_enable_wake(pdev, 3, 1);
+		pci_enable_wake(pdev, PCI_D3hot, 1);
 		pci_set_power_state(pdev, PCI_D3hot);
 	} else {
 		velocity_save_context(vptr, &vptr->context);
diff --git a/drivers/net/via-velocity.h b/drivers/net/via-velocity.h
index aa91796..e0ec5d4 100644
--- a/drivers/net/via-velocity.h
+++ b/drivers/net/via-velocity.h
@@ -196,26 +196,29 @@
  *	Receive descriptor
  */
 
+#define DESC_OWNER cpu_to_le16(0x8000)
+
 struct rdesc0 {
-	u16 RSR;		/* Receive status */
-	u16 len:14;		/* Received packet length */
-	u16 reserved:1;
-	u16 owner:1;		/* Who owns this buffer ? */
+	__le16 RSR;		/* Receive status */
+	__le16 len;		/* bits 0--13; bit 15 - owner */
 };
 
 struct rdesc1 {
-	u16 PQTAG;
+	__le16 PQTAG;
 	u8 CSM;
 	u8 IPKT;
 };
 
+enum {
+	RX_INTEN = __constant_cpu_to_le16(0x8000)
+};
+
 struct rx_desc {
 	struct rdesc0 rdesc0;
 	struct rdesc1 rdesc1;
-	u32 pa_low;		/* Low 32 bit PCI address */
-	u16 pa_high;		/* Next 16 bit PCI address (48 total) */
-	u16 len:15;		/* Frame size */
-	u16 inten:1;		/* Enable interrupt */
+	__le32 pa_low;		/* Low 32 bit PCI address */
+	__le16 pa_high;		/* Next 16 bit PCI address (48 total) */
+	__le16 size;		/* bits 0--14 - frame size, bit 15 - enable int. */
 } __attribute__ ((__packed__));
 
 /*
@@ -223,32 +226,26 @@ struct rx_desc {
  */
 
 struct tdesc0 {
-	u16 TSR;		/* Transmit status register */
-	u16 pktsize:14;		/* Size of frame */
-	u16 reserved:1;
-	u16 owner:1;		/* Who owns the buffer */
+	__le16 TSR;		/* Transmit status register */
+	__le16 len;		/* bits 0--13 - size of frame, bit 15 - owner */
 };
 
-struct pqinf {			/* Priority queue info */
-	u16 VID:12;
-	u16 CFI:1;
-	u16 priority:3;
-} __attribute__ ((__packed__));
-
 struct tdesc1 {
-	struct pqinf pqinf;
+	__le16 vlan;
 	u8 TCR;
 	u8 TCPLS:2;
 	u8 reserved:2;
 	u8 CMDZ:4;
 } __attribute__ ((__packed__));
 
+enum {
+	TD_QUEUE = __constant_cpu_to_le16(0x8000)
+};
+
 struct td_buf {
-	u32 pa_low;
-	u16 pa_high;
-	u16 bufsize:14;
-	u16 reserved:1;
-	u16 queue:1;
+	__le32 pa_low;
+	__le16 pa_high;
+	__le16 size;		/* bits 0--13 - size, bit 15 - queue */
 } __attribute__ ((__packed__));
 
 struct tx_desc {
@@ -276,7 +273,7 @@ struct velocity_td_info {
 
 enum  velocity_owner {
 	OWNED_BY_HOST = 0,
-	OWNED_BY_NIC = 1
+	OWNED_BY_NIC = __constant_cpu_to_le16(0x8000)
 };
 
 
@@ -1012,45 +1009,45 @@ struct mac_regs {
 	volatile u8 RCR;
 	volatile u8 TCR;
 
-	volatile u32 CR0Set;		/* 0x08 */
-	volatile u32 CR0Clr;		/* 0x0C */
+	volatile __le32 CR0Set;		/* 0x08 */
+	volatile __le32 CR0Clr;		/* 0x0C */
 
 	volatile u8 MARCAM[8];		/* 0x10 */
 
-	volatile u32 DecBaseHi;		/* 0x18 */
-	volatile u16 DbfBaseHi;		/* 0x1C */
-	volatile u16 reserved_1E;
+	volatile __le32 DecBaseHi;	/* 0x18 */
+	volatile __le16 DbfBaseHi;	/* 0x1C */
+	volatile __le16 reserved_1E;
 
-	volatile u16 ISRCTL;		/* 0x20 */
+	volatile __le16 ISRCTL;		/* 0x20 */
 	volatile u8 TXESR;
 	volatile u8 RXESR;
 
-	volatile u32 ISR;		/* 0x24 */
-	volatile u32 IMR;
+	volatile __le32 ISR;		/* 0x24 */
+	volatile __le32 IMR;
 
-	volatile u32 TDStatusPort;	/* 0x2C */
+	volatile __le32 TDStatusPort;	/* 0x2C */
 
-	volatile u16 TDCSRSet;		/* 0x30 */
+	volatile __le16 TDCSRSet;	/* 0x30 */
 	volatile u8 RDCSRSet;
 	volatile u8 reserved_33;
-	volatile u16 TDCSRClr;
+	volatile __le16 TDCSRClr;
 	volatile u8 RDCSRClr;
 	volatile u8 reserved_37;
 
-	volatile u32 RDBaseLo;		/* 0x38 */
-	volatile u16 RDIdx;		/* 0x3C */
-	volatile u16 reserved_3E;
+	volatile __le32 RDBaseLo;	/* 0x38 */
+	volatile __le16 RDIdx;		/* 0x3C */
+	volatile __le16 reserved_3E;
 
-	volatile u32 TDBaseLo[4];	/* 0x40 */
+	volatile __le32 TDBaseLo[4];	/* 0x40 */
 
-	volatile u16 RDCSize;		/* 0x50 */
-	volatile u16 TDCSize;		/* 0x52 */
-	volatile u16 TDIdx[4];		/* 0x54 */
-	volatile u16 tx_pause_timer;	/* 0x5C */
-	volatile u16 RBRDU;		/* 0x5E */
+	volatile __le16 RDCSize;	/* 0x50 */
+	volatile __le16 TDCSize;	/* 0x52 */
+	volatile __le16 TDIdx[4];	/* 0x54 */
+	volatile __le16 tx_pause_timer;	/* 0x5C */
+	volatile __le16 RBRDU;		/* 0x5E */
 
-	volatile u32 FIFOTest0;		/* 0x60 */
-	volatile u32 FIFOTest1;		/* 0x64 */
+	volatile __le32 FIFOTest0;	/* 0x60 */
+	volatile __le32 FIFOTest1;	/* 0x64 */
 
 	volatile u8 CAMADDR;		/* 0x68 */
 	volatile u8 CAMCR;		/* 0x69 */
@@ -1063,18 +1060,18 @@ struct mac_regs {
 	volatile u8 PHYSR1;
 	volatile u8 MIICR;
 	volatile u8 MIIADR;
-	volatile u16 MIIDATA;
+	volatile __le16 MIIDATA;
 
-	volatile u16 SoftTimer0;	/* 0x74 */
-	volatile u16 SoftTimer1;
+	volatile __le16 SoftTimer0;	/* 0x74 */
+	volatile __le16 SoftTimer1;
 
 	volatile u8 CFGA;		/* 0x78 */
 	volatile u8 CFGB;
 	volatile u8 CFGC;
 	volatile u8 CFGD;
 
-	volatile u16 DCFG;		/* 0x7C */
-	volatile u16 MCFG;
+	volatile __le16 DCFG;		/* 0x7C */
+	volatile __le16 MCFG;
 
 	volatile u8 TBIST;		/* 0x80 */
 	volatile u8 RBIST;
@@ -1086,9 +1083,9 @@ struct mac_regs {
 	volatile u8 rev_id;
 	volatile u8 PORSTS;
 
-	volatile u32 MIBData;		/* 0x88 */
+	volatile __le32 MIBData;	/* 0x88 */
 
-	volatile u16 EEWrData;
+	volatile __le16 EEWrData;
 
 	volatile u8 reserved_8E;
 	volatile u8 BPMDWr;
@@ -1098,7 +1095,7 @@ struct mac_regs {
 	volatile u8 EECHKSUM;		/* 0x92 */
 	volatile u8 EECSR;
 
-	volatile u16 EERdData;		/* 0x94 */
+	volatile __le16 EERdData;	/* 0x94 */
 	volatile u8 EADDR;
 	volatile u8 EMBCMD;
 
@@ -1112,22 +1109,22 @@ struct mac_regs {
 	volatile u8 DEBUG;
 	volatile u8 CHIPGCR;
 
-	volatile u16 WOLCRSet;		/* 0xA0 */
+	volatile __le16 WOLCRSet;	/* 0xA0 */
 	volatile u8 PWCFGSet;
 	volatile u8 WOLCFGSet;
 
-	volatile u16 WOLCRClr;		/* 0xA4 */
+	volatile __le16 WOLCRClr;	/* 0xA4 */
 	volatile u8 PWCFGCLR;
 	volatile u8 WOLCFGClr;
 
-	volatile u16 WOLSRSet;		/* 0xA8 */
-	volatile u16 reserved_AA;
+	volatile __le16 WOLSRSet;	/* 0xA8 */
+	volatile __le16 reserved_AA;
 
-	volatile u16 WOLSRClr;		/* 0xAC */
-	volatile u16 reserved_AE;
+	volatile __le16 WOLSRClr;	/* 0xAC */
+	volatile __le16 reserved_AE;
 
-	volatile u16 PatternCRC[8];	/* 0xB0 */
-	volatile u32 ByteMask[4][4];	/* 0xC0 */
+	volatile __le16 PatternCRC[8];	/* 0xB0 */
+	volatile __le32 ByteMask[4][4];	/* 0xC0 */
 } __attribute__ ((__packed__));
 
 
@@ -1238,12 +1235,12 @@ typedef u8 MCAM_ADDR[ETH_ALEN];
 struct arp_packet {
 	u8 dest_mac[ETH_ALEN];
 	u8 src_mac[ETH_ALEN];
-	u16 type;
-	u16 ar_hrd;
-	u16 ar_pro;
+	__be16 type;
+	__be16 ar_hrd;
+	__be16 ar_pro;
 	u8 ar_hln;
 	u8 ar_pln;
-	u16 ar_op;
+	__be16 ar_op;
 	u8 ar_sha[ETH_ALEN];
 	u8 ar_sip[4];
 	u8 ar_tha[ETH_ALEN];
@@ -1253,7 +1250,7 @@ struct arp_packet {
 struct _magic_packet {
 	u8 dest_mac[6];
 	u8 src_mac[6];
-	u16 type;
+	__be16 type;
 	u8 MAC[16][6];
 	u8 password[6];
 } __attribute__ ((__packed__));
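
The descriptor changes above replace C bitfields with plain fixed-endian 16-bit words plus explicit masks. A minimal userspace sketch of that pattern follows, assuming the layout given in the tdesc0 comment (bits 0-13 length, bit 15 owner). All demo_* names are hypothetical, and demo_to_le16()/demo_from_le16() are stand-ins for the kernel's cpu_to_le16()/le16_to_cpu(), not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LEN_MASK 0x3fffu	/* bits 0-13: frame length */
#define DEMO_OWNER    0x8000u	/* bit 15: descriptor owned by the NIC */

/* Hypothetical stand-in for cpu_to_le16(): host order -> little-endian */
uint16_t demo_to_le16(uint16_t v)
{
	union { uint8_t b[2]; uint16_t w; } u;

	u.b[0] = (uint8_t)(v & 0xff);	/* low byte first in memory */
	u.b[1] = (uint8_t)(v >> 8);
	return u.w;
}

/* Hypothetical stand-in for le16_to_cpu(): little-endian -> host order */
uint16_t demo_from_le16(uint16_t v)
{
	union { uint8_t b[2]; uint16_t w; } u;

	u.w = v;
	return (uint16_t)(u.b[0] | (u.b[1] << 8));
}

struct demo_desc0 {
	uint16_t status;	/* little-endian status word */
	uint16_t len;		/* little-endian: bits 0-13 length, bit 15 owner */
};

/* Store the length; this also clears the owner bit */
void demo_set_len(struct demo_desc0 *d, uint16_t len)
{
	assert(len <= DEMO_LEN_MASK);
	d->len = demo_to_le16(len);
}

/* Hand the descriptor to the device by OR-ing in a pre-swapped constant */
void demo_give_to_nic(struct demo_desc0 *d)
{
	d->len |= demo_to_le16(DEMO_OWNER);
}

uint16_t demo_get_len(const struct demo_desc0 *d)
{
	return demo_from_le16(d->len) & DEMO_LEN_MASK;
}

int demo_owned_by_nic(const struct demo_desc0 *d)
{
	return (d->len & demo_to_le16(DEMO_OWNER)) != 0;
}
```

Unlike a bitfield, whose bit placement is chosen by the compiler, the masked word has the same byte layout in memory on little-endian and big-endian hosts.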

^ permalink raw reply related

* [PATCH] s2io LRO bugs
From: Al Viro @ 2007-12-24  6:14 UTC (permalink / raw)
  To: jgarzik; +Cc: netdev, Ravinandan.Arakali

a) initiate_new_session() sets ->tcp_ack to ntohl(...); everything
   else stores and expects to find there the net-endian value.
b) check for monotonic timestamps in verify_l3_l4_lro_capable()
   compares the value sitting in TCP option (right there in the skb->data,
   net-endian 32bit) with the value picked from earlier packet.
   Doing that without ntohl() is an interesting idea and it might even
   work occasionally; unfortunately, it's quite broken.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/net/s2io.c |   20 ++++++++++----------
 drivers/net/s2io.h |    2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 9d80f1c..aef0875 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -7898,7 +7898,7 @@ static void initiate_new_session(struct lro *lro, u8 *l2h,
 	lro->iph = ip;
 	lro->tcph = tcp;
 	lro->tcp_next_seq = tcp_pyld_len + ntohl(tcp->seq);
-	lro->tcp_ack = ntohl(tcp->ack_seq);
+	lro->tcp_ack = tcp->ack_seq;
 	lro->sg_num = 1;
 	lro->total_len = ntohs(ip->tot_len);
 	lro->frags_len = 0;
@@ -7907,10 +7907,10 @@ static void initiate_new_session(struct lro *lro, u8 *l2h,
 	 * already been done.
  	 */
 	if (tcp->doff == 8) {
-		u32 *ptr;
-		ptr = (u32 *)(tcp+1);
+		__be32 *ptr;
+		ptr = (__be32 *)(tcp+1);
 		lro->saw_ts = 1;
-		lro->cur_tsval = *(ptr+1);
+		lro->cur_tsval = ntohl(*(ptr+1));
 		lro->cur_tsecr = *(ptr+2);
 	}
 	lro->in_use = 1;
@@ -7936,7 +7936,7 @@ static void update_L3L4_header(struct s2io_nic *sp, struct lro *lro)
 
 	/* Update tsecr field if this session has timestamps enabled */
 	if (lro->saw_ts) {
-		u32 *ptr = (u32 *)(tcp + 1);
+		__be32 *ptr = (__be32 *)(tcp + 1);
 		*(ptr+2) = lro->cur_tsecr;
 	}
 
@@ -7961,10 +7961,10 @@ static void aggregate_new_rx(struct lro *lro, struct iphdr *ip,
 	lro->window = tcp->window;
 
 	if (lro->saw_ts) {
-		u32 *ptr;
+		__be32 *ptr;
 		/* Update tsecr and tsval from this packet */
-		ptr = (u32 *) (tcp + 1);
-		lro->cur_tsval = *(ptr + 1);
+		ptr = (__be32 *) (tcp + 1);
+		lro->cur_tsval = ntohl(*(ptr + 1));
 		lro->cur_tsecr = *(ptr + 2);
 	}
 }
@@ -8015,11 +8015,11 @@ static int verify_l3_l4_lro_capable(struct lro *l_lro, struct iphdr *ip,
 
 		/* Ensure timestamp value increases monotonically */
 		if (l_lro)
-			if (l_lro->cur_tsval > *((u32 *)(ptr+2)))
+			if (l_lro->cur_tsval > ntohl(*((__be32 *)(ptr+2))))
 				return -1;
 
 		/* timestamp echo reply should be non-zero */
-		if (*((u32 *)(ptr+6)) == 0)
+		if (*((__be32 *)(ptr+6)) == 0)
 			return -1;
 	}
 
diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index cc1797a..899d60c 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -797,7 +797,7 @@ struct lro {
 	int		in_use;
 	__be16		window;
 	u32		cur_tsval;
-	u32		cur_tsecr;
+	__be32		cur_tsecr;
 	u8		saw_ts;
 };
 
-- 
1.5.3.GIT
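
Point (b) of the description can be illustrated outside the driver. This is a minimal sketch, assuming the TSval field is available as four raw bytes in network order. demo_read_be32() and demo_tsval_ok() are hypothetical helpers, not s2io functions, and like the patch they do not handle timestamp wraparound.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit network-order field from packet bytes, return host order */
uint32_t demo_read_be32(const unsigned char *p)
{
	uint32_t net;

	assert(p != NULL);
	memcpy(&net, p, sizeof(net));	/* avoids unaligned access */
	return ntohl(net);
}

/*
 * Return 1 if the TSval in the packet is not older than the last one
 * seen (kept in host order), 0 otherwise.
 */
int demo_tsval_ok(uint32_t last_tsval_host, const unsigned char *tsval_field)
{
	return demo_read_be32(tsval_field) >= last_tsval_host;
}
```

Comparing the raw network-order words would order 256 and 255 incorrectly on a little-endian host, because the most significant byte on the wire ends up as the least significant byte of the loaded value.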


^ permalink raw reply related

* iproute2 action ipt + iptables 1.4.0
From: Denys Fedoryshchenko @ 2007-12-24  9:31 UTC (permalink / raw)
  To: netdev

It seems the latest iproute2 (even from git) searches for libipt_MARK (for
example), while the library is now named libxt_MARK.

Even if I correct the names, it still does not work:
"undefined symbol: xtables_register_target"

Are there probably other issues with the new version of the iptables libraries?



--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply

* [RFC] skge csum problems
From: Al Viro @ 2007-12-24  9:43 UTC (permalink / raw)
  To: netdev

	Both variants of skge (drivers/net and drivers/net/sk98lin/ resp.)
have the same problem with rx checksums.  They pick checksum from rx
descriptor and use it as-is.  Normally that would be the right thing to
do.  However, skge is told to byteswap descriptors on big-endian boxen.

Checksum is fixed-endian and we want it that way; IOW, what we end up
storing in skb->csum should be fixed-endian as well.  Unless the card
is smart enough to byteswap everything in the rx descriptor _except_ the
checksum, we are in trouble: we get a value converted to host-endian
by the general byteswap of the descriptor and we must convert it back to
fixed-endian ourselves.

FWIW, FreeBSD sk_if sidesteps that mess by not telling the card to 
byteswap, so that's not too informative.  Datasheet on
http://people.freebsd.org/~wpaul/SysKonnect/xmacii_datasheet_rev_c_9-29.pdf
is not clear on what's going on with checksum in byteswapping mode either...

Could somebody with that sucker on a card (all instances I have here are
on-board ones in little-endian boxen) test what's really going on for
big-endian hosts with either driver?
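
As background for the "fixed-endian" remark, here is a minimal sketch of a standard property of the Internet checksum: summing the same bytes as little-endian or as big-endian 16-bit words gives results that are byte-swaps of each other. The demo_* helpers are hypothetical and are not taken from either skge driver. What the hardware actually does in byteswapping mode remains the open question of this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value */
uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Ones' complement sum of len bytes (len must be even), folded to 16 bits.
 * Each 16-bit word is read big-endian if big_endian is nonzero,
 * little-endian otherwise.
 */
uint16_t demo_csum16(const unsigned char *p, size_t len, int big_endian)
{
	uint32_t sum = 0;
	size_t i;

	assert(p != NULL && len % 2 == 0);
	for (i = 0; i < len; i += 2) {
		uint16_t w = big_endian
			? (uint16_t)((p[i] << 8) | p[i + 1])
			: (uint16_t)((p[i + 1] << 8) | p[i]);
		sum += w;
	}
	while (sum >> 16)		/* end-around carry */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

If a device byteswaps a 16-bit checksum field together with the rest of the descriptor, a driver could undo that with one more demo_swab16()-style swap before storing the value.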

^ permalink raw reply

* Simple question about network stack
From: Badalian Vyacheslav @ 2007-12-24  9:52 UTC (permalink / raw)
  To: netdev

Hi all.
Sorry for the off-topic question.
I have problems balancing CPU load for networking.

The machine has 2 e1000 Ethernet adapters and 8 CPUs (4 real).
It works as a shaper, using only TC rules to shape and iptables to drop.

rx on eth0 goes to CPU0; traffic above 400 Mbps causes 90% SI (softirq time).
rx on eth1 goes to CPU1; traffic above 400 Mbps causes 90% SI.
All other CPUs are 100% idle.

Questions:
1. Can I balance the load across the other CPUs? I understand that I can't
balance the polling itself, but could the lookups in the TC and iptables
hashes be done on a different CPU?
2. When SI on one CPU reaches 100% (at about 600 Mbps of traffic) I see
strange behaviour. The softirq process uses 100% CPU, throughput drops from
400 Mbps to 100 Mbps, and pings through the machine go from 0.5 ms to
100 ms. One CPU is 100% busy while all the others are 100% idle. If the
traffic goes down, after some time the load again goes to different CPUs.

P.S. It is very strange that a computer with 4 (8) CPUs and one with a
single HT CPU give the same network performance.
P.P.S. Sorry for my English.

Thanks for answers.
Slavon

^ permalink raw reply

* Re: ipv4_devconf.arp_accept mystery
From: Herbert Xu @ 2007-12-24 12:51 UTC (permalink / raw)
  To: Ian Brown; +Cc: netdev
In-Reply-To: <d0383f90712230800n6e6d6b92x64eaa8dcff313915@mail.gmail.com>

Ian Brown <ianbrn@gmail.com> wrote:
>
> BTW, in newer kernel version we have IPV4_DEVCONF_ALL(ARP_ACCEPT)
> instead. So if anybody knows how to set this macro (instead)to be 1, it will be
> also fine.

As the name suggests you should use

	/proc/sys/net/ipv4/conf/all/arp_accept
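
A minimal sketch of setting that value from C, assuming the process may write to the file (normally this needs root). demo_write_sysctl() is a hypothetical helper that writes an integer to whatever path it is given.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an integer followed by a newline to a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int demo_write_sysctl(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)	/* write errors may only show up on close */
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling demo_write_sysctl("/proc/sys/net/ipv4/conf/all/arp_accept", 1) would then enable the setting for all interfaces.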

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [RFC] skge csum problems
From: Andi Kleen @ 2007-12-24 13:15 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev
In-Reply-To: <20071224094352.GU8181@ftp.linux.org.uk>

Al Viro <viro@ftp.linux.org.uk> writes:
>
> Checksum is fixed-endian and we want it that way; IOW, what we end up
> storing in skb->csum should be fixed-endian as well.

AFAIK skb->csum is always native endian because it normally
needs to be manipulated further even for RX.

-Andi

^ permalink raw reply

* Re: [PATCH net-2.6][NEIGH] Updating affected neighbours when about MAC address change
From: David Shwatrz @ 2007-12-24 13:38 UTC (permalink / raw)
  To: Herbert Xu; +Cc: yoshfuji, davem, netdev
In-Reply-To: <E1J6RPh-0004mf-00@gondolin.me.apana.org.au>

Hello,

First, it can indeed be handled by user space (even though it would have
to be done twice, once for ifconfig of net-tools and once for ip of
iproute2). However, we already have methods which deal with bringing
down an interface, neigh_ifdown(), and with changing the MAC address of
an interface, neigh_changeaddr(). So why not do it from the kernel?
DS

On Dec 23, 2007 4:02 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> David Shwatrz <dshwatrz@gmail.com> wrote:
> >
> > Hi,
> > Oop, I am TWICE sorry ! I wrongly attached a wrong, empty file.
> > Attached here is the patch.
> >
> > Regarding your answer;  I accept it and I will soon send a revised
> > version of this patch (making changes to
> > arp_netdev_event() and ndisc_netdev_event().)
> > I had  IPv4 in mind, there is no reason that it will no be also in IPv6.
>
> Hmm, why can't you do this from user-space?
>
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

^ permalink raw reply

* Re: ipv4_devconf.arp_accept mystery
From: Ian Brown @ 2007-12-24 13:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev
In-Reply-To: <E1J6mmW-0006dP-00@gondolin.me.apana.org.au>

Hello,

>As the name suggests you should use
>     /proc/sys/net/ipv4/conf/all/arp_accept
 Thanks. This is indeed true.
I first tried with ipv4_devconf.arp_accept, where the name probably
does **not** suggest it; I agree that for IPV4_DEVCONF_ALL() the name
indeed suggests it. It just escaped my eye. I suppose this also applies
to ipv4_devconf.arp_accept.

Rgs,
Ian



On Dec 24, 2007 2:51 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Ian Brown <ianbrn@gmail.com> wrote:
> >
> > BTW, in newer kernel version we have IPV4_DEVCONF_ALL(ARP_ACCEPT)
> > instead. So if anybody knows how to set this macro (instead)to be 1, it will be
> > also fine.
>
> As the name suggests you should use
>
>         /proc/sys/net/ipv4/conf/all/arp_accept
>
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

^ permalink raw reply

* Re: iproute2 action ipt + iptables 1.4.0
From: jamal @ 2007-12-24 13:50 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev, Pablo Neira Ayuso
In-Reply-To: <20071224090622.M86562@visp.net.lb>

On Mon, 2007-24-12 at 11:31 +0200, Denys Fedoryshchenko wrote:
> Seems latest iproute2(even from GIT) searching for libipt_MARK (for example),
> while it is libxt_MARK.

It seems that iptables broke backward compatibility.

> Even if i correct names it will be still not functional
> "undefined symbol: xtables_register_target"
> 
> Probably some other issues actual with new version of iptables libraries?

Yes. We do depend on libipt and expect the libraries to be at
IPTABLES_LIB_DIR/libipt_%s, where IPTABLES_LIB_DIR is an environment
variable; by default (if IPTABLES_LIB_DIR is not defined) we look in
/usr/local/lib/iptables.
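
The lookup described above can be sketched roughly as follows. This is not the iproute2 source: the demo_* names are hypothetical, and the ".so" suffix is an assumption added for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_LIB_DIR "/usr/local/lib/iptables"

/*
 * Build "<dir>/libipt_<target>.so" into buf; dir == NULL selects the
 * default directory. Returns 0 on success, -1 if the target name is
 * empty or the result does not fit into buf.
 */
int demo_lib_path(const char *dir, const char *target,
		  char *buf, size_t buflen)
{
	int n;

	assert(target != NULL && buf != NULL);
	if (strlen(target) == 0)
		return -1;
	if (!dir)
		dir = DEMO_DEFAULT_LIB_DIR;
	n = snprintf(buf, buflen, "%s/libipt_%s.so", dir, target);
	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Same, taking the directory from the IPTABLES_LIB_DIR environment variable */
int demo_lib_path_env(const char *target, char *buf, size_t buflen)
{
	return demo_lib_path(getenv("IPTABLES_LIB_DIR"), target, buf, buflen);
}
```

With iptables 1.4.0 shipping libxt_MARK instead of libipt_MARK, a path built this way would not match the installed file, which fits the first symptom reported.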

in iproute2 we also need to copy over headers into include/libiptc

Pablo - can we not have a backward compat mode to avoid these hassles?

cheers,
jamal


^ permalink raw reply

* Re: [PATCH net-2.6][NEIGH] Updating affected neighbours when about MAC address change
From: jamal @ 2007-12-24 14:33 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: dshwatrz, davem, netdev, kaber
In-Reply-To: <1198415873.4423.96.camel@localhost>

On Sun, 2007-23-12 at 08:17 -0500, jamal wrote:
> On Sun, 2007-23-12 at 22:04 +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> 
> > If the secondary MACs are used with ARP/NDP, we should take care of
> > that, but I think we use the primary MAC for ARP/NDP, no?
> > (In other words, we always use primary MAC for ARP reply / NA, no?)
> 
> I think it maybe a policy decision; 

Never mind, that was my body being in a different time zone.
I went back and looked at a little chat i had with Patrick when he
posted with the macvlan driver.
At the moment we are still maintaining the model that a NIC can _only_
appear to have a single MAC to the upper layers. IOW, you have to
instantiate a macvlan device for each of the 16 MAC addresses on the
e1000 if you need to expose them. This means that the ip layer - even
with multiple ip addresses will only ever see one MAC address. 
Note also: The name macvlan is a little misleading since it allows for
the above without need for vlans.

cheers,
jamal





^ permalink raw reply

* Re: [PATCH net-2.6][NEIGH] Updating affected neighbours when about MAC address change
From: jamal @ 2007-12-24 14:50 UTC (permalink / raw)
  To: David Shwatrz; +Cc: Herbert Xu, yoshfuji, davem, netdev
In-Reply-To: <31436f4a0712240538n1b65c2a8u35109ce4c69a00d5@mail.gmail.com>

On Mon, 2007-24-12 at 15:38 +0200, David Shwatrz wrote:
> Hello,
> 
> First, it indeed can be handled by user space. (even though it should
> be done twice, once for ifconig of net-tools  and once for ip of
> iproute2) 

it needs to be done once only: reacting to netlink events when MAC
address changes.

> / However, we have already
> methods which deal with bringing down an interface - neigh_ifdown(),
> and changing MAC address of an interface (neigh_changeaddr). So why
> not do it from the kernel ?

Herbert, I agree with you that userspace is the best spot for this[1];
unfortunately we already have a precedent in the kernel sending ARPs with
bonding when link status changes (that was added recently).
So it sounds reasonable to have this patch in the kernel as well.

cheers,
jamal

[1] Things like these tend to be very policy rich and that's why user
space is the best spot for them.
I have in fact implemented this feature in user space on some random box
I have where I fail over MACs for HA reasons. Depending on how much
traffic there is on the wire, ARPs do get dropped.
One of the hardest things to decide on was how many times to retry
sending the gratuitous ARP and what the timeout between consecutive
gratuitous ARPs should be.
The earlier patch posted didn't consider this, but it would be nice to have a
couple of sysctls for these two parameters if this makes it in.
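
A rough sketch of the two-parameter policy described in [1], assuming the actual sending and waiting are supplied by the caller. All demo_* names are hypothetical, and nothing here builds or transmits a real ARP packet.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_send_fn)(void *ctx);		/* returns 0 on success */
typedef void (*demo_wait_fn)(unsigned int ms);	/* e.g. a sleep wrapper */

/*
 * Call send() count times, waiting interval_ms between attempts.
 * Returns how many sends reported success.
 */
int demo_announce(demo_send_fn send, void *ctx, demo_wait_fn wait,
		  unsigned int count, unsigned int interval_ms)
{
	unsigned int i;
	int sent = 0;

	assert(send != NULL);
	for (i = 0; i < count; i++) {
		if (send(ctx) == 0)
			sent++;
		if (wait && i + 1 < count)	/* no wait after the last try */
			wait(interval_ms);
	}
	return sent;
}

/* Stand-in sender for experiments: only counts how often it is called */
int demo_count_send(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return 0;
}
```

The count and interval_ms arguments correspond to the retry count and timeout mentioned above, which is what the suggested sysctls would expose.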


^ permalink raw reply

* Strange Panic (Deadlock)
From: Badalian Vyacheslav @ 2007-12-24 15:12 UTC (permalink / raw)
  To: netdev

Hello all. From time to time the machine freezes. Nothing is shown on the
monitor, and it does not reboot despite the "kernel.panic" sysctl.
Any ideas?

Caught by netconsole:
[91922.085864] ------------[ cut here ]------------
[91922.085975] kernel BUG at kernel/timer.c:606!
[91922.086058] invalid opcode: 0000 [#1]
[91922.086127] SMP
[91922.086201] Modules linked in: netconsole cls_u32 sch_sfq sch_htb
xt_tcpudp iptable_filter ip_tables x_tables i2c_i801 i2c_core
[91922.086386] CPU:    1
[91922.086387] EIP:    0060:[<c0127387>]    Not tainted VLI
[91922.086389] EFLAGS: 00010087   (2.6.23-gentoo-r4-fw #4)
[91922.086600] EIP is at cascade+0x34/0x4f
[91922.086669] eax: c0452200   ebx: f450408c   ecx: 00000022   edx: f3c6e08c
[91922.086740] esi: 00000022   edi: c21ce000   ebp: 00000001   esp: c21c3ef8
[91922.086815] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
[91922.086885] Process swapper (pid: 0, ti=c21c2000 task=c21af000
task.ti=c21c2000)
[91922.086954] Stack: f3c6e08c c21bfb74 00000000 c21ce000 0000000a
c012767a c21af000 00000001
[91922.087119]        c21c3f18 c0106963 c21c3f68 00000001 00000021
c03c0b08 0000000a c0124556
[91922.087285]        00000046 00000000 c21c2008 00000000 c01245ec
c2015120 c0114a11 00000046
[91922.087451] Call Trace:
[91922.087586]  [<c012767a>] run_timer_softirq+0x51/0x154
[91922.087669]  [<c0106963>] profile_pc+0x21/0x46
[91922.087752]  [<c0124556>] __do_softirq+0x5d/0xc1
[91922.087833]  [<c01245ec>] do_softirq+0x32/0x36
[91922.087915]  [<c0114a11>] smp_apic_timer_interrupt+0x74/0x80
[91922.087997]  [<c010484c>] apic_timer_interrupt+0x28/0x30
[91922.088076]  [<c0102255>] mwait_idle_with_hints+0x3b/0x3f
[91922.088162]  [<c0102259>] mwait_idle+0x0/0xa
[91922.088237]  [<c0102398>] cpu_idle+0x91/0xaa
[91922.088319]  =======================
[91922.088390] Code: 08 8d 04 ca 8b 10 89 62 04 89 14 24 8b 50 04 89 22
89 00 89 54 24 04 8b 14 24 89 40 04 8b 1a eb 19 8b 42 14 83 e0 fe 39 f8
74 04 <0f> 0b eb fe 89 f8 e8 d8 fe ff ff 89 da 8b 1b 39 e2 75 e3 59 89
[91922.088864] EIP: [<c0127387>] cascade+0x34/0x4f SS:ESP 0068:c21c3ef8


^ permalink raw reply

