Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] packet: Add fanout support.
From: David Miller @ 2011-07-05  7:48 UTC (permalink / raw)
  To: victor; +Cc: eric.dumazet, netdev
In-Reply-To: <4E12B5A6.2020802@inliniac.net>

From: Victor Julien <victor@inliniac.net>
Date: Tue, 05 Jul 2011 08:56:38 +0200

> On 07/05/2011 08:21 AM, Eric Dumazet wrote:
>> Le lundi 04 juillet 2011 à 21:20 -0700, David Miller a écrit :
>>> Fanouts allow packet capturing to be demuxed to a set of AF_PACKET
>>> sockets.  Two fanout policies are implemented:
>>>
>>> 1) Hashing based upon skb->rxhash
>> 
>> ...
>> 
>>> +
>>> +static struct sock *fanout_demux_hash(struct packet_fanout *f, struct sk_buff *skb)
>>> +{
>>> +	u32 idx, hash = skb->rxhash;
>>> +
>>> +	idx = ((u64)hash * f->num_members) >> 32;
>>> +
>>> +	return f->arr[idx];
>>> +}
>>> +
>> 
>> rxhash is 0 unless skb_get_rxhash() was called, or some NIC set it in RX
>> path.
>> 
> 
> Is this still also true for IP fragments?

I have a plan to fix this.  But what I've posted will work as you want
it to for everything else.

^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: David Miller @ 2011-07-05  7:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: victor, netdev
In-Reply-To: <1309849214.2720.45.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 05 Jul 2011 09:00:14 +0200

> Le mardi 05 juillet 2011 à 08:56 +0200, Victor Julien a écrit :
> 
>> Is this still also true for IP fragments?
>> 
> 
> This point was already raised. IP fragments have rxhash = 0, obviously,
> since we dont have full information (source / destination ports for
> example)

I will fix this.

I will add an option to the fanout, as a flag bit to the setsockopt,
that will cause the fanout to act as a ip_defrag() agent and rehash
the packet once all the fragments arrive.

Since this will operate internally on the clones passed to AF_PACKET,
it will not have any effect on the original packets traversing through
the machine.

^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: Victor Julien @ 2011-07-05  7:52 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20110705.005011.1268380726966376331.davem@davemloft.net>

On 07/05/2011 09:50 AM, David Miller wrote:
> I will add an option to the fanout, as a flag bit to the setsockopt,
> that will cause the fanout to act as a ip_defrag() agent and rehash
> the packet once all the fragments arrive.
> 
> Since this will operate internally on the clones passed to AF_PACKET,
> it will not have any effect on the original packets traversing through
> the machine.
> 

Does this mean the user space app gets a reassembled packet from the fan
out?

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------


^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: David Miller @ 2011-07-05  8:02 UTC (permalink / raw)
  To: victor; +Cc: eric.dumazet, netdev
In-Reply-To: <4E12C2B2.9060007@inliniac.net>

From: Victor Julien <victor@inliniac.net>
Date: Tue, 05 Jul 2011 09:52:18 +0200

> On 07/05/2011 09:50 AM, David Miller wrote:
>> I will add an option to the fanout, as a flag bit to the setsockopt,
>> that will cause the fanout to act as a ip_defrag() agent and rehash
>> the packet once all the fragments arrive.
>> 
>> Since this will operate internally on the clones passed to AF_PACKET,
>> it will not have any effect on the original packets traversing through
>> the machine.
>> 
> 
> Does this mean the user space app gets a reassembled packet from the fan
> out?

Yes.

Would you prefer to receive the frags individually?


^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: Hagen Paul Pfeifer @ 2011-07-05  8:47 UTC (permalink / raw)
  To: David Miller; +Cc: victor, eric.dumazet, netdev
In-Reply-To: <20110705.010254.1376215864913143868.davem@davemloft.net>


On Tue, 05 Jul 2011 01:02:54 -0700 (PDT), David Miller wrote:



> Would you prefer to receive the frags individually?



I think both ways are favored. Often you just want to forward packets

unmodified after analysis, without any packet modification. For that the

packet should be received on the same socket (e.g. all with hash 0). On the

other hand there are use cases where packet analysis requires reassembled

packets.



Nice work David!



Hagen 

^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: Victor Julien @ 2011-07-05  8:48 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20110705.010254.1376215864913143868.davem@davemloft.net>

On 07/05/2011 10:02 AM, David Miller wrote:
> Would you prefer to receive the frags individually?

Ideally what we receive is the same as on the wire, but I think your
solution would be fine.

In addition to the reassembled packet, having an option to receive the
individual frags as well would be of use. Suricata (and Snort as well)
has it's own defrag code that does anomaly detection. I can see a use
case where just for that anomaly detection we'd still want to see the
frags. In that case it wouldn't be a problem that they would all go to
fanout 0.

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------


^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: David Miller @ 2011-07-05  9:01 UTC (permalink / raw)
  To: victor; +Cc: eric.dumazet, netdev
In-Reply-To: <4E12CFC7.3070902@inliniac.net>

From: Victor Julien <victor@inliniac.net>
Date: Tue, 05 Jul 2011 10:48:07 +0200

> On 07/05/2011 10:02 AM, David Miller wrote:
>> Would you prefer to receive the frags individually?
> 
> Ideally what we receive is the same as on the wire, but I think your
> solution would be fine.
> 
> In addition to the reassembled packet, having an option to receive the
> individual frags as well would be of use. Suricata (and Snort as well)
> has it's own defrag code that does anomaly detection. I can see a use
> case where just for that anomaly detection we'd still want to see the
> frags. In that case it wouldn't be a problem that they would all go to
> fanout 0.

Ok, I'm about to post a new set of patches that add the pre-defragmentation
support.

^ permalink raw reply

* [PATCH v2 1/4] packet: Add helpers to register/unregister ->prot_hook
From: David Miller @ 2011-07-05  9:04 UTC (permalink / raw)
  To: victor; +Cc: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/packet/af_packet.c |  103 +++++++++++++++++++++++++++--------------------
 1 files changed, 59 insertions(+), 44 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 461b16f..bb281bf 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -222,6 +222,55 @@ struct packet_skb_cb {
 
 #define PACKET_SKB_CB(__skb)	((struct packet_skb_cb *)((__skb)->cb))
 
+static inline struct packet_sock *pkt_sk(struct sock *sk)
+{
+	return (struct packet_sock *)sk;
+}
+
+/* register_prot_hook must be invoked with the po->bind_lock held,
+ * or from a context in which asynchronous accesses to the packet
+ * socket is not possible (packet_create()).
+ */
+static void register_prot_hook(struct sock *sk)
+{
+	struct packet_sock *po = pkt_sk(sk);
+	if (!po->running) {
+		dev_add_pack(&po->prot_hook);
+		sock_hold(sk);
+		po->running = 1;
+	}
+}
+
+/* {,__}unregister_prot_hook() must be invoked with the po->bind_lock
+ * held.   If the sync parameter is true, we will temporarily drop
+ * the po->bind_lock and do a synchronize_net to make sure no
+ * asynchronous packet processing paths still refer to the elements
+ * of po->prot_hook.  If the sync parameter is false, it is the
+ * callers responsibility to take care of this.
+ */
+static void __unregister_prot_hook(struct sock *sk, bool sync)
+{
+	struct packet_sock *po = pkt_sk(sk);
+
+	po->running = 0;
+	__dev_remove_pack(&po->prot_hook);
+	__sock_put(sk);
+
+	if (sync) {
+		spin_unlock(&po->bind_lock);
+		synchronize_net();
+		spin_lock(&po->bind_lock);
+	}
+}
+
+static void unregister_prot_hook(struct sock *sk, bool sync)
+{
+	struct packet_sock *po = pkt_sk(sk);
+
+	if (po->running)
+		__unregister_prot_hook(sk, sync);
+}
+
 static inline __pure struct page *pgv_to_page(void *addr)
 {
 	if (is_vmalloc_addr(addr))
@@ -324,11 +373,6 @@ static inline void packet_increment_head(struct packet_ring_buffer *buff)
 	buff->head = buff->head != buff->frame_max ? buff->head+1 : 0;
 }
 
-static inline struct packet_sock *pkt_sk(struct sock *sk)
-{
-	return (struct packet_sock *)sk;
-}
-
 static void packet_sock_destruct(struct sock *sk)
 {
 	skb_queue_purge(&sk->sk_error_queue);
@@ -1337,15 +1381,7 @@ static int packet_release(struct socket *sock)
 	spin_unlock_bh(&net->packet.sklist_lock);
 
 	spin_lock(&po->bind_lock);
-	if (po->running) {
-		/*
-		 * Remove from protocol table
-		 */
-		po->running = 0;
-		po->num = 0;
-		__dev_remove_pack(&po->prot_hook);
-		__sock_put(sk);
-	}
+	unregister_prot_hook(sk, false);
 	if (po->prot_hook.dev) {
 		dev_put(po->prot_hook.dev);
 		po->prot_hook.dev = NULL;
@@ -1392,15 +1428,7 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protoc
 	lock_sock(sk);
 
 	spin_lock(&po->bind_lock);
-	if (po->running) {
-		__sock_put(sk);
-		po->running = 0;
-		po->num = 0;
-		spin_unlock(&po->bind_lock);
-		dev_remove_pack(&po->prot_hook);
-		spin_lock(&po->bind_lock);
-	}
-
+	unregister_prot_hook(sk, true);
 	po->num = protocol;
 	po->prot_hook.type = protocol;
 	if (po->prot_hook.dev)
@@ -1413,9 +1441,7 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protoc
 		goto out_unlock;
 
 	if (!dev || (dev->flags & IFF_UP)) {
-		dev_add_pack(&po->prot_hook);
-		sock_hold(sk);
-		po->running = 1;
+		register_prot_hook(sk);
 	} else {
 		sk->sk_err = ENETDOWN;
 		if (!sock_flag(sk, SOCK_DEAD))
@@ -1542,9 +1568,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 
 	if (proto) {
 		po->prot_hook.type = proto;
-		dev_add_pack(&po->prot_hook);
-		sock_hold(sk);
-		po->running = 1;
+		register_prot_hook(sk);
 	}
 
 	spin_lock_bh(&net->packet.sklist_lock);
@@ -2240,9 +2264,7 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
 			if (dev->ifindex == po->ifindex) {
 				spin_lock(&po->bind_lock);
 				if (po->running) {
-					__dev_remove_pack(&po->prot_hook);
-					__sock_put(sk);
-					po->running = 0;
+					__unregister_prot_hook(sk, false);
 					sk->sk_err = ENETDOWN;
 					if (!sock_flag(sk, SOCK_DEAD))
 						sk->sk_error_report(sk);
@@ -2259,11 +2281,8 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
 		case NETDEV_UP:
 			if (dev->ifindex == po->ifindex) {
 				spin_lock(&po->bind_lock);
-				if (po->num && !po->running) {
-					dev_add_pack(&po->prot_hook);
-					sock_hold(sk);
-					po->running = 1;
-				}
+				if (po->num)
+					register_prot_hook(sk);
 				spin_unlock(&po->bind_lock);
 			}
 			break;
@@ -2530,10 +2549,8 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 	was_running = po->running;
 	num = po->num;
 	if (was_running) {
-		__dev_remove_pack(&po->prot_hook);
 		po->num = 0;
-		po->running = 0;
-		__sock_put(sk);
+		__unregister_prot_hook(sk, false);
 	}
 	spin_unlock(&po->bind_lock);
 
@@ -2564,11 +2581,9 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 	mutex_unlock(&po->pg_vec_lock);
 
 	spin_lock(&po->bind_lock);
-	if (was_running && !po->running) {
-		sock_hold(sk);
-		po->running = 1;
+	if (was_running) {
 		po->num = num;
-		dev_add_pack(&po->prot_hook);
+		register_prot_hook(sk);
 	}
 	spin_unlock(&po->bind_lock);
 
-- 
1.7.5.4


^ permalink raw reply related

* [PATCH v2 2/4] packet: Add fanout support.
From: David Miller @ 2011-07-05  9:04 UTC (permalink / raw)
  To: victor; +Cc: netdev


Fanouts allow packet capturing to be demuxed to a set of AF_PACKET
sockets.  Two fanout policies are implemented:

1) Hashing based upon skb->rxhash

2) Pure round-robin

An AF_PACKET socket must be fully bound before it tries to add itself
to a fanout.  All AF_PACKET sockets trying to join the same fanout
must all have the same bind settings.

Fanouts are identified (within a network namespace) by a 16-bit ID.
The first socket to try to add itself to a fanout with a particular
ID, creates that fanout.  When the last socket leaves the fanout
(which happens only when the socket is closed), that fanout is
destroyed.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/if_packet.h |    4 +
 net/packet/af_packet.c    |  252 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 251 insertions(+), 5 deletions(-)

diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index 7b31863..1efa1cb 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -49,6 +49,10 @@ struct sockaddr_ll {
 #define PACKET_VNET_HDR			15
 #define PACKET_TX_TIMESTAMP		16
 #define PACKET_TIMESTAMP		17
+#define PACKET_FANOUT			18
+
+#define PACKET_FANOUT_HASH		0
+#define PACKET_FANOUT_LB		1
 
 struct tpacket_stats {
 	unsigned int	tp_packets;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bb281bf..696d19c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -187,9 +187,11 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg);
 
 static void packet_flush_mclist(struct sock *sk);
 
+struct packet_fanout;
 struct packet_sock {
 	/* struct sock has to be the first member of packet_sock */
 	struct sock		sk;
+	struct packet_fanout	*fanout;
 	struct tpacket_stats	stats;
 	struct packet_ring_buffer	rx_ring;
 	struct packet_ring_buffer	tx_ring;
@@ -212,6 +214,24 @@ struct packet_sock {
 	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
 };
 
+#define PACKET_FANOUT_MAX	256
+
+struct packet_fanout {
+#ifdef CONFIG_NET_NS
+	struct net		*net;
+#endif
+	int			num_members;
+	u16			id;
+	u8			type;
+	u8			pad;
+	atomic_t		rr_cur;
+	struct list_head	list;
+	struct sock		*arr[PACKET_FANOUT_MAX];
+	spinlock_t		lock;
+	atomic_t		sk_ref;
+	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
+};
+
 struct packet_skb_cb {
 	unsigned int origlen;
 	union {
@@ -227,6 +247,9 @@ static inline struct packet_sock *pkt_sk(struct sock *sk)
 	return (struct packet_sock *)sk;
 }
 
+static void __fanout_unlink(struct sock *sk, struct packet_sock *po);
+static void __fanout_link(struct sock *sk, struct packet_sock *po);
+
 /* register_prot_hook must be invoked with the po->bind_lock held,
  * or from a context in which asynchronous accesses to the packet
  * socket is not possible (packet_create()).
@@ -235,7 +258,10 @@ static void register_prot_hook(struct sock *sk)
 {
 	struct packet_sock *po = pkt_sk(sk);
 	if (!po->running) {
-		dev_add_pack(&po->prot_hook);
+		if (po->fanout)
+			__fanout_link(sk, po);
+		else
+			dev_add_pack(&po->prot_hook);
 		sock_hold(sk);
 		po->running = 1;
 	}
@@ -253,7 +279,10 @@ static void __unregister_prot_hook(struct sock *sk, bool sync)
 	struct packet_sock *po = pkt_sk(sk);
 
 	po->running = 0;
-	__dev_remove_pack(&po->prot_hook);
+	if (po->fanout)
+		__fanout_unlink(sk, po);
+	else
+		__dev_remove_pack(&po->prot_hook);
 	__sock_put(sk);
 
 	if (sync) {
@@ -388,6 +417,197 @@ static void packet_sock_destruct(struct sock *sk)
 	sk_refcnt_debug_dec(sk);
 }
 
+static int fanout_rr_next(struct packet_fanout *f)
+{
+	int x = atomic_read(&f->rr_cur) + 1;
+
+	if (x >= f->num_members)
+		x = 0;
+
+	return x;
+}
+
+static struct sock *fanout_demux_hash(struct packet_fanout *f, struct sk_buff *skb)
+{
+	u32 idx, hash = skb->rxhash;
+
+	idx = ((u64)hash * f->num_members) >> 32;
+
+	return f->arr[idx];
+}
+
+static struct sock *fanout_demux_lb(struct packet_fanout *f, struct sk_buff *skb)
+{
+	int cur, old;
+
+	cur = atomic_read(&f->rr_cur);
+	while ((old = atomic_cmpxchg(&f->rr_cur, cur,
+				     fanout_rr_next(f))) != cur)
+		cur = old;
+	return f->arr[cur];
+}
+
+static int packet_rcv_fanout_hash(struct sk_buff *skb, struct net_device *dev,
+				  struct packet_type *pt, struct net_device *orig_dev)
+{
+	struct packet_fanout *f = pt->af_packet_priv;
+	struct packet_sock *po;
+	struct sock *sk;
+
+	if (!net_eq(dev_net(dev), read_pnet(&f->net))) {
+		kfree_skb(skb);
+		return 0;
+	}
+
+	skb_get_rxhash(skb);
+
+	sk = fanout_demux_hash(f, skb);
+	po = pkt_sk(sk);
+
+	return po->prot_hook.func(skb, dev, &po->prot_hook, orig_dev);
+}
+
+static int packet_rcv_fanout_lb(struct sk_buff *skb, struct net_device *dev,
+				struct packet_type *pt, struct net_device *orig_dev)
+{
+	struct packet_fanout *f = pt->af_packet_priv;
+	struct packet_sock *po;
+	struct sock *sk;
+
+	if (!net_eq(dev_net(dev), read_pnet(&f->net))) {
+		kfree_skb(skb);
+		return 0;
+	}
+
+	sk = fanout_demux_lb(f, skb);
+	po = pkt_sk(sk);
+
+	return po->prot_hook.func(skb, dev, &po->prot_hook, orig_dev);
+}
+
+static DEFINE_MUTEX(fanout_mutex);
+static LIST_HEAD(fanout_list);
+
+static void __fanout_link(struct sock *sk, struct packet_sock *po)
+{
+	struct packet_fanout *f = po->fanout;
+
+	spin_lock(&f->lock);
+	f->arr[f->num_members] = sk;
+	smp_wmb();
+	f->num_members++;
+	spin_unlock(&f->lock);
+}
+
+static void __fanout_unlink(struct sock *sk, struct packet_sock *po)
+{
+	struct packet_fanout *f = po->fanout;
+	int i;
+
+	spin_unlock(&f->lock);
+	for (i = 0; i < f->num_members; i++) {
+		if (f->arr[i] == sk)
+			break;
+	}
+	BUG_ON(i >= f->num_members);
+	f->arr[i] = f->arr[f->num_members - 1];
+	f->num_members--;
+	spin_unlock(&f->lock);
+}
+
+static int fanout_add(struct sock *sk, u16 id, u8 type)
+{
+	struct packet_sock *po = pkt_sk(sk);
+	struct packet_fanout *f, *match;
+	int err;
+
+	switch (type) {
+	case PACKET_FANOUT_HASH:
+	case PACKET_FANOUT_LB:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (!po->running)
+		return -EINVAL;
+
+	if (po->fanout)
+		return -EALREADY;
+
+	mutex_lock(&fanout_mutex);
+	match = NULL;
+	list_for_each_entry(f, &fanout_list, list) {
+		if (f->id == id &&
+		    read_pnet(&f->net) == sock_net(sk)) {
+			match = f;
+			break;
+		}
+	}
+	if (!match) {
+		match = kzalloc(sizeof(*match), GFP_KERNEL);
+		if (match) {
+			write_pnet(&match->net, sock_net(sk));
+			match->id = id;
+			match->type = type;
+			atomic_set(&match->rr_cur, 0);
+			INIT_LIST_HEAD(&match->list);
+			spin_lock_init(&match->lock);
+			atomic_set(&match->sk_ref, 0);
+			match->prot_hook.type = po->prot_hook.type;
+			match->prot_hook.dev = po->prot_hook.dev;
+			switch (type) {
+			case PACKET_FANOUT_HASH:
+				match->prot_hook.func = packet_rcv_fanout_hash;
+				break;
+			case PACKET_FANOUT_LB:
+				match->prot_hook.func = packet_rcv_fanout_lb;
+				break;
+			}
+			match->prot_hook.af_packet_priv = match;
+			dev_add_pack(&match->prot_hook);
+			list_add(&match->list, &fanout_list);
+		}
+	}
+	err = -ENOMEM;
+	if (match) {
+		err = -EINVAL;
+		if (match->type == type &&
+		    match->prot_hook.type == po->prot_hook.type &&
+		    match->prot_hook.dev == po->prot_hook.dev) {
+			err = -ENOSPC;
+			if (atomic_read(&match->sk_ref) < PACKET_FANOUT_MAX) {
+				__dev_remove_pack(&po->prot_hook);
+				po->fanout = match;
+				atomic_inc(&match->sk_ref);
+				__fanout_link(sk, po);
+				err = 0;
+			}
+		}
+	}
+	mutex_unlock(&fanout_mutex);
+	return err;
+}
+
+static void fanout_release(struct sock *sk)
+{
+	struct packet_sock *po = pkt_sk(sk);
+	struct packet_fanout *f;
+
+	f = po->fanout;
+	if (!f)
+		return;
+
+	po->fanout = NULL;
+
+	mutex_lock(&fanout_mutex);
+	if (atomic_dec_and_test(&f->sk_ref)) {
+		list_del(&f->list);
+		dev_remove_pack(&f->prot_hook);
+		kfree(f);
+	}
+	mutex_unlock(&fanout_mutex);
+}
 
 static const struct proto_ops packet_ops;
 
@@ -1398,6 +1618,8 @@ static int packet_release(struct socket *sock)
 	if (po->tx_ring.pg_vec)
 		packet_set_ring(sk, &req, 1, 1);
 
+	fanout_release(sk);
+
 	synchronize_net();
 	/*
 	 *	Now the socket is dead. No more input will appear.
@@ -1421,9 +1643,9 @@ static int packet_release(struct socket *sock)
 static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protocol)
 {
 	struct packet_sock *po = pkt_sk(sk);
-	/*
-	 *	Detach an existing hook if present.
-	 */
+
+	if (po->fanout)
+		return -EINVAL;
 
 	lock_sock(sk);
 
@@ -2133,6 +2355,17 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 		po->tp_tstamp = val;
 		return 0;
 	}
+	case PACKET_FANOUT:
+	{
+		int val;
+
+		if (optlen != sizeof(val))
+			return -EINVAL;
+		if (copy_from_user(&val, optval, sizeof(val)))
+			return -EFAULT;
+
+		return fanout_add(sk, val & 0xffff, val >> 16);
+	}
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -2231,6 +2464,15 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 		val = po->tp_tstamp;
 		data = &val;
 		break;
+	case PACKET_FANOUT:
+		if (len > sizeof(int))
+			len = sizeof(int);
+		val = (po->fanout ?
+		       ((u32)po->fanout->id |
+			((u32)po->fanout->type << 16)) :
+		       0);
+		data = &val;
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
1.7.5.4


^ permalink raw reply related

* [PATCH v2 3/4] ipv4: Add ip_defrag() agent IP_DEFRAG_AF_PACKET.
From: David Miller @ 2011-07-05  9:04 UTC (permalink / raw)
  To: victor; +Cc: netdev


Elide the ICMP on frag queue timeouts unconditionally for
this user.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip.h       |    3 ++-
 net/ipv4/ip_fragment.c |    5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 9fa9416..aa76c7a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -404,7 +404,8 @@ enum ip_defrag_users {
 	__IP_DEFRAG_CONNTRACK_BRIDGE_IN = IP_DEFRAG_CONNTRACK_BRIDGE_IN + USHRT_MAX,
 	IP_DEFRAG_VS_IN,
 	IP_DEFRAG_VS_OUT,
-	IP_DEFRAG_VS_FWD
+	IP_DEFRAG_VS_FWD,
+	IP_DEFRAG_AF_PACKET,
 };
 
 int ip_defrag(struct sk_buff *skb, u32 user);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 0ad6035..0e0ab98 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -261,8 +261,9 @@ static void ip_expire(unsigned long arg)
 		 * Only an end host needs to send an ICMP
 		 * "Fragment Reassembly Timeout" message, per RFC792.
 		 */
-		if (qp->user == IP_DEFRAG_CONNTRACK_IN &&
-		    skb_rtable(head)->rt_type != RTN_LOCAL)
+		if (qp->user == IP_DEFRAG_AF_PACKET ||
+		    (qp->user == IP_DEFRAG_CONNTRACK_IN &&
+		     skb_rtable(head)->rt_type != RTN_LOCAL))
 			goto out_rcu_unlock;
 
 
-- 
1.7.5.4


^ permalink raw reply related

* [PATCH 4/4] packet: Add pre-defragmentation support for ipv4 fanouts.
From: David Miller @ 2011-07-05  9:04 UTC (permalink / raw)
  To: victor; +Cc: netdev


The skb->rxhash cannot be properly computed if the
packet is a fragment.  To alleviate this, allow the
AF_PACKET client to ask for defragmentation to be
done at demux time.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/if_packet.h |    1 +
 net/packet/af_packet.c    |   50 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index 1efa1cb..84e684e 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -53,6 +53,7 @@ struct sockaddr_ll {
 
 #define PACKET_FANOUT_HASH		0
 #define PACKET_FANOUT_LB		1
+#define PACKET_FANOUT_FLAG_DEFRAG	0x8000
 
 struct tpacket_stats {
 	unsigned int	tp_packets;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 696d19c..6f381ba 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -223,7 +223,7 @@ struct packet_fanout {
 	int			num_members;
 	u16			id;
 	u8			type;
-	u8			pad;
+	u8			defrag;
 	atomic_t		rr_cur;
 	struct list_head	list;
 	struct sock		*arr[PACKET_FANOUT_MAX];
@@ -447,6 +447,41 @@ static struct sock *fanout_demux_lb(struct packet_fanout *f, struct sk_buff *skb
 	return f->arr[cur];
 }
 
+static struct sk_buff *fanout_check_defrag(struct sk_buff *skb)
+{
+	const struct iphdr *iph;
+	u32 len;
+
+	if (skb->protocol != htons(ETH_P_IP))
+		return skb;
+
+	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+		return skb;
+
+	iph = ip_hdr(skb);
+	if (iph->ihl < 5 || iph->version != 4)
+		return skb;
+	if (!pskb_may_pull(skb, iph->ihl*4))
+		return skb;
+	iph = ip_hdr(skb);
+	len = ntohs(iph->tot_len);
+	if (skb->len < len || len < (iph->ihl * 4))
+		return skb;
+
+	if (ip_is_fragment(ip_hdr(skb))) {
+		skb = skb_clone(skb, GFP_ATOMIC);
+		if (skb) {
+			if (pskb_trim_rcsum(skb, len))
+				return skb;
+			memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+			if (ip_defrag(skb, IP_DEFRAG_AF_PACKET))
+				return NULL;
+			skb->rxhash = 0;
+		}
+	}
+	return skb;
+}
+
 static int packet_rcv_fanout_hash(struct sk_buff *skb, struct net_device *dev,
 				  struct packet_type *pt, struct net_device *orig_dev)
 {
@@ -459,6 +494,12 @@ static int packet_rcv_fanout_hash(struct sk_buff *skb, struct net_device *dev,
 		return 0;
 	}
 
+	if (f->defrag) {
+		skb = fanout_check_defrag(skb);
+		if (!skb)
+			return 0;
+	}
+
 	skb_get_rxhash(skb);
 
 	sk = fanout_demux_hash(f, skb);
@@ -515,10 +556,12 @@ static void __fanout_unlink(struct sock *sk, struct packet_sock *po)
 	spin_unlock(&f->lock);
 }
 
-static int fanout_add(struct sock *sk, u16 id, u8 type)
+static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 {
 	struct packet_sock *po = pkt_sk(sk);
 	struct packet_fanout *f, *match;
+	u8 type = type_flags & 0xff;
+	u8 defrag = (type_flags & PACKET_FANOUT_FLAG_DEFRAG) ? 1 : 0;
 	int err;
 
 	switch (type) {
@@ -544,12 +587,15 @@ static int fanout_add(struct sock *sk, u16 id, u8 type)
 			break;
 		}
 	}
+	if (match && match->defrag != defrag)
+		return -EINVAL;
 	if (!match) {
 		match = kzalloc(sizeof(*match), GFP_KERNEL);
 		if (match) {
 			write_pnet(&match->net, sock_net(sk));
 			match->id = id;
 			match->type = type;
+			match->defrag = defrag;
 			atomic_set(&match->rr_cur, 0);
 			INIT_LIST_HEAD(&match->list);
 			spin_lock_init(&match->lock);
-- 
1.7.5.4


^ permalink raw reply related

* Re: [RFC][PATCH 1/2] net: vlan: enable GSO by default
From: Shan Wei @ 2011-07-05  9:35 UTC (permalink / raw)
  To: Shan Wei
  Cc: Patrick McHardy, David Miller, netdev, Michał Mirosław,
	Ben Hutchings
In-Reply-To: <4E0406C4.6060004@cn.fujitsu.com>

Ping....

Shan Wei wrote, at 06/24/2011 11:38 AM:
> Currently, GSO for vlan device is off, and can't be set to on.
> Although underlying device don't support TSO, we still
> should use software segments for vlan device.
> 
> In vlan_dev_fix_features(), final features is decided by
> features of real device and vlan_features of real device.
> 
> real_dev->vlan_features is initialized in register_netdevice()
> only with NETIF_F_GRO, not NETIF_F_GSO.
> 
> So, now GRO is ok, but GSO is broken by default.
> 
> 
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
> ---
>  net/8021q/vlan_dev.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 1c9aa8c..d8f45ba 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -588,9 +588,14 @@ static void vlan_dev_uninit(struct net_device *dev)
>  static u32 vlan_dev_fix_features(struct net_device *dev, u32 features)
>  {
>  	struct net_device *real_dev = vlan_dev_info(dev)->real_dev;
> +	u32 old_features = features;
>  
>  	features &= real_dev->features;
>  	features &= real_dev->vlan_features;
> +
> +	if (old_features & NETIF_F_SOFT_FEATURES)
> +		features |= old_features & NETIF_F_SOFT_FEATURES;
> +
>  	if (dev_ethtool_get_rx_csum(real_dev))
>  		features |= NETIF_F_RXCSUM;
>  	features |= NETIF_F_LLTX;


-- 

Best Regards
-----
Shan Wei

^ permalink raw reply

* Re: [PATCH 4/4] packet: Add pre-defragmentation support for ipv4 fanouts.
From: Victor Julien @ 2011-07-05  9:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110705.020426.1881513648974221645.davem@davemloft.net>

On 07/05/2011 11:04 AM, David Miller wrote:
> 
> The skb->rxhash cannot be properly computed if the
> packet is a fragment.  To alleviate this, allow the
> AF_PACKET client to ask for defragmentation to be
> done at demux time.

Does the same limitation exist for IPv6 packets? If I read the patch
correctly the defrag is only done for IPv4.

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------


^ permalink raw reply

* [PATCH net-next 1/6] r8169: adjust some registers
From: Hayes Wang @ 2011-07-05  9:44 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang

Define new registers and modify some existing ones.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   30 +++++++++++++++++++++++-------
 1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index fbd6838..701ab6b 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -327,7 +327,7 @@ enum rtl8168_8101_registers {
 #define	EPHYAR_REG_SHIFT		16
 #define	EPHYAR_DATA_MASK		0xffff
 	DLLPR			= 0xd0,
-#define	PM_SWITCH			(1 << 6)
+#define	PFM_EN				(1 << 6)
 	DBG_REG			= 0xd1,
 #define	FIX_NAK_1			(1 << 4)
 #define	FIX_NAK_2			(1 << 3)
@@ -335,6 +335,7 @@ enum rtl8168_8101_registers {
 	MCU			= 0xd3,
 #define	EN_NDP				(1 << 3)
 #define	EN_OOB_RESET			(1 << 2)
+#define NOW_IS_OOB			(1 << 7)
 	EFUSEAR			= 0xdc,
 #define	EFUSEAR_FLAG			0x80000000
 #define	EFUSEAR_WRITE_CMD		0x80000000
@@ -345,18 +346,31 @@ enum rtl8168_8101_registers {
 };
 
 enum rtl8168_registers {
+	LED_FREQ		= 0x1a,
+	EEE_LED			= 0x1b,
+
+	/* TxConfig */
+#define AUTO_FIFO			(1 << 7)
+#define TX_EMPTY			(1 << 11)
+
+	/* RxConfig */
+#define RX128_INT_EN			(1 << 15) /* 8111c and later */
+#define RX_MULTI_EN			(1 << 14) /* 8111c only */
+
 	ERIDR			= 0x70,
 	ERIAR			= 0x74,
 #define ERIAR_FLAG			0x80000000
 #define ERIAR_WRITE_CMD			0x80000000
 #define ERIAR_READ_CMD			0x00000000
 #define ERIAR_ADDR_BYTE_ALIGN		4
-#define ERIAR_EXGMAC			0
-#define ERIAR_MSIX			1
-#define ERIAR_ASF			2
 #define ERIAR_TYPE_SHIFT		16
-#define ERIAR_BYTEEN			0x0f
-#define ERIAR_BYTEEN_SHIFT		12
+#define ERIAR_EXGMAC			(0x00 << ERIAR_TYPE_SHIFT)
+#define ERIAR_MSIX			(0x01 << ERIAR_TYPE_SHIFT)
+#define ERIAR_ASF			(0x02 << ERIAR_TYPE_SHIFT)
+#define ERIAR_MASK_SHIFT		12
+#define ERIAR_MASK_0001			(0x1 << ERIAR_MASK_SHIFT)
+#define ERIAR_MASK_0011			(0x3 << ERIAR_MASK_SHIFT)
+#define ERIAR_MASK_1111			(0xf << ERIAR_MASK_SHIFT)
 	EPHY_RXER_NUM		= 0x7c,
 	OCPDR			= 0xb0,	/* OCP GPHY access */
 #define OCPDR_WRITE_CMD			0x80000000
@@ -371,6 +385,7 @@ enum rtl8168_registers {
 	RDSAR1			= 0xd0,	/* 8168c only. Undocumented on 8168dp */
 	MISC			= 0xf0,	/* 8168e only. */
 #define TXPLA_RST			(1 << 29)
+#define PWM_EN				(1 << 22)
 };
 
 enum rtl_register_content {
@@ -395,6 +410,7 @@ enum rtl_register_content {
 	RxCRC	= (1 << 19),
 
 	/* ChipCmdBits */
+	StopReq		= 0x80,
 	CmdReset	= 0x10,
 	CmdRxEnb	= 0x08,
 	CmdTxEnb	= 0x04,
@@ -4368,7 +4384,7 @@ static void rtl_hw_start_8105e_1(void __iomem *ioaddr, struct pci_dev *pdev)
 	RTL_W32(FuncEvent, RTL_R32(FuncEvent) & ~0x010000);
 
 	RTL_W8(MCU, RTL_R8(MCU) | EN_NDP | EN_OOB_RESET);
-	RTL_W8(DLLPR, RTL_R8(DLLPR) | PM_SWITCH);
+	RTL_W8(DLLPR, RTL_R8(DLLPR) | PFM_EN);
 
 	rtl_ephy_init(ioaddr, e_info_8105e_1, ARRAY_SIZE(e_info_8105e_1));
 }
-- 
1.7.3.2

^ permalink raw reply related

* [PATCH net-next 2/6] r8169: modify the flow hw reset
From: Hayes Wang @ 2011-07-05  9:44 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1309859095-32031-1-git-send-email-hayeswang@realtek.com>

Replace rtl8169_asic_down with rtl8169_hw_reset. Clear RxConfig
bit 0 ~ 3 and do some checking before reset. Remove hw reset
which is before hw_start because reset would be done in close or
down.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   43 ++++++++++++++++++++++++-------------------
 1 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 701ab6b..cdbbe47 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -731,6 +731,8 @@ static int rtl8169_poll(struct napi_struct *napi, int budget);
 static const unsigned int rtl8169_rx_config =
 	(RX_FIFO_THRESH << RxCfgFIFOShift) | (RX_DMA_BURST << RxCfgDMAShift);
 
+static void rtl8169_init_ring_indexes(struct rtl8169_private *tp);
+
 static u32 ocp_read(struct rtl8169_private *tp, u8 mask, u16 reg)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -1076,13 +1078,6 @@ static void rtl8169_irq_mask_and_ack(void __iomem *ioaddr)
 	RTL_W16(IntrStatus, 0xffff);
 }
 
-static void rtl8169_asic_down(void __iomem *ioaddr)
-{
-	RTL_W8(ChipCmd, 0x00);
-	rtl8169_irq_mask_and_ack(ioaddr);
-	RTL_R16(CPlusCmd);
-}
-
 static unsigned int rtl8169_tbi_reset_pending(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -3352,10 +3347,12 @@ static void rtl_hw_reset(struct rtl8169_private *tp)
 
 	/* Check that the chip has finished the reset. */
 	for (i = 0; i < 100; i++) {
+		udelay(100);
 		if ((RTL_R8(ChipCmd) & CmdReset) == 0)
 			break;
-		msleep_interruptible(1);
 	}
+
+	rtl8169_init_ring_indexes(tp);
 }
 
 static int __devinit
@@ -3737,6 +3734,16 @@ err_pm_runtime_put:
 	goto out;
 }
 
+static void rtl_rx_close(struct rtl8169_private *tp)
+{
+	void __iomem *ioaddr = tp->mmio_addr;
+	u32 rxcfg = RTL_R32(RxConfig);
+
+	rxcfg &= ~(AcceptBroadcast | AcceptMulticast |
+		   AcceptMyPhys | AcceptAllPhys);
+	RTL_W32(RxConfig, rxcfg);
+}
+
 static void rtl8169_hw_reset(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -3744,19 +3751,20 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
 	/* Disable interrupts */
 	rtl8169_irq_mask_and_ack(ioaddr);
 
+	rtl_rx_close(tp);
+
 	if (tp->mac_version == RTL_GIGA_MAC_VER_27 ||
 	    tp->mac_version == RTL_GIGA_MAC_VER_28 ||
 	    tp->mac_version == RTL_GIGA_MAC_VER_31) {
 		while (RTL_R8(TxPoll) & NPQ)
 			udelay(20);
 
+	} else {
+		RTL_W8(ChipCmd, RTL_R8(ChipCmd) | StopReq);
+		udelay(100);
 	}
 
-	/* Reset the chipset */
-	RTL_W8(ChipCmd, CmdReset);
-
-	/* PCI commit */
-	RTL_R8(ChipCmd);
+	rtl_hw_reset(tp);
 }
 
 static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
@@ -3776,8 +3784,6 @@ static void rtl_hw_start(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 
-	rtl_hw_reset(tp);
-
 	tp->hw_start(dev);
 
 	netif_start_queue(dev);
@@ -4718,7 +4724,6 @@ static void rtl8169_reset_task(struct work_struct *work)
 
 	rtl8169_tx_clear(tp);
 
-	rtl8169_init_ring_indexes(tp);
 	rtl_hw_start(dev);
 	netif_wake_queue(dev);
 	rtl8169_check_link_status(dev, tp, tp->mmio_addr);
@@ -5132,7 +5137,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		 * the chip, so just exit the loop.
 		 */
 		if (unlikely(!netif_running(dev))) {
-			rtl8169_asic_down(ioaddr);
+			rtl8169_hw_reset(tp);
 			break;
 		}
 
@@ -5255,7 +5260,7 @@ static void rtl8169_down(struct net_device *dev)
 
 	spin_lock_irq(&tp->lock);
 
-	rtl8169_asic_down(ioaddr);
+	rtl8169_hw_reset(tp);
 	/*
 	 * At this point device interrupts can not be enabled in any function,
 	 * as netif_running is not true (rtl8169_interrupt, rtl8169_reset_task,
@@ -5509,7 +5514,7 @@ static void rtl_shutdown(struct pci_dev *pdev)
 
 	spin_lock_irq(&tp->lock);
 
-	rtl8169_asic_down(ioaddr);
+	rtl8169_hw_reset(tp);
 
 	spin_unlock_irq(&tp->lock);
 
-- 
1.7.3.2

^ permalink raw reply related

* [PATCH net-next 3/6] r8169: adjust the settings about RxConfig
From: Hayes Wang @ 2011-07-05  9:44 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1309859095-32031-1-git-send-email-hayeswang@realtek.com>

Set the init value before reset in probe function. And then just
modify the relative bits and keep the init settings.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   63 +++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index cdbbe47..3aeae68 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -71,7 +71,7 @@ static const int multicast_filter_limit = 32;
 
 #define MAX_READ_REQUEST_SHIFT	12
 #define RX_FIFO_THRESH	7	/* 7 means NO threshold, Rx buffer level before first PCI xfer. */
-#define RX_DMA_BURST	6	/* Maximum PCI burst, '6' is 1024 */
+#define RX_DMA_BURST	7	/* Maximum PCI burst, '7' is Unlimited */
 #define TX_DMA_BURST	6	/* Maximum PCI burst, '6' is 1024 */
 #define SafeMtu		0x1c20	/* ... actually life sucks beyond ~7k */
 #define InterFrameGap	0x03	/* 3 means InterFrameGap = the shortest one */
@@ -272,9 +272,6 @@ enum rtl_registers {
 	IntrStatus	= 0x3e,
 	TxConfig	= 0x40,
 	RxConfig	= 0x44,
-
-#define RTL_RX_CONFIG_MASK		0xff7e1880u
-
 	RxMissed	= 0x4c,
 	Cfg9346		= 0x50,
 	Config0		= 0x51,
@@ -727,10 +724,6 @@ static int rtl8169_change_mtu(struct net_device *dev, int new_mtu);
 static void rtl8169_down(struct net_device *dev);
 static void rtl8169_rx_clear(struct rtl8169_private *tp);
 static int rtl8169_poll(struct napi_struct *napi, int budget);
-
-static const unsigned int rtl8169_rx_config =
-	(RX_FIFO_THRESH << RxCfgFIFOShift) | (RX_DMA_BURST << RxCfgDMAShift);
-
 static void rtl8169_init_ring_indexes(struct rtl8169_private *tp);
 
 static u32 ocp_read(struct rtl8169_private *tp, u8 mask, u16 reg)
@@ -3337,6 +3330,45 @@ static void __devinit rtl_init_pll_power_ops(struct rtl8169_private *tp)
 	}
 }
 
+static void rtl_init_rxcfg(struct rtl8169_private *tp)
+{
+	void __iomem *ioaddr = tp->mmio_addr;
+
+	switch (tp->mac_version) {
+	case RTL_GIGA_MAC_VER_01:
+	case RTL_GIGA_MAC_VER_02:
+	case RTL_GIGA_MAC_VER_03:
+	case RTL_GIGA_MAC_VER_04:
+	case RTL_GIGA_MAC_VER_05:
+	case RTL_GIGA_MAC_VER_06:
+	case RTL_GIGA_MAC_VER_10:
+	case RTL_GIGA_MAC_VER_11:
+	case RTL_GIGA_MAC_VER_12:
+	case RTL_GIGA_MAC_VER_13:
+	case RTL_GIGA_MAC_VER_14:
+	case RTL_GIGA_MAC_VER_15:
+	case RTL_GIGA_MAC_VER_16:
+	case RTL_GIGA_MAC_VER_17:
+		RTL_W32(RxConfig, (RX_FIFO_THRESH << RxCfgFIFOShift) |
+			(RX_DMA_BURST << RxCfgDMAShift));
+		break;
+	case RTL_GIGA_MAC_VER_18:
+	case RTL_GIGA_MAC_VER_19:
+	case RTL_GIGA_MAC_VER_20:
+	case RTL_GIGA_MAC_VER_21:
+	case RTL_GIGA_MAC_VER_22:
+	case RTL_GIGA_MAC_VER_23:
+	case RTL_GIGA_MAC_VER_24:
+		RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN |
+			(RX_DMA_BURST << RxCfgDMAShift));
+		break;
+	default:
+		RTL_W32(RxConfig, RX128_INT_EN |
+			(RX_DMA_BURST << RxCfgDMAShift));
+		break;
+	}
+}
+
 static void rtl_hw_reset(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -3459,6 +3491,11 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (!pci_is_pcie(pdev))
 		netif_info(tp, probe, dev, "not PCI Express\n");
 
+	/* Identify chip attached to board */
+	rtl8169_get_mac_version(tp, dev, cfg->default_ver);
+
+	rtl_init_rxcfg(tp);
+
 	RTL_W16(IntrMask, 0x0000);
 
 	rtl_hw_reset(tp);
@@ -3467,9 +3504,6 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	pci_set_master(pdev);
 
-	/* Identify chip attached to board */
-	rtl8169_get_mac_version(tp, dev, cfg->default_ver);
-
 	/*
 	 * Pretend we are using VLANs; This bypasses a nasty bug where
 	 * Interrupts stop flowing on high load on 8110SCd controllers.
@@ -3770,10 +3804,6 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
 static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
-	u32 cfg = rtl8169_rx_config;
-
-	cfg |= (RTL_R32(RxConfig) & RTL_RX_CONFIG_MASK);
-	RTL_W32(RxConfig, cfg);
 
 	/* Set DMA burst size and Interframe Gap Time */
 	RTL_W32(TxConfig, (TX_DMA_BURST << TxDMAShift) |
@@ -5343,8 +5373,7 @@ static void rtl_set_rx_mode(struct net_device *dev)
 
 	spin_lock_irqsave(&tp->lock, flags);
 
-	tmp = rtl8169_rx_config | rx_mode |
-	      (RTL_R32(RxConfig) & RTL_RX_CONFIG_MASK);
+	tmp = RTL_R32(RxConfig) | rx_mode;
 
 	if (tp->mac_version > RTL_GIGA_MAC_VER_06) {
 		u32 data = mc_filter[0];
-- 
1.7.3.2

^ permalink raw reply related

* [PATCH net-next 4/6] r8169: add ERI functions
From: Hayes Wang @ 2011-07-05  9:44 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1309859095-32031-1-git-send-email-hayeswang@realtek.com>

Add the ERI functions which would be used by the new chips.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 3aeae68..fa2c139 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1046,6 +1046,49 @@ static u32 rtl_csi_read(void __iomem *ioaddr, int addr)
 	return value;
 }
 
+static
+void rtl_eri_write(void __iomem *ioaddr, int addr, u32 mask, u32 val, int type)
+{
+	unsigned int i;
+
+	BUG_ON((addr & 3) || (mask == 0));
+	RTL_W32(ERIDR, val);
+	RTL_W32(ERIAR, ERIAR_WRITE_CMD | type | mask | addr);
+
+	for (i = 0; i < 100; i++) {
+		udelay(100);
+		if (!(RTL_R32(ERIAR) & ERIAR_FLAG))
+			break;
+	}
+}
+
+static  u32 rtl_eri_read(void __iomem *ioaddr, int addr, int type)
+{
+	unsigned int i;
+	u32 value = ~0x00;
+
+	RTL_W32(ERIAR, ERIAR_READ_CMD | type | ERIAR_MASK_1111 | addr);
+
+	for (i = 0; i < 100; i++) {
+		udelay(100);
+		if (RTL_R32(ERIAR) & ERIAR_FLAG) {
+			value = RTL_R32(ERIDR);
+			break;
+		}
+	}
+
+	return value;
+}
+
+static void
+rtl_eri_write_w0w1(void __iomem *ioaddr, int addr, u32 mask, u32 m, u32 p, int type)
+{
+	u32 val;
+
+	val = rtl_eri_read(ioaddr, addr, type);
+	rtl_eri_write(ioaddr, addr, mask, (val & ~m) | p, type);
+}
+
 static u8 rtl8168d_efuse_read(void __iomem *ioaddr, int reg_addr)
 {
 	u8 value = 0xff;
-- 
1.7.3.2

^ permalink raw reply related

* [PATCH net-next 5/6] r8169: fix wake on lan setting for 8111E
From: Hayes Wang @ 2011-07-05  9:44 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1309859095-32031-1-git-send-email-hayeswang@realtek.com>

Only 8111E needs enable RxConfig bit 0 ~ 3 when suspending or
shutdowning when supporting wake on lan.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index fa2c139..01da16a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3266,8 +3266,10 @@ static void r8168_pll_power_down(struct rtl8169_private *tp)
 		rtl_writephy(tp, 0x1f, 0x0000);
 		rtl_writephy(tp, MII_BMCR, 0x0000);
 
-		RTL_W32(RxConfig, RTL_R32(RxConfig) |
-			AcceptBroadcast | AcceptMulticast | AcceptMyPhys);
+		if (tp->mac_version == RTL_GIGA_MAC_VER_32 ||
+		    tp->mac_version == RTL_GIGA_MAC_VER_33)
+			RTL_W32(RxConfig, RTL_R32(RxConfig) | AcceptBroadcast |
+				AcceptMulticast | AcceptMyPhys);
 		return;
 	}
 
-- 
1.7.3.2

^ permalink raw reply related

* [PATCH net-next 6/6] r8169: support RTL8111E-VL
From: Hayes Wang @ 2011-07-05  9:44 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1309859095-32031-1-git-send-email-hayeswang@realtek.com>

Support RTL8111E-VL

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |  188 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 181 insertions(+), 7 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 01da16a..bee9e49 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -41,6 +41,7 @@
 #define FIRMWARE_8168D_2	"rtl_nic/rtl8168d-2.fw"
 #define FIRMWARE_8168E_1	"rtl_nic/rtl8168e-1.fw"
 #define FIRMWARE_8168E_2	"rtl_nic/rtl8168e-2.fw"
+#define FIRMWARE_8168E_3	"rtl_nic/rtl8168e-3.fw"
 #define FIRMWARE_8105E_1	"rtl_nic/rtl8105e-1.fw"
 
 #ifdef RTL8169_DEBUG
@@ -133,6 +134,7 @@ enum mac_version {
 	RTL_GIGA_MAC_VER_31,
 	RTL_GIGA_MAC_VER_32,
 	RTL_GIGA_MAC_VER_33,
+	RTL_GIGA_MAC_VER_34,
 	RTL_GIGA_MAC_NONE   = 0xff,
 };
 
@@ -216,7 +218,9 @@ static const struct {
 	[RTL_GIGA_MAC_VER_32] =
 		_R("RTL8168e/8111e",	RTL_TD_1, FIRMWARE_8168E_1),
 	[RTL_GIGA_MAC_VER_33] =
-		_R("RTL8168e/8111e",	RTL_TD_1, FIRMWARE_8168E_2)
+		_R("RTL8168e/8111e",	RTL_TD_1, FIRMWARE_8168E_2),
+	[RTL_GIGA_MAC_VER_34] =
+		_R("RTL8168evl/8111evl",RTL_TD_1, FIRMWARE_8168E_3)
 };
 #undef _R
 
@@ -1151,6 +1155,39 @@ static void rtl8169_xmii_reset_enable(struct rtl8169_private *tp)
 	rtl_writephy(tp, MII_BMCR, val & 0xffff);
 }
 
+static void rtl_link_chg_patch(struct rtl8169_private *tp)
+{
+	void __iomem *ioaddr = tp->mmio_addr;
+	struct net_device *dev = tp->dev;
+
+	if (!netif_running(dev))
+		return;
+
+	if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+		if (RTL_R8(PHYstatus) & _1000bpsF) {
+			rtl_eri_write(ioaddr, 0x1bc, ERIAR_MASK_1111,
+				      0x00000011, ERIAR_EXGMAC);
+			rtl_eri_write(ioaddr, 0x1dc, ERIAR_MASK_1111,
+				      0x00000005, ERIAR_EXGMAC);
+		} else if (RTL_R8(PHYstatus) & _100bps) {
+			rtl_eri_write(ioaddr, 0x1bc, ERIAR_MASK_1111,
+				      0x0000001f, ERIAR_EXGMAC);
+			rtl_eri_write(ioaddr, 0x1dc, ERIAR_MASK_1111,
+				      0x00000005, ERIAR_EXGMAC);
+		} else {
+			rtl_eri_write(ioaddr, 0x1bc, ERIAR_MASK_1111,
+				      0x0000001f, ERIAR_EXGMAC);
+			rtl_eri_write(ioaddr, 0x1dc, ERIAR_MASK_1111,
+				      0x0000003f, ERIAR_EXGMAC);
+		}
+		/* Reset packet filter */
+		rtl_eri_write_w0w1(ioaddr, 0xdc, ERIAR_MASK_0001, 0x01,
+				   0x00, ERIAR_EXGMAC);
+		rtl_eri_write_w0w1(ioaddr, 0xdc, ERIAR_MASK_0001, 0x00,
+				   0x01, ERIAR_EXGMAC);
+	}
+}
+
 static void __rtl8169_check_link_status(struct net_device *dev,
 					struct rtl8169_private *tp,
 					void __iomem *ioaddr, bool pm)
@@ -1159,6 +1196,7 @@ static void __rtl8169_check_link_status(struct net_device *dev,
 
 	spin_lock_irqsave(&tp->lock, flags);
 	if (tp->link_ok(ioaddr)) {
+		rtl_link_chg_patch(tp);
 		/* This is to cancel a scheduled suspend if there's one. */
 		if (pm)
 			pm_request_resume(&tp->pci_dev->dev);
@@ -1687,6 +1725,7 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp,
 		int mac_version;
 	} mac_info[] = {
 		/* 8168E family. */
+		{ 0x7c800000, 0x2c800000,	RTL_GIGA_MAC_VER_34 },
 		{ 0x7cf00000, 0x2c200000,	RTL_GIGA_MAC_VER_33 },
 		{ 0x7cf00000, 0x2c100000,	RTL_GIGA_MAC_VER_32 },
 		{ 0x7c800000, 0x2c000000,	RTL_GIGA_MAC_VER_33 },
@@ -2664,7 +2703,7 @@ static void rtl8168d_4_hw_phy_config(struct rtl8169_private *tp)
 	rtl_patchphy(tp, 0x0d, 1 << 5);
 }
 
-static void rtl8168e_hw_phy_config(struct rtl8169_private *tp)
+static void rtl8168e_1_hw_phy_config(struct rtl8169_private *tp)
 {
 	static const struct phy_reg phy_reg_init[] = {
 		/* Enable Delay cap */
@@ -2737,6 +2776,91 @@ static void rtl8168e_hw_phy_config(struct rtl8169_private *tp)
 	rtl_writephy(tp, 0x0d, 0x0000);
 }
 
+static void rtl8168e_2_hw_phy_config(struct rtl8169_private *tp)
+{
+	static const struct phy_reg phy_reg_init[] = {
+		/* Enable Delay cap */
+		{ 0x1f, 0x0004 },
+		{ 0x1f, 0x0007 },
+		{ 0x1e, 0x00ac },
+		{ 0x18, 0x0006 },
+		{ 0x1f, 0x0002 },
+		{ 0x1f, 0x0000 },
+		{ 0x1f, 0x0000 },
+
+		/* Channel estimation fine tune */
+		{ 0x1f, 0x0003 },
+		{ 0x09, 0xa20f },
+		{ 0x1f, 0x0000 },
+		{ 0x1f, 0x0000 },
+
+		/* Green Setting */
+		{ 0x1f, 0x0005 },
+		{ 0x05, 0x8b5b },
+		{ 0x06, 0x9222 },
+		{ 0x05, 0x8b6d },
+		{ 0x06, 0x8000 },
+		{ 0x05, 0x8b76 },
+		{ 0x06, 0x8000 },
+		{ 0x1f, 0x0000 }
+	};
+
+	rtl_apply_firmware(tp);
+
+	rtl_writephy_batch(tp, phy_reg_init, ARRAY_SIZE(phy_reg_init));
+
+	/* For 4-corner performance improve */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b80);
+	rtl_w1w0_phy(tp, 0x17, 0x0006, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+
+	/* PHY auto speed down */
+	rtl_writephy(tp, 0x1f, 0x0004);
+	rtl_writephy(tp, 0x1f, 0x0007);
+	rtl_writephy(tp, 0x1e, 0x002D);
+	rtl_w1w0_phy(tp, 0x18, 0x0010, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0002);
+	rtl_writephy(tp, 0x1f, 0x0000);
+	rtl_w1w0_phy(tp, 0x14, 0x8000, 0x0000);
+
+	/* improve 10M EEE waveform */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b86);
+	rtl_w1w0_phy(tp, 0x06, 0x0001, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+
+	/* Improve 2-pair detection performance */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b85);
+	rtl_w1w0_phy(tp, 0x06, 0x4000, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+
+	/* EEE setting */
+	rtl_eri_write_w0w1(tp->mmio_addr, 0x1b0, ERIAR_MASK_1111, 0x0003,
+			   0x0000, ERIAR_EXGMAC);
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b85);
+	rtl_w1w0_phy(tp, 0x06, 0x0000, 0x2000);
+	rtl_writephy(tp, 0x1f, 0x0004);
+	rtl_writephy(tp, 0x1f, 0x0007);
+	rtl_writephy(tp, 0x1e, 0x0020);
+	rtl_w1w0_phy(tp, 0x06, 0x0000, 0x0100);
+	rtl_writephy(tp, 0x1f, 0x0002);
+	rtl_writephy(tp, 0x1f, 0x0000);
+	rtl_writephy(tp, 0x0d, 0x0007);
+	rtl_writephy(tp, 0x0e, 0x003c);
+	rtl_writephy(tp, 0x0d, 0x4007);
+	rtl_writephy(tp, 0x0e, 0x0000);
+	rtl_writephy(tp, 0x0d, 0x0000);
+
+	/* Green feature */
+	rtl_writephy(tp, 0x1f, 0x0003);
+	rtl_w1w0_phy(tp, 0x19, 0x0000, 0x0001);
+	rtl_w1w0_phy(tp, 0x10, 0x0000, 0x0400);
+	rtl_writephy(tp, 0x1f, 0x0000);
+}
+
 static void rtl8102e_hw_phy_config(struct rtl8169_private *tp)
 {
 	static const struct phy_reg phy_reg_init[] = {
@@ -2856,7 +2980,10 @@ static void rtl_hw_phy_config(struct net_device *dev)
 		break;
 	case RTL_GIGA_MAC_VER_32:
 	case RTL_GIGA_MAC_VER_33:
-		rtl8168e_hw_phy_config(tp);
+		rtl8168e_1_hw_phy_config(tp);
+		break;
+	case RTL_GIGA_MAC_VER_34:
+		rtl8168e_2_hw_phy_config(tp);
 		break;
 
 	default:
@@ -3235,6 +3362,7 @@ static void r8168_phy_power_down(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_28:
 	case RTL_GIGA_MAC_VER_31:
 		rtl_writephy(tp, 0x0e, 0x0200);
+	case RTL_GIGA_MAC_VER_34:
 	default:
 		rtl_writephy(tp, MII_BMCR, BMCR_PDOWN);
 		break;
@@ -3364,6 +3492,7 @@ static void __devinit rtl_init_pll_power_ops(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_31:
 	case RTL_GIGA_MAC_VER_32:
 	case RTL_GIGA_MAC_VER_33:
+	case RTL_GIGA_MAC_VER_34:
 		ops->down	= r8168_pll_power_down;
 		ops->up		= r8168_pll_power_up;
 		break;
@@ -3838,6 +3967,9 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
 		while (RTL_R8(TxPoll) & NPQ)
 			udelay(20);
 
+	} else if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+		while (!(RTL_R32(TxConfig) & TX_EMPTY))
+			udelay(100);
 	} else {
 		RTL_W8(ChipCmd, RTL_R8(ChipCmd) | StopReq);
 		udelay(100);
@@ -4245,9 +4377,9 @@ static void rtl_hw_start_8168d_4(void __iomem *ioaddr, struct pci_dev *pdev)
 	rtl_enable_clock_request(pdev);
 }
 
-static void rtl_hw_start_8168e(void __iomem *ioaddr, struct pci_dev *pdev)
+static void rtl_hw_start_8168e_1(void __iomem *ioaddr, struct pci_dev *pdev)
 {
-	static const struct ephy_info e_info_8168e[] = {
+	static const struct ephy_info e_info_8168e_1[] = {
 		{ 0x00, 0x0200,	0x0100 },
 		{ 0x00, 0x0000,	0x0004 },
 		{ 0x06, 0x0002,	0x0001 },
@@ -4265,7 +4397,7 @@ static void rtl_hw_start_8168e(void __iomem *ioaddr, struct pci_dev *pdev)
 
 	rtl_csi_access_enable_2(ioaddr);
 
-	rtl_ephy_init(ioaddr, e_info_8168e, ARRAY_SIZE(e_info_8168e));
+	rtl_ephy_init(ioaddr, e_info_8168e_1, ARRAY_SIZE(e_info_8168e_1));
 
 	rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
 
@@ -4280,6 +4412,45 @@ static void rtl_hw_start_8168e(void __iomem *ioaddr, struct pci_dev *pdev)
 	RTL_W8(Config5, RTL_R8(Config5) & ~Spi_en);
 }
 
+static void rtl_hw_start_8168e_2(void __iomem *ioaddr, struct pci_dev *pdev)
+{
+	static const struct ephy_info e_info_8168e_2[] = {
+		{ 0x09, 0x0000,	0x0080 },
+		{ 0x19, 0x0000,	0x0224 }
+	};
+
+	rtl_csi_access_enable_1(ioaddr);
+
+	rtl_ephy_init(ioaddr, e_info_8168e_2, ARRAY_SIZE(e_info_8168e_2));
+
+	rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
+
+	rtl_eri_write(ioaddr, 0xc0, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xb8, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xc8, ERIAR_MASK_1111, 0x00100002, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xe8, ERIAR_MASK_1111, 0x00100006, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xcc, ERIAR_MASK_1111, 0x00000050, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xd0, ERIAR_MASK_1111, 0x07ff0060, ERIAR_EXGMAC);
+	rtl_eri_write_w0w1(ioaddr, 0x1b0, ERIAR_MASK_0001, 0x00, 0x10,
+			   ERIAR_EXGMAC);
+	rtl_eri_write_w0w1(ioaddr, 0xd4, ERIAR_MASK_0011, 0xff00, 0x0c00,
+			   ERIAR_EXGMAC);
+
+	RTL_W8(MaxTxPacketSize, 0x27);
+
+	rtl_disable_clock_request(pdev);
+
+	RTL_W32(TxConfig, RTL_R32(TxConfig) | AUTO_FIFO);
+	RTL_W8(MCU, RTL_R8(MCU) & ~NOW_IS_OOB);
+
+	/* Adjust EEE LED frequency */
+	RTL_W8(EEE_LED, RTL_R8(EEE_LED) & ~0x07);
+
+	RTL_W8(DLLPR, RTL_R8(DLLPR) | PFM_EN);
+	RTL_W32(MISC, RTL_R32(MISC) | PWM_EN);
+	RTL_W8(Config5, RTL_R8(Config5) & ~Spi_en);
+}
+
 static void rtl_hw_start_8168(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
@@ -4368,7 +4539,10 @@ static void rtl_hw_start_8168(struct net_device *dev)
 
 	case RTL_GIGA_MAC_VER_32:
 	case RTL_GIGA_MAC_VER_33:
-		rtl_hw_start_8168e(ioaddr, pdev);
+		rtl_hw_start_8168e_1(ioaddr, pdev);
+		break;
+	case RTL_GIGA_MAC_VER_34:
+		rtl_hw_start_8168e_2(ioaddr, pdev);
 		break;
 
 	default:
-- 
1.7.3.2


^ permalink raw reply related

* Re: [PATCH 2/2] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05  9:50 UTC (permalink / raw)
  To: David Miller; +Cc: victor, netdev
In-Reply-To: <20110705.004640.2280047904273455681.davem@davemloft.net>

Le mardi 05 juillet 2011 à 00:46 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 05 Jul 2011 08:21:15 +0200
> 
> > rxhash is 0 unless skb_get_rxhash() was called, or some NIC set it in RX
> > path.
> 
> CONFIG_RPS is effectively on all the time for SMP builds.
> 
> If you want to make it a hard enable in that situation,
> I fully support such a change. :-)
> 

CONFIG_RPS can be on, but skb->rxhash still 0 by default on tg3 for
example.

get_rps_cpu() wont call skb_get_rxhash() if RPS/RFS is not setup on
rxqueue (rps_map == NULL and rps_flow_table = NULL)

You really have to call skb_get_rxhash() yourself to make sure rxhash is
set (if not already done)

Just replace   skb->rxhash by skb_get_rxhash(skb)




^ permalink raw reply

* Re: [PATCH 4/4] packet: Add pre-defragmentation support for ipv4 fanouts.
From: David Miller @ 2011-07-05 10:06 UTC (permalink / raw)
  To: victor; +Cc: netdev
In-Reply-To: <4E12DAED.3070408@inliniac.net>

From: Victor Julien <victor@inliniac.net>
Date: Tue, 05 Jul 2011 11:35:41 +0200

> On 07/05/2011 11:04 AM, David Miller wrote:
>> 
>> The skb->rxhash cannot be properly computed if the
>> packet is a fragment.  To alleviate this, allow the
>> AF_PACKET client to ask for defragmentation to be
>> done at demux time.
> 
> Does the same limitation exist for IPv6 packets?

Yes.

> If I read the patch correctly the defrag is only done for IPv4.

Right, one thing at a time :-)

^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: David Miller @ 2011-07-05 10:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: victor, netdev
In-Reply-To: <1309859424.2271.5.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 05 Jul 2011 11:50:24 +0200

> You really have to call skb_get_rxhash() yourself to make sure rxhash is
> set (if not already done)
> 
> Just replace   skb->rxhash by skb_get_rxhash(skb)

Read v2 of my patch :-)

^ permalink raw reply

* Re: [RFC][PATCH 1/2] net: vlan: enable GSO by default
From: David Miller @ 2011-07-05 10:08 UTC (permalink / raw)
  To: shanwei; +Cc: kaber, netdev, mirq-linux, bhutchings
In-Reply-To: <4E12DAE3.8050904@cn.fujitsu.com>

From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Tue, 05 Jul 2011 17:35:31 +0800

> Ping....

I hope you're not waiting for something from me.  If you mean for me
to seriously review and apply this patch, resubmit it without "RFC" in
subject.

^ permalink raw reply

* Re: [PATCH 4/4] packet: Add pre-defragmentation support for ipv4 fanouts.
From: Victor Julien @ 2011-07-05 10:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110705.030639.1088590555237733790.davem@davemloft.net>

On 07/05/2011 12:06 PM, David Miller wrote:
>> If I read the patch correctly the defrag is only done for IPv4.
> 
> Right, one thing at a time :-)
> 

Cool, just checking :)

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------


^ permalink raw reply

* Re: libpcap and tc filters
From: jamal @ 2011-07-05 10:56 UTC (permalink / raw)
  To: Adam Katz; +Cc: netdev
In-Reply-To: <CAA0qwj4FUpfdux73WCxFcjX0xp-zNtMwncRRmN7ea1Kr9FX3Kw@mail.gmail.com>

On Mon, 2011-07-04 at 17:16 +0300, Adam Katz wrote:
> thanks a lot
> I can tell you I'm not the first one to have this problem, but it
> doesn't seem to be common... but that's probably because people
> usually don't try to shape traffic sent using libpcap.
> 
> I found this post from 2003 on lartc with the exact same problem but
> with no replies:
> http://mailman.ds9a.nl/pipermail/lartc/2003q4/011004.html

I downloaded tcpreplay and reproduced the issue with your rules.
Will look into it..

cheers,
jamal


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox