* Re: [PATCH] ipv6: Fix Makefile offload objects
From: Vlad Yasevich @ 2012-12-17 15:40 UTC
To: Simon Arlott; +Cc: David Miller, Linux Kernel Mailing List, netdev
In-Reply-To: <50CDFB36.1020604@simon.arlott.org.uk>
On 12/16/2012 11:47 AM, Simon Arlott wrote:
> The following commit breaks IPv6 TCP transmission for me:
> Commit 75fe83c32248d99e6d5fe64155e519b78bb90481
> Author: Vlad Yasevich <vyasevic@redhat.com>
> Date: Fri Nov 16 09:41:21 2012 +0000
> ipv6: Preserve ipv6 functionality needed by NET
>
> This patch fixes the typo "ipv6_offload" which should be
> "ipv6-offload".
>
> I don't know why not including the offload modules should
> break TCP. Disabling all offload options on the NIC didn't
> help. Outgoing pulseaudio traffic kept stalling.
Did you restart your application to restart the socket?
The trouble is that when GSO is turned on, we try to perform
it on output. If the output path can't find the GSO handler
for the protocol (in your case TCP over IPv6), it drops the
packet. This causes TCP to retransmit, eventually without GSO.
If you were in a VM, GSO is always used even though you might
disable it on the interface with ethtool. The only way I've been
able to disable it when using the virtio driver is by passing the
gso=0 parameter to the module.
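
To make the drop behaviour described above concrete, here is a minimal
user-space sketch. It is not the kernel's code: it assumes a per-protocol
table of segmentation callbacks, and the names gso_handler, register_gso,
split_by_mss and xmit are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROTO 256

/* Hypothetical segmentation callback: returns the number of segments. */
typedef int (*gso_handler)(size_t len, size_t mss);

/* One slot per protocol number; NULL means "no handler registered". */
static gso_handler handlers[MAX_PROTO];

static int split_by_mss(size_t len, size_t mss)
{
	assert(mss > 0);
	return (int)((len + mss - 1) / mss);	/* ceiling division */
}

static void register_gso(unsigned char proto, gso_handler h)
{
	handlers[proto] = h;
}

/* Returns the segment count, or -1 when the packet has to be dropped. */
static int xmit(unsigned char proto, size_t len, size_t mss, int gso_on)
{
	if (!gso_on || len <= mss)
		return 1;	/* sent as is: caller segmented it, or it fits */
	if (!handlers[proto])
		return -1;	/* no handler found: drop, sender retransmits */
	return handlers[proto](len, mss);
}
```

In this model, xmit(6, 3000, 1000, 1) reports a drop until
register_gso(6, split_by_mss) has run, after which the same call yields
three segments. That appears to match the effect of the Makefile typo: the
offload objects were never built, so no handler was ever registered.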
-vlad
>
> Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
> ---
> net/ipv6/Makefile | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> index 2068ac4..4ea2448 100644
> --- a/net/ipv6/Makefile
> +++ b/net/ipv6/Makefile
> @@ -41,6 +41,6 @@ obj-$(CONFIG_IPV6_TUNNEL) += ip6_tunnel.o
> obj-$(CONFIG_IPV6_GRE) += ip6_gre.o
>
> obj-y += addrconf_core.o exthdrs_core.o
> -obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6_offload)
> +obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6-offload)
>
> obj-$(subst m,y,$(CONFIG_IPV6)) += inet6_hashtables.o
>
* Re: [PATCH 4/4] FEC: Add time stamping code and a PTP hardware clock
From: Shawn Guo @ 2012-12-17 15:14 UTC
To: Frank Li
Cc: Sascha Hauer, Frank Li, lznua, richardcochran, linux-arm-kernel,
netdev, davem
In-Reply-To: <CAHrpEqTVuSR_-Tpdzb98=VJbg7grSFvSQ9xA6mPsHpGb7RvNCg@mail.gmail.com>
Hi Sascha,
On Mon, Dec 17, 2012 at 10:48:31PM +0800, Frank Li wrote:
> > I don't know how to continue from here. Since the whole patch doesn't
> > seem to have been reviewed very much, I tend to say we should revert it
> > for now and let Frank redo it for the next merge window.
> >
> > Other opinions?
>
> Can we just disable CONFIG_FEC_PTP by default instead of reverting the whole patch?
>
To be clear, the following is what Frank meant. Since Frank is out of
office for some time, I will send this immediate fix to David, if you
are fine with it.
Shawn
diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index 5ba6e1c..d1edb2e 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -96,7 +96,6 @@ config FEC_PTP
bool "PTP Hardware Clock (PHC)"
depends on FEC && ARCH_MXC
select PTP_1588_CLOCK
- default y if SOC_IMX6Q
--help---
Say Y here if you want to use PTP Hardware Clock (PHC) in the
driver. Only the basic clock operations have been implemented.
* Re: Do I need to skb_put() Ethernet frames to a minimum of 60 bytes?
From: Arvid Brodin @ 2012-12-17 15:15 UTC
To: Nicolas Ferre
Cc: Ben Hutchings, netdev@vger.kernel.org, Eric Dumazet,
linux-arm-kernel
In-Reply-To: <50CF216F.2010107@atmel.com>
On 2012-12-17 14:43, Nicolas Ferre wrote:
> On 08/21/2012 07:34 PM, Arvid Brodin wrote:
>> On 2012-08-14 22:35, Ben Hutchings wrote:
>>> On Tue, 2012-08-14 at 18:53 +0000, Arvid Brodin wrote:
>>>> Hi,
>>>>
>>>> If I create an sk_buff with a payload of less than 28 bytes (ethheader + data),
>>>> and send it using the cadence/macb (Ethernet) driver, I get
>>>>
>>>> eth0: TX underrun, resetting buffers
>>>>
>>>> Now I know the minimum Ethernet frame size is 64 bytes (including the 4-byte
>>>> FCS), but whose responsibility is it to pad the frame to this size if necessary?
>>>> Mine or the driver's - i.e. should I just skb_put() to the minimum size or
>>>> should I report the underrun as a driver bug?
>>>
>>> If the hardware doesn't pad frames automatically then it's the driver's
>>> responsibility to do so.
>>>
>>
>> Nicolas, can you take a look at this? At the moment I'm using the following change
>> in macb.c to avoid TX underruns on short packets:
>>
>> --- a/drivers/net/ethernet/cadence/macb.c 2012-05-04 19:14:41.927719667 +0200
>> +++ b/drivers/net/ethernet/cadence/macb.c 2012-08-21 19:22:40.063739049 +0200
>> @@ -618,6 +618,7 @@ static void macb_poll_controller(struct
>> }
>> #endif
>>
>> +#define MIN_ETHFRAME_LEN 60
>> static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> {
>> struct macb *bp = netdev_priv(dev);
>> @@ -635,6 +636,12 @@ static int macb_start_xmit(struct sk_buf
>> printk("\n");
>> #endif
>>
>> + if (skb->len < MIN_ETHFRAME_LEN) {
>> + /* Pad skb to minium Ethernet frame size */
>> + if (skb_tailroom(skb) >= MIN_ETHFRAME_LEN - skb->len)
>> + memset(skb_put(skb, MIN_ETHFRAME_LEN - skb->len), 0,
>> + MIN_ETHFRAME_LEN - skb->len);
>> + }
>> len = skb->len;
>> spin_lock_irqsave(&bp->lock, flags);
>>
>>
>> ... but as you can see this is limited to linear skbs that have been allocated with
>> enough tailroom. Perhaps there are better ways to fix the problem? (Maybe the hardware
>> is actually doing the padding already and the problem has to do with the way the DMA
>> transfer is set up?)
>
> I am coming back to this issue. It seems to me that the macb Cadence IP
> automatically pads packets that are too short. This is the usual behavior
> unless you specify otherwise in the CTRL register embedded in the tx
> descriptor. I also verified this with Wireshark on both ICMP and UDP
> packets.
>
> The error that you are experiencing is on at91sam9260 or at91sam9263
> SoCs, am I right?
No, this was on an AVR32 AP7000 board.
I believe this is what I did to solve the issue (patch for linux-2.6.37):
diff -Nurp linux-2.6.37-001-bsa400/drivers/net//macb.c linux-2.6.37-macb-hsr/drivers/net//macb.c
--- linux-2.6.37-orig/drivers/net//macb.c 2012-09-16 22:41:02.746754672 +0200
+++ linux-2.6.37-macb/drivers/net//macb.c 2012-09-17 00:34:35.161389720 +0200
@@ -376,8 +379,9 @@ static void macb_tx(struct macb *bp)
rmb();
- dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
- DMA_TO_DEVICE);
+ dma_unmap_single(&bp->pdev->dev, rp->mapping,
+ max(skb->len, (unsigned int) ETH_ZLEN),
+ DMA_TO_DEVICE);
rp->skb = NULL;
dev_kfree_skb_irq(skb);
}
@@ -413,7 +417,8 @@ static void macb_tx(struct macb *bp)
dev_dbg(&bp->pdev->dev, "skb %u (data %p) TX complete\n",
tail, skb->data);
- dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
+ dma_unmap_single(&bp->pdev->dev, rp->mapping,
+ max(skb->len, (unsigned int) ETH_ZLEN),
DMA_TO_DEVICE);
bp->stats.tx_packets++;
bp->stats.tx_bytes += skb->len;
@@ -675,7 +680,10 @@ static int macb_start_xmit(struct sk_buf
printk("\n");
#endif
- len = skb->len;
+ if (skb_padto(skb, ETH_ZLEN) != 0)
+ return NETDEV_TX_OK; /* There is no NETDEV_TX_FAIL... */
+
+ len = max(skb->len, (unsigned int) ETH_ZLEN);
spin_lock_irqsave(&bp->lock, flags);
/* This is a hard error, log it. */
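
The padding step in the patch above can be sketched in user space as
follows. This is an illustration only, assuming a flat byte buffer instead
of an sk_buff; pad_frame and its parameters are hypothetical names, while
ETH_ZLEN (60) is the kernel's minimum frame length without the FCS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60	/* minimum frame length, excluding the 4-byte FCS */

/*
 * Zero-pad the frame in buf (len bytes used, cap bytes available) up to
 * ETH_ZLEN. Returns the length to hand to the hardware, or 0 if the
 * buffer is too small to hold the padding.
 */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	assert(buf != NULL);
	if (len >= ETH_ZLEN)
		return len;	/* already long enough, nothing to do */
	if (cap < ETH_ZLEN)
		return 0;	/* no room: caller must drop or reallocate */
	memset(buf + len, 0, ETH_ZLEN - len);
	return ETH_ZLEN;
}
```

The sketch returns the padded length directly. The patch instead takes
max(skb->len, ETH_ZLEN) at each use, presumably because skb_padto()
zero-fills the tail without changing skb->len, so the DMA map and unmap
calls have to compute the same padded length on their own.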
--
Arvid Brodin | Consultant (Linux)
XDIN AB | Knarrarnäsgatan 7 | SE-164 40 Kista | Sweden | xdin.com
* [PATCH 14/15] openvswitch: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC
To: Jesse Gross, David S. Miller, dev, netdev, linux-kernel; +Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>
Switch openvswitch to use the new hashtable implementation. This reduces the
amount of generic unrelated code in openvswitch.
This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
net/openvswitch/vport.c | 35 ++++++++++++-----------------------
1 file changed, 12 insertions(+), 23 deletions(-)
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 70af0be..a946529 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -28,6 +28,7 @@
#include <linux/rtnetlink.h>
#include <linux/compat.h>
#include <net/net_namespace.h>
+#include <linux/hashtable.h>
#include "datapath.h"
#include "vport.h"
@@ -41,8 +42,8 @@ static const struct vport_ops *vport_ops_list[] = {
};
/* Protected by RCU read lock for reading, RTNL lock for writing. */
-static struct hlist_head *dev_table;
-#define VPORT_HASH_BUCKETS 1024
+#define VPORT_HASH_BITS 10
+static DEFINE_HASHTABLE(dev_table, VPORT_HASH_BITS);
/**
* ovs_vport_init - initialize vport subsystem
@@ -51,11 +52,6 @@ static struct hlist_head *dev_table;
*/
int ovs_vport_init(void)
{
- dev_table = kzalloc(VPORT_HASH_BUCKETS * sizeof(struct hlist_head),
- GFP_KERNEL);
- if (!dev_table)
- return -ENOMEM;
-
return 0;
}
@@ -66,13 +62,6 @@ int ovs_vport_init(void)
*/
void ovs_vport_exit(void)
{
- kfree(dev_table);
-}
-
-static struct hlist_head *hash_bucket(struct net *net, const char *name)
-{
- unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
- return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
}
/**
@@ -84,13 +73,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
*/
struct vport *ovs_vport_locate(struct net *net, const char *name)
{
- struct hlist_head *bucket = hash_bucket(net, name);
struct vport *vport;
struct hlist_node *node;
+ int key = full_name_hash(name, strlen(name));
- hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
- if (!strcmp(name, vport->ops->get_name(vport)) &&
- net_eq(ovs_dp_get_net(vport->dp), net))
+ hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)
+ if (!strcmp(name, vport->ops->get_name(vport)))
return vport;
return NULL;
@@ -174,7 +162,8 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
for (i = 0; i < ARRAY_SIZE(vport_ops_list); i++) {
if (vport_ops_list[i]->type == parms->type) {
- struct hlist_head *bucket;
+ int key;
+ const char *name;
vport = vport_ops_list[i]->create(parms);
if (IS_ERR(vport)) {
@@ -182,9 +171,9 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
goto out;
}
- bucket = hash_bucket(ovs_dp_get_net(vport->dp),
- vport->ops->get_name(vport));
- hlist_add_head_rcu(&vport->hash_node, bucket);
+ name = vport->ops->get_name(vport);
+ key = full_name_hash(name, strlen(name));
+ hash_add_rcu(dev_table, &vport->hash_node, key);
return vport;
}
}
@@ -225,7 +214,7 @@ void ovs_vport_del(struct vport *vport)
{
ASSERT_RTNL();
- hlist_del_rcu(&vport->hash_node);
+ hash_del_rcu(&vport->hash_node);
vport->ops->destroy(vport);
}
--
1.8.0
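
For readers unfamiliar with the pattern the patch above moves to, here is a
small user-space sketch of a name-keyed table sized by a bit count. It is
not <linux/hashtable.h>: the chain is a plain singly linked list, name_hash
is FNV-1a standing in for full_name_hash(), and port_table, port_add and
port_locate are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PORT_HASH_BITS 10
#define PORT_HASH_SIZE (1u << PORT_HASH_BITS)

struct port {
	const char *name;
	struct port *next;	/* singly linked bucket chain */
};

static struct port *port_table[PORT_HASH_SIZE];

/* FNV-1a, used here in place of the kernel's full_name_hash(). */
static uint32_t name_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

static struct port **bucket_for(const char *name)
{
	return &port_table[name_hash(name) & (PORT_HASH_SIZE - 1)];
}

static void port_add(struct port *p)
{
	struct port **b;

	assert(p != NULL && p->name != NULL);
	b = bucket_for(p->name);
	p->next = *b;		/* push onto the head of the chain */
	*b = p;
}

/* Different names can share a bucket, so every hit is compared in full. */
static struct port *port_locate(const char *name)
{
	struct port *p;

	for (p = *bucket_for(name); p; p = p->next)
		if (!strcmp(p->name, name))
			return p;
	return NULL;
}
```

Like the new ovs_vport_locate(), this lookup matches on the name alone. The
code removed by the patch also seeded the hash with the net pointer and
checked net_eq() on each candidate; the patch description does not say
whether dropping that check is intentional.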
* [PATCH 13/15] net,rds: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC
To: Venkat Venkatsubra, David S. Miller, rds-devel, netdev,
linux-kernel
Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>
Switch rds to use the new hashtable implementation. This reduces the amount of
generic unrelated code in rds.
This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
net/rds/bind.c | 20 +++++------
net/rds/connection.c | 100 ++++++++++++++++++++++-----------------------------
2 files changed, 53 insertions(+), 67 deletions(-)
diff --git a/net/rds/bind.c b/net/rds/bind.c
index 637bde5..a99e524 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -36,16 +36,16 @@
#include <linux/if_arp.h>
#include <linux/jhash.h>
#include <linux/ratelimit.h>
+#include <linux/hashtable.h>
#include "rds.h"
-#define BIND_HASH_SIZE 1024
-static struct hlist_head bind_hash_table[BIND_HASH_SIZE];
+#define BIND_HASH_BITS 10
+static DEFINE_HASHTABLE(bind_hash_table, BIND_HASH_BITS);
static DEFINE_SPINLOCK(rds_bind_lock);
-static struct hlist_head *hash_to_bucket(__be32 addr, __be16 port)
+static u32 rds_hash(__be32 addr, __be16 port)
{
- return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
- (BIND_HASH_SIZE - 1));
+ return jhash_2words((u32)addr, (u32)port, 0);
}
static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
@@ -53,12 +53,12 @@ static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
{
struct rds_sock *rs;
struct hlist_node *node;
- struct hlist_head *head = hash_to_bucket(addr, port);
+ u32 key = rds_hash(addr, port);
u64 cmp;
u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
rcu_read_lock();
- hlist_for_each_entry_rcu(rs, node, head, rs_bound_node) {
+ hash_for_each_possible_rcu(bind_hash_table, rs, node, rs_bound_node, key) {
cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
be16_to_cpu(rs->rs_bound_port);
@@ -74,13 +74,13 @@ static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
* make sure our addr and port are set before
* we are added to the list, other people
* in rcu will find us as soon as the
- * hlist_add_head_rcu is done
+ * hash_add_rcu is done
*/
insert->rs_bound_addr = addr;
insert->rs_bound_port = port;
rds_sock_addref(insert);
- hlist_add_head_rcu(&insert->rs_bound_node, head);
+ hash_add_rcu(bind_hash_table, &insert->rs_bound_node, key);
}
return NULL;
}
@@ -152,7 +152,7 @@ void rds_remove_bound(struct rds_sock *rs)
rs, &rs->rs_bound_addr,
ntohs(rs->rs_bound_port));
- hlist_del_init_rcu(&rs->rs_bound_node);
+ hash_del_rcu(&rs->rs_bound_node);
rds_sock_put(rs);
rs->rs_bound_addr = 0;
}
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 9e07c75..a9afcb8 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -34,28 +34,24 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/export.h>
+#include <linux/hashtable.h>
#include <net/inet_hashtables.h>
#include "rds.h"
#include "loop.h"
#define RDS_CONNECTION_HASH_BITS 12
-#define RDS_CONNECTION_HASH_ENTRIES (1 << RDS_CONNECTION_HASH_BITS)
-#define RDS_CONNECTION_HASH_MASK (RDS_CONNECTION_HASH_ENTRIES - 1)
/* converting this to RCU is a chore for another day.. */
static DEFINE_SPINLOCK(rds_conn_lock);
static unsigned long rds_conn_count;
-static struct hlist_head rds_conn_hash[RDS_CONNECTION_HASH_ENTRIES];
+static DEFINE_HASHTABLE(rds_conn_hash, RDS_CONNECTION_HASH_BITS);
static struct kmem_cache *rds_conn_slab;
-static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
+static unsigned long rds_conn_hashfn(__be32 laddr, __be32 faddr)
{
/* Pass NULL, don't need struct net for hash */
- unsigned long hash = inet_ehashfn(NULL,
- be32_to_cpu(laddr), 0,
- be32_to_cpu(faddr), 0);
- return &rds_conn_hash[hash & RDS_CONNECTION_HASH_MASK];
+ return inet_ehashfn(NULL, be32_to_cpu(laddr), 0, be32_to_cpu(faddr), 0);
}
#define rds_conn_info_set(var, test, suffix) do { \
@@ -64,14 +60,14 @@ static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
} while (0)
/* rcu read lock must be held or the connection spinlock */
-static struct rds_connection *rds_conn_lookup(struct hlist_head *head,
- __be32 laddr, __be32 faddr,
+static struct rds_connection *rds_conn_lookup(__be32 laddr, __be32 faddr,
struct rds_transport *trans)
{
struct rds_connection *conn, *ret = NULL;
struct hlist_node *pos;
+ unsigned long key = rds_conn_hashfn(laddr, faddr);
- hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
+ hash_for_each_possible_rcu(rds_conn_hash, conn, pos, c_hash_node, key) {
if (conn->c_faddr == faddr && conn->c_laddr == laddr &&
conn->c_trans == trans) {
ret = conn;
@@ -117,13 +113,12 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
int is_outgoing)
{
struct rds_connection *conn, *parent = NULL;
- struct hlist_head *head = rds_conn_bucket(laddr, faddr);
struct rds_transport *loop_trans;
unsigned long flags;
int ret;
rcu_read_lock();
- conn = rds_conn_lookup(head, laddr, faddr, trans);
+ conn = rds_conn_lookup(laddr, faddr, trans);
if (conn && conn->c_loopback && conn->c_trans != &rds_loop_transport &&
!is_outgoing) {
/* This is a looped back IB connection, and we're
@@ -224,13 +219,15 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
/* Creating normal conn */
struct rds_connection *found;
- found = rds_conn_lookup(head, laddr, faddr, trans);
+ found = rds_conn_lookup(laddr, faddr, trans);
if (found) {
trans->conn_free(conn->c_transport_data);
kmem_cache_free(rds_conn_slab, conn);
conn = found;
} else {
- hlist_add_head_rcu(&conn->c_hash_node, head);
+ unsigned long key = rds_conn_hashfn(laddr, faddr);
+
+ hash_add_rcu(rds_conn_hash, &conn->c_hash_node, key);
rds_cong_add_conn(conn);
rds_conn_count++;
}
@@ -303,7 +300,7 @@ void rds_conn_shutdown(struct rds_connection *conn)
* conn - the reconnect is always triggered by the active peer. */
cancel_delayed_work_sync(&conn->c_conn_w);
rcu_read_lock();
- if (!hlist_unhashed(&conn->c_hash_node)) {
+ if (hash_hashed(&conn->c_hash_node)) {
rcu_read_unlock();
rds_queue_reconnect(conn);
} else {
@@ -329,7 +326,7 @@ void rds_conn_destroy(struct rds_connection *conn)
/* Ensure conn will not be scheduled for reconnect */
spin_lock_irq(&rds_conn_lock);
- hlist_del_init_rcu(&conn->c_hash_node);
+ hash_del_rcu(&conn->c_hash_node);
spin_unlock_irq(&rds_conn_lock);
synchronize_rcu();
@@ -375,7 +372,6 @@ static void rds_conn_message_info(struct socket *sock, unsigned int len,
struct rds_info_lengths *lens,
int want_send)
{
- struct hlist_head *head;
struct hlist_node *pos;
struct list_head *list;
struct rds_connection *conn;
@@ -388,27 +384,24 @@ static void rds_conn_message_info(struct socket *sock, unsigned int len,
rcu_read_lock();
- for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
- i++, head++) {
- hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
- if (want_send)
- list = &conn->c_send_queue;
- else
- list = &conn->c_retrans;
-
- spin_lock_irqsave(&conn->c_lock, flags);
-
- /* XXX too lazy to maintain counts.. */
- list_for_each_entry(rm, list, m_conn_item) {
- total++;
- if (total <= len)
- rds_inc_info_copy(&rm->m_inc, iter,
- conn->c_laddr,
- conn->c_faddr, 0);
- }
-
- spin_unlock_irqrestore(&conn->c_lock, flags);
+ hash_for_each_rcu(rds_conn_hash, i, pos, conn, c_hash_node) {
+ if (want_send)
+ list = &conn->c_send_queue;
+ else
+ list = &conn->c_retrans;
+
+ spin_lock_irqsave(&conn->c_lock, flags);
+
+ /* XXX too lazy to maintain counts.. */
+ list_for_each_entry(rm, list, m_conn_item) {
+ total++;
+ if (total <= len)
+ rds_inc_info_copy(&rm->m_inc, iter,
+ conn->c_laddr,
+ conn->c_faddr, 0);
}
+
+ spin_unlock_irqrestore(&conn->c_lock, flags);
}
rcu_read_unlock();
@@ -438,7 +431,6 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
size_t item_len)
{
uint64_t buffer[(item_len + 7) / 8];
- struct hlist_head *head;
struct hlist_node *pos;
struct rds_connection *conn;
size_t i;
@@ -448,23 +440,19 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
lens->nr = 0;
lens->each = item_len;
- for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
- i++, head++) {
- hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
-
- /* XXX no c_lock usage.. */
- if (!visitor(conn, buffer))
- continue;
-
- /* We copy as much as we can fit in the buffer,
- * but we count all items so that the caller
- * can resize the buffer. */
- if (len >= item_len) {
- rds_info_copy(iter, buffer, item_len);
- len -= item_len;
- }
- lens->nr++;
+ hash_for_each_rcu(rds_conn_hash, i, pos, conn, c_hash_node) {
+ /* XXX no c_lock usage.. */
+ if (!visitor(conn, buffer))
+ continue;
+
+ /* We copy as much as we can fit in the buffer,
+ * but we count all items so that the caller
+ * can resize the buffer. */
+ if (len >= item_len) {
+ rds_info_copy(iter, buffer, item_len);
+ len -= item_len;
}
+ lens->nr++;
}
rcu_read_unlock();
}
@@ -525,8 +513,6 @@ void rds_conn_exit(void)
{
rds_loop_exit();
- WARN_ON(!hlist_empty(rds_conn_hash));
-
kmem_cache_destroy(rds_conn_slab);
rds_info_deregister_func(RDS_INFO_CONNECTIONS, rds_conn_info);
--
1.8.0
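
The lookup-or-insert shape of rds_bind_lookup() after this change can be
sketched in user space as below. This is a simplified model with no RCU or
locking: bind_hash is an arbitrary mixing function standing in for
jhash_2words(), and bound_sock, bind_table, bind_needle and bind_lookup are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIND_HASH_BITS 10
#define BIND_HASH_SIZE (1u << BIND_HASH_BITS)

struct bound_sock {
	uint32_t addr;		/* host byte order in this sketch */
	uint16_t port;
	struct bound_sock *next;
};

static struct bound_sock *bind_table[BIND_HASH_SIZE];

/* Pack address and port into one 64-bit value so one compare suffices. */
static uint64_t bind_needle(uint32_t addr, uint16_t port)
{
	return ((uint64_t)addr << 32) | port;
}

/* Multiply and keep the top bits; a stand-in for jhash_2words(). */
static uint32_t bind_hash(uint32_t addr, uint16_t port)
{
	uint64_t x = bind_needle(addr, port) * 0x9E3779B97F4A7C15ull;

	return (uint32_t)(x >> (64 - BIND_HASH_BITS));
}

/*
 * Look up (addr, port). If it is absent and insert is non-NULL, add
 * insert under that key. Returns the existing entry, or NULL if none.
 */
static struct bound_sock *bind_lookup(uint32_t addr, uint16_t port,
				      struct bound_sock *insert)
{
	struct bound_sock **head = &bind_table[bind_hash(addr, port)];
	struct bound_sock *s;

	/* A node being inserted must not already sit on a chain. */
	assert(!insert || insert->next == NULL);

	for (s = *head; s; s = s->next)
		if (bind_needle(s->addr, s->port) == bind_needle(addr, port))
			return s;

	if (insert) {
		insert->addr = addr;	/* set the key before publishing */
		insert->port = port;
		insert->next = *head;
		*head = insert;
	}
	return NULL;
}
```

The point of the conversion is visible in the signature: the caller passes
a key and the table works out the bucket, rather than each user carrying
its own hash_to_bucket() helper and size mask.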
* [PATCH 10/15] net,l2tp: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC
To: David S. Miller, James Chapman, Eric Dumazet, Dmitry Kozlov,
Sasha Levin, Chris Elston, Joe Perches, netdev, linux-kernel
Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>
Switch l2tp to use the new hashtable implementation. This reduces the amount
of generic unrelated code in l2tp.
This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
net/l2tp/l2tp_core.c | 140 +++++++++++++++++++-----------------------------
net/l2tp/l2tp_core.h | 15 ++++--
net/l2tp/l2tp_debugfs.c | 19 +++----
3 files changed, 74 insertions(+), 100 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 1a9f372..0b369e4 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -44,6 +44,7 @@
#include <linux/udp.h>
#include <linux/l2tp.h>
#include <linux/hash.h>
+#include <linux/hashtable.h>
#include <linux/sort.h>
#include <linux/file.h>
#include <linux/nsproxy.h>
@@ -107,8 +108,14 @@ static unsigned int l2tp_net_id;
struct l2tp_net {
struct list_head l2tp_tunnel_list;
spinlock_t l2tp_tunnel_list_lock;
- struct hlist_head l2tp_session_hlist[L2TP_HASH_SIZE_2];
- spinlock_t l2tp_session_hlist_lock;
+/*
+ * Session hash global list for L2TPv3.
+ * The session_id SHOULD be random according to RFC3931, but several
+ * L2TP implementations use incrementing session_ids. So we do a real
+ * hash on the session_id, rather than a simple bitmask.
+ */
+ DECLARE_HASHTABLE(l2tp_session_hash, L2TP_HASH_BITS_2);
+ spinlock_t l2tp_session_hash_lock;
};
static void l2tp_session_set_header_len(struct l2tp_session *session, int version);
@@ -156,30 +163,17 @@ do { \
#define l2tp_tunnel_dec_refcount(t) l2tp_tunnel_dec_refcount_1(t)
#endif
-/* Session hash global list for L2TPv3.
- * The session_id SHOULD be random according to RFC3931, but several
- * L2TP implementations use incrementing session_ids. So we do a real
- * hash on the session_id, rather than a simple bitmask.
- */
-static inline struct hlist_head *
-l2tp_session_id_hash_2(struct l2tp_net *pn, u32 session_id)
-{
- return &pn->l2tp_session_hlist[hash_32(session_id, L2TP_HASH_BITS_2)];
-
-}
-
/* Lookup a session by id in the global session list
*/
static struct l2tp_session *l2tp_session_find_2(struct net *net, u32 session_id)
{
struct l2tp_net *pn = l2tp_pernet(net);
- struct hlist_head *session_list =
- l2tp_session_id_hash_2(pn, session_id);
struct l2tp_session *session;
struct hlist_node *walk;
rcu_read_lock_bh();
- hlist_for_each_entry_rcu(session, walk, session_list, global_hlist) {
+ hash_for_each_possible_rcu(pn->l2tp_session_hash, session, walk,
+ global_hlist, session_id) {
if (session->session_id == session_id) {
rcu_read_unlock_bh();
return session;
@@ -190,23 +184,10 @@ static struct l2tp_session *l2tp_session_find_2(struct net *net, u32 session_id)
return NULL;
}
-/* Session hash list.
- * The session_id SHOULD be random according to RFC2661, but several
- * L2TP implementations (Cisco and Microsoft) use incrementing
- * session_ids. So we do a real hash on the session_id, rather than a
- * simple bitmask.
- */
-static inline struct hlist_head *
-l2tp_session_id_hash(struct l2tp_tunnel *tunnel, u32 session_id)
-{
- return &tunnel->session_hlist[hash_32(session_id, L2TP_HASH_BITS)];
-}
-
/* Lookup a session by id
*/
struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunnel, u32 session_id)
{
- struct hlist_head *session_list;
struct l2tp_session *session;
struct hlist_node *walk;
@@ -217,15 +198,14 @@ struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunn
if (tunnel == NULL)
return l2tp_session_find_2(net, session_id);
- session_list = l2tp_session_id_hash(tunnel, session_id);
- read_lock_bh(&tunnel->hlist_lock);
- hlist_for_each_entry(session, walk, session_list, hlist) {
+ read_lock_bh(&tunnel->hash_lock);
+ hash_for_each_possible(tunnel->session_hash, session, walk, hlist, session_id) {
if (session->session_id == session_id) {
- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);
return session;
}
}
- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);
return NULL;
}
@@ -238,17 +218,15 @@ struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth)
struct l2tp_session *session;
int count = 0;
- read_lock_bh(&tunnel->hlist_lock);
- for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
- hlist_for_each_entry(session, walk, &tunnel->session_hlist[hash], hlist) {
- if (++count > nth) {
- read_unlock_bh(&tunnel->hlist_lock);
- return session;
- }
+ read_lock_bh(&tunnel->hash_lock);
+ hash_for_each(tunnel->session_hash, hash, walk, session, hlist) {
+ if (++count > nth) {
+ read_unlock_bh(&tunnel->hash_lock);
+ return session;
}
}
- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);
return NULL;
}
@@ -265,12 +243,10 @@ struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname)
struct l2tp_session *session;
rcu_read_lock_bh();
- for (hash = 0; hash < L2TP_HASH_SIZE_2; hash++) {
- hlist_for_each_entry_rcu(session, walk, &pn->l2tp_session_hlist[hash], global_hlist) {
- if (!strcmp(session->ifname, ifname)) {
- rcu_read_unlock_bh();
- return session;
- }
+ hash_for_each_rcu(pn->l2tp_session_hash, hash, walk, session, global_hlist) {
+ if (!strcmp(session->ifname, ifname)) {
+ rcu_read_unlock_bh();
+ return session;
}
}
@@ -1272,7 +1248,7 @@ end:
*/
static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
{
- int hash;
+ int hash, found = 0;
struct hlist_node *walk;
struct hlist_node *tmp;
struct l2tp_session *session;
@@ -1282,16 +1258,14 @@ static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
l2tp_info(tunnel, L2TP_MSG_CONTROL, "%s: closing all sessions...\n",
tunnel->name);
- write_lock_bh(&tunnel->hlist_lock);
- for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
-again:
- hlist_for_each_safe(walk, tmp, &tunnel->session_hlist[hash]) {
- session = hlist_entry(walk, struct l2tp_session, hlist);
-
+ write_lock_bh(&tunnel->hash_lock);
+ do {
+ found = 0;
+ hash_for_each_safe(tunnel->session_hash, hash, walk, tmp, session, hlist) {
l2tp_info(session, L2TP_MSG_CONTROL,
"%s: closing session\n", session->name);
- hlist_del_init(&session->hlist);
+ hash_del(&session->hlist);
/* Since we should hold the sock lock while
* doing any unbinding, we need to release the
@@ -1302,14 +1276,14 @@ again:
if (session->ref != NULL)
(*session->ref)(session);
- write_unlock_bh(&tunnel->hlist_lock);
+ write_unlock_bh(&tunnel->hash_lock);
if (tunnel->version != L2TP_HDR_VER_2) {
struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
- spin_lock_bh(&pn->l2tp_session_hlist_lock);
- hlist_del_init_rcu(&session->global_hlist);
- spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+ spin_lock_bh(&pn->l2tp_session_hash_lock);
+ hash_del_rcu(&session->global_hlist);
+ spin_unlock_bh(&pn->l2tp_session_hash_lock);
synchronize_rcu();
}
@@ -1319,17 +1293,17 @@ again:
if (session->deref != NULL)
(*session->deref)(session);
- write_lock_bh(&tunnel->hlist_lock);
+ write_lock_bh(&tunnel->hash_lock);
/* Now restart from the beginning of this hash
* chain. We always remove a session from the
* list so we are guaranteed to make forward
* progress.
*/
- goto again;
+ found = 1;
}
- }
- write_unlock_bh(&tunnel->hlist_lock);
+ } while (found);
+ write_unlock_bh(&tunnel->hash_lock);
}
/* Really kill the tunnel.
@@ -1576,7 +1550,7 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32
tunnel->magic = L2TP_TUNNEL_MAGIC;
sprintf(&tunnel->name[0], "tunl %u", tunnel_id);
- rwlock_init(&tunnel->hlist_lock);
+ rwlock_init(&tunnel->hash_lock);
/* The net we belong to */
tunnel->l2tp_net = net;
@@ -1613,6 +1587,8 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32
/* Add tunnel to our list */
INIT_LIST_HEAD(&tunnel->list);
+
+ hash_init(tunnel->session_hash);
atomic_inc(&l2tp_tunnel_count);
/* Bump the reference count. The tunnel context is deleted
@@ -1677,17 +1653,17 @@ void l2tp_session_free(struct l2tp_session *session)
BUG_ON(tunnel->magic != L2TP_TUNNEL_MAGIC);
/* Delete the session from the hash */
- write_lock_bh(&tunnel->hlist_lock);
- hlist_del_init(&session->hlist);
- write_unlock_bh(&tunnel->hlist_lock);
+ write_lock_bh(&tunnel->hash_lock);
+ hash_del(&session->hlist);
+ write_unlock_bh(&tunnel->hash_lock);
/* Unlink from the global hash if not L2TPv2 */
if (tunnel->version != L2TP_HDR_VER_2) {
struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
- spin_lock_bh(&pn->l2tp_session_hlist_lock);
- hlist_del_init_rcu(&session->global_hlist);
- spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+ spin_lock_bh(&pn->l2tp_session_hash_lock);
+ hash_del_rcu(&session->global_hlist);
+ spin_unlock_bh(&pn->l2tp_session_hash_lock);
synchronize_rcu();
}
@@ -1800,19 +1776,17 @@ struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunn
sock_hold(tunnel->sock);
/* Add session to the tunnel's hash list */
- write_lock_bh(&tunnel->hlist_lock);
- hlist_add_head(&session->hlist,
- l2tp_session_id_hash(tunnel, session_id));
- write_unlock_bh(&tunnel->hlist_lock);
+ write_lock_bh(&tunnel->hash_lock);
+ hash_add(tunnel->session_hash, &session->hlist, session_id);
+ write_unlock_bh(&tunnel->hash_lock);
/* And to the global session list if L2TPv3 */
if (tunnel->version != L2TP_HDR_VER_2) {
struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);
- spin_lock_bh(&pn->l2tp_session_hlist_lock);
- hlist_add_head_rcu(&session->global_hlist,
- l2tp_session_id_hash_2(pn, session_id));
- spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+ spin_lock_bh(&pn->l2tp_session_hash_lock);
+ hash_add_rcu(pn->l2tp_session_hash, &session->global_hlist, session_id);
+ spin_unlock_bh(&pn->l2tp_session_hash_lock);
}
/* Ignore management session in session count value */
@@ -1831,15 +1805,13 @@ EXPORT_SYMBOL_GPL(l2tp_session_create);
static __net_init int l2tp_init_net(struct net *net)
{
struct l2tp_net *pn = net_generic(net, l2tp_net_id);
- int hash;
INIT_LIST_HEAD(&pn->l2tp_tunnel_list);
spin_lock_init(&pn->l2tp_tunnel_list_lock);
- for (hash = 0; hash < L2TP_HASH_SIZE_2; hash++)
- INIT_HLIST_HEAD(&pn->l2tp_session_hlist[hash]);
+ hash_init(pn->l2tp_session_hash);
- spin_lock_init(&pn->l2tp_session_hlist_lock);
+ spin_lock_init(&pn->l2tp_session_hash_lock);
return 0;
}
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 56d583e..fc58c85 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -11,17 +11,17 @@
#ifndef _L2TP_CORE_H_
#define _L2TP_CORE_H_
+#include <linux/hashtable.h>
+
/* Just some random numbers */
#define L2TP_TUNNEL_MAGIC 0x42114DDA
#define L2TP_SESSION_MAGIC 0x0C04EB7D
/* Per tunnel, session hash table size */
#define L2TP_HASH_BITS 4
-#define L2TP_HASH_SIZE (1 << L2TP_HASH_BITS)
/* System-wide, session hash table size */
#define L2TP_HASH_BITS_2 8
-#define L2TP_HASH_SIZE_2 (1 << L2TP_HASH_BITS_2)
/* Debug message categories for the DEBUG socket option */
enum {
@@ -164,8 +164,15 @@ struct l2tp_tunnel_cfg {
struct l2tp_tunnel {
int magic; /* Should be L2TP_TUNNEL_MAGIC */
struct rcu_head rcu;
- rwlock_t hlist_lock; /* protect session_hlist */
- struct hlist_head session_hlist[L2TP_HASH_SIZE];
+ rwlock_t hash_lock; /* protect session_hash */
+/*
+ * Session hash list.
+ * The session_id SHOULD be random according to RFC2661, but several
+ * L2TP implementations (Cisco and Microsoft) use incrementing
+ * session_ids. So we do a real hash on the session_id, rather than a
+ * simple bitmask.
+*/
+ DECLARE_HASHTABLE(session_hash, L2TP_HASH_BITS);
/* hashed list of sessions,
* hashed by id */
u32 tunnel_id;
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index c3813bc..655f1fa 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -105,21 +105,16 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
int session_count = 0;
int hash;
struct hlist_node *walk;
- struct hlist_node *tmp;
+ struct l2tp_session *session;
- read_lock_bh(&tunnel->hlist_lock);
- for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
- hlist_for_each_safe(walk, tmp, &tunnel->session_hlist[hash]) {
- struct l2tp_session *session;
+ read_lock_bh(&tunnel->hash_lock);
+ hash_for_each(tunnel->session_hash, hash, walk, session, hlist) {
+ if (session->session_id == 0)
+ continue;
- session = hlist_entry(walk, struct l2tp_session, hlist);
- if (session->session_id == 0)
- continue;
-
- session_count++;
- }
+ session_count++;
}
- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);
seq_printf(m, "\nTUNNEL %u peer %u", tunnel->tunnel_id, tunnel->peer_tunnel_id);
if (tunnel->sock) {
--
1.8.0
^ permalink raw reply related
* [PATCH 08/15] SUNRPC/cache: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC (permalink / raw)
To: Trond Myklebust, J. Bruce Fields, David S. Miller, linux-nfs,
netdev, linux-kernel
Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>
Switch cache to use the new hashtable implementation. This reduces the amount
of generic unrelated code in the cache implementation.
This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.
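For readers who want to see what "hash a pointer into a table of
2^DFR_HASH_BITS buckets" amounts to, here is a minimal userspace sketch.
It uses Fibonacci (multiplicative) hashing as a stand-in and is not the
kernel's hash_long() implementation; ptr_bucket() is a hypothetical name
used only for this illustration.

```c
#include <stdint.h>

#define DFR_HASH_BITS 9   /* 1 << 9 = 512 buckets, as in the patch */

/*
 * Map a pointer to a bucket index in [0, 1 << DFR_HASH_BITS).
 * Assumption: a 64-bit multiplicative hash whose top bits are kept;
 * this only approximates what a generic hashtable helper would do.
 */
static unsigned int ptr_bucket(const void *item)
{
	uint64_t key = (uint64_t)(uintptr_t)item;

	/* Multiply by a 64-bit odd constant close to 2^64 / golden ratio. */
	key *= UINT64_C(0x9E3779B97F4A7C15);

	/* Keep the top DFR_HASH_BITS bits as the bucket index. */
	return (unsigned int)(key >> (64 - DFR_HASH_BITS));
}
```

Unlike the old DFR_HASH() macro, the number of buckets here does not
depend on PAGE_SIZE or on sizeof(struct list_head).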
Tested-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
net/sunrpc/cache.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 9afa439..d4539b6 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -28,6 +28,7 @@
#include <linux/workqueue.h>
#include <linux/mutex.h>
#include <linux/pagemap.h>
+#include <linux/hashtable.h>
#include <asm/ioctls.h>
#include <linux/sunrpc/types.h>
#include <linux/sunrpc/cache.h>
@@ -524,19 +525,18 @@ EXPORT_SYMBOL_GPL(cache_purge);
* it to be revisited when cache info is available
*/
-#define DFR_HASHSIZE (PAGE_SIZE/sizeof(struct list_head))
-#define DFR_HASH(item) ((((long)item)>>4 ^ (((long)item)>>13)) % DFR_HASHSIZE)
+#define DFR_HASH_BITS 9
#define DFR_MAX 300 /* ??? */
static DEFINE_SPINLOCK(cache_defer_lock);
static LIST_HEAD(cache_defer_list);
-static struct hlist_head cache_defer_hash[DFR_HASHSIZE];
+static DEFINE_HASHTABLE(cache_defer_hash, DFR_HASH_BITS);
static int cache_defer_cnt;
static void __unhash_deferred_req(struct cache_deferred_req *dreq)
{
- hlist_del_init(&dreq->hash);
+ hash_del(&dreq->hash);
if (!list_empty(&dreq->recent)) {
list_del_init(&dreq->recent);
cache_defer_cnt--;
@@ -545,10 +545,7 @@ static void __unhash_deferred_req(struct cache_deferred_req *dreq)
static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_head *item)
{
- int hash = DFR_HASH(item);
-
- INIT_LIST_HEAD(&dreq->recent);
- hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
+ hash_add(cache_defer_hash, &dreq->hash, (unsigned long)item);
}
static void setup_deferral(struct cache_deferred_req *dreq,
@@ -600,7 +597,7 @@ static void cache_wait_req(struct cache_req *req, struct cache_head *item)
* to clean up
*/
spin_lock(&cache_defer_lock);
- if (!hlist_unhashed(&sleeper.handle.hash)) {
+ if (hash_hashed(&sleeper.handle.hash)) {
__unhash_deferred_req(&sleeper.handle);
spin_unlock(&cache_defer_lock);
} else {
@@ -671,12 +668,11 @@ static void cache_revisit_request(struct cache_head *item)
struct cache_deferred_req *dreq;
struct list_head pending;
struct hlist_node *lp, *tmp;
- int hash = DFR_HASH(item);
INIT_LIST_HEAD(&pending);
spin_lock(&cache_defer_lock);
- hlist_for_each_entry_safe(dreq, lp, tmp, &cache_defer_hash[hash], hash)
+ hash_for_each_possible_safe(cache_defer_hash, dreq, lp, tmp, hash, (unsigned long)item)
if (dreq->item == item) {
__unhash_deferred_req(dreq);
list_add(&dreq->recent, &pending);
--
1.8.0
^ permalink raw reply related
* [PATCH 06/15] net,9p: use new hashtable implementation
From: Sasha Levin @ 2012-12-17 15:01 UTC (permalink / raw)
To: David S. Miller, Sasha Levin, Eric Van Hensbergen, Joe Perches,
netdev, linux-kernel
Cc: Sasha Levin
In-Reply-To: <1355756497-15834-1-git-send-email-sasha.levin@oracle.com>
Switch 9p error table to use the new hashtable implementation. This reduces
the amount of generic unrelated code in 9p.
This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.
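As an illustration of the lookup pattern this conversion keeps, here is
a minimal userspace sketch in plain C. It is not <linux/hashtable.h>:
toy_hash() (FNV-1a) stands in for jhash(), the bucket index is taken
with a simple mask, and struct err_entry, err_table_add() and
err_table_lookup() are hypothetical names. The point is that the full
32-bit hash only selects a bucket, so the lookup still has to compare
the key itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERR_HASH_BITS 5
#define ERR_HASH_SIZE (1u << ERR_HASH_BITS)   /* 32 buckets, like ERRHASHSZ */

struct err_entry {
	const char *name;
	size_t namelen;
	int val;
	struct err_entry *next;   /* singly linked bucket chain */
};

static struct err_entry *err_table[ERR_HASH_SIZE];

/* FNV-1a string hash; an assumption standing in for jhash(). */
static uint32_t toy_hash(const char *s, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 16777619u;
	}
	return h;
}

/* Reduce the full hash to a bucket index. */
static unsigned int bucket_of(uint32_t hash)
{
	return hash & (ERR_HASH_SIZE - 1);
}

/* Insert at the head of the bucket selected by the name's hash. */
static void err_table_add(struct err_entry *e)
{
	unsigned int b;

	e->namelen = strlen(e->name);
	b = bucket_of(toy_hash(e->name, e->namelen));
	e->next = err_table[b];
	err_table[b] = e;
}

/* Walk only the candidate bucket; return 0 when nothing matches. */
static int err_table_lookup(const char *errstr, size_t len)
{
	unsigned int b = bucket_of(toy_hash(errstr, len));
	const struct err_entry *e;

	for (e = err_table[b]; e != NULL; e = e->next) {
		/* Different strings may share a bucket, so compare keys. */
		if (e->namelen == len && memcmp(e->name, errstr, len) == 0)
			return e->val;
	}
	return 0;
}
```

The explicit length-plus-memcmp() check plays the same role as the
comparison inside the hash_for_each_possible() loop in the patch.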
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
net/9p/error.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/net/9p/error.c b/net/9p/error.c
index 2ab2de7..a394b37 100644
--- a/net/9p/error.c
+++ b/net/9p/error.c
@@ -34,6 +34,7 @@
#include <linux/jhash.h>
#include <linux/errno.h>
#include <net/9p/9p.h>
+#include <linux/hashtable.h>
/**
* struct errormap - map string errors from Plan 9 to Linux numeric ids
@@ -50,8 +51,8 @@ struct errormap {
struct hlist_node list;
};
-#define ERRHASHSZ 32
-static struct hlist_head hash_errmap[ERRHASHSZ];
+#define ERR_HASH_BITS 5
+static DEFINE_HASHTABLE(hash_errmap, ERR_HASH_BITS);
/* FixMe - reduce to a reasonable size */
static struct errormap errmap[] = {
@@ -193,18 +194,14 @@ static struct errormap errmap[] = {
int p9_error_init(void)
{
struct errormap *c;
- int bucket;
-
- /* initialize hash table */
- for (bucket = 0; bucket < ERRHASHSZ; bucket++)
- INIT_HLIST_HEAD(&hash_errmap[bucket]);
+ u32 hash;
/* load initial error map into hash table */
for (c = errmap; c->name != NULL; c++) {
c->namelen = strlen(c->name);
- bucket = jhash(c->name, c->namelen, 0) % ERRHASHSZ;
+ hash = jhash(c->name, c->namelen, 0);
INIT_HLIST_NODE(&c->list);
- hlist_add_head(&c->list, &hash_errmap[bucket]);
+ hash_add(hash_errmap, &c->list, hash);
}
return 1;
@@ -223,13 +220,13 @@ int p9_errstr2errno(char *errstr, int len)
int errno;
struct hlist_node *p;
struct errormap *c;
- int bucket;
+ u32 hash;
errno = 0;
p = NULL;
c = NULL;
- bucket = jhash(errstr, len, 0) % ERRHASHSZ;
- hlist_for_each_entry(c, p, &hash_errmap[bucket], list) {
+ hash = jhash(errstr, len, 0);
+ hash_for_each_possible(hash_errmap, c, p, list, hash) {
if (c->namelen == len && !memcmp(c->name, errstr, len)) {
errno = c->val;
break;
--
1.8.0
^ permalink raw reply related
* Re: [PATCH 4/4] FEC: Add time stamping code and a PTP hardware clock
From: Frank Li @ 2012-12-17 14:48 UTC (permalink / raw)
To: Sascha Hauer
Cc: Frank Li, lznua, richardcochran, shawn.guo, linux-arm-kernel,
netdev, davem
In-Reply-To: <20121217091345.GA753@pengutronix.de>
2012/12/17 Sascha Hauer <s.hauer@pengutronix.de>:
> On Wed, Oct 31, 2012 at 12:25:31PM +0800, Frank Li wrote:
>> This patch adds a driver for the FEC(MX6) that offers time
>> stamping and a PTP hardware clock. Because FEC\ENET(MX6)
>> hardware frequency adjustment is complex, we have implemented
>> this in software by changing the multiplication factor of the
>> timecounter.
>>
>> Signed-off-by: Frank Li <Frank.Li@freescale.com>
>> ---
>> drivers/net/ethernet/freescale/Kconfig | 9 +
>> drivers/net/ethernet/freescale/Makefile | 1 +
>> drivers/net/ethernet/freescale/fec.c | 88 +++++++-
>> drivers/net/ethernet/freescale/fec.h | 38 +++
>> drivers/net/ethernet/freescale/fec_ptp.c | 386 ++++++++++++++++++++++++++++++
>> 5 files changed, 521 insertions(+), 1 deletions(-)
>> create mode 100644 drivers/net/ethernet/freescale/fec_ptp.c
>>
>> diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
>> index feff516..ff3be53 100644
>> --- a/drivers/net/ethernet/freescale/Kconfig
>> +++ b/drivers/net/ethernet/freescale/Kconfig
>> @@ -92,4 +92,13 @@ config GIANFAR
>> This driver supports the Gigabit TSEC on the MPC83xx, MPC85xx,
>> and MPC86xx family of chips, and the FEC on the 8540.
>>
>> +config FEC_PTP
>> + bool "PTP Hardware Clock (PHC)"
>> + depends on FEC
>> + select PPS
>> + select PTP_1588_CLOCK
>> + ---help---
>> + Say Y here if you want to use PTP Hardware Clock (PHC) in the
>> + driver. Only the basic clock operations have been implemented.
>> +
>> endif # NET_VENDOR_FREESCALE
>> diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
>> index 3d1839a..d4d19b3 100644
>> --- a/drivers/net/ethernet/freescale/Makefile
>> +++ b/drivers/net/ethernet/freescale/Makefile
>> @@ -3,6 +3,7 @@
>> #
>>
>> obj-$(CONFIG_FEC) += fec.o
>> +obj-$(CONFIG_FEC_PTP) += fec_ptp.o
>> obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx.o
>> ifeq ($(CONFIG_FEC_MPC52xx_MDIO),y)
>> obj-$(CONFIG_FEC_MPC52xx) += fec_mpc52xx_phy.o
>> diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
>> index d0e1b33..2665162 100644
>> --- a/drivers/net/ethernet/freescale/fec.c
>> +++ b/drivers/net/ethernet/freescale/fec.c
>> @@ -280,6 +280,17 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> | BD_ENET_TX_LAST | BD_ENET_TX_TC);
>> bdp->cbd_sc = status;
>>
>> +#ifdef CONFIG_FEC_PTP
>
> This ifdef desert in the fec driver currently breaks all SoCs except
> i.MX6 in the imx_v6_v7_defconfig.
>
> Most of these could be fixed with something like if (fec_use_ptp(fep)),
>
>
>> #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
>> defined(CONFIG_M520x) || defined(CONFIG_M532x) || \
>> defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
>> @@ -88,6 +94,13 @@ struct bufdesc {
>> unsigned short cbd_datlen; /* Data length */
>> unsigned short cbd_sc; /* Control and status info */
>> unsigned long cbd_bufaddr; /* Buffer address */
>> +#ifdef CONFIG_FEC_PTP
>> + unsigned long cbd_esc;
>> + unsigned long cbd_prot;
>> + unsigned long cbd_bdu;
>> + unsigned long ts;
>> + unsigned short res0[4];
>> +#endif
>> };
>
> This one changes the layout of the hardware buffer description which is
> not so easy to fix.
Yes, this is not easy to fix if we have to check dynamically for i.MX6 versus other devices.
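One possible direction, sketched in plain userspace C and not taken from
the posted patch: keep the legacy descriptor as the first member of an
extended descriptor and choose the ring stride at run time. The names
struct bufdesc_ex, struct fec_ring, ring_init(), ring_desc() and the
use_ptp flag are hypothetical, and fixed-width types replace the
"unsigned long" fields so the sketch has the same layout on 32- and
64-bit hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Legacy descriptor layout (8 bytes). */
struct bufdesc {
	uint16_t cbd_datlen;   /* Data length */
	uint16_t cbd_sc;       /* Control and status info */
	uint32_t cbd_bufaddr;  /* Buffer address */
};

/* Extended layout: legacy part first, extra fields after it (32 bytes). */
struct bufdesc_ex {
	struct bufdesc desc;
	uint32_t cbd_esc;
	uint32_t cbd_prot;
	uint32_t cbd_bdu;
	uint32_t ts;
	uint16_t res0[4];
};

struct fec_ring {
	void *base;      /* start of descriptor memory */
	size_t stride;   /* bytes per descriptor, chosen at run time */
	int use_ptp;
};

/* Pick the descriptor size once, based on a run-time capability flag. */
static void ring_init(struct fec_ring *r, void *mem, int use_ptp)
{
	r->base = mem;
	r->use_ptp = use_ptp;
	r->stride = use_ptp ? sizeof(struct bufdesc_ex)
			    : sizeof(struct bufdesc);
}

/* Common code walks the ring through the legacy view only. */
static struct bufdesc *ring_desc(const struct fec_ring *r, unsigned int i)
{
	return (struct bufdesc *)((char *)r->base + (size_t)i * r->stride);
}
```

With this shape, code that only touches cbd_sc/cbd_datlen/cbd_bufaddr
would not need an ifdef; only the time stamping paths would look at the
extended fields, and only when use_ptp is set.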
>
> I don't know how to continue from here. Since the whole patch doesn't
> seem to have been reviewed very much I tend to say we should revert it for now and
> let Frank redo it for the next merge window.
>
> Other opinions?
Can we just disable CONFIG_FEC_PTP by default instead of reverting the whole patch?
>
> Sascha
>
> --
> Pengutronix e.K. | |
> Industrial Linux Solutions | http://www.pengutronix.de/ |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH] tuntap: reset network header before calling skb_get_rxhash()
From: Eric Dumazet @ 2012-12-17 14:39 UTC (permalink / raw)
To: Daniel Borkmann, David Miller; +Cc: Kirill A. Shutemov, maxk, netdev, dwmw2
In-Reply-To: <CAD6jFUTXP-XgwFeyN11bfGcbiit=aUOoP7sSm64WK+4s6tg2TQ@mail.gmail.com>
From: Eric Dumazet <edumazet@google.com>
Commit 499744209b2c (tuntap: dont use skb after netif_rx_ni(skb))
introduced another bug.
skb_get_rxhash() needs to access the network header, and it was
set for us in netif_rx_ni().
We need to reset network header or else skb_flow_dissect() behavior
is out of control.
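To illustrate why a stale header offset matters, here is a small
userspace model. It is only a sketch: struct pkt, its fields and the
two helpers are hypothetical stand-ins for the sk_buff head/data
pointers and network header offset, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

struct pkt {
	const uint8_t *head;     /* start of the buffer */
	const uint8_t *data;     /* start of the current protocol header */
	size_t network_header;   /* offset from head; stale until reset */
};

/* Point the network header at the current data position. */
static void pkt_reset_network_header(struct pkt *p)
{
	p->network_header = (size_t)(p->data - p->head);
}

/* What a flow dissector would read as the IP version nibble. */
static unsigned int pkt_ip_version(const struct pkt *p)
{
	return p->head[p->network_header] >> 4;
}
```

If the offset is never reset, pkt_ip_version() reads whatever byte the
old offset happens to point at, so any hash derived from it is computed
over unrelated data.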
Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/tun.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 255a9f5..173acf5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1199,6 +1199,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}
+ skb_reset_network_header(skb);
rxhash = skb_get_rxhash(skb);
netif_rx_ni(skb);
^ permalink raw reply related
* Re: netconsole fun
From: Neil Horman @ 2012-12-17 14:20 UTC (permalink / raw)
To: Peter Hurley; +Cc: Cong Wang, netdev
In-Reply-To: <1355580838.2467.39.camel@thor>
On Sat, Dec 15, 2012 at 09:13:58AM -0500, Peter Hurley wrote:
> On Fri, 2012-12-14 at 09:20 -0500, Neil Horman wrote:
> > Ah! I'm sorry, I didn't realize this was really about getting netconsole up
> > early in the boot, rather than just getting it up robustly using the startup
> > script.
>
> Well, it's both but I should have been clearer here. Sorry about that.
>
> > If thats the case, then I would recommend that you modify the initramfs
> > to do something simmilar to the startup script (since thats where the netconsole
> > module will get loaded anyway). You can write a script there that will let you
> > specify the destination ip address and figure out the output dev based on the
> > routing tables. If you're using dracut to build your initramfs, then this
> > should be pretty straightforward.
>
> When I get some more free time I'll experiment with this approach.
>
> Just to clarify something from earlier in the discussion:
>
> On Thu, 2012-12-13 at 13:08 -0500, Neil Horman wrote:
> > On Thu, Dec 13, 2012 at 09:49:31AM -0500, Peter Hurley wrote:
> ....
> > > There is an unforeseen consequence of the patch: it breaks device
> > > renaming because the device will already be in use by netconsole. Which
> > > is the whole problem with userspace device renaming to begin with...
> > >
> > That is bad, but see above, the netconsole service can work around this for you,
> > allowing you to never have to specify a particular device at all.
>
> The breakage is a normal consequence of being able to load netconsole
> before the udev rules that do device renaming. The same thing would
> happen modifying initramfs.
>
> Basically, once netconsole attaches to a device, that device cannot be
> renamed. Unfortunately, the default udev behavior messes things up
> further because it will try to do this:
> eth0->eth1
> eth1->eth0
> which means neither device will be renamed.
>
> Maybe the net core should just implement persistent device names ;)
>
There's no good way for the kernel to do that, as persistent naming in this case
is a matter of user policy, not kernel hardware management (i.e. do you want a
network name to follow a mac address, a pci slot, or the network it's connected
to?). You can use smbios to get some modicum of persistent device naming
currently, but I don't recall if that requires udev rules to implement as well.
Your best bet is simply to make your initramfs more robust. I understand what
you're saying about renaming not being possible once you've taken a reference
on a device, but you can run udev within the initramfs, and do your renaming
prior to your netconsole load.
Thanks
Neil
> Thanks again for all your time,
> Peter Hurley
>
>
^ permalink raw reply
* Re: [PATCH 3/3 v2] net/macb: Try to optimize struct macb layout
From: Ben Hutchings @ 2012-12-17 13:52 UTC (permalink / raw)
To: Nicolas Ferre
Cc: David S. Miller, netdev, linux-arm-kernel, linux-kernel,
Joachim Eastwood, Jean-Christophe PLAGNIOL-VILLARD,
Havard Skinnemoen
In-Reply-To: <cd54582850d50ec7ae391aae423b8982875a123a.1355748676.git.nicolas.ferre@atmel.com>
On Mon, 2012-12-17 at 14:01 +0100, Nicolas Ferre wrote:
> From: Havard Skinnemoen <havard@skinnemoen.net>
>
> Move TX-related fields to the top of the struct so that they end up on
> the same cache line. Move the NAPI struct below that since it is used
> from the interrupt handler. RX-related fields go below those.
> Move the spinlock before regs since they are usually used together.
>
> Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
> [nicolas.ferre@atmel.com: adapt to newer kernel]
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
> ---
> drivers/net/ethernet/cadence/macb.h | 25 +++++++++++++------------
> 1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index cef146f..aeeb729 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -548,38 +548,39 @@ struct macb_or_gem_ops {
> };
>
> struct macb {
> + spinlock_t lock;
> void __iomem *regs;
>
> + unsigned int tx_head;
> + unsigned int tx_tail;
> + struct macb_dma_desc *tx_ring;
> + struct macb_tx_skb *tx_skb;
> + dma_addr_t tx_ring_dma;
> + struct work_struct tx_error_task;
> +
> + struct napi_struct napi;
[...]
If this driver may be used on SMP systems then 'napi' should be declared
with the suffix ____cacheline_aligned_in_smp.
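As a userspace analogue of what such an annotation achieves, the sketch
below uses C11 _Alignas to force a member onto its own cache line. The
64-byte line size, struct fake_napi and struct dev_state are
assumptions made for the illustration; the kernel macro picks the line
size per architecture.

```c
#include <stddef.h>

#define CACHELINE 64   /* assumed cache line size for this sketch */

/* Stand-in for struct napi_struct. */
struct fake_napi {
	int weight;
	void *poll;
};

struct dev_state {
	/* TX-hot fields share the first cache line. */
	unsigned int tx_head;
	unsigned int tx_tail;
	void *tx_ring;

	/* Forced to start on a fresh cache line. */
	_Alignas(CACHELINE) struct fake_napi napi;

	unsigned int rx_tail;
};

/* Fails at compile time if the member is not line-aligned. */
_Static_assert(offsetof(struct dev_state, napi) % CACHELINE == 0,
	       "napi must start on a cache line boundary");
```

The cost is padding: the struct's own alignment and size grow to a
multiple of the line size.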
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: Do I need to skb_put() Ethernet frames to a minimum of 60 bytes?
From: Nicolas Ferre @ 2012-12-17 13:43 UTC (permalink / raw)
To: Arvid Brodin
Cc: Ben Hutchings, netdev@vger.kernel.org, Eric Dumazet,
linux-arm-kernel
In-Reply-To: <5033C6B0.4060508@xdin.com>
On 08/21/2012 07:34 PM, Arvid Brodin :
> On 2012-08-14 22:35, Ben Hutchings wrote:
>> On Tue, 2012-08-14 at 18:53 +0000, Arvid Brodin wrote:
>>> Hi,
>>>
>>> If I create an sk_buff with a payload of less than 28 bytes (ethheader + data),
>>> and send it using the cadence/macb (Ethernet) driver, I get
>>>
>>> eth0: TX underrun, resetting buffers
>>>
>>> Now I know the minimum Ethernet frame size is 64 bytes (including the 4-byte
>>> FCS), but whose responsibility is it to pad the frame to this size if necessary?
>>> Mine or the driver's - i.e. should I just skb_put() to the minimum size or
>>> should I report the underrun as a driver bug?
>>
>> If the hardware doesn't pad frames automatically then it's the driver's
>> responsibility to do so.
>>
>
> Nicolas, can you take a look at this? At the moment I'm using the following change
> in macb.c to avoid TX underruns on short packages:
>
> --- a/drivers/net/ethernet/cadence/macb.c 2012-05-04 19:14:41.927719667 +0200
> +++ b/drivers/net/ethernet/cadence/macb.c 2012-08-21 19:22:40.063739049 +0200
> @@ -618,6 +618,7 @@ static void macb_poll_controller(struct
> }
> #endif
>
> +#define MIN_ETHFRAME_LEN 60
> static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> struct macb *bp = netdev_priv(dev);
> @@ -635,6 +636,12 @@ static int macb_start_xmit(struct sk_buf
> printk("\n");
> #endif
>
> + if (skb->len < MIN_ETHFRAME_LEN) {
> + /* Pad skb to minium Ethernet frame size */
> + if (skb_tailroom(skb) >= MIN_ETHFRAME_LEN - skb->len)
> + memset(skb_put(skb, MIN_ETHFRAME_LEN - skb->len), 0,
> + MIN_ETHFRAME_LEN - skb->len);
> + }
> len = skb->len;
> spin_lock_irqsave(&bp->lock, flags);
>
>
> ... but as you can see this is limited to linear skbs which has been allocated with
> enough tailroom. Perhaps there are better ways to fix the problem? (Maybe the hardware
> is actually doing the padding already and the problem has to do with the way the DMA
> transfer is set up?)
Coming back to this issue: it seems to me that the macb Cadence IP
automatically pads a packet that is too short. This is the default behavior
unless you specify otherwise in the CTRL register embedded in the tx
descriptor. I also verified this with wireshark on both ICMP and UDP
packets.
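For hardware that does not pad by itself, the software fallback being
discussed boils down to the following, shown here as a userspace sketch
on a flat buffer. pad_frame() is a hypothetical helper written for this
mail; it does not deal with non-linear buffers.

```c
#include <stddef.h>
#include <string.h>

#define MIN_ETHFRAME_LEN 60   /* 64-byte minimum frame minus the 4-byte FCS */

/*
 * Zero-pad a frame of 'len' bytes held in a buffer of 'cap' bytes.
 * Returns the resulting frame length, or 0 if the buffer is too small
 * to hold a minimum-size frame.
 */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	if (len >= MIN_ETHFRAME_LEN)
		return len;              /* already long enough */
	if (cap < MIN_ETHFRAME_LEN)
		return 0;                /* no room to pad */

	memset(buf + len, 0, MIN_ETHFRAME_LEN - len);
	return MIN_ETHFRAME_LEN;
}
```

In a driver one would normally use an existing helper such as
skb_padto() instead of open-coding the memset.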
The error that you are experiencing is on at91sam9260 or at91sam9263
SoCs, am I right?
Best regards,
--
Nicolas Ferre
^ permalink raw reply
* Re: openconnect triggers soft lockup in __skb_get_rxhash
From: Daniel Borkmann @ 2012-12-17 13:38 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: Eric Dumazet, David Miller, maxk, netdev, dwmw2
In-Reply-To: <20121217081121.GA24173@shutemov.name>
On Mon, Dec 17, 2012 at 9:11 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Sun, Dec 16, 2012 at 08:46:29PM -0800, Eric Dumazet wrote:
>> On Mon, 2012-12-17 at 03:46 +0200, Kirill A. Shutemov wrote:
>> > On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
>> > >
>> > > Already fixed in Linus's tree by:
>> > >
>> > > From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
>> >
>> > No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
>> > still see the issue.
>> >
>>
>> Could you try the following one-liner?
>
> Works for me. So far no problems.
>
> Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name>
I can confirm the same: I ran into this issue while connected via VPN,
and it seems stable now.
Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
^ permalink raw reply
* Re: tc ipt action
From: Jan Engelhardt @ 2012-12-17 13:28 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <50CF16FE.5040300@mojatatu.com>
On Monday 2012-12-17 13:58, Jamal Hadi Salim wrote:
> On 12-12-16 04:21 PM, Jan Engelhardt wrote:
>
>> If you have a preexisting clone of any linux tree, you can utilize
>> `git remote add ...` to only grab the deltas.
>
>It downloaded eventually. So looking at this quickly, basic
>question: is xtables2 different API wise from what we do today in
>act_ipt?
AFAICS, (one instance of) act_ipt today directly invokes (exactly one
instance of) a target. With act_xt2 as drafted, it instead invokes a
chain, which would
1. leave the construction of the target data, and the call into it,
to the subsystem they conceptually belong to - the packet filter
2. let you do matches, jumps and all that.
>Second: Are chain names unique system wide?
Good thing you ask. Chain names are unique within a netns, and this
act_xtables.c draft looks at the packet to get to know its netns, so
that seems fine.
However, your question also leads to looking at whether TC Actions
themselves are sufficiently netns-ified, and it seems this is _not_
the case. Am I right in the observation that variables like
"tcf_ipt_ht" are in fact global rather than per-netns?
^ permalink raw reply
* [PATCH 3/3 v2] net/macb: Try to optimize struct macb layout
From: Nicolas Ferre @ 2012-12-17 13:01 UTC (permalink / raw)
To: David S. Miller, netdev
Cc: linux-arm-kernel, linux-kernel, Joachim Eastwood,
Jean-Christophe PLAGNIOL-VILLARD, Havard Skinnemoen,
Nicolas Ferre
In-Reply-To: <cover.1355748676.git.nicolas.ferre@atmel.com>
From: Havard Skinnemoen <havard@skinnemoen.net>
Move TX-related fields to the top of the struct so that they end up on
the same cache line. Move the NAPI struct below that since it is used
from the interrupt handler. RX-related fields go below those.
Move the spinlock before regs since they are usually used together.
Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: adapt to newer kernel]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
drivers/net/ethernet/cadence/macb.h | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index cef146f..aeeb729 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -548,38 +548,39 @@ struct macb_or_gem_ops {
};
struct macb {
+ spinlock_t lock;
void __iomem *regs;
+ unsigned int tx_head;
+ unsigned int tx_tail;
+ struct macb_dma_desc *tx_ring;
+ struct macb_tx_skb *tx_skb;
+ dma_addr_t tx_ring_dma;
+ struct work_struct tx_error_task;
+
+ struct napi_struct napi;
+
unsigned int rx_tail;
unsigned int rx_prepared_head;
struct macb_dma_desc *rx_ring;
struct sk_buff **rx_skbuff;
void *rx_buffers;
size_t rx_buffer_size;
+ dma_addr_t rx_ring_dma;
+ dma_addr_t rx_buffers_dma;
- unsigned int tx_head, tx_tail;
- struct macb_dma_desc *tx_ring;
- struct macb_tx_skb *tx_skb;
+ struct macb_or_gem_ops macbgem_ops;
- spinlock_t lock;
struct platform_device *pdev;
struct clk *pclk;
struct clk *hclk;
struct net_device *dev;
- struct napi_struct napi;
- struct work_struct tx_error_task;
struct net_device_stats stats;
union {
struct macb_stats macb;
struct gem_stats gem;
} hw_stats;
- dma_addr_t rx_ring_dma;
- dma_addr_t tx_ring_dma;
- dma_addr_t rx_buffers_dma;
-
- struct macb_or_gem_ops macbgem_ops;
-
struct mii_bus *mii_bus;
struct phy_device *phy_dev;
unsigned int link;
--
1.8.0
^ permalink raw reply related
* [PATCH 2/3 v2] net/macb: change RX path for GEM
From: Nicolas Ferre @ 2012-12-17 13:01 UTC (permalink / raw)
To: David S. Miller, netdev
Cc: Nicolas Ferre, Joachim Eastwood, Jean-Christophe PLAGNIOL-VILLARD,
linux-kernel, linux-arm-kernel
In-Reply-To: <cover.1355748676.git.nicolas.ferre@atmel.com>
GEM is able to adapt its DMA buffer size, so change
the RX path to take advantage of this possibility and
remove every memcpy from this path.
This modification introduces function pointers for managing
differences between the MACB and GEM adapter types.
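The shape of that indirection, reduced to a userspace sketch: a small
ops table is filled in once according to the adapter type, and common
code then calls through it. struct adapter, struct adapter_ops,
adapter_init() and the stub functions are hypothetical names for this
illustration; only the two buffer-size constants are taken from the
driver.

```c
#include <stddef.h>

#define MACB_RX_BUFFER_SIZE 128
#define RX_BUFFER_MULTIPLE  64   /* bytes */

enum rx_path { RX_PATH_MACB = 1, RX_PATH_GEM = 2 };

struct adapter;

struct adapter_ops {
	int (*rx)(struct adapter *ad, int budget);
	size_t (*rx_buffer_size)(size_t wanted);
};

struct adapter {
	int is_gem;
	size_t rx_buffer_size;
	struct adapter_ops ops;
};

/* Stubs: a real RX handler would process up to 'budget' frames. */
static int macb_rx_stub(struct adapter *ad, int budget)
{
	(void)ad; (void)budget;
	return RX_PATH_MACB;
}

static int gem_rx_stub(struct adapter *ad, int budget)
{
	(void)ad; (void)budget;
	return RX_PATH_GEM;
}

/* MACB: fixed small buffers, whatever size was asked for. */
static size_t macb_buf_size(size_t wanted)
{
	(void)wanted;
	return MACB_RX_BUFFER_SIZE;
}

/* GEM: requested size rounded up to a multiple of RX_BUFFER_MULTIPLE. */
static size_t gem_buf_size(size_t wanted)
{
	return (wanted + RX_BUFFER_MULTIPLE - 1) / RX_BUFFER_MULTIPLE
		* RX_BUFFER_MULTIPLE;
}

/* Select the ops once; callers never test is_gem again. */
static void adapter_init(struct adapter *ad, int is_gem, size_t wanted)
{
	ad->is_gem = is_gem;
	if (is_gem) {
		ad->ops.rx = gem_rx_stub;
		ad->ops.rx_buffer_size = gem_buf_size;
	} else {
		ad->ops.rx = macb_rx_stub;
		ad->ops.rx_buffer_size = macb_buf_size;
	}
	ad->rx_buffer_size = ad->ops.rx_buffer_size(wanted);
}
```

The poll routine can then call ad->ops.rx(ad, budget) without knowing
which adapter type it is running on.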
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
drivers/net/ethernet/cadence/macb.c | 308 ++++++++++++++++++++++++++++++------
drivers/net/ethernet/cadence/macb.h | 13 ++
2 files changed, 272 insertions(+), 49 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 50f8669..16ec751 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -33,7 +33,6 @@
#include "macb.h"
#define MACB_RX_BUFFER_SIZE 128
-#define GEM_RX_BUFFER_SIZE 2048
#define RX_BUFFER_MULTIPLE 64 /* bytes */
#define RX_RING_SIZE 512 /* must be power of 2 */
#define RX_RING_BYTES (sizeof(struct macb_dma_desc) * RX_RING_SIZE)
@@ -527,6 +526,155 @@ static void macb_tx_interrupt(struct macb *bp)
netif_wake_queue(bp->dev);
}
+static void gem_rx_refill(struct macb *bp)
+{
+ unsigned int entry;
+ struct sk_buff *skb;
+ struct macb_dma_desc *desc;
+ dma_addr_t paddr;
+
+ while (CIRC_SPACE(bp->rx_prepared_head, bp->rx_tail, RX_RING_SIZE) > 0) {
+ u32 addr, ctrl;
+
+ entry = macb_rx_ring_wrap(bp->rx_prepared_head);
+ desc = &bp->rx_ring[entry];
+
+ /* Make hw descriptor updates visible to CPU */
+ rmb();
+
+ addr = desc->addr;
+ ctrl = desc->ctrl;
+ bp->rx_prepared_head++;
+
+ if ((addr & MACB_BIT(RX_USED)))
+ continue;
+
+ if (bp->rx_skbuff[entry] == NULL) {
+ /* allocate sk_buff for this free entry in ring */
+ skb = netdev_alloc_skb(bp->dev, bp->rx_buffer_size);
+ if (unlikely(skb == NULL)) {
+ netdev_err(bp->dev,
+ "Unable to allocate sk_buff\n");
+ break;
+ }
+ bp->rx_skbuff[entry] = skb;
+
+ /* now fill corresponding descriptor entry */
+ paddr = dma_map_single(&bp->pdev->dev, skb->data,
+ bp->rx_buffer_size, DMA_FROM_DEVICE);
+
+ if (entry == RX_RING_SIZE - 1)
+ paddr |= MACB_BIT(RX_WRAP);
+ bp->rx_ring[entry].addr = paddr;
+ bp->rx_ring[entry].ctrl = 0;
+
+ /* properly align Ethernet header */
+ skb_reserve(skb, NET_IP_ALIGN);
+ }
+ }
+
+ /* Make descriptor updates visible to hardware */
+ wmb();
+
+ netdev_vdbg(bp->dev, "rx ring: prepared head %d, tail %d\n",
+ bp->rx_prepared_head, bp->rx_tail);
+}
+
+/* Mark DMA descriptors from begin up to and not including end as unused */
+static void discard_partial_frame(struct macb *bp, unsigned int begin,
+ unsigned int end)
+{
+ unsigned int frag;
+
+ for (frag = begin; frag != end; frag++) {
+ struct macb_dma_desc *desc = macb_rx_desc(bp, frag);
+ desc->addr &= ~MACB_BIT(RX_USED);
+ }
+
+ /* Make descriptor updates visible to hardware */
+ wmb();
+
+ /*
+ * When this happens, the hardware stats registers for
+ * whatever caused this is updated, so we don't have to record
+ * anything.
+ */
+}
+
+static int gem_rx(struct macb *bp, int budget)
+{
+ unsigned int len;
+ unsigned int entry;
+ struct sk_buff *skb;
+ struct macb_dma_desc *desc;
+ int count = 0;
+
+ while (count < budget) {
+ u32 addr, ctrl;
+
+ entry = macb_rx_ring_wrap(bp->rx_tail);
+ desc = &bp->rx_ring[entry];
+
+ /* Make hw descriptor updates visible to CPU */
+ rmb();
+
+ addr = desc->addr;
+ ctrl = desc->ctrl;
+
+ if (!(addr & MACB_BIT(RX_USED)))
+ break;
+
+ desc->addr &= ~MACB_BIT(RX_USED);
+ bp->rx_tail++;
+ count++;
+
+ if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
+ netdev_err(bp->dev,
+ "not whole frame pointed by descriptor\n");
+ bp->stats.rx_dropped++;
+ break;
+ }
+ skb = bp->rx_skbuff[entry];
+ if (unlikely(!skb)) {
+ netdev_err(bp->dev,
+ "inconsistent Rx descriptor chain\n");
+ bp->stats.rx_dropped++;
+ break;
+ }
+ /* now everything is ready for receiving packet */
+ bp->rx_skbuff[entry] = NULL;
+ len = MACB_BFEXT(RX_FRMLEN, ctrl);
+
+ netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
+
+ skb_put(skb, len);
+ addr = MACB_BF(RX_WADDR, MACB_BFEXT(RX_WADDR, addr));
+ dma_unmap_single(&bp->pdev->dev, addr,
+ len, DMA_FROM_DEVICE);
+
+ skb->protocol = eth_type_trans(skb, bp->dev);
+ skb_checksum_none_assert(skb);
+
+ bp->stats.rx_packets++;
+ bp->stats.rx_bytes += skb->len;
+
+#if defined(DEBUG) && defined(VERBOSE_DEBUG)
+ netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
+ skb->len, skb->csum);
+ print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
+ skb->mac_header, 16, true);
+ print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
+ skb->data, 32, true);
+#endif
+
+ netif_receive_skb(skb);
+ }
+
+ gem_rx_refill(bp);
+
+ return count;
+}
+
static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
unsigned int last_frag)
{
@@ -605,27 +753,6 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
return 0;
}
-/* Mark DMA descriptors from begin up to and not including end as unused */
-static void discard_partial_frame(struct macb *bp, unsigned int begin,
- unsigned int end)
-{
- unsigned int frag;
-
- for (frag = begin; frag != end; frag++) {
- struct macb_dma_desc *desc = macb_rx_desc(bp, frag);
- desc->addr &= ~MACB_BIT(RX_USED);
- }
-
- /* Make descriptor updates visible to hardware */
- wmb();
-
- /*
- * When this happens, the hardware stats registers for
- * whatever caused this is updated, so we don't have to record
- * anything.
- */
-}
-
static int macb_rx(struct macb *bp, int budget)
{
int received = 0;
@@ -686,7 +813,7 @@ static int macb_poll(struct napi_struct *napi, int budget)
netdev_vdbg(bp->dev, "poll: status = %08lx, budget = %d\n",
(unsigned long)status, budget);
- work_done = macb_rx(bp, budget);
+ work_done = bp->macbgem_ops.mog_rx(bp, budget);
if (work_done < budget) {
napi_complete(napi);
@@ -862,29 +989,63 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
-static void macb_init_rx_buffer_size(struct macb *bp)
+static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
{
if (!macb_is_gem(bp)) {
bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
} else {
- bp->rx_buffer_size = GEM_RX_BUFFER_SIZE;
+ bp->rx_buffer_size = size;
- if (bp->rx_buffer_size > PAGE_SIZE) {
- netdev_warn(bp->dev,
- "RX buffer cannot be bigger than PAGE_SIZE, shrinking\n");
- bp->rx_buffer_size = PAGE_SIZE;
- }
if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE) {
- netdev_warn(bp->dev,
- "RX buffer must be multiple of %d bytes, shrinking\n",
+ netdev_dbg(bp->dev,
+ "RX buffer must be multiple of %d bytes, expanding\n",
RX_BUFFER_MULTIPLE);
bp->rx_buffer_size =
- rounddown(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
+ roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
}
- bp->rx_buffer_size = max(RX_BUFFER_MULTIPLE, GEM_RX_BUFFER_SIZE);
}
+
+ netdev_dbg(bp->dev, "mtu [%d] rx_buffer_size [%d]\n",
+ bp->dev->mtu, bp->rx_buffer_size);
}
+static void gem_free_rx_buffers(struct macb *bp)
+{
+ struct sk_buff *skb;
+ struct macb_dma_desc *desc;
+ dma_addr_t addr;
+ int i;
+
+ if (!bp->rx_skbuff)
+ return;
+
+ for (i = 0; i < RX_RING_SIZE; i++) {
+ skb = bp->rx_skbuff[i];
+
+ if (skb == NULL)
+ continue;
+
+ desc = &bp->rx_ring[i];
+ addr = MACB_BF(RX_WADDR, MACB_BFEXT(RX_WADDR, desc->addr));
+ dma_unmap_single(&bp->pdev->dev, addr, skb->len,
+ DMA_FROM_DEVICE);
+ dev_kfree_skb_any(skb);
+ skb = NULL;
+ }
+
+ kfree(bp->rx_skbuff);
+ bp->rx_skbuff = NULL;
+}
+
+static void macb_free_rx_buffers(struct macb *bp)
+{
+ if (bp->rx_buffers) {
+ dma_free_coherent(&bp->pdev->dev,
+ RX_RING_SIZE * bp->rx_buffer_size,
+ bp->rx_buffers, bp->rx_buffers_dma);
+ bp->rx_buffers = NULL;
+ }
+}
static void macb_free_consistent(struct macb *bp)
{
@@ -892,6 +1053,7 @@ static void macb_free_consistent(struct macb *bp)
kfree(bp->tx_skb);
bp->tx_skb = NULL;
}
+ bp->macbgem_ops.mog_free_rx_buffers(bp);
if (bp->rx_ring) {
dma_free_coherent(&bp->pdev->dev, RX_RING_BYTES,
bp->rx_ring, bp->rx_ring_dma);
@@ -902,12 +1064,37 @@ static void macb_free_consistent(struct macb *bp)
bp->tx_ring, bp->tx_ring_dma);
bp->tx_ring = NULL;
}
- if (bp->rx_buffers) {
- dma_free_coherent(&bp->pdev->dev,
- RX_RING_SIZE * bp->rx_buffer_size,
- bp->rx_buffers, bp->rx_buffers_dma);
- bp->rx_buffers = NULL;
- }
+}
+
+static int gem_alloc_rx_buffers(struct macb *bp)
+{
+ int size;
+
+ size = RX_RING_SIZE * sizeof(struct sk_buff *);
+ bp->rx_skbuff = kzalloc(size, GFP_KERNEL);
+ if (!bp->rx_skbuff)
+ return -ENOMEM;
+ else
+ netdev_dbg(bp->dev,
+ "Allocated %d RX struct sk_buff entries at %p\n",
+ RX_RING_SIZE, bp->rx_skbuff);
+ return 0;
+}
+
+static int macb_alloc_rx_buffers(struct macb *bp)
+{
+ int size;
+
+ size = RX_RING_SIZE * bp->rx_buffer_size;
+ bp->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size,
+ &bp->rx_buffers_dma, GFP_KERNEL);
+ if (!bp->rx_buffers)
+ return -ENOMEM;
+ else
+ netdev_dbg(bp->dev,
+ "Allocated RX buffers of %d bytes at %08lx (mapped %p)\n",
+ size, (unsigned long)bp->rx_buffers_dma, bp->rx_buffers);
+ return 0;
}
static int macb_alloc_consistent(struct macb *bp)
@@ -937,14 +1124,8 @@ static int macb_alloc_consistent(struct macb *bp)
"Allocated TX ring of %d bytes at %08lx (mapped %p)\n",
size, (unsigned long)bp->tx_ring_dma, bp->tx_ring);
- size = RX_RING_SIZE * bp->rx_buffer_size;
- bp->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size,
- &bp->rx_buffers_dma, GFP_KERNEL);
- if (!bp->rx_buffers)
+ if (bp->macbgem_ops.mog_alloc_rx_buffers(bp))
goto out_err;
- netdev_dbg(bp->dev,
- "Allocated RX buffers of %d bytes at %08lx (mapped %p)\n",
- size, (unsigned long)bp->rx_buffers_dma, bp->rx_buffers);
return 0;
@@ -953,6 +1134,21 @@ out_err:
return -ENOMEM;
}
+static void gem_init_rings(struct macb *bp)
+{
+ int i;
+
+ for (i = 0; i < TX_RING_SIZE; i++) {
+ bp->tx_ring[i].addr = 0;
+ bp->tx_ring[i].ctrl = MACB_BIT(TX_USED);
+ }
+ bp->tx_ring[TX_RING_SIZE - 1].ctrl |= MACB_BIT(TX_WRAP);
+
+ bp->rx_tail = bp->rx_prepared_head = bp->tx_head = bp->tx_tail = 0;
+
+ gem_rx_refill(bp);
+}
+
static void macb_init_rings(struct macb *bp)
{
int i;
@@ -1236,6 +1432,7 @@ EXPORT_SYMBOL_GPL(macb_set_rx_mode);
static int macb_open(struct net_device *dev)
{
struct macb *bp = netdev_priv(dev);
+ size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;
int err;
netdev_dbg(bp->dev, "open\n");
@@ -1248,7 +1445,7 @@ static int macb_open(struct net_device *dev)
return -EAGAIN;
/* RX buffers initialization */
- macb_init_rx_buffer_size(bp);
+ macb_init_rx_buffer_size(bp, bufsz);
err = macb_alloc_consistent(bp);
if (err) {
@@ -1259,7 +1456,7 @@ static int macb_open(struct net_device *dev)
napi_enable(&bp->napi);
- macb_init_rings(bp);
+ bp->macbgem_ops.mog_init_rings(bp);
macb_init_hw(bp);
/* schedule a link state check */
@@ -1611,6 +1808,19 @@ static int __init macb_probe(struct platform_device *pdev)
dev->base_addr = regs->start;
+ /* set up the appropriate routines according to adapter type */
+ if (macb_is_gem(bp)) {
+ bp->macbgem_ops.mog_alloc_rx_buffers = gem_alloc_rx_buffers;
+ bp->macbgem_ops.mog_free_rx_buffers = gem_free_rx_buffers;
+ bp->macbgem_ops.mog_init_rings = gem_init_rings;
+ bp->macbgem_ops.mog_rx = gem_rx;
+ } else {
+ bp->macbgem_ops.mog_alloc_rx_buffers = macb_alloc_rx_buffers;
+ bp->macbgem_ops.mog_free_rx_buffers = macb_free_rx_buffers;
+ bp->macbgem_ops.mog_init_rings = macb_init_rings;
+ bp->macbgem_ops.mog_rx = macb_rx;
+ }
+
/* Set MII management clock divider */
config = macb_mdc_clk_div(bp);
config |= macb_dbw(bp);
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 2681455..cef146f 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -538,11 +538,22 @@ struct gem_stats {
u32 rx_udp_checksum_errors;
};
+struct macb;
+
+struct macb_or_gem_ops {
+ int (*mog_alloc_rx_buffers)(struct macb *bp);
+ void (*mog_free_rx_buffers)(struct macb *bp);
+ void (*mog_init_rings)(struct macb *bp);
+ int (*mog_rx)(struct macb *bp, int budget);
+};
+
struct macb {
void __iomem *regs;
unsigned int rx_tail;
+ unsigned int rx_prepared_head;
struct macb_dma_desc *rx_ring;
+ struct sk_buff **rx_skbuff;
void *rx_buffers;
size_t rx_buffer_size;
@@ -567,6 +578,8 @@ struct macb {
dma_addr_t tx_ring_dma;
dma_addr_t rx_buffers_dma;
+ struct macb_or_gem_ops macbgem_ops;
+
struct mii_bus *mii_bus;
struct phy_device *phy_dev;
unsigned int link;
--
1.8.0
^ permalink raw reply related
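The RX buffer sizing in the patch above (macb_open() derives a size from the MTU, macb_init_rx_buffer_size() rounds it up to RX_BUFFER_MULTIPLE) can be sketched in user space as follows. This is only an illustration, not driver code: the helper names are hypothetical, and NET_IP_ALIGN is assumed to be 2 here although it is architecture dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Values assumed for illustration; the kernel takes them from its own
 * headers, and NET_IP_ALIGN differs between architectures. */
#define ETH_HLEN            14  /* Ethernet header length */
#define ETH_FCS_LEN          4  /* frame check sequence length */
#define NET_IP_ALIGN         2  /* assumption, see above */
#define RX_BUFFER_MULTIPLE  64  /* bytes, as in the patch */

/* Round size up to the next multiple of `multiple`. */
static size_t round_up_to(size_t size, size_t multiple)
{
	assert(multiple > 0);
	return ((size + multiple - 1) / multiple) * multiple;
}

/* Hypothetical helper: GEM RX buffer size for a given MTU. */
static size_t gem_rx_buffer_size(size_t mtu)
{
	size_t bufsz = mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;

	return round_up_to(bufsz, RX_BUFFER_MULTIPLE);
}
```

Under these assumptions a 1500-byte MTU needs 1520 bytes, which rounds up to 1536, so one buffer still holds a complete frame.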
* [PATCH 1/3 v2] net/macb: increase RX buffer size for GEM
From: Nicolas Ferre @ 2012-12-17 13:01 UTC (permalink / raw)
To: David S. Miller, netdev
Cc: linux-arm-kernel, linux-kernel, Joachim Eastwood,
Jean-Christophe PLAGNIOL-VILLARD, Nicolas Ferre
In-Reply-To: <cover.1355748676.git.nicolas.ferre@atmel.com>
The MACB Ethernet controller requires an RX buffer of 128 bytes. This is
highly sub-optimal for the Gigabit-capable GEM, which is able to use
a bigger DMA buffer. Replace this constant and the associated macros
with data stored in the private structure.
The RX DMA buffer size has to be a multiple of 64 bytes, as indicated in
the DMA Configuration Register specification.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
---
drivers/net/ethernet/cadence/macb.c | 45 ++++++++++++++++++++++++++++++-------
drivers/net/ethernet/cadence/macb.h | 1 +
2 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index a9b0830..50f8669 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -32,7 +32,9 @@
#include "macb.h"
-#define RX_BUFFER_SIZE 128
+#define MACB_RX_BUFFER_SIZE 128
+#define GEM_RX_BUFFER_SIZE 2048
+#define RX_BUFFER_MULTIPLE 64 /* bytes */
#define RX_RING_SIZE 512 /* must be power of 2 */
#define RX_RING_BYTES (sizeof(struct macb_dma_desc) * RX_RING_SIZE)
@@ -92,7 +94,7 @@ static struct macb_dma_desc *macb_rx_desc(struct macb *bp, unsigned int index)
static void *macb_rx_buffer(struct macb *bp, unsigned int index)
{
- return bp->rx_buffers + RX_BUFFER_SIZE * macb_rx_ring_wrap(index);
+ return bp->rx_buffers + bp->rx_buffer_size * macb_rx_ring_wrap(index);
}
void macb_set_hwaddr(struct macb *bp)
@@ -572,7 +574,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
skb_put(skb, len);
for (frag = first_frag; ; frag++) {
- unsigned int frag_len = RX_BUFFER_SIZE;
+ unsigned int frag_len = bp->rx_buffer_size;
if (offset + frag_len > len) {
BUG_ON(frag != last_frag);
@@ -580,7 +582,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
}
skb_copy_to_linear_data_offset(skb, offset,
macb_rx_buffer(bp, frag), frag_len);
- offset += RX_BUFFER_SIZE;
+ offset += bp->rx_buffer_size;
desc = macb_rx_desc(bp, frag);
desc->addr &= ~MACB_BIT(RX_USED);
@@ -860,6 +862,30 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
+static void macb_init_rx_buffer_size(struct macb *bp)
+{
+ if (!macb_is_gem(bp)) {
+ bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
+ } else {
+ bp->rx_buffer_size = GEM_RX_BUFFER_SIZE;
+
+ if (bp->rx_buffer_size > PAGE_SIZE) {
+ netdev_warn(bp->dev,
+ "RX buffer cannot be bigger than PAGE_SIZE, shrinking\n");
+ bp->rx_buffer_size = PAGE_SIZE;
+ }
+ if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE) {
+ netdev_warn(bp->dev,
+ "RX buffer must be multiple of %d bytes, shrinking\n",
+ RX_BUFFER_MULTIPLE);
+ bp->rx_buffer_size =
+ rounddown(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
+ }
+ bp->rx_buffer_size = max(RX_BUFFER_MULTIPLE, GEM_RX_BUFFER_SIZE);
+ }
+}
+
+
static void macb_free_consistent(struct macb *bp)
{
if (bp->tx_skb) {
@@ -878,7 +904,7 @@ static void macb_free_consistent(struct macb *bp)
}
if (bp->rx_buffers) {
dma_free_coherent(&bp->pdev->dev,
- RX_RING_SIZE * RX_BUFFER_SIZE,
+ RX_RING_SIZE * bp->rx_buffer_size,
bp->rx_buffers, bp->rx_buffers_dma);
bp->rx_buffers = NULL;
}
@@ -911,7 +937,7 @@ static int macb_alloc_consistent(struct macb *bp)
"Allocated TX ring of %d bytes at %08lx (mapped %p)\n",
size, (unsigned long)bp->tx_ring_dma, bp->tx_ring);
- size = RX_RING_SIZE * RX_BUFFER_SIZE;
+ size = RX_RING_SIZE * bp->rx_buffer_size;
bp->rx_buffers = dma_alloc_coherent(&bp->pdev->dev, size,
&bp->rx_buffers_dma, GFP_KERNEL);
if (!bp->rx_buffers)
@@ -936,7 +962,7 @@ static void macb_init_rings(struct macb *bp)
for (i = 0; i < RX_RING_SIZE; i++) {
bp->rx_ring[i].addr = addr;
bp->rx_ring[i].ctrl = 0;
- addr += RX_BUFFER_SIZE;
+ addr += bp->rx_buffer_size;
}
bp->rx_ring[RX_RING_SIZE - 1].addr |= MACB_BIT(RX_WRAP);
@@ -1046,7 +1072,7 @@ static void macb_configure_dma(struct macb *bp)
if (macb_is_gem(bp)) {
dmacfg = gem_readl(bp, DMACFG) & ~GEM_BF(RXBS, -1L);
- dmacfg |= GEM_BF(RXBS, RX_BUFFER_SIZE / 64);
+ dmacfg |= GEM_BF(RXBS, bp->rx_buffer_size / RX_BUFFER_MULTIPLE);
dmacfg |= GEM_BF(FBLDO, 16);
dmacfg |= GEM_BIT(TXPBMS) | GEM_BF(RXBMS, -1L);
gem_writel(bp, DMACFG, dmacfg);
@@ -1221,6 +1247,9 @@ static int macb_open(struct net_device *dev)
if (!bp->phy_dev)
return -EAGAIN;
+ /* RX buffers initialization */
+ macb_init_rx_buffer_size(bp);
+
err = macb_alloc_consistent(bp);
if (err) {
netdev_err(dev, "Unable to allocate DMA memory (error %d)\n",
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 570908b..2681455 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -544,6 +544,7 @@ struct macb {
unsigned int rx_tail;
struct macb_dma_desc *rx_ring;
void *rx_buffers;
+ size_t rx_buffer_size;
unsigned int tx_head, tx_tail;
struct macb_dma_desc *tx_ring;
--
1.8.0
^ permalink raw reply related
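As the macb_configure_dma() hunk above shows, the RXBS field of DMACFG is programmed with the buffer size divided by RX_BUFFER_MULTIPLE, i.e. in units of 64 bytes. A minimal sketch of that conversion, with a hypothetical helper name and no register access:

```c
#include <assert.h>
#include <stddef.h>

#define RX_BUFFER_MULTIPLE 64 /* bytes, as in the patch */

/* Hypothetical helper: value to place in the RXBS field of DMACFG.
 * The size must already be a multiple of RX_BUFFER_MULTIPLE. */
static unsigned int gem_rxbs_units(size_t rx_buffer_size)
{
	assert(rx_buffer_size % RX_BUFFER_MULTIPLE == 0);
	return (unsigned int)(rx_buffer_size / RX_BUFFER_MULTIPLE);
}
```

With the constants from the patch, GEM_RX_BUFFER_SIZE (2048) maps to 32 units and the old 128-byte buffer to 2 units.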
* [PATCH 0/3 v2] net/macb: RX path enhancement
From: Nicolas Ferre @ 2012-12-17 13:01 UTC (permalink / raw)
To: David S. Miller, netdev
Cc: linux-arm-kernel, linux-kernel, Joachim Eastwood,
Jean-Christophe PLAGNIOL-VILLARD, Nicolas Ferre
Hi,
Here is the patch series modifying the RX path in the macb driver.
The change applies to the GEM variant of the Cadence IP and introduces
function pointers to match the path to the proper adapter. The move
to RX buffers that are adapted to the MTU and can be DMAed directly into
the SKB is done in two steps, but they can be merged into a single patch.
v2: - gave up the idea of using non-coherent memory for
rx buffers
- addition of the struct macb layout optimization
Havard Skinnemoen (1):
net/macb: Try to optimize struct macb layout
Nicolas Ferre (2):
net/macb: increase RX buffer size for GEM
net/macb: change RX path for GEM
drivers/net/ethernet/cadence/macb.c | 323 +++++++++++++++++++++++++++++++-----
drivers/net/ethernet/cadence/macb.h | 35 ++--
2 files changed, 306 insertions(+), 52 deletions(-)
--
1.8.0
^ permalink raw reply
* [PATCH] bugfix: network namespace & device dummy
From: V. Lavrov @ 2012-12-17 13:01 UTC (permalink / raw)
To: netdev
If a container has a network device dummyX (with lxc.network.type = phys), the device disappears from the system after the container is closed.
This patch returns the dummyX device to the initial network namespace when the container is closed.
Signed-off-by: Vitaly Lavrov <lve@guap.ru>
---
diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index bab0158..efa990c 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -160,6 +160,41 @@ static struct rtnl_link_ops dummy_link_ops __read_mostly = {
module_param(numdummies, int, 0);
MODULE_PARM_DESC(numdummies, "Number of dummy pseudo devices");
+
+static void __net_exit dummy_net_exit(struct net *net) {
+ struct net_device *dev, *aux;
+ int err;
+
+ if(net == &init_net) return;
+
+ rtnl_lock();
+ for_each_netdev_safe(net, dev, aux) {
+ if(dev->rtnl_link_ops == &dummy_link_ops) {
+ err = dev_change_net_namespace(dev, &init_net, dev->name);
+ if(err) {
+ char fb_name[IFNAMSIZ];
+ printk (KERN_INFO "%s: dev_change_net_namespace(init_net,%s) err: %d\n",
+ __func__,dev->name,err);
+ snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
+ err = dev_change_net_namespace(dev, &init_net, fb_name);
+ if(err)
+ printk (KERN_INFO "%s: dev_change_net_namespace(%s,init_net,%s) err: %d\n",
+ __func__,dev->name,fb_name,err);
+ else
+ printk (KERN_INFO "%s: %s rename to %s\n",
+ __func__,dev->name,fb_name);
+
+ }
+ }
+ }
+ rtnl_unlock();
+}
+
+static struct pernet_operations __net_initdata dummy_net_ops = {
+ .exit = dummy_net_exit,
+};
+
+
static int __init dummy_init_one(void)
{
struct net_device *dev_dummy;
@@ -184,6 +219,10 @@ static int __init dummy_init_module(void)
{
int i, err = 0;
+ err = register_pernet_device(&dummy_net_ops);
+ if(err)
+ return err;
+
rtnl_lock();
err = __rtnl_link_register(&dummy_link_ops);
@@ -191,8 +230,10 @@ static int __init dummy_init_module(void)
err = dummy_init_one();
cond_resched();
}
- if (err < 0)
+ if (err < 0) {
__rtnl_link_unregister(&dummy_link_ops);
+ unregister_pernet_device(&dummy_net_ops);
+ }
rtnl_unlock();
return err;
@@ -201,6 +242,7 @@ static int __init dummy_init_module(void)
static void __exit dummy_cleanup_module(void)
{
rtnl_link_unregister(&dummy_link_ops);
+ unregister_pernet_device(&dummy_net_ops);
}
module_init(dummy_init_module);
--
^ permalink raw reply related
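The exit handler in the patch above appears intended to retry the move under a fallback name of the form "dev<ifindex>" when the original name cannot be used in the initial namespace. A user-space sketch of that retry logic follows; the move callback and the toy namespace are hypothetical stand-ins, and nothing here calls real kernel APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16 /* interface name buffer size, as in the kernel */

/* Hypothetical stand-in for a namespace move: returns 0 on success. */
typedef int (*move_fn)(const char *name);

/*
 * Try to move a device under its current name; on failure retry under
 * "dev<ifindex>". On success the name actually used is copied to `used`.
 */
static int move_with_fallback(const char *name, int ifindex, move_fn move,
			      char used[IFNAMSIZ])
{
	char fb_name[IFNAMSIZ];
	int err;

	assert(name != NULL && move != NULL && used != NULL);

	err = move(name);
	if (!err) {
		snprintf(used, IFNAMSIZ, "%s", name);
		return 0;
	}

	snprintf(fb_name, sizeof(fb_name), "dev%d", ifindex);
	err = move(fb_name);
	if (!err)
		snprintf(used, IFNAMSIZ, "%s", fb_name);
	return err;
}

/* Toy target namespace in which the name "dummy0" is already taken. */
static int move_into_busy_ns(const char *name)
{
	return strcmp(name, "dummy0") == 0 ? -17 /* like -EEXIST */ : 0;
}
```

The fallback name has to be the one passed to the second attempt; retrying with the original name would simply fail again for the same reason.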
* Re: tc ipt action
From: Jamal Hadi Salim @ 2012-12-17 12:58 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <alpine.LNX.2.01.1212162220270.20281@nerf07.vanv.qr>
On 12-12-16 04:21 PM, Jan Engelhardt wrote:
> If you have a preexisting clone of any linux tree, you can utilize
> `git remote add ...` to only grab the deltas.
It downloaded eventually.
So looking at this quickly, a basic question: is xtables2 different
API-wise from what we do today in act_ipt?
Second: are chain names unique system-wide? i.e., at the moment we send
a hook and table selection?
The patch I currently have for the kernel tries to pursue an approach
that maximizes code reuse - depending on your answer I may take the
approach of having a separate act_xt and hope you can build on top of that.
cheers,
jamal
^ permalink raw reply
* bug? mac 00:00:00:00:00:00 with natsemi DP83815 after driver load
From: Roland Kletzing @ 2012-12-17 12:38 UTC (permalink / raw)
To: netdev
Hello,
I recently played with my older evo t20/wyse 3235le thin clients and flashed
a Linux kernel into them. There seems to be an issue with the natsemi
driver.
After driver load (natsemi.ko), eth0 has no valid MAC address; dmesg and
ifconfig show just zeros: 00:00:00:00:00:00.
Despite that, the NIC is working fine for me (in this test setup I set the
MAC manually: ifconfig eth0 hw ether de:ad:be:ef:be:ef).
Apparently the driver fails to read the proper MAC from the EEPROM, since
"natsemi-diag -ee" (from nictools-pci in Debian squeeze) shows that there
is a valid "Ethernet MAC Station Address" stored inside the EEPROM (see
below).
Looks like a driver bug!?
Does anybody have a clue what's going wrong here?
regards
roland
#lspci
00:00.0 Host bridge: Cyrix Corporation PCI Master
00:0f.0 Ethernet controller: National Semiconductor Corporation DP83815 (MacPhyter) Ethernet Controller
00:12.0 ISA bridge: Cyrix Corporation 5530 Legacy [Kahlua] (rev 30)
00:12.1 Bridge: Cyrix Corporation 5530 SMI [Kahlua]
00:12.2 IDE interface: Cyrix Corporation 5530 IDE [Kahlua]
00:12.3 Multimedia audio controller: Cyrix Corporation 5530 Audio [Kahlua]
00:12.4 VGA compatible controller: Cyrix Corporation 5530 Video [Kahlua]
00:13.0 USB Controller: Compaq Computer Corporation ZFMicro Chipset USB (rev 06)
#dmesg |egrep "natsemi|eth"
natsemi dp8381x driver, version 2.1, Sept 11, 2006
natsemi 0000:00:0f.0: setting latency timer to 64
natsemi eth0: NatSemi DP8381[56] at 0x4010000 (0000:00:0f.0), 00:00:00:00:00:00, IRQ 10, port TP.
eth0: DSPCFG accepted after 0 usec.
eth0: link up.
eth0: Setting full-duplex based on negotiated link capability.
#natsemi-diag -aa
natsemi-diag.c:v2.08 2/28/2005 Donald Becker (becker@scyld.com)
http://www.scyld.com/diag/index.html
Index #1: Found a NatSemi DP83815 adapter at 0xf800.
Natsemi 83815 series with station address de:ad:be:ef:be:ef
Transceiver setting Autonegotation advertise 10/100 Mbps half and full duplex.
This device appears to be active, so some registers will not be read.
To see all register values use the '-f' flag.
NatSemi DP83815 chip registers at 0xf800
0x000: 00000004 e805e000 00000002 00000000 ******** 00f1cd65 00000001 00000000
0x020: 03abd200 d0f01002 00000000 00000000 03abd000 18700010 00000000 00000000
0x040: ******** 00200000 00000004 0000efbe ffff000b 30303030 00000403 00000000
0x060: ******** ******** ******** ******** ******** ******** ******** ********
0x080: 00003100 0000786d 00002000 00005c21 000005e1 000045e1 00000005 00002801
0x0A0: ******** ******** ******** ******** ******** ******** ******** ********
0x0C0: 00000615 00000002 00000000 00000000 00000000 00000000 00000100 00000030
0x0E0: 00000000 000000bf 00000804 00008200 00000000 00000000 00000000 00000000
Interrupt sources are pending (00000200).
Tx queue emptied indication.
Receive mode is 0xc8200000: Normal unicast and hashed multicast.
Rx filter contents: adde efbe efbe 0000 0000 0000 0000 0000
#natsemi-diag -ee
natsemi-diag.c:v2.08 2/28/2005 Donald Becker (becker@scyld.com)
http://www.scyld.com/diag/index.html
Index #1: Found a NatSemi DP83815 adapter at 0xf800.
Natsemi 83815 series with station address de:ad:be:ef:be:ef
Transceiver setting Autonegotation advertise 10/100 Mbps half and full duplex.
EEPROM address length 6, 64 words.
EEPROM contents (64 words):
0x00: 100b 0020 0b34 41fb 0000 0000 0000 4000
0x08: 0d32 dff4 1905 aa48 0000 0000 129c 4c4c
0x10: ca52 2ccc 0cb2 9c6c 0c6c 8c0c 2020 6080
0x18: 0800 0000 0000 0000 0000 0000 0000 0000
0x20: 0000 0000 0000 0000 0000 0000 0000 0000
0x28: 0000 0000 0000 0000 0000 0000 0000 0000
0x30: 0000 0000 0000 0000 0000 0000 0000 0000
0x38: 0000 0000 0000 0000 0000 0000 0000 e418
Decoded EEPROM contents:
PCI Subsystem IDs -- Vendor 0x100b, Device 0x0020.
PCI timer settings -- minimum grant 11, maximum latency 52.
Ethernet MAC Station Address 00:80:64:1a:e8:bf.
Wake-On-LAN password 00:00:00:00:00:00.
Transceiver setting 0x--f-: advertise 10/100 Mbps half and full duplex.
Flow control enabled.
EEPROM active region checksum read as aa48, vs aa48 calculated value.
^ permalink raw reply
* RFC [PATCH] iproute2: temporary solution to fix xt breakage
From: Jamal Hadi Salim @ 2012-12-17 12:30 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich,
netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50CE3203.9080007@mojatatu.com>
[-- Attachment #1: Type: text/plain, Size: 704 bytes --]
On 12-12-16 03:41 PM, Jamal Hadi Salim wrote:
>
> There is an "intermediate solution" from Hasan which doesnt require
> the kernel change. It changes the kernel endpoint to "ipt". I am
> conflicted because it is a quick hack while otoh forcing people to
> upgrade kernel is a usability issue.
>
Attached. The author is Hasan - I didn't sign it because I am looking for
feedback and I find it distasteful, but it solves the problem.
This is needed until we have a proper fix propagated in the kernel.
Once that kernel change is ubiquitous, this change is noise and a
maintenance pain. I am making it hard to even turn it on
(i.e., someone knowledgeable will have to compile with CONFIG_XT_HACK).
cheers,
jamal
[-- Attachment #2: p1 --]
[-- Type: text/plain, Size: 1092 bytes --]
diff --git a/tc/m_action.c b/tc/m_action.c
index 1fe2431..fa9a7c8 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -209,10 +209,17 @@ done0:
tail = NLMSG_TAIL(n);
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
+ /*XXX: hack to work around old kernels, newer xtables */
+#ifdef CONFIG_XT_HACK
+ if (strncmp(k,"xt",2)==0)
+ addattr_l(n, MAX_MSG, TCA_ACT_KIND, "ipt" , strlen("ipt") + 1);
+ else
+ addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
+#else
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
+#endif
ret = a->parse_aopt(a,&argc, &argv, TCA_ACT_OPTIONS, n);
-
if (ret < 0) {
fprintf(stderr,"bad action parsing\n");
goto bad_val;
@@ -259,7 +266,15 @@ tc_print_one_action(FILE * f, struct rtattr *arg)
}
+ /*XXX: hack to work around old kernels, newer xtables */
+#ifdef CONFIG_XT_HACK
+ if (strcmp(RTA_DATA(tb[TCA_ACT_KIND]), "ipt")==0)
+ a = get_action_kind("xt");
+ else
+ a = get_action_kind(RTA_DATA(tb[TCA_ACT_KIND]));
+#else
a = get_action_kind(RTA_DATA(tb[TCA_ACT_KIND]));
+#endif
if (NULL == a)
return err;
^ permalink raw reply related
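The workaround in the attachment above boils down to a two-way name translation: "xt" is sent to the kernel as "ipt", and "ipt" coming back from the kernel is printed with the "xt" module. A stand-alone sketch of just that mapping, with hypothetical helper names and no netlink code:

```c
#include <assert.h>
#include <string.h>

/* Kind to put in TCA_ACT_KIND: anything starting with "xt" is sent as
 * "ipt", mirroring the strncmp(k, "xt", 2) test in the patch. */
static const char *act_kind_to_kernel(const char *k)
{
	assert(k != NULL);
	return strncmp(k, "xt", 2) == 0 ? "ipt" : k;
}

/* Kind used to look up the userspace module when printing a reply. */
static const char *act_kind_from_kernel(const char *k)
{
	assert(k != NULL);
	return strcmp(k, "ipt") == 0 ? "xt" : k;
}
```

Note that the send side is a prefix match, so any action kind beginning with "xt" would be rewritten, while the print side is an exact match on "ipt".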
* Re: vlan tagged packets and libpcap breakage
From: Daniel Borkmann @ 2012-12-17 11:08 UTC (permalink / raw)
To: Guy Harris; +Cc: Michael Richardson, netdev, tcpdump-workers, Francesco Ruggeri
In-Reply-To: <DE6D5B28-FA1E-4F04-9BDF-F6D35878776E@alum.mit.edu>
On Mon, Dec 17, 2012 at 11:35 AM, Guy Harris <guy@alum.mit.edu> wrote:
> On Dec 17, 2012, at 1:50 AM, "David Laight" <David.Laight@ACULAB.COM> wrote:
>
>> How are you going to tell whether a feature is present in a non-Linux
>> kernel ?
>
> The Linux memory-mapped capture mechanism is not present in a non-Linux kernel, so all the libpcap work involved here would, if necessary on other platforms, have to be done differently on those platforms. Those platforms would have to have their own mechanisms to indicate whether any changes to filter code, processing of VLAN tags supplied out of band, etc. would need to be done.
>
> The same would apply to other additional features of the Linux memory-mapped capture mechanism that require changes in libpcap.
Exactly.
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
^ permalink raw reply
* Re: vlan tagged packets and libpcap breakage
From: Guy Harris @ 2012-12-17 10:35 UTC (permalink / raw)
To: David Laight
Cc: Michael Richardson, netdev, Francesco Ruggeri, Daniel Borkmann,
tcpdump-workers
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70EF@saturn3.aculab.com>
On Dec 17, 2012, at 1:50 AM, "David Laight" <David.Laight@ACULAB.COM> wrote:
> How are you going to tell whether a feature is present in a non-Linux
> kernel ?
The Linux memory-mapped capture mechanism is not present in a non-Linux kernel, so all the libpcap work involved here would, if necessary on other platforms, have to be done differently on those platforms. Those platforms would have to have their own mechanisms to indicate whether any changes to filter code, processing of VLAN tags supplied out of band, etc. would need to be done.
The same would apply to other additional features of the Linux memory-mapped capture mechanism that require changes in libpcap. (Ideally, those changes would only require changes in order to use them, and would not break existing userland code, including but not limited to libpcap - your reply was to Daniel Borkmann, who is, I believe, the originator of netsniff-ng:
http://netsniff-ng.org
which has its own code using PF_PACKET sockets.)
^ permalink raw reply