* nfs41: potential null deref in xprt_reserve_xprt()?
From: Dan Carpenter @ 2010-04-23 12:00 UTC (permalink / raw)
To: iyer-HgOvQuBEEgTQT0dZR+AlfA
Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
I'm going through some Smatch results and had a question.
Until commit 343952fa5a: "nfs41: Get the rpc_xprt * from the rpc_rqst
instead of the rpc_clnt." we assumed that "task->tk_rqstp" can be NULL.
But that patch dereferences it unconditionally.
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 0eea2bf..c144611 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -195,8 +195,8 @@ EXPORT_SYMBOL_GPL(xprt_load_transport);
*/
int xprt_reserve_xprt(struct rpc_task *task)
{
- struct rpc_xprt *xprt = task->tk_xprt;
struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_xprt *xprt = req->rq_xprt;
^^^^^^^^^^^^^
Can "req" be null here? The patch is a year old, so presumably it
isn't null very often.
If you would like, I can remove the checks for null from the rest of the
function.
regards,
dan carpenter
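A minimal userspace sketch of the ordering question above. It assumes that falling back to the transport cached on the task is acceptable when there is no request, which is exactly the point in doubt. The `_sketch` types and `task_xprt_sketch()` are hypothetical stand-ins, not the sunrpc structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the illustration. */
struct xprt_sketch { int id; };
struct rqst_sketch { struct xprt_sketch *rq_xprt; };
struct task_sketch {
	struct xprt_sketch *tk_xprt;	/* transport cached on the task */
	struct rqst_sketch *tk_rqstp;	/* may be NULL before a slot is reserved */
};

/*
 * Pick the transport without dereferencing a NULL request: use the
 * request's transport when a request exists, otherwise fall back to the
 * task's own pointer (assumed fallback, see above).
 */
static struct xprt_sketch *task_xprt_sketch(const struct task_sketch *task)
{
	const struct rqst_sketch *req;

	assert(task != NULL);
	req = task->tk_rqstp;
	return req ? req->rq_xprt : task->tk_xprt;
}
```

If "req" really can never be NULL at this point, the fallback is dead code and the remaining checks in the function can go instead.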
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [patch] sctp: cleanup: remove unneeded null check
From: Dan Carpenter @ 2010-04-23 11:59 UTC (permalink / raw)
To: Vlad Yasevich
Cc: Sridhar Samudrala, David S. Miller, Wei Yongjun, Chris Dischino,
linux-sctp, netdev, kernel-janitors
"chunk" can never be null here. We dereferenced it earlier in the
function and also at the start of the function we passed it to
sctp_pack_cookie() which dereferences it.
This code has been around since the dawn of git history so if "chunk"
were ever null someone would have complained about it.
Signed-off-by: Dan Carpenter <error27@gmail.com>
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 17cb400..52352fc 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -470,8 +470,7 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
*
* [INIT ACK back to where the INIT came from.]
*/
- if (chunk)
- retval->transport = chunk->transport;
+ retval->transport = chunk->transport;
nomem_chunk:
kfree(cookie);
* [PATCH net-next-2.6] net: disallow to use net_assign_generic externally
From: Jiri Pirko @ 2010-04-23 11:40 UTC (permalink / raw)
To: netdev; +Cc: davem, ebiederm
There's no longer any need to use this function directly because it's handled by
register_pernet_device. To keep things simple and easy to understand,
make it static so as not to tempt potential users.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
diff --git a/include/net/netns/generic.h b/include/net/netns/generic.h
index ff4982a..81a31c0 100644
--- a/include/net/netns/generic.h
+++ b/include/net/netns/generic.h
@@ -14,11 +14,8 @@
* The rules are simple:
* 1. set pernet_operations->id. After register_pernet_device you
* will have the id of your private pointer.
- * 2. Either set pernet_operations->size (to have the code allocate and
- * free a private structure pointed to from struct net ) or
- * call net_assign_generic() to put the private data on the struct
- * net (most preferably this should be done in the ->init callback
- * of the ops registered);
+ * 2. set pernet_operations->size to have the code allocate and free
+ * a private structure pointed to from struct net.
* 3. do not change this pointer while the net is alive;
* 4. do not try to have any private reference on the net_generic object.
*
@@ -46,6 +43,4 @@ static inline void *net_generic(struct net *net, int id)
return ptr;
}
-
-extern int net_assign_generic(struct net *net, int id, void *data);
#endif
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index bd8c471..777477c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -27,6 +27,51 @@ EXPORT_SYMBOL(init_net);
#define INITIAL_NET_GEN_PTRS 13 /* +1 for len +2 for rcu_head */
+static void net_generic_release(struct rcu_head *rcu)
+{
+ struct net_generic *ng;
+
+ ng = container_of(rcu, struct net_generic, rcu);
+ kfree(ng);
+}
+
+static int net_assign_generic(struct net *net, int id, void *data)
+{
+ struct net_generic *ng, *old_ng;
+
+ BUG_ON(!mutex_is_locked(&net_mutex));
+ BUG_ON(id == 0);
+
+ ng = old_ng = net->gen;
+ if (old_ng->len >= id)
+ goto assign;
+
+ ng = kzalloc(sizeof(struct net_generic) +
+ id * sizeof(void *), GFP_KERNEL);
+ if (ng == NULL)
+ return -ENOMEM;
+
+ /*
+ * Some synchronisation notes:
+ *
+ * The net_generic explores the net->gen array inside rcu
+ * read section. Besides once set the net->gen->ptr[x]
+ * pointer never changes (see rules in netns/generic.h).
+ *
+ * That said, we simply duplicate this array and schedule
+ * the old copy for kfree after a grace period.
+ */
+
+ ng->len = id;
+ memcpy(&ng->ptr, &old_ng->ptr, old_ng->len * sizeof(void*));
+
+ rcu_assign_pointer(net->gen, ng);
+ call_rcu(&old_ng->rcu, net_generic_release);
+assign:
+ ng->ptr[id - 1] = data;
+ return 0;
+}
+
static int ops_init(const struct pernet_operations *ops, struct net *net)
{
int err;
@@ -526,49 +571,3 @@ void unregister_pernet_device(struct pernet_operations *ops)
mutex_unlock(&net_mutex);
}
EXPORT_SYMBOL_GPL(unregister_pernet_device);
-
-static void net_generic_release(struct rcu_head *rcu)
-{
- struct net_generic *ng;
-
- ng = container_of(rcu, struct net_generic, rcu);
- kfree(ng);
-}
-
-int net_assign_generic(struct net *net, int id, void *data)
-{
- struct net_generic *ng, *old_ng;
-
- BUG_ON(!mutex_is_locked(&net_mutex));
- BUG_ON(id == 0);
-
- ng = old_ng = net->gen;
- if (old_ng->len >= id)
- goto assign;
-
- ng = kzalloc(sizeof(struct net_generic) +
- id * sizeof(void *), GFP_KERNEL);
- if (ng == NULL)
- return -ENOMEM;
-
- /*
- * Some synchronisation notes:
- *
- * The net_generic explores the net->gen array inside rcu
- * read section. Besides once set the net->gen->ptr[x]
- * pointer never changes (see rules in netns/generic.h).
- *
- * That said, we simply duplicate this array and schedule
- * the old copy for kfree after a grace period.
- */
-
- ng->len = id;
- memcpy(&ng->ptr, &old_ng->ptr, old_ng->len * sizeof(void*));
-
- rcu_assign_pointer(net->gen, ng);
- call_rcu(&old_ng->rcu, net_generic_release);
-assign:
- ng->ptr[id - 1] = data;
- return 0;
-}
-EXPORT_SYMBOL_GPL(net_assign_generic);
* [PATCH/RFC Resubmission] cdc_ether: Identify MBM devices by GUID in MDLM descriptor
From: Jonas Sjoquist @ 2010-04-23 11:07 UTC (permalink / raw)
To: oneukum, davem; +Cc: netdev
From: Jonas Sjöquist <jonas.sjoquist@ericsson.com>
This patch removes vid/pid for Ericsson MBM devices from the whitelist set of
devices. The MBM devices are instead identified by GUID.
In order for cdc_ether to handle these devices the GUID in the MDLM descriptor
is tested. All MBM devices currently handled by cdc_ether as well as future
CDC Ethernet MBM devices can be identified by the GUID.
This is the same solution used in Carl Nordbeck's mbm driver,
http://kerneltrap.org/mailarchive/linux-usb/2008/11/17/4141384/thread
I post this as an RFC to get feedback on whether cdc_ether is the correct place to
do the binding, or if it should be done in a separate driver, e.g. zaurus.
Signed-off-by: Jonas Sjöquist <jonas.sjoquist@ericsson.com>
---
diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index c8cdb7f..811b2dc 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -64,6 +64,11 @@ static int is_wireless_rndis(struct usb_interface_descriptor *desc)
#endif
+static const u8 mbm_guid[16] = {
+ 0xa3, 0x17, 0xa8, 0x8b, 0x04, 0x5e, 0x4f, 0x01,
+ 0xa6, 0x07, 0xc0, 0xff, 0xcb, 0x7e, 0x39, 0x2a,
+};
+
/*
* probes control interface, claims data interface, collects the bulk
* endpoints, activates data interface (if needed), maybe sets MTU.
@@ -79,6 +84,8 @@ int usbnet_generic_cdc_bind(struct usbnet *dev, struct usb_interface *intf)
int status;
int rndis;
struct usb_driver *driver = driver_of(intf);
+ struct usb_cdc_mdlm_desc *desc = NULL;
+ struct usb_cdc_mdlm_detail_desc *detail = NULL;
if (sizeof dev->data < sizeof *info)
return -EDOM;
@@ -229,6 +236,34 @@ int usbnet_generic_cdc_bind(struct usbnet *dev, struct usb_interface *intf)
* side link address we were given.
*/
break;
+ case USB_CDC_MDLM_TYPE:
+ if (desc) {
+ dev_dbg(&intf->dev, "extra MDLM descriptor\n");
+ goto bad_desc;
+ }
+
+ desc = (void *)buf;
+
+ if (desc->bLength != sizeof(*desc))
+ goto bad_desc;
+
+ if (memcmp(&desc->bGUID, mbm_guid, 16))
+ goto bad_desc;
+ break;
+ case USB_CDC_MDLM_DETAIL_TYPE:
+ if (detail) {
+ dev_dbg(&intf->dev, "extra MDLM detail descriptor\n");
+ goto bad_desc;
+ }
+
+ detail = (void *)buf;
+
+ if (detail->bGuidDescriptorType == 0) {
+ if (detail->bLength < (sizeof(*detail) + 1))
+ goto bad_desc;
+ } else
+ goto bad_desc;
+ break;
}
next_desc:
len -= buf [0]; /* bLength */
@@ -542,80 +577,10 @@ static const struct usb_device_id products [] = {
USB_CDC_PROTO_NONE),
.driver_info = (unsigned long) &cdc_info,
}, {
- /* Ericsson F3507g */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x1900, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson F3507g ver. 2 */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x1902, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson F3607gw */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x1904, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson F3607gw ver 2 */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x1905, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson F3607gw ver 3 */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x1906, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson F3307 */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x190a, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson F3307 ver 2 */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x1909, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson C3607w */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x1049, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Ericsson C3607w ver 2 */
- USB_DEVICE_AND_INTERFACE_INFO(0x0bdb, 0x190b, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Toshiba F3507g */
- USB_DEVICE_AND_INTERFACE_INFO(0x0930, 0x130b, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Toshiba F3607gw */
- USB_DEVICE_AND_INTERFACE_INFO(0x0930, 0x130c, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Toshiba F3607gw ver 2 */
- USB_DEVICE_AND_INTERFACE_INFO(0x0930, 0x1311, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Dell F3507g */
- USB_DEVICE_AND_INTERFACE_INFO(0x413c, 0x8147, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Dell F3607gw */
- USB_DEVICE_AND_INTERFACE_INFO(0x413c, 0x8183, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
-}, {
- /* Dell F3607gw ver 2 */
- USB_DEVICE_AND_INTERFACE_INFO(0x413c, 0x8184, USB_CLASS_COMM,
- USB_CDC_SUBCLASS_MDLM, USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long) &mbm_info,
+ USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_MDLM,
+ USB_CDC_PROTO_NONE),
+ .driver_info = (unsigned long)&mbm_info,
+
},
{ }, // END
};
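The GUID test described in this patch can be sketched in userspace as follows. The GUID bytes are copied from the diff; `struct mdlm_desc_sketch` and `is_mbm_mdlm()` are hypothetical names, and the layout is an assumed byte-wise stand-in for the MDLM functional descriptor, not the kernel's definition:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GUID bytes copied from the patch. */
static const uint8_t mbm_guid[16] = {
	0xa3, 0x17, 0xa8, 0x8b, 0x04, 0x5e, 0x4f, 0x01,
	0xa6, 0x07, 0xc0, 0xff, 0xcb, 0x7e, 0x39, 0x2a,
};

/* Assumed layout: all members are bytes, so there is no padding (21 bytes). */
struct mdlm_desc_sketch {
	uint8_t bLength;
	uint8_t bDescriptorType;
	uint8_t bDescriptorSubType;
	uint8_t bcdVersion[2];	/* little-endian BCD */
	uint8_t bGUID[16];
};

/* Return 1 if buf holds a correctly sized descriptor carrying the MBM GUID. */
static int is_mbm_mdlm(const uint8_t *buf, size_t len)
{
	struct mdlm_desc_sketch desc;

	assert(buf != NULL);
	if (len < sizeof(desc))
		return 0;		/* truncated descriptor */
	memcpy(&desc, buf, sizeof(desc));
	if (desc.bLength != sizeof(desc))
		return 0;		/* same length check as the patch */
	return memcmp(desc.bGUID, mbm_guid, sizeof(mbm_guid)) == 0;
}
```

Matching on the GUID rather than on vid/pid means a new device needs no table entry, at the cost of parsing one more descriptor during bind.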
* Re: DDoS attack causing bad effect on conntrack searches
From: Patrick McHardy @ 2010-04-23 11:06 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jesper Dangaard Brouer, paulmck, Changli Gao, hawk,
Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <1272020717.7895.7974.camel@edumazet-laptop>
Eric Dumazet wrote:
> On Friday 23 April 2010 at 12:55 +0200, Patrick McHardy wrote:
>> Eric Dumazet wrote:
>>> OK, but a lookup lasts a fraction of a microsecond, unless interrupted by
>>> a hard irq.
>>>
>>> The probability of a change during a lookup should be very, very small.
>>>
>>> Note that the scenario for a restart is:
>>>
>>> The lookup goes through the chain.
>>> While it is examining one object, this object is deleted.
>>> The object is re-allocated by another cpu and inserted into a new chain.
>> I think another scenario that seems a bit more likely would be
>> that a new entry is added to the chain after it was fully searched.
>> Perhaps we could continue searching at the last position if the
>> last entry is not a nulls entry to improve this.
>
> But the last entry is always a nulls entry, what do you mean exactly?
>
> When an insert (of a fresh object, not a reused one) is done, it
> doesn't affect lookups in any way, since it's done at the head of the list.
Right, I missed that :)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: DDoS attack causing bad effect on conntrack searches
From: Eric Dumazet @ 2010-04-23 11:06 UTC (permalink / raw)
To: Patrick McHardy
Cc: Jesper Dangaard Brouer, paulmck, Changli Gao, hawk,
Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <4BD1784A.6010306@trash.net>
On Friday 23 April 2010 at 12:36 +0200, Patrick McHardy wrote:
> Eric Dumazet wrote:
> > On Thursday 22 April 2010 at 23:03 +0200, Eric Dumazet wrote:
> >>> Guess I have to reproduce the DoS attack in a testlab (I won't have
> >>> time until Tuesday). So we can determine if it's bad hashing or restart of the
> >>> search loop.
> >>>
> >
> > Or very long chains, if attacker managed to find a jhash flaw.
>
> That should be visible in the "searched" statistic.
>
> > You could add a lookup_restart counter :
>
> I've applied Jesper's equivalent patch.
Yes of course, I missed it or I would not have cooked it ;)
Thanks
* Re: DDoS attack causing bad effect on conntrack searches
From: Eric Dumazet @ 2010-04-23 11:05 UTC (permalink / raw)
To: Patrick McHardy
Cc: Jesper Dangaard Brouer, paulmck, Changli Gao, hawk,
Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <4BD17CAA.4090708@trash.net>
On Friday 23 April 2010 at 12:55 +0200, Patrick McHardy wrote:
> Eric Dumazet wrote:
> >
> > OK, but a lookup lasts a fraction of a microsecond, unless interrupted by
> > a hard irq.
> >
> > The probability of a change during a lookup should be very, very small.
> >
> > Note that the scenario for a restart is:
> >
> > The lookup goes through the chain.
> > While it is examining one object, this object is deleted.
> > The object is re-allocated by another cpu and inserted into a new chain.
>
> I think another scenario that seems a bit more likely would be
> that a new entry is added to the chain after it was fully searched.
> Perhaps we could continue searching at the last position if the
> last entry is not a nulls entry to improve this.
But the last entry is always a nulls entry, what do you mean exactly?
When an insert (of a fresh object, not a reused one) is done, it
doesn't affect lookups in any way, since it's done at the head of the list.
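The restart scenario discussed in this thread can be modelled with a small single-threaded sketch. It is loosely patterned on the idea of a "nulls" end marker that encodes the bucket number in a tagged pointer; the names (`nulls_node_sketch`, `nulls_marker()`, `lookup_once()`) are hypothetical and nothing here is the conntrack code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A chain link: either a pointer to the next node (low bit clear) or an
 * end marker with the low bit set and the bucket number in the upper bits. */
struct nulls_node_sketch {
	uintptr_t next;
	unsigned int key;
};

static uintptr_t nulls_marker(unsigned int bucket)
{
	return ((uintptr_t)bucket << 1) | 1;
}

/*
 * One lockless pass over a chain. Returns the matching node, or NULL.
 * *restart is set when the walk ended on another bucket's marker, which
 * means the node under examination was moved to a different chain.
 */
static struct nulls_node_sketch *lookup_once(uintptr_t head, unsigned int bucket,
					     unsigned int key, int *restart)
{
	uintptr_t pos = head;

	assert(restart != NULL);
	*restart = 0;
	while (!(pos & 1)) {
		struct nulls_node_sketch *n = (struct nulls_node_sketch *)pos;

		if (n->key == key)
			return n;
		pos = n->next;
	}
	if ((pos >> 1) != bucket)
		*restart = 1;	/* ended on the wrong chain: caller retries */
	return NULL;
}
```

A fresh node inserted at the head never changes the marker a walker already in the chain will reach, so only the delete-and-reuse case produces a restart.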
* [PATCH net-next-2.6] l2tp_eth: fix memory allocation
From: Jiri Pirko @ 2010-04-23 11:01 UTC (permalink / raw)
To: netdev; +Cc: davem, kleptog, jchapman
Since .size is set properly in "struct pernet_operations l2tp_eth_net_ops",
allocating space for "struct l2tp_eth_net" by hand is incorrect and even leaks
memory.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index ca1164a..58c6c4c 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -276,43 +276,16 @@ out:
static __net_init int l2tp_eth_init_net(struct net *net)
{
- struct l2tp_eth_net *pn;
- int err;
-
- pn = kzalloc(sizeof(*pn), GFP_KERNEL);
- if (!pn)
- return -ENOMEM;
+ struct l2tp_eth_net *pn = net_generic(net, l2tp_eth_net_id);
INIT_LIST_HEAD(&pn->l2tp_eth_dev_list);
spin_lock_init(&pn->l2tp_eth_lock);
- err = net_assign_generic(net, l2tp_eth_net_id, pn);
- if (err)
- goto out;
-
return 0;
-
-out:
- kfree(pn);
- return err;
-}
-
-static __net_exit void l2tp_eth_exit_net(struct net *net)
-{
- struct l2tp_eth_net *pn;
-
- pn = net_generic(net, l2tp_eth_net_id);
- /*
- * if someone has cached our net then
- * further net_generic call will return NULL
- */
- net_assign_generic(net, l2tp_eth_net_id, NULL);
- kfree(pn);
}
static __net_initdata struct pernet_operations l2tp_eth_net_ops = {
.init = l2tp_eth_init_net,
- .exit = l2tp_eth_exit_net,
.id = &l2tp_eth_net_id,
.size = sizeof(struct l2tp_eth_net),
};
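A rough userspace model of why the manual allocation is redundant once .size is set. It assumes the registration core allocates and frees the private area around the ->init callback; `net_sketch`, `pernet_ops_sketch`, `ops_init_sketch()` and the other names are hypothetical, and a single slot stands in for the per-id generic pointer array:

```c
#include <assert.h>
#include <stdlib.h>

struct net_sketch {
	void *gen;			/* one private slot instead of an array */
};

struct pernet_ops_sketch {
	int (*init)(struct net_sketch *net);
	size_t size;			/* size of the private area, or 0 */
};

/* The core owns the allocation: allocate zeroed memory, then call ->init. */
static int ops_init_sketch(const struct pernet_ops_sketch *ops,
			   struct net_sketch *net)
{
	int err = 0;

	assert(ops != NULL && net != NULL);
	if (ops->size) {
		net->gen = calloc(1, ops->size);
		if (!net->gen)
			return -1;
	}
	if (ops->init)
		err = ops->init(net);
	if (err) {
		free(net->gen);
		net->gen = NULL;
	}
	return err;
}

/* The core also frees it, so the subsystem needs no ->exit for that. */
static void ops_exit_sketch(struct net_sketch *net)
{
	free(net->gen);
	net->gen = NULL;
}

struct l2tp_eth_net_sketch { int ndevs; };

/* ->init only initialises the area it is handed; it allocates nothing. */
static int l2tp_eth_init_sketch(struct net_sketch *net)
{
	struct l2tp_eth_net_sketch *pn = net->gen;

	pn->ndevs = 0;
	return 0;
}
```

In this model, an ->init that allocated its own structure and stored it in the slot would overwrite the core's pointer, and the core's allocation would never be freed.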
* Re: DDoS attack causing bad effect on conntrack searches
From: Patrick McHardy @ 2010-04-23 10:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Changli Gao, hawk, Linux Kernel Network Hackers, netfilter-devel,
Paul E McKenney
In-Reply-To: <1271946961.7895.5665.camel@edumazet-laptop>
Eric Dumazet wrote:
> On Thursday 22 April 2010 at 15:17 +0200, Patrick McHardy wrote:
>> Changli Gao wrote:
>>>> struct nf_conntrack_tuple_hash *
>>>> __nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple)
>>>> ...
>>> We should add a retry limit there.
>> We can't do that since that would allow false negatives.
>
> If one hash slot is under attack, then there is a bug somewhere.
>
> If we cannot avoid this, we can fall back to a secure mode at the second
> retry, and take the spinlock.
>
> This way, most lookups stay lockless (one pass), and some might take
> the slot lock to avoid the possibility of a loop.
That sounds like a good idea. But let's wait for Jesper's test results
before we start fixing this problem :)
* Re: DDoS attack causing bad effect on conntrack searches
From: Patrick McHardy @ 2010-04-23 10:55 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jesper Dangaard Brouer, paulmck, Changli Gao, hawk,
Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <1271970199.7895.6482.camel@edumazet-laptop>
Eric Dumazet wrote:
> On Thursday 22 April 2010 at 22:38 +0200, Jesper Dangaard Brouer wrote:
>> On Thu, 22 Apr 2010, Eric Dumazet wrote:
>>
>>> On Thursday 22 April 2010 at 08:51 -0700, Paul E. McKenney wrote:
>>>> On Thu, Apr 22, 2010 at 04:53:49PM +0200, Eric Dumazet wrote:
>>>>> On Thursday 22 April 2010 at 16:36 +0200, Eric Dumazet wrote:
>>>>>
>>>>> If we can do the 'retry' 10 times, it means the attacker was really
>>>>> clever enough to inject new packets (new conntracks) at the right
>>>>> moment, in the right hash chain, and this sounds so highly incredible
>>>>> that I cannot believe it at all :)
>>>> Or maybe the DoS attack is injecting so many new conntracks that a large
>>>> fraction of the hash chains are being modified at any given time?
>>>>
>> I think it's plausible, there is a lot of modification going on.
>> Approx. 40,000 deletes/sec and 40,000 inserts/sec.
>> The hash table has 300032 buckets, and with 80000 modifications/sec, we are
>> (potentially) changing 26.6% of the hash chains each second.
>>
>
> OK, but a lookup lasts a fraction of a microsecond, unless interrupted by
> a hard irq.
>
> The probability of a change during a lookup should be very, very small.
>
> Note that the scenario for a restart is:
>
> The lookup goes through the chain.
> While it is examining one object, this object is deleted.
> The object is re-allocated by another cpu and inserted into a new chain.
I think another scenario that seems a bit more likely would be
that a new entry is added to the chain after it was fully searched.
Perhaps we could continue searching at the last position if the
last entry is not a nulls entry to improve this.
* [PATCH net-next-2.6] l2tp: fix memory allocation
From: Jiri Pirko @ 2010-04-23 10:53 UTC (permalink / raw)
To: netdev; +Cc: davem, kleptog, jchapman
Since .size is set properly in "struct pernet_operations l2tp_net_ops",
allocating space for "struct l2tp_net" by hand is incorrect and even leaks
memory.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index ecc7aea..1712af1 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1617,14 +1617,9 @@ EXPORT_SYMBOL_GPL(l2tp_session_create);
static __net_init int l2tp_init_net(struct net *net)
{
- struct l2tp_net *pn;
- int err;
+ struct l2tp_net *pn = net_generic(net, l2tp_net_id);
int hash;
- pn = kzalloc(sizeof(*pn), GFP_KERNEL);
- if (!pn)
- return -ENOMEM;
-
INIT_LIST_HEAD(&pn->l2tp_tunnel_list);
spin_lock_init(&pn->l2tp_tunnel_list_lock);
@@ -1633,33 +1628,11 @@ static __net_init int l2tp_init_net(struct net *net)
spin_lock_init(&pn->l2tp_session_hlist_lock);
- err = net_assign_generic(net, l2tp_net_id, pn);
- if (err)
- goto out;
-
return 0;
-
-out:
- kfree(pn);
- return err;
-}
-
-static __net_exit void l2tp_exit_net(struct net *net)
-{
- struct l2tp_net *pn;
-
- pn = net_generic(net, l2tp_net_id);
- /*
- * if someone has cached our net then
- * further net_generic call will return NULL
- */
- net_assign_generic(net, l2tp_net_id, NULL);
- kfree(pn);
}
static struct pernet_operations l2tp_net_ops = {
.init = l2tp_init_net,
- .exit = l2tp_exit_net,
.id = &l2tp_net_id,
.size = sizeof(struct l2tp_net),
};
* Re: DDoS attack causing bad effect on conntrack searches
From: Patrick McHardy @ 2010-04-23 10:36 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jesper Dangaard Brouer, paulmck, Changli Gao, hawk,
Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <1271970893.7895.6507.camel@edumazet-laptop>
Eric Dumazet wrote:
> On Thursday 22 April 2010 at 23:03 +0200, Eric Dumazet wrote:
>>> Guess I have to reproduce the DoS attack in a testlab (I won't have
>>> time until Tuesday). So we can determine if it's bad hashing or restart of the
>>> search loop.
>>>
>
> Or very long chains, if attacker managed to find a jhash flaw.
That should be visible in the "searched" statistic.
> You could add a lookup_restart counter :
I've applied Jesper's equivalent patch.
* Re: DDoS attack causing bad effect on conntrack searches
From: Patrick McHardy @ 2010-04-23 10:35 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Changli Gao, Eric Dumazet, Linux Kernel Network Hackers,
netfilter-devel, Paul E McKenney
In-Reply-To: <1271943066.14501.194.camel@jdb-workstation>
Jesper Dangaard Brouer wrote:
> I have added a stats counter to prove my case, which I think we should add to the kernel (to detect the case in the future).
> The DDoS attack has disappeared, so I guess I'll try to see if I can reproduce the problem in my testlab.
>
>
>
> [PATCH] net: netfilter conntrack extended with extra stat counter.
>
> From: Jesper Dangaard Brouer <hawk@comx.dk>
>
> I suspect an unfortunate series of events occurring under a DDoS
> attack, in function __nf_conntrack_find() in nf_conntrack_core.c.
>
> Adding a stats counter to see if the search is restarted too often.
Applied, thanks Jesper.
* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-23 10:26 UTC (permalink / raw)
To: Changli Gao
Cc: David S. Miller, jamal, Tom Herbert, Stephen Hemminger, netdev
In-Reply-To: <1272010378-2955-1-git-send-email-xiaosuo@gmail.com>
On Friday 23 April 2010 at 16:12 +0800, Changli Gao wrote:
> batch skb dequeueing from softnet input_pkt_queue.
>
> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
> contention when RPS is enabled.
>
> Note: in the worst case, the number of packets in a softnet_data may be double
> netdev_max_backlog.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
Oops, reading it again, I found process_backlog() was still taking the
lock twice, if only one packet is waiting in input_pkt_queue.
Possible fix, on top of your patch :
diff --git a/net/core/dev.c b/net/core/dev.c
index 0eddd23..0569be7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3296,8 +3296,9 @@ static int process_backlog(struct napi_struct *napi, int quota)
#endif
napi->weight = weight_p;
local_irq_disable();
- while (1) {
+ while (work < quota) {
struct sk_buff *skb;
+ unsigned int qlen;
while ((skb = __skb_dequeue(&sd->process_queue))) {
local_irq_enable();
@@ -3308,13 +3309,15 @@ static int process_backlog(struct napi_struct *napi, int quota)
}
rps_lock(sd);
- input_queue_head_add(sd, skb_queue_len(&sd->input_pkt_queue));
- skb_queue_splice_tail_init(&sd->input_pkt_queue,
- &sd->process_queue);
- if (skb_queue_empty(&sd->process_queue)) {
+ qlen = skb_queue_len(&sd->input_pkt_queue);
+ if (qlen) {
+ input_queue_head_add(sd, qlen);
+ skb_queue_splice_tail_init(&sd->input_pkt_queue,
+ &sd->process_queue);
+ }
+ if (qlen < quota - work) {
__napi_complete(napi);
- rps_unlock(sd);
- break;
+ quota = work + qlen;
}
rps_unlock(sd);
}
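The effect of the quota adjustment can be checked with a counting model. This is only a sketch: queues are reduced to integer lengths, `lock_count` stands in for rps_lock()/rps_unlock() pairs, napi completion is not modelled, and the inner drain loop is an assumption because the hunk above elides its body. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct backlog_sketch {
	int input_qlen;		/* packets queued by producers */
	int process_qlen;	/* packets already spliced for processing */
	int lock_count;		/* how many times the queue lock was taken */
};

static int process_backlog_sketch(struct backlog_sketch *sd, int quota)
{
	int work = 0;

	assert(sd != NULL && quota > 0);
	while (work < quota) {
		int qlen;

		/* Drain the private queue without holding the lock. */
		while (sd->process_qlen > 0) {
			sd->process_qlen--;
			if (++work >= quota)
				return work;
		}

		/* Take the lock once and move the whole input queue over. */
		sd->lock_count++;
		qlen = sd->input_qlen;
		sd->process_qlen += qlen;
		sd->input_qlen = 0;

		/*
		 * Fewer packets than the remaining budget: shrink the quota
		 * so the loop ends after this batch without locking again.
		 */
		if (qlen < quota - work)
			quota = work + qlen;
	}
	return work;
}
```

With a single queued packet the model takes the lock once, which is the case the fix is aiming at.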
* Re: [RFC 2/2] phylib: Convert MDIO bitbang to new MDIO 45 format
From: Ben Hutchings @ 2010-04-23 10:22 UTC (permalink / raw)
To: Andy Fleming; +Cc: davem, netdev
In-Reply-To: <1271997497-6896-3-git-send-email-afleming@freescale.com>
On Thu, 2010-04-22 at 23:38 -0500, Andy Fleming wrote:
> Now that we've added somewhat more complete MDIO 45 support to the PHY
> Lib, convert the MDIO bitbang driver to use this new infrastructure.
>
> Signed-off-by: Andy Fleming <afleming@freescale.com>
> ---
> drivers/net/phy/mdio-bitbang.c | 23 +++++++++++------------
> 1 files changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/phy/mdio-bitbang.c b/drivers/net/phy/mdio-bitbang.c
> index 2f6f02e..4c0c89b 100644
> --- a/drivers/net/phy/mdio-bitbang.c
> +++ b/drivers/net/phy/mdio-bitbang.c
[...]
> @@ -157,9 +154,10 @@ static int mdiobb_read(struct mii_bus *bus, int phy, int devad, int reg)
> struct mdiobb_ctrl *ctrl = bus->priv;
> int ret, i;
>
> - if (reg & MII_ADDR_C45) {
> - reg = mdiobb_cmd_addr(ctrl, phy, reg);
> - mdiobb_cmd(ctrl, MDIO_C45_READ, phy, reg);
> + /* Clause 22 PHYs only use devad = 0, and Clause 45 only use nonzero */
> + if (devad) {
> + mdiobb_cmd_addr(ctrl, phy, devad, reg);
> + mdiobb_cmd(ctrl, MDIO_C45_READ, phy, devad);
> } else
> mdiobb_cmd(ctrl, MDIO_READ, phy, reg);
>
[...]
I don't believe there's any protocol requirement in clause 45 that
devad != 0 (although the address is not allocated). In the mdio module
I played safe and defined MDIO_DEVAD_NONE == -1 to indicate a clause 22
request.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
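A small sketch of the sentinel approach mentioned above, assuming the clause 45 device address is a 5-bit field (0..31) so that -1 can never collide with a real address. `MDIO_DEVAD_NONE` is the value named in the message; `enum mdio_frame_sketch` and `frame_for_devad()` are hypothetical:

```c
#include <assert.h>

#define MDIO_DEVAD_NONE	(-1)	/* "no device address": clause 22 request */

enum mdio_frame_sketch {
	FRAME_C22,
	FRAME_C45,
};

/* Choose the frame format from devad instead of testing devad != 0. */
static enum mdio_frame_sketch frame_for_devad(int devad)
{
	assert(devad == MDIO_DEVAD_NONE || (devad >= 0 && devad <= 31));
	return devad == MDIO_DEVAD_NONE ? FRAME_C22 : FRAME_C45;
}
```

With this convention devad 0 is still treated as a clause 45 access, which a plain "if (devad)" test cannot express.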
* Re: [PATCH linux-next 1/2] irq: Add CPU mask affinity hint callback framework
From: John Fastabend @ 2010-04-23 9:27 UTC (permalink / raw)
To: Ben Hutchings
Cc: Waskiewicz Jr, Peter P, tglx@linutronix.de, davem@davemloft.net,
arjan@linux.jf.intel.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <1271950900.2095.25.camel@achroite.uk.solarflarecom.com>
Ben Hutchings wrote:
> On Thu, 2010-04-22 at 05:11 -0700, Peter P Waskiewicz Jr wrote:
>> On Wed, 21 Apr 2010, Ben Hutchings wrote:
>>
>>> On Tue, 2010-04-20 at 11:01 -0700, Peter P Waskiewicz Jr wrote:
>>>> This patch adds a callback function pointer to the irq_desc
>>>> structure, along with a registration function and a read-only
>>>> proc entry for each interrupt.
>>>>
>>>> This affinity_hint handle for each interrupt can be used by
>>>> underlying drivers that need a better mechanism to control
>>>> interrupt affinity. The underlying driver can register a
>>>> callback for the interrupt, which will allow the driver to
>>>> provide the CPU mask for the interrupt to anything that
>>>> requests it. The intent is to extend the userspace daemon,
>>>> irqbalance, to help hint to it a preferred CPU mask to balance
>>>> the interrupt into.
>>> Doesn't it make more sense to have the driver follow affinity decisions
>>> made from user-space? I realise that reallocating queues is disruptive
>>> and we probably don't want irqbalance to trigger that, but there should
>>> be a mechanism for the administrator to trigger it.
>> The driver here would be assisting userspace (irqbalance) to provide
>> better details how the HW is laid out with respect to flows. As it stands
>> today, irqbalance is almost guaranteed to move interrupts to CPUs that are
>> not aligned with where applications are running for network adapters.
>> This is very apparent when running at speeds in the 10 Gigabit range, or
>> even multiple 1 Gigabit ports running at the same time.
>
> I'm well aware that irqbalance isn't making good decisions at the
> moment. The question is whether this will really help irqbalance to do
> better.
>
FCoE is one example where these hints can really help irqbalance make
good decisions. By aligning the interrupt affinity with the FCoE
receive processing thread we can avoid context switching from the NET_RX
softirq to the receive processing thread.
Because the base driver knows which rx rings are being used for FCoE in
a particular configuration and their corresponding vectors it seems to
be in the best position to provide good hints to irqbalance. Also if
the mapping changes at some point the base driver will be aware of it.
> [...]
>>> This just assigns IRQs to the first n CPU threads. Depending on the
>>> enumeration order, this might result in assigning an IRQ to each of 2
>>> threads on a core while leaving other cores unused!
>> This ixgbe patch is only meant to be an example of how you could use it.
>> I didn't hammer out all the corner cases of interrupt alignment in it yet.
>> However, ixgbe is already aligning Tx flows onto the CPU/queue pair the Tx
>> occurred (i.e. Tx session from CPU 4 will be queued on Tx queue 4),
> [...]
>
> OK, now I remember ixgbe has this odd select_queue() implementation.
> But this behaviour can result in reordering whenever a user thread
> migrates, and in any case Dave discourages people from setting
> select_queue(). So I see that these changes would be useful for ixgbe
> (together with an update to irqbalance), but they don't seem to fit the
> general direction of multiqueue networking on Linux.
For DCB, setting select_queue() is useful because we want to map traffic
types to specific tx queues, not hash them across all queues. In this
case, where we are placing specific traffic on specific queues, it also
makes sense to align the interrupts for some types such as FCoE. There
shouldn't be any issues with user thread migration in this specific example.
>
> (Actually, the hints seem to be incomplete. If there are more than 16
> CPU threads then multiple CPU threads can map to the same queues, but it
> looks like you only include the first in the queue's hint.)
>
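
A rough sketch of what a complete hint could look like, under the assumption
(not taken from the patch) that CPU c transmits on queue c modulo the number of
queues. The function names are made up and the mask is limited to 64 CPUs to
keep the example small.

```c
#include <stdint.h>

/* Assumed mapping (not taken from the patch): CPU c uses queue c % nr_queues. */
static unsigned int cpu_to_queue(unsigned int cpu, unsigned int nr_queues)
{
	return cpu % nr_queues;
}

/* Bitmask of every CPU (up to 64) that maps to the given queue. */
static uint64_t queue_hint_mask(unsigned int queue, unsigned int nr_cpus,
				unsigned int nr_queues)
{
	uint64_t mask = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus && cpu < 64; cpu++)
		if (cpu_to_queue(cpu, nr_queues) == queue)
			mask |= UINT64_C(1) << cpu;
	return mask;
}
```

With 32 CPU threads and 16 queues, queue 0 would then carry a hint covering
CPUs 0 and 16 rather than CPU 0 alone.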
> An alternate approach is to use the RX queue index to drive TX queue
> selection. I posted a patch to do that earlier this week. However I
> haven't yet had a chance to try that on a suitably large system.
>
I'll post an FCoE example patch soon and take a closer look at your
patch, but mapping TX/RX queues in sock's won't help for cases like FCoE.
Thanks,
John.
^ permalink raw reply
* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-23 9:27 UTC (permalink / raw)
To: Changli Gao
Cc: David S. Miller, jamal, Tom Herbert, Stephen Hemminger, netdev
In-Reply-To: <1272010378-2955-1-git-send-email-xiaosuo@gmail.com>
On Friday 23 April 2010 at 16:12 +0800, Changli Gao wrote:
> batch skb dequeueing from softnet input_pkt_queue.
>
> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
> contention when RPS is enabled.
>
> Note: in the worst case, the number of packets in a softnet_data may be double
> of netdev_max_backlog.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Very good patch Changli, thanks!
Let's see how it improves things for Jamal's benchmarks ;)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ----
> include/linux/netdevice.h | 6 +++--
> net/core/dev.c | 50 +++++++++++++++++++++++++++++++---------------
> 2 files changed, 38 insertions(+), 18 deletions(-)
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3c5ed5f..6ae9f2b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1387,6 +1387,7 @@ struct softnet_data {
> struct Qdisc *output_queue;
> struct list_head poll_list;
> struct sk_buff *completion_queue;
> + struct sk_buff_head process_queue;
>
> #ifdef CONFIG_RPS
> struct softnet_data *rps_ipi_list;
> @@ -1401,10 +1402,11 @@ struct softnet_data {
> struct napi_struct backlog;
> };
>
> -static inline void input_queue_head_incr(struct softnet_data *sd)
> +static inline void input_queue_head_add(struct softnet_data *sd,
> + unsigned int len)
> {
> #ifdef CONFIG_RPS
> - sd->input_queue_head++;
> + sd->input_queue_head += len;
> #endif
> }
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a4a7c36..c1585f9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2409,12 +2409,13 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> __get_cpu_var(netdev_rx_stat).total++;
>
> rps_lock(sd);
> - if (sd->input_pkt_queue.qlen <= netdev_max_backlog) {
> - if (sd->input_pkt_queue.qlen) {
> + if (skb_queue_len(&sd->input_pkt_queue) <= netdev_max_backlog) {
> + if (skb_queue_len(&sd->input_pkt_queue)) {
> enqueue:
> __skb_queue_tail(&sd->input_pkt_queue, skb);
> #ifdef CONFIG_RPS
> - *qtail = sd->input_queue_head + sd->input_pkt_queue.qlen;
> + *qtail = sd->input_queue_head +
> + skb_queue_len(&sd->input_pkt_queue);
> #endif
> rps_unlock(sd);
> local_irq_restore(flags);
> @@ -2934,13 +2935,21 @@ static void flush_backlog(void *arg)
> struct sk_buff *skb, *tmp;
>
> rps_lock(sd);
> - skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp)
> + skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
> if (skb->dev == dev) {
> __skb_unlink(skb, &sd->input_pkt_queue);
> kfree_skb(skb);
> - input_queue_head_incr(sd);
> + input_queue_head_add(sd, 1);
> }
> + }
> rps_unlock(sd);
> +
> + skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
> + if (skb->dev == dev) {
> + __skb_unlink(skb, &sd->process_queue);
> + kfree_skb(skb);
> + }
> + }
> }
>
> static int napi_gro_complete(struct sk_buff *skb)
> @@ -3286,24 +3295,30 @@ static int process_backlog(struct napi_struct *napi, int quota)
> }
> #endif
> napi->weight = weight_p;
> - do {
> + local_irq_disable();
> + while (1) {
> struct sk_buff *skb;
>
> - local_irq_disable();
> + while ((skb = __skb_dequeue(&sd->process_queue))) {
> + local_irq_enable();
> + __netif_receive_skb(skb);
> + if (++work >= quota)
> + return work;
> + local_irq_disable();
> + }
> +
> rps_lock(sd);
> - skb = __skb_dequeue(&sd->input_pkt_queue);
> - if (!skb) {
> + input_queue_head_add(sd, skb_queue_len(&sd->input_pkt_queue));
> + skb_queue_splice_tail_init(&sd->input_pkt_queue,
> + &sd->process_queue);
> + if (skb_queue_empty(&sd->process_queue)) {
> __napi_complete(napi);
> rps_unlock(sd);
> - local_irq_enable();
> break;
> }
> - input_queue_head_incr(sd);
> rps_unlock(sd);
> - local_irq_enable();
> -
> - __netif_receive_skb(skb);
> - } while (++work < quota);
> + }
> + local_irq_enable();
>
> return work;
> }
> @@ -5631,8 +5646,10 @@ static int dev_cpu_callback(struct notifier_block *nfb,
> /* Process offline CPU's input_pkt_queue */
> while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
> netif_rx(skb);
> - input_queue_head_incr(oldsd);
> + input_queue_head_add(oldsd, 1);
> }
> + while ((skb = __skb_dequeue(&oldsd->process_queue)))
> + netif_rx(skb);
>
> return NOTIFY_OK;
> }
> @@ -5851,6 +5868,7 @@ static int __init net_dev_init(void)
> struct softnet_data *sd = &per_cpu(softnet_data, i);
>
> skb_queue_head_init(&sd->input_pkt_queue);
> + skb_queue_head_init(&sd->process_queue);
> sd->completion_queue = NULL;
> INIT_LIST_HEAD(&sd->poll_list);
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: DDoS attack causing bad effect on conntrack searches
From: Eric Dumazet @ 2010-04-23 9:23 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Jesper Dangaard Brouer, Patrick McHardy, hawk,
Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <alpine.LSU.2.01.1004230955030.26168@obet.zrqbmnf.qr>
On Friday 23 April 2010 at 09:55 +0200, Jan Engelhardt wrote:
> On Friday 2010-04-23 09:46, Eric Dumazet wrote:
> >Years ago, we had to manually change PAGE_OFFSET, and I remember some
> >machines with PAGE_OFFSET 0xA0000000 (1.5 GB LOWMEM),
> >or 0xB0000000 (1.25 GB), (PAE off)
>
> I notice that 0xB0000000, which is now known as LOWMEM_3G_OPT,
> is only available when PAE is off. Would you know the reason for
> that decision? Are some values unsuitable for PAE?
>
If PAE is on, PAGE_OFFSET must be a multiple of 1 GB.
This is because of hardware limitations.
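
A small illustration of the constraint, as a sketch only: with PAE on 32-bit
x86 the top-level page-directory-pointer table has four entries, each covering
1 GB of virtual address space, so a user/kernel split that is not 1 GB aligned
would fall in the middle of one entry. The helper names below are made up for
the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAE_PDPT_ENTRIES 4u                  /* top-level entries with PAE */
#define PAE_PDPT_SPAN    (UINT32_C(1) << 30) /* each entry covers 1 GB */

/* Index of the top-level entry that translates a 32-bit virtual address. */
static unsigned int pae_pdpt_index(uint32_t vaddr)
{
	return vaddr >> 30;
}

/* True when a user/kernel split at page_offset lies on an entry boundary. */
static bool pae_split_ok(uint32_t page_offset)
{
	return (page_offset & (PAE_PDPT_SPAN - 1)) == 0;
}
```

By this check 0xC0000000 and 0x80000000 are acceptable, while 0xB0000000 and
0xA0000000 both land inside entry 2 and are not.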
^ permalink raw reply
* Re: DDoS attack causing bad effect on conntrack searches
From: Jesper Dangaard Brouer @ 2010-04-23 8:40 UTC (permalink / raw)
To: David Miller
Cc: eric.dumazet, paulmck, Patrick McHardy, xiaosuo, netdev,
Netfilter Developers
In-Reply-To: <20100423.011845.254684857.davem@davemloft.net>
On Fri, 23 Apr 2010, David Miller wrote:
> This all reminds me of the namespace bug we dealt with
> a month or two ago.
>
> Jesper, you don't happen to be using network namespaces are you?
No, I don't use network namespaces.
(In .config CONFIG_NAMESPACES is not set.)
Cheers,
Jesper Brouer
--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------
^ permalink raw reply
* [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-23 8:12 UTC (permalink / raw)
To: David S. Miller
Cc: jamal, Tom Herbert, Eric Dumazet, Stephen Hemminger, netdev,
Changli Gao
batch skb dequeueing from softnet input_pkt_queue.
batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention when RPS is enabled.
Note: in the worst case, the number of packets in a softnet_data may be double
of netdev_max_backlog.
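
For readers who want the idea without the kernel context, the batching pattern
can be sketched in userspace roughly as follows. This is an illustration under
stated assumptions, not the patch itself: struct pkt, struct backlog and the
helpers are hypothetical stand-ins, a pthread mutex plays the role of
rps_lock(), and the interrupt disabling done by the real code is left out.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for sk_buff and sk_buff_head. */
struct pkt { struct pkt *next; };
struct pkt_queue { struct pkt *head, *tail; unsigned int qlen; };

struct backlog {
	pthread_mutex_t lock;          /* plays the role of rps_lock() */
	struct pkt_queue input;        /* shared with producers, needs the lock */
	struct pkt_queue process;      /* touched by the consumer only */
	unsigned int input_queue_head; /* packets taken off input so far */
};

static int handled; /* state for the example handler below */
static void count_pkt(struct pkt *p) { (void)p; handled++; }

static void queue_init(struct pkt_queue *q)
{
	q->head = q->tail = NULL;
	q->qlen = 0;
}

static void backlog_init(struct backlog *b)
{
	pthread_mutex_init(&b->lock, NULL);
	queue_init(&b->input);
	queue_init(&b->process);
	b->input_queue_head = 0;
}

/* Producer side: one lock round trip per packet, as before. */
static void backlog_enqueue(struct backlog *b, struct pkt *p)
{
	pthread_mutex_lock(&b->lock);
	p->next = NULL;
	if (b->input.tail)
		b->input.tail->next = p;
	else
		b->input.head = p;
	b->input.tail = p;
	b->input.qlen++;
	pthread_mutex_unlock(&b->lock);
}

static struct pkt *queue_dequeue(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (!p)
		return NULL;
	q->head = p->next;
	if (!q->head)
		q->tail = NULL;
	q->qlen--;
	return p;
}

/* Append all of src to dst and leave src empty. */
static void queue_splice_tail_init(struct pkt_queue *src, struct pkt_queue *dst)
{
	if (!src->head)
		return;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	dst->qlen += src->qlen;
	queue_init(src);
}

/* Consumer side: drain the private queue, refill it with one locked splice. */
static int backlog_poll(struct backlog *b, int quota, void (*handle)(struct pkt *))
{
	int work = 0;

	if (quota <= 0)
		return 0;
	for (;;) {
		struct pkt *p;

		while ((p = queue_dequeue(&b->process)) != NULL) {
			handle(p);
			if (++work >= quota)
				return work;
		}
		pthread_mutex_lock(&b->lock);
		b->input_queue_head += b->input.qlen;
		queue_splice_tail_init(&b->input, &b->process);
		pthread_mutex_unlock(&b->lock);
		if (!b->process.head)
			break; /* nothing left in either queue */
	}
	return work;
}
```

The consumer takes the lock once per batch instead of once per packet. The
sketch also shows where the doubling comes from: after a splice the private
queue can hold a full backlog while producers fill the input queue again.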
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/linux/netdevice.h | 6 +++--
net/core/dev.c | 50 +++++++++++++++++++++++++++++++---------------
2 files changed, 38 insertions(+), 18 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3c5ed5f..6ae9f2b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1387,6 +1387,7 @@ struct softnet_data {
struct Qdisc *output_queue;
struct list_head poll_list;
struct sk_buff *completion_queue;
+ struct sk_buff_head process_queue;
#ifdef CONFIG_RPS
struct softnet_data *rps_ipi_list;
@@ -1401,10 +1402,11 @@ struct softnet_data {
struct napi_struct backlog;
};
-static inline void input_queue_head_incr(struct softnet_data *sd)
+static inline void input_queue_head_add(struct softnet_data *sd,
+ unsigned int len)
{
#ifdef CONFIG_RPS
- sd->input_queue_head++;
+ sd->input_queue_head += len;
#endif
}
diff --git a/net/core/dev.c b/net/core/dev.c
index a4a7c36..c1585f9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2409,12 +2409,13 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
__get_cpu_var(netdev_rx_stat).total++;
rps_lock(sd);
- if (sd->input_pkt_queue.qlen <= netdev_max_backlog) {
- if (sd->input_pkt_queue.qlen) {
+ if (skb_queue_len(&sd->input_pkt_queue) <= netdev_max_backlog) {
+ if (skb_queue_len(&sd->input_pkt_queue)) {
enqueue:
__skb_queue_tail(&sd->input_pkt_queue, skb);
#ifdef CONFIG_RPS
- *qtail = sd->input_queue_head + sd->input_pkt_queue.qlen;
+ *qtail = sd->input_queue_head +
+ skb_queue_len(&sd->input_pkt_queue);
#endif
rps_unlock(sd);
local_irq_restore(flags);
@@ -2934,13 +2935,21 @@ static void flush_backlog(void *arg)
struct sk_buff *skb, *tmp;
rps_lock(sd);
- skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp)
+ skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
if (skb->dev == dev) {
__skb_unlink(skb, &sd->input_pkt_queue);
kfree_skb(skb);
- input_queue_head_incr(sd);
+ input_queue_head_add(sd, 1);
}
+ }
rps_unlock(sd);
+
+ skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
+ if (skb->dev == dev) {
+ __skb_unlink(skb, &sd->process_queue);
+ kfree_skb(skb);
+ }
+ }
}
static int napi_gro_complete(struct sk_buff *skb)
@@ -3286,24 +3295,30 @@ static int process_backlog(struct napi_struct *napi, int quota)
}
#endif
napi->weight = weight_p;
- do {
+ local_irq_disable();
+ while (1) {
struct sk_buff *skb;
- local_irq_disable();
+ while ((skb = __skb_dequeue(&sd->process_queue))) {
+ local_irq_enable();
+ __netif_receive_skb(skb);
+ if (++work >= quota)
+ return work;
+ local_irq_disable();
+ }
+
rps_lock(sd);
- skb = __skb_dequeue(&sd->input_pkt_queue);
- if (!skb) {
+ input_queue_head_add(sd, skb_queue_len(&sd->input_pkt_queue));
+ skb_queue_splice_tail_init(&sd->input_pkt_queue,
+ &sd->process_queue);
+ if (skb_queue_empty(&sd->process_queue)) {
__napi_complete(napi);
rps_unlock(sd);
- local_irq_enable();
break;
}
- input_queue_head_incr(sd);
rps_unlock(sd);
- local_irq_enable();
-
- __netif_receive_skb(skb);
- } while (++work < quota);
+ }
+ local_irq_enable();
return work;
}
@@ -5631,8 +5646,10 @@ static int dev_cpu_callback(struct notifier_block *nfb,
/* Process offline CPU's input_pkt_queue */
while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
netif_rx(skb);
- input_queue_head_incr(oldsd);
+ input_queue_head_add(oldsd, 1);
}
+ while ((skb = __skb_dequeue(&oldsd->process_queue)))
+ netif_rx(skb);
return NOTIFY_OK;
}
@@ -5851,6 +5868,7 @@ static int __init net_dev_init(void)
struct softnet_data *sd = &per_cpu(softnet_data, i);
skb_queue_head_init(&sd->input_pkt_queue);
+ skb_queue_head_init(&sd->process_queue);
sd->completion_queue = NULL;
INIT_LIST_HEAD(&sd->poll_list);
^ permalink raw reply related
* Re: [PATCH 1/2][RESEND] ehea: error handling improvement
From: Thomas Klein @ 2010-04-23 8:22 UTC (permalink / raw)
To: David Miller; +Cc: tklein, netdev, linuxppc-dev, linux-kernel, themann
In-Reply-To: <20100421.223620.257172362.davem@davemloft.net>
On 04/22/2010 07:36 AM, David Miller wrote:
> From: Thomas Klein<tklein@de.ibm.com>
> Date: Wed, 21 Apr 2010 11:10:55 +0200
>
>> Reset a port's resources only if they're actually in an error state
>>
>> Signed-off-by: Thomas Klein<tklein@de.ibm.com>
>> ---
>>
>> Patch created against net-2.6
>
> I thought you were sorry for wasting my time and that you were going
> to follow the directions I gave you last time, and I quote:
>
> --------------------
> 3) These are not appropriate for net-2.6 as we are deep in
> the -rcX series at this point and only the most diabolical
> bug fixes are appropriate. Therefore, please generate these
> against net-next-2.6, thanks.
> --------------------
>
> And here you are generating your patches against net-2.6. Heck, you
> even feel it's worth mentioning explicitly.
Guilty! Allows no excuse. Screwed it. Deeply sorry.
>
> Lucky for you the patches happen to apply cleanly to net-next-2.6 so
> I've put them there.
Thanks!
Thomas
^ permalink raw reply
* Re: DDoS attack causing bad effect on conntrack searches
From: David Miller @ 2010-04-23 8:18 UTC (permalink / raw)
To: eric.dumazet; +Cc: hawk, paulmck, kaber, xiaosuo, hawk, netdev, netfilter-devel
In-Reply-To: <20100423.011328.107238355.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Fri, 23 Apr 2010 01:13:28 -0700 (PDT)
> I really can't see what might cause this behavior then.
This all reminds me of the namespace bug we dealt with
a month or two ago.
Jesper, you don't happen to be using network namespaces are you?
Because if so, the following might be your cure.
commit 5b3501faa8741d50617ce4191c20061c6ef36cb3
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon Feb 8 11:16:56 2010 -0800
netfilter: nf_conntrack: per netns nf_conntrack_cachep
^ permalink raw reply
* [PATCH] can: Add driver for esd CAN-USB/2 device
From: Matthias Fuchs @ 2010-04-23 8:15 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
This patch adds a driver for esd's USB high speed
CAN interface. The driver supports devices with
multiple CAN interfaces.
Signed-off-by: Matthias Fuchs <matthias.fuchs-iOnpLzIbIdM@public.gmane.org>
---
drivers/net/can/usb/Kconfig | 6 +
drivers/net/can/usb/Makefile | 1 +
drivers/net/can/usb/esd_usb2.c | 1107 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 1114 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/can/usb/esd_usb2.c
diff --git a/drivers/net/can/usb/Kconfig b/drivers/net/can/usb/Kconfig
index 97ff6fe..0452549 100644
--- a/drivers/net/can/usb/Kconfig
+++ b/drivers/net/can/usb/Kconfig
@@ -7,4 +7,10 @@ config CAN_EMS_USB
This driver is for the one channel CPC-USB/ARM7 CAN/USB interface
from EMS Dr. Thomas Wuensche (http://www.ems-wuensche.de).
+config CAN_ESD_USB2
+ tristate "ESD USB/2 CAN/USB interface"
+ ---help---
+ This driver supports the CAN-USB/2 interface
+ from esd electronic system design gmbh (http://www.esd.eu).
+
endmenu
diff --git a/drivers/net/can/usb/Makefile b/drivers/net/can/usb/Makefile
index 0afd51d..fce3cf1 100644
--- a/drivers/net/can/usb/Makefile
+++ b/drivers/net/can/usb/Makefile
@@ -3,5 +3,6 @@
#
obj-$(CONFIG_CAN_EMS_USB) += ems_usb.o
+obj-$(CONFIG_CAN_ESD_USB2) += esd_usb2.o
ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/usb/esd_usb2.c b/drivers/net/can/usb/esd_usb2.c
new file mode 100644
index 0000000..c714ce9
--- /dev/null
+++ b/drivers/net/can/usb/esd_usb2.c
@@ -0,0 +1,1107 @@
+/*
+ * CAN driver for esd CAN-USB/2
+ *
+ * Copyright (C) 2010 Matthias Fuchs <matthias.fuchs-iOnpLzIbIdM@public.gmane.org>, esd gmbh
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+#include <linux/init.h>
+#include <linux/signal.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/usb.h>
+
+#include <linux/can.h>
+#include <linux/can/dev.h>
+#include <linux/can/error.h>
+
+MODULE_AUTHOR("Matthias Fuchs <matthias.fuchs-iOnpLzIbIdM@public.gmane.org>");
+MODULE_DESCRIPTION("CAN driver for esd CAN-USB/2 interfaces");
+MODULE_LICENSE("GPL v2");
+
+/* Define these values to match your devices */
+#define USB_ESDGMBH_VENDOR_ID 0x0ab4
+#define USB_CANUSB2_PRODUCT_ID 0x0010
+
+#define ESD_USB2_CAN_CLOCK 60000000
+#define ESD_USB2_MAX_NETS 2
+
+/* USB2 commands */
+#define CMD_VERSION 1 /* also used for VERSION_REPLY */
+#define CMD_CAN_RX 2 /* device to host only */
+#define CMD_CAN_TX 3 /* also used for TX_DONE */
+#define CMD_SETBAUD 4 /* also used for SETBAUD_REPLY */
+#define CMD_TS 5 /* also used for TS_REPLY */
+#define CMD_IDADD 6 /* also used for IDADD_REPLY */
+
+/* esd CAN message flags - dlc field */
+#define ESD_RTR 0x10
+
+/* esd CAN message flags - id field */
+#define ESD_EXTID 0x20000000
+#define ESD_EVENT 0x40000000
+#define ESD_IDMASK 0x1fffffff
+
+/* esd CAN event ids used by this driver */
+#define ESD_EV_CAN_ERROR_EXT 2
+
+/* baudrate message flags */
+#define ESD_USB2_UBR 0x80000000
+#define ESD_USB2_LOM 0x40000000
+#define ESD_USB2_NO_BAUDRATE 0x7fffffff
+#define ESD_USB2_TSEG1_MIN 1
+#define ESD_USB2_TSEG1_MAX 16
+#define ESD_USB2_TSEG1_SHIFT 16
+#define ESD_USB2_TSEG2_MIN 1
+#define ESD_USB2_TSEG2_MAX 8
+#define ESD_USB2_TSEG2_SHIFT 20
+#define ESD_USB2_SJW_MAX 4
+#define ESD_USB2_SJW_SHIFT 14
+#define ESD_USB2_BRP_MIN 1
+#define ESD_USB2_BRP_MAX 1024
+#define ESD_USB2_BRP_INC 1
+#define ESD_USB2_3_SAMPLES 0x00800000
+
+/* esd IDADD message */
+#define ESD_ID_ENABLE 0x80
+#define ESD_MAX_ID_SEGMENT 64
+
+/* SJA1000 ECC register (emulated by usb2 firmware) */
+#define SJA1000_ECC_SEG 0x1F
+#define SJA1000_ECC_DIR 0x20
+#define SJA1000_ECC_ERR 0x06
+#define SJA1000_ECC_BIT 0x00
+#define SJA1000_ECC_FORM 0x40
+#define SJA1000_ECC_STUFF 0x80
+#define SJA1000_ECC_MASK 0xc0
+
+/* esd bus state event codes */
+#define ESD_BUSSTATE_MASK 0xc0
+#define ESD_BUSSTATE_WARN 0x40
+#define ESD_BUSSTATE_ERRPASSIVE 0x80
+#define ESD_BUSSTATE_BUSOFF 0xc0
+
+#define RX_BUFFER_SIZE 1024
+#define MAX_RX_URBS 4
+#define MAX_TX_URBS 16 /* must be power of 2 */
+
+struct header_msg {
+ u8 len; /* len is always the total message length in 32bit words */
+ u8 cmd;
+ u8 rsvd[2];
+};
+
+struct version_msg {
+ u8 len;
+ u8 cmd;
+ u8 rsvd;
+ u8 flags;
+ __le32 drv_version;
+};
+
+struct version_reply_msg {
+ u8 len;
+ u8 cmd;
+ u8 nets;
+ u8 features;
+ __le32 version;
+ u8 name[16];
+ __le32 rsvd;
+ __le32 ts;
+};
+
+struct rx_msg {
+ u8 len;
+ u8 cmd;
+ u8 net;
+ u8 dlc;
+ __le32 ts;
+ __le32 id; /* upper 3 bits contain flags */
+ u8 data[8];
+};
+
+struct tx_msg {
+ u8 len;
+ u8 cmd;
+ u8 net;
+ u8 dlc;
+ __le32 hnd;
+ __le32 id; /* upper 3 bits contain flags */
+ u8 data[8];
+};
+
+struct tx_done_msg {
+ u8 len;
+ u8 cmd;
+ u8 net;
+ u8 status;
+ __le32 hnd;
+ __le32 ts;
+};
+
+struct id_filter_msg {
+ u8 len;
+ u8 cmd;
+ u8 net;
+ u8 option;
+ __le32 mask[65];
+};
+
+struct set_baudrate_msg {
+ u8 len;
+ u8 cmd;
+ u8 net;
+ u8 rsvd;
+ __le32 baud;
+};
+
+/* Main message type used between library and application */
+struct __attribute__ ((packed)) esd_usb2_msg {
+ union {
+ struct header_msg hdr;
+ struct version_msg version;
+ struct version_reply_msg version_reply;
+ struct rx_msg rx;
+ struct tx_msg tx;
+ struct tx_done_msg txdone;
+ struct set_baudrate_msg setbaud;
+ struct id_filter_msg filter;
+ } msg;
+};
+
+static struct usb_device_id esd_usb2_table[] = {
+ {USB_DEVICE(USB_ESDGMBH_VENDOR_ID, USB_CANUSB2_PRODUCT_ID)},
+ {}
+};
+MODULE_DEVICE_TABLE(usb, esd_usb2_table);
+
+struct esd_usb2_net_priv;
+
+struct esd_tx_urb_context {
+ struct esd_usb2_net_priv *priv;
+ u32 echo_index;
+ int dlc;
+};
+
+struct esd_usb2 {
+ struct usb_device *udev;
+ struct esd_usb2_net_priv *nets[ESD_USB2_MAX_NETS];
+
+ struct usb_anchor rx_submitted;
+
+ int net_count;
+ u32 version;
+ int rxinitdone;
+};
+
+struct esd_usb2_net_priv {
+ struct can_priv can; /* must be the first member */
+
+ atomic_t active_tx_jobs;
+ struct usb_anchor tx_submitted;
+ struct esd_tx_urb_context tx_contexts[MAX_TX_URBS];
+
+ int open_time;
+ struct esd_usb2 *usb2;
+ struct net_device *netdev;
+ int index;
+ u8 old_state;
+};
+
+static void esd_usb2_rx_event(struct esd_usb2_net_priv *priv,
+ struct esd_usb2_msg *msg)
+{
+ struct net_device_stats *stats = &priv->netdev->stats;
+ struct can_frame *cf;
+ struct sk_buff *skb;
+ u32 id = le32_to_cpu(msg->msg.rx.id) & ESD_IDMASK;
+
+ if (id == ESD_EV_CAN_ERROR_EXT) {
+ u8 state = msg->msg.rx.data[0];
+ u8 ecc = msg->msg.rx.data[1];
+ u8 txerr = msg->msg.rx.data[2];
+ u8 rxerr = msg->msg.rx.data[3];
+
+ skb = alloc_can_err_skb(priv->netdev, &cf);
+ if (skb == NULL) {
+ stats->rx_dropped++;
+ return;
+ }
+
+ if (state != priv->old_state) {
+ priv->old_state = state;
+
+ switch (state & ESD_BUSSTATE_MASK) {
+ case ESD_BUSSTATE_BUSOFF:
+ priv->can.state = CAN_STATE_BUS_OFF;
+ cf->can_id |= CAN_ERR_BUSOFF;
+ can_bus_off(priv->netdev);
+ break;
+ case ESD_BUSSTATE_WARN:
+ priv->can.state = CAN_STATE_ERROR_WARNING;
+ priv->can.can_stats.error_warning++;
+ break;
+ case ESD_BUSSTATE_ERRPASSIVE:
+ priv->can.state = CAN_STATE_ERROR_PASSIVE;
+ priv->can.can_stats.error_passive++;
+ break;
+ default:
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
+ break;
+ }
+ } else {
+ priv->can.can_stats.bus_error++;
+ stats->rx_errors++;
+
+ cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
+
+ switch (ecc & SJA1000_ECC_MASK) {
+ case SJA1000_ECC_BIT:
+ cf->data[2] |= CAN_ERR_PROT_BIT;
+ break;
+ case SJA1000_ECC_FORM:
+ cf->data[2] |= CAN_ERR_PROT_FORM;
+ break;
+ case SJA1000_ECC_STUFF:
+ cf->data[2] |= CAN_ERR_PROT_STUFF;
+ break;
+ default:
+ cf->data[2] |= CAN_ERR_PROT_UNSPEC;
+ cf->data[3] = ecc & SJA1000_ECC_SEG;
+ break;
+ }
+
+ /* Error occurred during transmission? */
+ if (!(ecc & SJA1000_ECC_DIR))
+ cf->data[2] |= CAN_ERR_PROT_TX;
+
+ if (priv->can.state == CAN_STATE_ERROR_WARNING ||
+ priv->can.state == CAN_STATE_ERROR_PASSIVE) {
+ cf->data[1] = (txerr > rxerr) ?
+ CAN_ERR_CRTL_TX_PASSIVE :
+ CAN_ERR_CRTL_RX_PASSIVE;
+ }
+ }
+
+ netif_rx(skb);
+
+ stats->rx_packets++;
+ stats->rx_bytes += cf->can_dlc;
+ }
+}
+
+static void esd_usb2_rx_can_msg(struct esd_usb2_net_priv *priv,
+ struct esd_usb2_msg *msg)
+{
+ struct net_device_stats *stats = &priv->netdev->stats;
+ struct can_frame *cf;
+ struct sk_buff *skb;
+ int i;
+ u32 id;
+
+ if (!netif_device_present(priv->netdev))
+ return;
+
+ id = le32_to_cpu(msg->msg.rx.id);
+
+ if (id & ESD_EVENT) {
+ esd_usb2_rx_event(priv, msg);
+ } else {
+ skb = alloc_can_skb(priv->netdev, &cf);
+ if (skb == NULL) {
+ stats->rx_dropped++;
+ return;
+ }
+
+ cf->can_id = id & ESD_IDMASK;
+ cf->can_dlc = get_can_dlc(msg->msg.rx.dlc);
+
+ if (id & ESD_EXTID)
+ cf->can_id |= CAN_EFF_FLAG;
+
+ if (msg->msg.rx.dlc & ESD_RTR) {
+ cf->can_id |= CAN_RTR_FLAG;
+ } else {
+ for (i = 0; i < cf->can_dlc; i++)
+ cf->data[i] = msg->msg.rx.data[i];
+ }
+
+ netif_rx(skb);
+
+ stats->rx_packets++;
+ stats->rx_bytes += cf->can_dlc;
+ }
+
+ return;
+}
+
+static void esd_usb2_tx_done_msg(struct esd_usb2_net_priv *priv,
+ struct esd_usb2_msg *msg)
+{
+ struct net_device_stats *stats = &priv->netdev->stats;
+ struct net_device *netdev = priv->netdev;
+ struct esd_tx_urb_context *context;
+
+ if (!netif_device_present(netdev))
+ return;
+
+ context = &priv->tx_contexts[msg->msg.txdone.hnd & (MAX_TX_URBS - 1)];
+
+ if (!msg->msg.txdone.status) {
+ stats->tx_packets++;
+ stats->tx_bytes += context->dlc;
+ can_get_echo_skb(netdev, context->echo_index);
+ } else {
+ stats->tx_errors++;
+ can_free_echo_skb(netdev, context->echo_index);
+ }
+
+ /* Release context */
+ context->echo_index = MAX_TX_URBS;
+ atomic_dec(&priv->active_tx_jobs);
+
+ netif_wake_queue(netdev);
+}
+
+static void esd_usb2_read_bulk_callback(struct urb *urb)
+{
+ struct esd_usb2 *dev = urb->context;
+ int retval;
+ int pos = 0;
+ int i;
+
+ switch (urb->status) {
+ case 0: /* success */
+ break;
+
+ case -ENOENT:
+ case -ESHUTDOWN:
+ return;
+
+ default:
+ dev_info(dev->udev->dev.parent,
+ "Rx URB aborted (%d)\n", urb->status);
+ goto resubmit_urb;
+ }
+
+ while (pos < urb->actual_length) {
+ struct esd_usb2_msg *msg;
+
+ msg = (struct esd_usb2_msg *)(urb->transfer_buffer + pos);
+
+ switch (msg->msg.hdr.cmd) {
+ case CMD_CAN_RX:
+ esd_usb2_rx_can_msg(dev->nets[msg->msg.rx.net], msg);
+ break;
+
+ case CMD_CAN_TX:
+ esd_usb2_tx_done_msg(dev->nets[msg->msg.txdone.net],
+ msg);
+ break;
+ }
+
+ pos += msg->msg.hdr.len << 2;
+
+ if (pos > urb->actual_length) {
+ dev_err(dev->udev->dev.parent, "format error\n");
+ break;
+ }
+ }
+
+resubmit_urb:
+ usb_fill_bulk_urb(urb, dev->udev, usb_rcvbulkpipe(dev->udev, 1),
+ urb->transfer_buffer, RX_BUFFER_SIZE,
+ esd_usb2_read_bulk_callback, dev);
+
+ retval = usb_submit_urb(urb, GFP_ATOMIC);
+ if (retval == -ENODEV) {
+ for (i = 0; i < dev->net_count; i++) {
+ if (dev->nets[i])
+ netif_device_detach(dev->nets[i]->netdev);
+ }
+ } else if (retval) {
+ dev_err(dev->udev->dev.parent,
+ "failed resubmitting read bulk urb: %d\n", retval);
+ }
+
+ return;
+}
+
+/*
+ * callback for bulk OUT urb
+ */
+static void esd_usb2_write_bulk_callback(struct urb *urb)
+{
+ struct esd_tx_urb_context *context = urb->context;
+ struct esd_usb2_net_priv *priv;
+ struct esd_usb2 *dev;
+ struct net_device *netdev;
+ size_t size = sizeof(struct esd_usb2_msg);
+
+ BUG_ON(!context);
+
+ priv = context->priv;
+ netdev = priv->netdev;
+ dev = priv->usb2;
+
+ /* free up our allocated buffer */
+ usb_buffer_free(urb->dev, size,
+ urb->transfer_buffer, urb->transfer_dma);
+
+ if (!netif_device_present(netdev))
+ return;
+
+ if (urb->status)
+ dev_info(netdev->dev.parent, "Tx URB aborted (%d)\n",
+ urb->status);
+
+ netdev->trans_start = jiffies;
+}
+
+#ifdef CONFIG_SYSFS
+static ssize_t show_firmware(struct device *d,
+ struct device_attribute *attr, char *buf)
+{
+ struct usb_interface *intf = to_usb_interface(d);
+ struct esd_usb2 *dev = usb_get_intfdata(intf);
+
+ return sprintf(buf, "%d.%d.%d\n",
+ (dev->version >> 12) & 0xf,
+ (dev->version >> 8) & 0xf,
+ dev->version & 0xff);
+}
+static DEVICE_ATTR(firmware, S_IRUGO, show_firmware, NULL);
+
+static ssize_t show_hardware(struct device *d,
+ struct device_attribute *attr, char *buf)
+{
+ struct usb_interface *intf = to_usb_interface(d);
+ struct esd_usb2 *dev = usb_get_intfdata(intf);
+
+ return sprintf(buf, "%d.%d.%d\n",
+ (dev->version >> 28) & 0xf,
+ (dev->version >> 24) & 0xf,
+ (dev->version >> 16) & 0xff);
+}
+static DEVICE_ATTR(hardware, S_IRUGO, show_hardware, NULL);
+
+static ssize_t show_nets(struct device *d,
+ struct device_attribute *attr, char *buf)
+{
+ struct usb_interface *intf = to_usb_interface(d);
+ struct esd_usb2 *dev = usb_get_intfdata(intf);
+
+ return sprintf(buf, "%d\n", dev->net_count);
+}
+static DEVICE_ATTR(nets, S_IRUGO, show_nets, NULL);
+#endif
+
+static int esd_usb2_send_msg(struct esd_usb2 *dev, struct esd_usb2_msg *msg)
+{
+ int actual_length;
+
+ return usb_bulk_msg(dev->udev,
+ usb_sndbulkpipe(dev->udev, 2),
+ msg,
+ msg->msg.hdr.len << 2,
+ &actual_length,
+ 1000);
+}
+
+static int esd_usb2_wait_msg(struct esd_usb2 *dev,
+ struct esd_usb2_msg *msg)
+{
+ int actual_length;
+
+ return usb_bulk_msg(dev->udev,
+ usb_rcvbulkpipe(dev->udev, 1),
+ msg,
+ sizeof(*msg),
+ &actual_length,
+ 1000);
+}
+
+static int esd_usb2_setup_rx_urbs(struct esd_usb2 *dev)
+{
+ int i, err = 0;
+
+ if (dev->rxinitdone)
+ return 0;
+
+ for (i = 0; i < MAX_RX_URBS; i++) {
+ struct urb *urb = NULL;
+ u8 *buf = NULL;
+
+ /* create a URB, and a buffer for it */
+ urb = usb_alloc_urb(0, GFP_KERNEL);
+ if (!urb) {
+ dev_warn(dev->udev->dev.parent,
+ "No memory left for URBs\n");
+ err = -ENOMEM;
+ break;
+ }
+
+ buf = usb_buffer_alloc(dev->udev, RX_BUFFER_SIZE, GFP_KERNEL,
+ &urb->transfer_dma);
+ if (!buf) {
+ dev_warn(dev->udev->dev.parent,
+ "No memory left for USB buffer\n");
+ err = -ENOMEM;
+ goto freeurb;
+ }
+
+ usb_fill_bulk_urb(urb, dev->udev,
+ usb_rcvbulkpipe(dev->udev, 1),
+ buf, RX_BUFFER_SIZE,
+ esd_usb2_read_bulk_callback, dev);
+ urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
+ usb_anchor_urb(urb, &dev->rx_submitted);
+
+ err = usb_submit_urb(urb, GFP_KERNEL);
+ if (err) {
+ usb_unanchor_urb(urb);
+ usb_buffer_free(dev->udev, RX_BUFFER_SIZE, buf,
+ urb->transfer_dma);
+ }
+
+freeurb:
+ /* Drop reference, USB core will take care of freeing it */
+ usb_free_urb(urb);
+ if (err)
+ break;
+ }
+
+ /* Did we submit any URBs */
+ if (i == 0) {
+ dev_err(dev->udev->dev.parent, "couldn't setup read URBs\n");
+ return err;
+ }
+
+ /* Warn if we couldn't submit all the URBs */
+ if (i < MAX_RX_URBS) {
+ dev_warn(dev->udev->dev.parent,
+ "rx performance may be slow\n");
+ }
+
+ dev->rxinitdone = 1;
+ return 0;
+}
+
+/*
+ * Start interface
+ */
+static int esd_usb2_start(struct esd_usb2_net_priv *priv)
+{
+ struct esd_usb2 *dev = priv->usb2;
+ struct net_device *netdev = priv->netdev;
+ struct esd_usb2_msg msg;
+ int err, i;
+
+ /*
+ * Enable all IDs
+ * The IDADD message takes up to 64 32 bit bitmasks (2048 bits).
+ * Each bit represents one 11 bit CAN identifier. A set bit
+ * enables reception of the corresponding CAN identifier. A cleared
+ * bit disables this identifier. An additional bitmask value
+ * following the CAN 2.0A bits is used to enable reception of
+ * extended CAN frames. Only the LSB of this final mask is checked
+ * for the complete 29 bit ID range. The IDADD message also allows
+ * filter configuration for an ID subset. In this case you can add
+ * the number of the starting bitmask (0..64) to the filter.option
+ * field followed by only some bitmasks.
+ */
+ msg.msg.hdr.cmd = CMD_IDADD;
+ msg.msg.hdr.len = 2 + ESD_MAX_ID_SEGMENT;
+ msg.msg.filter.net = priv->index;
+ msg.msg.filter.option = ESD_ID_ENABLE; /* start with segment 0 */
+ for (i = 0; i < ESD_MAX_ID_SEGMENT; i++)
+ msg.msg.filter.mask[i] = cpu_to_le32(0xffffffff);
+ /* enable 29bit extended IDs */
+ msg.msg.filter.mask[ESD_MAX_ID_SEGMENT] = cpu_to_le32(0x00000001);
+
+ err = esd_usb2_send_msg(dev, &msg);
+ if (err)
+ goto failed;
+
+ err = esd_usb2_setup_rx_urbs(dev);
+ if (err)
+ goto failed;
+
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
+
+ return 0;
+
+failed:
+ if (err == -ENODEV)
+ netif_device_detach(netdev);
+
+ dev_err(netdev->dev.parent, "couldn't start device: %d\n", err);
+
+ return err;
+}
+
+static void unlink_all_urbs(struct esd_usb2 *dev)
+{
+ struct esd_usb2_net_priv *priv;
+ int i, j;
+
+ usb_kill_anchored_urbs(&dev->rx_submitted);
+ for (i = 0; i < dev->net_count; i++) {
+ priv = dev->nets[i];
+ if (priv) {
+ usb_kill_anchored_urbs(&priv->tx_submitted);
+ atomic_set(&priv->active_tx_jobs, 0);
+
+ for (j = 0; j < MAX_TX_URBS; j++)
+ priv->tx_contexts[j].echo_index = MAX_TX_URBS;
+ }
+ }
+}
+
+static int esd_usb2_open(struct net_device *netdev)
+{
+ struct esd_usb2_net_priv *priv = netdev_priv(netdev);
+ int err;
+
+ /* common open */
+ err = open_candev(netdev);
+ if (err)
+ return err;
+
+ /* finally start device */
+ err = esd_usb2_start(priv);
+ if (err) {
+ dev_warn(netdev->dev.parent,
+ "couldn't start device: %d\n", err);
+ close_candev(netdev);
+ return err;
+ }
+
+ priv->open_time = jiffies;
+
+ netif_start_queue(netdev);
+
+ return 0;
+}
+
+static netdev_tx_t esd_usb2_start_xmit(struct sk_buff *skb,
+ struct net_device *netdev)
+{
+ struct esd_usb2_net_priv *priv = netdev_priv(netdev);
+ struct esd_usb2 *dev = priv->usb2;
+ struct esd_tx_urb_context *context = NULL;
+ struct net_device_stats *stats = &netdev->stats;
+ struct can_frame *cf = (struct can_frame *)skb->data;
+ struct esd_usb2_msg *msg;
+ struct urb *urb;
+ u8 *buf;
+ int i, err;
+ int ret = NETDEV_TX_OK;
+ size_t size = sizeof(struct esd_usb2_msg);
+
+ if (can_dropped_invalid_skb(netdev, skb))
+ return NETDEV_TX_OK;
+
+ /* create a URB, and a buffer for it, and copy the data to the URB */
+ urb = usb_alloc_urb(0, GFP_ATOMIC);
+ if (!urb) {
+ dev_err(netdev->dev.parent, "No memory left for URBs\n");
+ stats->tx_dropped++;
+ dev_kfree_skb(skb);
+ goto nourbmem;
+ }
+
+ buf = usb_buffer_alloc(dev->udev, size, GFP_ATOMIC, &urb->transfer_dma);
+ if (!buf) {
+ dev_err(netdev->dev.parent, "No memory left for USB buffer\n");
+ stats->tx_dropped++;
+ dev_kfree_skb(skb);
+ goto nobufmem;
+ }
+
+ msg = (struct esd_usb2_msg *)buf;
+
+ msg->msg.hdr.len = 3; /* minimal length */
+ msg->msg.hdr.cmd = CMD_CAN_TX;
+ msg->msg.tx.net = priv->index;
+ msg->msg.tx.dlc = cf->can_dlc;
+ msg->msg.tx.id = cpu_to_le32(cf->can_id & CAN_ERR_MASK);
+
+ if (cf->can_id & CAN_RTR_FLAG)
+ msg->msg.tx.dlc |= ESD_RTR;
+
+ if (cf->can_id & CAN_EFF_FLAG)
+ msg->msg.tx.id |= cpu_to_le32(ESD_EXTID);
+
+ for (i = 0; i < cf->can_dlc; i++)
+ msg->msg.tx.data[i] = cf->data[i];
+
+ msg->msg.hdr.len += (cf->can_dlc + 3) >> 2;
+
+ for (i = 0; i < MAX_TX_URBS; i++) {
+ if (priv->tx_contexts[i].echo_index == MAX_TX_URBS) {
+ context = &priv->tx_contexts[i];
+ break;
+ }
+ }
+
+ /*
+ * This should never happen.
+ */
+ if (!context) {
+ dev_warn(netdev->dev.parent, "couldn't find free context\n");
+ ret = NETDEV_TX_BUSY;
+ goto releasebuf;
+ }
+
+ context->priv = priv;
+ context->echo_index = i;
+ context->dlc = cf->can_dlc;
+
+ /* hnd must not be 0 */
+ msg->msg.tx.hnd = 0x80000000 | i; /* returned in TX done message */
+
+ usb_fill_bulk_urb(urb, dev->udev, usb_sndbulkpipe(dev->udev, 2), buf,
+ msg->msg.hdr.len << 2,
+ esd_usb2_write_bulk_callback, context);
+
+ urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
+
+ usb_anchor_urb(urb, &priv->tx_submitted);
+
+ can_put_echo_skb(skb, netdev, context->echo_index);
+
+ atomic_inc(&priv->active_tx_jobs);
+
+ err = usb_submit_urb(urb, GFP_ATOMIC);
+ if (err) {
+ can_free_echo_skb(netdev, context->echo_index);
+
+ atomic_dec(&priv->active_tx_jobs);
+ usb_unanchor_urb(urb);
+
+ stats->tx_dropped++;
+
+ if (err == -ENODEV)
+ netif_device_detach(netdev);
+ else
+ dev_warn(netdev->dev.parent, "failed tx_urb %d\n", err);
+
+ goto releasebuf;
+ }
+
+ netdev->trans_start = jiffies;
+
+ /* Slow down tx path */
+ if (atomic_read(&priv->active_tx_jobs) >= MAX_TX_URBS)
+ netif_stop_queue(netdev);
+
+ /*
+ * Release our reference to this URB, the USB core will eventually free
+ * it entirely.
+ */
+ usb_free_urb(urb);
+
+ return NETDEV_TX_OK;
+
+releasebuf:
+ usb_buffer_free(dev->udev, size, buf, urb->transfer_dma);
+
+nobufmem:
+ usb_free_urb(urb);
+
+nourbmem:
+ return ret;
+}
+
+static int esd_usb2_close(struct net_device *netdev)
+{
+ struct esd_usb2_net_priv *priv = netdev_priv(netdev);
+ struct esd_usb2_msg msg;
+ int i;
+
+ /* Disable all IDs (see esd_usb2_start()) */
+ msg.msg.hdr.cmd = CMD_IDADD;
+ msg.msg.hdr.len = 2 + ESD_MAX_ID_SEGMENT;
+ msg.msg.filter.net = priv->index;
+ msg.msg.filter.option = ESD_ID_ENABLE; /* start with segment 0 */
+ for (i = 0; i <= ESD_MAX_ID_SEGMENT; i++)
+ msg.msg.filter.mask[i] = 0;
+ esd_usb2_send_msg(priv->usb2, &msg);
+
+ /* set CAN controller to reset mode */
+ msg.msg.hdr.len = 2;
+ msg.msg.hdr.cmd = CMD_SETBAUD;
+ msg.msg.setbaud.net = priv->index;
+ msg.msg.setbaud.rsvd = 0;
+ msg.msg.setbaud.baud = cpu_to_le32(ESD_USB2_NO_BAUDRATE);
+ esd_usb2_send_msg(priv->usb2, &msg);
+
+ priv->can.state = CAN_STATE_STOPPED;
+
+ netif_stop_queue(netdev);
+
+ close_candev(netdev);
+
+ priv->open_time = 0;
+
+ return 0;
+}
+
+static const struct net_device_ops esd_usb2_netdev_ops = {
+ .ndo_open = esd_usb2_open,
+ .ndo_stop = esd_usb2_close,
+ .ndo_start_xmit = esd_usb2_start_xmit,
+};
+
+static struct can_bittiming_const esd_usb2_bittiming_const = {
+ .name = "esd_usb2",
+ .tseg1_min = ESD_USB2_TSEG1_MIN,
+ .tseg1_max = ESD_USB2_TSEG1_MAX,
+ .tseg2_min = ESD_USB2_TSEG2_MIN,
+ .tseg2_max = ESD_USB2_TSEG2_MAX,
+ .sjw_max = ESD_USB2_SJW_MAX,
+ .brp_min = ESD_USB2_BRP_MIN,
+ .brp_max = ESD_USB2_BRP_MAX,
+ .brp_inc = ESD_USB2_BRP_INC,
+};
+
+static int esd_usb2_set_bittiming(struct net_device *netdev)
+{
+ struct esd_usb2_net_priv *priv = netdev_priv(netdev);
+ struct can_bittiming *bt = &priv->can.bittiming;
+ struct esd_usb2_msg msg;
+ u32 canbtr;
+
+ canbtr = ESD_USB2_UBR;
+ canbtr |= (bt->brp - 1) & (ESD_USB2_BRP_MAX - 1);
+ canbtr |= ((bt->sjw - 1) & (ESD_USB2_SJW_MAX - 1))
+ << ESD_USB2_SJW_SHIFT;
+ canbtr |= ((bt->prop_seg + bt->phase_seg1 - 1)
+ & (ESD_USB2_TSEG1_MAX - 1))
+ << ESD_USB2_TSEG1_SHIFT;
+ canbtr |= ((bt->phase_seg2 - 1) & (ESD_USB2_TSEG2_MAX - 1))
+ << ESD_USB2_TSEG2_SHIFT;
+ if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES)
+ canbtr |= ESD_USB2_3_SAMPLES;
+
+ msg.msg.hdr.len = 2;
+ msg.msg.hdr.cmd = CMD_SETBAUD;
+ msg.msg.setbaud.net = priv->index;
+ msg.msg.setbaud.rsvd = 0;
+ msg.msg.setbaud.baud = cpu_to_le32(canbtr);
+
+ dev_info(netdev->dev.parent, "setting BTR=%#x\n", canbtr);
+
+ return esd_usb2_send_msg(priv->usb2, &msg);
+}
+
+static int esd_usb2_set_mode(struct net_device *netdev, enum can_mode mode)
+{
+ struct esd_usb2_net_priv *priv = netdev_priv(netdev);
+
+ if (!priv->open_time)
+ return -EINVAL;
+
+ switch (mode) {
+ case CAN_MODE_START:
+ netif_wake_queue(netdev);
+ break;
+
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int esd_usb2_probe_one_net(struct usb_interface *intf, int index)
+{
+ struct esd_usb2 *dev = usb_get_intfdata(intf);
+ struct net_device *netdev;
+ struct esd_usb2_net_priv *priv;
+ int err;
+ int i;
+
+ netdev = alloc_candev(sizeof(*priv), MAX_TX_URBS);
+ if (!netdev) {
+ dev_err(&intf->dev, "couldn't alloc candev\n");
+ return -ENOMEM;
+ }
+
+ priv = netdev_priv(netdev);
+
+ init_usb_anchor(&priv->tx_submitted);
+ atomic_set(&priv->active_tx_jobs, 0);
+
+ for (i = 0; i < MAX_TX_URBS; i++)
+ priv->tx_contexts[i].echo_index = MAX_TX_URBS;
+
+ priv->usb2 = dev;
+ priv->netdev = netdev;
+ priv->index = index;
+
+ priv->can.state = CAN_STATE_STOPPED;
+ priv->can.clock.freq = ESD_USB2_CAN_CLOCK;
+ priv->can.bittiming_const = &esd_usb2_bittiming_const;
+ priv->can.do_set_bittiming = esd_usb2_set_bittiming;
+ priv->can.do_set_mode = esd_usb2_set_mode;
+ priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES;
+
+ netdev->flags |= IFF_ECHO; /* we support local echo */
+
+ netdev->netdev_ops = &esd_usb2_netdev_ops;
+
+ SET_NETDEV_DEV(netdev, &intf->dev);
+
+ err = register_candev(netdev);
+ if (err) {
+ dev_err(&intf->dev,
+ "couldn't register CAN device: %d\n", err);
+ free_candev(netdev);
+ return err;
+ }
+
+ dev->nets[index] = priv;
+ dev_info(netdev->dev.parent, "device %s registered\n", netdev->name);
+ return 0;
+}
+
+/*
+ * probe function for new USB2 devices
+ *
+ * check version information and number of available
+ * CAN interfaces
+ */
+static int esd_usb2_probe(struct usb_interface *intf,
+ const struct usb_device_id *id)
+{
+ struct esd_usb2 *dev;
+ struct esd_usb2_msg msg;
+ int i, err = -ENOMEM;
+
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev)
+ return -ENOMEM;
+
+ dev->udev = interface_to_usbdev(intf);
+
+ init_usb_anchor(&dev->rx_submitted);
+
+ usb_set_intfdata(intf, dev);
+
+ /* query number of CAN interfaces (nets) */
+ msg.msg.hdr.cmd = CMD_VERSION;
+ msg.msg.hdr.len = 2;
+ msg.msg.version.rsvd = 0;
+ msg.msg.version.flags = 0;
+ msg.msg.version.drv_version = 0;
+
+ if (esd_usb2_send_msg(dev, &msg) < 0) {
+ dev_err(&intf->dev, "sending version message failed\n");
+ goto free_dev;
+ }
+
+ if (esd_usb2_wait_msg(dev, &msg) < 0) {
+ dev_err(&intf->dev, "no version message answer\n");
+ goto free_dev;
+ }
+
+ dev->net_count = (int)msg.msg.version_reply.nets;
+ dev->version = le32_to_cpu(msg.msg.version_reply.version);
+
+#ifdef CONFIG_SYSFS
+ if (device_create_file(&intf->dev, &dev_attr_firmware))
+ dev_err(&intf->dev,
+ "Couldn't create device file for firmware\n");
+
+ if (device_create_file(&intf->dev, &dev_attr_hardware))
+ dev_err(&intf->dev,
+ "Couldn't create device file for hardware\n");
+
+ if (device_create_file(&intf->dev, &dev_attr_nets))
+ dev_err(&intf->dev,
+ "Couldn't create device file for nets\n");
+#endif
+
+ /* do per device probing */
+ for (i = 0; i < dev->net_count; i++)
+ esd_usb2_probe_one_net(intf, i);
+
+ return 0;
+
+free_dev:
+ kfree(dev);
+ return err;
+}
+
+/*
+ * called by the usb core when the device is removed from the system
+ */
+static void esd_usb2_disconnect(struct usb_interface *intf)
+{
+ struct esd_usb2 *dev = usb_get_intfdata(intf);
+ struct net_device *netdev;
+ int i;
+
+#ifdef CONFIG_SYSFS
+ device_remove_file(&intf->dev, &dev_attr_firmware);
+ device_remove_file(&intf->dev, &dev_attr_hardware);
+ device_remove_file(&intf->dev, &dev_attr_nets);
+#endif
+ usb_set_intfdata(intf, NULL);
+
+ if (dev) {
+ for (i = 0; i < dev->net_count; i++) {
+ if (dev->nets[i]) {
+ netdev = dev->nets[i]->netdev;
+ unregister_netdev(netdev);
+ free_candev(netdev);
+ }
+ }
+ unlink_all_urbs(dev);
+ kfree(dev); /* allocated with kzalloc() in esd_usb2_probe() */
+ }
+}
+
+/* usb specific object needed to register this driver with the usb subsystem */
+static struct usb_driver esd_usb2_driver = {
+ .name = "esd_usb2",
+ .probe = esd_usb2_probe,
+ .disconnect = esd_usb2_disconnect,
+ .id_table = esd_usb2_table,
+};
+
+static int __init esd_usb2_init(void)
+{
+ int err;
+
+ /* register this driver with the USB subsystem */
+ err = usb_register(&esd_usb2_driver);
+
+ if (err) {
+ err("usb_register failed. Error number %d\n", err);
+ return err;
+ }
+
+ return 0;
+}
+module_init(esd_usb2_init);
+
+static void __exit esd_usb2_exit(void)
+{
+ /* deregister this driver with the USB subsystem */
+ usb_deregister(&esd_usb2_driver);
+}
+module_exit(esd_usb2_exit);
--
1.5.6.3
^ permalink raw reply related
* Re: [PATCH] NIU support for skb->rxhash
From: David Miller @ 2010-04-23 8:14 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <20100422.141922.39169749.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Thu, 22 Apr 2010 14:19:22 -0700 (PDT)
> Also I have some ideas about what we can do if we have
> just the rxhash. It seems we can avoid the type_trans
> overhead on the interrupting cpu.
>
> Things like eth_type_trans() become a netdev operation rather than
> something drivers statically call by hand. ->ndo_type_trans or
> similar.
>
> SKB has a state bit saying whether ->ndo_type_trans has been invoked
> yet on RX.
>
> Drivers pass raw SKBs up into the stack.
>
> We defer the ->ndo_type_trans as far as possible, for RPS when we have
> ->rxhash we can defer this all the way to the destination RPS cpu.
>
> If we lack ->rxhash, the source cpu will need to invoke
> ->ndo_type_trans before it can begin parsing the packet.
I looked into implementing this and it doesn't work. The
problem is GRO wants to look into the packet very early
and we want to batch GRO a set of packets into a big packet
before shooting them over to a remote cpu.
This reminds me that we can start using ->rxhash as a quick
mismatch check in the GRO flow matcher.
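To make the "quick mismatch check" idea concrete, here is a minimal userspace sketch. It is not the GRO code: struct pkt, struct flow_key and same_flow() are hypothetical names, and treating rxhash == 0 as "no hash available" is an assumption of the sketch. The point is only that two different non-zero hashes prove a mismatch cheaply, while equal hashes still need the full comparison because hashes can collide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the header fields a flow matcher compares. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Hypothetical packet; rxhash == 0 is assumed to mean "no hash supplied". */
struct pkt {
	uint32_t rxhash;
	struct flow_key key;
};

static bool same_flow(const struct pkt *held, const struct pkt *incoming)
{
	/* Fast path: two valid hashes that differ cannot be the same flow. */
	if (held->rxhash && incoming->rxhash &&
	    held->rxhash != incoming->rxhash)
		return false;

	/* Slow path: equal (or missing) hashes still need a full compare. */
	return held->key.saddr == incoming->key.saddr &&
	       held->key.daddr == incoming->key.daddr &&
	       held->key.sport == incoming->key.sport &&
	       held->key.dport == incoming->key.dport;
}
```

The hash test only ever rejects early; it never accepts a match on its own.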
^ permalink raw reply
* Re: DDoS attack causing bad effect on conntrack searches
From: David Miller @ 2010-04-23 8:13 UTC (permalink / raw)
To: eric.dumazet; +Cc: hawk, paulmck, kaber, xiaosuo, hawk, netdev, netfilter-devel
In-Reply-To: <1272001478.7895.7545.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 23 Apr 2010 07:44:38 +0200
> Le jeudi 22 avril 2010 à 16:44 -0700, David Miller a écrit :
>> Eric, I wonder if we run into some kind of issue on 32-bit systems
>> because we always lose a bit of the conntrack hash value when we store
>> it into the 'nulls' area?
>>
>> Wouldn't that make the "get_nulls_value(n) != hash" fail?
>> --
>
>
> Well, 'hash' at this time is not the result of the jhash() transform [0
> - 0xFFFFFFFF], but a slot number in htable [0 - (300032-1)].
Aha, I see.
I really can't see what might cause this behavior then.
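For reference, the bit-loss question above can be checked with a small userspace sketch. It assumes the usual nulls encoding (value shifted left by one, low bit set as the end-of-chain tag) and simulates a 32-bit word with uint32_t; nulls_marker32() and nulls_value32() are illustrative names, not the kernel helpers. A slot number below 2^31 round-trips intact, while a full 32-bit hash value would lose its top bit.

```c
#include <stdint.h>

/* Encode a value into a simulated 32-bit nulls marker (low bit is the tag). */
static uint32_t nulls_marker32(uint32_t value)
{
	return 1u | (value << 1);
}

/* Recover the value stored in a simulated 32-bit nulls marker. */
static uint32_t nulls_value32(uint32_t marker)
{
	return marker >> 1;
}
```

With a table of 300032 slots, the largest slot number is 300031, far below the 31 usable bits.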
^ permalink raw reply