* Re: Question about tg3 and bnx2 driver suppliers
From: Stephen Hemminger @ 2011-02-20 20:41 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Micha Nelissen, Michael Durket, netdev@vger.kernel.org
In-Reply-To: <1298229420.8559.59.camel@edumazet-laptop>
On Sun, 20 Feb 2011 20:17:00 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sunday, 20 February 2011 at 19:19 +0100, Micha Nelissen wrote:
> > Eric Dumazet wrote:
> > > One possible cause of packet drops is when softirqs are disabled for too
> > > long periods, even if NIC has a big RX ring (check ethtool -g eth0)
> >
> > Why aren't the softirqs converted to workqueues? Wouldn't that cut
> > dependencies to other softirq users and improve latency?
> >
>
> Because it was done like that in the old days.
>
> It's a bit less important these days, now that typical machines have 8+ CPUs.
> Each device interrupt can be handled by its own cpu :)
The latency to schedule a work queue is still much higher
than the latency to run a softirq. Last time I played around with it,
things like loopback performance dropped 10% when using a work queue.
^ permalink raw reply
* Re: [PATCH] net: fix unreg list corruption in dev_deactivate()
From: David Miller @ 2011-02-20 19:50 UTC (permalink / raw)
To: eric.dumazet; +Cc: stf_xl, netdev, opurdila
In-Reply-To: <1298203890.8559.54.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 20 Feb 2011 13:11:30 +0100
> Hmm, you should read Eric B patch, he already addressed this problem a
> few hours ago.
>
> A full audit _is_ needed.
>
> https://lkml.org/lkml/2011/2/20/4
I'll apply Eric B.'s patch to net-2.6
* Re: how to listen() on single IP address but very many ports?
From: Eric Dumazet @ 2011-02-20 19:20 UTC (permalink / raw)
To: Chris Friesen; +Cc: netdev
In-Reply-To: <4D5EB2AE.5050703@genband.com>
On Friday, 18 February 2011 at 11:55 -0600, Chris Friesen wrote:
> I have an application team that needs to listen() for tcp connections on
> many ports (and by many I mean pretty much all 64K ports). However, the
> connections are short-lived, and the number of active connections at any
> given time is small.
>
> Apparently when they tried this before on an older kernel the
> performance of the naive "open 60K sockets and call listen()" solution
> was not acceptable, so they used NAT with port mapping to direct all the
> incoming packets to a single real port. However, they now want to add
> support for IPv6 and this solution won't work.
>
> What's the recommended method for efficiently listening on this many
> ports? Should I be able to efficiently listen() on that many sockets
> using epoll or similar? If there isn't a way to do this, is there an
> equivalent IPv6 workaround?
>
> One possible solution that came up was to implement a PORT_ANY which
> would match any incoming request that didn't already have an explicit
> listener. Even better would be a way to bind a single listening socket
> to a range of ports.
>
> Has anyone ever considered something like this?
>
I really don't see how listening on 60K sockets can be "not acceptable".
It just runs OK, at exactly the same speed as 1 socket, if using epoll.
The only 'problem' could be memory usage, which is a bit heavier of course,
but who cares?
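As a rough illustration of the epoll approach (a minimal sketch, not code
from this thread): the helpers below open one non-blocking IPv4 listener per
port and register each with a single epoll instance, so one epoll_wait()
call covers all of them. The names add_listener() and accept_ready() are
hypothetical, IPv4 is used only for brevity, and a real 60K-port setup would
also need RLIMIT_NOFILE raised well above the usual default.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create one TCP listener on addr:port (both in host
 * byte order; port 0 lets the kernel pick a port) and register it with
 * epfd. Returns the listening fd, or -1 on error. */
static int add_listener(int epfd, uint32_t addr, uint16_t port)
{
    struct sockaddr_in sa;
    struct epoll_event ev;
    int one = 1;
    int fd;

    assert(epfd >= 0);

    fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(addr);
    sa.sin_port = htons(port);

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;   /* readable means a connection is pending */
    ev.data.fd = fd;       /* lets the event loop find the listener */

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, SOMAXCONN) < 0 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms, then accept one pending connection per ready
 * listener. Connections are closed at once because this is only a sketch.
 * Returns the number of connections accepted. */
static int accept_ready(int epfd, int timeout_ms)
{
    struct epoll_event evs[64];
    int accepted = 0;
    int n = epoll_wait(epfd, evs, 64, timeout_ms);
    int i;

    for (i = 0; i < n; i++) {
        int c = accept(evs[i].data.fd, NULL, NULL);

        if (c >= 0) {
            accepted++;
            close(c);
        }
    }
    return accepted;
}
```

Calling add_listener() in a loop over the wanted port range and then
accept_ready() from the event loop is the whole pattern; the cost of
epoll_wait() depends on the number of ready sockets, not on the number of
registered ones.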
* Re: Question about tg3 and bnx2 driver suppliers
From: Eric Dumazet @ 2011-02-20 19:17 UTC (permalink / raw)
To: Micha Nelissen; +Cc: Michael Durket, netdev@vger.kernel.org
In-Reply-To: <4D615B2D.5080804@neli.hopto.org>
On Sunday, 20 February 2011 at 19:19 +0100, Micha Nelissen wrote:
> Eric Dumazet wrote:
> > One possible cause of packet drops is when softirqs are disabled for too
> > long periods, even if NIC has a big RX ring (check ethtool -g eth0)
>
> Why aren't the softirqs converted to workqueues? Wouldn't that cut
> dependencies to other softirq users and improve latency?
>
Because it was done like that in the old days.
It's a bit less important these days, now that typical machines have 8+ CPUs.
Each device interrupt can be handled by its own cpu :)
* Re: Advice on network driver design
From: arnd @ 2011-02-20 19:13 UTC (permalink / raw)
To: Felix Radensky; +Cc: netdev@vger.kernel.org
In-Reply-To: <4D5FC7A7.5050704@embedded-sol.com>
On Saturday 19 February 2011 14:37:43 Felix Radensky wrote:
> Hi,
>
> I'm in the process of designing a network driver for a custom
> hardware and would like to get some advice from linux network
> gurus.
>
> The host platform is Freescale P2020. The custom hardware is
> FPGA with several TX FIFOs, single RX FIFO and a set of registers.
> FPGA is connected to CPU via PCI-E. Host CPU DMA controller is used
> to get packets to/from FIFOs. Each FIFO has its set of events,
> generating interrupts, which can be enabled and disabled. Status
> register reflects the current state of events, the bit in status
> register is cleared by FPGA when event is handled. Reads or writes to
> status register have no impact on its contents.
>
> The device driver should support 80Mbit/sec of traffic in each direction.
>
> So far I have TX side working. I'm using Linux dmaengine APIs to
> transfer packets to FIFOs. The DMA completion interrupt is handled
> by per-fifo work queue.
>
> My question is about RX. Would such design benefit from NAPI ?
> If my understanding of NAPI is correct, it runs in softirq context,
> so I cannot do any DMA work in dev->poll(). If I were to use NAPI,
> I should probably disable RX interrupts, do all DMA work in some
> work queue, keep RX packets in a list and only then call dev->poll().
> Is that correct ?
>
> Any other advice and how to write an efficient driver for this
> hardware is most welcome. I can influence FPGA design to some degree,
> so if you think FPGA should be changed to improve things, please let
> me know.
There are currently discussions ongoing about using virtio for this
kind of connection. See http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg49294.html
for an archive.
When you use virtio as the base, you can use the regular virtio-net
driver or any other virtio high-level driver on top.
Arnd
* Re: [PATCH net-next] sctp: fix compile warnings in sctp_tsnmap_num_gabs
From: David Miller @ 2011-02-20 19:10 UTC (permalink / raw)
To: shanwei; +Cc: vladislav.yasevich, netdev, linux-sctp
In-Reply-To: <4D60C966.2000302@cn.fujitsu.com>
From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Sun, 20 Feb 2011 15:57:26 +0800
> net/sctp/tsnmap.c: In function ‘sctp_tsnmap_num_gabs’:
> net/sctp/tsnmap.c:347: warning: ‘start’ may be used uninitialized in this function
> net/sctp/tsnmap.c:347: warning: ‘end’ may be used uninitialized in this function
>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Applied.
* Re: [PATCH] tcp: Remove debug macro of TCP_CHECK_TIMER
From: David Miller @ 2011-02-20 19:10 UTC (permalink / raw)
To: shanwei; +Cc: netdev, kuznet, pekkas, jmorris, kaber
In-Reply-To: <4D60C901.8050307@cn.fujitsu.com>
From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Sun, 20 Feb 2011 15:55:45 +0800
>
> Now, TCP_CHECK_TIMER is not used for debugging; it does nothing.
> And, it has been there for several years, maybe 6 years.
>
> Remove it to keep code clearer.
>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Applied.
* Re: [PATCH]tcp: document tcp_max_ssthresh (Limited Slow-Start)
From: David Miller @ 2011-02-20 19:10 UTC (permalink / raw)
To: shanwei; +Cc: ilpo.jarvinen, netdev, jheffner
In-Reply-To: <4D60C849.40905@cn.fujitsu.com>
From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Sun, 20 Feb 2011 15:52:41 +0800
> From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>
> Based on Ilpo's patch about documenting tcp_max_ssthresh.
> (see http://marc.info/?l=linux-netdev&m=117950581307310&w=2)
>
> According to errata of RFC3742, fix the number of segments increased
> during RTT time.
>
> This only states when to use this parameter. How to choose
> a value for it is left for others to document.
>
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Applied.
* Re: Question about tg3 and bnx2 driver suppliers
From: Micha Nelissen @ 2011-02-20 18:19 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michael Durket, netdev@vger.kernel.org
In-Reply-To: <1297952255.2604.115.camel@edumazet-laptop>
Eric Dumazet wrote:
> One possible cause of packet drops is when softirqs are disabled for too
> long periods, even if NIC has a big RX ring (check ethtool -g eth0)
Why aren't the softirqs converted to workqueues? Wouldn't that cut
dependencies to other softirq users and improve latency?
Probably a stupid question, thanks.
Micha
* Re: Advice on network driver design
From: Micha Nelissen @ 2011-02-20 18:14 UTC (permalink / raw)
To: Felix Radensky; +Cc: netdev@vger.kernel.org
In-Reply-To: <4D5FC7A7.5050704@embedded-sol.com>
Felix Radensky wrote:
> The host platform is Freescale P2020. The custom hardware is
> FPGA with several TX FIFOs, single RX FIFO and a set of registers.
Wouldn't it be easier to use the in-SoC Ethernet controllers?
Micha
* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-20 15:07 UTC (permalink / raw)
To: Nicolas de Pesloüan
Cc: Jay Vosburgh, David Miller, kaber, eric.dumazet, netdev,
shemminger, andy, Fischer, Anna
In-Reply-To: <4D610511.4050902@gmail.com>
Sun, Feb 20, 2011 at 01:12:01PM CET, nicolas.2p.debian@gmail.com wrote:
>On 20/02/2011 11:36, Jiri Pirko wrote:
>>Sat, Feb 19, 2011 at 09:27:37PM CET, nicolas.2p.debian@gmail.com wrote:
>>>On 19/02/2011 14:46, Jiri Pirko wrote:
>>>>Sat, Feb 19, 2011 at 02:18:00PM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>On 19/02/2011 12:28, Jiri Pirko wrote:
>>>>>>Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@redhat.com wrote:
>>>>>>>Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>>>>On 19/02/2011 09:05, Jiri Pirko wrote:
>>>>>>>>>This patch converts bonding to use rx_handler. Results in cleaner
>>>>>>>>>__netif_receive_skb() with far fewer exceptions needed. Also
>>>>>>>>>bond-specific work is moved into bond code.
>>>>>>>>>
>>>>>>>>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>>>>>>>
>>>>>>>>>v1->v2:
>>>>>>>>> using skb_iif instead of new input_dev to remember original
>>>>>>>>> device
>>>>>>>>>v2->v3:
>>>>>>>>> set orig_dev = skb->dev if skb_iif is set
>>>>>>>>>
>>>>>>>>
>>>>>>>>Why do we need to let the rx_handlers call netif_rx() or __netif_receive_skb()?
>>>>>>>>
>>>>>>>>Bonding used to be handled with very little overhead, simply replacing
>>>>>>>>skb->dev with skb->dev->master. Time has passed and we eventually
>>>>>>>>added a lot of special processing for bonding into __netif_receive_skb(),
>>>>>>>>but the overhead remained very light.
>>>>>>>>
>>>>>>>>Calling netif_rx() (or __netif_receive_skb()) to allow nesting would probably lead to some overhead.
>>>>>>>>
>>>>>>>>Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>>>>>>>whatever needs to be delivered, to whoever needs it, inside the loop?
>>>>>>>>
>>>>>>>>rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>>>>>>while (rx_handler) {
>>>>>>>> /* ... */
>>>>>>>> orig_dev = skb->dev;
>>>>>>>> skb = rx_handler(skb);
>>>>>>>> /* ... */
>>>>>>>> rx_handler = (skb->dev != orig_dev) ? rcu_dereference(skb->dev->rx_handler) : NULL;
>>>>>>>>}
>>>>>>>>
>>>>>>>>This would reduce the overhead, while still allowing nesting: vlan on
>>>>>>>>top of bonding, bridge on top of bonding, ...
>>>>>>>
>>>>>>>I see your point. Makes sense to me. But the loop would have to include
>>>>>>>at least processing of ptype_all too. I'm going to cook a follow-up
>>>>>>>patch.
>>>>>>>
>>>>>>
>>>>>>DRAFT (doesn't modify rx_handlers):
>>>>>>
>>>>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>index 4ebf7fe..e5dba47 100644
>>>>>>--- a/net/core/dev.c
>>>>>>+++ b/net/core/dev.c
>>>>>>@@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>> {
>>>>>> struct packet_type *ptype, *pt_prev;
>>>>>> rx_handler_func_t *rx_handler;
>>>>>>+ struct net_device *dev;
>>>>>> struct net_device *orig_dev;
>>>>>> struct net_device *null_or_dev;
>>>>>> int ret = NET_RX_DROP;
>>>>>>@@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>> if (netpoll_receive_skb(skb))
>>>>>> return NET_RX_DROP;
>>>>>>
>>>>>>- __this_cpu_inc(softnet_data.processed);
>>>>>>+ skb->skb_iif = skb->dev->ifindex;
>>>>>>+ orig_dev = skb->dev;
>>>>>
>>>>>orig_dev should be set inside the loop, to reflect "previously
>>>>>crossed device", while following the path:
>>>>>
>>>>>eth0 -> bond0 -> br0.
>>>>>
>>>>>First step inside loop:
>>>>>
>>>>>orig_dev = eth0
>>>>>skb->dev = bond0 (at the end of the loop).
>>>>>
>>>>>Second step inside loop:
>>>>>
>>>>>orig_dev = bond0
>>>>>skb->dev = br0 (at the end of the loop).
>>>>>
>>>>>This would allow for exact match delivery to bond0 if someone binds there.
>>>>>
>>>>>>+
>>>>>> skb_reset_network_header(skb);
>>>>>> skb_reset_transport_header(skb);
>>>>>> skb->mac_len = skb->network_header - skb->mac_header;
>>>>>>@@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>>
>>>>>> rcu_read_lock();
>>>>>>
>>>>>>- if (!skb->skb_iif) {
>>>>>>- skb->skb_iif = skb->dev->ifindex;
>>>>>>- orig_dev = skb->dev;
>>>>>>- } else {
>>>>>>- orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
>>>>>>- }
>>>>>
>>>>>I like the fact that it removes the above part.
>>>>>
>>>>>>+another_round:
>>>>>>+ __this_cpu_inc(softnet_data.processed);
>>>>>>+ dev = skb->dev;
>>>>>>
>>>>>> #ifdef CONFIG_NET_CLS_ACT
>>>>>> if (skb->tc_verd& TC_NCLS) {
>>>>>>@@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>> #endif
>>>>>>
>>>>>> list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>>>- if (!ptype->dev || ptype->dev == skb->dev) {
>>>>>>+ if (!ptype->dev || ptype->dev == dev) {
>>>>>> if (pt_prev)
>>>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>>> pt_prev = ptype;
>>>>>
>>>>>Inside the loop, we should only do exact match delivery, for
>>>>>&ptype_all and for &ptype_base[ntohs(type) & PTYPE_HASH_MASK]:
>>>>>
>>>>> list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>>- if (!ptype->dev || ptype->dev == dev) {
>>>>>+ if (ptype->dev == dev) {
>>>>> if (pt_prev)
>>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>> pt_prev = ptype;
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>> list_for_each_entry_rcu(ptype,
>>>>> &ptype_base[ntohs(type)& PTYPE_HASH_MASK], list) {
>>>>> if (ptype->type == type&&
>>>>>- (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>>>+ (ptype->dev == skb->dev)) {
>>>>> if (pt_prev)
>>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>> pt_prev = ptype;
>>>>> }
>>>>> }
>>>>>
>>>>>After leaving the loop, we can do wildcard delivery, if skb is not NULL.
>>>>>
>>>>> list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>>- if (!ptype->dev || ptype->dev == dev) {
>>>>>+ if (!ptype->dev) {
>>>>> if (pt_prev)
>>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>> pt_prev = ptype;
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>> list_for_each_entry_rcu(ptype,
>>>>> &ptype_base[ntohs(type)& PTYPE_HASH_MASK], list) {
>>>>>- if (ptype->type == type&&
>>>>>- (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>>>+ if (ptype->type == type&& !ptype->dev) {
>>>>> if (pt_prev)
>>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>> pt_prev = ptype;
>>>>> }
>>>>> }
>>>>>
>>>>>This would reduce the number of tests inside the
>>>>>list_for_each_entry_rcu() loops. And because we match only ptype->dev
>>>>>== dev inside the loop and !ptype->dev outside the loop, this should
>>>>>avoid duplicate delivery.
>>>>
>>>>Would you care to put this into patch so I can see the whole picture?
>>>>Thanks.
>>>
>>>Here is what I have in mind. It is based on your previous DRAFT patch, and doesn't modify rx_handlers yet.
>>>
>>>Only compile tested !!
>>>
>>>I don't know if every piece is in the right place. I wonder what to
>>>do with the CONFIG_NET_CLS_ACT part, which currently sits between ptype_all
>>>and ptype_base processing.
>>>
>>>Anyway, the general idea is there.
>>>
>>> Nicolas.
>>>
>>>net/core/dev.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++--------
>>>1 files changed, 60 insertions(+), 10 deletions(-)
>>>
>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>index e5dba47..7e007a9 100644
>>>--- a/net/core/dev.c
>>>+++ b/net/core/dev.c
>>>@@ -3117,7 +3117,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>> rx_handler_func_t *rx_handler;
>>> struct net_device *dev;
>>> struct net_device *orig_dev;
>>>- struct net_device *null_or_dev;
>>> int ret = NET_RX_DROP;
>>> __be16 type;
>>>
>>>@@ -3130,9 +3129,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>> if (netpoll_receive_skb(skb))
>>> return NET_RX_DROP;
>>>
>>>- skb->skb_iif = skb->dev->ifindex;
>>>- orig_dev = skb->dev;
>>>-
>>> skb_reset_network_header(skb);
>>> skb_reset_transport_header(skb);
>>> skb->mac_len = skb->network_header - skb->mac_header;
>>>@@ -3143,6 +3139,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>
>>>another_round:
>>> __this_cpu_inc(softnet_data.processed);
>>>+ skb->skb_iif = skb->dev->ifindex;
>>>+ orig_dev = skb->dev;
>>orig_dev should be set at the end of the loop. Now you are going to have
>>it always the same as dev and skb->dev.
>>
>
>Yes, you are right.
>
>Thinking about all this, I wonder what the protocol handlers expect as the orig_dev value?
>
>Let's imagine the following configuration: eth0 -> bond0 -> br0.
>
>What does a protocol handler listening on br0 expect for orig_dev?
>bond0 or eth0? The current implementation gives eth0, but I think bond0
>should be the right value, for proper nesting.
I agree with you.
>
>>> dev = skb->dev;
>>>
>>>#ifdef CONFIG_NET_CLS_ACT
>>>@@ -3152,8 +3150,13 @@ another_round:
>>> }
>>>#endif
>>>
>>>+ /*
>>>+ * Deliver to ptype_all protocol handlers that match current dev.
>>>+ * This happens before rx_handler is given a chance to change skb->dev.
>>>+ */
>>>+
>>> list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>- if (!ptype->dev || ptype->dev == dev) {
>>>+ if (ptype->dev == dev) {
>>> if (pt_prev)
>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>> pt_prev = ptype;
>>>@@ -3167,6 +3170,31 @@ another_round:
>>>ncls:
>>>#endif
>>>
>>>+ /*
>>>+ * Deliver to ptype_base protocol handlers that match current dev.
>>>+ * This happens before rx_handler is given a chance to change skb->dev.
>>>+ */
>>>+
>>>+ type = skb->protocol;
>>>+ list_for_each_entry_rcu(ptype,
>>>+ &ptype_base[ntohs(type)& PTYPE_HASH_MASK], list) {
>>>+ if (ptype->type == type&& ptype->dev == skb->dev) {
>>>+ if (pt_prev)
>>>+ ret = deliver_skb(skb, pt_prev, orig_dev);
>>>+ pt_prev = ptype;
>>>+ }
>>>+ }
>>
>>I'm not sure it is ok to deliver ptype_base here. See comment above
>>ptype_head() (I'm not sure I understand that correctly)
>
>Anyway, all this is probably plain wrong: Delivering the skb to
>protocol handlers while still changing the skb is guaranteed to cause
>strange behaviors.
>
>If we want to be able to deliver the skb to different protocol
>handlers and give all of them the right values for skb->dev and
>orig_dev (or previous_dev), we might end up copying the skb. I
>hate the idea, but currently can't find a cleaner way to do so.
That would be unfortunate :/
>
>We first need to clarify what orig_dev should be, as stated above.
>
>>>+
>>>+ /*
>>>+ * Call rx_handler for current device.
>>>+ * If rx_handler returns NULL, skip wildcard protocol handler delivery.
>>>+ * Else, if skb->dev changed, restart the whole delivery process, to
>>>+ * allow for device nesting.
>>>+ *
>>>+ * Warning:
>>>+ * rx_handlers must kfree_skb(skb) if they return NULL.
>>Well this is not true. They can return NULL and call netif_rx as they
>>have before. No changes necessary I believe.
>
>I don't really know. This needs to be double checked, anyway.
>
>>>+ */
>>>+
>>> rx_handler = rcu_dereference(dev->rx_handler);
>>> if (rx_handler) {
>>> if (pt_prev) {
>>>@@ -3176,10 +3204,15 @@ ncls:
>>> skb = rx_handler(skb);
>>> if (!skb)
>>> goto out;
>>>- if (dev != skb->dev)
>>>+ if (skb->dev != dev)
>>> goto another_round;
>>> }
>>>
>>>+ /*
>>>+ * FIXME: The part below should use rx_handler instead of being hard
>>>+ * coded here.
>>I'm not sure it is doable atm. For bridge and bond it should not be a
>>problem, but for macvlan, it is possible to have macvlans and vlans
>>on the same dev. This possibility should persist.
>>/me scratches head on the idea to have multiple rx_handlers although it
>>was his original idea....
>
>I think your original proposal of having several rx_handlers per device was right.
>
>At the time you introduced the rx_handler system, only bridge and
>macvlan used it. Even if using bridge and macvlan on the same base
>device might be useless, this is not true for every possible
>rx_handler configuration.
>
>Now that we want to move bonding and vlan to the rx_handler system,
>it becomes obvious that we need several rx_handlers per device. At
>least, vlan should properly mix with bridge. And who knows what would
>be the fifth rx_handler...
>
>>>+ */
>>>+
>>> if (vlan_tx_tag_present(skb)) {
>>> if (pt_prev) {
>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>@@ -3192,16 +3225,33 @@ ncls:
>>> goto out;
>>> }
>>>
>>>+ /*
>>>+ * FIXME: Can't this be moved into the rx_handler for bonding,
>>>+ * or into a future rx_handler for vlan?
>>This hook is something I do not like at all :/ But anyway it should be in the vlan
>>part, I think.
>
>Yes, and in order for the future rx_handler for vlan to properly
>handle it, it needs to know the device just below it, not the pure
>original device. Hence, my question about the exact meaning of
>orig_dev...
>
> Nicolas.
>
>>>+ */
>>>+
>>> vlan_on_bond_hook(skb);
>>>
>>>- /* deliver only exact match when indicated */
>>>- null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
>>>+ /*
>>>+ * Deliver to wildcard ptype_all protocol handlers.
>>>+ */
>>>+
>>>+ list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>+ if (!ptype->dev) {
>>>+ if (pt_prev)
>>>+ ret = deliver_skb(skb, pt_prev, orig_dev);
>>>+ pt_prev = ptype;
>>>+ }
>>>+ }
>>>+
>>>+ /*
>>>+ * Deliver to wildcard ptype_base protocol handlers.
>>>+ */
>>>
>>> type = skb->protocol;
>>> list_for_each_entry_rcu(ptype,
>>> &ptype_base[ntohs(type)& PTYPE_HASH_MASK], list) {
>>>- if (ptype->type == type&&
>>>- (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>+ if (ptype->type == type&& !ptype->dev) {
>>> if (pt_prev)
>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>> pt_prev = ptype;
>>>--
>>>1.7.2.3
>>>
>>>
>>>
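The loop being discussed can be modelled in user space. The sketch below is
not kernel code: struct sim_dev, struct sim_skb, to_master() and
sim_receive() are hypothetical stand-ins, and it follows the variant
proposed in this thread in which orig_dev is updated at the end of each
round, so that it names the device crossed just before skb->dev.

```c
#include <assert.h>
#include <stddef.h>

struct sim_skb;

/* Hypothetical stand-in for struct net_device. */
struct sim_dev {
    const char *name;
    struct sim_dev *master;   /* upper device, or NULL */
    struct sim_skb *(*rx_handler)(struct sim_skb *skb);
};

/* Hypothetical stand-in for struct sk_buff. */
struct sim_skb {
    struct sim_dev *dev;      /* device the packet is currently on */
    struct sim_dev *orig_dev; /* device crossed just before dev */
    int rounds;               /* number of passes through the loop */
};

/* Handler shared by the lower devices: hand the packet to the master. */
static struct sim_skb *to_master(struct sim_skb *skb)
{
    if (skb->dev->master)
        skb->dev = skb->dev->master;
    return skb;
}

/* Model of the "another_round" loop: run rx_handler for the current
 * device and restart whenever the handler moved the packet to another
 * device. Returns NULL if a handler consumed the packet. */
static struct sim_skb *sim_receive(struct sim_skb *skb)
{
    assert(skb != NULL && skb->dev != NULL);

    skb->orig_dev = skb->dev;
    skb->rounds = 0;

    for (;;) {
        struct sim_dev *dev = skb->dev;

        skb->rounds++;
        /* exact-match delivery for dev would happen here */

        if (!dev->rx_handler)
            break;
        skb = dev->rx_handler(skb);
        if (!skb)
            return NULL;        /* consumed by the handler */
        if (skb->dev == dev)
            break;              /* handler kept the device */
        skb->orig_dev = dev;    /* set at the end of the round */
    }

    /* wildcard delivery would happen here, once, after the loop */
    return skb;
}
```

With eth0 -> bond0 -> br0, three rounds run and the packet ends with
dev = br0 and orig_dev = bond0, which is the nesting behaviour argued for
above.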
* [PATCH kernel 2.6.38-rc5] fmvj18x_cs: add new id
From: Ken Kawasaki @ 2011-02-20 15:07 UTC (permalink / raw)
To: netdev
In-Reply-To: <20110131061616.05b2fa6f.ken_kawasaki@spring.nifty.jp>
fmvj18x_cs: add new id
Toshiba LAN & modem multifunction card (model name: IPC5010A)
Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
---
--- linux-2.6.38-rc5/drivers/net/pcmcia/fmvj18x_cs.c.orig 2011-02-20 14:04:06.000000000 +0900
+++ linux-2.6.38-rc5/drivers/net/pcmcia/fmvj18x_cs.c 2011-02-20 14:04:21.000000000 +0900
@@ -691,6 +691,7 @@ static struct pcmcia_device_id fmvj18x_i
PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0105, 0x0e0a),
PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x0e01),
PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x0a05),
+ PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x0b05),
PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x1101),
PCMCIA_DEVICE_NULL,
};
--- linux-2.6.38-rc5/drivers/tty/serial/serial_cs.c.orig 2011-02-20 14:05:09.000000000 +0900
+++ linux-2.6.38-rc5/drivers/tty/serial/serial_cs.c 2011-02-20 14:05:28.000000000 +0900
@@ -712,6 +712,7 @@ static struct pcmcia_device_id serial_id
PCMCIA_PFC_DEVICE_PROD_ID12(1, "Xircom", "CreditCard Ethernet+Modem II", 0x2e3ee845, 0xeca401bf),
PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x0e01),
PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x0a05),
+ PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x0b05),
PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x1101),
PCMCIA_MFC_DEVICE_MANF_CARD(0, 0x0104, 0x0070),
PCMCIA_MFC_DEVICE_MANF_CARD(1, 0x0101, 0x0562),
* Re: [PATCH] connector: Convert char *name to const char *name
From: Evgeniy Polyakov @ 2011-02-20 14:32 UTC (permalink / raw)
To: Joe Perches
Cc: Javier Martinez Canillas, Greg Kroah-Hartman, devel,
K. Y. Srinivasan, netdev
In-Reply-To: <1298159129.7179.32.camel@Joe-Laptop>
Hi Joe.
On Sat, Feb 19, 2011 at 03:45:29PM -0800, Joe Perches (joe@perches.com) wrote:
> Allow more const declarations.
>
> Signed-off-by: Joe Perches <joe@perches.com>
>
> ---
>
> Better to change the declarations and uses as this argument
> is not modified.
Looks good, thank you.
Greg, please push it into your tree.
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
--
Evgeniy Polyakov
* [PATCH net-2.6] bnx2x: Add a missing bit for PXP parity register of 57712.
From: Vlad Zolotarov @ 2011-02-20 14:27 UTC (permalink / raw)
To: Dave Miller, netdev@vger.kernel.org, Eilon Greenstein
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
drivers/net/bnx2x/bnx2x_init.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bnx2x/bnx2x_init.h b/drivers/net/bnx2x/bnx2x_init.h
index 5a268e9..fa6dbe3 100644
--- a/drivers/net/bnx2x/bnx2x_init.h
+++ b/drivers/net/bnx2x/bnx2x_init.h
@@ -241,7 +241,7 @@ static const struct {
/* Block IGU, MISC, PXP and PXP2 parity errors as long as we don't
* want to handle "system kill" flow at the moment.
*/
- BLOCK_PRTY_INFO(PXP, 0x3ffffff, 0x3ffffff, 0x3ffffff, 0x3ffffff),
+ BLOCK_PRTY_INFO(PXP, 0x7ffffff, 0x3ffffff, 0x3ffffff, 0x7ffffff),
BLOCK_PRTY_INFO_0(PXP2, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff),
BLOCK_PRTY_INFO_1(PXP2, 0x7ff, 0x7f, 0x7f, 0x7ff),
BLOCK_PRTY_INFO(HC, 0x7, 0x7, 0x7, 0),
--
1.7.0.4
* Re: IGMP and rwlock: Dead ocurred again on TILEPro
From: Chris Metcalf @ 2011-02-20 13:33 UTC (permalink / raw)
To: Cypher Wu
Cc: David Miller, xiyou.wangcong, linux-kernel, eric.dumazet, netdev
In-Reply-To: <AANLkTi=uQTQjrXn4_w-YwKCutpENFdFrxQed5kVKXTDF@mail.gmail.com>
On 2/18/2011 11:07 PM, Cypher Wu wrote:
> On Sat, Feb 19, 2011 at 5:51 AM, Chris Metcalf <cmetcalf@tilera.com> wrote:
>> I heard from one of our support folks that you were asking through that
>> channel, so I asked him to go ahead and give you the spinlock sources
>> directly. I will be spending time next week syncing up our internal tree
>> with the public git repository so you'll see it on LKML at that time.
> I've got your source code, thank you very much.
>
> There are still two more questions:
> 1. Why do we merge the inlined code and the *_slow code into non-inlined functions?
Those functions were always borderline in terms of being sensible inlined
functions. In my opinion, adding the SPR writes as well pushed them over
the edge, so I made them just straight function calls instead, for code
density reasons. It also makes the code simpler, which is a plus. And
since I was changing the read_lock versions I changed the write_lock
versions as well for consistency.
> 2. I've seen the use of 'mb()' in unlock operation, but we don't use
> that in the lock operation.
You don't need a memory barrier when acquiring a lock. (Well, some
architectures require a read barrier, but Tile doesn't speculate loads past
control dependencies at the moment.)
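For readers more used to portable code, the same placement of ordering can
be sketched with C11 atomics. This is not the Tile implementation: struct
sketch_lock and its helpers are hypothetical names. The C11 model does ask
for acquire ordering on the lock side (which, per the above, costs no
explicit barrier on Tile), while a release store is the minimum a portable
unlock needs; the mb() used in the unlock path is a stronger, full barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space test-and-set lock. */
struct sketch_lock {
    atomic_flag held;
};

/* Try to take the lock once. Acquire ordering keeps the critical
 * section's loads and stores from moving above this point. */
static bool sketch_trylock(struct sketch_lock *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Spin until the lock is taken. */
static void sketch_lock_acquire(struct sketch_lock *l)
{
    while (!sketch_trylock(l))
        ;   /* busy-wait until the holder releases */
}

/* Release ordering makes every write done inside the critical section
 * visible before the lock is seen as free by the next acquirer. */
static void sketch_unlock(struct sketch_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```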
> I've released a temporary version with that modification at our
> customer's request, since they want to run a long test through this
> weekend. I'd appreciate it if you could comment on my
> modifications:
It seems OK functionally, and has the advantage of addressing the deadlock
without changing the module API, so it's appropriate if you're trying to
maintain binary compatibility.
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
* Re: [PATCH] net: fix unreg list corruption in dev_deactivate()
From: Stanislaw Gruszka @ 2011-02-20 12:44 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Octavian Purdila, David S. Miller
In-Reply-To: <1298203890.8559.54.camel@edumazet-laptop>
On Sun, Feb 20, 2011 at 01:11:30PM +0100, Eric Dumazet wrote:
> Hmm, you should read Eric B patch, he already addressed this problem a
> few hours ago.
Ajjj, my few hours of debugging wasted :-/
Stanislaw
* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Nicolas de Pesloüan @ 2011-02-20 12:12 UTC (permalink / raw)
To: Jiri Pirko
Cc: Jay Vosburgh, David Miller, kaber, eric.dumazet, netdev,
shemminger, andy, Fischer, Anna
In-Reply-To: <20110220103609.GA2750@psychotron.redhat.com>
On 20/02/2011 11:36, Jiri Pirko wrote:
> Sat, Feb 19, 2011 at 09:27:37PM CET, nicolas.2p.debian@gmail.com wrote:
>> On 19/02/2011 14:46, Jiri Pirko wrote:
>>> Sat, Feb 19, 2011 at 02:18:00PM CET, nicolas.2p.debian@gmail.com wrote:
>>>> On 19/02/2011 12:28, Jiri Pirko wrote:
>>>>> Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@redhat.com wrote:
>>>>>> Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>>> On 19/02/2011 09:05, Jiri Pirko wrote:
>>>>>>>> This patch converts bonding to use rx_handler. Results in cleaner
>>>>>>>> __netif_receive_skb() with far fewer exceptions needed. Also
>>>>>>>> bond-specific work is moved into bond code.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>>>>>>
>>>>>>>> v1->v2:
>>>>>>>> using skb_iif instead of new input_dev to remember original
>>>>>>>> device
>>>>>>>> v2->v3:
>>>>>>>> set orig_dev = skb->dev if skb_iif is set
>>>>>>>>
>>>>>>>
>>>>>>> Why do we need to let the rx_handlers call netif_rx() or __netif_receive_skb()?
>>>>>>>
>>>>>>> Bonding used to be handled with very little overhead, simply replacing
>>>>>>> skb->dev with skb->dev->master. Time has passed and we eventually
>>>>>>> added a lot of special processing for bonding into __netif_receive_skb(),
>>>>>>> but the overhead remained very light.
>>>>>>>
>>>>>>> Calling netif_rx() (or __netif_receive_skb()) to allow nesting would probably lead to some overhead.
>>>>>>>
>>>>>>> Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>>>>>> whatever needs to be delivered, to whoever needs it, inside the loop?
>>>>>>>
>>>>>>> rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>>>>> while (rx_handler) {
>>>>>>> /* ... */
>>>>>>> orig_dev = skb->dev;
>>>>>>> skb = rx_handler(skb);
>>>>>>> /* ... */
>>>>>>> rx_handler = (skb->dev != orig_dev) ? rcu_dereference(skb->dev->rx_handler) : NULL;
>>>>>>> }
>>>>>>>
>>>>>>> This would reduce the overhead, while still allowing nesting: vlan on
>>>>>>> top of bonding, bridge on top of bonding, ...
>>>>>>
>>>>>> I see your point. Makes sense to me. But the loop would have to include
>>>>>> at least processing of ptype_all too. I'm going to cook a follow-up
>>>>>> patch.
>>>>>>
>>>>>
>>>>> DRAFT (doesn't modify rx_handlers):
>>>>>
>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>> index 4ebf7fe..e5dba47 100644
>>>>> --- a/net/core/dev.c
>>>>> +++ b/net/core/dev.c
>>>>> @@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>> {
>>>>> struct packet_type *ptype, *pt_prev;
>>>>> rx_handler_func_t *rx_handler;
>>>>> + struct net_device *dev;
>>>>> struct net_device *orig_dev;
>>>>> struct net_device *null_or_dev;
>>>>> int ret = NET_RX_DROP;
>>>>> @@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>> if (netpoll_receive_skb(skb))
>>>>> return NET_RX_DROP;
>>>>>
>>>>> - __this_cpu_inc(softnet_data.processed);
>>>>> + skb->skb_iif = skb->dev->ifindex;
>>>>> + orig_dev = skb->dev;
>>>>
>>>> orig_dev should be set inside the loop, to reflect "previously
>>>> crossed device", while following the path:
>>>>
>>>> eth0 -> bond0 -> br0.
>>>>
>>>> First step inside loop:
>>>>
>>>> orig_dev = eth0
>>>> skb->dev = bond0 (at the end of the loop).
>>>>
>>>> Second step inside loop:
>>>>
>>>> orig_dev = bond0
>>>> skb->dev = br0 (et the end of the loop).
>>>>
>>>> This would allow for exact match delivery to bond0 if someone binds there.
>>>>
>>>>> +
>>>>> skb_reset_network_header(skb);
>>>>> skb_reset_transport_header(skb);
>>>>> skb->mac_len = skb->network_header - skb->mac_header;
>>>>> @@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>
>>>>> rcu_read_lock();
>>>>>
>>>>> - if (!skb->skb_iif) {
>>>>> - skb->skb_iif = skb->dev->ifindex;
>>>>> - orig_dev = skb->dev;
>>>>> - } else {
>>>>> - orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
>>>>> - }
>>>>
>>>> I like the fact that it removes the above part.
>>>>
>>>>> +another_round:
>>>>> + __this_cpu_inc(softnet_data.processed);
>>>>> + dev = skb->dev;
>>>>>
>>>>> #ifdef CONFIG_NET_CLS_ACT
>>>>> if (skb->tc_verd & TC_NCLS) {
>>>>> @@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>> #endif
>>>>>
>>>>> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>>> - if (!ptype->dev || ptype->dev == skb->dev) {
>>>>> + if (!ptype->dev || ptype->dev == dev) {
>>>>> if (pt_prev)
>>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>> pt_prev = ptype;
>>>>
>>>> Inside the loop, we should only do exact match delivery, for
>>>> &ptype_all and for &ptype_base[ntohs(type) & PTYPE_HASH_MASK]:
>>>>
>>>> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>> - if (!ptype->dev || ptype->dev == dev) {
>>>> + if (ptype->dev == dev) {
>>>> if (pt_prev)
>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>> pt_prev = ptype;
>>>> }
>>>> }
>>>>
>>>>
>>>> list_for_each_entry_rcu(ptype,
>>>> &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>>>> if (ptype->type == type &&
>>>> - (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>> + (ptype->dev == skb->dev)) {
>>>> if (pt_prev)
>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>> pt_prev = ptype;
>>>> }
>>>> }
>>>>
>>>> After leaving the loop, we can do wildcard delivery, if skb is not NULL.
>>>>
>>>> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>> - if (!ptype->dev || ptype->dev == dev) {
>>>> + if (!ptype->dev) {
>>>> if (pt_prev)
>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>> pt_prev = ptype;
>>>> }
>>>> }
>>>>
>>>>
>>>> list_for_each_entry_rcu(ptype,
>>>> &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>>>> - if (ptype->type == type &&
>>>> - (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>> + if (ptype->type == type && !ptype->dev) {
>>>> if (pt_prev)
>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>> pt_prev = ptype;
>>>> }
>>>> }
>>>>
>>>> This would reduce the number of tests inside the
>>>> list_for_each_entry_rcu() loops. And because we match only ptype->dev
>>>> == dev inside the loop and !ptype->dev outside the loop, this should
>>>> avoid duplicate delivery.
>>>
>>> Would you care to put this into patch so I can see the whole picture?
>>> Thanks.
>>
>> Here is what I have in mind. It is based on your previous DRAFT patch, and doesn't modify rx_handlers yet.
>>
>> Only compile tested !!
>>
>> I don't know if every piece is in the right place. I wonder what to
>> do with the CONFIG_NET_CLS_ACT part, which currently sits between ptype_all
>> and ptype_base processing.
>>
>> Anyway, the general idea is there.
>>
>> Nicolas.
>>
>> net/core/dev.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++--------
>> 1 files changed, 60 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index e5dba47..7e007a9 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3117,7 +3117,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> rx_handler_func_t *rx_handler;
>> struct net_device *dev;
>> struct net_device *orig_dev;
>> - struct net_device *null_or_dev;
>> int ret = NET_RX_DROP;
>> __be16 type;
>>
>> @@ -3130,9 +3129,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> if (netpoll_receive_skb(skb))
>> return NET_RX_DROP;
>>
>> - skb->skb_iif = skb->dev->ifindex;
>> - orig_dev = skb->dev;
>> -
>> skb_reset_network_header(skb);
>> skb_reset_transport_header(skb);
>> skb->mac_len = skb->network_header - skb->mac_header;
>> @@ -3143,6 +3139,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>
>> another_round:
>> __this_cpu_inc(softnet_data.processed);
>> + skb->skb_iif = skb->dev->ifindex;
>> + orig_dev = skb->dev;
> orig_dev should be set at the end of the loop. Now you are going to have
> it always the same as dev and skb->dev.
>
Yes, you are right.
Thinking about all this, I wonder what the protocol handlers expect as the orig_dev value.
Let's imagine the following configuration: eth0 -> bond0 -> br0.
What does a protocol handler listening on br0 expect for orig_dev: bond0 or eth0? The current
implementation gives eth0, but I think bond0 should be the right value, for proper nesting.
>> dev = skb->dev;
>>
>> #ifdef CONFIG_NET_CLS_ACT
>> @@ -3152,8 +3150,13 @@ another_round:
>> }
>> #endif
>>
>> + /*
>> + * Deliver to ptype_all protocol handlers that match current dev.
>> + * This happens before rx_handler is given a chance to change skb->dev.
>> + */
>> +
>> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>> - if (!ptype->dev || ptype->dev == dev) {
>> + if (ptype->dev == dev) {
>> if (pt_prev)
>> ret = deliver_skb(skb, pt_prev, orig_dev);
>> pt_prev = ptype;
>> @@ -3167,6 +3170,31 @@ another_round:
>> ncls:
>> #endif
>>
>> + /*
>> + * Deliver to ptype_base protocol handlers that match current dev.
>> + * This happens before rx_handler is given a chance to change skb->dev.
>> + */
>> +
>> + type = skb->protocol;
>> + list_for_each_entry_rcu(ptype,
>> + &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>> + if (ptype->type == type && ptype->dev == skb->dev) {
>> + if (pt_prev)
>> + ret = deliver_skb(skb, pt_prev, orig_dev);
>> + pt_prev = ptype;
>> + }
>> + }
>
> I'm not sure it is ok to deliver ptype_base here. See comment above
> ptype_head() (I'm not sure I understand that correctly)
Anyway, all this is probably plain wrong: delivering the skb to protocol handlers while still
changing the skb is guaranteed to cause strange behavior.
If we want to be able to deliver the skb to different protocol handlers and give all of them the
right values for skb->dev and orig_dev (or previous_dev), we might end up copying the skb. I
hate the idea, but currently can't find a cleaner way to do so.
We first need to clarify what orig_dev should be, as stated above.
>> +
>> + /*
>> + * Call rx_handler for current device.
>> + * If rx_handler returns NULL, skip wildcard protocol handler delivery.
>> + * Else, if skb->dev changed, restart the whole delivery process, to
>> + * allow for device nesting.
>> + *
>> + * Warning:
>> + * rx_handlers must kfree_skb(skb) if they return NULL.
> Well this is not true. They can return NULL and call netif_rx as they
> have before. No changes necessary I believe.
I don't really know. This needs to be double checked, anyway.
>> + */
>> +
>> rx_handler = rcu_dereference(dev->rx_handler);
>> if (rx_handler) {
>> if (pt_prev) {
>> @@ -3176,10 +3204,15 @@ ncls:
>> skb = rx_handler(skb);
>> if (!skb)
>> goto out;
>> - if (dev != skb->dev)
>> + if (skb->dev != dev)
>> goto another_round;
>> }
>>
>> + /*
>> + * FIXME: The part below should use rx_handler instead of being hard
>> + * coded here.
> I'm not sure it is doable atm. For bridge and bond it should not be a
> problem, but for macvlan, it is possible to have macvlans and vlans
> on the same dev. This possibility should persist.
> /me scratches head on the idea to have multiple rx_handlers although it
> was his original idea....
I think your original proposal of having several rx_handlers per device was right.
At the time you introduced the rx_handler system, only bridge and macvlan used it. Even if using
bridge and macvlan on the same base device might be useless, this is not true for every possible
rx_handler configuration.
Now that we want to move bonding and vlan to the rx_handler system, it becomes obvious that we need
several rx_handlers per device. At least, vlan should properly mix with bridge. And who knows what
the fifth rx_handler will be...
>> + */
>> +
>> if (vlan_tx_tag_present(skb)) {
>> if (pt_prev) {
>> ret = deliver_skb(skb, pt_prev, orig_dev);
>> @@ -3192,16 +3225,33 @@ ncls:
>> goto out;
>> }
>>
>> + /*
>> + * FIXME: Can't this be moved into the rx_handler for bonding,
>> + * or into a future rx_handler for vlan?
> This hook is something I do not like at all :/ But anyway it should be in the vlan
> part, I think.
Yes, and in order for the future rx_handler for vlan to properly handle it, it needs to know the
device just below it, not the pure original device. Hence, my question about the exact meaning of
orig_dev...
Nicolas.
>> + */
>> +
>> vlan_on_bond_hook(skb);
>>
>> - /* deliver only exact match when indicated */
>> - null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
>> + /*
>> + * Deliver to wildcard ptype_all protocol handlers.
>> + */
>> +
>> + list_for_each_entry_rcu(ptype,&ptype_all, list) {
>> + if (!ptype->dev) {
>> + if (pt_prev)
>> + ret = deliver_skb(skb, pt_prev, orig_dev);
>> + pt_prev = ptype;
>> + }
>> + }
>> +
>> + /*
>> + * Deliver to wildcard ptype_base protocol handlers.
>> + */
>>
>> type = skb->protocol;
>> list_for_each_entry_rcu(ptype,
>> &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>> - if (ptype->type == type &&
>> - (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>> + if (ptype->type == type && !ptype->dev) {
>> if (pt_prev)
>> ret = deliver_skb(skb, pt_prev, orig_dev);
>> pt_prev = ptype;
>> --
>> 1.7.2.3
>>
>>
>>
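A minimal userspace sketch of the orig_dev question raised above, assuming a stack eth0 -> bond0 -> br0. The toy_dev, toy_skb, walk_result and toy_walk names are hypothetical stand-ins, not kernel structures or functions. The sketch only shows how the "ingress device" and the "previously crossed device" diverge once the packet has gone through more than one round.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: a name and the upper
 * device that an rx_handler would redirect the packet to. */
struct toy_dev {
	const char *name;
	struct toy_dev *upper;	/* NULL when no rx_handler is attached */
};

/* Hypothetical stand-in for struct sk_buff. */
struct toy_skb {
	struct toy_dev *dev;
};

struct walk_result {
	struct toy_dev *ingress_dev;	/* first device that saw the packet */
	struct toy_dev *prev_dev;	/* device crossed just before the last one */
	struct toy_dev *final_dev;	/* device the packet ended on */
	int rounds;
};

/* Walk the stack the way an "another_round" loop would, recording both
 * candidate meanings of orig_dev. */
struct walk_result toy_walk(struct toy_skb *skb)
{
	struct walk_result r = { skb->dev, skb->dev, skb->dev, 0 };

	for (;;) {
		struct toy_dev *dev = skb->dev;

		r.rounds++;
		/* exact-match delivery for 'dev' would happen here */
		if (!dev->upper)
			break;
		skb->dev = dev->upper;	/* what an rx_handler would do */
		r.prev_dev = dev;	/* updated at the end of the round */
	}
	r.final_dev = skb->dev;
	return r;
}
```

With eth0 -> bond0 -> br0, a handler bound to br0 would see eth0 under the "ingress" meaning and bond0 under the "previous device" meaning. Which of the two the protocol handlers actually expect is the open question.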
^ permalink raw reply
* Re: [PATCH] net: fix unreg list corruption in dev_deactivate()
From: Eric Dumazet @ 2011-02-20 12:11 UTC (permalink / raw)
To: Stanislaw Gruszka; +Cc: netdev, Octavian Purdila, David S. Miller
In-Reply-To: <20110220113429.GA27047@localhost.localdomain>
Le dimanche 20 février 2011 à 12:34 +0100, Stanislaw Gruszka a écrit :
> This patch fixes an issue introduced by 443457242beb6716b43db4d62fe148eab5515505
> "net: factorize sync-rcu call in unregister_netdevice_many". It manifests
> on my system as the following warning when removing a USB wireless device.
>
> [ 3539.368139] WARNING: at lib/list_debug.c:53 __list_del_entry+0x62/0x71()
> [ 3539.368149] list_del corruption. prev->next should be f035e05c, but was f1ce670c
> [ 3539.368242] Call Trace:
> [ 3539.368254] [<c04393d7>] ? warn_slowpath_common+0x6a/0x7f
> [ 3539.368262] [<c05bd062>] ? __list_del_entry+0x62/0x71
> [ 3539.368269] [<c043945f>] ? warn_slowpath_fmt+0x2b/0x2f
> [ 3539.368276] [<c05bd062>] ? __list_del_entry+0x62/0x71
> [ 3539.368286] [<c06f6d06>] ? unregister_netdevice_queue+0x41/0x6e
> [ 3539.368322] [<fa1ee998>] ? ieee80211_remove_interfaces+0x7b/0x9a [mac80211]
> [ 3539.368348] [<fa1e208a>] ? ieee80211_unregister_hw+0x48/0xf9 [mac80211]
> [ 3539.368363] [<fa223903>] ? rt2x00lib_remove_dev+0x76/0xd1 [rt2x00lib]
> [ 3539.368372] [<fa2770b1>] ? rt2x00usb_disconnect+0x29/0x8c [rt2x00usb]
> [ 3539.368382] [<c069ef8c>] ? usb_unbind_interface+0x48/0xfd
>
> I no longer see the warning with the patch applied.
>
> Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
> ---
> I did not try to review the related code. I think someone who understands it
> should audit it carefully to exclude similar issues. Adding
> dev->unreg_list to various local lists when the device is not going to be
> destroyed looks really fishy.
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 34dc598..1bc6980 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -839,6 +839,7 @@ void dev_deactivate(struct net_device *dev)
>
> list_add(&dev->unreg_list, &single);
> dev_deactivate_many(&single);
> + list_del(&single);
> }
>
> static void dev_init_scheduler_queue(struct net_device *dev,
Hmm, you should read Eric B.'s patch; he already addressed this problem a
few hours ago.
A full audit _is_ needed.
https://lkml.org/lkml/2011/2/20/4
^ permalink raw reply
* [PATCH] net: fix unreg list corruption in dev_deactivate()
From: Stanislaw Gruszka @ 2011-02-20 11:34 UTC (permalink / raw)
To: netdev; +Cc: Octavian Purdila, Eric Dumazet, David S. Miller
This patch fixes an issue introduced by 443457242beb6716b43db4d62fe148eab5515505
"net: factorize sync-rcu call in unregister_netdevice_many". It manifests
on my system as the following warning when removing a USB wireless device.
[ 3539.368139] WARNING: at lib/list_debug.c:53 __list_del_entry+0x62/0x71()
[ 3539.368149] list_del corruption. prev->next should be f035e05c, but was f1ce670c
[ 3539.368242] Call Trace:
[ 3539.368254] [<c04393d7>] ? warn_slowpath_common+0x6a/0x7f
[ 3539.368262] [<c05bd062>] ? __list_del_entry+0x62/0x71
[ 3539.368269] [<c043945f>] ? warn_slowpath_fmt+0x2b/0x2f
[ 3539.368276] [<c05bd062>] ? __list_del_entry+0x62/0x71
[ 3539.368286] [<c06f6d06>] ? unregister_netdevice_queue+0x41/0x6e
[ 3539.368322] [<fa1ee998>] ? ieee80211_remove_interfaces+0x7b/0x9a [mac80211]
[ 3539.368348] [<fa1e208a>] ? ieee80211_unregister_hw+0x48/0xf9 [mac80211]
[ 3539.368363] [<fa223903>] ? rt2x00lib_remove_dev+0x76/0xd1 [rt2x00lib]
[ 3539.368372] [<fa2770b1>] ? rt2x00usb_disconnect+0x29/0x8c [rt2x00usb]
[ 3539.368382] [<c069ef8c>] ? usb_unbind_interface+0x48/0xfd
I no longer see the warning with the patch applied.
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
---
I did not try to review the related code. I think someone who understands it
should audit it carefully to exclude similar issues. Adding
dev->unreg_list to various local lists when the device is not going to be
destroyed looks really fishy.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 34dc598..1bc6980 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -839,6 +839,7 @@ void dev_deactivate(struct net_device *dev)
list_add(&dev->unreg_list, &single);
dev_deactivate_many(&single);
+ list_del(&single);
}
static void dev_init_scheduler_queue(struct net_device *dev,
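To make the pointer problem concrete, here is a minimal userspace sketch. It assumes a simplified re-implementation of the circular list; it is not the kernel's <linux/list.h> (the kernel poisons the pointers of a deleted entry, this copy just sets them to NULL). The toy_deactivate() helper is hypothetical and only mimics the shape of dev_deactivate().

```c
#include <stddef.h>

/* Simplified circular doubly linked list, enough to follow the pointers. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink 'entry': its neighbours are joined, its own pointers cleared. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Link the device node on an on-stack head, then optionally unlink the
 * head before it goes out of scope. */
void toy_deactivate(struct list_head *unreg_list, int unlink_head)
{
	struct list_head single;

	init_list_head(&single);
	list_add(unreg_list, &single);
	/* dev_deactivate_many(&single) would run here */
	if (unlink_head)
		list_del(&single);
}
```

With unlink_head set, the node ends up pointing at itself, which is a valid empty list. Without it, the node keeps pointing at the dead stack slot of 'single', which is presumably what the list debug check later trips over.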
^ permalink raw reply related
* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-20 10:36 UTC (permalink / raw)
To: Nicolas de Pesloüan
Cc: Jay Vosburgh, David Miller, kaber, eric.dumazet, netdev,
shemminger, andy
In-Reply-To: <4D6027B9.6050108@gmail.com>
Sat, Feb 19, 2011 at 09:27:37PM CET, nicolas.2p.debian@gmail.com wrote:
>Le 19/02/2011 14:46, Jiri Pirko a écrit :
>>Sat, Feb 19, 2011 at 02:18:00PM CET, nicolas.2p.debian@gmail.com wrote:
>>>Le 19/02/2011 12:28, Jiri Pirko a écrit :
>>>>Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@redhat.com wrote:
>>>>>Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>>Le 19/02/2011 09:05, Jiri Pirko a écrit :
>>>>>>>This patch converts bonding to use rx_handler. This results in a cleaner
>>>>>>>__netif_receive_skb() with far fewer exceptions needed. Also,
>>>>>>>bond-specific work is moved into the bond code.
>>>>>>>
>>>>>>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>>>>>
>>>>>>>v1->v2:
>>>>>>> using skb_iif instead of new input_dev to remember original
>>>>>>> device
>>>>>>>v2->v3:
>>>>>>> set orig_dev = skb->dev if skb_iif is set
>>>>>>>
>>>>>>
>>>>>>Why do we need to let the rx_handlers call netif_rx() or __netif_receive_skb()?
>>>>>>
>>>>>>Bonding used to be handled with very little overhead, simply replacing
>>>>>>skb->dev with skb->dev->master. Time has passed and we eventually
>>>>>>added a lot of special processing for bonding into __netif_receive_skb(),
>>>>>>but the overhead remained very light.
>>>>>>
>>>>>>Calling netif_rx() (or __netif_receive_skb()) to allow nesting would probably lead to some overhead.
>>>>>>
>>>>>>Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>>>>>whatever needs to be delivered, to whoever needs it, inside the loop?
>>>>>>
>>>>>>rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>>>>while (rx_handler) {
>>>>>> /* ... */
>>>>>> orig_dev = skb->dev;
>>>>>> skb = rx_handler(skb);
>>>>>> /* ... */
>>>>>> rx_handler = (skb->dev != orig_dev) ? rcu_dereference(skb->dev->rx_handler) : NULL;
>>>>>>}
>>>>>>
>>>>>>This would reduce the overhead, while still allowing nesting: vlan on
>>>>>>top of bonding, bridge on top of bonding, ...
>>>>>
>>>>>I see your point. Makes sense to me. But the loop would have to include
>>>>>at least processing of ptype_all too. I'm going to cook a follow-up
>>>>>patch.
>>>>>
>>>>
>>>>DRAFT (doesn't modify rx_handlers):
>>>>
>>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>>index 4ebf7fe..e5dba47 100644
>>>>--- a/net/core/dev.c
>>>>+++ b/net/core/dev.c
>>>>@@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> {
>>>> struct packet_type *ptype, *pt_prev;
>>>> rx_handler_func_t *rx_handler;
>>>>+ struct net_device *dev;
>>>> struct net_device *orig_dev;
>>>> struct net_device *null_or_dev;
>>>> int ret = NET_RX_DROP;
>>>>@@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> if (netpoll_receive_skb(skb))
>>>> return NET_RX_DROP;
>>>>
>>>>- __this_cpu_inc(softnet_data.processed);
>>>>+ skb->skb_iif = skb->dev->ifindex;
>>>>+ orig_dev = skb->dev;
>>>
>>>orig_dev should be set inside the loop, to reflect "previously
>>>crossed device", while following the path:
>>>
>>>eth0 -> bond0 -> br0.
>>>
>>>First step inside loop:
>>>
>>>orig_dev = eth0
>>>skb->dev = bond0 (at the end of the loop).
>>>
>>>Second step inside loop:
>>>
>>>orig_dev = bond0
>>>skb->dev = br0 (at the end of the loop).
>>>
>>>This would allow for exact match delivery to bond0 if someone binds there.
>>>
>>>>+
>>>> skb_reset_network_header(skb);
>>>> skb_reset_transport_header(skb);
>>>> skb->mac_len = skb->network_header - skb->mac_header;
>>>>@@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>
>>>> rcu_read_lock();
>>>>
>>>>- if (!skb->skb_iif) {
>>>>- skb->skb_iif = skb->dev->ifindex;
>>>>- orig_dev = skb->dev;
>>>>- } else {
>>>>- orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
>>>>- }
>>>
>>>I like the fact that it removes the above part.
>>>
>>>>+another_round:
>>>>+ __this_cpu_inc(softnet_data.processed);
>>>>+ dev = skb->dev;
>>>>
>>>> #ifdef CONFIG_NET_CLS_ACT
>>>> if (skb->tc_verd & TC_NCLS) {
>>>>@@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> #endif
>>>>
>>>> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>>- if (!ptype->dev || ptype->dev == skb->dev) {
>>>>+ if (!ptype->dev || ptype->dev == dev) {
>>>> if (pt_prev)
>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>> pt_prev = ptype;
>>>
>>>Inside the loop, we should only do exact match delivery, for
>>>&ptype_all and for &ptype_base[ntohs(type) & PTYPE_HASH_MASK]:
>>>
>>> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>- if (!ptype->dev || ptype->dev == dev) {
>>>+ if (ptype->dev == dev) {
>>> if (pt_prev)
>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>> pt_prev = ptype;
>>> }
>>> }
>>>
>>>
>>> list_for_each_entry_rcu(ptype,
>>> &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>>> if (ptype->type == type &&
>>>- (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>+ (ptype->dev == skb->dev)) {
>>> if (pt_prev)
>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>> pt_prev = ptype;
>>> }
>>> }
>>>
>>>After leaving the loop, we can do wildcard delivery, if skb is not NULL.
>>>
>>> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>- if (!ptype->dev || ptype->dev == dev) {
>>>+ if (!ptype->dev) {
>>> if (pt_prev)
>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>> pt_prev = ptype;
>>> }
>>> }
>>>
>>>
>>> list_for_each_entry_rcu(ptype,
>>> &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>>>- if (ptype->type == type &&
>>>- (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>+ if (ptype->type == type && !ptype->dev) {
>>> if (pt_prev)
>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>> pt_prev = ptype;
>>> }
>>> }
>>>
>>>This would reduce the number of tests inside the
>>>list_for_each_entry_rcu() loops. And because we match only ptype->dev
>>>== dev inside the loop and !ptype->dev outside the loop, this should
>>>avoid duplicate delivery.
>>
>>Would you care to put this into patch so I can see the whole picture?
>>Thanks.
>
>Here is what I have in mind. It is based on your previous DRAFT patch, and doesn't modify rx_handlers yet.
>
>Only compile tested !!
>
>I don't know if every piece is in the right place. I wonder what to
>do with the CONFIG_NET_CLS_ACT part, which currently sits between ptype_all
>and ptype_base processing.
>
>Anyway, the general idea is there.
>
> Nicolas.
>
> net/core/dev.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 files changed, 60 insertions(+), 10 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index e5dba47..7e007a9 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -3117,7 +3117,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
> rx_handler_func_t *rx_handler;
> struct net_device *dev;
> struct net_device *orig_dev;
>- struct net_device *null_or_dev;
> int ret = NET_RX_DROP;
> __be16 type;
>
>@@ -3130,9 +3129,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
> if (netpoll_receive_skb(skb))
> return NET_RX_DROP;
>
>- skb->skb_iif = skb->dev->ifindex;
>- orig_dev = skb->dev;
>-
> skb_reset_network_header(skb);
> skb_reset_transport_header(skb);
> skb->mac_len = skb->network_header - skb->mac_header;
>@@ -3143,6 +3139,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>
> another_round:
> __this_cpu_inc(softnet_data.processed);
>+ skb->skb_iif = skb->dev->ifindex;
>+ orig_dev = skb->dev;
orig_dev should be set at the end of the loop. Now you are going to have
it always the same as dev and skb->dev.
> dev = skb->dev;
>
> #ifdef CONFIG_NET_CLS_ACT
>@@ -3152,8 +3150,13 @@ another_round:
> }
> #endif
>
>+ /*
>+ * Deliver to ptype_all protocol handlers that match current dev.
>+ * This happens before rx_handler is given a chance to change skb->dev.
>+ */
>+
> list_for_each_entry_rcu(ptype, &ptype_all, list) {
>- if (!ptype->dev || ptype->dev == dev) {
>+ if (ptype->dev == dev) {
> if (pt_prev)
> ret = deliver_skb(skb, pt_prev, orig_dev);
> pt_prev = ptype;
>@@ -3167,6 +3170,31 @@ another_round:
> ncls:
> #endif
>
>+ /*
>+ * Deliver to ptype_base protocol handlers that match current dev.
>+ * This happens before rx_handler is given a chance to change skb->dev.
>+ */
>+
>+ type = skb->protocol;
>+ list_for_each_entry_rcu(ptype,
>+ &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>+ if (ptype->type == type && ptype->dev == skb->dev) {
>+ if (pt_prev)
>+ ret = deliver_skb(skb, pt_prev, orig_dev);
>+ pt_prev = ptype;
>+ }
>+ }
I'm not sure it is ok to deliver ptype_base here. See comment above
ptype_head() (I'm not sure I understand that correctly)
>+
>+ /*
>+ * Call rx_handler for current device.
>+ * If rx_handler returns NULL, skip wildcard protocol handler delivery.
>+ * Else, if skb->dev changed, restart the whole delivery process, to
>+ * allow for device nesting.
>+ *
>+ * Warning:
>+ * rx_handlers must kfree_skb(skb) if they return NULL.
Well this is not true. They can return NULL and call netif_rx as they
have before. No changes necessary I believe.
>+ */
>+
> rx_handler = rcu_dereference(dev->rx_handler);
> if (rx_handler) {
> if (pt_prev) {
>@@ -3176,10 +3204,15 @@ ncls:
> skb = rx_handler(skb);
> if (!skb)
> goto out;
>- if (dev != skb->dev)
>+ if (skb->dev != dev)
> goto another_round;
> }
>
>+ /*
>+ * FIXME: The part below should use rx_handler instead of being hard
>+ * coded here.
I'm not sure it is doable atm. For bridge and bond it should not be a
problem, but for macvlan, it is possible to have macvlans and vlans
on the same dev. This possibility should persist.
/me scratches head on the idea to have multiple rx_handlers although it
was his original idea....
>+ */
>+
> if (vlan_tx_tag_present(skb)) {
> if (pt_prev) {
> ret = deliver_skb(skb, pt_prev, orig_dev);
>@@ -3192,16 +3225,33 @@ ncls:
> goto out;
> }
>
>+ /*
>+ * FIXME: Can't this be moved into the rx_handler for bonding,
>+ * or into a future rx_handler for vlan?
This hook is something I do not like at all :/ But anyway it should be in the vlan
part, I think.
>+ */
>+
> vlan_on_bond_hook(skb);
>
>- /* deliver only exact match when indicated */
>- null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
>+ /*
>+ * Deliver to wildcard ptype_all protocol handlers.
>+ */
>+
>+ list_for_each_entry_rcu(ptype, &ptype_all, list) {
>+ if (!ptype->dev) {
>+ if (pt_prev)
>+ ret = deliver_skb(skb, pt_prev, orig_dev);
>+ pt_prev = ptype;
>+ }
>+ }
>+
>+ /*
>+ * Deliver to wildcard ptype_base protocol handlers.
>+ */
>
> type = skb->protocol;
> list_for_each_entry_rcu(ptype,
> &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>- if (ptype->type == type &&
>- (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>+ if (ptype->type == type && !ptype->dev) {
> if (pt_prev)
> ret = deliver_skb(skb, pt_prev, orig_dev);
> pt_prev = ptype;
>--
>1.7.2.3
>
>
>
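The split discussed in this thread (exact-match delivery inside the loop, wildcard delivery once after it) can be sketched in userspace as follows. This is a simplified model under stated assumptions: sim_dev, sim_ptype and sim_receive are hypothetical names, every rx_handler is assumed to return the skb with skb->dev moved to the upper device, and "delivery" is just a counter.

```c
#include <stddef.h>

/* Hypothetical device: only the upper device matters here. */
struct sim_dev {
	struct sim_dev *upper;	/* NULL when no rx_handler is attached */
};

/* Hypothetical protocol handler registration. */
struct sim_ptype {
	struct sim_dev *dev;	/* NULL means wildcard */
	int delivered;		/* how many times the packet was handed over */
};

void sim_receive(struct sim_dev *dev, struct sim_ptype *pt, size_t n)
{
	size_t i;

	/* Phase 1: one round per device, exact matches only. */
	for (;;) {
		for (i = 0; i < n; i++)
			if (pt[i].dev == dev)
				pt[i].delivered++;
		if (!dev->upper)
			break;
		dev = dev->upper;	/* "goto another_round" */
	}

	/* Phase 2: wildcard handlers, once, after the last round. */
	for (i = 0; i < n; i++)
		if (!pt[i].dev)
			pt[i].delivered++;
}
```

Because a handler is either bound to exactly one device or to none, each one matches in exactly one place, so no handler sees the packet twice on a stack such as eth0 -> bond0 -> br0.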
^ permalink raw reply
* [PATCH net-next] sctp: fix compile warnings in sctp_tsnmap_num_gabs
From: Shan Wei @ 2011-02-20 7:57 UTC (permalink / raw)
To: Vlad Yasevich, David Miller, Network-Maillist, SCTP-Maillist
net/sctp/tsnmap.c: In function ‘sctp_tsnmap_num_gabs’:
net/sctp/tsnmap.c:347: warning: ‘start’ may be used uninitialized in this function
net/sctp/tsnmap.c:347: warning: ‘end’ may be used uninitialized in this function
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
net/sctp/tsnmap.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/sctp/tsnmap.c b/net/sctp/tsnmap.c
index 747d541..f1e40ce 100644
--- a/net/sctp/tsnmap.c
+++ b/net/sctp/tsnmap.c
@@ -344,7 +344,7 @@ __u16 sctp_tsnmap_num_gabs(struct sctp_tsnmap *map,
/* Refresh the gap ack information. */
if (sctp_tsnmap_has_gap(map)) {
- __u16 start, end;
+ __u16 start = 0, end = 0;
sctp_tsnmap_iter_init(map, &iter);
while (sctp_tsnmap_next_gap_ack(map, &iter,
&start,
--
1.6.3.3
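A small sketch of the pattern that typically triggers this kind of warning, assuming an iterator that writes its output parameters only when it returns non-zero. The toy_next_gap() and toy_gap_width() names and the flat gap array are hypothetical and do not reproduce the SCTP code.

```c
#include <stdint.h>

/* Hypothetical iterator: writes *start and *end only when it finds a gap. */
static int toy_next_gap(const uint16_t *gaps, int ngaps, int *iter,
			uint16_t *start, uint16_t *end)
{
	if (*iter >= ngaps)
		return 0;		/* outputs left untouched */
	*start = gaps[2 * *iter];
	*end = gaps[2 * *iter + 1];
	(*iter)++;
	return 1;
}

/* Sum the widths of all gaps, stored as (start, end) pairs. */
int toy_gap_width(const uint16_t *gaps, int ngaps)
{
	uint16_t start = 0, end = 0;	/* same kind of initialisation as the patch */
	int iter = 0, width = 0;

	while (toy_next_gap(gaps, ngaps, &iter, &start, &end))
		width += end - start + 1;
	return width;
}
```

The values are only read after a successful call, so the initialisation does not change the result. It just gives the compiler a definite value on the path it cannot prove safe.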
^ permalink raw reply related
* [PATCH] tcp: Remove debug macro of TCP_CHECK_TIMER
From: Shan Wei @ 2011-02-20 7:55 UTC (permalink / raw)
To: David Miller, Network-Maillist, kuznet, pekkas, jmorris,
Patrick McHardy
TCP_CHECK_TIMER is no longer used for debugging; it does nothing.
It has been that way for several years, maybe six.
Remove it to keep the code cleaner.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
include/net/tcp.h | 2 --
net/ipv4/tcp.c | 9 ---------
net/ipv4/tcp_ipv4.c | 5 -----
net/ipv4/tcp_timer.c | 3 ---
net/ipv6/tcp_ipv6.c | 4 ----
5 files changed, 0 insertions(+), 23 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index adfe6db..cda30ea 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1068,8 +1068,6 @@ static inline int tcp_paws_reject(const struct tcp_options_received *rx_opt,
return 1;
}
-#define TCP_CHECK_TIMER(sk) do { } while (0)
-
static inline void tcp_mib_init(struct net *net)
{
/* See RFC 2012 */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f9867d2..a17a5a7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -873,9 +873,7 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
flags);
lock_sock(sk);
- TCP_CHECK_TIMER(sk);
res = do_tcp_sendpages(sk, &page, offset, size, flags);
- TCP_CHECK_TIMER(sk);
release_sock(sk);
return res;
}
@@ -916,7 +914,6 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
long timeo;
lock_sock(sk);
- TCP_CHECK_TIMER(sk);
flags = msg->msg_flags;
timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
@@ -1104,7 +1101,6 @@ wait_for_memory:
out:
if (copied)
tcp_push(sk, flags, mss_now, tp->nonagle);
- TCP_CHECK_TIMER(sk);
release_sock(sk);
return copied;
@@ -1123,7 +1119,6 @@ do_error:
goto out;
out_err:
err = sk_stream_error(sk, flags, err);
- TCP_CHECK_TIMER(sk);
release_sock(sk);
return err;
}
@@ -1415,8 +1410,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
lock_sock(sk);
- TCP_CHECK_TIMER(sk);
-
err = -ENOTCONN;
if (sk->sk_state == TCP_LISTEN)
goto out;
@@ -1767,12 +1760,10 @@ skip_copy:
/* Clean up data we have read: This will do ACK frames. */
tcp_cleanup_rbuf(sk, copied);
- TCP_CHECK_TIMER(sk);
release_sock(sk);
return copied;
out:
- TCP_CHECK_TIMER(sk);
release_sock(sk);
return err;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e2b9be2..ef5a90b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1556,12 +1556,10 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
sock_rps_save_rxhash(sk, skb->rxhash);
- TCP_CHECK_TIMER(sk);
if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) {
rsk = sk;
goto reset;
}
- TCP_CHECK_TIMER(sk);
return 0;
}
@@ -1583,13 +1581,10 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
} else
sock_rps_save_rxhash(sk, skb->rxhash);
-
- TCP_CHECK_TIMER(sk);
if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len)) {
rsk = sk;
goto reset;
}
- TCP_CHECK_TIMER(sk);
return 0;
reset:
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 74a6aa0..ecd44b0 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -259,7 +259,6 @@ static void tcp_delack_timer(unsigned long data)
tcp_send_ack(sk);
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKS);
}
- TCP_CHECK_TIMER(sk);
out:
if (tcp_memory_pressure)
@@ -481,7 +480,6 @@ static void tcp_write_timer(unsigned long data)
tcp_probe_timer(sk);
break;
}
- TCP_CHECK_TIMER(sk);
out:
sk_mem_reclaim(sk);
@@ -589,7 +587,6 @@ static void tcp_keepalive_timer (unsigned long data)
elapsed = keepalive_time_when(tp) - elapsed;
}
- TCP_CHECK_TIMER(sk);
sk_mem_reclaim(sk);
resched:
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d6954e3..1d0ab55 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1636,10 +1636,8 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
opt_skb = skb_clone(skb, GFP_ATOMIC);
if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
- TCP_CHECK_TIMER(sk);
if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len))
goto reset;
- TCP_CHECK_TIMER(sk);
if (opt_skb)
goto ipv6_pktoptions;
return 0;
@@ -1667,10 +1665,8 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
}
}
- TCP_CHECK_TIMER(sk);
if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len))
goto reset;
- TCP_CHECK_TIMER(sk);
if (opt_skb)
goto ipv6_pktoptions;
return 0;
--
1.6.3.3
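For readers wondering why removing the calls cannot change behavior, here is a tiny sketch using a macro of the same shape. TOY_CHECK_TIMER, toy_probe() and toy_run() are hypothetical names used only for this illustration.

```c
/* Same shape as the removed macro: a single empty statement. */
#define TOY_CHECK_TIMER(sk) do { } while (0)

static int toy_calls;

/* Counts how many times it has been called. */
int toy_probe(void)
{
	return ++toy_calls;
}

int toy_run(void)
{
	/* The macro body never mentions its argument, so the preprocessor
	 * drops it and toy_probe() is not called. */
	TOY_CHECK_TIMER(toy_probe());
	return toy_calls;
}
```

Since the argument is discarded at preprocessing time, every call site expands to an empty statement and deleting it leaves the generated code unchanged.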
^ permalink raw reply related
* [PATCH]tcp: document tcp_max_ssthresh (Limited Slow-Start)
From: Shan Wei @ 2011-02-20 7:52 UTC (permalink / raw)
To: Ilpo Järvinen, David Miller, Network-Maillist, jheffner
From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Based on Ilpo's patch documenting tcp_max_ssthresh.
(see http://marc.info/?l=linux-netdev&m=117950581307310&w=2)
Following the errata of RFC3742, fix the number of segments by which cwnd
is increased per RTT.
This only states when to use the parameter; guidance on how to choose
its value is left for others to add.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
Documentation/networking/ip-sysctl.txt | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ac3b4a7..ea78aac 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -280,6 +280,17 @@ tcp_max_orphans - INTEGER
more aggressively. Let me to remind again: each orphan eats
up to ~64K of unswappable memory.
+tcp_max_ssthresh - INTEGER
+ Limited Slow-Start for TCP with large congestion windows (cwnd) defined in
+ RFC3742. Limited slow-start is a mechanism to limit growth of the cwnd
+ on the region where cwnd is larger than tcp_max_ssthresh. TCP increases cwnd
+ by at most tcp_max_ssthresh segments, and by at least tcp_max_ssthresh/2
+ segments per RTT when the cwnd is above tcp_max_ssthresh.
+ If TCP connection increased cwnd to thousands (or tens of thousands) segments,
+ and thousands of packets were being dropped during slow-start, you can set
+ tcp_max_ssthresh to improve performance for new TCP connection.
+ Default: 0 (off)
+
tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests, which are
still did not receive an acknowledgment from connecting client.
--
1.6.3.3
^ permalink raw reply related
* Re: [PATCH 0/2] netfilter: netfilter fixes for 2.6.38
From: David Miller @ 2011-02-20 3:01 UTC (permalink / raw)
To: kaber; +Cc: netfilter-devel, netdev
In-Reply-To: <1298130065-14205-1-git-send-email-kaber@trash.net>
From: kaber@trash.net
Date: Sat, 19 Feb 2011 16:41:03 +0100
> the following patches for two netfilter bugs:
>
> - an oops in nfnetlink_log in combination with TPROXY when a socket
> in TIME-WAIT state is assigned to skb->sk, patch from Florian Westphal
>
> - incorrect printing of the MAC header in the ip6t_LOG target,
> from Joerg Marx
>
> Please apply or pull from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.git master
Pulled, thanks Patrick.
^ permalink raw reply