* Re: [PATCH 1/5] ucc_geth: Reduce IRQ off in xmit path
From: Joakim Tjernlund @ 2012-09-19 7:36 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev
In-Reply-To: <20120918223938.GA22868@electric-eye.fr.zoreil.com>
Francois Romieu <romieu@fr.zoreil.com> wrote on 2012/09/19 00:39:38:
>
> Joakim Tjernlund <Joakim.Tjernlund@transmode.se> :
> > Currently ucc_geth_start_xmit wraps IRQ off for the
> > whole body just to be safe.
> > Reduce the IRQ off period to a minimum.
>
> The driver does not do much work in its irq handler. You may as well
> convert it to the usual tg3-ish locking style (i.e. almost no locking).
You mean broadcom/tg3.c? That is a bit much for me to look at right now, and
there is almost no locking with my patch either. It could possibly
be improved further, but I am happy for now.
Jocke
^ permalink raw reply
* RE: [PATCH] netxen: check for root bus in netxen_mask_aer_correctable
From: Rajesh Borundia @ 2012-09-19 7:24 UTC (permalink / raw)
To: David Miller; +Cc: nikolay@redhat.com, Sony Chacko, netdev
In-Reply-To: <20120918.162321.1283398796161136088.davem@davemloft.net>
________________________________________
From: David Miller [davem@davemloft.net]
Sent: Wednesday, September 19, 2012 1:53 AM
To: Rajesh Borundia
Cc: nikolay@redhat.com; Sony Chacko; netdev
Subject: Re: [PATCH] netxen: check for root bus in netxen_mask_aer_correctable
No, this is not the correct way to submit patches written by other
people.
Look at how people like Jeff Kirsher submit Intel driver patches
written by people other than himself.
Apologies, will follow the guideline.
^ permalink raw reply
* [PATCH net-next] net: more accurate network taps in transmit path
From: Eric Dumazet @ 2012-09-19 6:44 UTC (permalink / raw)
To: Jamie Gloudon; +Cc: netdev
In-Reply-To: <1348034050.26523.325.camel@edumazet-glaptop>
From: Eric Dumazet <edumazet@google.com>
dev_queue_xmit_nit() should be called right before the ndo_start_xmit()
calls, or we might give wrong packet contents to tap users:
The packet checksum can be changed, or the packet can be linearized or
segmented, with segments only partially sent in the latter case.
Also, a memory allocation can fail, so that the packet never really hits
the driver entry point.
Reported-by: Jamie Gloudon <jamie.gloudon@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index dcc673d..52cd1d7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2213,9 +2213,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
skb_dst_drop(skb);
- if (!list_empty(&ptype_all))
- dev_queue_xmit_nit(skb, dev);
-
features = netif_skb_features(skb);
if (vlan_tx_tag_present(skb) &&
@@ -2250,6 +2247,9 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
}
}
+ if (!list_empty(&ptype_all))
+ dev_queue_xmit_nit(skb, dev);
+
skb_len = skb->len;
rc = ops->ndo_start_xmit(skb, dev);
trace_net_dev_xmit(skb, rc, dev, skb_len);
@@ -2272,6 +2272,9 @@ gso:
if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
skb_dst_drop(nskb);
+ if (!list_empty(&ptype_all))
+ dev_queue_xmit_nit(nskb, dev);
+
skb_len = nskb->len;
rc = ops->ndo_start_xmit(nskb, dev);
trace_net_dev_xmit(nskb, rc, dev, skb_len);
^ permalink raw reply related
* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-09-19 6:19 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1348034635.2006.52.camel@jtkirshe-mobl>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 18 Sep 2012 23:03:55 -0700
> Do not pull, it appears there will be changes to patch 04 of the series,
> I will be sending a v2 of the series once John gets patch 04 fixed up.
Ok.
^ permalink raw reply
* Re: [PATCHv4] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-09-19 6:12 UTC (permalink / raw)
To: Rusty Russell
Cc: kvm, netdev, rick.jones2, virtualization, levinsasha928, pbonzini,
Tom Herbert
In-Reply-To: <87wqzqpz7p.fsf@rustcorp.com.au>
On Wed, Sep 19, 2012 at 11:10:10AM +0930, Rusty Russell wrote:
> Tom Herbert <therbert@google.com> writes:
> > On Tue, Sep 11, 2012 at 10:49 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> >> Perhaps Tom can explain how we avoid out-of-order receive for the
> >> accelerated RFS case? It's not clear to me, but we need to be able to
> >> do that for virtio-net if it implements accelerated RFS.
> >
> > AFAIK ooo RX is possible with accelerated RFS. We have an algorithm that
> > prevents this for RFS case by deferring a migration to a new queue as long
> > as it's possible that a flow might have outstanding packets on the old
> > queue. I suppose this could be implemented in the device for the HW
> > queues, but I don't think it would be easy to cover all cases where packets
> > were already in transit to the host or other cases where host and device
> > queues are out of sync.
>
> Having gone to such great lengths to avoid ooo for RFS, I don't think
> DaveM would be happy if we allow it for virtio_net.
>
> So, how *would* we implement such a thing for a "hardware" device? What
> if the device will only change the receive queue if the old receive
> queue is empty?
>
> Cheers,
> Rusty.
>
I think that would do it in most cases. Or, if we want to be more
exact, we could delay switching a specific flow until there are no
outstanding rx packets for this flow. Not sure it's worth the
hassle.
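
For illustration only, the per-flow variant could be modelled roughly as below. This is a userspace sketch, not part of the virtio spec or of any driver: struct flow_state, flow_request_queue(), flow_enqueue() and flow_consume() are hypothetical names, and the in_flight counter assumes the device can tell when the guest has consumed a packet.

```c
#include <stddef.h>

#define NUM_FLOWS 4

/* Hypothetical per-flow steering state (illustration only). */
struct flow_state {
	unsigned int cur_queue;   /* queue currently receiving this flow */
	unsigned int want_queue;  /* queue the guest asked to switch to */
	unsigned int in_flight;   /* packets placed on cur_queue, not yet consumed */
};

static struct flow_state flows[NUM_FLOWS];

/* Guest asks for a flow to be steered to another receive queue. */
void flow_request_queue(unsigned int flow, unsigned int queue)
{
	flows[flow].want_queue = queue;
}

/* Device side: choose the receive queue for the next packet of a flow. */
unsigned int flow_enqueue(unsigned int flow)
{
	struct flow_state *f = &flows[flow];

	/* Migrate only once nothing of this flow is pending on the old queue. */
	if (f->want_queue != f->cur_queue && f->in_flight == 0)
		f->cur_queue = f->want_queue;

	f->in_flight++;
	return f->cur_queue;
}

/* Guest side: one packet of this flow has been consumed. */
void flow_consume(unsigned int flow)
{
	if (flows[flow].in_flight > 0)
		flows[flow].in_flight--;
}
```

The coarser rule Rusty suggests (switch only when the whole old queue is empty) would replace the per-flow counter with a per-queue one, at the cost of delaying migration for busy queues.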
--
MST
^ permalink raw reply
* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-09-19 6:03 UTC (permalink / raw)
To: davem; +Cc: netdev, gospo, sassmann
In-Reply-To: <1348029108-26659-1-git-send-email-jeffrey.t.kirsher@intel.com>
On Tue, 2012-09-18 at 21:31 -0700, Jeff Kirsher wrote:
> This series contains updates to igb and ixgbevf.
>
> The following are changes since commit adccff34de1ef81564b7e6c436f762e7a1caf807:
> net/tipc/name_table.c: Remove unecessary semicolon
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
>
> Akeem G. Abodunrin (1):
> igb: Support to enable EEE on all eee_supported devices
>
> Alexander Duyck (2):
> igb: Remove artificial restriction on RQDPC stat reading
> ixgbevf: Add support for VF API negotiation
>
> John Fastabend (1):
> ixgbevf: scheduling while atomic in reset hw path
>
> drivers/net/ethernet/intel/igb/e1000_82575.c | 17 +++++++---
> drivers/net/ethernet/intel/igb/e1000_defines.h | 3 +-
> drivers/net/ethernet/intel/igb/e1000_regs.h | 1 +
> drivers/net/ethernet/intel/igb/igb_main.c | 8 +++--
> drivers/net/ethernet/intel/ixgbevf/defines.h | 1 +
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 +++++++++++++
> drivers/net/ethernet/intel/ixgbevf/mbx.h | 21 ++++++++++--
> drivers/net/ethernet/intel/ixgbevf/vf.c | 39 ++++++++++++++++++++++-
> drivers/net/ethernet/intel/ixgbevf/vf.h | 3 ++
> 9 files changed, 105 insertions(+), 11 deletions(-)
>
Dave,
Do not pull, it appears there will be changes to patch 04 of the series,
I will be sending a v2 of the series once John gets patch 04 fixed up.
Cheers,
Jeff
^ permalink raw reply
* Re: BUG: TCPDUMP invalid cksum persists after disabling TCP cksum offload
From: Eric Dumazet @ 2012-09-19 5:54 UTC (permalink / raw)
To: Jamie Gloudon; +Cc: netdev
In-Reply-To: <20120918211423.GA19115@darkstar>
On Tue, 2012-09-18 at 17:14 -0400, Jamie Gloudon wrote:
> Hello,
> I am seeing that tx checksum offload appears to be still running after disabling the feature with ethtool. I'm using kernel 3.6.0-rc6 and the latest ethtool from the git repo.
>
> The default settings on my e1000e NIC:
> # ethtool -k eth1 | grep ': on'
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ip-generic: on
> scatter-gather: on
> tx-scatter-gather: on
> tcp-segmentation-offload: on
> tx-tcp-segmentation: on
> tx-tcp6-segmentation: on
> generic-segmentation-offload: on
> generic-receive-offload: on
> rx-vlan-offload: on
> tx-vlan-offload: on
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: on [fixed]
> tx-nocache-copy: on
>
> The results after disabling tcp cksum offload feature:
> # ethtool -K eth1 tx off
> Actual changes:
> tx-checksumming: off
> tx-checksum-ip-generic: off
> scatter-gather: off
> tx-scatter-gather: off [requested on]
> tcp-segmentation-offload: off
> tx-tcp-segmentation: off [requested on]
> tx-tcp6-segmentation: off [requested on]
> generic-segmentation-offload: off [requested on]
>
> However, in tcpdump, I'm still observing incorrect tcp checksum:
> 14:44:38.838711 IP (tos 0x10, ttl 64, id 45798, offset 0, flags [DF], proto TCP
> (6), length 60)
> 1.1.1.2.59748 > 1.1.1.1.23: Flags [S], cksum 0x0433 (incorrect -> 0x4137), seq 318222122, win 14600, options [mss 1460,sackOK,TS val 5447116 ecr 0,nop,wscale 7], length 0
>
> Is this behaviour valid? I'm quite baffled.
That's because dev_hard_start_xmit() calls dev_queue_xmit_nit() before
doing the feature tests:
tcpdump gets a copy of the packet before all the mangling is done
(skb_checksum_help() in your case).
if (!list_empty(&ptype_all))
dev_queue_xmit_nit(skb, dev);
features = netif_skb_features(skb);
if (vlan_tx_tag_present(skb) &&
!(features & NETIF_F_HW_VLAN_TX)) {
skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
if (unlikely(!skb))
goto out;
skb->vlan_tci = 0;
}
if (netif_needs_gso(skb, features)) {
if (unlikely(dev_gso_segment(skb, features)))
goto out_kfree_skb;
if (skb->next)
goto gso;
} else {
if (skb_needs_linearize(skb, features) &&
__skb_linearize(skb))
goto out_kfree_skb;
/* If packet is not checksummed and device does not
* support checksumming for this protocol, complete
* checksumming here.
*/
if (skb->ip_summed == CHECKSUM_PARTIAL) {
skb_set_transport_header(skb,
skb_checksum_start_offset(skb));
if (!(features & NETIF_F_ALL_CSUM) &&
skb_checksum_help(skb))
goto out_kfree_skb;
}
}
skb_len = skb->len;
rc = ops->ndo_start_xmit(skb, dev);
I guess we could move dev_queue_xmit_nit(skb, dev) calls right before the
ndo_start_xmit() calls...
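
The ordering effect can be reproduced in a small userspace sketch. It is not kernel code: inet_csum(), csum_complete(), csum_valid() and tap_order_demo() are hypothetical helpers, the 8-byte "packet" is made up, and the checksum covers only that buffer (no TCP pseudo-header). The point is only that a copy taken before checksum completion carries a stale checksum field, while a copy taken afterwards verifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 ones'-complement checksum over a byte buffer. */
uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Fill in the checksum field (csum_off must be even), as software would. */
void csum_complete(uint8_t *pkt, size_t len, size_t csum_off)
{
	uint16_t c;

	pkt[csum_off] = 0;
	pkt[csum_off + 1] = 0;
	c = inet_csum(pkt, len);
	pkt[csum_off] = (uint8_t)(c >> 8);
	pkt[csum_off + 1] = (uint8_t)(c & 0xff);
}

/* A buffer with a correct checksum field sums to zero. */
bool csum_valid(const uint8_t *pkt, size_t len)
{
	return inet_csum(pkt, len) == 0;
}

/* True if the early tap copy is stale while the late copy verifies. */
bool tap_order_demo(void)
{
	/* Bytes 4..5 stand in for a not-yet-completed checksum field. */
	uint8_t pkt[8] = { 0x45, 0x00, 0x12, 0x34, 0xde, 0xad, 0x00, 0x01 };
	uint8_t early[8], late[8];

	memcpy(early, pkt, sizeof(pkt));      /* tap before completion */
	csum_complete(pkt, sizeof(pkt), 4);   /* software checksum step */
	memcpy(late, pkt, sizeof(pkt));       /* tap right before "xmit" */

	return !csum_valid(early, sizeof(early)) &&
	       csum_valid(late, sizeof(late));
}
```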
^ permalink raw reply
* Re: [PATCH] tcp: Fixed a TFO server bug that crashed kernel by raw sockets
From: Eric Dumazet @ 2012-09-19 5:12 UTC (permalink / raw)
To: Christoph Paasch; +Cc: H.K. Jerry Chu, davem, netdev, ncardwell, edumazet
In-Reply-To: <4380003.jOHRfqhomY@cpaasch-mac>
On Wed, 2012-09-19 at 02:19 +0200, Christoph Paasch wrote:
> Why not move the TCP code out of inet_sock_destruct by modifying the sk_destruct
> callback when TFO is in use? Like the (only compile-tested) patch below. That
> way inet_sock_destruct stays TFO-free.
>
>
> Cheers,
> Christoph
>
> ---------
>
> From: Christoph Paasch <christoph.paasch@uclouvain.be>
> Date: Wed, 19 Sep 2012 02:06:53 +0200
> Subject: [PATCH] Don't add TCP-code in inet_sock_destruct
>
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> ---
> include/linux/tcp.h | 4 ++++
> net/ipv4/af_inet.c | 2 --
> net/ipv4/tcp.c | 7 +++++++
> 3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index ae46df5..67c789a 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -574,6 +574,8 @@ static inline bool fastopen_cookie_present(struct tcp_fastopen_cookie *foc)
> return foc->len != -1;
> }
>
> +extern void tcp_sock_destruct(struct sock *sk);
> +
> static inline int fastopen_init_queue(struct sock *sk, int backlog)
> {
> struct request_sock_queue *queue =
> @@ -585,6 +587,8 @@ static inline int fastopen_init_queue(struct sock *sk, int backlog)
> sk->sk_allocation);
> if (queue->fastopenq == NULL)
> return -ENOMEM;
> +
> + sk->sk_destruct = tcp_sock_destruct;
> spin_lock_init(&queue->fastopenq->lock);
Yes, it seems much better, thanks !
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH] net/core: fix comment in skb_try_coalesce
From: Eric Dumazet @ 2012-09-19 5:08 UTC (permalink / raw)
To: roy.qing.li; +Cc: netdev
In-Reply-To: <1348023201-7727-1-git-send-email-roy.qing.li@gmail.com>
On Wed, 2012-09-19 at 10:53 +0800, roy.qing.li@gmail.com wrote:
> From: Li RongQing <roy.qing.li@gmail.com>
>
> It should be the skb which is not cloned
>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
> net/core/skbuff.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index fe00d12..354a4e4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3502,7 +3502,9 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> if (!skb_cloned(from))
> skb_shinfo(from)->nr_frags = 0;
>
> - /* if the skb is cloned this does nothing since we set nr_frags to 0 */
> + /* if the skb is not cloned this does nothing
> + * since we set nr_frags to 0.
> + */
> for (i = 0; i < skb_shinfo(from)->nr_frags; i++)
> skb_frag_ref(from, i);
>
Yes, I saw that yesterday and was about to submit the same change (more
or less).
Acked-by: Eric Dumazet <edumazet@google.com>
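
To make the corrected comment concrete, here is a toy userspace model of the two statements around it. struct toy_skb and toy_coalesce_frag_refs() are hypothetical stand-ins, not the real sk_buff API; the sketch only mirrors the control flow quoted in the diff.

```c
#include <stdbool.h>

/* Toy stand-in for the fields involved (illustration only). */
struct toy_skb {
	bool cloned;             /* models skb_cloned(from) */
	unsigned int nr_frags;   /* models skb_shinfo(from)->nr_frags */
	unsigned int frag_refs;  /* counts modelled skb_frag_ref() calls */
};

/* Returns how many extra fragment references were taken. */
unsigned int toy_coalesce_frag_refs(struct toy_skb *from)
{
	unsigned int i, taken = 0;

	if (!from->cloned)
		from->nr_frags = 0;

	/* If the skb is not cloned, nr_frags is now 0 and this does nothing. */
	for (i = 0; i < from->nr_frags; i++) {
		from->frag_refs++;
		taken++;
	}
	return taken;
}
```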
^ permalink raw reply
* Re: [net-next 4/4] ixgbevf: scheduling while atomic in reset hw path
From: Eric Dumazet @ 2012-09-19 5:05 UTC (permalink / raw)
To: Jeff Kirsher; +Cc: davem, John Fastabend, netdev, gospo, sassmann
In-Reply-To: <1348029108-26659-5-git-send-email-jeffrey.t.kirsher@intel.com>
On Tue, 2012-09-18 at 21:31 -0700, Jeff Kirsher wrote:
> From: John Fastabend <john.r.fastabend@intel.com>
>
> In ixgbevf_reset_hw_vf() msleep is called while holding rtnl_lock
> and mbx_lock resulting in a schedule while atomic bug with trace
> below.
>
This sentence is misleading, as rtnl is a mutex.
It's legal to sleep while holding it.
So the atomic context is because of lock #1 (the mbx_lock spinlock), not
'lock' #0 (rtnl_mutex).
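
A toy model of that distinction, for illustration only: this is userspace code with hypothetical names (struct toy_ctx, toy_spin_lock(), toy_may_sleep() and so on), and it assumes a kernel configuration where taking a spinlock raises the preemption count while taking a mutex does not.

```c
#include <stdbool.h>

/* Toy execution context (illustration only, not kernel code). */
struct toy_ctx {
	int preempt_count;  /* non-zero means atomic context */
	int mutexes_held;   /* sleeping locks do not affect atomicity */
};

void toy_mutex_lock(struct toy_ctx *c)   { c->mutexes_held++; }
void toy_mutex_unlock(struct toy_ctx *c) { c->mutexes_held--; }

/* Spinlocks disable preemption for as long as they are held. */
void toy_spin_lock(struct toy_ctx *c)    { c->preempt_count++; }
void toy_spin_unlock(struct toy_ctx *c)  { c->preempt_count--; }

/* Sleeping (msleep-like) is only allowed outside atomic context. */
bool toy_may_sleep(const struct toy_ctx *c)
{
	return c->preempt_count == 0;
}
```

In this model, holding only the mutex leaves toy_may_sleep() true; it turns false as soon as the spinlock is taken, which is the situation the trace reports. A busy-wait such as mdelay() does not schedule, so it is permitted there.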
> This patch uses mdelay instead.
>
> BUG: scheduling while atomic: ip/6539/0x00000002
> 2 locks held by ip/6539:
> #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81419cc3>] rtnl_lock+0x17/0x19
> #1: (&(&adapter->mbx_lock)->rlock){+.+...}, at: [<ffffffffa0030855>] ixgbevf_reset+0x30/0xc1 [ixgbevf]
> Modules linked in: ixgbevf ixgbe mdio libfc scsi_transport_fc 8021q scsi_tgt garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 uinput igb coretemp hwmon crc32c_intel ioatdma i2c_i801 shpchp microcode lpc_ich mfd_core i2c_core joydev dca pcspkr serio_raw pata_acpi ata_generic usb_storage pata_jmicron
> Pid: 6539, comm: ip Not tainted 3.6.0-rc3jk-net-next+ #104
> Call Trace:
> [<ffffffff81072202>] __schedule_bug+0x6a/0x79
> [<ffffffff814bc7e0>] __schedule+0xa2/0x684
> [<ffffffff8108f85f>] ? trace_hardirqs_off+0xd/0xf
> [<ffffffff814bd0c0>] schedule+0x64/0x66
> [<ffffffff814bb5e2>] schedule_timeout+0xa6/0xca
> [<ffffffff810536b9>] ? lock_timer_base+0x52/0x52
> [<ffffffff812629e0>] ? __udelay+0x15/0x17
> [<ffffffff814bb624>] schedule_timeout_uninterruptible+0x1e/0x20
> [<ffffffff810541c0>] msleep+0x1b/0x22
> [<ffffffffa002e723>] ixgbevf_reset_hw_vf+0x90/0xe5 [ixgbevf]
> [<ffffffffa0030860>] ixgbevf_reset+0x3b/0xc1 [ixgbevf]
> [<ffffffffa0032fba>] ixgbevf_open+0x43/0x43e [ixgbevf]
> [<ffffffff81409610>] ? dev_set_rx_mode+0x2e/0x33
> [<ffffffff8140b0f1>] __dev_open+0xa0/0xe5
> [<ffffffff814097ed>] __dev_change_flags+0xbe/0x142
> [<ffffffff8140b01c>] dev_change_flags+0x21/0x56
> [<ffffffff8141a843>] do_setlink+0x2e2/0x7f4
> [<ffffffff81016e36>] ? native_sched_clock+0x37/0x39
> [<ffffffff8141b0ac>] rtnl_newlink+0x277/0x4bb
> [<ffffffff8141aee9>] ? rtnl_newlink+0xb4/0x4bb
> [<ffffffff812217d1>] ? selinux_capable+0x32/0x3a
> [<ffffffff8104fb17>] ? ns_capable+0x4f/0x67
> [<ffffffff81419cc3>] ? rtnl_lock+0x17/0x19
> [<ffffffff81419f28>] rtnetlink_rcv_msg+0x236/0x253
> [<ffffffff81419cf2>] ? rtnetlink_rcv+0x2d/0x2d
> [<ffffffff8142fd42>] netlink_rcv_skb+0x43/0x94
> [<ffffffff81419ceb>] rtnetlink_rcv+0x26/0x2d
> [<ffffffff8142faf1>] netlink_unicast+0xee/0x174
> [<ffffffff81430327>] netlink_sendmsg+0x26a/0x288
> [<ffffffff813fb04f>] ? rcu_read_unlock+0x56/0x67
> [<ffffffff813f5e6d>] __sock_sendmsg_nosec+0x58/0x61
> [<ffffffff813f81b7>] __sock_sendmsg+0x3d/0x48
> [<ffffffff813f8339>] sock_sendmsg+0x6e/0x87
> [<ffffffff81107c9f>] ? might_fault+0xa5/0xac
> [<ffffffff81402a72>] ? copy_from_user+0x2a/0x2c
> [<ffffffff81402e62>] ? verify_iovec+0x54/0xaa
> [<ffffffff813f9834>] __sys_sendmsg+0x206/0x288
> [<ffffffff810694fa>] ? up_read+0x23/0x3d
> [<ffffffff811307e5>] ? fcheck_files+0xac/0xea
> [<ffffffff8113095e>] ? fget_light+0x3a/0xb9
> [<ffffffff813f9a2e>] sys_sendmsg+0x42/0x60
> [<ffffffff814c5ba9>] system_call_fastpath+0x16/0x1b
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> drivers/net/ethernet/intel/ixgbevf/vf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
> index 690801b..87b3f3b 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/vf.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
> @@ -100,7 +100,7 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
> msgbuf[0] = IXGBE_VF_RESET;
> mbx->ops.write_posted(hw, msgbuf, 1);
>
> - msleep(10);
> + mdelay(10);
>
> /* set our "perm_addr" based on info provided by PF */
> /* also set up the mc_filter_type which is piggy backed
^ permalink raw reply
* Re: [RFC net-next] netpoll: use static branch
From: Cong Wang @ 2012-09-19 4:50 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, Eric Dumazet, netdev
In-Reply-To: <20120918141014.573734db@nehalam.linuxnetplumber.net>
On Tue, 2012-09-18 at 14:10 -0700, Stephen Hemminger wrote:
> This is an attempt to optimize netpoll when not used.
>
> Since distro's enable everything and netpoll is only occasionally
> used, improve performance by getting netpoll condition check
> out of the Rx fastpath.
>
> Compile tested only, I have no real use for netpoll.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
>
> ---
> include/linux/netpoll.h | 28 ++++++++++++++++++++--------
> net/core/netpoll.c | 8 +++++++-
> 2 files changed, 27 insertions(+), 9 deletions(-)
>
> --- a/include/linux/netpoll.h 2012-09-18 13:25:15.575750004 -0700
> +++ b/include/linux/netpoll.h 2012-09-18 13:29:16.245323347 -0700
> @@ -66,10 +66,16 @@ static inline void netpoll_send_skb(stru
>
>
> #ifdef CONFIG_NETPOLL
> +extern struct static_key netpoll_needed;
> +
> static inline bool netpoll_rx_on(struct sk_buff *skb)
> {
> - struct netpoll_info *npinfo = rcu_dereference_bh(skb->dev->npinfo);
> + struct netpoll_info *npinfo;
> +
> + if (static_key_true(&netpoll_needed))
> + return false;
>
I think we should use static_key_false() here, as netpoll is an
"unlikely" code path.
Using a static branch is a good idea though.
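
A rough userspace sketch of the suggested shape, with heavy assumptions: struct toy_static_key, toy_static_key_false() and toy_netpoll_rx_on() are hypothetical stand-ins that only mimic the generic (non-patched-branch) behaviour, where the test reads an enabled flag behind an "unlikely" hint. __builtin_expect is a GCC/Clang builtin. This is not the code that was eventually merged.

```c
#include <stdbool.h>

/* Userspace stand-in for a static key (illustration only). */
struct toy_static_key {
	int enabled;
};

#define toy_unlikely(x) __builtin_expect(!!(x), 0)

/* True when the key is enabled; the disabled case is the hinted fast path. */
bool toy_static_key_false(const struct toy_static_key *key)
{
	return toy_unlikely(key->enabled > 0);
}

/* Would be enabled when a netpoll user registers (assumption). */
struct toy_static_key toy_netpoll_needed;

/* Rx fast path check: bail out early when nobody uses netpoll. */
bool toy_netpoll_rx_on(bool dev_has_npinfo)
{
	if (!toy_static_key_false(&toy_netpoll_needed))
		return false;

	/* Slow path: only now look at per-device netpoll state. */
	return dev_has_npinfo;
}
```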
Thanks.
^ permalink raw reply
* [net-next 4/4] ixgbevf: scheduling while atomic in reset hw path
From: Jeff Kirsher @ 2012-09-19 4:31 UTC (permalink / raw)
To: davem; +Cc: John Fastabend, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348029108-26659-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: John Fastabend <john.r.fastabend@intel.com>
In ixgbevf_reset_hw_vf() msleep is called while holding rtnl_lock
and mbx_lock resulting in a schedule while atomic bug with trace
below.
This patch uses mdelay instead.
BUG: scheduling while atomic: ip/6539/0x00000002
2 locks held by ip/6539:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81419cc3>] rtnl_lock+0x17/0x19
#1: (&(&adapter->mbx_lock)->rlock){+.+...}, at: [<ffffffffa0030855>] ixgbevf_reset+0x30/0xc1 [ixgbevf]
Modules linked in: ixgbevf ixgbe mdio libfc scsi_transport_fc 8021q scsi_tgt garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 uinput igb coretemp hwmon crc32c_intel ioatdma i2c_i801 shpchp microcode lpc_ich mfd_core i2c_core joydev dca pcspkr serio_raw pata_acpi ata_generic usb_storage pata_jmicron
Pid: 6539, comm: ip Not tainted 3.6.0-rc3jk-net-next+ #104
Call Trace:
[<ffffffff81072202>] __schedule_bug+0x6a/0x79
[<ffffffff814bc7e0>] __schedule+0xa2/0x684
[<ffffffff8108f85f>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff814bd0c0>] schedule+0x64/0x66
[<ffffffff814bb5e2>] schedule_timeout+0xa6/0xca
[<ffffffff810536b9>] ? lock_timer_base+0x52/0x52
[<ffffffff812629e0>] ? __udelay+0x15/0x17
[<ffffffff814bb624>] schedule_timeout_uninterruptible+0x1e/0x20
[<ffffffff810541c0>] msleep+0x1b/0x22
[<ffffffffa002e723>] ixgbevf_reset_hw_vf+0x90/0xe5 [ixgbevf]
[<ffffffffa0030860>] ixgbevf_reset+0x3b/0xc1 [ixgbevf]
[<ffffffffa0032fba>] ixgbevf_open+0x43/0x43e [ixgbevf]
[<ffffffff81409610>] ? dev_set_rx_mode+0x2e/0x33
[<ffffffff8140b0f1>] __dev_open+0xa0/0xe5
[<ffffffff814097ed>] __dev_change_flags+0xbe/0x142
[<ffffffff8140b01c>] dev_change_flags+0x21/0x56
[<ffffffff8141a843>] do_setlink+0x2e2/0x7f4
[<ffffffff81016e36>] ? native_sched_clock+0x37/0x39
[<ffffffff8141b0ac>] rtnl_newlink+0x277/0x4bb
[<ffffffff8141aee9>] ? rtnl_newlink+0xb4/0x4bb
[<ffffffff812217d1>] ? selinux_capable+0x32/0x3a
[<ffffffff8104fb17>] ? ns_capable+0x4f/0x67
[<ffffffff81419cc3>] ? rtnl_lock+0x17/0x19
[<ffffffff81419f28>] rtnetlink_rcv_msg+0x236/0x253
[<ffffffff81419cf2>] ? rtnetlink_rcv+0x2d/0x2d
[<ffffffff8142fd42>] netlink_rcv_skb+0x43/0x94
[<ffffffff81419ceb>] rtnetlink_rcv+0x26/0x2d
[<ffffffff8142faf1>] netlink_unicast+0xee/0x174
[<ffffffff81430327>] netlink_sendmsg+0x26a/0x288
[<ffffffff813fb04f>] ? rcu_read_unlock+0x56/0x67
[<ffffffff813f5e6d>] __sock_sendmsg_nosec+0x58/0x61
[<ffffffff813f81b7>] __sock_sendmsg+0x3d/0x48
[<ffffffff813f8339>] sock_sendmsg+0x6e/0x87
[<ffffffff81107c9f>] ? might_fault+0xa5/0xac
[<ffffffff81402a72>] ? copy_from_user+0x2a/0x2c
[<ffffffff81402e62>] ? verify_iovec+0x54/0xaa
[<ffffffff813f9834>] __sys_sendmsg+0x206/0x288
[<ffffffff810694fa>] ? up_read+0x23/0x3d
[<ffffffff811307e5>] ? fcheck_files+0xac/0xea
[<ffffffff8113095e>] ? fget_light+0x3a/0xb9
[<ffffffff813f9a2e>] sys_sendmsg+0x42/0x60
[<ffffffff814c5ba9>] system_call_fastpath+0x16/0x1b
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbevf/vf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 690801b..87b3f3b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -100,7 +100,7 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
msgbuf[0] = IXGBE_VF_RESET;
mbx->ops.write_posted(hw, msgbuf, 1);
- msleep(10);
+ mdelay(10);
/* set our "perm_addr" based on info provided by PF */
/* also set up the mc_filter_type which is piggy backed
--
1.7.11.4
^ permalink raw reply related
* [net-next 3/4] ixgbevf: Add support for VF API negotiation
From: Jeff Kirsher @ 2012-09-19 4:31 UTC (permalink / raw)
To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348029108-26659-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
This change makes it so that the VF can support the PF/VF API negotiation
protocol. Specifically in this case we are adding support for API 1.0
which will mean that the VF is capable of cleaning up buffers that span
multiple descriptors without triggering an error.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbevf/defines.h | 1 +
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 ++++++++++++++
drivers/net/ethernet/intel/ixgbevf/mbx.h | 21 +++++++++++--
drivers/net/ethernet/intel/ixgbevf/vf.c | 37 +++++++++++++++++++++++
drivers/net/ethernet/intel/ixgbevf/vf.h | 3 ++
5 files changed, 83 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 418af82..da17ccf 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -272,5 +272,6 @@ struct ixgbe_adv_tx_context_desc {
/* Error Codes */
#define IXGBE_ERR_INVALID_MAC_ADDR -1
#define IXGBE_ERR_RESET_FAILED -2
+#define IXGBE_ERR_INVALID_ARGUMENT -3
#endif /* _IXGBEVF_DEFINES_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index a5d9cc5..c5ffe1d 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1334,6 +1334,25 @@ static void ixgbevf_init_last_counter_stats(struct ixgbevf_adapter *adapter)
adapter->stats.base_vfmprc = adapter->stats.last_vfmprc;
}
+static void ixgbevf_negotiate_api(struct ixgbevf_adapter *adapter)
+{
+ struct ixgbe_hw *hw = &adapter->hw;
+ int api[] = { ixgbe_mbox_api_10,
+ ixgbe_mbox_api_unknown };
+ int err = 0, idx = 0;
+
+ spin_lock(&adapter->mbx_lock);
+
+ while (api[idx] != ixgbe_mbox_api_unknown) {
+ err = ixgbevf_negotiate_api_version(hw, api[idx]);
+ if (!err)
+ break;
+ idx++;
+ }
+
+ spin_unlock(&adapter->mbx_lock);
+}
+
static void ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
{
struct net_device *netdev = adapter->netdev;
@@ -1399,6 +1418,8 @@ void ixgbevf_up(struct ixgbevf_adapter *adapter)
{
struct ixgbe_hw *hw = &adapter->hw;
+ ixgbevf_negotiate_api(adapter);
+
ixgbevf_configure(adapter);
ixgbevf_up_complete(adapter);
@@ -2388,6 +2409,8 @@ static int ixgbevf_open(struct net_device *netdev)
}
}
+ ixgbevf_negotiate_api(adapter);
+
/* allocate transmit descriptors */
err = ixgbevf_setup_all_tx_resources(adapter);
if (err)
diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.h b/drivers/net/ethernet/intel/ixgbevf/mbx.h
index cf9131c..946ce86 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.h
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.h
@@ -76,12 +76,29 @@
/* bits 23:16 are used for exra info for certain messages */
#define IXGBE_VT_MSGINFO_MASK (0xFF << IXGBE_VT_MSGINFO_SHIFT)
+/* definitions to support mailbox API version negotiation */
+
+/*
+ * each element denotes a version of the API; existing numbers may not
+ * change; any additions must go at the end
+ */
+enum ixgbe_pfvf_api_rev {
+ ixgbe_mbox_api_10, /* API version 1.0, linux/freebsd VF driver */
+ ixgbe_mbox_api_20, /* API version 2.0, solaris Phase1 VF driver */
+ /* This value should always be last */
+ ixgbe_mbox_api_unknown, /* indicates that API version is not known */
+};
+
+/* mailbox API, legacy requests */
#define IXGBE_VF_RESET 0x01 /* VF requests reset */
#define IXGBE_VF_SET_MAC_ADDR 0x02 /* VF requests PF to set MAC addr */
#define IXGBE_VF_SET_MULTICAST 0x03 /* VF requests PF to set MC addr */
#define IXGBE_VF_SET_VLAN 0x04 /* VF requests PF to set VLAN */
-#define IXGBE_VF_SET_LPE 0x05 /* VF requests PF to set VMOLR.LPE */
-#define IXGBE_VF_SET_MACVLAN 0x06 /* VF requests PF for unicast filter */
+
+/* mailbox API, version 1.0 VF requests */
+#define IXGBE_VF_SET_LPE 0x05 /* VF requests PF to set VMOLR.LPE */
+#define IXGBE_VF_SET_MACVLAN 0x06 /* VF requests PF for unicast filter */
+#define IXGBE_VF_API_NEGOTIATE 0x08 /* negotiate API version */
/* length of permanent address message returned from PF */
#define IXGBE_VF_PERMADDR_MSG_LEN 4
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 3d555a1..690801b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -79,6 +79,9 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
/* Call adapter stop to disable tx/rx and clear interrupts */
hw->mac.ops.stop_adapter(hw);
+ /* reset the api version */
+ hw->api_version = ixgbe_mbox_api_10;
+
IXGBE_WRITE_REG(hw, IXGBE_VFCTRL, IXGBE_CTRL_RST);
IXGBE_WRITE_FLUSH(hw);
@@ -433,6 +436,40 @@ void ixgbevf_rlpml_set_vf(struct ixgbe_hw *hw, u16 max_size)
ixgbevf_write_msg_read_ack(hw, msgbuf, 2);
}
+/**
+ * ixgbevf_negotiate_api_version - Negotiate supported API version
+ * @hw: pointer to the HW structure
+ * @api: integer containing requested API version
+ **/
+int ixgbevf_negotiate_api_version(struct ixgbe_hw *hw, int api)
+{
+ int err;
+ u32 msg[3];
+
+ /* Negotiate the mailbox API version */
+ msg[0] = IXGBE_VF_API_NEGOTIATE;
+ msg[1] = api;
+ msg[2] = 0;
+ err = hw->mbx.ops.write_posted(hw, msg, 3);
+
+ if (!err)
+ err = hw->mbx.ops.read_posted(hw, msg, 3);
+
+ if (!err) {
+ msg[0] &= ~IXGBE_VT_MSGTYPE_CTS;
+
+ /* Store value and return 0 on success */
+ if (msg[0] == (IXGBE_VF_API_NEGOTIATE | IXGBE_VT_MSGTYPE_ACK)) {
+ hw->api_version = api;
+ return 0;
+ }
+
+ err = IXGBE_ERR_INVALID_ARGUMENT;
+ }
+
+ return err;
+}
+
static const struct ixgbe_mac_operations ixgbevf_mac_ops = {
.init_hw = ixgbevf_init_hw_vf,
.reset_hw = ixgbevf_reset_hw_vf,
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.h b/drivers/net/ethernet/intel/ixgbevf/vf.h
index 07fd876..47f11a5 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.h
@@ -137,6 +137,8 @@ struct ixgbe_hw {
u8 revision_id;
bool adapter_stopped;
+
+ int api_version;
};
struct ixgbevf_hw_stats {
@@ -171,5 +173,6 @@ struct ixgbevf_info {
};
void ixgbevf_rlpml_set_vf(struct ixgbe_hw *hw, u16 max_size);
+int ixgbevf_negotiate_api_version(struct ixgbe_hw *hw, int api);
#endif /* __IXGBE_VF_H__ */
--
1.7.11.4
^ permalink raw reply related
* [net-next 2/4] igb: Support to enable EEE on all eee_supported devices
From: Jeff Kirsher @ 2012-09-19 4:31 UTC (permalink / raw)
To: davem; +Cc: Akeem G. Abodunrin, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348029108-26659-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: "Akeem G. Abodunrin" <akeem.g.abodunrin@intel.com>
Current implementation enables EEE on only i350 device. This patch enables
EEE on all eee_supported devices. Also, configured LPI clock to keep
running before EEE is enabled on i210 and i211 devices.
Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/e1000_82575.c | 17 +++++++++++++----
drivers/net/ethernet/intel/igb/e1000_defines.h | 3 ++-
drivers/net/ethernet/intel/igb/e1000_regs.h | 1 +
3 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index ba994fb..ca4641e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -2223,11 +2223,10 @@ out:
s32 igb_set_eee_i350(struct e1000_hw *hw)
{
s32 ret_val = 0;
- u32 ipcnfg, eeer, ctrl_ext;
+ u32 ipcnfg, eeer;
- ctrl_ext = rd32(E1000_CTRL_EXT);
- if ((hw->mac.type != e1000_i350) ||
- (ctrl_ext & E1000_CTRL_EXT_LINK_MODE_MASK))
+ if ((hw->mac.type < e1000_i350) ||
+ (hw->phy.media_type != e1000_media_type_copper))
goto out;
ipcnfg = rd32(E1000_IPCNFG);
eeer = rd32(E1000_EEER);
@@ -2240,6 +2239,14 @@ s32 igb_set_eee_i350(struct e1000_hw *hw)
E1000_EEER_RX_LPI_EN |
E1000_EEER_LPI_FC);
+ /* keep the LPI clock running before EEE is enabled */
+ if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) {
+ u32 eee_su;
+ eee_su = rd32(E1000_EEE_SU);
+ eee_su &= ~E1000_EEE_SU_LPI_CLK_STP;
+ wr32(E1000_EEE_SU, eee_su);
+ }
+
} else {
ipcnfg &= ~(E1000_IPCNFG_EEE_1G_AN |
E1000_IPCNFG_EEE_100M_AN);
@@ -2249,6 +2256,8 @@ s32 igb_set_eee_i350(struct e1000_hw *hw)
}
wr32(E1000_IPCNFG, ipcnfg);
wr32(E1000_EEER, eeer);
+ rd32(E1000_IPCNFG);
+ rd32(E1000_EEER);
out:
return ret_val;
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index cae3070..de4b41e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -857,8 +857,9 @@
#define E1000_IPCNFG_EEE_100M_AN 0x00000004 /* EEE Enable 100M AN */
#define E1000_EEER_TX_LPI_EN 0x00010000 /* EEE Tx LPI Enable */
#define E1000_EEER_RX_LPI_EN 0x00020000 /* EEE Rx LPI Enable */
-#define E1000_EEER_FRC_AN 0x10000000 /* Enable EEE in loopback */
+#define E1000_EEER_FRC_AN 0x10000000 /* Enable EEE in loopback */
#define E1000_EEER_LPI_FC 0x00040000 /* EEE Enable on FC */
+#define E1000_EEE_SU_LPI_CLK_STP 0x00800000 /* EEE LPI Clock Stop */
/* SerDes Control */
#define E1000_GEN_CTL_READY 0x80000000
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index faec840..e5db485 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -349,6 +349,7 @@
/* Energy Efficient Ethernet "EEE" register */
#define E1000_IPCNFG 0x0E38 /* Internal PHY Configuration */
#define E1000_EEER 0x0E30 /* Energy Efficient Ethernet */
+#define E1000_EEE_SU 0x0E34 /* EEE Setup */
/* Thermal Sensor Register */
#define E1000_THSTAT 0x08110 /* Thermal Sensor Status */
--
1.7.11.4
^ permalink raw reply related
* [net-next 1/4] igb: Remove artificial restriction on RQDPC stat reading
From: Jeff Kirsher @ 2012-09-19 4:31 UTC (permalink / raw)
To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1348029108-26659-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
For some reason the reading of the RQDPC register was being artificially
limited to 4K. Instead of limiting the value we should read the value and
add the full amount. Otherwise this can lead to a misleading number of
dropped packets when the actual value is in fact much higher.
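To illustrate the under-reporting described above, here is a minimal user-space sketch. It assumes each register read yields a plain 32-bit drop count for the interval; `accumulate_drops` and its parameters are hypothetical names used only for illustration and are not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: sum per-read drop counts the way a stats loop
 * would, optionally applying the old 0x0FFF mask to every sample.
 */
static uint64_t accumulate_drops(const uint32_t *samples, size_t n,
				 int apply_old_mask)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t v = samples[i];

		if (apply_old_mask)
			v &= 0x0FFF;	/* old behaviour: keep the low 12 bits */
		total += v;
	}
	return total;
}
```

With samples of 10, 5000 and 4096 drops, the unmasked sum is 9106 while the masked sum is only 914, because every sample is truncated to its low 12 bits before being added.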
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 19d7666..246646b 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4681,11 +4681,13 @@ void igb_update_stats(struct igb_adapter *adapter,
bytes = 0;
packets = 0;
for (i = 0; i < adapter->num_rx_queues; i++) {
- u32 rqdpc_tmp = rd32(E1000_RQDPC(i)) & 0x0FFF;
+ u32 rqdpc = rd32(E1000_RQDPC(i));
struct igb_ring *ring = adapter->rx_ring[i];
- ring->rx_stats.drops += rqdpc_tmp;
- net_stats->rx_fifo_errors += rqdpc_tmp;
+ if (rqdpc) {
+ ring->rx_stats.drops += rqdpc;
+ net_stats->rx_fifo_errors += rqdpc;
+ }
do {
start = u64_stats_fetch_begin_bh(&ring->rx_syncp);
--
1.7.11.4
^ permalink raw reply related
* [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-09-19 4:31 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann
This series contains updates to igb and ixgbevf.
The following are changes since commit adccff34de1ef81564b7e6c436f762e7a1caf807:
net/tipc/name_table.c: Remove unecessary semicolon
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
Akeem G. Abodunrin (1):
igb: Support to enable EEE on all eee_supported devices
Alexander Duyck (2):
igb: Remove artificial restriction on RQDPC stat reading
ixgbevf: Add support for VF API negotiation
John Fastabend (1):
ixgbevf: scheduling while atomic in reset hw path
drivers/net/ethernet/intel/igb/e1000_82575.c | 17 +++++++---
drivers/net/ethernet/intel/igb/e1000_defines.h | 3 +-
drivers/net/ethernet/intel/igb/e1000_regs.h | 1 +
drivers/net/ethernet/intel/igb/igb_main.c | 8 +++--
drivers/net/ethernet/intel/ixgbevf/defines.h | 1 +
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 +++++++++++++
drivers/net/ethernet/intel/ixgbevf/mbx.h | 21 ++++++++++--
drivers/net/ethernet/intel/ixgbevf/vf.c | 39 ++++++++++++++++++++++-
drivers/net/ethernet/intel/ixgbevf/vf.h | 3 ++
9 files changed, 105 insertions(+), 11 deletions(-)
--
1.7.11.4
^ permalink raw reply
* [ethtool] ethtool: --set-eee sends ETHTOOL_SEEE ioctl even if nothing changed
From: Jeff Kirsher @ 2012-09-19 3:20 UTC (permalink / raw)
To: bhutchings; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher
From: Bruce Allan <bruce.w.allan@intel.com>
When setting EEE parameters with the --set-eee command line option,
ethtool will send the ETHTOOL_SEEE ioctl down to the driver even if
none of the provided parameters are a change from current settings.
Simply ignore it when that happens as done with other ethtool commands.
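The effect of the initial flag value can be sketched in isolation. This is a simplified model, not ethtool's actual code: it assumes the "set" step only ever raises the change flag when a requested value differs from the current one, and `struct setting`, `apply_settings` and `would_send_set` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for one command-line setting. */
struct setting {
	int wanted;	/* value requested on the command line, -1 if absent */
	int current;	/* value currently reported by the driver */
};

/*
 * Raise *changed to 1 when a requested value differs from the current
 * one. When nothing differs, *changed keeps its initial value.
 */
static void apply_settings(struct setting *s, size_t n, int *changed)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].wanted < 0 || s[i].wanted == s[i].current)
			continue;
		s[i].current = s[i].wanted;
		*changed = 1;
	}
}

/* Returns nonzero when the "set" request would be issued. */
static int would_send_set(struct setting *s, size_t n, int initial_flag)
{
	int changed = initial_flag;

	apply_settings(s, n, &changed);
	return changed != 0;
}
```

Under this model, starting the flag at -1 makes the request go out even when every requested value already matches, whereas starting at 0 suppresses it unless something really changed.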
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
ethtool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ethtool.c b/ethtool.c
index 25ba51f..f3649e2 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3645,7 +3645,7 @@ static int do_geee(struct cmd_context *ctx)
static int do_seee(struct cmd_context *ctx)
{
int adv_c = -1, lpi_c = -1, lpi_time_c = -1, eee_c = -1;
- int change = -1, change2 = -1;
+ int change = -1, change2 = 0;
struct ethtool_eee eeecmd;
struct cmdline_info cmdline_eee[] = {
{ "advertise", CMDL_U32, &adv_c, &eeecmd.advertised },
--
1.7.11.4
^ permalink raw reply related
* [PATCH] net/core: fix comment in skb_try_coalesce
From: roy.qing.li @ 2012-09-19 2:53 UTC (permalink / raw)
To: netdev
From: Li RongQing <roy.qing.li@gmail.com>
The comment should refer to the skb which is not cloned.
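A tiny model of the control flow shows why the loop is a no-op only in the non-cloned case. `struct mini_skb` and `frag_refs_taken` are hypothetical stand-ins for illustration, not kernel types.

```c
/* Hypothetical miniature of the state the comment talks about. */
struct mini_skb {
	int cloned;	/* nonzero if the buffer is shared */
	int nr_frags;	/* number of page fragments */
};

/* Returns how many fragment references the trailing loop would take. */
static int frag_refs_taken(struct mini_skb *from)
{
	int i, refs = 0;

	if (!from->cloned)
		from->nr_frags = 0;	/* fragments were handed over */

	/* Iterates zero times when nr_frags was just cleared. */
	for (i = 0; i < from->nr_frags; i++)
		refs++;

	return refs;
}
```

For a non-cloned buffer the count is always zero; for a cloned buffer with three fragments it is three.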
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
net/core/skbuff.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fe00d12..354a4e4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3502,7 +3502,9 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
if (!skb_cloned(from))
skb_shinfo(from)->nr_frags = 0;
- /* if the skb is cloned this does nothing since we set nr_frags to 0 */
+ /* if the skb is not cloned this does nothing
+ * since we set nr_frags to 0.
+ */
for (i = 0; i < skb_shinfo(from)->nr_frags; i++)
skb_frag_ref(from, i);
--
1.7.4.1
^ permalink raw reply related
* [PATCH 3/4] ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
From: Cong Wang @ 2012-09-19 2:50 UTC (permalink / raw)
To: netdev
Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
David Miller
In-Reply-To: <1348023011-16195-1-git-send-email-amwang@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
include/net/ipv6.h | 13 +++++++++++--
net/ipv6/reassembly.c | 10 ----------
2 files changed, 11 insertions(+), 12 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 81d4455..979bf6c 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -271,8 +271,17 @@ struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
extern bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb);
-int ip6_frag_nqueues(struct net *net);
-int ip6_frag_mem(struct net *net);
+#if IS_ENABLED(CONFIG_IPV6)
+static inline int ip6_frag_nqueues(struct net *net)
+{
+ return net->ipv6.frags.nqueues;
+}
+
+static inline int ip6_frag_mem(struct net *net)
+{
+ return atomic_read(&net->ipv6.frags.mem);
+}
+#endif
#define IPV6_FRAG_HIGH_THRESH (256 * 1024) /* 262144 */
#define IPV6_FRAG_LOW_THRESH (192 * 1024) /* 196608 */
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 0ee5533..cf74f4e 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -67,16 +67,6 @@ struct ip6frag_skb_cb
static struct inet_frags ip6_frags;
-int ip6_frag_nqueues(struct net *net)
-{
- return net->ipv6.frags.nqueues;
-}
-
-int ip6_frag_mem(struct net *net)
-{
- return atomic_read(&net->ipv6.frags.mem);
-}
-
static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
struct net_device *dev);
--
1.7.7.6
^ permalink raw reply related
* [PATCH 4/4] ipv6: unify fragment thresh handling code
From: Cong Wang @ 2012-09-19 2:50 UTC (permalink / raw)
To: netdev
Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
David Miller
In-Reply-To: <1348023011-16195-1-git-send-email-amwang@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
include/net/inet_frag.h | 2 +-
net/ipv4/inet_fragment.c | 9 +++++++--
net/ipv4/ip_fragment.c | 5 ++---
net/ipv6/netfilter/nf_conntrack_reasm.c | 8 +++-----
net/ipv6/reassembly.c | 16 +++++-----------
5 files changed, 18 insertions(+), 22 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 5098ee7..32786a0 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -61,7 +61,7 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
void inet_frag_destroy(struct inet_frag_queue *q,
struct inet_frags *f, int *work);
-int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f);
+int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force);
struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
struct inet_frags *f, void *key, unsigned int hash)
__releases(&f->lock);
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 85190e6..4750d2b 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -89,7 +89,7 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
nf->low_thresh = 0;
local_bh_disable();
- inet_frag_evictor(nf, f);
+ inet_frag_evictor(nf, f, true);
local_bh_enable();
}
EXPORT_SYMBOL(inet_frags_exit_net);
@@ -158,11 +158,16 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
}
EXPORT_SYMBOL(inet_frag_destroy);
-int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f)
+int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
{
struct inet_frag_queue *q;
int work, evicted = 0;
+ if (!force) {
+ if (atomic_read(&nf->mem) <= nf->high_thresh)
+ return 0;
+ }
+
work = atomic_read(&nf->mem) - nf->low_thresh;
while (work > 0) {
read_lock(&f->lock);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fa6a12c..448e685 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -219,7 +219,7 @@ static void ip_evictor(struct net *net)
{
int evicted;
- evicted = inet_frag_evictor(&net->ipv4.frags, &ip4_frags);
+ evicted = inet_frag_evictor(&net->ipv4.frags, &ip4_frags, false);
if (evicted)
IP_ADD_STATS_BH(net, IPSTATS_MIB_REASMFAILS, evicted);
}
@@ -684,8 +684,7 @@ int ip_defrag(struct sk_buff *skb, u32 user)
IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
/* Start by cleaning up the memory. */
- if (atomic_read(&net->ipv4.frags.mem) > net->ipv4.frags.high_thresh)
- ip_evictor(net);
+ ip_evictor(net);
/* Lookup (or create) queue header */
if ((qp = ip_find(net, ip_hdr(skb), user)) != NULL) {
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 54274c3..1af12fde 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -566,11 +566,9 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
hdr = ipv6_hdr(clone);
fhdr = (struct frag_hdr *)skb_transport_header(clone);
- if (atomic_read(&net->nf_frag.frags.mem) > net->nf_frag.frags.high_thresh) {
- local_bh_disable();
- inet_frag_evictor(&net->nf_frag.frags, &nf_frags);
- local_bh_enable();
- }
+ local_bh_disable();
+ inet_frag_evictor(&net->nf_frag.frags, &nf_frags, false);
+ local_bh_enable();
fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr);
if (fq == NULL) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index cf74f4e..da8a4e3 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -131,15 +131,6 @@ void ip6_frag_init(struct inet_frag_queue *q, void *a)
}
EXPORT_SYMBOL(ip6_frag_init);
-static void ip6_evictor(struct net *net, struct inet6_dev *idev)
-{
- int evicted;
-
- evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags);
- if (evicted)
- IP6_ADD_STATS_BH(net, idev, IPSTATS_MIB_REASMFAILS, evicted);
-}
-
void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
struct inet_frags *frags)
{
@@ -515,6 +506,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
struct frag_queue *fq;
const struct ipv6hdr *hdr = ipv6_hdr(skb);
struct net *net = dev_net(skb_dst(skb)->dev);
+ int evicted;
IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMREQDS);
@@ -539,8 +531,10 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
return 1;
}
- if (atomic_read(&net->ipv6.frags.mem) > net->ipv6.frags.high_thresh)
- ip6_evictor(net, ip6_dst_idev(skb_dst(skb)));
+ evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags, false);
+ if (evicted)
+ IP6_ADD_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
+ IPSTATS_MIB_REASMFAILS, evicted);
fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr);
if (fq != NULL) {
--
1.7.7.6
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH 2/4] ipv6: unify conntrack reassembly expire code with standard one
From: Cong Wang @ 2012-09-19 2:50 UTC (permalink / raw)
To: netdev
Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
David Miller, Hideaki YOSHIFUJI, Patrick McHardy,
Pablo Neira Ayuso
In-Reply-To: <1348023011-16195-1-git-send-email-amwang@redhat.com>
Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/
The problem is that RFC 2460 says an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of a fragment if defragmentation
times out.
"
If insufficient fragments are received to complete reassembly of a
packet within 60 seconds of the reception of the first-arriving
fragment of that packet, reassembly of that packet must be
abandoned and all the fragments that have been received for that
packet must be discarded. If the first fragment (i.e., the one
with a Fragment Offset of zero) has been received, an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment.
"
As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.
With this patch applied, I can see an ICMP Time Exceeded message
sent from the receiver when the sender sends out only 3/4 of a
fragmented IPv6 UDP packet.
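The RFC 2460 rule quoted above can be sketched as a small decision function. This is only an illustration of the rule, not the kernel implementation; `struct reasm_state`, `enum expire_action` and `on_reassembly_timeout` are hypothetical names.

```c
#include <stdbool.h>

enum expire_action {
	EXPIRE_NOTHING,			/* reassembly already completed */
	EXPIRE_DISCARD,			/* drop the fragments silently */
	EXPIRE_DISCARD_AND_ICMP		/* drop and send ICMP Time Exceeded */
};

/* Hypothetical summary of a reassembly queue at timer expiry. */
struct reasm_state {
	bool complete;		/* all fragments arrived before the timer */
	bool first_frag_seen;	/* fragment with offset zero was received */
};

static enum expire_action on_reassembly_timeout(const struct reasm_state *st)
{
	if (st->complete)
		return EXPIRE_NOTHING;
	if (st->first_frag_seen)
		return EXPIRE_DISCARD_AND_ICMP;
	return EXPIRE_DISCARD;
}
```

The ICMP message is tied to the first fragment because that is the fragment carrying the headers the error message needs to quote back to the sender.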
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
---
include/net/ipv6.h | 19 ++++++++
net/ipv6/netfilter/nf_conntrack_reasm.c | 74 ++++++++-----------------------
net/ipv6/reassembly.c | 63 ++++++++------------------
3 files changed, 57 insertions(+), 99 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 9bed5d4..81d4455 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -411,6 +411,25 @@ struct ip6_create_arg {
void ip6_frag_init(struct inet_frag_queue *q, void *a);
bool ip6_frag_match(struct inet_frag_queue *q, void *a);
+/*
+ * Equivalent of ipv4 struct ipq
+ */
+struct frag_queue {
+ struct inet_frag_queue q;
+
+ __be32 id; /* fragment id */
+ u32 user;
+ struct in6_addr saddr;
+ struct in6_addr daddr;
+
+ int iif;
+ unsigned int csum;
+ __u16 nhoffset;
+};
+
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
+ struct inet_frags *frags);
+
static inline bool ipv6_addr_any(const struct in6_addr *a)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index f40f327..54274c3 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -57,19 +57,6 @@ struct nf_ct_frag6_skb_cb
#define NFCT_FRAG6_CB(skb) ((struct nf_ct_frag6_skb_cb*)((skb)->cb))
-struct nf_ct_frag6_queue
-{
- struct inet_frag_queue q;
-
- __be32 id; /* fragment id */
- u32 user;
- struct in6_addr saddr;
- struct in6_addr daddr;
-
- unsigned int csum;
- __u16 nhoffset;
-};
-
static struct inet_frags nf_frags;
#ifdef CONFIG_SYSCTL
@@ -151,9 +138,9 @@ static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
static unsigned int nf_hashfn(struct inet_frag_queue *q)
{
- const struct nf_ct_frag6_queue *nq;
+ const struct frag_queue *nq;
- nq = container_of(q, struct nf_ct_frag6_queue, q);
+ nq = container_of(q, struct frag_queue, q);
return inet6_hash_frag(nq->id, &nq->saddr, &nq->daddr, nf_frags.rnd);
}
@@ -163,44 +150,21 @@ static void nf_skb_free(struct sk_buff *skb)
kfree_skb(NFCT_FRAG6_CB(skb)->orig);
}
-/* Destruction primitives. */
-
-static __inline__ void fq_put(struct nf_ct_frag6_queue *fq)
-{
- inet_frag_put(&fq->q, &nf_frags);
-}
-
-/* Kill fq entry. It is not destroyed immediately,
- * because caller (and someone more) holds reference count.
- */
-static __inline__ void fq_kill(struct nf_ct_frag6_queue *fq)
-{
- inet_frag_kill(&fq->q, &nf_frags);
-}
-
static void nf_ct_frag6_expire(unsigned long data)
{
- struct nf_ct_frag6_queue *fq;
-
- fq = container_of((struct inet_frag_queue *)data,
- struct nf_ct_frag6_queue, q);
+ struct frag_queue *fq;
+ struct net *net;
- spin_lock(&fq->q.lock);
+ fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+ net = container_of(fq->q.net, struct net, nf_frag.frags);
- if (fq->q.last_in & INET_FRAG_COMPLETE)
- goto out;
-
- fq_kill(fq);
-
-out:
- spin_unlock(&fq->q.lock);
- fq_put(fq);
+ ip6_expire_frag_queue(net, fq, &nf_frags);
}
/* Creation primitives. */
-static inline struct nf_ct_frag6_queue *fq_find(struct net *net, __be32 id,
- u32 user, struct in6_addr *src,
- struct in6_addr *dst)
+static inline struct frag_queue *fq_find(struct net *net, __be32 id,
+ u32 user, struct in6_addr *src,
+ struct in6_addr *dst)
{
struct inet_frag_queue *q;
struct ip6_create_arg arg;
@@ -219,14 +183,14 @@ static inline struct nf_ct_frag6_queue *fq_find(struct net *net, __be32 id,
if (q == NULL)
goto oom;
- return container_of(q, struct nf_ct_frag6_queue, q);
+ return container_of(q, struct frag_queue, q);
oom:
return NULL;
}
-static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
+static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
const struct frag_hdr *fhdr, int nhoff)
{
struct sk_buff *prev, *next;
@@ -367,7 +331,7 @@ found:
return 0;
discard_fq:
- fq_kill(fq);
+ inet_frag_kill(&fq->q, &nf_frags);
err:
return -1;
}
@@ -382,12 +346,12 @@ err:
* the last and the first frames arrived and all the bits are here.
*/
static struct sk_buff *
-nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
+nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
{
struct sk_buff *fp, *op, *head = fq->q.fragments;
int payload_len;
- fq_kill(fq);
+ inet_frag_kill(&fq->q, &nf_frags);
WARN_ON(head == NULL);
WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
@@ -570,7 +534,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev)
: dev_net(skb->dev);
struct frag_hdr *fhdr;
- struct nf_ct_frag6_queue *fq;
+ struct frag_queue *fq;
struct ipv6hdr *hdr;
int fhoff, nhoff;
u8 prevhdr;
@@ -619,7 +583,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
if (nf_ct_frag6_queue(fq, clone, fhdr, nhoff) < 0) {
spin_unlock_bh(&fq->q.lock);
pr_debug("Can't insert skb to queue\n");
- fq_put(fq);
+ inet_frag_put(&fq->q, &nf_frags);
goto ret_orig;
}
@@ -631,7 +595,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
}
spin_unlock_bh(&fq->q.lock);
- fq_put(fq);
+ inet_frag_put(&fq->q, &nf_frags);
return ret_skb;
ret_orig:
@@ -695,7 +659,7 @@ int nf_ct_frag6_init(void)
nf_frags.constructor = ip6_frag_init;
nf_frags.destructor = NULL;
nf_frags.skb_free = nf_skb_free;
- nf_frags.qsize = sizeof(struct nf_ct_frag6_queue);
+ nf_frags.qsize = sizeof(struct frag_queue);
nf_frags.match = ip6_frag_match;
nf_frags.frag_expire = nf_ct_frag6_expire;
nf_frags.secret_interval = 10 * 60 * HZ;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 4ff9af6..0ee5533 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -65,24 +65,6 @@ struct ip6frag_skb_cb
#define FRAG6_CB(skb) ((struct ip6frag_skb_cb*)((skb)->cb))
-/*
- * Equivalent of ipv4 struct ipq
- */
-
-struct frag_queue
-{
- struct inet_frag_queue q;
-
- __be32 id; /* fragment id */
- u32 user;
- struct in6_addr saddr;
- struct in6_addr daddr;
-
- int iif;
- unsigned int csum;
- __u16 nhoffset;
-};
-
static struct inet_frags ip6_frags;
int ip6_frag_nqueues(struct net *net)
@@ -159,21 +141,6 @@ void ip6_frag_init(struct inet_frag_queue *q, void *a)
}
EXPORT_SYMBOL(ip6_frag_init);
-/* Destruction primitives. */
-
-static __inline__ void fq_put(struct frag_queue *fq)
-{
- inet_frag_put(&fq->q, &ip6_frags);
-}
-
-/* Kill fq entry. It is not destroyed immediately,
- * because caller (and someone more) holds reference count.
- */
-static __inline__ void fq_kill(struct frag_queue *fq)
-{
- inet_frag_kill(&fq->q, &ip6_frags);
-}
-
static void ip6_evictor(struct net *net, struct inet6_dev *idev)
{
int evicted;
@@ -183,22 +150,18 @@ static void ip6_evictor(struct net *net, struct inet6_dev *idev)
IP6_ADD_STATS_BH(net, idev, IPSTATS_MIB_REASMFAILS, evicted);
}
-static void ip6_frag_expire(unsigned long data)
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
+ struct inet_frags *frags)
{
- struct frag_queue *fq;
struct net_device *dev = NULL;
- struct net *net;
-
- fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
spin_lock(&fq->q.lock);
if (fq->q.last_in & INET_FRAG_COMPLETE)
goto out;
- fq_kill(fq);
+ inet_frag_kill(&fq->q, frags);
- net = container_of(fq->q.net, struct net, ipv6.frags);
rcu_read_lock();
dev = dev_get_by_index_rcu(net, fq->iif);
if (!dev)
@@ -222,7 +185,19 @@ out_rcu_unlock:
rcu_read_unlock();
out:
spin_unlock(&fq->q.lock);
- fq_put(fq);
+ inet_frag_put(&fq->q, frags);
+}
+EXPORT_SYMBOL(ip6_expire_frag_queue);
+
+static void ip6_frag_expire(unsigned long data)
+{
+ struct frag_queue *fq;
+ struct net *net;
+
+ fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+ net = container_of(fq->q.net, struct net, ipv6.frags);
+
+ ip6_expire_frag_queue(net, fq, &ip6_frags);
}
static __inline__ struct frag_queue *
@@ -391,7 +366,7 @@ found:
return -1;
discard_fq:
- fq_kill(fq);
+ inet_frag_kill(&fq->q, &ip6_frags);
err:
IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
IPSTATS_MIB_REASMFAILS);
@@ -417,7 +392,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
unsigned int nhoff;
int sum_truesize;
- fq_kill(fq);
+ inet_frag_kill(&fq->q, &ip6_frags);
/* Make the one we just received the head. */
if (prev) {
@@ -586,7 +561,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
spin_unlock(&fq->q.lock);
- fq_put(fq);
+ inet_frag_put(&fq->q, &ip6_frags);
return ret;
}
--
1.7.7.6
^ permalink raw reply related
* [PATCH 1/4] ipv6: add a new namespace for nf_conntrack_reasm
From: Cong Wang @ 2012-09-19 2:50 UTC (permalink / raw)
To: netdev
Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
David Miller, Patrick McHardy, Pablo Neira Ayuso
In-Reply-To: <1348023011-16195-1-git-send-email-amwang@redhat.com>
As pointed out by Michal, it is necessary to add a new
namespace for the nf_conntrack_reasm code; this prepares
for the second patch.
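The idea of moving from one global set of fragment limits to one set per namespace can be sketched in user space as follows. `struct mini_frags`, `struct mini_net` and `mini_net_init` are hypothetical names; the byte thresholds mirror the 256 KiB / 192 KiB defaults, and the 60-second timeout is an assumed default taken from the RFC 2460 reassembly limit.

```c
/* Hypothetical miniature of per-namespace fragment settings. */
struct mini_frags {
	int high_thresh;	/* bytes: start evicting above this */
	int low_thresh;		/* bytes: evict down to this */
	int timeout_secs;	/* reassembly timeout (assumed default) */
};

struct mini_net {
	struct mini_frags nf_frag;	/* one private copy per namespace */
};

/* Give a freshly created namespace its own defaults. */
static void mini_net_init(struct mini_net *net)
{
	net->nf_frag.high_thresh = 256 * 1024;
	net->nf_frag.low_thresh = 192 * 1024;
	net->nf_frag.timeout_secs = 60;
}
```

Because every namespace owns its copy, tuning a threshold in one namespace leaves the others at their defaults, which a single file-scope variable cannot provide.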
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
---
include/net/net_namespace.h | 3 +
include/net/netns/ipv6.h | 8 ++
net/ipv6/netfilter/nf_conntrack_reasm.c | 137 +++++++++++++++++++++----------
3 files changed, 106 insertions(+), 42 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5ae57f1..d61e2b3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -93,6 +93,9 @@ struct net {
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
struct netns_ct ct;
#endif
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+ struct netns_nf_frag nf_frag;
+#endif
struct sock *nfnl;
struct sock *nfnl_stash;
#endif
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 0318104..214cb0a 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -71,4 +71,12 @@ struct netns_ipv6 {
#endif
#endif
};
+
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+struct netns_nf_frag {
+ struct netns_sysctl_ipv6 sysctl;
+ struct netns_frags frags;
+};
+#endif
+
#endif
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index f94fb3a..f40f327 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -71,27 +71,26 @@ struct nf_ct_frag6_queue
};
static struct inet_frags nf_frags;
-static struct netns_frags nf_init_frags;
#ifdef CONFIG_SYSCTL
static struct ctl_table nf_ct_frag6_sysctl_table[] = {
{
.procname = "nf_conntrack_frag6_timeout",
- .data = &nf_init_frags.timeout,
+ .data = &init_net.nf_frag.frags.timeout,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
},
{
.procname = "nf_conntrack_frag6_low_thresh",
- .data = &nf_init_frags.low_thresh,
+ .data = &init_net.nf_frag.frags.low_thresh,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "nf_conntrack_frag6_high_thresh",
- .data = &nf_init_frags.high_thresh,
+ .data = &init_net.nf_frag.frags.high_thresh,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec,
@@ -99,7 +98,55 @@ static struct ctl_table nf_ct_frag6_sysctl_table[] = {
{ }
};
-static struct ctl_table_header *nf_ct_frag6_sysctl_header;
+static int __net_init nf_ct_frag6_sysctl_register(struct net *net)
+{
+ struct ctl_table *table;
+ struct ctl_table_header *hdr;
+
+ table = nf_ct_frag6_sysctl_table;
+ if (!net_eq(net, &init_net)) {
+ table = kmemdup(table, sizeof(nf_ct_frag6_sysctl_table),
+ GFP_KERNEL);
+ if (table == NULL)
+ goto err_alloc;
+
+ table[0].data = &net->nf_frag.frags.timeout;
+ table[1].data = &net->nf_frag.frags.low_thresh;
+ table[2].data = &net->nf_frag.frags.high_thresh;
+ }
+
+ hdr = register_net_sysctl(net, "net/netfilter", table);
+ if (hdr == NULL)
+ goto err_reg;
+
+ net->nf_frag.sysctl.frags_hdr = hdr;
+ return 0;
+
+err_reg:
+ if (!net_eq(net, &init_net))
+ kfree(table);
+err_alloc:
+ return -ENOMEM;
+}
+
+static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
+{
+ struct ctl_table *table;
+
+ table = net->nf_frag.sysctl.frags_hdr->ctl_table_arg;
+ unregister_net_sysctl_table(net->nf_frag.sysctl.frags_hdr);
+ if (!net_eq(net, &init_net))
+ kfree(table);
+}
+
+#else
+static int __net_init nf_ct_frag6_sysctl_register(struct net *net)
+{
+ return 0;
+}
+static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
+{
+}
#endif
static unsigned int nf_hashfn(struct inet_frag_queue *q)
@@ -131,13 +178,6 @@ static __inline__ void fq_kill(struct nf_ct_frag6_queue *fq)
inet_frag_kill(&fq->q, &nf_frags);
}
-static void nf_ct_frag6_evictor(void)
-{
- local_bh_disable();
- inet_frag_evictor(&nf_init_frags, &nf_frags);
- local_bh_enable();
-}
-
static void nf_ct_frag6_expire(unsigned long data)
{
struct nf_ct_frag6_queue *fq;
@@ -158,9 +198,9 @@ out:
}
/* Creation primitives. */
-
-static __inline__ struct nf_ct_frag6_queue *
-fq_find(__be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
+static inline struct nf_ct_frag6_queue *fq_find(struct net *net, __be32 id,
+ u32 user, struct in6_addr *src,
+ struct in6_addr *dst)
{
struct inet_frag_queue *q;
struct ip6_create_arg arg;
@@ -174,7 +214,7 @@ fq_find(__be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
read_lock_bh(&nf_frags.lock);
hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
- q = inet_frag_find(&nf_init_frags, &nf_frags, &arg, hash);
+ q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
local_bh_enable();
if (q == NULL)
goto oom;
@@ -312,7 +352,7 @@ found:
fq->q.meat += skb->len;
if (payload_len > fq->q.max_size)
fq->q.max_size = payload_len;
- atomic_add(skb->truesize, &nf_init_frags.mem);
+ atomic_add(skb->truesize, &fq->q.net->mem);
/* The first fragment.
* nhoffset is obtained from the first fragment, of course.
@@ -322,7 +362,7 @@ found:
fq->q.last_in |= INET_FRAG_FIRST_IN;
}
write_lock(&nf_frags.lock);
- list_move_tail(&fq->q.lru_list, &nf_init_frags.lru_list);
+ list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);
write_unlock(&nf_frags.lock);
return 0;
@@ -391,7 +431,7 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
clone->ip_summed = head->ip_summed;
NFCT_FRAG6_CB(clone)->orig = NULL;
- atomic_add(clone->truesize, &nf_init_frags.mem);
+ atomic_add(clone->truesize, &fq->q.net->mem);
}
/* We have to remove fragment header from datagram and to relocate
@@ -415,7 +455,7 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
head->csum = csum_add(head->csum, fp->csum);
head->truesize += fp->truesize;
}
- atomic_sub(head->truesize, &nf_init_frags.mem);
+ atomic_sub(head->truesize, &fq->q.net->mem);
head->local_df = 1;
head->next = NULL;
@@ -527,6 +567,8 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
{
struct sk_buff *clone;
struct net_device *dev = skb->dev;
+ struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev)
+ : dev_net(skb->dev);
struct frag_hdr *fhdr;
struct nf_ct_frag6_queue *fq;
struct ipv6hdr *hdr;
@@ -560,10 +602,13 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
hdr = ipv6_hdr(clone);
fhdr = (struct frag_hdr *)skb_transport_header(clone);
- if (atomic_read(&nf_init_frags.mem) > nf_init_frags.high_thresh)
- nf_ct_frag6_evictor();
+ if (atomic_read(&net->nf_frag.frags.mem) > net->nf_frag.frags.high_thresh) {
+ local_bh_disable();
+ inet_frag_evictor(&net->nf_frag.frags, &nf_frags);
+ local_bh_enable();
+ }
- fq = fq_find(fhdr->identification, user, &hdr->saddr, &hdr->daddr);
+ fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr);
if (fq == NULL) {
pr_debug("Can't find and can't create new queue\n");
goto ret_orig;
@@ -621,8 +666,31 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
nf_conntrack_put_reasm(skb);
}
+static int nf_ct_net_init(struct net *net)
+{
+ net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
+ net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
+ net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
+ inet_frags_init_net(&net->nf_frag.frags);
+
+ return nf_ct_frag6_sysctl_register(net);
+}
+
+static void nf_ct_net_exit(struct net *net)
+{
+ nf_ct_frags6_sysctl_unregister(net);
+ inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+}
+
+static struct pernet_operations nf_ct_net_ops = {
+ .init = nf_ct_net_init,
+ .exit = nf_ct_net_exit,
+};
+
int nf_ct_frag6_init(void)
{
+ int ret = 0;
+
nf_frags.hashfn = nf_hashfn;
nf_frags.constructor = ip6_frag_init;
nf_frags.destructor = NULL;
@@ -631,32 +699,17 @@ int nf_ct_frag6_init(void)
nf_frags.match = ip6_frag_match;
nf_frags.frag_expire = nf_ct_frag6_expire;
nf_frags.secret_interval = 10 * 60 * HZ;
- nf_init_frags.timeout = IPV6_FRAG_TIMEOUT;
- nf_init_frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
- nf_init_frags.low_thresh = IPV6_FRAG_LOW_THRESH;
- inet_frags_init_net(&nf_init_frags);
inet_frags_init(&nf_frags);
-#ifdef CONFIG_SYSCTL
- nf_ct_frag6_sysctl_header = register_net_sysctl(&init_net, "net/netfilter",
- nf_ct_frag6_sysctl_table);
- if (!nf_ct_frag6_sysctl_header) {
+ ret = register_pernet_subsys(&nf_ct_net_ops);
+ if (ret)
inet_frags_fini(&nf_frags);
- return -ENOMEM;
- }
-#endif
- return 0;
+ return ret;
}
void nf_ct_frag6_cleanup(void)
{
-#ifdef CONFIG_SYSCTL
- unregister_net_sysctl_table(nf_ct_frag6_sysctl_header);
- nf_ct_frag6_sysctl_header = NULL;
-#endif
+ unregister_pernet_subsys(&nf_ct_net_ops);
inet_frags_fini(&nf_frags);
-
- nf_init_frags.low_thresh = 0;
- nf_ct_frag6_evictor();
}
--
1.7.7.6
^ permalink raw reply related
* [PATCH v4 net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: Cong Wang @ 2012-09-19 2:50 UTC (permalink / raw)
To: netdev
Cc: netfilter-devel, Herbert Xu, David S. Miller, Pablo Neira Ayuso,
Cong Wang
V4: some coding style fixes
V3: rename struct netns_nf_ct to struct netns_nf_frag
V2: use IS_ENABLED(CONFIG_IPV6) to fix a build error
rebase to latest net-next
ipv6: add a new namespace for nf_conntrack_reasm
ipv6: unify conntrack reassembly expire code with standard one
ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static
ipv6: unify fragment thresh handling code
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
-----
include/net/inet_frag.h | 2 +-
include/net/ipv6.h | 32 +++++-
include/net/net_namespace.h | 3 +
include/net/netns/ipv6.h | 8 ++
net/ipv4/inet_fragment.c | 9 +-
net/ipv4/ip_fragment.c | 5 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 199 +++++++++++++++++--------------
net/ipv6/reassembly.c | 89 ++++----------
8 files changed, 182 insertions(+), 165 deletions(-)
^ permalink raw reply
* Employment contract - typical mistakes and court practice.
From: 24-25 September, Moscow @ 2012-09-19 2:25 UTC (permalink / raw)
To: netdev
[-- Attachment #1: Type: text/plain, Size: 0 bytes --]
[-- Attachment #2: Программа для юристов и кадровых служб.xls --]
[-- Type: application/vnd.ms-excel, Size: 84992 bytes --]
^ permalink raw reply
* Re: [PATCH] tcp: Fixed a TFO server bug that crashed kernel by raw sockets
From: Jerry Chu @ 2012-09-19 2:14 UTC (permalink / raw)
To: Christoph Paasch; +Cc: davem, netdev, ncardwell, edumazet
In-Reply-To: <4380003.jOHRfqhomY@cpaasch-mac>
On Tue, Sep 18, 2012 at 5:19 PM, Christoph Paasch
<christoph.paasch@uclouvain.be> wrote:
> On Tuesday 18 September 2012 16:35:51 H.K. Jerry Chu wrote:
>> From: Jerry Chu <hkchu@google.com>
>>
>> Crash dump msg looks like this:
>>
>> <1>[34468.419809] BUG: unable to handle kernel paging request at
>> ffffeb57000dc058 <1>[34468.426770] IP: [<ffffffff80383f9c>]
>> kfree+0x4c/0x2d0
>> ...
>> <4>[34468.603362] Call Trace:
>> <4>[34468.605802] [<ffffffff807542a4>] inet_sock_destruct+0x174/0x1f0
>> <4>[34468.611786] [<ffffffff806ced53>] __sk_free+0x23/0x170
>> <4>[34468.616907] [<ffffffff806ceec5>] sk_free+0x25/0x30
>> <4>[34468.621762] [<ffffffff806d066a>] sk_common_release+0x7a/0x80
>> <4>[34468.627481] [<ffffffff80746782>] raw_close+0x22/0x30
>> <4>[34468.632515] [<ffffffff80753038>] inet_release+0x58/0x90
>> <4>[34468.637802] [<ffffffff806c9dd8>] sock_release+0x28/0x90
>> <4>[34468.643087] [<ffffffff806c9f07>] sock_close+0x17/0x30
>> <4>[34468.648203] [<ffffffff8039d60a>] fput+0xda/0x210
>> <4>[34468.652894] [<ffffffff80398e06>] filp_close+0x66/0x90
>> <4>[34468.658016] [<ffffffff802822cd>] put_files_struct+0x9d/0x120
>> <4>[34468.663743] [<ffffffff802823fa>] exit_files+0x4a/0x60
>> <4>[34468.668857] [<ffffffff802828e4>] do_exit+0x1a4/0x8c0
>> <4>[34468.673886] [<ffffffff803697db>] ? do_munmap+0x2ab/0x390
>> <4>[34468.679256] [<ffffffff80283384>] do_group_exit+0x44/0xa0
>> <4>[34468.684632] [<ffffffff8028341f>] sys_exit_group+0x3f/0x50
>> <4>[34468.690094] [<ffffffff807a5730>] sysenter_dispatch+0x7/0x1a
>>
>> This bug was introduced as part of patch
>> 7ab4551f3b391818e29263279031dca1e26417c6
>>
>> It turns out the call "socket(PF_INET/PF_INET6, SOCK_RAW, IPPROTO_TCP)"
>> will cause a raw socket to be created with protocol == IPPROTO_TCP
>> so checking against the protocol field in sk alone is not sufficient
>> to guarantee a TCP socket. One must also check type == SOCK_STREAM.
>>
>> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
>> Cc: Neal Cardwell <ncardwell@google.com>
>> Cc: Eric Dumazet <edumazet@google.com>
>> ---
>> net/ipv4/af_inet.c | 28 +++++++++++++++++-----------
>> 1 files changed, 17 insertions(+), 11 deletions(-)
>
> Why not move the TCP code out of inet_sock_destruct by changing the sk_destruct
> callback when TFO is in use, as in the patch below (only compile-tested)? That
> way inet_sock_destruct stays TFO-free.
That will work too. (I briefly thought about this but was distracted by
the two pr_err() returns...)
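
To make the callback idea concrete, here is a minimal userspace sketch of
the pattern, not kernel code. struct toy_sock and every toy_* name are
hypothetical stand-ins (they are not in either patch); the destruct member
only plays the role of sk_destruct, and fastopenq is just an opaque
allocation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sock: only what the sketch needs. */
struct toy_sock {
	void (*destruct)(struct toy_sock *sk);	/* plays the role of sk_destruct */
	void *fastopenq;			/* extra state only some sockets own */
	int base_released;			/* set by the generic destructor */
};

/* Generic destructor: knows nothing about fastopenq. */
static void toy_base_destruct(struct toy_sock *sk)
{
	sk->base_released = 1;
}

/* Specialised destructor: run the generic one, then free the extra state. */
static void toy_tfo_destruct(struct toy_sock *sk)
{
	toy_base_destruct(sk);
	free(sk->fastopenq);
	sk->fastopenq = NULL;
}

/* Every socket starts with the generic destructor. */
static void toy_sock_init(struct toy_sock *sk)
{
	sk->destruct = toy_base_destruct;
	sk->fastopenq = NULL;
	sk->base_released = 0;
}

/* Allocate the extra state and switch the callback in the same place. */
static int toy_enable_tfo(struct toy_sock *sk)
{
	sk->fastopenq = calloc(1, 64);
	if (sk->fastopenq == NULL)
		return -1;
	sk->destruct = toy_tfo_destruct;
	return 0;
}
```

Because the callback is only switched where the extra state is allocated,
a socket that never enables it keeps the generic destructor and never
touches fastopenq, whatever its protocol field says.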
>
>
> Cheers,
> Christoph
>
> ---------
>
> From: Christoph Paasch <christoph.paasch@uclouvain.be>
> Date: Wed, 19 Sep 2012 02:06:53 +0200
> Subject: [PATCH] Don't add TCP-code in inet_sock_destruct
>
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> ---
> include/linux/tcp.h | 4 ++++
> net/ipv4/af_inet.c | 2 --
> net/ipv4/tcp.c | 7 +++++++
> 3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index ae46df5..67c789a 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -574,6 +574,8 @@ static inline bool fastopen_cookie_present(struct tcp_fastopen_cookie *foc)
> return foc->len != -1;
> }
>
> +extern void tcp_sock_destruct(struct sock *sk);
> +
> static inline int fastopen_init_queue(struct sock *sk, int backlog)
> {
> struct request_sock_queue *queue =
> @@ -585,6 +587,8 @@ static inline int fastopen_init_queue(struct sock *sk, int backlog)
> sk->sk_allocation);
> if (queue->fastopenq == NULL)
> return -ENOMEM;
> +
> + sk->sk_destruct = tcp_sock_destruct;
> spin_lock_init(&queue->fastopenq->lock);
> }
> queue->fastopenq->max_qlen = backlog;
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 845372b..766c596 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -149,8 +149,6 @@ void inet_sock_destruct(struct sock *sk)
> pr_err("Attempt to release alive inet socket %p\n", sk);
> return;
> }
> - if (sk->sk_protocol == IPPROTO_TCP)
> - kfree(inet_csk(sk)->icsk_accept_queue.fastopenq);
>
> WARN_ON(atomic_read(&sk->sk_rmem_alloc));
> WARN_ON(atomic_read(&sk->sk_wmem_alloc));
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index df83d74..7b1e940 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2325,6 +2325,13 @@ int tcp_disconnect(struct sock *sk, int flags)
> }
> EXPORT_SYMBOL(tcp_disconnect);
>
> +void tcp_sock_destruct(struct sock *sk)
> +{
> + inet_sock_destruct(sk);
> +
> + kfree(inet_csk(sk)->icsk_accept_queue.fastopenq);
> +}
> +
> static inline bool tcp_can_repair_sock(const struct sock *sk)
> {
> return capable(CAP_NET_ADMIN) &&
> --
> 1.7.9.5
>
>
Acked-by: H.K. Jerry Chu <hkchu@google.com>
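
For reference, the observation in the quoted commit message (a raw socket
can carry protocol == IPPROTO_TCP, so the protocol field alone does not
identify a TCP socket) can be sketched in userspace as below. struct toy_sk
and both helper names are hypothetical, chosen only for illustration; they
model the type and protocol fields and are not the kernel's struct sock.

```c
#include <netinet/in.h>		/* IPPROTO_TCP */
#include <stdbool.h>
#include <sys/socket.h>		/* SOCK_STREAM, SOCK_RAW */

/* Hypothetical stand-in for the two socket fields discussed above. */
struct toy_sk {
	int type;	/* e.g. SOCK_STREAM or SOCK_RAW */
	int protocol;	/* e.g. IPPROTO_TCP */
};

/* Insufficient: a raw socket opened with IPPROTO_TCP also passes. */
static bool looks_like_tcp_by_protocol(const struct toy_sk *sk)
{
	return sk->protocol == IPPROTO_TCP;
}

/* Stricter: require both the stream type and the TCP protocol. */
static bool is_tcp_stream(const struct toy_sk *sk)
{
	return sk->type == SOCK_STREAM && sk->protocol == IPPROTO_TCP;
}
```

A socket described as { SOCK_RAW, IPPROTO_TCP } satisfies the first
predicate but not the second, which is the case that led to the crash
described in the quoted message.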
^ permalink raw reply