Netdev List


* Re: [PATCH net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: Cong Wang @ 2012-09-18  2:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel, herbert
In-Reply-To: <20120917.125925.930848734158369358.davem@davemloft.net>

On Mon, 2012-09-17 at 12:59 -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 17 Sep 2012 12:54:19 -0400 (EDT)
> 
> > From: Cong Wang <amwang@redhat.com>
> > Date: Thu, 13 Sep 2012 14:25:37 +0800
> > 
> >> ipv6: add a new namespace for nf_conntrack_reasm
> >> ipv6: unify conntrack reassembly expire code with standard one
> >> ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static
> >> ipv6: unify fragment thresh handling code
> >> 
> >> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> >> Cc: "David S. Miller" <davem@davemloft.net>
> >> Signed-off-by: Cong Wang <amwang@redhat.com>
> > 
> > These changes look great, all applied to net-next, thanks.
> 
> I have to ask if you actually build tested this change at all:
> 
> net/ipv6/proc.c: In function ‘sockstat6_seq_show’:
> net/ipv6/proc.c:46:10: error: implicit declaration of function ‘ip6_frag_nqueues’ [-Werror=implicit-function-declaration]
> net/ipv6/proc.c:46:10: error: implicit declaration of function ‘ip6_frag_mem’ [-Werror=implicit-function-declaration]
> 
> It is absolutely impossible for you to have enabled ipv6 and not gotten
> that build error.

Weird, I don't see any build error:

% grep CONFIG_IPV6 .config
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_IPV6_MIP6=y
CONFIG_IPV6_SIT=y
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=y
CONFIG_IPV6_GRE=y
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
% rm net/ipv6/proc.o 
% make net/ipv6/proc.o
make[1]: Nothing to be done for `all'.
make[1]: Nothing to be done for `relocs'.
  CHK     include/linux/version.h
  CHK     include/generated/utsrelease.h
  CC      kernel/bounds.s
  GEN     include/generated/bounds.h
  CC      arch/x86/kernel/asm-offsets.s
  GEN     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CC      scripts/mod/empty.o
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/file2alias.o
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTLD  scripts/mod/modpost
  CC      net/ipv6/proc.o

Rebuild the whole tree:
...
  CC      net/ipv6/ip6mr.o
  CC      net/ipv6/xfrm6_policy.o
  CC      net/ipv6/xfrm6_state.o
  CC      net/ipv6/xfrm6_input.o
  CC      net/ipv6/xfrm6_output.o
  CC      net/ipv6/netfilter.o
  CC      net/ipv6/fib6_rules.o
  CC      net/ipv6/proc.o
  CC      net/ipv6/syncookies.o
  LD      net/ipv6/ipv6.o
  CC      net/ipv6/ah6.o
  CC      net/ipv6/esp6.o
  CC      net/ipv6/ipcomp6.o
  CC      net/ipv6/xfrm6_tunnel.o
  CC      net/ipv6/tunnel6.o
...
% gcc --version
gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

> 
> The only logical explanation is that you didn't commit the changes
> to net/ipv6/proc.c in your tree when you put together these patches.

There is no change to net/ipv6/proc.c; ip6_frag_nqueues() and
ip6_frag_mem() are now defined as static inline in include/net/ipv6.h,
which is already #included by net/ipv6/proc.c. This is why I still don't
see how that build error could happen.

Actually, the #ifdef CONFIG_IPV6 is not needed at all, as
sockstat6_seq_show() is their only caller, which is compiled only when
CONFIG_IPV6 is enabled.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: David Miller @ 2012-09-18  2:35 UTC (permalink / raw)
  To: amwang; +Cc: netdev, netfilter-devel, herbert
In-Reply-To: <1347935656.14402.12.camel@cr0>

From: Cong Wang <amwang@redhat.com>
Date: Tue, 18 Sep 2012 10:34:16 +0800

> Actually, the #ifdef CONFIG_IPV6 is not needed at all, as
> sockstat6_seq_show() is their only caller, which is compiled only when
> CONFIG_IPV6 is enabled.

"#ifdef CONFIG_IPV6" doesn't work for modular ipv6.


* Re: [PATCH net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: Cong Wang @ 2012-09-18  2:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel, herbert
In-Reply-To: <20120917.223559.1226100441781312938.davem@davemloft.net>

On Mon, 2012-09-17 at 22:35 -0400, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Tue, 18 Sep 2012 10:34:16 +0800
> 
> > Actually, the #ifdef CONFIG_IPV6 is not needed at all, as
> > sockstat6_seq_show() is their only caller, which is compiled only when
> > CONFIG_IPV6 is enabled.
> 
> "#ifdef CONFIG_IPV6" doesn't work for modular ipv6.

Ah... then this should be the cause of your build failure, as I always
compile IPV6 as builtin.

Sorry for this, my bad. I will remove this #ifdef and resend the whole
patchset.

Thanks!



* Re: [PATCH net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: David Miller @ 2012-09-18  2:53 UTC (permalink / raw)
  To: amwang; +Cc: netdev, netfilter-devel, herbert
In-Reply-To: <1347936443.14402.15.camel@cr0>

From: Cong Wang <amwang@redhat.com>
Date: Tue, 18 Sep 2012 10:47:23 +0800

> I will remove this #ifdef and resend the whole patchset.

Or, alternatively, use IS_ENABLED() or a similar test which will take
the modular case into account.
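
A minimal userspace sketch of why the two tests differ, assuming the
usual Kconfig behaviour that an option set to =m defines
CONFIG_FOO_MODULE rather than CONFIG_FOO. CONFIG_DEMO_IPV6 and the
DEMO_* macros are made-up names for illustration; the macros are a
simplified imitation of the include/linux/kconfig.h trick, not the
kernel's actual definitions:

```c
/* Simulate a modular (=m) build: only the _MODULE symbol is defined.
 * CONFIG_DEMO_IPV6 is a hypothetical option used only in this sketch. */
#define CONFIG_DEMO_IPV6_MODULE 1

/* If the tested symbol expands to 1, the placeholder below injects an
 * extra "0," argument, which shifts a literal 1 into second position. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_DEFINED(x) DEMO_DEFINED2(x)
#define DEMO_DEFINED2(val) DEMO_DEFINED3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_DEFINED3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define DEMO_IS_BUILTIN(option) DEMO_DEFINED(option)
#define DEMO_IS_MODULE(option)  DEMO_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

/* A plain #ifdef only sees the built-in (=y) symbol. */
int demo_visible_with_ifdef(void)
{
#ifdef CONFIG_DEMO_IPV6
	return 1;
#else
	return 0;
#endif
}

/* The IS_ENABLED-style test is true for both =y and =m. */
int demo_visible_with_is_enabled(void)
{
#if DEMO_IS_ENABLED(CONFIG_DEMO_IPV6)
	return 1;
#else
	return 0;
#endif
}
```

With only the _MODULE symbol defined, demo_visible_with_ifdef() returns
0 while demo_visible_with_is_enabled() returns 1, which mirrors how a
helper hidden behind a plain #ifdef disappears in a modular build.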



* Re: [net] e1000: Small packets may get corrupted during padding by HW
From: Alexander Duyck @ 2012-09-18  3:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Duyck, Dave, Tushar N, Fastabend, John R,
	Michal Miroslaw, Kirsher, Jeffrey T, davem@davemloft.net,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <1347915723.26523.179.camel@edumazet-glaptop>

On 9/17/2012 2:02 PM, Eric Dumazet wrote:
> On Mon, 2012-09-17 at 13:53 -0700, Alexander Duyck wrote:
>> On 09/17/2012 12:58 AM, Eric Dumazet wrote:
>>> On Mon, 2012-09-17 at 07:33 +0000, Dave, Tushar N wrote:
>>>>> -----Original Message-----
>>>>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>>>>> On Behalf Of John Fastabend
>>>>> Also wouldn't you want an unlikely() in your patch?
>>>> No because it is quite normal to have packet < ETH_ZLEN. e.g. ARP packets.
>>> ARP packets ? Hardly a performance problem.
>>>
>>> Or make sure all these packets have enough tailroom, or else you are
>>> going to hit the cost of reallocating packets.
>>>
>>> I would rather point at TCP pure ACK packets, since their size can be 54
>>> bytes.
>>>
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index cfe6ffe..aefc681 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -3083,8 +3083,9 @@ void tcp_send_ack(struct sock *sk)
>>>   	/* We are not putting this on the write queue, so
>>>   	 * tcp_transmit_skb() will set the ownership to this
>>>   	 * sock.
>>> +	 * Add 64 bytes of tailroom so that some drivers can use skb_pad()
>>>   	 */
>>> -	buff = alloc_skb(MAX_TCP_HEADER, sk_gfp_atomic(sk, GFP_ATOMIC));
>>> +	buff = alloc_skb(MAX_TCP_HEADER + 64, sk_gfp_atomic(sk, GFP_ATOMIC));
>>>   	if (buff == NULL) {
>>>   		inet_csk_schedule_ack(sk);
>>>   		inet_csk(sk)->icsk_ack.ato = TCP_ATO_MIN;
>> For most systems that extra padding should already be added since
>> alloc_skb will cache line align the buffer anyway.
>>
> Please define 'most systems' ?

Sorry, I misspoke.  What I meant to say is that the allocation will be
aligned to a slab size.  If you take a look at alloc_skb, it still goes
through __alloc_skb, which adds skb_shared_info to the size.  So on most
64 bit systems the total allocation size is going to be larger than 512,
and as a result skb->head will be allocated from a 1K slab cache, leaving
plenty of room for padding to be added later.  On 32 bit systems the total
size will likely be a little over 256 and get rounded up to 512.
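
The rounding described above can be sketched as follows. This is a rough
model only: the size list is an assumption about SLAB/SLUB-style kmalloc
caches, the demo_* names are hypothetical, and the request sizes in the
note below are illustrative rather than the real MAX_TCP_HEADER plus
skb_shared_info totals:

```c
#include <stddef.h>

/* Assumed general-purpose cache sizes in bytes. Real kernels vary with
 * the allocator and architecture; SLOB does not round this way. */
static const size_t demo_kmalloc_sizes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096
};

/* Return the smallest assumed cache size that fits 'size', or 0 if the
 * request is larger than every cache in the list. */
size_t demo_kmalloc_bucket(size_t size)
{
	size_t n = sizeof(demo_kmalloc_sizes) / sizeof(demo_kmalloc_sizes[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (size <= demo_kmalloc_sizes[i])
			return demo_kmalloc_sizes[i];
	return 0;
}

/* Bytes left unused in the chosen cache object, i.e. the room that
 * later padding could consume without a reallocation. */
size_t demo_slack(size_t size)
{
	size_t bucket = demo_kmalloc_bucket(size);

	return bucket ? bucket - size : 0;
}
```

Under these assumptions demo_kmalloc_bucket(513) yields 1024 and
demo_kmalloc_bucket(260) yields 512, matching the "larger than 512" and
"a little over 256" cases mentioned above.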

The only real thing that bugged me about this is that you were adding 64 
when the most you should ever need is 10.  That was the only real reason 
I felt like commenting on it.

>> A more general fix might be to make it so that alloc_skb cannot allocate
>> less than 60 byte buffers on systems with a cache line size smaller than
>> 64 bytes.
> Nope, because we do a skb_reserve(skb, MAX_TCP_HEADER)
>
> So we might have no bytes available at all after this MAX_TCP_HEADER
> area.
>
> Relying on extra padding in alloc_skb() is hacky anyway, as it
> depends on external factors (external to TCP stack)

That is true, but the fact is there is probably a fair amount of that
going on without people even realizing it.  As I recall, the smallest skb
head you can allocate on a 64 bit system currently is something like
128 bytes, which comes from the 512 byte slab; the next step up after
that is a 640 byte head.  Since MAX_TCP_HEADER starts at 160 the
likelihood of it not getting at least 16 bytes of padding is pretty low.

Thanks,

Alex


* Re: [net] e1000: Small packets may get corrupted during padding by HW
From: David Miller @ 2012-09-18  3:03 UTC (permalink / raw)
  To: alexander.duyck
  Cc: eric.dumazet, alexander.h.duyck, tushar.n.dave, john.r.fastabend,
	mirqus, jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <5057E3F2.5090504@gmail.com>

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Mon, 17 Sep 2012 20:01:06 -0700

> Since MAX_TCP_HEADER starts at 160 the likelihood of it not getting
> at least 16 bytes of padding is pretty low.

I know it's not on many people's radar, but with SLOB it will happen
a lot probably.


* Re: [PATCH net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: Cong Wang @ 2012-09-18  3:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel, herbert
In-Reply-To: <20120917.225304.1071254088219136410.davem@davemloft.net>

On Mon, 2012-09-17 at 22:53 -0400, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Tue, 18 Sep 2012 10:47:23 +0800
> 
> > I will remove this #ifdef and resend the whole patchset.
> 
> Or, alternatively, use IS_ENABLED() or a similar test which will take
> the modular case into account.
> 

Yeah, actually net->ipv6 is also defined with #if
IS_ENABLED(CONFIG_IPV6), so IS_ENABLED() is the right fix.

Thanks!



* Re: [net] e1000: Small packets may get corrupted during padding by HW
From: Alexander Duyck @ 2012-09-18  3:27 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, alexander.h.duyck, tushar.n.dave, john.r.fastabend,
	mirqus, jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <20120917.230300.653531213751776624.davem@davemloft.net>

On 9/17/2012 8:03 PM, David Miller wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Mon, 17 Sep 2012 20:01:06 -0700
>
>> Since MAX_TCP_HEADER starts at 160 the likelihood of it not getting
>> at least 16 bytes of padding is pretty low.
> I know it's not on many people's radar, but with SLOB it will happen
> a lot probably.

That is true.  I hadn't thought about anything other than SLAB/SLUB.

It also just occurred to me that there might be some benefit in cache 
aligning the max header size.  It seems like doing something like that 
should reduce the overall memory footprint and would probably improve 
performance.

Thanks,

Alex


* Re: [V4 PATCH 0/8] csiostor: Chelsio FCoE offload driver submission
From: Naresh Kumar Inna @ 2012-09-18  4:24 UTC (permalink / raw)
  To: JBottomley@parallels.com
  Cc: naresh, linux-scsi@vger.kernel.org, Dimitrios Michailidis,
	Casey Leedom, netdev@vger.kernel.org, Chethan Seshadri
In-Reply-To: <1347470328-32490-1-git-send-email-naresh@chelsio.com>

Hi James,

Could you please consider merging version V4 of the driver patches, if
you think they are in good shape now?

Thanks,
Naresh.

On 9/12/2012 10:48 PM, Naresh Kumar Inna wrote:
> This is the initial submission of the Chelsio FCoE offload driver (csiostor)
> to the upstream kernel. This driver currently supports FCoE offload
> functionality over Chelsio T4-based 10Gb Converged Network Adapters.
> 
> The following patches contain the driver sources for csiostor driver and
> updates to firmware/hardware header files shared between csiostor,
> cxgb4 (Chelsio T4-based NIC driver) and cxgb4vf (Chelsio T4-based Virtual
> Function NIC driver). The csiostor driver is dependent on these
> header updates. These patches have been generated against scsi 'misc' branch.
> 
> csiostor is a low level SCSI driver that interfaces with PCI, SCSI midlayer and
> FC transport subsystems. This driver claims the FCoE PCIe function on
> Chelsio Converged Network Adapters. It relies on firmware events for slow path
> operations like discovery, thereby offloading session management. The driver
> programs firmware via Work Request interfaces for fast path I/O offload
> features.
> 
> In this version (V4), the patches have been re-arranged to make them bisectable.
> 
> Here is a brief description of the patches:
> [V4 PATCH 1/8]: Updates to header files shared between cxgb4, cxgb4vf and
>                 csiostor.
> [V4 PATCH 2/8]: Header files part 1.
> [V4 PATCH 3/8]: Header files part 2.
> [V4 PATCH 4/8]: Driver initialization and Work Request services.
> [V4 PATCH 5/8]: FC transport interfaces and mailbox services.
> [V4 PATCH 6/8]: Local and remote port state tracking functionality.
> [V4 PATCH 7/8]: Interrupt handling and fast path I/O functionality.
> [V4 PATCH 8/8]: Hardware interface, Makefile and Kconfig changes.
> 
> Naresh Kumar Inna (8):
>   cxgb4/cxgb4vf: Chelsio FCoE offload driver submission (common header
>     updates).
>   csiostor: Chelsio FCoE offload driver submission (headers part 1).
>   csiostor: Chelsio FCoE offload driver submission (headers part 2).
>   csiostor: Chelsio FCoE offload driver submission (sources part 1).
>   csiostor: Chelsio FCoE offload driver submission (sources part 2).
>   csiostor: Chelsio FCoE offload driver submission (sources part 3).
>   csiostor: Chelsio FCoE offload driver submission (sources part 4).
>   csiostor: Chelsio FCoE offload driver submission (sources part 5).
> 
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |    2 +-
>  drivers/net/ethernet/chelsio/cxgb4/sge.c        |   10 +-
>  drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      |   16 +-
>  drivers/net/ethernet/chelsio/cxgb4/t4_msg.h     |    1 +
>  drivers/net/ethernet/chelsio/cxgb4/t4_regs.h    |   69 +-
>  drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  104 +-
>  drivers/net/ethernet/chelsio/cxgb4vf/sge.c      |   11 +-
>  drivers/scsi/Kconfig                            |    1 +
>  drivers/scsi/Makefile                           |    1 +
>  drivers/scsi/csiostor/Kconfig                   |   19 +
>  drivers/scsi/csiostor/Makefile                  |   11 +
>  drivers/scsi/csiostor/csio_attr.c               |  809 +++++
>  drivers/scsi/csiostor/csio_defs.h               |  108 +
>  drivers/scsi/csiostor/csio_hw.c                 | 4396 +++++++++++++++++++++++
>  drivers/scsi/csiostor/csio_hw.h                 |  666 ++++
>  drivers/scsi/csiostor/csio_init.c               | 1272 +++++++
>  drivers/scsi/csiostor/csio_init.h               |  158 +
>  drivers/scsi/csiostor/csio_isr.c                |  624 ++++
>  drivers/scsi/csiostor/csio_lnode.c              | 2148 +++++++++++
>  drivers/scsi/csiostor/csio_lnode.h              |  255 ++
>  drivers/scsi/csiostor/csio_mb.c                 | 1769 +++++++++
>  drivers/scsi/csiostor/csio_mb.h                 |  278 ++
>  drivers/scsi/csiostor/csio_rnode.c              |  889 +++++
>  drivers/scsi/csiostor/csio_rnode.h              |  141 +
>  drivers/scsi/csiostor/csio_scsi.c               | 2560 +++++++++++++
>  drivers/scsi/csiostor/csio_scsi.h               |  342 ++
>  drivers/scsi/csiostor/csio_wr.c                 | 1632 +++++++++
>  drivers/scsi/csiostor/csio_wr.h                 |  512 +++
>  drivers/scsi/csiostor/t4fw_api_stor.h           |  578 +++
>  29 files changed, 19345 insertions(+), 37 deletions(-)
>  create mode 100644 drivers/scsi/csiostor/Kconfig
>  create mode 100644 drivers/scsi/csiostor/Makefile
>  create mode 100644 drivers/scsi/csiostor/csio_attr.c
>  create mode 100644 drivers/scsi/csiostor/csio_defs.h
>  create mode 100644 drivers/scsi/csiostor/csio_hw.c
>  create mode 100644 drivers/scsi/csiostor/csio_hw.h
>  create mode 100644 drivers/scsi/csiostor/csio_init.c
>  create mode 100644 drivers/scsi/csiostor/csio_init.h
>  create mode 100644 drivers/scsi/csiostor/csio_isr.c
>  create mode 100644 drivers/scsi/csiostor/csio_lnode.c
>  create mode 100644 drivers/scsi/csiostor/csio_lnode.h
>  create mode 100644 drivers/scsi/csiostor/csio_mb.c
>  create mode 100644 drivers/scsi/csiostor/csio_mb.h
>  create mode 100644 drivers/scsi/csiostor/csio_rnode.c
>  create mode 100644 drivers/scsi/csiostor/csio_rnode.h
>  create mode 100644 drivers/scsi/csiostor/csio_scsi.c
>  create mode 100644 drivers/scsi/csiostor/csio_scsi.h
>  create mode 100644 drivers/scsi/csiostor/csio_wr.c
>  create mode 100644 drivers/scsi/csiostor/csio_wr.h
>  create mode 100644 drivers/scsi/csiostor/t4fw_api_stor.h
> 



* [PATCH v2 net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: Cong Wang @ 2012-09-18  4:29 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Herbert Xu, David S. Miller, Cong Wang

V2: use IS_ENABLED(CONFIG_IPV6) to fix a build error
    rebase to latest net-next

ipv6: add a new namespace for nf_conntrack_reasm
ipv6: unify conntrack reassembly expire code with standard one
ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static
ipv6: unify fragment thresh handling code

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

---

 include/net/inet_frag.h                 |    2 +-
 include/net/ipv6.h                      |   32 +++++-
 include/net/net_namespace.h             |    3 +
 include/net/netns/conntrack.h           |    6 +
 net/ipv4/inet_fragment.c                |    9 +-
 net/ipv4/ip_fragment.c                  |    5 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c |  196 ++++++++++++++++---------------
 net/ipv6/reassembly.c                   |   88 ++++----------
 8 files changed, 176 insertions(+), 165 deletions(-)



* [PATCH 4/4] ipv6: unify fragment thresh handling code
From: Cong Wang @ 2012-09-18  4:29 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller
In-Reply-To: <1347942582-23962-1-git-send-email-amwang@redhat.com>

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/inet_frag.h                 |    2 +-
 net/ipv4/inet_fragment.c                |    9 +++++++--
 net/ipv4/ip_fragment.c                  |    5 ++---
 net/ipv6/netfilter/nf_conntrack_reasm.c |    8 +++-----
 net/ipv6/reassembly.c                   |   16 +++++-----------
 5 files changed, 18 insertions(+), 22 deletions(-)
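
The patch carries no description, so here is a hedged userspace sketch
of the threshold logic it moves into inet_frag_evictor(): without
'force' nothing is evicted until memory exceeds high_thresh, and once
eviction starts it continues until memory is at or below low_thresh.
struct demo_frags, demo_frag_evictor() and the fixed per-queue size are
hypothetical simplifications, not the kernel code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct netns_frags (only the fields used). */
struct demo_frags {
	int mem;         /* bytes currently held by fragment queues */
	int high_thresh; /* eviction starts only above this (unless forced) */
	int low_thresh;  /* eviction stops at or below this */
};

/* Pretend every queue holds 'queue_size' bytes and evict whole queues.
 * Returns the number of queues evicted. */
int demo_frag_evictor(struct demo_frags *nf, int queue_size, bool force)
{
	int evicted = 0;

	if (queue_size <= 0)
		return 0;

	/* The check callers used to open-code before calling the evictor. */
	if (!force && nf->mem <= nf->high_thresh)
		return 0;

	while (nf->mem > nf->low_thresh && nf->mem > 0) {
		nf->mem -= (queue_size < nf->mem) ? queue_size : nf->mem;
		evicted++;
	}
	return evicted;
}
```

A namespace teardown path would pass force=true after setting
low_thresh to 0 so that every queue is released, while the receive paths
pass force=false and rely on the high_thresh check inside the helper.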

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 5098ee7..32786a0 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -61,7 +61,7 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
 void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
 void inet_frag_destroy(struct inet_frag_queue *q,
 				struct inet_frags *f, int *work);
-int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f);
+int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
 	__releases(&f->lock);
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 85190e6..4750d2b 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -89,7 +89,7 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	nf->low_thresh = 0;
 
 	local_bh_disable();
-	inet_frag_evictor(nf, f);
+	inet_frag_evictor(nf, f, true);
 	local_bh_enable();
 }
 EXPORT_SYMBOL(inet_frags_exit_net);
@@ -158,11 +158,16 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
 }
 EXPORT_SYMBOL(inet_frag_destroy);
 
-int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f)
+int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
 {
 	struct inet_frag_queue *q;
 	int work, evicted = 0;
 
+	if (!force) {
+		if (atomic_read(&nf->mem) <= nf->high_thresh)
+			return 0;
+	}
+
 	work = atomic_read(&nf->mem) - nf->low_thresh;
 	while (work > 0) {
 		read_lock(&f->lock);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fa6a12c..448e685 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -219,7 +219,7 @@ static void ip_evictor(struct net *net)
 {
 	int evicted;
 
-	evicted = inet_frag_evictor(&net->ipv4.frags, &ip4_frags);
+	evicted = inet_frag_evictor(&net->ipv4.frags, &ip4_frags, false);
 	if (evicted)
 		IP_ADD_STATS_BH(net, IPSTATS_MIB_REASMFAILS, evicted);
 }
@@ -684,8 +684,7 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
 
 	/* Start by cleaning up the memory. */
-	if (atomic_read(&net->ipv4.frags.mem) > net->ipv4.frags.high_thresh)
-		ip_evictor(net);
+	ip_evictor(net);
 
 	/* Lookup (or create) queue header */
 	if ((qp = ip_find(net, ip_hdr(skb), user)) != NULL) {
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index ecefb31..22e9e55 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -563,11 +563,9 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	hdr = ipv6_hdr(clone);
 	fhdr = (struct frag_hdr *)skb_transport_header(clone);
 
-	if (atomic_read(&net->nf_ct.frags.mem) > net->nf_ct.frags.high_thresh) {
-		local_bh_disable();
-		inet_frag_evictor(&net->nf_ct.frags, &nf_frags);
-		local_bh_enable();
-	}
+	local_bh_disable();
+	inet_frag_evictor(&net->nf_ct.frags, &nf_frags, false);
+	local_bh_enable();
 
 	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr);
 	if (fq == NULL) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index cac690c..a1610ac 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -131,15 +131,6 @@ void ip6_frag_init(struct inet_frag_queue *q, void *a)
 }
 EXPORT_SYMBOL(ip6_frag_init);
 
-static void ip6_evictor(struct net *net, struct inet6_dev *idev)
-{
-	int evicted;
-
-	evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags);
-	if (evicted)
-		IP6_ADD_STATS_BH(net, idev, IPSTATS_MIB_REASMFAILS, evicted);
-}
-
 void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq, struct inet_frags *frags)
 {
 	struct net_device *dev = NULL;
@@ -514,6 +505,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 	struct frag_queue *fq;
 	const struct ipv6hdr *hdr = ipv6_hdr(skb);
 	struct net *net = dev_net(skb_dst(skb)->dev);
+	int evicted;
 
 	IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMREQDS);
 
@@ -538,8 +530,10 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 		return 1;
 	}
 
-	if (atomic_read(&net->ipv6.frags.mem) > net->ipv6.frags.high_thresh)
-		ip6_evictor(net, ip6_dst_idev(skb_dst(skb)));
+	evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags, false);
+	if (evicted)
+		IP6_ADD_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
+				 IPSTATS_MIB_REASMFAILS, evicted);
 
 	fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr);
 	if (fq != NULL) {
-- 
1.7.7.6


* [PATCH 3/4] ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
From: Cong Wang @ 2012-09-18  4:29 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller
In-Reply-To: <1347942582-23962-1-git-send-email-amwang@redhat.com>

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/ipv6.h    |   13 +++++++++++--
 net/ipv6/reassembly.c |   10 ----------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 81d4455..979bf6c 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -271,8 +271,17 @@ struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 
 extern bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb);
 
-int ip6_frag_nqueues(struct net *net);
-int ip6_frag_mem(struct net *net);
+#if IS_ENABLED(CONFIG_IPV6)
+static inline int ip6_frag_nqueues(struct net *net)
+{
+	return net->ipv6.frags.nqueues;
+}
+
+static inline int ip6_frag_mem(struct net *net)
+{
+	return atomic_read(&net->ipv6.frags.mem);
+}
+#endif
 
 #define IPV6_FRAG_HIGH_THRESH	(256 * 1024)	/* 262144 */
 #define IPV6_FRAG_LOW_THRESH	(192 * 1024)	/* 196608 */
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 8508c8c..cac690c 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -67,16 +67,6 @@ struct ip6frag_skb_cb
 
 static struct inet_frags ip6_frags;
 
-int ip6_frag_nqueues(struct net *net)
-{
-	return net->ipv6.frags.nqueues;
-}
-
-int ip6_frag_mem(struct net *net)
-{
-	return atomic_read(&net->ipv6.frags.mem);
-}
-
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 			  struct net_device *dev);
 
-- 
1.7.7.6


* [PATCH 1/4] ipv6: add a new namespace for nf_conntrack_reasm
From: Cong Wang @ 2012-09-18  4:29 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller, Patrick McHardy, Pablo Neira Ayuso
In-Reply-To: <1347942582-23962-1-git-send-email-amwang@redhat.com>

As pointed out by Michal, it is necessary to add a new
namespace for the nf_conntrack_reasm code. This prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/net_namespace.h             |    3 +
 include/net/netns/conntrack.h           |    6 ++
 net/ipv6/netfilter/nf_conntrack_reasm.c |  135 +++++++++++++++++++++----------
 3 files changed, 102 insertions(+), 42 deletions(-)
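
For readers unfamiliar with the per-namespace sysctl pattern used in
this patch, a minimal userspace sketch: the initial namespace uses the
static template directly, and every other namespace gets a private copy
whose data pointers are redirected to its own counters, in the same
order as the template (timeout, low_thresh, high_thresh). All demo_*
names are hypothetical, and malloc()/memcpy() stand in for kmemdup():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-namespace tunables. */
struct demo_frags {
	int timeout;
	int low_thresh;
	int high_thresh;
};

/* Hypothetical, much reduced version of struct ctl_table. */
struct demo_entry {
	const char *name;
	int *data;
};

/* Tunables of the "initial namespace". */
struct demo_frags demo_init_frags;

/* Template table; it points at the initial namespace's fields. */
struct demo_entry demo_template[3] = {
	{ "frag6_timeout",     &demo_init_frags.timeout },
	{ "frag6_low_thresh",  &demo_init_frags.low_thresh },
	{ "frag6_high_thresh", &demo_init_frags.high_thresh },
};

/* Return the table to register for 'frags'. The initial namespace uses
 * the template itself; others get a copy with repointed data fields.
 * Returns NULL on allocation failure. */
struct demo_entry *demo_table_for(struct demo_frags *frags)
{
	struct demo_entry *table;

	if (frags == &demo_init_frags)
		return demo_template;

	table = malloc(sizeof(demo_template));
	if (table == NULL)
		return NULL;
	memcpy(table, demo_template, sizeof(demo_template));

	/* Index order must match the template above. */
	table[0].data = &frags->timeout;
	table[1].data = &frags->low_thresh;
	table[2].data = &frags->high_thresh;
	return table;
}

/* Free a table obtained from demo_table_for(); the template is static. */
void demo_table_free(struct demo_entry *table)
{
	if (table != demo_template)
		free(table);
}
```

The point of the copy is that a write through one namespace's table can
only reach that namespace's counters, never the template's.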

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5ae57f1..5c467bb 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -93,6 +93,9 @@ struct net {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct netns_ct		ct;
 #endif
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+	struct netns_nf_ct	nf_ct;
+#endif
 	struct sock		*nfnl;
 	struct sock		*nfnl_stash;
 #endif
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index a1d83cc..13503be 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -96,4 +96,10 @@ struct netns_ct {
 #endif
 	char			*slabname;
 };
+
+struct netns_nf_ct {
+	struct netns_sysctl_ipv6 sysctl;
+	struct netns_frags	frags;
+};
+
 #endif
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index f94fb3a..fff5b71 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -71,27 +71,26 @@ struct nf_ct_frag6_queue
 };
 
 static struct inet_frags nf_frags;
-static struct netns_frags nf_init_frags;
 
 #ifdef CONFIG_SYSCTL
 static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{
 		.procname	= "nf_conntrack_frag6_timeout",
-		.data		= &nf_init_frags.timeout,
+		.data		= &init_net.nf_ct.frags.timeout,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
 		.procname	= "nf_conntrack_frag6_low_thresh",
-		.data		= &nf_init_frags.low_thresh,
+		.data		= &init_net.nf_ct.frags.low_thresh,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
 	{
 		.procname	= "nf_conntrack_frag6_high_thresh",
-		.data		= &nf_init_frags.high_thresh,
+		.data		= &init_net.nf_ct.frags.high_thresh,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
@@ -99,7 +98,54 @@ static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{ }
 };
 
-static struct ctl_table_header *nf_ct_frag6_sysctl_header;
+static int __net_init nf_ct_frag6_sysctl_register(struct net *net)
+{
+	struct ctl_table *table;
+	struct ctl_table_header *hdr;
+
+	table = nf_ct_frag6_sysctl_table;
+	if (!net_eq(net, &init_net)) {
+		table = kmemdup(table, sizeof(nf_ct_frag6_sysctl_table), GFP_KERNEL);
+		if (table == NULL)
+			goto err_alloc;
+
+		table[0].data = &net->nf_ct.frags.timeout;
+		table[1].data = &net->nf_ct.frags.low_thresh;
+		table[2].data = &net->nf_ct.frags.high_thresh;
+	}
+
+	hdr = register_net_sysctl(net, "net/netfilter", table);
+	if (hdr == NULL)
+		goto err_reg;
+
+	net->nf_ct.sysctl.frags_hdr = hdr;
+	return 0;
+
+err_reg:
+	if (!net_eq(net, &init_net))
+		kfree(table);
+err_alloc:
+	return -ENOMEM;
+}
+
+static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
+{
+	struct ctl_table *table;
+
+	table = net->nf_ct.sysctl.frags_hdr->ctl_table_arg;
+	unregister_net_sysctl_table(net->nf_ct.sysctl.frags_hdr);
+	if (!net_eq(net, &init_net))
+		kfree(table);
+}
+
+#else
+static int __net_init nf_ct_frag6_sysctl_register(struct net *net)
+{
+	return 0;
+}
+static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
+{
+}
 #endif
 
 static unsigned int nf_hashfn(struct inet_frag_queue *q)
@@ -131,13 +177,6 @@ static __inline__ void fq_kill(struct nf_ct_frag6_queue *fq)
 	inet_frag_kill(&fq->q, &nf_frags);
 }
 
-static void nf_ct_frag6_evictor(void)
-{
-	local_bh_disable();
-	inet_frag_evictor(&nf_init_frags, &nf_frags);
-	local_bh_enable();
-}
-
 static void nf_ct_frag6_expire(unsigned long data)
 {
 	struct nf_ct_frag6_queue *fq;
@@ -159,8 +198,8 @@ out:
 
 /* Creation primitives. */
 
-static __inline__ struct nf_ct_frag6_queue *
-fq_find(__be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
+static __inline__ struct nf_ct_frag6_queue*
+fq_find(struct net *net, __be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
 {
 	struct inet_frag_queue *q;
 	struct ip6_create_arg arg;
@@ -174,7 +213,7 @@ fq_find(__be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
 	read_lock_bh(&nf_frags.lock);
 	hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
 
-	q = inet_frag_find(&nf_init_frags, &nf_frags, &arg, hash);
+	q = inet_frag_find(&net->nf_ct.frags, &nf_frags, &arg, hash);
 	local_bh_enable();
 	if (q == NULL)
 		goto oom;
@@ -186,7 +225,7 @@ oom:
 }
 
 
-static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
+static int nf_ct_frag6_queue(struct nf_ct_frag6_queue*fq, struct sk_buff *skb,
 			     const struct frag_hdr *fhdr, int nhoff)
 {
 	struct sk_buff *prev, *next;
@@ -312,7 +351,7 @@ found:
 	fq->q.meat += skb->len;
 	if (payload_len > fq->q.max_size)
 		fq->q.max_size = payload_len;
-	atomic_add(skb->truesize, &nf_init_frags.mem);
+	atomic_add(skb->truesize, &fq->q.net->mem);
 
 	/* The first fragment.
 	 * nhoffset is obtained from the first fragment, of course.
@@ -322,7 +361,7 @@ found:
 		fq->q.last_in |= INET_FRAG_FIRST_IN;
 	}
 	write_lock(&nf_frags.lock);
-	list_move_tail(&fq->q.lru_list, &nf_init_frags.lru_list);
+	list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);
 	write_unlock(&nf_frags.lock);
 	return 0;
 
@@ -391,7 +430,7 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 		clone->ip_summed = head->ip_summed;
 
 		NFCT_FRAG6_CB(clone)->orig = NULL;
-		atomic_add(clone->truesize, &nf_init_frags.mem);
+		atomic_add(clone->truesize, &fq->q.net->mem);
 	}
 
 	/* We have to remove fragment header from datagram and to relocate
@@ -415,7 +454,7 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 			head->csum = csum_add(head->csum, fp->csum);
 		head->truesize += fp->truesize;
 	}
-	atomic_sub(head->truesize, &nf_init_frags.mem);
+	atomic_sub(head->truesize, &fq->q.net->mem);
 
 	head->local_df = 1;
 	head->next = NULL;
@@ -527,6 +566,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 {
 	struct sk_buff *clone;
 	struct net_device *dev = skb->dev;
+	struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev) : dev_net(skb->dev);
 	struct frag_hdr *fhdr;
 	struct nf_ct_frag6_queue *fq;
 	struct ipv6hdr *hdr;
@@ -560,10 +600,13 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	hdr = ipv6_hdr(clone);
 	fhdr = (struct frag_hdr *)skb_transport_header(clone);
 
-	if (atomic_read(&nf_init_frags.mem) > nf_init_frags.high_thresh)
-		nf_ct_frag6_evictor();
+	if (atomic_read(&net->nf_ct.frags.mem) > net->nf_ct.frags.high_thresh) {
+		local_bh_disable();
+		inet_frag_evictor(&net->nf_ct.frags, &nf_frags);
+		local_bh_enable();
+	}
 
-	fq = fq_find(fhdr->identification, user, &hdr->saddr, &hdr->daddr);
+	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr);
 	if (fq == NULL) {
 		pr_debug("Can't find and can't create new queue\n");
 		goto ret_orig;
@@ -621,8 +664,31 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
 	nf_conntrack_put_reasm(skb);
 }
 
+static int nf_ct_net_init(struct net *net)
+{
+	net->nf_ct.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
+	net->nf_ct.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
+	net->nf_ct.frags.timeout = IPV6_FRAG_TIMEOUT;
+	inet_frags_init_net(&net->nf_ct.frags);
+
+	return nf_ct_frag6_sysctl_register(net);
+}
+
+static void nf_ct_net_exit(struct net *net)
+{
+	nf_ct_frags6_sysctl_unregister(net);
+	inet_frags_exit_net(&net->nf_ct.frags, &nf_frags);
+}
+
+static struct pernet_operations nf_ct_net_ops = {
+	.init = nf_ct_net_init,
+	.exit = nf_ct_net_exit,
+};
+
 int nf_ct_frag6_init(void)
 {
+	int ret = 0;
+
 	nf_frags.hashfn = nf_hashfn;
 	nf_frags.constructor = ip6_frag_init;
 	nf_frags.destructor = NULL;
@@ -631,32 +697,17 @@ int nf_ct_frag6_init(void)
 	nf_frags.match = ip6_frag_match;
 	nf_frags.frag_expire = nf_ct_frag6_expire;
 	nf_frags.secret_interval = 10 * 60 * HZ;
-	nf_init_frags.timeout = IPV6_FRAG_TIMEOUT;
-	nf_init_frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	nf_init_frags.low_thresh = IPV6_FRAG_LOW_THRESH;
-	inet_frags_init_net(&nf_init_frags);
 	inet_frags_init(&nf_frags);
 
-#ifdef CONFIG_SYSCTL
-	nf_ct_frag6_sysctl_header = register_net_sysctl(&init_net, "net/netfilter",
-							nf_ct_frag6_sysctl_table);
-	if (!nf_ct_frag6_sysctl_header) {
+	ret = register_pernet_subsys(&nf_ct_net_ops);
+	if (ret)
 		inet_frags_fini(&nf_frags);
-		return -ENOMEM;
-	}
-#endif
 
-	return 0;
+	return ret;
 }
 
 void nf_ct_frag6_cleanup(void)
 {
-#ifdef CONFIG_SYSCTL
-	unregister_net_sysctl_table(nf_ct_frag6_sysctl_header);
-	nf_ct_frag6_sysctl_header = NULL;
-#endif
+	unregister_pernet_subsys(&nf_ct_net_ops);
 	inet_frags_fini(&nf_frags);
-
-	nf_init_frags.low_thresh = 0;
-	nf_ct_frag6_evictor();
 }
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 2/4] ipv6: unify conntrack reassembly expire code with standard one
From: Cong Wang @ 2012-09-18  4:29 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller, Hideaki YOSHIFUJI, Patrick McHardy,
	Pablo Neira Ayuso
In-Reply-To: <1347942582-23962-1-git-send-email-amwang@redhat.com>

Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/

The problem is that, per RFC2460, an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of the fragment if defragmentation
times out.

"
   If insufficient fragments are received to complete reassembly of a
   packet within 60 seconds of the reception of the first-arriving
   fragment of that packet, reassembly of that packet must be
   abandoned and all the fragments that have been received for that
   packet must be discarded.  If the first fragment (i.e., the one
   with a Fragment Offset of zero) has been received, an ICMP Time
   Exceeded -- Fragment Reassembly Time Exceeded message should be
   sent to the source of that fragment.
"

As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sends only 3 of the 4 fragments
of an IPv6 UDP packet.
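
The timeout rule quoted from RFC2460 above can be sketched as a small
decision function. This is only a userspace illustration, not the kernel
code: struct demo_frag_queue, enum demo_expire_action and demo_expire() are
hypothetical names invented for this note, and the real
ip6_expire_frag_queue() additionally deals with locking, device lookup and
statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a reassembly queue. */
struct demo_frag_queue {
	bool complete;   /* reassembly already finished */
	bool first_in;   /* the fragment with offset zero has arrived */
	size_t nfrags;   /* fragments currently held */
};

enum demo_expire_action {
	DEMO_NOTHING,        /* queue was already complete, nothing to do */
	DEMO_DROP_ONLY,      /* discard fragments, send nothing */
	DEMO_DROP_AND_ICMP,  /* discard fragments, send ICMP Time Exceeded */
};

/* Decide what to do when the reassembly timer of a queue fires. */
enum demo_expire_action demo_expire(struct demo_frag_queue *fq)
{
	assert(fq != NULL);

	if (fq->complete)
		return DEMO_NOTHING;

	/* All fragments received so far must be discarded. */
	fq->nfrags = 0;

	/* ICMP is only sent if the first fragment was seen. */
	return fq->first_in ? DEMO_DROP_AND_ICMP : DEMO_DROP_ONLY;
}
```

In this model a queue that timed out without ever seeing the offset-zero
fragment is dropped silently, which matches the wording of the RFC text.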

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/ipv6.h                      |   19 ++++++++
 net/ipv6/netfilter/nf_conntrack_reasm.c |   71 +++++++-----------------------
 net/ipv6/reassembly.c                   |   62 ++++++++-------------------
 3 files changed, 54 insertions(+), 98 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 9bed5d4..81d4455 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -411,6 +411,25 @@ struct ip6_create_arg {
 void ip6_frag_init(struct inet_frag_queue *q, void *a);
 bool ip6_frag_match(struct inet_frag_queue *q, void *a);
 
+/*
+ *	Equivalent of ipv4 struct ipq
+ */
+struct frag_queue {
+	struct inet_frag_queue	q;
+
+	__be32			id;		/* fragment id		*/
+	u32			user;
+	struct in6_addr		saddr;
+	struct in6_addr		daddr;
+
+	int			iif;
+	unsigned int		csum;
+	__u16			nhoffset;
+};
+
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
+			   struct inet_frags *frags);
+
 static inline bool ipv6_addr_any(const struct in6_addr *a)
 {
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index fff5b71..ecefb31 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -57,19 +57,6 @@ struct nf_ct_frag6_skb_cb
 
 #define NFCT_FRAG6_CB(skb)	((struct nf_ct_frag6_skb_cb*)((skb)->cb))
 
-struct nf_ct_frag6_queue
-{
-	struct inet_frag_queue	q;
-
-	__be32			id;		/* fragment id		*/
-	u32			user;
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-
-	unsigned int		csum;
-	__u16			nhoffset;
-};
-
 static struct inet_frags nf_frags;
 
 #ifdef CONFIG_SYSCTL
@@ -150,9 +137,9 @@ static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
 
 static unsigned int nf_hashfn(struct inet_frag_queue *q)
 {
-	const struct nf_ct_frag6_queue *nq;
+	const struct frag_queue *nq;
 
-	nq = container_of(q, struct nf_ct_frag6_queue, q);
+	nq = container_of(q, struct frag_queue, q);
 	return inet6_hash_frag(nq->id, &nq->saddr, &nq->daddr, nf_frags.rnd);
 }
 
@@ -162,43 +149,19 @@ static void nf_skb_free(struct sk_buff *skb)
 		kfree_skb(NFCT_FRAG6_CB(skb)->orig);
 }
 
-/* Destruction primitives. */
-
-static __inline__ void fq_put(struct nf_ct_frag6_queue *fq)
-{
-	inet_frag_put(&fq->q, &nf_frags);
-}
-
-/* Kill fq entry. It is not destroyed immediately,
- * because caller (and someone more) holds reference count.
- */
-static __inline__ void fq_kill(struct nf_ct_frag6_queue *fq)
-{
-	inet_frag_kill(&fq->q, &nf_frags);
-}
-
 static void nf_ct_frag6_expire(unsigned long data)
 {
-	struct nf_ct_frag6_queue *fq;
-
-	fq = container_of((struct inet_frag_queue *)data,
-			struct nf_ct_frag6_queue, q);
-
-	spin_lock(&fq->q.lock);
+	struct frag_queue *fq;
+	struct net *net;
 
-	if (fq->q.last_in & INET_FRAG_COMPLETE)
-		goto out;
+	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+	net = container_of(fq->q.net, struct net, nf_ct.frags);
 
-	fq_kill(fq);
-
-out:
-	spin_unlock(&fq->q.lock);
-	fq_put(fq);
+	ip6_expire_frag_queue(net, fq, &nf_frags);
 }
 
 /* Creation primitives. */
-
-static __inline__ struct nf_ct_frag6_queue*
+static __inline__ struct frag_queue *
 fq_find(struct net *net, __be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
 {
 	struct inet_frag_queue *q;
@@ -218,14 +181,14 @@ fq_find(struct net *net, __be32 id, u32 user, struct in6_addr *src, struct in6_a
 	if (q == NULL)
 		goto oom;
 
-	return container_of(q, struct nf_ct_frag6_queue, q);
+	return container_of(q, struct frag_queue, q);
 
 oom:
 	return NULL;
 }
 
 
-static int nf_ct_frag6_queue(struct nf_ct_frag6_queue*fq, struct sk_buff *skb,
+static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 			     const struct frag_hdr *fhdr, int nhoff)
 {
 	struct sk_buff *prev, *next;
@@ -366,7 +329,7 @@ found:
 	return 0;
 
 discard_fq:
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &nf_frags);
 err:
 	return -1;
 }
@@ -381,12 +344,12 @@ err:
  *	the last and the first frames arrived and all the bits are here.
  */
 static struct sk_buff *
-nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
+nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 {
 	struct sk_buff *fp, *op, *head = fq->q.fragments;
 	int    payload_len;
 
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &nf_frags);
 
 	WARN_ON(head == NULL);
 	WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
@@ -568,7 +531,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	struct net_device *dev = skb->dev;
 	struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev) : dev_net(skb->dev);
 	struct frag_hdr *fhdr;
-	struct nf_ct_frag6_queue *fq;
+	struct frag_queue *fq;
 	struct ipv6hdr *hdr;
 	int fhoff, nhoff;
 	u8 prevhdr;
@@ -617,7 +580,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	if (nf_ct_frag6_queue(fq, clone, fhdr, nhoff) < 0) {
 		spin_unlock_bh(&fq->q.lock);
 		pr_debug("Can't insert skb to queue\n");
-		fq_put(fq);
+		inet_frag_put(&fq->q, &nf_frags);
 		goto ret_orig;
 	}
 
@@ -629,7 +592,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	}
 	spin_unlock_bh(&fq->q.lock);
 
-	fq_put(fq);
+	inet_frag_put(&fq->q, &nf_frags);
 	return ret_skb;
 
 ret_orig:
@@ -693,7 +656,7 @@ int nf_ct_frag6_init(void)
 	nf_frags.constructor = ip6_frag_init;
 	nf_frags.destructor = NULL;
 	nf_frags.skb_free = nf_skb_free;
-	nf_frags.qsize = sizeof(struct nf_ct_frag6_queue);
+	nf_frags.qsize = sizeof(struct frag_queue);
 	nf_frags.match = ip6_frag_match;
 	nf_frags.frag_expire = nf_ct_frag6_expire;
 	nf_frags.secret_interval = 10 * 60 * HZ;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 4ff9af6..8508c8c 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -65,24 +65,6 @@ struct ip6frag_skb_cb
 #define FRAG6_CB(skb)	((struct ip6frag_skb_cb*)((skb)->cb))
 
 
-/*
- *	Equivalent of ipv4 struct ipq
- */
-
-struct frag_queue
-{
-	struct inet_frag_queue	q;
-
-	__be32			id;		/* fragment id		*/
-	u32			user;
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-
-	int			iif;
-	unsigned int		csum;
-	__u16			nhoffset;
-};
-
 static struct inet_frags ip6_frags;
 
 int ip6_frag_nqueues(struct net *net)
@@ -159,21 +141,6 @@ void ip6_frag_init(struct inet_frag_queue *q, void *a)
 }
 EXPORT_SYMBOL(ip6_frag_init);
 
-/* Destruction primitives. */
-
-static __inline__ void fq_put(struct frag_queue *fq)
-{
-	inet_frag_put(&fq->q, &ip6_frags);
-}
-
-/* Kill fq entry. It is not destroyed immediately,
- * because caller (and someone more) holds reference count.
- */
-static __inline__ void fq_kill(struct frag_queue *fq)
-{
-	inet_frag_kill(&fq->q, &ip6_frags);
-}
-
 static void ip6_evictor(struct net *net, struct inet6_dev *idev)
 {
 	int evicted;
@@ -183,22 +150,17 @@ static void ip6_evictor(struct net *net, struct inet6_dev *idev)
 		IP6_ADD_STATS_BH(net, idev, IPSTATS_MIB_REASMFAILS, evicted);
 }
 
-static void ip6_frag_expire(unsigned long data)
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq, struct inet_frags *frags)
 {
-	struct frag_queue *fq;
 	struct net_device *dev = NULL;
-	struct net *net;
-
-	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
 
 	spin_lock(&fq->q.lock);
 
 	if (fq->q.last_in & INET_FRAG_COMPLETE)
 		goto out;
 
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, frags);
 
-	net = container_of(fq->q.net, struct net, ipv6.frags);
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, fq->iif);
 	if (!dev)
@@ -222,7 +184,19 @@ out_rcu_unlock:
 	rcu_read_unlock();
 out:
 	spin_unlock(&fq->q.lock);
-	fq_put(fq);
+	inet_frag_put(&fq->q, frags);
+}
+EXPORT_SYMBOL(ip6_expire_frag_queue);
+
+static void ip6_frag_expire(unsigned long data)
+{
+	struct frag_queue *fq;
+	struct net *net;
+
+	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+	net = container_of(fq->q.net, struct net, ipv6.frags);
+
+	ip6_expire_frag_queue(net, fq, &ip6_frags);
 }
 
 static __inline__ struct frag_queue *
@@ -391,7 +365,7 @@ found:
 	return -1;
 
 discard_fq:
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &ip6_frags);
 err:
 	IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 		      IPSTATS_MIB_REASMFAILS);
@@ -417,7 +391,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 	unsigned int nhoff;
 	int sum_truesize;
 
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &ip6_frags);
 
 	/* Make the one we just received the head. */
 	if (prev) {
@@ -586,7 +560,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 		ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
 
 		spin_unlock(&fq->q.lock);
-		fq_put(fq);
+		inet_frag_put(&fq->q, &ip6_frags);
 		return ret;
 	}
 
-- 
1.7.7.6

^ permalink raw reply related

* Re: [PATCH] asix: Support DLink DUB-E100 H/W Ver C1
From: Christian Riesch @ 2012-09-18  5:41 UTC (permalink / raw)
  To: Søren Holm; +Cc: netdev, stable
In-Reply-To: <1347909800-3056-1-git-send-email-sgh@sgh.dk>

Hi Søren,

On Mon, Sep 17, 2012 at 9:23 PM, Søren Holm <sgh@sgh.dk> wrote:
> Signed-off-by: Søren Holm <sgh@sgh.dk>
> Cc: stable@vger.kernel.org
> ---
>  drivers/net/usb/asix.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
> index 3ae80ec..12f372e 100644
> --- a/drivers/net/usb/asix.c
> +++ b/drivers/net/usb/asix.c
> @@ -1604,6 +1604,10 @@ static const struct usb_device_id        products [] = {
>         USB_DEVICE (0x2001, 0x3c05),
>         .driver_info = (unsigned long) &ax88772_info,
>  }, {
> +       // DLink DUB-E100 H/W Ver C1
> +       USB_DEVICE (0x2001, 0x1a02),
> +       .driver_info = (unsigned long) &ax88772_info,
> +}, {

This will not apply to recent kernels, drivers/net/usb/asix.c has been
split into several files, please make these changes in
drivers/net/usb/asix_devices.c.

Regards, Christian

>         // Linksys USB1000
>         USB_DEVICE (0x1737, 0x0039),
>         .driver_info = (unsigned long) &ax88178_info,
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [net-next.git 3/8 (V2)] stmmac: add the initial tx coalesce schema
From: Giuseppe CAVALLARO @ 2012-09-18  5:41 UTC (permalink / raw)
  To: David Miller, bhutchings; +Cc: netdev
In-Reply-To: <5052DE7E.8070704@st.com>

Hello David, Ben,

On 9/14/2012 9:36 AM, Giuseppe CAVALLARO wrote:
> On 9/13/2012 10:23 PM, David Miller wrote:
>> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
>> Date: Tue, 11 Sep 2012 08:55:09 +0200
>>
>>> +    unsigned long flags;
>>> +
>>> +    spin_lock_irqsave(&priv->tx_lock, flags);
>>>
>>> -    spin_lock(&priv->tx_lock);
>>> +    priv->xstats.tx_clean++;
>>
>> You are changing the locking here for the sake of the new timer.
>>
>> But timers run in software interrupt context, so this change is
>> completely unnecessary since NAPI runs in software interrupt context
>> as well, and neither timers nor NAPI run in hardware interrupts
>> context.
>
> Indeed, it can be called by the ISR too in this new implementation.
> I have added the spin_lock_irqsave/restore because otherwise, testing with
> CONFIG_PROVE_LOCKING, I get the following warning on ARM SMP.

Sorry to bother you again; is there any news on these patches?
Please let me know.

Regards
Peppe

>
> [    8.030000]
> [    8.030000] =================================
> [    8.030000] [ INFO: inconsistent lock state ]
> [    8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted
> [    8.030000] ---------------------------------
> [    8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [    8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [    8.030000]  (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>]
> stmmac_tx+0x1c/0x388
> [    8.030000] {HARDIRQ-ON-W} state was registered at:
> [    8.030000]   [<800562b4>] __lock_acquire+0x638/0x179c
> [    8.030000]   [<80057884>] lock_acquire+0x60/0x74
> [    8.030000]   [<80428a08>] _raw_spin_lock+0x40/0x50
> [    8.030000]   [<802651d8>] stmmac_tx+0x1c/0x388
> [    8.030000]   [<80026be0>] run_timer_softirq+0x180/0x23c
> [    8.030000]   [<80020ccc>] __do_softirq+0xa0/0x114
> [    8.030000]   [<80021204>] irq_exit+0x58/0x7c
> [    8.030000]   [<8000dc80>] handle_IRQ+0x7c/0xb8
> [    8.030000]   [<80008464>] gic_handle_irq+0x34/0x58
> [    8.030000]   [<80429684>] __irq_svc+0x44/0x78
> [    8.030000]   [<8001c3f4>] vprintk+0x41c/0x480
> [    8.030000]   [<8042097c>] printk+0x18/0x24
> [    8.030000]   [<805aef6c>] prepare_namespace+0x1c/0x1a4
> [    8.030000]   [<805ae980>] kernel_init+0x1c8/0x20c
> [    8.030000]   [<8000deb8>] kernel_thread_exit+0x0/0x8
> [    8.030000] irq event stamp: 254745
> [    8.030000] hardirqs last  enabled at (254744): [<80429240>]
> _raw_spin_unlock_irqrestore+0x3c/0x6c
> [    8.030000] hardirqs last disabled at (254745): [<80429674>]
> __irq_svc+0x34/0x78
> [    8.030000] softirqs last  enabled at (254741): [<8035d964>]
> dev_queue_xmit+0x6a4/0x724
> [    8.030000] softirqs last disabled at (254737): [<8035d2d4>]
> dev_queue_xmit+0x14/0x724
> [    8.030000]
> [    8.030000] other info that might help us debug this:
> [    8.030000]  Possible unsafe locking scenario:
> [    8.030000]
> [    8.030000]        CPU0
> [    8.030000]        ----
> [    8.030000]   lock(&(&priv->tx_lock)->rlock);
> [    8.030000]   <Interrupt>
> [    8.030000]     lock(&(&priv->tx_lock)->rlock);
> [    8.030000]
> [    8.030000]  *** DEADLOCK ***
>
>> Therefore, disabling hardware interrupts for this lock is unnecessary
>> and will decrease performance.
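
The locking scenario that lockdep reports above (a lock taken with hardware
interrupts enabled, then taken again from a hard IRQ on the same CPU) can be
modelled in userspace roughly as follows. This is a toy state model under
stated assumptions, not kernel code: struct demo_cpu and all demo_*()
helpers are hypothetical, and the model only tracks one CPU and one
non-recursive lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU and one non-recursive lock. */
struct demo_cpu {
	bool lock_held;     /* the tx lock is held on this CPU */
	bool irqs_enabled;  /* hardware interrupts may fire */
};

/* Like spin_lock(): takes the lock, leaves interrupts untouched. */
void demo_lock_plain(struct demo_cpu *cpu)
{
	assert(!cpu->lock_held);
	cpu->lock_held = true;
}

/* Like spin_lock_irqsave(): disables interrupts, then takes the lock. */
bool demo_lock_irqsave(struct demo_cpu *cpu)
{
	bool saved = cpu->irqs_enabled;

	cpu->irqs_enabled = false;
	assert(!cpu->lock_held);
	cpu->lock_held = true;
	return saved;
}

/* Like spin_unlock_irqrestore(): drops the lock, restores interrupts. */
void demo_unlock_irqrestore(struct demo_cpu *cpu, bool saved)
{
	cpu->lock_held = false;
	cpu->irqs_enabled = saved;
}

/*
 * True if a hard IRQ handler that takes the same lock could fire right
 * now and spin forever on a lock its own CPU already holds.
 */
bool demo_irq_would_deadlock(const struct demo_cpu *cpu)
{
	return cpu->irqs_enabled && cpu->lock_held;
}
```

The model says nothing about whether the ISR path should take this lock at
all; it only shows why the plain variant is flagged once such a path exists.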

^ permalink raw reply

* Re: [net] e1000: Small packets may get corrupted during padding by HW
From: Eric Dumazet @ 2012-09-18  5:45 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, alexander.h.duyck, tushar.n.dave, john.r.fastabend,
	mirqus, jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <5057EA05.8020005@gmail.com>

On Mon, 2012-09-17 at 20:27 -0700, Alexander Duyck wrote:

> It also just occurred to me that there might be some benefit in cache 
> aligning the max header size.  It seems like doing something like that 
> should reduce the overall memory footprint and would probably improve 
> performance.

Given that most ACK packets are 66 bytes (14 ethernet + 20 IP + 32 TCP),
I am not sure we need to make any tweak on alignment ?
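
The arithmetic behind the two positions can be written down as a short
sketch. The 64-byte cache line is an assumption (it varies by CPU), and
demo_round_up() and demo_ack_frame_len() are hypothetical helpers, not
driver code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_LINE 64u  /* assumption: 64-byte cache lines */

/* Round len up to the next multiple of align (align must be a power of two). */
size_t demo_round_up(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Typical ACK as counted above: Ethernet + IPv4 + TCP with options. */
size_t demo_ack_frame_len(void)
{
	return 14 + 20 + 32;
}
```

With these assumptions, demo_round_up(demo_ack_frame_len(), DEMO_CACHE_LINE)
gives 128: a 66-byte ACK is two bytes over one cache line, so aligning the
header size would round it up to two full lines. Whether that helps or hurts
is the open question in this exchange.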

^ permalink raw reply

* [Patch net-next] l2tp: fix compile error when CONFIG_IPV6=m and CONFIG_L2TP=y
From: Cong Wang @ 2012-09-18  5:54 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Cong Wang

When CONFIG_IPV6=m and CONFIG_L2TP=y, I got the following compile error:

  LD      init/built-in.o
net/built-in.o: In function `l2tp_xmit_core':
l2tp_core.c:(.text+0x147781): undefined reference to `inet6_csk_xmit'
net/built-in.o: In function `l2tp_tunnel_create':
(.text+0x149067): undefined reference to `udpv6_encap_enable'
net/built-in.o: In function `l2tp_ip6_recvmsg':
l2tp_ip6.c:(.text+0x14e991): undefined reference to `ipv6_recv_error'
net/built-in.o: In function `l2tp_ip6_sendmsg':
l2tp_ip6.c:(.text+0x14ec64): undefined reference to `fl6_sock_lookup'
l2tp_ip6.c:(.text+0x14ed6b): undefined reference to `datagram_send_ctl'
l2tp_ip6.c:(.text+0x14eda0): undefined reference to `fl6_sock_lookup'
l2tp_ip6.c:(.text+0x14ede5): undefined reference to `fl6_merge_options'
l2tp_ip6.c:(.text+0x14edf4): undefined reference to `ipv6_fixup_options'
l2tp_ip6.c:(.text+0x14ee5d): undefined reference to `fl6_update_dst'
l2tp_ip6.c:(.text+0x14eea3): undefined reference to `ip6_dst_lookup_flow'
l2tp_ip6.c:(.text+0x14eee7): undefined reference to `ip6_dst_hoplimit'
l2tp_ip6.c:(.text+0x14ef8b): undefined reference to `ip6_append_data'
l2tp_ip6.c:(.text+0x14ef9d): undefined reference to `ip6_flush_pending_frames'
l2tp_ip6.c:(.text+0x14efe2): undefined reference to `ip6_push_pending_frames'
net/built-in.o: In function `l2tp_ip6_destroy_sock':
l2tp_ip6.c:(.text+0x14f090): undefined reference to `ip6_flush_pending_frames'
l2tp_ip6.c:(.text+0x14f0a0): undefined reference to `inet6_destroy_sock'
net/built-in.o: In function `l2tp_ip6_connect':
l2tp_ip6.c:(.text+0x14f14d): undefined reference to `ip6_datagram_connect'
net/built-in.o: In function `l2tp_ip6_bind':
l2tp_ip6.c:(.text+0x14f4fe): undefined reference to `ipv6_chk_addr'
net/built-in.o: In function `l2tp_ip6_init':
l2tp_ip6.c:(.init.text+0x73fa): undefined reference to `inet6_add_protocol'
l2tp_ip6.c:(.init.text+0x740c): undefined reference to `inet6_register_protosw'
net/built-in.o: In function `l2tp_ip6_exit':
l2tp_ip6.c:(.exit.text+0x1954): undefined reference to `inet6_unregister_protosw'
l2tp_ip6.c:(.exit.text+0x1965): undefined reference to `inet6_del_protocol'
net/built-in.o:(.rodata+0xf2d0): undefined reference to `inet6_release'
net/built-in.o:(.rodata+0xf2d8): undefined reference to `inet6_bind'
net/built-in.o:(.rodata+0xf308): undefined reference to `inet6_ioctl'
net/built-in.o:(.data+0x1af40): undefined reference to `ipv6_setsockopt'
net/built-in.o:(.data+0x1af48): undefined reference to `ipv6_getsockopt'
net/built-in.o:(.data+0x1af50): undefined reference to `compat_ipv6_setsockopt'
net/built-in.o:(.data+0x1af58): undefined reference to `compat_ipv6_getsockopt'
make: *** [vmlinux] Error 1

This is because l2tp uses symbols from IPV6, so when l2tp is
built in, IPV6 has to be built in too.
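
The constraint can be stated as a small predicate over the tristate values.
This is a sketch only: enum demo_tristate and demo_links() are hypothetical,
and it assumes that the IPv6-specific code in the user is compiled out when
the provider is disabled entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Kconfig tristate values, ordered n < m < y. */
enum demo_tristate { DEMO_N = 0, DEMO_M = 1, DEMO_Y = 2 };

/*
 * True if "user" can resolve the symbols it needs from "provider".
 * A built-in user cannot reference symbols that only exist in a module.
 */
bool demo_links(enum demo_tristate user, enum demo_tristate provider)
{
	assert(user <= DEMO_Y && provider <= DEMO_Y);

	/* Nothing to link, or (by assumption) the dependent code is compiled out. */
	if (user == DEMO_N || provider == DEMO_N)
		return true;

	return provider >= user;
}
```

The only failing combination in this model is user = y with provider = m,
which is the configuration reported above.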

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
diff --git a/net/l2tp/Kconfig b/net/l2tp/Kconfig
index 4b1e717..3f3c514 100644
--- a/net/l2tp/Kconfig
+++ b/net/l2tp/Kconfig
@@ -4,6 +4,7 @@
 
 menuconfig L2TP
 	tristate "Layer Two Tunneling Protocol (L2TP)"
+	select IPV6 if L2TP=y
 	depends on INET
 	---help---
 	  Layer Two Tunneling Protocol

^ permalink raw reply related

* Re: [net] e1000: Small packets may get corrupted during padding by HW
From: Alexander Duyck @ 2012-09-18  5:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, alexander.h.duyck, tushar.n.dave, john.r.fastabend,
	mirqus, jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <1347947120.26523.207.camel@edumazet-glaptop>

On 9/17/2012 10:45 PM, Eric Dumazet wrote:
> On Mon, 2012-09-17 at 20:27 -0700, Alexander Duyck wrote:
>
>> It also just occurred to me that there might be some benefit in cache
>> aligning the max header size.  It seems like doing something like that
>> should reduce the overall memory footprint and would probably improve
>> performance.
> Given that most ACK packets are 66 bytes (14 ethernet + 20 IP + 32 TCP),
> I am not sure we need to make any tweak on alignment ?
I'm honestly not sure myself.  I will probably spend a few hours 
tomorrow tweaking a few things to test and see if there is any gain to 
be had there.  The only reason why it occurred to me is that it really 
isn't too far off from what we did back on the Rx side, except for there 
we were aligning at the start of the buffer and working our way up.

Thanks,

Alex

^ permalink raw reply

* Re: [patch net] sky2: fix rx filter setup on link up
From: Jiri Pirko @ 2012-09-18  6:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, davem, mlindner, linux-kernel
In-Reply-To: <20120917141507.7528b3ee@nehalam.linuxnetplumber.net>

Mon, Sep 17, 2012 at 11:15:07PM CEST, shemminger@vyatta.com wrote:
>On Mon, 17 Sep 2012 22:47:24 +0200
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Mon, Sep 17, 2012 at 06:12:14PM CEST, shemminger@vyatta.com wrote:
>> >On Mon, 17 Sep 2012 17:10:17 +0200
>> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >
>> >> In my case I have following problem. sky2_set_multicast() sets registers
>> >> GM_MC_ADDR_H[1-4] correctly to:
>> >> 0000 0800 0001 0410
>> >> However, when adapter gets link and sky2_link_up() is called, the values
>> >> are for some reason different:
>> >> 0000 0800 0016 0410
>> >
>> >Rather than papering over the problem, it would be better to
>> >trace back what is setting those registers and fix that code.
>> 
>> Yes, I did that. No code in sky2.[ch] writes to these registers other
>> than sky2_set_multicast() and sky2_gmac_reset() (I hooked on sky2_write*()).
>> So I strongly believe this is a HW issue (maybe only an issue of my revision
>> "Yukon-2 EC chip revision 2")
>> 
>> >
>> >> This in my case prevents iface to be able to receive packets with dst mac
>> >> 01:80:C2:00:00:02 (LACPDU dst mac), which I set up previously by
>> >> SIOCADDMULTI.
>> >> 
>> >> So remember computed rx_filter data and write it to GM_MC_ADDR_H[1-4] on
>> >> link_up.
>> >>
>> >
>> >Please do some more root cause analysis. Just save/restoring the
>> >registers is just a temporary workaround.
>
>Are you sure it isn't IPv6 or something else setting additional multicast
>addresses? You may need to instrument the set_multicast call.

I'm very sure that no code in sky2 is writing to GM_MC_ADDR_H[1-4] the
change I see in sky2_link_up(). When sky2_set_multicast() is called
again for any reason, the issue goes away and LACPDUs come in.

I also experimentally used sky2_set_multicast() called from
sky2_link_up() and it helped as well.

^ permalink raw reply

* Re: [patch net] sky2: fix rx filter setup on link up
From: Jiri Pirko @ 2012-09-18  6:15 UTC (permalink / raw)
  To: Mirko Lindner
  Cc: Stephen Hemminger, netdev@vger.kernel.org, davem@davemloft.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <175CCF5F49938B4D99B2E3EF7F558EBE1B63898F0B@SC-VEXCH4.marvell.com>

Tue, Sep 18, 2012 at 02:38:52AM CEST, mlindner@marvell.com wrote:
>>Mon, Sep 17, 2012 at 06:12:14PM CEST, shemminger@vyatta.com wrote:
>>>On Mon, 17 Sep 2012 17:10:17 +0200
>>>Jiri Pirko <jiri@resnulli.us> wrote:
>>>
>>>> In my case I have following problem. sky2_set_multicast() sets registers
>>>> GM_MC_ADDR_H[1-4] correctly to:
>>>> 0000 0800 0001 0410
>>>> However, when adapter gets link and sky2_link_up() is called, the values
>>>> are for some reason different:
>>>> 0000 0800 0016 0410
>>>
>>>Rather than papering over the problem, it would be better to
>>>trace back what is setting those registers and fix that code.
>
>>Yes, I did that. No code in sky2.[ch] writes to these registers other
>>than sky2_set_multicast() and sky2_gmac_reset() (I hooked on sky2_write*()).
>>So I strongly believe this is a HW issue (maybe only an issue of my revision
>>"Yukon-2 EC chip revision 2")
>
>I would like to check the registers as soon as I'm back in my office next week and report my findings.

Okay, I'll wait for you. If you need more info from my side, please do
not hesitate to ask. Thanks!

>Could you also please check the hint from Stephen?

^ permalink raw reply

* [Patch net-next] netpoll: call ->ndo_select_queue() in tx path
From: Cong Wang @ 2012-09-18  6:16 UTC (permalink / raw)
  To: netdev; +Cc: Sylvain Munaut, David S. Miller, Eric Dumazet, Cong Wang

In the netpoll tx path, we miss the chance to call ->ndo_select_queue(),
which can cause problems when bonding is involved.

This patch makes dev_pick_tx() extern (and renames it to netdev_pick_tx())
so that netpoll can call it in netpoll_send_skb_on_dev().
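
The difference between reading the queue index stored in the skb and asking
the device to pick one can be sketched in userspace as below. All names
(struct demo_dev, struct demo_pkt, demo_pick_tx(), demo_always_queue_two())
are hypothetical, and the fallback branch is a simplification: it just
reuses the stored mapping, whereas the real helper has its own fallback
logic.

```c
#include <assert.h>
#include <stddef.h>

struct demo_pkt {
	unsigned int queue_mapping;  /* queue index recorded in the packet */
};

struct demo_dev {
	unsigned int num_tx_queues;
	/* Optional driver hook, in the spirit of ->ndo_select_queue(). */
	unsigned int (*select_queue)(const struct demo_dev *dev,
				     const struct demo_pkt *pkt);
};

/* Example hook for illustration: always asks for queue 2. */
unsigned int demo_always_queue_two(const struct demo_dev *dev,
				   const struct demo_pkt *pkt)
{
	(void)dev;
	(void)pkt;
	return 2;
}

/* Pick a tx queue, giving the driver hook the first say if it exists. */
unsigned int demo_pick_tx(const struct demo_dev *dev, struct demo_pkt *pkt)
{
	unsigned int idx = 0;

	assert(dev != NULL && pkt != NULL && dev->num_tx_queues > 0);

	if (dev->num_tx_queues > 1) {
		if (dev->select_queue)
			idx = dev->select_queue(dev, pkt);
		else
			idx = pkt->queue_mapping;  /* simplified fallback */

		/* Never hand back an index the device does not have. */
		if (idx >= dev->num_tx_queues)
			idx = 0;
	}

	pkt->queue_mapping = idx;
	return idx;
}
```

A caller that only reads pkt->queue_mapping never reaches the hook, which is
the gap in the netpoll path described above.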

Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>

---
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ae3153c0..72661f6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1403,6 +1403,9 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev,
 		f(dev, &dev->_tx[i], arg);
 }
 
+extern struct netdev_queue *netdev_pick_tx(struct net_device *dev,
+					   struct sk_buff *skb);
+
 /*
  * Net namespace inlines
  */
diff --git a/net/core/dev.c b/net/core/dev.c
index dcc673d..b13317a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2381,8 +2381,8 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 #endif
 }
 
-static struct netdev_queue *dev_pick_tx(struct net_device *dev,
-					struct sk_buff *skb)
+struct netdev_queue *netdev_pick_tx(struct net_device *dev,
+				    struct sk_buff *skb)
 {
 	int queue_index;
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -2556,7 +2556,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 
 	skb_update_prio(skb);
 
-	txq = dev_pick_tx(dev, skb);
+	txq = netdev_pick_tx(dev, skb);
 	q = rcu_dereference_bh(txq->qdisc);
 
 #ifdef CONFIG_NET_CLS_ACT
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index dd67818..77a0388 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -328,7 +328,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 	if (skb_queue_len(&npinfo->txq) == 0 && !netpoll_owner_active(dev)) {
 		struct netdev_queue *txq;
 
-		txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
+		txq = netdev_pick_tx(dev, skb);
 
 		/* try until next clock tick */
 		for (tries = jiffies_to_usecs(1)/USEC_PER_POLL;

^ permalink raw reply related

* re: mlx4_en: fix endianness with blue frame support
From: Dan Carpenter @ 2012-09-18  7:34 UTC (permalink / raw)
  To: cascardo; +Cc: netdev

Hello Thadeu Lima de Souza Cascardo,

The patch c5d6136e10d6: "mlx4_en: fix endianness with blue frame 
support" from Oct 10, 2011, leads to the following warning:
drivers/net/ethernet/mellanox/mlx4/en_tx.c:720 mlx4_en_xmit()
	 warn: potential memory corrupting cast. 4 vs 2 bytes

That patch introduced a call to cpu_to_be32() and added some endian
notation.

	*(__be32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);

But it doesn't make sense because the data type is declared as u16 in
the header and we would be corrupting the next elements in the struct
which are ins_vlan and fence_size.

struct mlx4_wqe_ctrl_seg {
        __be32                  owner_opcode;
        __be16                  vlan_tag;
        u8                      ins_vlan;
        u8                      fence_size;

I guess the reason we get away with it is that the ->doorbell_qpn is
normally less than 65k. But doorbell_qpn is a u32 type so I think there
is a risk here.
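
To make the warning concrete, here is a minimal userspace sketch of what a 4-byte OR at the address of a 2-byte field touches. It is an illustration under stated assumptions, not driver code: struct ctrl_seg_sketch and or_be32_at_vlan_tag() are hypothetical names, plain fixed-width types stand in for __be32/__be16/u8, and the big-endian store is written out byte by byte so the result does not depend on host endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first fields of struct mlx4_wqe_ctrl_seg.
 * Plain fixed-width types replace the kernel's __be32/__be16/u8. */
struct ctrl_seg_sketch {
	uint32_t owner_opcode;
	uint16_t vlan_tag;
	uint8_t  ins_vlan;
	uint8_t  fence_size;
};

/* Models "*(__be32 *)&seg->vlan_tag |= cpu_to_be32(val)": OR a 32-bit
 * value, laid out big-endian, into the 4 bytes that start at vlan_tag.
 * Only the first two of those bytes belong to vlan_tag itself. */
static void or_be32_at_vlan_tag(struct ctrl_seg_sketch *seg, uint32_t val)
{
	unsigned char *p = (unsigned char *)seg +
			   offsetof(struct ctrl_seg_sketch, vlan_tag);

	p[0] |= (unsigned char)(val >> 24);	/* vlan_tag, first byte */
	p[1] |= (unsigned char)(val >> 16);	/* vlan_tag, second byte */
	p[2] |= (unsigned char)(val >> 8);	/* overlaps ins_vlan */
	p[3] |= (unsigned char)val;		/* overlaps fence_size */
}
```

In this sketch, calling or_be32_at_vlan_tag(&seg, 0x00012345) on a zeroed struct leaves ins_vlan == 0x23 and fence_size == 0x45, so the two neighbouring fields are modified even though only vlan_tag is named in the expression.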

regards,
dan carpenter


* Re: [PATCH 1/4] ipv6: add a new namespace for nf_conntrack_reasm
From: Pablo Neira Ayuso @ 2012-09-18  7:37 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, netfilter-devel, Herbert Xu, Michal Kubeček,
	David Miller, Patrick McHardy
In-Reply-To: <1347942582-23962-2-git-send-email-amwang@redhat.com>

On Tue, Sep 18, 2012 at 12:29:39PM +0800, Cong Wang wrote:
> As pointed out by Michal, it is necessary to add a new
> namespace for nf_conntrack_reasm code; this prepares
> for the second patch.

This looks good to me, but there are some cosmetic changes I have to
request.

> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Michal Kubeček <mkubecek@suse.cz>
> Cc: David Miller <davem@davemloft.net>
> Cc: Patrick McHardy <kaber@trash.net>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: netfilter-devel@vger.kernel.org
> Signed-off-by: Cong Wang <amwang@redhat.com>
> ---
>  include/net/net_namespace.h             |    3 +
>  include/net/netns/conntrack.h           |    6 ++
>  net/ipv6/netfilter/nf_conntrack_reasm.c |  135 +++++++++++++++++++++----------
>  3 files changed, 102 insertions(+), 42 deletions(-)
> 
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index 5ae57f1..5c467bb 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -93,6 +93,9 @@ struct net {
>  #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
>  	struct netns_ct		ct;
>  #endif
> +#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> +	struct netns_nf_ct	nf_ct;
> +#endif

There's already a "struct netns_ct" above that encapsulates
netfilter conntrack netns parameters.

However, I'd prefer if, while at it, you define some struct
netns_nf_frag instead.

In net/ipv6/netfilter/Makefile, it says:

# defrag
nf_defrag_ipv6-y := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o

Note that nf defragmentation is not glued to conntrack anymore. So I'd
go for one netns_nf_frag for this in include/net/net_namespace.h

Thanks.


* [PATCH v2] tcp: fix regression in urgent data handling
From: Eric Dumazet @ 2012-09-18  7:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephan Springl, Alexander Duyck
In-Reply-To: <1347922299.26523.198.camel@edumazet-glaptop>

From: Eric Dumazet <edumazet@google.com>

Stephan Springl found that commit 1402d366019fed "tcp: introduce
tcp_try_coalesce" introduced a regression for rlogin.

It turns out the problem comes from TCP urgent data handling and
a change in behavior in the input path.

rlogin sends two one-byte packets with URG ptr set, and when the next data
frame is coalesced, we lack sk_data_ready() calls to wake up the consumer.
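
The control-flow change can be sketched in isolation as below. This is a simplified userspace model, not the kernel code: struct sock_sketch, data_ready() and the two queue_tail_*() helpers are hypothetical names, a counter stands in for kfree_skb_partial(), and another counter stands in for the sk->sk_data_ready() wakeup.

```c
#include <stdbool.h>

/* Hypothetical model of the socket state that matters here. */
struct sock_sketch {
	bool dead;	/* stands in for sock_flag(sk, SOCK_DEAD) */
	int  freed;	/* counts the kfree_skb_partial() stand-in */
	int  wakeups;	/* counts the sk_data_ready() stand-in */
};

static void data_ready(struct sock_sketch *sk)
{
	sk->wakeups++;
}

/* Before the fix: a coalesced skb (eaten > 0) skips the wakeup. */
static void queue_tail_before(struct sock_sketch *sk, int eaten)
{
	if (eaten > 0)
		sk->freed++;
	else if (!sk->dead)
		data_ready(sk);
}

/* After the fix: the wakeup no longer depends on whether the skb
 * was coalesced, only on the socket still being alive. */
static void queue_tail_after(struct sock_sketch *sk, int eaten)
{
	if (eaten > 0)
		sk->freed++;
	if (!sk->dead)
		data_ready(sk);
}
```

With eaten set to 1 on a live socket, queue_tail_before() leaves wakeups at 0 while queue_tail_after() raises it to 1, which is the missing notification described above.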

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stephan Springl <springl-k@bfw-online.de>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
v2: Changed Stephan Springl email address in changelog/CC

 net/ipv4/tcp_input.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6e38c6c..d377f48 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4661,7 +4661,7 @@ queue_and_out:
 
 		if (eaten > 0)
 			kfree_skb_partial(skb, fragstolen);
-		else if (!sock_flag(sk, SOCK_DEAD))
+		if (!sock_flag(sk, SOCK_DEAD))
 			sk->sk_data_ready(sk, 0);
 		return;
 	}
@@ -5556,8 +5556,7 @@ no_ack:
 #endif
 			if (eaten)
 				kfree_skb_partial(skb, fragstolen);
-			else
-				sk->sk_data_ready(sk, 0);
+			sk->sk_data_ready(sk, 0);
 			return 0;
 		}
 	}

