* Re: [PATCH 0/9] skb fragment API: convert network drivers (part V)
From: Ian Campbell @ 2011-10-10 19:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, netdev@vger.kernel.org, linux-scsi@vger.kernel.org,
linux-mm@kvack.org
In-Reply-To: <1318272731.2567.4.camel@edumazet-laptop>
On Mon, 2011-10-10 at 19:52 +0100, Eric Dumazet wrote:
> On Monday, 10 October 2011 at 14:20 -0400, David Miller wrote:
> > From: Ian Campbell <Ian.Campbell@citrix.com>
> > Date: Mon, 10 Oct 2011 12:11:16 +0100
> >
> > > I think "struct subpage" is a generally useful tuple I added to a
> > > central location (mm_types.h) rather than somewhere networking or driver
> > > specific but I can trivially move if preferred.
> >
> > I'm fine with the patch series, but this generic datastructure
> > addition needs some feedback first.
Sure. Would you take patches 6, 7 & 8 now? They don't rely on the new
struct.
> I was planning to send a patch to abstract frag->size manipulation and
> ease upcoming truesize certification work.
[...]
> Is it OK if I send a single patch right now ?
>
> I am asking because it might clash a bit with Ian's work.
FWIW it's fine with me, there is only the half dozen or so drivers in
this series left to convert and I can rebase pretty easily.
Ian.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH 0/9] skb fragment API: convert network drivers (part V)
From: David Miller @ 2011-10-10 19:16 UTC (permalink / raw)
To: eric.dumazet; +Cc: Ian.Campbell, netdev, linux-scsi, linux-mm
In-Reply-To: <1318272731.2567.4.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Oct 2011 20:52:11 +0200
> Is it OK if I send a single patch right now ?
>
> I am asking because it might clash a bit with Ian's work.
Feel free to do so, we'll sort it out somehow.
^ permalink raw reply
* [PATCH] net-netlink: Add a new attribute to expose TOS values via netlink
From: Muraliraja Muniraju @ 2011-10-10 18:54 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy <ka
Cc: linux-kernel, netdev, Murali Raja
From: Murali Raja <muralira@google.com>
This patch exposes the TOS value of TCP sockets when the TOS flag
is requested in the ext_flags of the inet_diag request. The attribute is
intended to expose TOS values for both TCP and UDP sockets. Currently only
TCP is supported. When netlink support for UDP is added, support for
exposing the TOS value of UDP sockets will be added as well.
Signed-off-by: Murali Raja <muralira@google.com>
---
include/linux/inet_diag.h | 10 +++++++++-
net/ipv4/inet_diag.c | 7 +++++++
2 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index bc8c490..f590a59 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -97,9 +97,10 @@ enum {
INET_DIAG_INFO,
INET_DIAG_VEGASINFO,
INET_DIAG_CONG,
+ INET_DIAG_TOS,
};
-#define INET_DIAG_MAX INET_DIAG_CONG
+#define INET_DIAG_MAX INET_DIAG_TOS
/* INET_DIAG_MEM */
@@ -120,6 +121,13 @@ struct tcpvegas_info {
__u32 tcpv_minrtt;
};
+/* INET_DIAG_TOS */
+
+struct inet_diag_tos {
+ __u8 idiag_tos;
+ __u8 idiag_reserved[3];
+};
+
#ifdef __KERNEL__
struct sock;
struct inet_hashinfo;
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 389a2e6..6c52e29 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -82,6 +82,7 @@ static int inet_csk_diag_fill(struct sock *sk,
struct nlmsghdr *nlh;
void *info = NULL;
struct inet_diag_meminfo *minfo = NULL;
+ struct inet_diag_tos *tos = NULL;
unsigned char *b = skb_tail_pointer(skb);
const struct inet_diag_handler *handler;
@@ -108,6 +109,9 @@ static int inet_csk_diag_fill(struct sock *sk,
icsk->icsk_ca_ops->name);
}
+ if (ext & (1 << (INET_DIAG_TOS - 1)))
+ tos = INET_DIAG_PUT(skb, INET_DIAG_TOS, sizeof(*tos));
+
r->idiag_family = sk->sk_family;
r->idiag_state = sk->sk_state;
r->idiag_timer = 0;
@@ -169,6 +173,9 @@ static int inet_csk_diag_fill(struct sock *sk,
icsk->icsk_ca_ops && icsk->icsk_ca_ops->get_info)
icsk->icsk_ca_ops->get_info(sk, ext, skb);
+ if (tos)
+ tos->idiag_tos = inet->tos;
+
nlh->nlmsg_len = skb_tail_pointer(skb) - b;
return skb->len;
--
1.7.3.1
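For illustration, a minimal userspace sketch of the extension-bit arithmetic behind the `ext & (1 << (INET_DIAG_TOS - 1))` test in the patch. The enum is a local copy with a DEMO_ prefix; the two entries before DEMO_INET_DIAG_INFO are an assumption, since the diff context does not show them, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Local mirror of the attribute enum. The first two entries are an
 * assumption; the diff only shows INFO, VEGASINFO, CONG and the new TOS. */
enum {
	DEMO_INET_DIAG_NONE,
	DEMO_INET_DIAG_MEMINFO,
	DEMO_INET_DIAG_INFO,
	DEMO_INET_DIAG_VEGASINFO,
	DEMO_INET_DIAG_CONG,
	DEMO_INET_DIAG_TOS,
};

/* Payload layout of the new attribute, as added by the patch. */
struct demo_inet_diag_tos {
	uint8_t idiag_tos;
	uint8_t idiag_reserved[3];
};

/* Hypothetical helper: bit a requester sets to ask for attribute `attr`. */
static inline uint8_t demo_diag_ext_bit(int attr)
{
	return (uint8_t)(1u << (attr - 1));
}

/* Mirrors the kernel-side check: was the TOS attribute requested? */
static inline int demo_tos_requested(uint8_t ext)
{
	return (ext & demo_diag_ext_bit(DEMO_INET_DIAG_TOS)) != 0;
}
```

Under the assumed enum values the TOS attribute maps to bit 4 of the extension mask, one position above the bit used for INET_DIAG_CONG.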
^ permalink raw reply related
* Re: [PATCH 0/9] skb fragment API: convert network drivers (part V)
From: Eric Dumazet @ 2011-10-10 18:52 UTC (permalink / raw)
To: David Miller; +Cc: Ian.Campbell, netdev, linux-scsi, linux-mm
In-Reply-To: <20111010.142040.2267571270586671416.davem@davemloft.net>
On Monday, 10 October 2011 at 14:20 -0400, David Miller wrote:
> From: Ian Campbell <Ian.Campbell@citrix.com>
> Date: Mon, 10 Oct 2011 12:11:16 +0100
>
> > I think "struct subpage" is a generally useful tuple I added to a
> > central location (mm_types.h) rather than somewhere networking or driver
> > specific but I can trivially move if preferred.
>
> I'm fine with the patch series, but this generic datastructure
> addition needs some feedback first.
I was planning to send a patch to abstract frag->size manipulation and
ease upcoming truesize certification work.
static inline int skb_frag_size(const skb_frag_t *frag)
{
	return frag->size;
}
static inline void skb_frag_size_set(skb_frag_t *frag, int size)
{
	frag->size = size;
}
static inline void skb_frag_size_add(skb_frag_t *frag, int size)
{
	frag->size += size;
}
static inline void skb_frag_size_sub(skb_frag_t *frag, int size)
{
	frag->size -= size;
}
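A rough userspace sketch of how call sites might look once they go through such accessors instead of touching frag->size directly. demo_frag_t is a simplified stand-in for skb_frag_t (an assumption, not the kernel definition), and demo_frags_len() is a hypothetical caller, not code from the proposed patch.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's skb_frag_t; only the size matters here. */
typedef struct {
	unsigned int page_offset;
	unsigned int size;
} demo_frag_t;

/* Accessors with the same shape as the helpers proposed above. */
static inline int demo_frag_size(const demo_frag_t *frag)
{
	return frag->size;
}

static inline void demo_frag_size_set(demo_frag_t *frag, int size)
{
	frag->size = size;
}

static inline void demo_frag_size_add(demo_frag_t *frag, int size)
{
	frag->size += size;
}

static inline void demo_frag_size_sub(demo_frag_t *frag, int size)
{
	frag->size -= size;
}

/* Hypothetical driver-style loop: sum the fragment sizes via the accessor. */
static inline int demo_frags_len(const demo_frag_t *frags, size_t nr)
{
	int len = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		len += demo_frag_size(&frags[i]);
	return len;
}
```

With every reader and writer funnelled through the accessors, the representation of the size field can change later without another tree-wide conversion.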
Is it OK if I send a single patch right now ?
I am asking because it might clash a bit with Ian's work.
drivers/atm/eni.c | 2
drivers/infiniband/hw/amso1100/c2.c | 4
drivers/infiniband/hw/nes/nes_nic.c | 10 -
drivers/infiniband/ulp/ipoib/ipoib_cm.c | 2
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 18 +-
drivers/net/ethernet/3com/3c59x.c | 6
drivers/net/ethernet/3com/typhoon.c | 6
drivers/net/ethernet/adaptec/starfire.c | 8 -
drivers/net/ethernet/aeroflex/greth.c | 8 -
drivers/net/ethernet/alteon/acenic.c | 10 -
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2
drivers/net/ethernet/atheros/atl1e/atl1e_main.c | 6
drivers/net/ethernet/atheros/atlx/atl1.c | 12 -
drivers/net/ethernet/broadcom/bnx2.c | 12 -
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 14 -
drivers/net/ethernet/broadcom/tg3.c | 8 -
drivers/net/ethernet/brocade/bna/bnad.c | 6
drivers/net/ethernet/chelsio/cxgb/sge.c | 10 -
drivers/net/ethernet/chelsio/cxgb3/sge.c | 12 -
drivers/net/ethernet/chelsio/cxgb4/sge.c | 26 +--
drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 26 +--
drivers/net/ethernet/cisco/enic/enic_main.c | 12 -
drivers/net/ethernet/emulex/benet/be_main.c | 18 +-
drivers/net/ethernet/ibm/ehea/ehea_main.c | 8 -
drivers/net/ethernet/ibm/emac/core.c | 2
drivers/net/ethernet/ibm/ibmveth.c | 6
drivers/net/ethernet/intel/e1000/e1000_main.c | 6
drivers/net/ethernet/intel/e1000e/netdev.c | 6
drivers/net/ethernet/intel/igb/igb_main.c | 2
drivers/net/ethernet/intel/igbvf/netdev.c | 4
drivers/net/ethernet/intel/ixgb/ixgb_main.c | 4
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6
drivers/net/ethernet/jme.c | 4
drivers/net/ethernet/marvell/mv643xx_eth.c | 9 -
drivers/net/ethernet/marvell/skge.c | 8 -
drivers/net/ethernet/marvell/sky2.c | 16 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 14 -
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 12 -
drivers/net/ethernet/micrel/ksz884x.c | 2
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 14 -
drivers/net/ethernet/natsemi/ns83820.c | 4
drivers/net/ethernet/neterion/s2io.c | 12 -
drivers/net/ethernet/neterion/vxge/vxge-main.c | 12 -
drivers/net/ethernet/nvidia/forcedeth.c | 18 +-
drivers/net/ethernet/pasemi/pasemi_mac.c | 8 -
drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 6
drivers/net/ethernet/qlogic/qla3xxx.c | 6
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 6
drivers/net/ethernet/qlogic/qlge/qlge_main.c | 6
drivers/net/ethernet/realtek/8139cp.c | 4
drivers/net/ethernet/realtek/r8169.c | 4
drivers/net/ethernet/sfc/rx.c | 2
drivers/net/ethernet/sfc/tx.c | 8 -
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4
drivers/net/ethernet/sun/cassini.c | 8 -
drivers/net/ethernet/sun/niu.c | 6
drivers/net/ethernet/sun/sungem.c | 4
drivers/net/ethernet/sun/sunhme.c | 4
drivers/net/ethernet/tehuti/tehuti.c | 6
drivers/net/ethernet/tile/tilepro.c | 2
drivers/net/ethernet/tundra/tsi108_eth.c | 6
drivers/net/ethernet/via/via-velocity.c | 6
drivers/net/ethernet/xilinx/ll_temac_main.c | 4
drivers/net/virtio_net.c | 8 -
drivers/net/vmxnet3/vmxnet3_drv.c | 12 -
drivers/net/xen-netback/netback.c | 4
drivers/net/xen-netfront.c | 4
drivers/scsi/cxgbi/libcxgbi.c | 10 -
drivers/scsi/fcoe/fcoe_transport.c | 2
drivers/staging/hv/netvsc_drv.c | 4
include/linux/skbuff.h | 28 +++
net/appletalk/ddp.c | 5
net/core/datagram.c | 16 +-
net/core/dev.c | 6
net/core/pktgen.c | 12 -
net/core/skbuff.c | 72 +++++-----
net/core/user_dma.c | 4
net/ipv4/inet_lro.c | 8 -
net/ipv4/ip_fragment.c | 4
net/ipv4/ip_output.c | 6
net/ipv4/tcp.c | 9 -
net/ipv4/tcp_output.c | 8 -
net/ipv6/ip6_output.c | 5
net/ipv6/netfilter/nf_conntrack_reasm.c | 4
net/ipv6/reassembly.c | 4
net/xfrm/xfrm_ipcomp.c | 2
87 files changed, 389 insertions(+), 359 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] mscan: too much data copied to CAN frame due to 16 bit accesses
From: David Miller @ 2011-10-10 18:31 UTC (permalink / raw)
To: wg-5Yr1BZd7O62+XT7JhA+gdA
Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
Netdev-u79uwXL29TY76Z2rM5mHXA, nautsch-Re5JQEeQqe8AvxtiuMwx3w,
socketcan-fJ+pQTUTwRTk1uMJSBkQmQ
In-Reply-To: <4E8F52CE.1000204-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
From: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
Date: Fri, 07 Oct 2011 21:28:14 +0200
> Due to the 16 bit access to mscan registers there's too much data copied to
> the zero initialized CAN frame when having an odd number of bytes to copy.
> This patch ensures that only the requested bytes are copied by using an
> 8 bit access for the remaining byte.
>
> Reported-by: Andre Naujoks <nautsch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Oliver Hartkopp <socketcan-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
> Signed-off-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
Applied, thanks everyone.
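A minimal sketch of the technique the quoted description names: copy whole 16-bit words, then use a single 8-bit access for a trailing odd byte so nothing beyond the requested length is written. The register accessors are hypothetical stand-ins operating on a plain byte array in big-endian word order; this is not the mscan driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit big-endian register read. */
static inline uint16_t demo_read16(const volatile uint8_t *reg)
{
	return (uint16_t)((reg[0] << 8) | reg[1]);
}

/* Hypothetical stand-in for an 8-bit register read. */
static inline uint8_t demo_read8(const volatile uint8_t *reg)
{
	return reg[0];
}

/* Copy exactly `len` bytes: 16-bit accesses for full pairs, one 8-bit
 * access for the last byte when `len` is odd. */
static inline void demo_copy_payload(uint8_t *dst,
				     const volatile uint8_t *regs, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint16_t w = demo_read16(regs + i);

		dst[i] = (uint8_t)(w >> 8);
		dst[i + 1] = (uint8_t)(w & 0xff);
	}
	if (len & 1)
		dst[len - 1] = demo_read8(regs + len - 1);
}
```

With len = 3 the loop handles bytes 0 and 1, the 8-bit read handles byte 2, and byte 3 of a zero-initialized destination stays zero.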
^ permalink raw reply
* Re: [PATCH] gro: refetch inet6_protos[] after pulling ext headers
From: David Miller @ 2011-10-10 18:26 UTC (permalink / raw)
To: eric.dumazet; +Cc: zheng.z.yan, netdev, herbert
In-Reply-To: <1318152831.5276.30.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 09 Oct 2011 11:33:51 +0200
> On Sunday, 09 October 2011 at 16:34 +0800, Yan, Zheng wrote:
>> ipv6_gro_receive() doesn't update the protocol ops after pulling
>> the ext headers. It looks like a typo.
>>
>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
...
>
> Good catch !
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks!
^ permalink raw reply
* Re: [PATCH net] bnx2x: fix cl_id allocation for non-eth clients for NPAR mode
From: David Miller @ 2011-10-10 18:20 UTC (permalink / raw)
To: dmitry; +Cc: netdev, eilong
In-Reply-To: <1318240656-24657-1-git-send-email-dmitry@broadcom.com>
From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Mon, 10 Oct 2011 11:57:36 +0200
> There are some consolidations of NPAR configuration
> when FCoE and iSCSI L2 clients will get the same id,
> in this case FCoE ring will be non-functional.
>
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
I'll apply this, thanks.
^ permalink raw reply
* Re: [PATCH 0/9] skb fragment API: convert network drivers (part V)
From: David Miller @ 2011-10-10 18:20 UTC (permalink / raw)
To: Ian.Campbell; +Cc: netdev, linux-scsi, linux-mm
In-Reply-To: <1318245076.21903.408.camel@zakaz.uk.xensource.com>
From: Ian Campbell <Ian.Campbell@citrix.com>
Date: Mon, 10 Oct 2011 12:11:16 +0100
> I think "struct subpage" is a generally useful tuple I added to a
> central location (mm_types.h) rather than somewhere networking or driver
> specific but I can trivially move if preferred.
I'm fine with the patch series, but this generic datastructure
addition needs some feedback first.
^ permalink raw reply
* Re: [PATCH] isdn: add missing cast operator in drivers/isdn/sc/init.c
From: David Miller @ 2011-10-10 18:18 UTC (permalink / raw)
To: corone.il.han; +Cc: isdn, netdev
In-Reply-To: <1318244560-19213-1-git-send-email-corone.il.han@gmail.com>
From: Il Han <corone.il.han@gmail.com>
Date: Mon, 10 Oct 2011 20:02:40 +0900
> Add (void __iomem *) to convert the value to the proper type before passing it to readl().
>
> Signed-off-by: Il Han <corone.il.han@gmail.com>
I've rejected this patch already in the past, and this is because the correct
fix is to change the type of rambase and the variables and datastructures
it is initialized from.
^ permalink raw reply
* Re: [PATCH] mlx4_en: fix endianness with blue frame support
From: David Miller @ 2011-10-10 18:10 UTC (permalink / raw)
To: cascardo; +Cc: netdev, linuxppc-dev, eli, yevgenyp, benh
In-Reply-To: <20111010164654.GA3648@oc1711230544.ibm.com>
From: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Date: Mon, 10 Oct 2011 13:46:54 -0300
> On Mon, Oct 10, 2011 at 01:42:23PM -0300, Thadeu Lima de Souza Cascardo wrote:
>> The doorbell register was being unconditionally swapped. In x86, that
>> meant it was being swapped to BE and written to the descriptor and to
>> memory, depending on the case of blue frame support or writing to
>> doorbell register. On PPC, this meant it was being swapped to LE and
>> then swapped back to BE while writing to the register. But in the blue
>> frame case, it was being written as LE to the descriptor.
>>
>> The fix is not to swap doorbell unconditionally, write it to the
>> register as BE and convert it to BE when writing it to the descriptor.
>>
>> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
>> Reported-by: Richard Hendrickson <richhend@us.ibm.com>
>> Cc: Eli Cohen <eli@dev.mellanox.co.il>
>> Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> ---
>
> So I tested this patch and it works for me. Thanks Ben and Eli for
> finding out the problem with doorbell in the descriptor.
Applied, thanks everyone.
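To make the descriptor half of the quoted description concrete, here is a small userspace model. Host endianness is passed in as a parameter so both cases can be compared on one machine; all demo_ names are hypothetical, and demo_cpu_to_be32() only models the behaviour of the kernel's cpu_to_be32(). It is a sketch of the byte-order reasoning, not the driver code.

```c
#include <stdint.h>

/* Unconditional 32-bit byte swap. */
static inline uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Model a native 32-bit store to memory on a host of the given endianness. */
static inline void demo_store_native(uint8_t out[4], uint32_t v, int host_is_be)
{
	if (host_is_be) {
		out[0] = (uint8_t)(v >> 24);
		out[1] = (uint8_t)(v >> 16);
		out[2] = (uint8_t)(v >> 8);
		out[3] = (uint8_t)v;
	} else {
		out[0] = (uint8_t)v;
		out[1] = (uint8_t)(v >> 8);
		out[2] = (uint8_t)(v >> 16);
		out[3] = (uint8_t)(v >> 24);
	}
}

/* Model of cpu_to_be32(): a no-op on BE hosts, a swap on LE hosts. */
static inline uint32_t demo_cpu_to_be32(uint32_t v, int host_is_be)
{
	return host_is_be ? v : demo_swab32(v);
}

/* Old behaviour: swap unconditionally, then store into the descriptor. */
static inline void demo_desc_store_old(uint8_t out[4], uint32_t v, int host_is_be)
{
	demo_store_native(out, demo_swab32(v), host_is_be);
}

/* Fixed behaviour: convert to BE only where the host needs it. */
static inline void demo_desc_store_fixed(uint8_t out[4], uint32_t v, int host_is_be)
{
	demo_store_native(out, demo_cpu_to_be32(v, host_is_be), host_is_be);
}
```

In this model the old path leaves big-endian bytes in the descriptor on a little-endian host but little-endian bytes on a big-endian host, which matches the PPC symptom described; the fixed path yields big-endian bytes on both.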
^ permalink raw reply
* Re: [PATCH] af_packet: remove unnecessary BUG_ON() in tpacket_destruct_skb
From: David Miller @ 2011-10-10 18:09 UTC (permalink / raw)
To: eric.dumazet; +Cc: danborkmann, netdev
In-Reply-To: <1318266304.3227.18.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Oct 2011 19:05:04 +0200
> On Monday, 10 October 2011 at 18:52 +0200, danborkmann@iogearbox.net wrote:
>> If skb is NULL, then stack trace is thrown anyway on dereference.
>> Therefore, the stack trace triggered by BUG_ON is duplicate.
>>
>> Signed-off-by: Daniel Borkmann <danborkmann@googlemail.com>
>> Cc: Eric Dumazet <eric.dumazet@gmail.com>
>
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, but please make patches like this against the net-next tree,
where the af_packet.c code has changed quite a bit; I had to munge
your patch to get it to apply.
^ permalink raw reply
* Re: [net-next 02/11] igb: Use node specific allocations for the q_vectors and rings
From: David Miller @ 2011-10-10 17:50 UTC (permalink / raw)
To: alexander.h.duyck; +Cc: jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <4E931A06.7090805@intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Mon, 10 Oct 2011 09:15:02 -0700
> Actually the main reason for having adapter->node is because in our
> out-of-tree driver we end up using it as a module parameter in the event
> that someone is running in single queue mode and wants to split up the
> ports between nodes. As such I would prefer to keep the parameter
> around and just default it to -1 as I am currently doing. However if it
> must go I guess I can work around that sync-up issue.
Please stop adding such hacks to your out-of-tree driver and add
appropriate, generic, configure mechanisms to the upstream tree.
It absolutely is not appropriate to add something which is completely
useless to the upstream tree for the sake of something being done
only externally.
You guys are the best at upstream net driver maintenance, so it
really surprises me that you continue to do completely unacceptable
crap like this. Write the necessary generic non-module-option
mechanisms to facilitate the features you need and kill your out of
tree driver _now_.
^ permalink raw reply
* Re: [PATCH] af_packet: remove unnecessary BUG_ON() in tpacket_destruct_skb
From: Eric Dumazet @ 2011-10-10 17:05 UTC (permalink / raw)
To: danborkmann; +Cc: David S. Miller, netdev@vger.kernel.org
In-Reply-To: <20111010185246.15533bv1p3pmnba6@mail.your-server.de>
On Monday, 10 October 2011 at 18:52 +0200, danborkmann@iogearbox.net wrote:
> If skb is NULL, then stack trace is thrown anyway on dereference.
> Therefore, the stack trace triggered by BUG_ON is duplicate.
>
> Signed-off-by: Daniel Borkmann <danborkmann@googlemail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Thanks
^ permalink raw reply
* Re: [net-next 02/11] igb: Use node specific allocations for the q_vectors and rings
From: Alexander Duyck @ 2011-10-10 17:02 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jeff Kirsher, davem, netdev, gospo, sassmann
In-Reply-To: <20111010163228.GA14482@one.firstfloor.org>
On 10/10/2011 09:32 AM, Andi Kleen wrote:
>> The RR configuration is somewhat arbitrary. However it is still better
>> than dumping everything on a single node, and it works with the
>> configuration when the rings numbers line up with the CPU numbers since
>> normally the CPUs are RR on the nodes. From what I have seen it does
>> work quite well and it prevents almost all cross-node memory accesses
>> when running a routing workload.
>
> Ok so it's optimized for one specific workload. I'm sure you'll
> find some other workload where it doesn't work out.
It isn't that I optimized it for one specific workload. I was just
citing that specific workload as one of the ones seeing the advantage.
> I suppose it's hard to get right in the general case, but best
> would be if ethtool had a nice and easy interface to set it at least.
It seems the general case is never right for this. At least in
this case it becomes much easier to line up the memory and interrupts so
that they are all affinitized to the same core. From there RPS/RFS can
typically be used to spread out the work more if necessary.
> However one disadvantage of that patch over the existing state of the
> art (numactl modprobe ...) is that there's no way to override the placement
> now. So if you do the forced RR I think you need the ethtool part too,
> or at least some parameter to turn it off.
>
> -Andi
The counter argument to that though is that the approach you mention
always limits you to one node. At least with this approach we are
spread out over multiple nodes so that we can make full use of the
memory bandwidth on the system.
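The round-robin placement under discussion can be sketched roughly as follows. This is a hypothetical helper, not the igb code: it assumes node IDs are contiguous from 0, and it treats a non-negative forced_node as an explicit override (modelled on a node value that defaults to -1), which is the kind of knob being asked for.

```c
/* Hypothetical helper: choose a NUMA node for ring `ring_idx`.
 * forced_node >= 0 pins every ring to that node; -1 means round-robin.
 * Assumes node IDs run contiguously from 0 to nr_nodes - 1. */
static inline int demo_ring_node(int ring_idx, int nr_nodes, int forced_node)
{
	if (forced_node >= 0)
		return forced_node;		/* explicit override */
	if (nr_nodes <= 0)
		return -1;			/* no NUMA information: no preference */
	return ring_idx % nr_nodes;		/* spread rings across nodes */
}
```

On a two-node system rings 0, 1, 2, 3 land on nodes 0, 1, 0, 1, which lines up with CPUs that are themselves numbered round-robin across nodes.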
Thanks,
Alex
^ permalink raw reply
* [PATCH] af_packet: remove unnecessary BUG_ON() in tpacket_destruct_skb
From: danborkmann @ 2011-10-10 16:52 UTC (permalink / raw)
To: Eric Dumazet, David S. Miller; +Cc: netdev@vger.kernel.org
If skb is NULL, then stack trace is thrown anyway on dereference.
Therefore, the stack trace triggered by BUG_ON is duplicate.
Signed-off-by: Daniel Borkmann <danborkmann@googlemail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
net/packet/af_packet.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index fabb4fa..886ae50 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1170,8 +1170,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
struct packet_sock *po = pkt_sk(skb->sk);
void *ph;
- BUG_ON(skb == NULL);
-
if (likely(po->tx_ring.pg_vec)) {
ph = skb_shinfo(skb)->destructor_arg;
BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
^ permalink raw reply related
* Re: [PATCH net-next] macvlan: handle fragmented multicast frames
From: Ben Greear @ 2011-10-10 16:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1318264891.3227.17.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
On 10/10/2011 09:41 AM, Eric Dumazet wrote:
> On Monday, 10 October 2011 at 09:27 -0700, Ben Greear wrote:
>
>> I applied this to Linus' top-of-tree this morning and it does appear
>> to fix the problem for mac-vlans.
>>
>
> Thanks for testing
>
>> I do see this error, but I doubt it has anything to do with your
>> patch:
>>
>> device eth0 entered promiscuous mode
>> device rddVR10 entered promiscuous mode
>> ADDRCONF(NETDEV_CHANGE): rddVR1b: link becomes ready
>>
>> ================================================
>> [ BUG: lock held when returning to user space! ]
>> ------------------------------------------------
>> ip/3452 is leaving the kernel with locks still held!
>> 1 lock held by ip/3452:
>> #0: (rcu_read_lock){.+.+..}, at: [<f8c5336f>] rcu_read_lock+0x0/0x26 [ipv6]
>> ADDRCONF(NETDEV_CHANGE): rddVR4b: link becomes ready
>> ADDRCONF(NETDEV_CHANGE): rddVR5b: link becomes ready
>>
>>
>> I have no idea why it doesn't print out a more useful stack
>> trace. It seems repeatable (2 of 2 reboots so far). I'm
>> configuring a pretty complex virtual network, with veth devices,
>> xorp instances running ipv4 and ipv6 routing protocols, etc.
>>
>
> Do you have LOCKDEP enabled ?
Yes, as far as I can tell:
[greearb@build-32 linux-2.6.p4s]$ grep LOCKDEP .config
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
And it doesn't appear to have turned itself off:
[root@lec2010-ath9k-1 ~]# dmesg|grep lockdep
RCU lockdep checking is enabled.
lockdep: fixing up alternatives.
[root@lec2010-ath9k-1 ~]#
I looked through the kernel debug section of the config, and it
seems normal enough...
But, after this splat, if I run sysrq-d, then it says lockdep is off,
maybe because the splat disabled it?
SysRq : Show Locks Held
INFO: lockdep is turned off.
sysrq-l does show backtraces, so the backtrace logic in general
seems to work fine.
>
>> This is a clean upstream kernel with no outside patches aside from your
>> own.
>
> Hmm, it seems we have an rcu_read_unlock() missing...
>
> Any idea what was done by this "ip" command ?
No, it's called multiple times by my user-space control logic. Basically,
it configures around 30 interfaces, some GRE, veth, mac-vlans, .1q vlans, normal ethernet, etc.
Also, I have some ipv6 addrs configured on many of them.
And, setting up routing rules, for ipv4 and ipv6 for the virtual routers.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* Re: e100 + VLANs?
From: Michael Tokarev @ 2011-10-10 16:51 UTC (permalink / raw)
To: David Lamparter; +Cc: Eric Dumazet, jeffrey.t.kirsher, netdev
In-Reply-To: <20111010151343.GB3260852@jupiter.n2.diac24.net>
10.10.2011 19:13, David Lamparter wrote:
> On Mon, Oct 10, 2011 at 05:05:52PM +0200, Eric Dumazet wrote:
>>> When pinging this NIC from another machine over VLAN5, I see
>>> ARP packets coming to it, gets recognized and replies going
>>> back, all on vlan 5. But on the other side, replies comes
>>> WITHOUT a VLAN tag!
>>>
>>> From this NIC's point of view, capturing on whole ethX:
>>>
>>> 00:1f:c6:ef:e5:1b > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.48.11.2 tell 10.48.11.1, length 42
>>> 00:90:27:30:6d:1c > 00:1f:c6:ef:e5:1b, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.48.11.2 is-at 00:90:27:30:6d:1c, length 28
>>>
>>> From the partner point of view, also on whole ethX:
>>>
>>> 00:1f:c6:ef:e5:1b > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.48.11.2 tell 10.48.11.1, length 28
>>> 00:90:27:30:6d:1c > 00:1f:c6:ef:e5:1b, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.48.11.2 is-at 00:90:27:30:6d:1c, length 46
>>>
>>> So, the tag gets eaten somewhere along the way... ;)
>
> Hmm. Looks like broken VLAN TX offload, but the driver doesn't even
> implement VLAN offload. Maybe it's broken in its non-implementation...
>
> Your "partner" is a known-good setup and can be assumed to be working
> correctly? This is over a crossover cable, no evil switches involved?
There are just two machines involved, both connected to the
same _switch_ - no, it is not over cross-over cable. It's a
good idea to test one, I'll try it tomorrow (will insert a
second "known good" nic into another machine).
The second machine, the "partner", has this NIC:
02:00.0 Ethernet controller: Atheros Communications L1 Gigabit Ethernet (rev b0)
and it is a known-good implementation - it worked with and without vlan
tags (we had a weird mixed tagged/untagged setup) for over 2 years without
any issues, and which works now as well - it's our main server which is
in two VLANs, connected to an interface marked as tagged in the switch.
It communicates with the other machine when that other machine uses
already mentioned VIA RhineIII NIC - which I used to replace this non-working
E100.
So it's 2 machines, one with 2 nics - VIA Rhine (working) and e100 (non-working),
both connected to two "tagged" ports in the switch. And another, with atl1 NIC,
also connected to a "tagged" port in the switch.
>>> And I can't really recreate the situation which I had - I know
>>> some packets were flowing, so at least ARP worked. Now it
>>> does not work anymore.
>>
>> What the 'partner' setup looks like ?
>>
>> ip link
>> ip addr
>> ip ro
> 'local' setup too please :)
The setup is quite complex - there are numerous tunnels and virtual
interfaces. Here are the relevant parts. (Note that `ip addr'
includes information present in `ip link'):
The "Partner" machine, with just one NIC, atl1, ip addr:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
3: tls-vlan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master tls-br state UP
link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
Our main vlan, LAN, #1.
4: tls-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
inet 192.168.177.15/26 brd 192.168.177.63 scope global tls-br
A bridge that connects this VLAN#1 and other stuff (virtual machines etc)
6: dmz-vlan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master dmz-br state UP
link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
That's DMZ segment, VLAN#2
...
21: test@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
inet 10.48.11.1/24 scope global test
This is vlan#5, my test vlan.
The machine with two NICs (working, via-rhine, and non-working, e100):
2: ethx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:90:27:30:6d:1c brd ff:ff:ff:ff:ff:ff
This is via-rhine, with the MAC address of E100 -- the one which works.
13: eth-tls@ethx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:90:27:30:6d:1c brd ff:ff:ff:ff:ff:ff
inet 192.168.177.5/26 brd 192.168.177.63 scope global eth-tls
Our main VLAN#1 (here it's w/o bridge)
14: eth-dmz@ethx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:90:27:30:6d:1c brd ff:ff:ff:ff:ff:ff
inet 192.168.177.225/29 brd 192.168.177.231 scope global eth-dmz
DMZ VLAN#2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:90:27:30:6d:1c brd ff:ff:ff:ff:ff:ff
The non-working e100. Here it has the same MAC address as ethx above,
because I explicitly changed ethx to have this MAC, since the $ISP has
it hardcoded for our port on their side. The tests were done with the
two addresses being original as set up by the hardware, and later on
I also tried to set this MAC to be 00:90:27:30:6d:1d (note the last
digit) - all the same result, packets sent over the iface above shows
on the receiving side as having no vlan tag.
24: test@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 00:90:27:30:6d:1c brd ff:ff:ff:ff:ff:ff
inet 10.48.11.2/24 scope global test
And finally this is the test vlan#5.
tcpdump was run on eth2 here and on eth0 on the first machine.
On both machines tcpdump is of version 4.1.1.
Here's offload information for e100 nic:
# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
It supports (or appears to) some offloading, in particular I
can enable GSO offload, and it even works somehow.
Now, I enabled another pair of VLAN interfaces on these two NICs,
with VLAN#6, and configured both ports in the switch to be parts
of VLAN6 too (tagged). And voila, everything now works in there.
Two ifaces added, "partner", atl1:
22: test6@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
link/ether 00:1f:c6:ef:e5:1b brd ff:ff:ff:ff:ff:ff
inet 10.48.6.1/24 scope global test6
this e100:
25: test6@eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
link/ether 00:90:27:30:6d:1c brd ff:ff:ff:ff:ff:ff
inet 10.48.6.2/24 scope global test6
Yesterday, the vlan ID where it didn't work was #4, and in #1 it all -
apparently - worked.
I created 2 more pairs of VLAN interfaces and added them to the switch --
it all works just fine. Here:
# x=8; ip link add link eth2 name test$x type vlan id $x; ip addr add 10.48.$x.2/24 dev test$x; ip link set test$x up
(That's on the e100 side; a similar command was run on the atl1 side.) x=6,
x=7 and x=8 work just fine. x=5 does not: ARP replies arrive without a VLAN
tag on the atl1 side.
Ok. So now I can reproduce the initial problem.
So, `ping -s 1469' from the atl1 side, so that the resulting packet size
is 1497 bytes (1468 is the largest size that works) -- the packets
do not arrive at the e100 side at all - it's 100% quiet in tcpdump there.
When pinging from the e100 side and tcpdump'ing on the atl1 side (replies do
not come back to e100):
20:49:33.322646 00:90:27:30:6d:1c > 00:1f:c6:ef:e5:1b, ethertype 802.1Q (0x8100), length 1515: vlan 8, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1497)
10.48.8.2 > 10.48.8.1: ICMP echo request, id 5785, seq 72, length 1477
20:49:33.322691 00:1f:c6:ef:e5:1b > 00:90:27:30:6d:1c, ethertype 802.1Q (0x8100), length 1515: vlan 8, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 23781, offset 0, flags [none], proto ICMP (1), length 1497)
10.48.8.1 > 10.48.8.2: ICMP echo reply, id 5785, seq 72, length 1477
So it appears that on e100 side, the _receive_ buffer is too small
somehow.
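To make the size arithmetic above explicit, here is a small sketch in C. The header sizes are the standard ones (8-byte ICMP echo header, 20-byte IPv4 header without options, 14-byte Ethernet header, 4-byte 802.1Q tag); they are not stated in this thread. The 1514-byte limit in fits_untagged_limit() is only an assumption about what a receiver that ignores the VLAN tag might enforce, not something confirmed for e100. All function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Standard header sizes in bytes; assumed, not taken from the thread. */
enum {
    ICMP_HDR_LEN = 8,    /* ICMP echo header */
    IPV4_HDR_LEN = 20,   /* IPv4 header without options */
    ETH_HDR_LEN  = 14,   /* dst MAC + src MAC + ethertype */
    VLAN_TAG_LEN = 4,    /* 802.1Q tag */
    ETH_DATA_MAX = 1500  /* the MTU shown in the `ip link' output */
};

/* IP datagram length produced by `ping -s payload'. */
int ip_len(int payload)
{
    return payload + ICMP_HDR_LEN + IPV4_HDR_LEN;
}

/* Length of the tagged frame without FCS, as tcpdump -e reports it. */
int tagged_frame_len(int payload)
{
    return ip_len(payload) + ETH_HDR_LEN + VLAN_TAG_LEN;
}

/* Hypothetical receive limit that does not account for the VLAN tag. */
int fits_untagged_limit(int payload)
{
    return tagged_frame_len(payload) <= ETH_HDR_LEN + ETH_DATA_MAX;
}

/* Print the sizes for one ping payload. */
void print_sizes(int payload)
{
    assert(payload >= 0);
    printf("payload %d: IP length %d, tagged frame %d, fits 1514: %s\n",
           payload, ip_len(payload), tagged_frame_len(payload),
           fits_untagged_limit(payload) ? "yes" : "no");
}
```

Calling print_sizes(1468) and print_sizes(1469) gives tagged frames of 1514 and 1515 bytes. The second matches the "length 1515" and "length 1497" fields in the tcpdump output above, and the boundary falls exactly where a 1514-byte limit would put it.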
I'll do some more experiments with VLAN#5 tomorrow, in a clean environment
(maybe using direct cable connection - not cross-over, since GigE should
autodetect this stuff (hopefully)).
Thanks!
/mjt
^ permalink raw reply
* Re: [PATCH] mlx4_en: fix endianness with blue frame support
From: Thadeu Lima de Souza Cascardo @ 2011-10-10 16:46 UTC (permalink / raw)
To: netdev; +Cc: linuxppc-dev, Eli Cohen, Yevgeny Petrilin, Benjamin Herrenschmidt
In-Reply-To: <1318264943-10009-1-git-send-email-cascardo@linux.vnet.ibm.com>
On Mon, Oct 10, 2011 at 01:42:23PM -0300, Thadeu Lima de Souza Cascardo wrote:
> The doorbell register was being unconditionally swapped. In x86, that
> meant it was being swapped to BE and written to the descriptor and to
> memory, depending on the case of blue frame support or writing to
> doorbell register. On PPC, this meant it was being swapped to LE and
> then swapped back to BE while writing to the register. But in the blue
> frame case, it was being written as LE to the descriptor.
>
> The fix is not to swap doorbell unconditionally, write it to the
> register as BE and convert it to BE when writing it to the descriptor.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> Reported-by: Richard Hendrickson <richhend@us.ibm.com>
> Cc: Eli Cohen <eli@dev.mellanox.co.il>
> Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
So I tested this patch and it works for me. Thanks Ben and Eli for
finding out the problem with doorbell in the descriptor.
Regards,
Cascardo.
> drivers/net/mlx4/en_tx.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
> index 6e03de0..f76ab6b 100644
> --- a/drivers/net/mlx4/en_tx.c
> +++ b/drivers/net/mlx4/en_tx.c
> @@ -172,7 +172,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
> memset(ring->buf, 0, ring->buf_size);
>
> ring->qp_state = MLX4_QP_STATE_RST;
> - ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
> + ring->doorbell_qpn = ring->qp.qpn << 8;
>
> mlx4_en_fill_qp_context(priv, ring->size, ring->stride, 1, 0, ring->qpn,
> ring->cqn, &ring->context);
> @@ -791,7 +791,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
> skb_orphan(skb);
>
> if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tag) {
> - *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> + *(__be32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
> op_own |= htonl((bf_index & 0xffff) << 8);
> /* Ensure new descirptor hits memory
> * before setting ownership of this descriptor to HW */
> @@ -812,7 +812,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
> wmb();
> tx_desc->ctrl.owner_opcode = op_own;
> wmb();
> - writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> + iowrite32be(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> }
>
> /* Poll CQ here */
> --
> 1.7.4.4
>
^ permalink raw reply
* [PATCH] mlx4_en: fix endianness with blue frame support
From: Thadeu Lima de Souza Cascardo @ 2011-10-10 16:42 UTC (permalink / raw)
To: netdev
Cc: linuxppc-dev, Thadeu Lima de Souza Cascardo, Eli Cohen,
Yevgeny Petrilin, Benjamin Herrenschmidt
In-Reply-To: <1318231920.29415.404.camel@pasglop>
The doorbell register was being unconditionally swapped. In x86, that
meant it was being swapped to BE and written to the descriptor and to
memory, depending on the case of blue frame support or writing to
doorbell register. On PPC, this meant it was being swapped to LE and
then swapped back to BE while writing to the register. But in the blue
frame case, it was being written as LE to the descriptor.
The fix is not to swap doorbell unconditionally, write it to the
register as BE and convert it to BE when writing it to the descriptor.
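The descriptor half of this argument can be checked with a small userspace model. This is only a sketch: swab32_model() and cpu_to_be32_model() stand in for the kernel's swab32() and cpu_to_be32(), host endianness is simulated with a flag rather than detected, and the register path (writel versus iowrite32be) is not modelled at all. All names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unconditional byte swap, modelling the kernel's swab32(). */
uint32_t swab32_model(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Models cpu_to_be32(): swap only if the simulated host is little endian. */
uint32_t cpu_to_be32_model(uint32_t x, int host_is_le)
{
    return host_is_le ? swab32_model(x) : x;
}

/* Bytes that a native 32-bit store of v leaves in memory on that host. */
void native_store(uint32_t v, int host_is_le, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        int shift = host_is_le ? 8 * i : 8 * (3 - i);
        out[i] = (uint8_t)(v >> shift);
    }
}

/* Old blue-frame path: swapped once at ring setup, then stored natively. */
void old_descriptor_bytes(uint32_t qpn, int host_is_le, uint8_t out[4])
{
    native_store(swab32_model(qpn << 8), host_is_le, out);
}

/* Patched path: kept in CPU order, converted to BE at the store. */
void new_descriptor_bytes(uint32_t qpn, int host_is_le, uint8_t out[4])
{
    native_store(cpu_to_be32_model(qpn << 8, host_is_le), host_is_le, out);
}

/* Returns 1 when out holds (qpn << 8) in big-endian byte order. */
int is_big_endian_image(uint32_t qpn, const uint8_t out[4])
{
    uint32_t v = qpn << 8;
    uint8_t want[4] = { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                        (uint8_t)(v >> 8), (uint8_t)v };
    return memcmp(out, want, 4) == 0;
}
```

With qpn = 0x41, the old path yields a big-endian image on a little-endian host but a byte-reversed one on a big-endian host. The patched path yields the big-endian image on both.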
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reported-by: Richard Hendrickson <richhend@us.ibm.com>
Cc: Eli Cohen <eli@dev.mellanox.co.il>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
drivers/net/mlx4/en_tx.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
index 6e03de0..f76ab6b 100644
--- a/drivers/net/mlx4/en_tx.c
+++ b/drivers/net/mlx4/en_tx.c
@@ -172,7 +172,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
memset(ring->buf, 0, ring->buf_size);
ring->qp_state = MLX4_QP_STATE_RST;
- ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
+ ring->doorbell_qpn = ring->qp.qpn << 8;
mlx4_en_fill_qp_context(priv, ring->size, ring->stride, 1, 0, ring->qpn,
ring->cqn, &ring->context);
@@ -791,7 +791,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
skb_orphan(skb);
if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tag) {
- *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
+ *(__be32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
op_own |= htonl((bf_index & 0xffff) << 8);
/* Ensure new descirptor hits memory
* before setting ownership of this descriptor to HW */
@@ -812,7 +812,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
wmb();
tx_desc->ctrl.owner_opcode = op_own;
wmb();
- writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
+ iowrite32be(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
}
/* Poll CQ here */
--
1.7.4.4
^ permalink raw reply related
* Re: [PATCH net-next] macvlan: handle fragmented multicast frames
From: Eric Dumazet @ 2011-10-10 16:41 UTC (permalink / raw)
To: Ben Greear; +Cc: netdev
In-Reply-To: <4E931CEC.5050404@candelatech.com>
On Monday 10 October 2011 at 09:27 -0700, Ben Greear wrote:
> I applied this to Linus' top-of-tree this morning and it does appear
> to fix the problem for mac-vlans.
>
Thanks for testing
> I do see this error, but I doubt it has anything to do with your
> patch:
>
> device eth0 entered promiscuous mode
> device rddVR10 entered promiscuous mode
> ADDRCONF(NETDEV_CHANGE): rddVR1b: link becomes ready
>
> ================================================
> [ BUG: lock held when returning to user space! ]
> ------------------------------------------------
> ip/3452 is leaving the kernel with locks still held!
> 1 lock held by ip/3452:
> #0: (rcu_read_lock){.+.+..}, at: [<f8c5336f>] rcu_read_lock+0x0/0x26 [ipv6]
> ADDRCONF(NETDEV_CHANGE): rddVR4b: link becomes ready
> ADDRCONF(NETDEV_CHANGE): rddVR5b: link becomes ready
>
>
> I have no idea why it doesn't print out a more useful stack
> trace. It seems repeatable (2 of 2 reboots so far). I'm
> configuring a pretty complex virtual network, with veth devices,
> xorp instances running ipv4 and ipv6 routing protocols, etc.
>
Do you have LOCKDEP enabled ?
> This is a clean upstream kernel with no outside patches aside from your
> own.
Hmm, it seems we have an rcu_read_unlock() missing...
Any idea what was done by this "ip" command ?
^ permalink raw reply
* Re: [PATCH net] mscan: zero accidentally copied register content
From: Oliver Hartkopp @ 2011-10-10 16:38 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: Andre Naujoks, Wolfram Sang, Linux Netdev List
In-Reply-To: <4E8DF24E.5030606@grandegger.com>
On 10/06/11 20:24, Wolfgang Grandegger wrote:
> Well, copying just the relevant bytes seems much more straightforward
> than removing accidentally copied bytes later on. You do not need to
> care about little endian. The MSCAN is only available on PowerPC SOCs,
> which are big endian.
>
> I'm going to test and post a patch tomorrow.
Thanks.
My patch is then superseded by this one
"mscan: too much data copied to CAN frame due to 16 bit accesses"
http://patchwork.ozlabs.org/patch/118364/
Tnx,
Oliver
^ permalink raw reply
* Re: [net-next 02/11] igb: Use node specific allocations for the q_vectors and rings
From: Andi Kleen @ 2011-10-10 16:32 UTC (permalink / raw)
To: Alexander Duyck; +Cc: Andi Kleen, Jeff Kirsher, davem, netdev, gospo, sassmann
In-Reply-To: <4E931C61.7040204@intel.com>
> The RR configuration is somewhat arbitrary. However it is still better
> than dumping everything on a single node, and it works with the
> configuration when the ring numbers line up with the CPU numbers since
> normally the CPUs are RR on the nodes. From what I have seen it does
> work quite well and it prevents almost all cross-node memory accesses
> when running a routing workload.
Ok so it's optimized for one specific workload. I'm sure you'll
find some other workload where it doesn't work out.
I suppose it's hard to get right in the general case, but best
would be if ethtool had a nice and easy interface to set it at least.
However one disadvantage of that patch over the existing state of the
art (numactl modprobe ...) is that there's no way to override the placement
now. So if you do the forced RR I think you need the ethtool part too,
or at least some parameter to turn it off.
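As a rough illustration of the two positions, here is a sketch in C of round-robin placement with an override. It is not the igb patch's code: both function names and the forced_node parameter are hypothetical, and the override simply stands in for whatever ethtool option or module parameter would be chosen.

```c
#include <assert.h>

/* Hypothetical helper: NUMA node for a ring when rings are spread
 * round-robin over nr_nodes nodes. */
int rr_node_for_ring(int ring, int nr_nodes)
{
    assert(ring >= 0 && nr_nodes > 0);
    return ring % nr_nodes;
}

/* Hypothetical override: a non-negative forced_node wins over the
 * round-robin default; a negative value keeps the default. */
int node_for_ring(int ring, int nr_nodes, int forced_node)
{
    if (forced_node >= 0)
        return forced_node;
    return rr_node_for_ring(ring, nr_nodes);
}
```

With two nodes, rings 0, 1, 2, 3 land on nodes 0, 1, 0, 1 by default, while node_for_ring(3, 2, 0) pins ring 3 to node 0.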
-Andi
^ permalink raw reply
* Re: [PATCH net-next] macvlan: handle fragmented multicast frames
From: Ben Greear @ 2011-10-10 16:27 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1317932911.3457.31.camel@edumazet-laptop>
On 10/06/2011 01:28 PM, Eric Dumazet wrote:
> On Wednesday 05 October 2011 at 15:35 -0700, Ben Greear wrote:
>
>> If someone wants to cook up macvlan-ip-defrag patch I'll be happy
>> to test it. But, as far as I can tell, this problem can happen on
>> any two interfaces. The reason that some of mine work (.1q vlans)
>> and macvlan didn't is probably because those were separated by
>> some virtual network links that imparted extra delay...so the
>> vlan consumed all its fragments and passed the complete pkt up
>> the stack before the mac-vlan ever saw the initial frame.
>>
>> With this in mind, it seems that using multiple udp multicast
>> sockets bound to specific devices is fundamentally broken for
>> fragmented packets.
>>
>> I have no pressing need for this feature, so now that I better understand
>> the problem I can just document it and move on to other things.
>>
>> Thanks for all the help.
>>
>
> Please test following patch (note I had no time to test it, sorry !)
>
> Based on net-next tree, might apply on 3.0 kernel...
>
> [PATCH net-next] macvlan: handle fragmented multicast frames
>
> Fragmented multicast frames are delivered to a single macvlan port,
> because ip defrag logic considers other samples are redundant.
>
> Implement a defrag step before trying to send the multicast frame.
I applied this to Linus' top-of-tree this morning and it does appear
to fix the problem for mac-vlans.
I do see this error, but I doubt it has anything to do with your
patch:
device eth0 entered promiscuous mode
device rddVR10 entered promiscuous mode
ADDRCONF(NETDEV_CHANGE): rddVR1b: link becomes ready
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
ip/3452 is leaving the kernel with locks still held!
1 lock held by ip/3452:
#0: (rcu_read_lock){.+.+..}, at: [<f8c5336f>] rcu_read_lock+0x0/0x26 [ipv6]
ADDRCONF(NETDEV_CHANGE): rddVR4b: link becomes ready
ADDRCONF(NETDEV_CHANGE): rddVR5b: link becomes ready
I have no idea why it doesn't print out a more useful stack
trace. It seems repeatable (2 of 2 reboots so far). I'm
configuring a pretty complex virtual network, with veth devices,
xorp instances running ipv4 and ipv6 routing protocols, etc.
This is a clean upstream kernel with no outside patches aside from your
own.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* Re: [PATCH 1/9] mm: add a "struct subpage" type containing a page, offset and length
From: Ian Campbell @ 2011-10-10 16:27 UTC (permalink / raw)
To: netdev@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kernel, Jens Axboe, Christoph Hellwig
In-Reply-To: <1318245101-16890-1-git-send-email-ian.campbell@citrix.com>
(reposting including LKML to catch other potential users)
Is this structure of any use to unify other instances of a similar
tuple, e.g. biovec, pagefrag etc?
Ian.
On Mon, 2011-10-10 at 12:11 +0100, Ian Campbell wrote:
> A few network drivers currently use skb_frag_struct for this purpose but I have
> patches which add additional fields and semantics there which these other uses
> do not want.
>
> A structure for referencing sub-page regions seems generally useful,
> so add one instead of adding a network subsystem specific structure.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: linux-mm@kvack.org
> ---
> include/linux/mm_types.h | 11 +++++++++++
> 1 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 774b895..dc1d103 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -135,6 +135,17 @@ struct page {
> #endif
> ;
>
> +struct subpage {
> + struct page *page;
> +#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
> + __u32 page_offset;
> + __u32 size;
> +#else
> + __u16 page_offset;
> + __u16 size;
> +#endif
> +};
> +
> typedef unsigned long __nocast vm_flags_t;
>
> /*
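For readers who want to experiment with the layout in the quoted patch, here is a userspace sketch in C. It is an approximation under stated assumptions: struct page is left opaque, LONG_MAX replaces BITS_PER_LONG, SKETCH_PAGE_SIZE is an assumed 4096, and subpage_set() is a hypothetical helper that is not part of the patch. One plausible reading of the #if is that the 16-bit fields keep the struct at two pointer widths on 32-bit systems, and are only safe while a page-sized length fits in 16 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#ifndef SKETCH_PAGE_SIZE
#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size; not from the thread */
#endif

struct page;  /* opaque stand-in for the kernel's struct page */

/* Same width selection as the patch, with LONG_MAX replacing BITS_PER_LONG. */
#if (LONG_MAX > 0x7fffffffL) || (SKETCH_PAGE_SIZE >= 65536)
typedef uint32_t subpage_len_t;
#else
typedef uint16_t subpage_len_t;
#endif

struct subpage_sketch {
    struct page *page;
    subpage_len_t page_offset;
    subpage_len_t size;
};

/* Hypothetical helper: record a region only if it lies within one page.
 * Returns 1 on success, 0 if the region would cross the page end. */
int subpage_set(struct subpage_sketch *sp, struct page *pg,
                unsigned long offset, unsigned long len)
{
    if (offset > SKETCH_PAGE_SIZE || len > SKETCH_PAGE_SIZE - offset)
        return 0;
    sp->page = pg;
    sp->page_offset = (subpage_len_t)offset;
    sp->size = (subpage_len_t)len;
    return 1;
}
```

On common 32-bit and 64-bit ABIs, sizeof(struct subpage_sketch) comes out as twice the pointer size, so the tuple adds no padding beyond the page pointer and the two length fields.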
^ permalink raw reply