* Re: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode callback function
From: Fabio Estevam @ 2015-01-07 3:44 UTC (permalink / raw)
To: Shawn Guo
Cc: fugang.duan@freescale.com, davem@davemloft.net,
netdev@vger.kernel.org, bhutchings@solarflare.com,
stephen@networkplumber.org
In-Reply-To: <20150107023307.GA4928@dragon>
On Wed, Jan 7, 2015 at 12:33 AM, Shawn Guo <shawn.guo@linaro.org> wrote:
> The GPR difference between SoCs can be encoded in device tree as well.
> It's pointless to repeat the same code pattern for every single
> platform, that need to set up GPR bits for enabling magic packet wake
> up, while the only difference is the register and bit offset.
>
> The platform code will become quite messy and unmaintainable if every
> device driver dump their GPR register setup code into platform.
>
> Sorry, but it's NACK from me.
This patch has already reached linux-next via Dave's tree as commit
456062b3ec6f5.
Should a revert patch be sent?
Also, this commit causes a build warning in linux-next:
arch/arm/mach-imx/mach-imx6sx.c:59:20: warning: 'imx6sx_enet_plt_init'
defined but not used [-Wunused-function]
^ permalink raw reply
* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
From: Tom Herbert @ 2015-01-07 3:46 UTC (permalink / raw)
To: Thomas Graf
Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
Linux Netdev List, dev@openvswitch.org
In-Reply-To: <7c5e8240263a22a69042c45a632c9940e5ae1062.1420594925.git.tgraf@suug.ch>
On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> The VXLAN receive code is currently conservative in what it accepts and
> will reject any frame that uses any of the reserved VXLAN protocol fields.
> The VXLAN draft specifies that "reserved fields MUST be set to zero on
> transmit and ignored on receive.".
>
IMO it is an unfortunate decision in VXLAN to ignore set reserved
fields on receive. Had they not done this, then adding a protocol
field to the header would have been feasible and we wouldn't need yet
another encapsulation protocol (i.e. VXLAN-GPE). Rejecting frames with
reserved bits set is the better behavior, but I think the comment
about this needs to be clear about why this diverges from RFC7348.
> Retain the current conservative parsing behaviour by default but allows
> these fields to be used by VXLAN extensions which are explicitly enabled
> on the VXLAN socket respectively VXLAN net_device.
>
Conceptually, VXLAN has both mandatory flags and optional flags for
extensions. You may want to look at the VXLAN RCO patches that added
an extension and infrastructure for them.
Tom
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> drivers/net/vxlan.c | 29 +++++++++++++++++++----------
> include/net/vxlan.h | 32 +++++++++++++++++++++++++++++---
> 2 files changed, 48 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 2ab0922..4d52aa9 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -65,7 +65,7 @@
> #define VXLAN_VID_MASK (VXLAN_N_VID - 1)
> #define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
>
> -#define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags required value. */
> +#define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags default value. */
>
> /* UDP port for VXLAN traffic.
> * The IANA assigned port is 4789, but the Linux default is 8472
> @@ -1100,22 +1100,28 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> if (!pskb_may_pull(skb, VXLAN_HLEN))
> goto error;
>
> + vs = rcu_dereference_sk_user_data(sk);
> + if (!vs)
> + goto drop;
> +
> /* Return packets with reserved bits set */
> vxh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
> - if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
> - (vxh->vx_vni & htonl(0xff))) {
> - netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
> - ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
> - goto error;
> +
> + /* For backwards compatibility, only allow reserved fields to be
> + * used by VXLAN extensions if explicitly requested.
> + */
> + if (vs->exts) {
> + if (!vxh->vni_present)
> + goto error_invalid_header;
> + } else {
> + if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
> + (vxh->vx_vni & htonl(0xff)))
> + goto error_invalid_header;
> }
>
> if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
> goto drop;
>
> - vs = rcu_dereference_sk_user_data(sk);
> - if (!vs)
> - goto drop;
> -
> vs->rcv(vs, skb, vxh->vx_vni);
> return 0;
>
> @@ -1124,6 +1130,9 @@ drop:
> kfree_skb(skb);
> return 0;
>
> +error_invalid_header:
> + netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
> + ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
> error:
> /* Return non vxlan pkt */
> return 1;
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 903461a..3e98d31 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -11,10 +11,35 @@
> #define VNI_HASH_BITS 10
> #define VNI_HASH_SIZE (1<<VNI_HASH_BITS)
>
> -/* VXLAN protocol header */
> +/* VXLAN protocol header:
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |R|R|R|R|I|R|R|R| Reserved |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * | VXLAN Network Identifier (VNI) | Reserved |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + * I = 1 VXLAN Network Identifier (VNI) present
> + */
> struct vxlanhdr {
> - __be32 vx_flags;
> - __be32 vx_vni;
> + union {
> + struct {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> + __u8 reserved_flags1:3,
> + vni_present:1,
> + reserved_flags2:4;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> + __u8 reserved_flags2:4,
> + vni_present:1,
> + reserved_flags1:3;
> +#else
> +#error "Please fix <asm/byteorder.h>"
> +#endif
> + __u8 vx_reserved1;
> + __be16 vx_reserved2;
> + };
> + __be32 vx_flags;
> + };
> + __be32 vx_vni;
> };
>
> struct vxlan_sock;
> @@ -25,6 +50,7 @@ struct vxlan_sock {
> struct hlist_node hlist;
> vxlan_rcv_t *rcv;
> void *data;
> + u32 exts;
> struct work_struct del_work;
> struct socket *sock;
> struct rcu_head rcu;
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Some warnings for skge - netpoll/netconsole induced
From: poma @ 2015-01-07 5:29 UTC (permalink / raw)
To: netdev
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2991 at net/core/netpoll.c:362 netpoll_send_skb_on_dev+0x25e/0x270()
netpoll_send_skb_on_dev(): enp1s6 enabled interrupts in poll (skge_xmit_frame+0x0/0x680 [skge])
...
CPU: 1 PID: 2991 Comm: modprobe Not tainted 3.19.0-0.rc3.git0.1.fc21.x86_64 #1
...
Call Trace:
[<ffffffff8176cf2d>] dump_stack+0x45/0x57
[<ffffffff8109b09a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8109b125>] warn_slowpath_fmt+0x55/0x70
[<ffffffff813a2aab>] ? delay_tsc+0x3b/0x70
[<ffffffffa00d3be0>] ? skge_rx_setup.isra.44+0x180/0x180 [skge]
[<ffffffff8167518e>] netpoll_send_skb_on_dev+0x25e/0x270
[<ffffffff81675462>] netpoll_send_udp+0x2c2/0x410
[<ffffffffa079b92f>] write_msg+0xcf/0x140 [netconsole]
[<ffffffff810ef2aa>] call_console_drivers.constprop.22+0xaa/0xf0
[<ffffffff810f0992>] console_unlock+0x362/0x450
[<ffffffff810f0e30>] vprintk_emit+0x3b0/0x550
[<ffffffff810f1159>] vprintk_default+0x29/0x40
[<ffffffff8176b42c>] printk+0x55/0x6b
[<ffffffff810f15cd>] register_console+0x1bd/0x380
[<ffffffffa07a0000>] ? 0xffffffffa07a0000
[<ffffffffa07a01c4>] init_netconsole+0x1c4/0x1000 [netconsole]
[<ffffffffa07a0000>] ? 0xffffffffa07a0000
[<ffffffff81002148>] do_one_initcall+0xd8/0x210
[<ffffffff811de302>] ? __vunmap+0xa2/0x100
[<ffffffff8111ef16>] load_module+0x2256/0x2800
[<ffffffff8111a750>] ? store_uevent+0x70/0x70
[<ffffffff8111f58d>] SyS_init_module+0xcd/0x120
[<ffffffff81773929>] system_call_fastpath+0x12/0x17
---[ end trace 257df3717d2db089 ]---
^ permalink raw reply
* [PATCH] sh_eth: Fix access to TRSCER register
From: Nobuhiro Iwamatsu @ 2015-01-07 5:35 UTC (permalink / raw)
To: netdev; +Cc: yoshihiro.shimoda.uh, linux-sh, Nobuhiro Iwamatsu
TRSCER register is configured differently by SoCs. TRSCER of R-Car is
RINT8 bit only valid, other bits are reserved bits.
This removes access to TRSCER register reserve bit.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
---
drivers/net/ethernet/renesas/sh_eth.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index c29ba80..59ee457 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1294,7 +1294,11 @@ static int sh_eth_dev_init(struct net_device *ndev, bool start)
/* Frame recv control (enable multiple-packets per rx irq) */
sh_eth_write(ndev, RMCR_RNC, RMCR);
- sh_eth_write(ndev, DESC_I_RINT8 | DESC_I_RINT5 | DESC_I_TINT2, TRSCER);
+ if (mdp->cd->register_type == SH_ETH_REG_FAST_RCAR)
+ sh_eth_write(ndev, DESC_I_RINT8, TRSCER);
+ else
+ sh_eth_write(ndev, DESC_I_RINT8 | DESC_I_RINT5 | DESC_I_TINT2,
+ TRSCER);
if (mdp->cd->bculr)
sh_eth_write(ndev, 0x800, BCULR); /* Burst sycle set */
--
2.1.3
^ permalink raw reply related
* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: John Fastabend @ 2015-01-07 5:37 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
Jamal Hadi Salim, simon.horman, netdev@vger.kernel.org,
David S. Miller, Andy Gospodarek
In-Reply-To: <CAADnVQJ6MWVCPsBX6oZpQq=D=4iO7zE5rV8N1Vw5WHxDFGZ7xg@mail.gmail.com>
On 01/06/2015 05:14 PM, Alexei Starovoitov wrote:
> On Wed, Dec 31, 2014 at 11:45 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> + * [NET_FLOW_TABLE_IDENTIFIER_TYPE]
>> + * [NET_FLOW_TABLE_IDENTIFIER]
>> + * [NET_FLOW_TABLE_TABLES]
>> + * [NET_FLOW_TABLE]
>> + * [NET_FLOW_TABLE_ATTR_NAME]
>> + * [NET_FLOW_TABLE_ATTR_UID]
>> + * [NET_FLOW_TABLE_ATTR_SOURCE]
>> + * [NET_FLOW_TABLE_ATTR_SIZE]
> ...
>> + * Header definitions used to define headers with user friendly
>> + * names.
>> + *
>> + * [NET_FLOW_TABLE_HEADERS]
>> + * [NET_FLOW_HEADER]
>> + * [NET_FLOW_HEADER_ATTR_NAME]
>> + * [NET_FLOW_HEADER_ATTR_UID]
>> + * [NET_FLOW_HEADER_ATTR_FIELDS]
>> + * [NET_FLOW_HEADER_ATTR_FIELD]
>> + * [NET_FLOW_FIELD_ATTR_NAME]
>> + * [NET_FLOW_FIELD_ATTR_UID]
>> + * [NET_FLOW_FIELD_ATTR_BITWIDTH]
>> + * [NET_FLOW_HEADER_ATTR_FIELD]
>> + * [...]
>> + * [...]
>> + * Action definitions supported by tables
>> + *
>> + * [NET_FLOW_TABLE_ACTIONS]
>> + * [NET_FLOW_TABLE_ATTR_ACTIONS]
>> + * [NET_FLOW_ACTION]
>> + * [NET_FLOW_ACTION_ATTR_NAME]
>> + * [NET_FLOW_ACTION_ATTR_UID]
>> + * [NET_FLOW_ACTION_ATTR_SIGNATURE]
>> + * [NET_FLOW_ACTION_ARG]
> ..
>> + * Get Table Graph <Reply> description
>> + *
>> + * [NET_FLOW_TABLE_TABLE_GRAPH]
>> + * [TABLE_GRAPH_NODE]
>> + * [TABLE_GRAPH_NODE_UID]
>> + * [TABLE_GRAPH_NODE_JUMP]
>
> I think NET_FLOW prefix everywhere is too verbose.
> Especially since you've missed it in the above 3.
> and in patch 2 it is:
> NET_FLOW_FLOW
> which is kinda awkward.
> Can you abbreviate it to NFL_ or something else ?
hmm I'm open for a better name, NFL_ might work but seems
a bit cryptic to me. Maybe it is better than NET_FLOW.
Anyone other suggestions?
>
> I couldn't find get_headers() and get_header_graph()
> implementation on rocker side ?
It is in patch
[net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
+#ifdef CONFIG_NET_FLOW_TABLES
+ .ndo_flow_get_tables = rocker_get_tables,
+ .ndo_flow_get_headers = rocker_get_headers,
+ .ndo_flow_get_actions = rocker_get_actions,
+ .ndo_flow_get_tbl_graph = rocker_get_tgraph,
+ .ndo_flow_get_hdr_graph = rocker_get_hgraph,
+#endif
although v2 will address some good feedback and clean it up a bit.
> Could you describe how put_header_graph() will look like?
The signature for get_hdrs and get_hdrs_graph in my latest deck
(I'll push shortly still need to sort out the caching for
get_flows) look like this,
struct net_flow_hdr **(*ndo_flow_get_hdrs)(struct net_device *dev);
struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct
net_device *dev)
I could then use the following signatures for put hdrs,
int (*ndo_flow_put_hdrs)(struct net_device *dev, struct net_flow_hdr
**hdrs);
int (*ndo_flow_put_hdrs_graph(struct net_device *dev, struct
net_flow_hdr_graph **graph);
If the user supplies a new set of hdrs via put_hdrs it would
invalidate the hdrs graph though so we could either smash those two
operations into one or require both to occur when the device is down but
not allow it to come up without a graph operation. Currently my
preference is to smash the above two ops to this,
> int (*ndo_flow_put_hdrs)(struct net_device *dev, struct net_flow_hdr **hdrs, struct net_flow_hdr_graph **graph);
I've gone back and forth on this, doing updates to the hdrs/graph
while the device is online doesn't seem practical. I don't have access
to any devices that would support this. If your device is a software
one though (futuristic eBPF-OVS) it may make some sense.
> When it comes to parsing I'm assuming that hw will fall
> into N categories:
> - that has get_headers() and get_header_graph() only
> which would mean fixed parser
right this is rocker and what the initial series is trying to
address.
> - above plus put_header_graph() which will allow to
> rearrange some fixed sized headers ?
OK but I'm not sure where/if these devices exist. Maybe your
thinking of a software dataplane case? Would get_headers return
some headers/fields but not include them in the graph and then
expect the user to build a graph with them if the user needs
them. Are there restrictions on how the graph can be built
out? I guess I'm working with the assumption that the device
returns a complete parse graph for all combinations it supports.
Are there really devices that could only support certain combinations
and then if you shuffled the headers graph around support others?
I'm just not aware of any device.
> - above plus put_header() ?
> I'm having a hard time envisioning how that would
> look like.
This case makes more sense to me. The user supplies the definition
of the headers and the graph showing how they are related and the
driver can program the parser to work correctly. This implies a
flexible parser but I think some devices could support this. You
would need some attributes to define depth of the parser and such
restrictions if the device has restrictions on the parsers that
can be supported.
Maybe one concrete example would be to introduce a header tag that
was previously unknown to the device. You could define it using
the header/fields (bit/length/offset) notation and then give the
graph to let the parser "know" where to expect it. Finally this
could be passed to the driver and the parser could be generated.
To be honest though I would really be happy getting the 1st option
working.
> - ... ?
>
> also can we change a name from add_flow
> to add_entry or add_rule ?
> I think 'rule' fits better, since rule = field_ref+action
> and one real TCP flow may need multiple rules
> inserted into table, right?
> The whole thing can still be called 'flow API'...
add_rule/del_rule fine by me.
>
> will there be a put_table_graph() ?
> probably not, right? since as soon as HW supports
> 'goto' aciton, the meaning of table_graph is lost and
> it's actually just a set of disconnected tables and the
> way to jump from one into another is through 'goto'.
hmm I have support in another tree for create/destroy table. This
allows users to create tables and destroy them. .
I think the table_graph is still relevant its just the graph
is completely connected. I'm not sure you will really see hardware
like this anytime soon though :) or maybe I just mean I haven't
seen any.
>
> I think OVS guys are quiet, since they're skeptical
> that headers+header_graph approach can work?
> Would be great if they can share the experience...
>
hmm maybe but I could define all the headers OVS supports using
this. And then define a linear array of tables. It might be an
interesting exercise to build this on top of rocker.
--
John Fastabend Intel Corporation
^ permalink raw reply
* [PATCH] sh-eth: Set fdr_value of R-Car SoCs
From: Nobuhiro Iwamatsu @ 2015-01-07 5:40 UTC (permalink / raw)
To: netdev; +Cc: yoshihiro.shimoda.uh, linux-sh, Nobuhiro Iwamatsu
FDR register of R-Car set in fdr_value can have the original settings.
This sets the value that is suitable for each SoCs to fdr_value of R8A777x
and R8A779x.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
---
drivers/net/ethernet/renesas/sh_eth.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 59ee457..62f2e88 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -473,6 +473,7 @@ static struct sh_eth_cpu_data r8a777x_data = {
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
EESR_RDE | EESR_RFRMER | EESR_TFE | EESR_TDE |
EESR_ECI,
+ .fdr_value = 0x00000f0f,
.apr = 1,
.mpr = 1,
@@ -495,6 +496,7 @@ static struct sh_eth_cpu_data r8a779x_data = {
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
EESR_RDE | EESR_RFRMER | EESR_TFE | EESR_TDE |
EESR_ECI,
+ .fdr_value = 0x00000f0f,
.apr = 1,
.mpr = 1,
--
2.1.3
^ permalink raw reply related
* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Rahul Sharma @ 2015-01-07 5:41 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netdev, linux-kernel, Hannes Frederic Sowa, netfilter-devel
In-Reply-To: <20150106224706.GA7416@salvia>
Hi Pablo,
On Wed, Jan 7, 2015 at 4:17 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Jan 07, 2015 at 03:03:20AM +0530, Rahul Sharma wrote:
>> ipv6_find_hdr() currently assumes that the next-header field in the
>> fragment header of the non-first fragment is the "protocol number of
>> the last header" (here last header excludes any extension header
>> protocol numbers ) which is incorrect as per RFC2460. The next-header
>> value is the first header of the fragmentable part of the original
>> packet (which can be extension header as well).
>> This can create reassembly problems. For example: Fragmented
>> authenticated OSPFv3 packets (where AH header is inserted before the
>> protocol header). For the second fragment, the next header value in
>> the fragment header will be NEXTHDR_AUTH which is correct but
>> ipv6_find_hdr will return ENOENT since AH is an extension header
>> resulting in second fragment getting dropped. This check for the
>> presence of non-extension header needs to be removed.
>>
>> Signed-off-by: Rahul Sharma <rsharma@arista.com>
>> ---
>> --- linux-3.18.1/net/ipv6/exthdrs_core.c.orig 2015-01-06
>> 10:25:36.411419863 -0800
>> +++ linux-3.18.1/net/ipv6/exthdrs_core.c 2015-01-06
>> 10:51:45.819364986 -0800
>> @@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
>> * If the first fragment doesn't contain the final protocol header or
>> * NEXTHDR_NONE it is considered invalid.
>> *
>> - * Note that non-1st fragment is special case that "the protocol number
>> - * of last header" is "next header" field in Fragment header. In this case,
>> - * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
>> - * isn't NULL.
>> + * Note that non-1st fragment is special case that "the protocol number of the
>> + * first header of the fragmentable part of the original packet" is
>> + * "next header" field in the Fragment header. In this case, *offset is
>> + * meaningless and fragment offset is stored in *fragoff if fragoff isn't
>> + * NULL.
>> *
>> * if flags is not NULL and it's a fragment, then the frag flag
>> * IP6_FH_F_FRAG will be set. If it's an AH header, the
>> @@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *
>>
>> _frag_off = ntohs(*fp) & ~0x7;
>> if (_frag_off) {
>> - if (target < 0 &&
>> - ((!ipv6_ext_hdr(hp->nexthdr)) ||
>
> This check assumes that the following headers cannot show up in the
> fragmented part of the IPv6 packet:
>
> 12 bool ipv6_ext_hdr(u8 nexthdr)
> 13 {
> 14 /*
> 15 * find out if nexthdr is an extension header or a protocol
> 16 */
> 17 return (nexthdr == NEXTHDR_HOP) ||
> 18 (nexthdr == NEXTHDR_ROUTING) ||
> 19 (nexthdr == NEXTHDR_FRAGMENT) ||
> 20 (nexthdr == NEXTHDR_AUTH) ||
> 21 (nexthdr == NEXTHDR_NONE) ||
> 22 (nexthdr == NEXTHDR_DEST);
>
>> - hp->nexthdr == NEXTHDR_NONE)) {
>> + if (target < 0) {
>> if (fragoff)
>> *fragoff = _frag_off;
>> return hp->nexthdr;
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
I think this is incorrect. Authentication header shows up in the
fragmentable part of the original IPv6 packet. So, for the non-first
fragments the next-header field value can be NEXTHDR_AUTH.
Thanks
^ permalink raw reply
* [PATCH net-next v2 0/7] Involve rhashtable_lookup_insert routine
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
The series aims to involve rhashtable_lookup_insert() to guarantee
that the process of lookup and insertion of an object from/into hash
table is finished atomically, allowing rhashtable's users not to
introduce an extra lock during search and insertion. For example,
tipc socket is the first user benefiting from this enhancement.
v2 changes:
- fix the issue of waking up worker thread under a wrong condition in
patch #2, which is pointed by Thomas.
- move a comment from rhashtable_inser() to rhashtable_wakeup_worker()
according to Thomas's suggestion in patch #2.
- indent the third line of condition statement in
rhashtable_wakeup_worker() to inner bracket in patch #2.
- drop patch #3 of v1 series
- fix an issue of being unable to remove an object from hash table in
certain special case in patch #4.
- involve a new patch #5 to avoid unnecessary wakeup for worker queue
thread
- involve a new patch #6 to initialize atomic "nelems" variable
- adjust "nelem_hint" value from 256 to 192 avoiding to unnecessarily
to shrink hash table from the beginning phase in patch #7.
v1 changes:
But before rhashtable_lookup_insert() is involved, the following
optimizations need to be first done:
- simplify rhashtable_lookup by reusing rhashtable_lookup_compare()
- introduce rhashtable_wakeup_worker() to further reduce duplicated
code in patch #2
- fix an issue in patch #3
- involve rhashtable_lookup_insert(). But in this version, we firstly
use rhashtable_lookup() to search duplicate key in both old and new
bucket table; secondly introduce another __rhashtable_insert() helper
function to reduce the duplicated code between rhashtable_insert()
and rhashtable_lookup_insert().
- add patch #5 into the series as it depends on above patches. But in
this version, no change is made comparing with its previous version.
Ying Xue (7):
rhashtable: optimize rhashtable_lookup routine
rhashtable: introduce rhashtable_wakeup_worker helper function
rhashtable: involve rhashtable_lookup_insert routine
rhashtable: future table needs to be traversed when remove an object
rhashtable: avoid unnecessary wakeup for worker queue
rhashtable: initialize atomic nelems variable
tipc: convert tipc reference table to use generic rhashtable
include/linux/rhashtable.h | 3 +-
lib/rhashtable.c | 182 +++++++++++------
net/tipc/Kconfig | 12 --
net/tipc/config.c | 24 +--
net/tipc/core.c | 10 +-
net/tipc/core.h | 3 -
net/tipc/socket.c | 480 ++++++++++++++++----------------------------
net/tipc/socket.h | 4 +-
8 files changed, 306 insertions(+), 412 deletions(-)
--
1.7.9.5
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
^ permalink raw reply
* [PATCH net-next v2 1/7] rhashtable: optimize rhashtable_lookup routine
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>
Define an internal compare function and relevant compare argument,
and then make use of rhashtable_lookup_compare() to lookup key in
hash table, reducing duplicated code between rhashtable_lookup()
and rhashtable_lookup_compare().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
---
lib/rhashtable.c | 41 ++++++++++++++++++-----------------------
1 file changed, 18 insertions(+), 23 deletions(-)
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cbad192..f2fdd7a 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -612,6 +612,19 @@ restart:
}
EXPORT_SYMBOL_GPL(rhashtable_remove);
+struct rhashtable_compare_arg {
+ struct rhashtable *ht;
+ const void *key;
+};
+
+static bool rhashtable_compare(void *ptr, void *arg)
+{
+ struct rhashtable_compare_arg *x = arg;
+ struct rhashtable *ht = x->ht;
+
+ return !memcmp(ptr + ht->p.key_offset, x->key, ht->p.key_len);
+}
+
/**
* rhashtable_lookup - lookup key in hash table
* @ht: hash table
@@ -627,32 +640,14 @@ EXPORT_SYMBOL_GPL(rhashtable_remove);
*/
void *rhashtable_lookup(struct rhashtable *ht, const void *key)
{
- const struct bucket_table *tbl, *old_tbl;
- struct rhash_head *he;
- u32 hash;
+ struct rhashtable_compare_arg arg = {
+ .ht = ht,
+ .key = key,
+ };
BUG_ON(!ht->p.key_len);
- rcu_read_lock();
- old_tbl = rht_dereference_rcu(ht->tbl, ht);
- tbl = rht_dereference_rcu(ht->future_tbl, ht);
- hash = key_hashfn(ht, key, ht->p.key_len);
-restart:
- rht_for_each_rcu(he, tbl, rht_bucket_index(tbl, hash)) {
- if (memcmp(rht_obj(ht, he) + ht->p.key_offset, key,
- ht->p.key_len))
- continue;
- rcu_read_unlock();
- return rht_obj(ht, he);
- }
-
- if (unlikely(tbl != old_tbl)) {
- tbl = old_tbl;
- goto restart;
- }
-
- rcu_read_unlock();
- return NULL;
+ return rhashtable_lookup_compare(ht, key, &rhashtable_compare, &arg);
}
EXPORT_SYMBOL_GPL(rhashtable_lookup);
--
1.7.9.5
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
^ permalink raw reply related
* [PATCH net-next v2 2/7] rhashtable: introduce rhashtable_wakeup_worker helper function
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>
Introduce rhashtable_wakeup_worker() helper function to reduce
duplicated code where to wake up worker.
By the way, as long as the both "future_tbl" and "tbl" bucket table
pointers point to the same bucket array, we should try to wake up
the resizing worker thread, otherwise, it indicates the work of
resizing hash table is not finished yet. However, currently we will
wake up the worker thread only when the two pointers point to
different bucket array. Obviously this is wrong. So, the issue is
also fixed as well in the patch.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
lib/rhashtable.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index f2fdd7a..2000685 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -492,6 +492,19 @@ static void rht_deferred_worker(struct work_struct *work)
mutex_unlock(&ht->mutex);
}
+static void rhashtable_wakeup_worker(struct rhashtable *ht)
+{
+ struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
+ struct bucket_table *new_tbl = rht_dereference_rcu(ht->future_tbl, ht);
+ size_t size = tbl->size;
+
+ /* Only adjust the table if no resizing is currently in progress. */
+ if (tbl == new_tbl &&
+ ((ht->p.grow_decision && ht->p.grow_decision(ht, size)) ||
+ (ht->p.shrink_decision && ht->p.shrink_decision(ht, size))))
+ schedule_delayed_work(&ht->run_work, 0);
+}
+
/**
* rhashtable_insert - insert object into hash hash table
* @ht: hash table
@@ -532,10 +545,7 @@ void rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj)
atomic_inc(&ht->nelems);
- /* Only grow the table if no resizing is currently in progress. */
- if (ht->tbl != ht->future_tbl &&
- ht->p.grow_decision && ht->p.grow_decision(ht, tbl->size))
- schedule_delayed_work(&ht->run_work, 0);
+ rhashtable_wakeup_worker(ht);
rcu_read_unlock();
}
@@ -584,10 +594,7 @@ restart:
spin_unlock_bh(lock);
- if (ht->tbl != ht->future_tbl &&
- ht->p.shrink_decision &&
- ht->p.shrink_decision(ht, tbl->size))
- schedule_delayed_work(&ht->run_work, 0);
+ rhashtable_wakeup_worker(ht);
rcu_read_unlock();
--
1.7.9.5
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
^ permalink raw reply related
* [PATCH net-next v2 4/7] rhashtable: future table needs to be traversed when remove an object
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>
When remove an object from hash table, we currently only traverse old
bucket table to check whether the object exists. If the object is not
found in it, we will try again. But in the second search loop, we still
search the object from the old table instead of future table. As a
result, the object may be not removed from hash table especially when
resizing is currently in progress and the object is just saved in the
future table.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
lib/rhashtable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4430233..1aef942 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -608,10 +608,10 @@ restart:
return true;
}
- if (tbl != rht_dereference_rcu(ht->tbl, ht)) {
+ if (tbl != rht_dereference_rcu(ht->future_tbl, ht)) {
spin_unlock_bh(lock);
- tbl = rht_dereference_rcu(ht->tbl, ht);
+ tbl = rht_dereference_rcu(ht->future_tbl, ht);
hash = head_hashfn(ht, tbl, obj);
lock = bucket_lock(tbl, hash);
--
1.7.9.5
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
^ permalink raw reply related
* [PATCH net-next v2 5/7] rhashtable: avoid unnecessary wakeup for worker queue
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>
Move condition statements of verifying whether hash table size exceeds
its maximum threshold or reaches its minimum threshold from resizing
functions to resizing decision functions, avoiding unnecessary wakeup
for worker queue thread.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
include/linux/rhashtable.h | 2 +-
lib/rhashtable.c | 18 +++++++-----------
2 files changed, 8 insertions(+), 12 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 73c913f..326acd8 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -113,7 +113,7 @@ struct rhashtable {
struct bucket_table __rcu *tbl;
struct bucket_table __rcu *future_tbl;
atomic_t nelems;
- size_t shift;
+ atomic_t shift;
struct rhashtable_params p;
struct delayed_work run_work;
struct mutex mutex;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 1aef942..7fb474b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -199,7 +199,8 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
bool rht_grow_above_75(const struct rhashtable *ht, size_t new_size)
{
/* Expand table when exceeding 75% load */
- return atomic_read(&ht->nelems) > (new_size / 4 * 3);
+ return atomic_read(&ht->nelems) > (new_size / 4 * 3) &&
+ (ht->p.max_shift && atomic_read(&ht->shift) < ht->p.max_shift);
}
EXPORT_SYMBOL_GPL(rht_grow_above_75);
@@ -211,7 +212,8 @@ EXPORT_SYMBOL_GPL(rht_grow_above_75);
bool rht_shrink_below_30(const struct rhashtable *ht, size_t new_size)
{
/* Shrink table beneath 30% load */
- return atomic_read(&ht->nelems) < (new_size * 3 / 10);
+ return atomic_read(&ht->nelems) < (new_size * 3 / 10) &&
+ (atomic_read(&ht->shift) > ht->p.min_shift);
}
EXPORT_SYMBOL_GPL(rht_shrink_below_30);
@@ -318,14 +320,11 @@ int rhashtable_expand(struct rhashtable *ht)
ASSERT_RHT_MUTEX(ht);
- if (ht->p.max_shift && ht->shift >= ht->p.max_shift)
- return 0;
-
new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
if (new_tbl == NULL)
return -ENOMEM;
- ht->shift++;
+ atomic_inc(&ht->shift);
/* Make insertions go into the new, empty table right away. Deletions
* and lookups will be attempted in both tables until we synchronize.
@@ -421,9 +420,6 @@ int rhashtable_shrink(struct rhashtable *ht)
ASSERT_RHT_MUTEX(ht);
- if (ht->shift <= ht->p.min_shift)
- return 0;
-
new_tbl = bucket_table_alloc(ht, tbl->size / 2);
if (new_tbl == NULL)
return -ENOMEM;
@@ -462,7 +458,7 @@ int rhashtable_shrink(struct rhashtable *ht)
/* Publish the new, valid hash table */
rcu_assign_pointer(ht->tbl, new_tbl);
- ht->shift--;
+ atomic_dec(&ht->shift);
/* Wait for readers. No new readers will have references to the
* old hash table.
@@ -851,7 +847,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
if (tbl == NULL)
return -ENOMEM;
- ht->shift = ilog2(tbl->size);
+ atomic_set(&ht->shift, ilog2(tbl->size));
RCU_INIT_POINTER(ht->tbl, tbl);
RCU_INIT_POINTER(ht->future_tbl, tbl);
--
1.7.9.5
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
^ permalink raw reply related
* [PATCH net-next v2 6/7] rhashtable: initialize atomic nelems variable
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
lib/rhashtable.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 7fb474b..8023b55 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -847,6 +847,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
if (tbl == NULL)
return -ENOMEM;
+ atomic_set(&ht->nelems, 0);
atomic_set(&ht->shift, ilog2(tbl->size));
RCU_INIT_POINTER(ht->tbl, tbl);
RCU_INIT_POINTER(ht->future_tbl, tbl);
--
1.7.9.5
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
^ permalink raw reply related
* [PATCH net-next v2 7/7] tipc: convert tipc reference table to use generic rhashtable
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>
As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.
Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
---
net/tipc/Kconfig | 12 --
net/tipc/config.c | 24 +--
net/tipc/core.c | 10 +-
net/tipc/core.h | 3 -
net/tipc/socket.c | 480 +++++++++++++++++++----------------------------------
net/tipc/socket.h | 4 +-
6 files changed, 180 insertions(+), 353 deletions(-)
diff --git a/net/tipc/Kconfig b/net/tipc/Kconfig
index c890848..91c8a8e 100644
--- a/net/tipc/Kconfig
+++ b/net/tipc/Kconfig
@@ -20,18 +20,6 @@ menuconfig TIPC
If in doubt, say N.
-config TIPC_PORTS
- int "Maximum number of ports in a node"
- depends on TIPC
- range 127 65535
- default "8191"
- help
- Specifies how many ports can be supported by a node.
- Can range from 127 to 65535 ports; default is 8191.
-
- Setting this to a smaller value saves some memory,
- setting it to higher allows for more ports.
-
config TIPC_MEDIA_IB
bool "InfiniBand media type support"
depends on TIPC && INFINIBAND_IPOIB
diff --git a/net/tipc/config.c b/net/tipc/config.c
index 876f4c6..0b3a90e 100644
--- a/net/tipc/config.c
+++ b/net/tipc/config.c
@@ -183,22 +183,6 @@ static struct sk_buff *cfg_set_own_addr(void)
return tipc_cfg_reply_error_string("cannot change to network mode");
}
-static struct sk_buff *cfg_set_max_ports(void)
-{
- u32 value;
-
- if (!TLV_CHECK(req_tlv_area, req_tlv_space, TIPC_TLV_UNSIGNED))
- return tipc_cfg_reply_error_string(TIPC_CFG_TLV_ERROR);
- value = ntohl(*(__be32 *)TLV_DATA(req_tlv_area));
- if (value == tipc_max_ports)
- return tipc_cfg_reply_none();
- if (value < 127 || value > 65535)
- return tipc_cfg_reply_error_string(TIPC_CFG_INVALID_VALUE
- " (max ports must be 127-65535)");
- return tipc_cfg_reply_error_string(TIPC_CFG_NOT_SUPPORTED
- " (cannot change max ports while TIPC is active)");
-}
-
static struct sk_buff *cfg_set_netid(void)
{
u32 value;
@@ -285,15 +269,9 @@ struct sk_buff *tipc_cfg_do_cmd(u32 orig_node, u16 cmd, const void *request_area
case TIPC_CMD_SET_NODE_ADDR:
rep_tlv_buf = cfg_set_own_addr();
break;
- case TIPC_CMD_SET_MAX_PORTS:
- rep_tlv_buf = cfg_set_max_ports();
- break;
case TIPC_CMD_SET_NETID:
rep_tlv_buf = cfg_set_netid();
break;
- case TIPC_CMD_GET_MAX_PORTS:
- rep_tlv_buf = tipc_cfg_reply_unsigned(tipc_max_ports);
- break;
case TIPC_CMD_GET_NETID:
rep_tlv_buf = tipc_cfg_reply_unsigned(tipc_net_id);
break;
@@ -317,6 +295,8 @@ struct sk_buff *tipc_cfg_do_cmd(u32 orig_node, u16 cmd, const void *request_area
case TIPC_CMD_SET_REMOTE_MNG:
case TIPC_CMD_GET_REMOTE_MNG:
case TIPC_CMD_DUMP_LOG:
+ case TIPC_CMD_SET_MAX_PORTS:
+ case TIPC_CMD_GET_MAX_PORTS:
rep_tlv_buf = tipc_cfg_reply_error_string(TIPC_CFG_NOT_SUPPORTED
" (obsolete command)");
break;
diff --git a/net/tipc/core.c b/net/tipc/core.c
index a5737b8..71b2ada 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -34,6 +34,8 @@
* POSSIBILITY OF SUCH DAMAGE.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include "core.h"
#include "name_table.h"
#include "subscr.h"
@@ -47,7 +49,6 @@ int tipc_random __read_mostly;
/* configurable TIPC parameters */
u32 tipc_own_addr __read_mostly;
-int tipc_max_ports __read_mostly;
int tipc_net_id __read_mostly;
int sysctl_tipc_rmem[3] __read_mostly; /* min/default/max */
@@ -84,9 +85,9 @@ static void tipc_core_stop(void)
tipc_netlink_stop();
tipc_subscr_stop();
tipc_nametbl_stop();
- tipc_sk_ref_table_stop();
tipc_socket_stop();
tipc_unregister_sysctl();
+ tipc_sk_rht_destroy();
}
/**
@@ -98,7 +99,7 @@ static int tipc_core_start(void)
get_random_bytes(&tipc_random, sizeof(tipc_random));
- err = tipc_sk_ref_table_init(tipc_max_ports, tipc_random);
+ err = tipc_sk_rht_init();
if (err)
goto out_reftbl;
@@ -138,7 +139,7 @@ out_socket:
out_netlink:
tipc_nametbl_stop();
out_nametbl:
- tipc_sk_ref_table_stop();
+ tipc_sk_rht_destroy();
out_reftbl:
return err;
}
@@ -150,7 +151,6 @@ static int __init tipc_init(void)
pr_info("Activated (version " TIPC_MOD_VER ")\n");
tipc_own_addr = 0;
- tipc_max_ports = CONFIG_TIPC_PORTS;
tipc_net_id = 4711;
sysctl_tipc_rmem[0] = TIPC_CONN_OVERLOAD_LIMIT >> 4 <<
diff --git a/net/tipc/core.h b/net/tipc/core.h
index 8460213..56fe422 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -37,8 +37,6 @@
#ifndef _TIPC_CORE_H
#define _TIPC_CORE_H
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
#include <linux/tipc.h>
#include <linux/tipc_config.h>
#include <linux/tipc_netlink.h>
@@ -79,7 +77,6 @@ int tipc_snprintf(char *buf, int len, const char *fmt, ...);
* Global configuration variables
*/
extern u32 tipc_own_addr __read_mostly;
-extern int tipc_max_ports __read_mostly;
extern int tipc_net_id __read_mostly;
extern int sysctl_tipc_rmem[3] __read_mostly;
extern int sysctl_tipc_named_timeout __read_mostly;
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 4731cad..701f31b 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -34,22 +34,25 @@
* POSSIBILITY OF SUCH DAMAGE.
*/
+#include <linux/rhashtable.h>
+#include <linux/jhash.h>
#include "core.h"
#include "name_table.h"
#include "node.h"
#include "link.h"
-#include <linux/export.h>
#include "config.h"
#include "socket.h"
-#define SS_LISTENING -1 /* socket is listening */
-#define SS_READY -2 /* socket is connectionless */
+#define SS_LISTENING -1 /* socket is listening */
+#define SS_READY -2 /* socket is connectionless */
-#define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */
-#define CONN_PROBING_INTERVAL 3600000 /* [ms] => 1 h */
-#define TIPC_FWD_MSG 1
-#define TIPC_CONN_OK 0
-#define TIPC_CONN_PROBING 1
+#define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */
+#define CONN_PROBING_INTERVAL 3600000 /* [ms] => 1 h */
+#define TIPC_FWD_MSG 1
+#define TIPC_CONN_OK 0
+#define TIPC_CONN_PROBING 1
+#define TIPC_MAX_PORT 0xffffffff
+#define TIPC_MIN_PORT 1
/**
* struct tipc_sock - TIPC socket structure
@@ -59,7 +62,7 @@
* @conn_instance: TIPC instance used when connection was established
* @published: non-zero if port has one or more associated names
* @max_pkt: maximum packet size "hint" used when building messages sent by port
- * @ref: unique reference to port in TIPC object registry
+ * @portid: unique port identity in TIPC socket hash table
* @phdr: preformatted message header used when sending messages
* @port_list: adjacent ports in TIPC's global list of ports
* @publications: list of publications for port
@@ -74,6 +77,8 @@
* @link_cong: non-zero if owner must sleep because of link congestion
* @sent_unacked: # messages sent by socket, and not yet acked by peer
* @rcv_unacked: # messages read by user, but not yet acked back to peer
+ * @node: hash table node
+ * @rcu: rcu struct for tipc_sock
*/
struct tipc_sock {
struct sock sk;
@@ -82,7 +87,7 @@ struct tipc_sock {
u32 conn_instance;
int published;
u32 max_pkt;
- u32 ref;
+ u32 portid;
struct tipc_msg phdr;
struct list_head sock_list;
struct list_head publications;
@@ -95,6 +100,8 @@ struct tipc_sock {
bool link_cong;
uint sent_unacked;
uint rcv_unacked;
+ struct rhash_head node;
+ struct rcu_head rcu;
};
static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb);
@@ -103,16 +110,14 @@ static void tipc_write_space(struct sock *sk);
static int tipc_release(struct socket *sock);
static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);
-static void tipc_sk_timeout(unsigned long ref);
+static void tipc_sk_timeout(unsigned long portid);
static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
struct tipc_name_seq const *seq);
static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
struct tipc_name_seq const *seq);
-static u32 tipc_sk_ref_acquire(struct tipc_sock *tsk);
-static void tipc_sk_ref_discard(u32 ref);
-static struct tipc_sock *tipc_sk_get(u32 ref);
-static struct tipc_sock *tipc_sk_get_next(u32 *ref);
-static void tipc_sk_put(struct tipc_sock *tsk);
+static struct tipc_sock *tipc_sk_lookup(u32 portid);
+static int tipc_sk_insert(struct tipc_sock *tsk);
+static void tipc_sk_remove(struct tipc_sock *tsk);
static const struct proto_ops packet_ops;
static const struct proto_ops stream_ops;
@@ -174,6 +179,9 @@ static const struct nla_policy tipc_nl_sock_policy[TIPC_NLA_SOCK_MAX + 1] = {
* - port reference
*/
+/* Protects tipc socket hash table mutations */
+static struct rhashtable tipc_sk_rht;
+
static u32 tsk_peer_node(struct tipc_sock *tsk)
{
return msg_destnode(&tsk->phdr);
@@ -305,7 +313,6 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
struct sock *sk;
struct tipc_sock *tsk;
struct tipc_msg *msg;
- u32 ref;
/* Validate arguments */
if (unlikely(protocol != 0))
@@ -339,24 +346,22 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
return -ENOMEM;
tsk = tipc_sk(sk);
- ref = tipc_sk_ref_acquire(tsk);
- if (!ref) {
- pr_warn("Socket create failed; reference table exhausted\n");
- return -ENOMEM;
- }
tsk->max_pkt = MAX_PKT_DEFAULT;
- tsk->ref = ref;
INIT_LIST_HEAD(&tsk->publications);
msg = &tsk->phdr;
tipc_msg_init(msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
NAMED_H_SIZE, 0);
- msg_set_origport(msg, ref);
/* Finish initializing socket data structures */
sock->ops = ops;
sock->state = state;
sock_init_data(sock, sk);
- k_init_timer(&tsk->timer, (Handler)tipc_sk_timeout, ref);
+ if (tipc_sk_insert(tsk)) {
+ pr_warn("Socket create failed; port numbrer exhausted\n");
+ return -EINVAL;
+ }
+ msg_set_origport(msg, tsk->portid);
+ k_init_timer(&tsk->timer, (Handler)tipc_sk_timeout, tsk->portid);
sk->sk_backlog_rcv = tipc_backlog_rcv;
sk->sk_rcvbuf = sysctl_tipc_rmem[1];
sk->sk_data_ready = tipc_data_ready;
@@ -442,6 +447,13 @@ int tipc_sock_accept_local(struct socket *sock, struct socket **newsock,
return ret;
}
+static void tipc_sk_callback(struct rcu_head *head)
+{
+ struct tipc_sock *tsk = container_of(head, struct tipc_sock, rcu);
+
+ sock_put(&tsk->sk);
+}
+
/**
* tipc_release - destroy a TIPC socket
* @sock: socket to destroy
@@ -491,7 +503,7 @@ static int tipc_release(struct socket *sock)
(sock->state == SS_CONNECTED)) {
sock->state = SS_DISCONNECTING;
tsk->connected = 0;
- tipc_node_remove_conn(dnode, tsk->ref);
+ tipc_node_remove_conn(dnode, tsk->portid);
}
if (tipc_msg_reverse(skb, &dnode, TIPC_ERR_NO_PORT))
tipc_link_xmit_skb(skb, dnode, 0);
@@ -499,16 +511,16 @@ static int tipc_release(struct socket *sock)
}
tipc_sk_withdraw(tsk, 0, NULL);
- tipc_sk_ref_discard(tsk->ref);
k_cancel_timer(&tsk->timer);
+ tipc_sk_remove(tsk);
if (tsk->connected) {
skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE, TIPC_CONN_MSG,
SHORT_H_SIZE, 0, dnode, tipc_own_addr,
tsk_peer_port(tsk),
- tsk->ref, TIPC_ERR_NO_PORT);
+ tsk->portid, TIPC_ERR_NO_PORT);
if (skb)
- tipc_link_xmit_skb(skb, dnode, tsk->ref);
- tipc_node_remove_conn(dnode, tsk->ref);
+ tipc_link_xmit_skb(skb, dnode, tsk->portid);
+ tipc_node_remove_conn(dnode, tsk->portid);
}
k_term_timer(&tsk->timer);
@@ -518,7 +530,8 @@ static int tipc_release(struct socket *sock)
/* Reject any messages that accumulated in backlog queue */
sock->state = SS_DISCONNECTING;
release_sock(sk);
- sock_put(sk);
+
+ call_rcu(&tsk->rcu, tipc_sk_callback);
sock->sk = NULL;
return 0;
@@ -611,7 +624,7 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
addr->addr.id.ref = tsk_peer_port(tsk);
addr->addr.id.node = tsk_peer_node(tsk);
} else {
- addr->addr.id.ref = tsk->ref;
+ addr->addr.id.ref = tsk->portid;
addr->addr.id.node = tipc_own_addr;
}
@@ -946,7 +959,7 @@ static int tipc_sendmsg(struct kiocb *iocb, struct socket *sock,
}
new_mtu:
- mtu = tipc_node_get_mtu(dnode, tsk->ref);
+ mtu = tipc_node_get_mtu(dnode, tsk->portid);
__skb_queue_head_init(&head);
rc = tipc_msg_build(mhdr, m, 0, dsz, mtu, &head);
if (rc < 0)
@@ -955,7 +968,7 @@ new_mtu:
do {
skb = skb_peek(&head);
TIPC_SKB_CB(skb)->wakeup_pending = tsk->link_cong;
- rc = tipc_link_xmit(&head, dnode, tsk->ref);
+ rc = tipc_link_xmit(&head, dnode, tsk->portid);
if (likely(rc >= 0)) {
if (sock->state != SS_READY)
sock->state = SS_CONNECTING;
@@ -1028,7 +1041,7 @@ static int tipc_send_stream(struct kiocb *iocb, struct socket *sock,
struct tipc_msg *mhdr = &tsk->phdr;
struct sk_buff_head head;
DECLARE_SOCKADDR(struct sockaddr_tipc *, dest, m->msg_name);
- u32 ref = tsk->ref;
+ u32 portid = tsk->portid;
int rc = -EINVAL;
long timeo;
u32 dnode;
@@ -1067,7 +1080,7 @@ next:
goto exit;
do {
if (likely(!tsk_conn_cong(tsk))) {
- rc = tipc_link_xmit(&head, dnode, ref);
+ rc = tipc_link_xmit(&head, dnode, portid);
if (likely(!rc)) {
tsk->sent_unacked++;
sent += send;
@@ -1076,7 +1089,7 @@ next:
goto next;
}
if (rc == -EMSGSIZE) {
- tsk->max_pkt = tipc_node_get_mtu(dnode, ref);
+ tsk->max_pkt = tipc_node_get_mtu(dnode, portid);
goto next;
}
if (rc != -ELINKCONG)
@@ -1130,8 +1143,8 @@ static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
tsk->probing_state = TIPC_CONN_OK;
tsk->connected = 1;
k_start_timer(&tsk->timer, tsk->probing_interval);
- tipc_node_add_conn(peer_node, tsk->ref, peer_port);
- tsk->max_pkt = tipc_node_get_mtu(peer_node, tsk->ref);
+ tipc_node_add_conn(peer_node, tsk->portid, peer_port);
+ tsk->max_pkt = tipc_node_get_mtu(peer_node, tsk->portid);
}
/**
@@ -1238,7 +1251,7 @@ static void tipc_sk_send_ack(struct tipc_sock *tsk, uint ack)
if (!tsk->connected)
return;
skb = tipc_msg_create(CONN_MANAGER, CONN_ACK, INT_H_SIZE, 0, dnode,
- tipc_own_addr, peer_port, tsk->ref, TIPC_OK);
+ tipc_own_addr, peer_port, tsk->portid, TIPC_OK);
if (!skb)
return;
msg = buf_msg(skb);
@@ -1552,7 +1565,7 @@ static int filter_connect(struct tipc_sock *tsk, struct sk_buff **buf)
tsk->connected = 0;
/* let timer expire on it's own */
tipc_node_remove_conn(tsk_peer_node(tsk),
- tsk->ref);
+ tsk->portid);
}
retval = TIPC_OK;
}
@@ -1743,7 +1756,7 @@ int tipc_sk_rcv(struct sk_buff *skb)
u32 dnode;
/* Validate destination and message */
- tsk = tipc_sk_get(dport);
+ tsk = tipc_sk_lookup(dport);
if (unlikely(!tsk)) {
rc = tipc_msg_eval(skb, &dnode);
goto exit;
@@ -1763,7 +1776,7 @@ int tipc_sk_rcv(struct sk_buff *skb)
rc = -TIPC_ERR_OVERLOAD;
}
spin_unlock_bh(&sk->sk_lock.slock);
- tipc_sk_put(tsk);
+ sock_put(sk);
if (likely(!rc))
return 0;
exit:
@@ -2050,20 +2063,20 @@ restart:
goto restart;
}
if (tipc_msg_reverse(skb, &dnode, TIPC_CONN_SHUTDOWN))
- tipc_link_xmit_skb(skb, dnode, tsk->ref);
- tipc_node_remove_conn(dnode, tsk->ref);
+ tipc_link_xmit_skb(skb, dnode, tsk->portid);
+ tipc_node_remove_conn(dnode, tsk->portid);
} else {
dnode = tsk_peer_node(tsk);
skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE,
TIPC_CONN_MSG, SHORT_H_SIZE,
0, dnode, tipc_own_addr,
tsk_peer_port(tsk),
- tsk->ref, TIPC_CONN_SHUTDOWN);
- tipc_link_xmit_skb(skb, dnode, tsk->ref);
+ tsk->portid, TIPC_CONN_SHUTDOWN);
+ tipc_link_xmit_skb(skb, dnode, tsk->portid);
}
tsk->connected = 0;
sock->state = SS_DISCONNECTING;
- tipc_node_remove_conn(dnode, tsk->ref);
+ tipc_node_remove_conn(dnode, tsk->portid);
/* fall through */
case SS_DISCONNECTING:
@@ -2084,14 +2097,14 @@ restart:
return res;
}
-static void tipc_sk_timeout(unsigned long ref)
+static void tipc_sk_timeout(unsigned long portid)
{
struct tipc_sock *tsk;
struct sock *sk;
struct sk_buff *skb = NULL;
u32 peer_port, peer_node;
- tsk = tipc_sk_get(ref);
+ tsk = tipc_sk_lookup(portid);
if (!tsk)
return;
@@ -2108,20 +2121,20 @@ static void tipc_sk_timeout(unsigned long ref)
/* Previous probe not answered -> self abort */
skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE, TIPC_CONN_MSG,
SHORT_H_SIZE, 0, tipc_own_addr,
- peer_node, ref, peer_port,
+ peer_node, portid, peer_port,
TIPC_ERR_NO_PORT);
} else {
skb = tipc_msg_create(CONN_MANAGER, CONN_PROBE, INT_H_SIZE,
0, peer_node, tipc_own_addr,
- peer_port, ref, TIPC_OK);
+ peer_port, portid, TIPC_OK);
tsk->probing_state = TIPC_CONN_PROBING;
k_start_timer(&tsk->timer, tsk->probing_interval);
}
bh_unlock_sock(sk);
if (skb)
- tipc_link_xmit_skb(skb, peer_node, ref);
+ tipc_link_xmit_skb(skb, peer_node, portid);
exit:
- tipc_sk_put(tsk);
+ sock_put(sk);
}
static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
@@ -2132,12 +2145,12 @@ static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
if (tsk->connected)
return -EINVAL;
- key = tsk->ref + tsk->pub_count + 1;
- if (key == tsk->ref)
+ key = tsk->portid + tsk->pub_count + 1;
+ if (key == tsk->portid)
return -EADDRINUSE;
publ = tipc_nametbl_publish(seq->type, seq->lower, seq->upper,
- scope, tsk->ref, key);
+ scope, tsk->portid, key);
if (unlikely(!publ))
return -EINVAL;
@@ -2188,9 +2201,9 @@ static int tipc_sk_show(struct tipc_sock *tsk, char *buf,
ret = tipc_snprintf(buf, len, "<%u.%u.%u:%u>:",
tipc_zone(tipc_own_addr),
tipc_cluster(tipc_own_addr),
- tipc_node(tipc_own_addr), tsk->ref);
+ tipc_node(tipc_own_addr), tsk->portid);
else
- ret = tipc_snprintf(buf, len, "%-10u:", tsk->ref);
+ ret = tipc_snprintf(buf, len, "%-10u:", tsk->portid);
if (tsk->connected) {
u32 dport = tsk_peer_port(tsk);
@@ -2224,13 +2237,15 @@ static int tipc_sk_show(struct tipc_sock *tsk, char *buf,
struct sk_buff *tipc_sk_socks_show(void)
{
+ const struct bucket_table *tbl;
+ struct rhash_head *pos;
struct sk_buff *buf;
struct tlv_desc *rep_tlv;
char *pb;
int pb_len;
struct tipc_sock *tsk;
int str_len = 0;
- u32 ref = 0;
+ int i;
buf = tipc_cfg_reply_alloc(TLV_SPACE(ULTRA_STRING_MAX_LEN));
if (!buf)
@@ -2239,14 +2254,18 @@ struct sk_buff *tipc_sk_socks_show(void)
pb = TLV_DATA(rep_tlv);
pb_len = ULTRA_STRING_MAX_LEN;
- tsk = tipc_sk_get_next(&ref);
- for (; tsk; tsk = tipc_sk_get_next(&ref)) {
- lock_sock(&tsk->sk);
- str_len += tipc_sk_show(tsk, pb + str_len,
- pb_len - str_len, 0);
- release_sock(&tsk->sk);
- tipc_sk_put(tsk);
+ rcu_read_lock();
+ tbl = rht_dereference_rcu((&tipc_sk_rht)->tbl, &tipc_sk_rht);
+ for (i = 0; i < tbl->size; i++) {
+ rht_for_each_entry_rcu(tsk, pos, tbl, i, node) {
+ spin_lock_bh(&tsk->sk.sk_lock.slock);
+ str_len += tipc_sk_show(tsk, pb + str_len,
+ pb_len - str_len, 0);
+ spin_unlock_bh(&tsk->sk.sk_lock.slock);
+ }
}
+ rcu_read_unlock();
+
str_len += 1; /* for "\0" */
skb_put(buf, TLV_SPACE(str_len));
TLV_SET(rep_tlv, TIPC_TLV_ULTRA_STRING, NULL, str_len);
@@ -2259,255 +2278,91 @@ struct sk_buff *tipc_sk_socks_show(void)
*/
void tipc_sk_reinit(void)
{
+ const struct bucket_table *tbl;
+ struct rhash_head *pos;
+ struct tipc_sock *tsk;
struct tipc_msg *msg;
- u32 ref = 0;
- struct tipc_sock *tsk = tipc_sk_get_next(&ref);
+ int i;
- for (; tsk; tsk = tipc_sk_get_next(&ref)) {
- lock_sock(&tsk->sk);
- msg = &tsk->phdr;
- msg_set_prevnode(msg, tipc_own_addr);
- msg_set_orignode(msg, tipc_own_addr);
- release_sock(&tsk->sk);
- tipc_sk_put(tsk);
+ rcu_read_lock();
+ tbl = rht_dereference_rcu((&tipc_sk_rht)->tbl, &tipc_sk_rht);
+ for (i = 0; i < tbl->size; i++) {
+ rht_for_each_entry_rcu(tsk, pos, tbl, i, node) {
+ spin_lock_bh(&tsk->sk.sk_lock.slock);
+ msg = &tsk->phdr;
+ msg_set_prevnode(msg, tipc_own_addr);
+ msg_set_orignode(msg, tipc_own_addr);
+ spin_unlock_bh(&tsk->sk.sk_lock.slock);
+ }
}
+ rcu_read_unlock();
}
-/**
- * struct reference - TIPC socket reference entry
- * @tsk: pointer to socket associated with reference entry
- * @ref: reference value for socket (combines instance & array index info)
- */
-struct reference {
- struct tipc_sock *tsk;
- u32 ref;
-};
-
-/**
- * struct tipc_ref_table - table of TIPC socket reference entries
- * @entries: pointer to array of reference entries
- * @capacity: array index of first unusable entry
- * @init_point: array index of first uninitialized entry
- * @first_free: array index of first unused socket reference entry
- * @last_free: array index of last unused socket reference entry
- * @index_mask: bitmask for array index portion of reference values
- * @start_mask: initial value for instance value portion of reference values
- */
-struct ref_table {
- struct reference *entries;
- u32 capacity;
- u32 init_point;
- u32 first_free;
- u32 last_free;
- u32 index_mask;
- u32 start_mask;
-};
-
-/* Socket reference table consists of 2**N entries.
- *
- * State Socket ptr Reference
- * ----- ---------- ---------
- * In use non-NULL XXXX|own index
- * (XXXX changes each time entry is acquired)
- * Free NULL YYYY|next free index
- * (YYYY is one more than last used XXXX)
- * Uninitialized NULL 0
- *
- * Entry 0 is not used; this allows index 0 to denote the end of the free list.
- *
- * Note that a reference value of 0 does not necessarily indicate that an
- * entry is uninitialized, since the last entry in the free list could also
- * have a reference value of 0 (although this is unlikely).
- */
-
-static struct ref_table tipc_ref_table;
-
-static DEFINE_RWLOCK(ref_table_lock);
-
-/**
- * tipc_ref_table_init - create reference table for sockets
- */
-int tipc_sk_ref_table_init(u32 req_sz, u32 start)
+static struct tipc_sock *tipc_sk_lookup(u32 portid)
{
- struct reference *table;
- u32 actual_sz;
-
- /* account for unused entry, then round up size to a power of 2 */
-
- req_sz++;
- for (actual_sz = 16; actual_sz < req_sz; actual_sz <<= 1) {
- /* do nothing */
- };
-
- /* allocate table & mark all entries as uninitialized */
- table = vzalloc(actual_sz * sizeof(struct reference));
- if (table == NULL)
- return -ENOMEM;
-
- tipc_ref_table.entries = table;
- tipc_ref_table.capacity = req_sz;
- tipc_ref_table.init_point = 1;
- tipc_ref_table.first_free = 0;
- tipc_ref_table.last_free = 0;
- tipc_ref_table.index_mask = actual_sz - 1;
- tipc_ref_table.start_mask = start & ~tipc_ref_table.index_mask;
+ struct tipc_sock *tsk;
- return 0;
-}
+ rcu_read_lock();
+ tsk = rhashtable_lookup(&tipc_sk_rht, &portid);
+ if (tsk)
+ sock_hold(&tsk->sk);
+ rcu_read_unlock();
-/**
- * tipc_ref_table_stop - destroy reference table for sockets
- */
-void tipc_sk_ref_table_stop(void)
-{
- if (!tipc_ref_table.entries)
- return;
- vfree(tipc_ref_table.entries);
- tipc_ref_table.entries = NULL;
+ return tsk;
}
-/* tipc_ref_acquire - create reference to a socket
- *
- * Register an socket pointer in the reference table.
- * Returns a unique reference value that is used from then on to retrieve the
- * socket pointer, or to determine if the socket has been deregistered.
- */
-u32 tipc_sk_ref_acquire(struct tipc_sock *tsk)
+static int tipc_sk_insert(struct tipc_sock *tsk)
{
- u32 index;
- u32 index_mask;
- u32 next_plus_upper;
- u32 ref = 0;
- struct reference *entry;
-
- if (unlikely(!tsk)) {
- pr_err("Attempt to acquire ref. to non-existent obj\n");
- return 0;
- }
- if (unlikely(!tipc_ref_table.entries)) {
- pr_err("Ref. table not found in acquisition attempt\n");
- return 0;
- }
-
- /* Take a free entry, if available; otherwise initialize a new one */
- write_lock_bh(&ref_table_lock);
- index = tipc_ref_table.first_free;
- entry = &tipc_ref_table.entries[index];
+ u32 remaining = (TIPC_MAX_PORT - TIPC_MIN_PORT) + 1;
+ u32 portid = prandom_u32() % remaining + TIPC_MIN_PORT;
- if (likely(index)) {
- index = tipc_ref_table.first_free;
- entry = &tipc_ref_table.entries[index];
- index_mask = tipc_ref_table.index_mask;
- next_plus_upper = entry->ref;
- tipc_ref_table.first_free = next_plus_upper & index_mask;
- ref = (next_plus_upper & ~index_mask) + index;
- entry->tsk = tsk;
- } else if (tipc_ref_table.init_point < tipc_ref_table.capacity) {
- index = tipc_ref_table.init_point++;
- entry = &tipc_ref_table.entries[index];
- ref = tipc_ref_table.start_mask + index;
+ while (remaining--) {
+ portid++;
+ if ((portid < TIPC_MIN_PORT) || (portid > TIPC_MAX_PORT))
+ portid = TIPC_MIN_PORT;
+ tsk->portid = portid;
+ sock_hold(&tsk->sk);
+ if (rhashtable_lookup_insert(&tipc_sk_rht, &tsk->node))
+ return 0;
+ sock_put(&tsk->sk);
}
- if (ref) {
- entry->ref = ref;
- entry->tsk = tsk;
- }
- write_unlock_bh(&ref_table_lock);
- return ref;
+ return -1;
}
-/* tipc_sk_ref_discard - invalidate reference to an socket
- *
- * Disallow future references to an socket and free up the entry for re-use.
- */
-void tipc_sk_ref_discard(u32 ref)
+static void tipc_sk_remove(struct tipc_sock *tsk)
{
- struct reference *entry;
- u32 index;
- u32 index_mask;
-
- if (unlikely(!tipc_ref_table.entries)) {
- pr_err("Ref. table not found during discard attempt\n");
- return;
- }
-
- index_mask = tipc_ref_table.index_mask;
- index = ref & index_mask;
- entry = &tipc_ref_table.entries[index];
-
- write_lock_bh(&ref_table_lock);
+ struct sock *sk = &tsk->sk;
- if (unlikely(!entry->tsk)) {
- pr_err("Attempt to discard ref. to non-existent socket\n");
- goto exit;
- }
- if (unlikely(entry->ref != ref)) {
- pr_err("Attempt to discard non-existent reference\n");
- goto exit;
+ if (rhashtable_remove(&tipc_sk_rht, &tsk->node)) {
+ WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
+ __sock_put(sk);
}
-
- /* Mark entry as unused; increment instance part of entry's
- * reference to invalidate any subsequent references
- */
-
- entry->tsk = NULL;
- entry->ref = (ref & ~index_mask) + (index_mask + 1);
-
- /* Append entry to free entry list */
- if (unlikely(tipc_ref_table.first_free == 0))
- tipc_ref_table.first_free = index;
- else
- tipc_ref_table.entries[tipc_ref_table.last_free].ref |= index;
- tipc_ref_table.last_free = index;
-exit:
- write_unlock_bh(&ref_table_lock);
}
-/* tipc_sk_get - find referenced socket and return pointer to it
- */
-struct tipc_sock *tipc_sk_get(u32 ref)
+int tipc_sk_rht_init(void)
{
- struct reference *entry;
- struct tipc_sock *tsk;
+ struct rhashtable_params rht_params = {
+ .nelem_hint = 192,
+ .head_offset = offsetof(struct tipc_sock, node),
+ .key_offset = offsetof(struct tipc_sock, portid),
+ .key_len = sizeof(u32), /* portid */
+ .hashfn = jhash,
+ .max_shift = 20, /* 1M */
+ .min_shift = 8, /* 256 */
+ .grow_decision = rht_grow_above_75,
+ .shrink_decision = rht_shrink_below_30,
+ };
- if (unlikely(!tipc_ref_table.entries))
- return NULL;
- read_lock_bh(&ref_table_lock);
- entry = &tipc_ref_table.entries[ref & tipc_ref_table.index_mask];
- tsk = entry->tsk;
- if (likely(tsk && (entry->ref == ref)))
- sock_hold(&tsk->sk);
- else
- tsk = NULL;
- read_unlock_bh(&ref_table_lock);
- return tsk;
+ return rhashtable_init(&tipc_sk_rht, &rht_params);
}
-/* tipc_sk_get_next - lock & return next socket after referenced one
-*/
-struct tipc_sock *tipc_sk_get_next(u32 *ref)
+void tipc_sk_rht_destroy(void)
{
- struct reference *entry;
- struct tipc_sock *tsk = NULL;
- uint index = *ref & tipc_ref_table.index_mask;
+ /* Wait for socket readers to complete */
+ synchronize_net();
- read_lock_bh(&ref_table_lock);
- while (++index < tipc_ref_table.capacity) {
- entry = &tipc_ref_table.entries[index];
- if (!entry->tsk)
- continue;
- tsk = entry->tsk;
- sock_hold(&tsk->sk);
- *ref = entry->ref;
- break;
- }
- read_unlock_bh(&ref_table_lock);
- return tsk;
-}
-
-static void tipc_sk_put(struct tipc_sock *tsk)
-{
- sock_put(&tsk->sk);
+ rhashtable_destroy(&tipc_sk_rht);
}
/**
@@ -2829,7 +2684,7 @@ static int __tipc_nl_add_sk(struct sk_buff *skb, struct netlink_callback *cb,
attrs = nla_nest_start(skb, TIPC_NLA_SOCK);
if (!attrs)
goto genlmsg_cancel;
- if (nla_put_u32(skb, TIPC_NLA_SOCK_REF, tsk->ref))
+ if (nla_put_u32(skb, TIPC_NLA_SOCK_REF, tsk->portid))
goto attr_msg_cancel;
if (nla_put_u32(skb, TIPC_NLA_SOCK_ADDR, tipc_own_addr))
goto attr_msg_cancel;
@@ -2859,22 +2714,29 @@ int tipc_nl_sk_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
int err;
struct tipc_sock *tsk;
- u32 prev_ref = cb->args[0];
- u32 ref = prev_ref;
-
- tsk = tipc_sk_get_next(&ref);
- for (; tsk; tsk = tipc_sk_get_next(&ref)) {
- lock_sock(&tsk->sk);
- err = __tipc_nl_add_sk(skb, cb, tsk);
- release_sock(&tsk->sk);
- tipc_sk_put(tsk);
- if (err)
- break;
+ const struct bucket_table *tbl;
+ struct rhash_head *pos;
+ u32 prev_portid = cb->args[0];
+ u32 portid = prev_portid;
+ int i;
- prev_ref = ref;
+ rcu_read_lock();
+ tbl = rht_dereference_rcu((&tipc_sk_rht)->tbl, &tipc_sk_rht);
+ for (i = 0; i < tbl->size; i++) {
+ rht_for_each_entry_rcu(tsk, pos, tbl, i, node) {
+ spin_lock_bh(&tsk->sk.sk_lock.slock);
+ portid = tsk->portid;
+ err = __tipc_nl_add_sk(skb, cb, tsk);
+ spin_unlock_bh(&tsk->sk.sk_lock.slock);
+ if (err)
+ break;
+
+ prev_portid = portid;
+ }
}
+ rcu_read_unlock();
- cb->args[0] = prev_ref;
+ cb->args[0] = prev_portid;
return skb->len;
}
@@ -2962,12 +2824,12 @@ static int __tipc_nl_list_sk_publ(struct sk_buff *skb,
int tipc_nl_publ_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
int err;
- u32 tsk_ref = cb->args[0];
+ u32 tsk_portid = cb->args[0];
u32 last_publ = cb->args[1];
u32 done = cb->args[2];
struct tipc_sock *tsk;
- if (!tsk_ref) {
+ if (!tsk_portid) {
struct nlattr **attrs;
struct nlattr *sock[TIPC_NLA_SOCK_MAX + 1];
@@ -2984,13 +2846,13 @@ int tipc_nl_publ_dump(struct sk_buff *skb, struct netlink_callback *cb)
if (!sock[TIPC_NLA_SOCK_REF])
return -EINVAL;
- tsk_ref = nla_get_u32(sock[TIPC_NLA_SOCK_REF]);
+ tsk_portid = nla_get_u32(sock[TIPC_NLA_SOCK_REF]);
}
if (done)
return 0;
- tsk = tipc_sk_get(tsk_ref);
+ tsk = tipc_sk_lookup(tsk_portid);
if (!tsk)
return -EINVAL;
@@ -2999,9 +2861,9 @@ int tipc_nl_publ_dump(struct sk_buff *skb, struct netlink_callback *cb)
if (!err)
done = 1;
release_sock(&tsk->sk);
- tipc_sk_put(tsk);
+ sock_put(&tsk->sk);
- cb->args[0] = tsk_ref;
+ cb->args[0] = tsk_portid;
cb->args[1] = last_publ;
cb->args[2] = done;
diff --git a/net/tipc/socket.h b/net/tipc/socket.h
index d340893..c7d46d06 100644
--- a/net/tipc/socket.h
+++ b/net/tipc/socket.h
@@ -46,8 +46,8 @@ int tipc_sk_rcv(struct sk_buff *buf);
struct sk_buff *tipc_sk_socks_show(void);
void tipc_sk_mcast_rcv(struct sk_buff *buf);
void tipc_sk_reinit(void);
-int tipc_sk_ref_table_init(u32 requested_size, u32 start);
-void tipc_sk_ref_table_stop(void);
+int tipc_sk_rht_init(void);
+void tipc_sk_rht_destroy(void);
int tipc_nl_sk_dump(struct sk_buff *skb, struct netlink_callback *cb);
int tipc_nl_publ_dump(struct sk_buff *skb, struct netlink_callback *cb);
--
1.7.9.5
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
^ permalink raw reply related
* [PATCH net-next v2 3/7] rhashtable: involve rhashtable_lookup_insert routine
From: Ying Xue @ 2015-01-07 5:41 UTC (permalink / raw)
To: tgraf; +Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
tipc-discussion
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>
Involve a new function called rhashtable_lookup_insert() which makes
lookup and insertion atomic under bucket lock protection, helping us
avoid to introduce an extra lock when we search and insert an object
into hash table.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
---
include/linux/rhashtable.h | 1 +
lib/rhashtable.c | 97 +++++++++++++++++++++++++++++++++++++-------
2 files changed, 83 insertions(+), 15 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index de1459c7..73c913f 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -168,6 +168,7 @@ int rhashtable_shrink(struct rhashtable *ht);
void *rhashtable_lookup(struct rhashtable *ht, const void *key);
void *rhashtable_lookup_compare(struct rhashtable *ht, const void *key,
bool (*compare)(void *, void *), void *arg);
+bool rhashtable_lookup_insert(struct rhashtable *ht, struct rhash_head *obj);
void rhashtable_destroy(struct rhashtable *ht);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 2000685..4430233 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -505,8 +505,26 @@ static void rhashtable_wakeup_worker(struct rhashtable *ht)
schedule_delayed_work(&ht->run_work, 0);
}
+static void __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
+ struct bucket_table *tbl, u32 hash)
+{
+ struct rhash_head *head = rht_dereference_bucket(tbl->buckets[hash],
+ tbl, hash);
+
+ if (rht_is_a_nulls(head))
+ INIT_RHT_NULLS_HEAD(obj->next, ht, hash);
+ else
+ RCU_INIT_POINTER(obj->next, head);
+
+ rcu_assign_pointer(tbl->buckets[hash], obj);
+
+ atomic_inc(&ht->nelems);
+
+ rhashtable_wakeup_worker(ht);
+}
+
/**
- * rhashtable_insert - insert object into hash hash table
+ * rhashtable_insert - insert object into hash table
* @ht: hash table
* @obj: pointer to hash head inside object
*
@@ -523,7 +541,6 @@ static void rhashtable_wakeup_worker(struct rhashtable *ht)
void rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj)
{
struct bucket_table *tbl;
- struct rhash_head *head;
spinlock_t *lock;
unsigned hash;
@@ -534,19 +551,9 @@ void rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj)
lock = bucket_lock(tbl, hash);
spin_lock_bh(lock);
- head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
- if (rht_is_a_nulls(head))
- INIT_RHT_NULLS_HEAD(obj->next, ht, hash);
- else
- RCU_INIT_POINTER(obj->next, head);
-
- rcu_assign_pointer(tbl->buckets[hash], obj);
+ __rhashtable_insert(ht, obj, tbl, hash);
spin_unlock_bh(lock);
- atomic_inc(&ht->nelems);
-
- rhashtable_wakeup_worker(ht);
-
rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(rhashtable_insert);
@@ -560,7 +567,7 @@ EXPORT_SYMBOL_GPL(rhashtable_insert);
* walk the bucket chain upon removal. The removal operation is thus
* considerable slow if the hash table is not correctly sized.
*
- * Will automatically shrink the table via rhashtable_expand() if the the
+ * Will automatically shrink the table via rhashtable_expand() if the
* shrink_decision function specified at rhashtable_init() returns true.
*
* The caller must ensure that no concurrent table mutations occur. It is
@@ -641,7 +648,7 @@ static bool rhashtable_compare(void *ptr, void *arg)
* for a entry with an identical key. The first matching entry is returned.
*
* This lookup function may only be used for fixed key hash table (key_len
- * paramter set). It will BUG() if used inappropriately.
+ * parameter set). It will BUG() if used inappropriately.
*
* Lookups may occur in parallel with hashtable mutations and resizing.
*/
@@ -702,6 +709,66 @@ restart:
}
EXPORT_SYMBOL_GPL(rhashtable_lookup_compare);
+/**
+ * rhashtable_lookup_insert - lookup and insert object into hash table
+ * @ht: hash table
+ * @obj: pointer to hash head inside object
+ *
+ * Locks down the bucket chain in both the old and new table if a resize
+ * is in progress to ensure that writers can't remove from the old table
+ * and can't insert to the new table during the atomic operation of search
+ * and insertion. Searches for duplicates in both the old and new table if
+ * a resize is in progress.
+ *
+ * This lookup function may only be used for fixed key hash table (key_len
+ * parameter set). It will BUG() if used inappropriately.
+ *
+ * It is safe to call this function from atomic context.
+ *
+ * Will trigger an automatic deferred table resizing if the size grows
+ * beyond the watermark indicated by grow_decision() which can be passed
+ * to rhashtable_init().
+ */
+bool rhashtable_lookup_insert(struct rhashtable *ht, struct rhash_head *obj)
+{
+ struct bucket_table *new_tbl, *old_tbl;
+ spinlock_t *new_bucket_lock, *old_bucket_lock;
+ u32 new_hash, old_hash;
+ bool success = true;
+
+ BUG_ON(!ht->p.key_len);
+
+ rcu_read_lock();
+
+ old_tbl = rht_dereference_rcu(ht->tbl, ht);
+ old_hash = head_hashfn(ht, old_tbl, obj);
+ old_bucket_lock = bucket_lock(old_tbl, old_hash);
+ spin_lock_bh(old_bucket_lock);
+
+ new_tbl = rht_dereference_rcu(ht->future_tbl, ht);
+ new_hash = head_hashfn(ht, new_tbl, obj);
+ new_bucket_lock = bucket_lock(new_tbl, new_hash);
+ if (unlikely(old_tbl != new_tbl))
+ spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
+
+ if (rhashtable_lookup(ht, rht_obj(ht, obj) + ht->p.key_offset)) {
+ success = false;
+ goto exit;
+ }
+
+ __rhashtable_insert(ht, obj, new_tbl, new_hash);
+
+exit:
+ if (unlikely(old_tbl != new_tbl))
+ spin_unlock_bh(new_bucket_lock);
+ spin_unlock_bh(old_bucket_lock);
+
+ rcu_read_unlock();
+
+ return success;
+}
+EXPORT_SYMBOL_GPL(rhashtable_lookup_insert);
+
static size_t rounded_hashtable_size(struct rhashtable_params *params)
{
return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
--
1.7.9.5
^ permalink raw reply related
* Re: [net v2 2/3] i40e: Fix Rx checksum error counter
From: Tom Herbert @ 2015-01-07 5:43 UTC (permalink / raw)
To: Jeff Kirsher
Cc: David Miller, Anjali Singhai, Linux Netdev List, nhorman,
sassmann, jogreene, Greg Rose
In-Reply-To: <1420594317-6191-3-git-send-email-jeffrey.t.kirsher@intel.com>
On Tue, Jan 6, 2015 at 5:31 PM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> From: Anjali Singhai <anjali.singhai@intel.com>
>
> The Rx port checksum error counter was incrementing incorrectly with
> UDP encapsulated tunneled traffic. This patch fixes the problem so that
> the port_rx_csum counter will show accurate statistics.
>
> Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> Tested-by: Jim Young <james.m.young@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index f145aaf..38c7638 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1325,9 +1325,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
> * so the total length of IPv4 header is IHL*4 bytes
> * The UDP_0 bit *may* bet set if the *inner* header is UDP
> */
> - if (ipv4_tunnel &&
> - (decoded.inner_prot != I40E_RX_PTYPE_INNER_PROT_UDP) &&
> - !(rx_status & (1 << I40E_RX_DESC_STATUS_UDP_0_SHIFT))) {
> + if (ipv4_tunnel) {
> skb->transport_header = skb->mac_header +
> sizeof(struct ethhdr) +
> (ip_hdr(skb)->ihl * 4);
> @@ -1337,15 +1335,19 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
> skb->protocol == htons(ETH_P_8021AD))
> ? VLAN_HLEN : 0;
>
> - rx_udp_csum = udp_csum(skb);
> - iph = ip_hdr(skb);
> - csum = csum_tcpudp_magic(
> - iph->saddr, iph->daddr,
> - (skb->len - skb_transport_offset(skb)),
> - IPPROTO_UDP, rx_udp_csum);
> + if ((ip_hdr(skb)->protocol == IPPROTO_UDP) &&
> + (udp_hdr(skb)->check != 0)) {
> + rx_udp_csum = udp_csum(skb);
Doesn't this compute the whole checksum of the packet making the fact
that device verified inner checksum pretty much irrelevant? It would
probably be just as well to return CHECKSUM_NONE and let the stack
deal with it and remove all this complexity.
> + iph = ip_hdr(skb);
> + csum = csum_tcpudp_magic(
> + iph->saddr, iph->daddr,
> + (skb->len - skb_transport_offset(skb)),
> + IPPROTO_UDP, rx_udp_csum);
>
> - if (udp_hdr(skb)->check != csum)
> - goto checksum_fail;
> + if (udp_hdr(skb)->check != csum)
> + goto checksum_fail;
> +
> + } /* else its GRE and so no outer UDP header */
> }
>
> skb->ip_summed = CHECKSUM_UNNECESSARY;
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Fan Du @ 2015-01-07 5:58 UTC (permalink / raw)
To: Jesse Gross
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, Michael S. Tsirkin,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Wang,
Du, Fan, fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org
In-Reply-To: <CAEP_g=8bCR=PeSoi09jLWLtNUrxhzx45h1Wm=9D=R57AqUac2w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
于 2015年01月07日 03:11, Jesse Gross 写道:
>>> One of the reasons for only doing path MTU discovery
>>> >>for L3 is that it operates seamlessly as part of normal operation -
>>> >>there is no need to forge addresses or potentially generate ICMP when
>>> >>on an L2 network. However, this ignores the IP handling that is going
>>> >>on (note that in OVS it is possible for L3 to be implemented as a set
>>> >>of flows coming from a controller).
>>> >>
>>> >>It also should not be VXLAN specific or duplicate VXLAN encapsulation
>>> >>code. As this is happening before encapsulation, the generated ICMP
>>> >>does not need to be encapsulated either if it is created in the right
>>> >>location.
>> >
>> >Yes, I agree. GRE share the same issue from the code flow.
>> >Pushing back ICMP msg back without encapsulation without circulating down
>> >to physical device is possible. The "right location" as far as I know
>> >could only be in ovs_vport_send. In addition this probably requires wrapper
>> >route looking up operation for GRE/VXLAN, after get the under layer device
>> >MTU
>> >from the routing information, then calculate reduced MTU becomes feasible.
> As I said, it needs to be integrated into L3 processing. In OVS this
> would mean adding some primitives to the kernel and then exposing the
> functionality upwards into userspace/controller.
I'm bit of confused with "L3 processing" you mentioned here... SORRY
Apparently I'm not seeing the whole picture as you pointed out. Could you please
elaborate "L3 processing" a bit more? docs/codes/or other useful links. Appreciated.
My understanding is:
controller sets the forwarding rules into kernel datapath, any flow not matching
with the rules are threw to controller by upcall. Once the rule decision is made
by controller, then, this flow packet is pushed down to datapath to be forwarded
again according to the new rule.
So I'm not sure whether pushing the over-MTU-sized packet or pushing the forged ICMP
without encapsulation to controller is required by current ovs implementation. By doing
so, such over-MTU-sized packet is treated as a event for the controller to be take
care of.
--
No zuo no die but I have to try.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: Pravin Shelar @ 2015-01-07 6:00 UTC (permalink / raw)
To: Daniele Di Proietto; +Cc: David Miller, netdev
In-Reply-To: <CAExiqTLJG38DX_6C2Gaf6jE8-GThznML_-nkzs+57fU72QPW7w@mail.gmail.com>
On Tue, Jan 6, 2015 at 3:32 PM, Daniele Di Proietto
<daniele.di.proietto@gmail.com> wrote:
> The motivation for the change is to make datapath.c independent from
> the actual implementation of the vport. The problem came up when
> experimenting with other vport implementations and this type of change
> will help identifying layering violations.
> You are perfectly right, however, that in this form the code does not
> improve much: we should rather provide a vport_index() call, and
> implement one in each of the vports.
>
This sounds like lot more invasive change compared to the current
patch. For such change I need to see complete set of changes that you
are planning.
> Thoughts?
>
> 2015-01-06 23:28 GMT+01:00 Pravin Shelar <pshelar@nicira.com>:
>> On Tue, Jan 6, 2015 at 2:15 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>>> On Tue, Jan 6, 2015 at 2:02 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Pravin Shelar <pshelar@nicira.com>
>>>> Date: Tue, 6 Jan 2015 13:16:11 -0800
>>>>
>>>>> Function return type and function name should be on same line,
>>>>> otherwise looks good.
>>>>
>>>> I disagree, where is the code in the tree that needs this?
>>>
>>> Most of function definitions that I have seen are defined like this. I
>>> was pointing out coding style issue.
>>
>> About the actual change, I think it is a cleanup. netdev_vport_index()
>> hides the implementation from datapath.c. I hope Daniele will explain
>> need for the change.
^ permalink raw reply
* Re: [PATCH] brcm80211: brcmsmac: dma: Remove some unused functions
From: Julia Lawall @ 2015-01-07 6:29 UTC (permalink / raw)
To: Rickard Strandqvist
Cc: Arend van Spriel, Kalle Valo, Larry Finger, Brett Rudley,
Hante Meuleman, Fabian Frederick, linux-wireless@vger.kernel.org,
brcm80211-dev-list, Network Development,
Linux Kernel Mailing List
In-Reply-To: <CAKXHbyNBV6=DPF4n0onzhEqAuS-PqLS5pwZj_OitRuYMvgpjvw@mail.gmail.com>
On Wed, 7 Jan 2015, Rickard Strandqvist wrote:
> 2015-01-05 12:06 GMT+01:00 Arend van Spriel <arend@broadcom.com>:
> > On 01/05/15 11:49, Kalle Valo wrote:
> >>
> >> Rickard Strandqvist<rickard_strandqvist@spectrumdigital.se> writes:
> >>
> >>> As I hope you can see I have made some changes regarding the
> >>> subject-line. Thought it was an advantage to be able to see which file
> >>> I actually removed something from. There seems to be a big focus on
> >>> getting right on subject-line right in recent weeks.
> >>>
> >>> I wonder why there is a script that takes a file name, and respond
> >>> with an appropriate subject line?
> >
> >
> > Is there a script for this? Anyway, I would say driver name is enough.
> > Enough about the subject line ;-) I would like to give some general remarks
> > as you seem to touch a lot of kernel code. First off, I think it is good to
> > remove unused stuff. However, I would like some more explanation on your
> > methodology apart from "partially found by using a static code analysis
> > program". So a cover-letter explaining that would have been nice (maybe
> > still is). Things like Kconfig option can affect whether function are used
> > or not so how did you cover that.
> >
> > Regards,
> > Arend
> >
> >
> >> I don't think you can really automate this as some drivers do this a bit
> >> differently. You always need to manually check the commit log.
> >>
> >>> But ok, I change my script accordingly. Should I submit the patch again?
> >>
> >>
> >> Yes, please resubmit.
> >>
> >
>
> Hi Arend
>
> Yes, a script that had been excellent, I think!
> I have one as part of my git send-email script, until a week ago, it
> was enough that I removed the "drivers/" and changed all "/" to ": "
> I have now been expanded my sed pipe a lot (tell me if anyone is interested)
> But now I've seen everything from uppercase and [DIR], etc.
> So I can not understand how anyone should be able to get the right
> name without a good help.
>
> Sure i like to share how I use cppcheck, but is very hesitant to write
> this with each patch mails I send though!
>
> I run:
> cppcheck --force --quiet --enable=all .
>
> Or a specific file instead of .
>
> This will include, among other things get a lot of error message such,
> +4000 for the kernel.
> (style) The function 'xxx' is never used
>
> For these I made a script that searched through all the files after
> the function name (cppcheck missed a few). And save the rest so I go
> through them and possibly send patches.
I think that the question was about what methodology is cppcheck using to
find the given issue. But probably cppcheck is a black box that does
whatever it does, so the user doesn't know what the rationale is.
However, I think you mentioned that cppcheck found only some of the
issues. You could thus describe what was the methodology for finding the
other ones.
julia
^ permalink raw reply
* Re: ipv6: oops in datagram.c line 260
From: Steffen Klassert @ 2015-01-07 7:22 UTC (permalink / raw)
To: Hannes Frederic Sowa; +Cc: Chris Ruehl, netdev, davem
In-Reply-To: <1420560073.32369.60.camel@redhat.com>
On Tue, Jan 06, 2015 at 05:01:13PM +0100, Hannes Frederic Sowa wrote:
>
> xfrm6_output_finish unconditionally resets skb->protocol so we try to
> dispatch to the IPv6 handler, even though tcp just sends an IPv4 packet.
Maybe we better dispatch based on sk->sk_family, this should give
always the right address family of the socket.
^ permalink raw reply
* My name is Gatan Magsino
From: Sanders M, Tjeu @ 2015-01-07 8:00 UTC (permalink / raw)
My name is Gatan Magsino, I work with Mediterranean Bank in Malta. Can i trust you with a business worth 8.3 million USD? Please reply ONLY to my private email: mgatan@rogers.com<mailto:mgatan@rogers.com> for more information.
^ permalink raw reply
* Re: [PATCH v8 34/50] vhost/net: virtio 1.0 byte swap
From: Greg Kurz @ 2015-01-07 8:31 UTC (permalink / raw)
To: Alex Williamson
Cc: thuth, kvm, Michael S. Tsirkin, rusty, netdev, linux-kernel,
virtualization, dahi, pbonzini, David Miller
In-Reply-To: <1420588530.3541.116.camel@redhat.com>
On Tue, 06 Jan 2015 16:55:30 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:
> On Mon, 2014-12-01 at 18:05 +0200, Michael S. Tsirkin wrote:
> > I had to add an explicit tag to suppress compiler warning:
> > gcc isn't smart enough to notice that
> > len is always initialized since function is called with size > 0.
>
> I'm getting a panic inside a guest when this change is applied on the
> host. I identified this patch via bisect and confirmed by reverting it
> from v3.19-rc2. Guest is centos6. Thanks,
>
> Alex
>
> commit 8b38694a2dc8b18374310df50174f1e4376d6824
> Author: Michael S. Tsirkin <mst@redhat.com>
> Date: Fri Oct 24 14:19:48 2014 +0300
>
> vhost/net: virtio 1.0 byte swap
>
> I had to add an explicit tag to suppress compiler warning:
> gcc isn't smart enough to notice that
> len is always initialized since function is called with size > 0.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>
This chunk looks suspicious:
- heads[headcount - 1].len += datalen;
+ heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
s/len - datalen/len + datalen/ ?
> XML chunk:
>
> <interface type='direct'>
> <mac address='52:54:00:64:f3:34'/>
> <source dev='iscsinet0' mode='bridge'/>
> <model type='virtio'/>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
> </interface>
>
> Panic log:
>
> <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> <1>IP: [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
> <4>PGD 1aa2f4067 PUD 1aa2f5067 PMD 0
> <4>Oops: 0000 [#1] SMP
> <4>last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/virtio0/net/eth9/ifindex
> <4>CPU 0
> <4>Modules linked in: 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput microcode snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc igbvf nvidia(P)(U) i2c_core tg3 ptp pps_core virtio_balloon virtio_net virtio_console ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> <4>
> <4>Pid: 1374, comm: NetworkManager Tainted: P --------------- 2.6.32-431.23.3.el6.centos.plus.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
> <4>RIP: 0010:[<ffffffffa0079469>] [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
> <4>RSP: 0018:ffff880028203e48 EFLAGS: 00010246
> <4>RAX: ffff8801a3383d00 RBX: ffff8801a6aaf480 RCX: ffff8801aa20b6e0
> <4>RDX: 00000000000000c0 RSI: ffff8801a3383c00 RDI: ffff8801a3383cc0
> <4>RBP: ffff880028203ed8 R08: 000000000000009e R09: ffff8801aa1d800c
> <4>R10: 0000000000000218 R11: 0000000000000000 R12: ffff8801aa20b6e0
> <4>R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> <4>FS: 00007febf114d800(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> <4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>CR2: 0000000000000010 CR3: 00000001aa793000 CR4: 00000000000006f0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process NetworkManager (pid: 1374, threadinfo ffff8801a74ba000, task ffff8801a8d56040)
> <4>Stack:
> <4> ffff8801aa1d8000 000000000000009e ffff8801aa20b6e0 ffff8801aa20b718
> <4><d> ffff8801aa20b780 ffff8801aa1d800c ffff8801a6aaf4b8 ffff8801aa20b020
> <4><d> 0000000000000080 ffff8801aa20b708 0000000000000001 00001f5981a830c8
> <4>Call Trace:
> <4> <IRQ>
> <4> [<ffffffff8146ae33>] net_rx_action+0x103/0x2f0
> <4> [<ffffffff8107a5f1>] __do_softirq+0xc1/0x1e0
> <4> [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
> <4> [<ffffffff8100c30c>] call_softirq+0x1c/0x30
> <4> <EOI>
> <4> [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
> <4> [<ffffffff8107b2ea>] local_bh_enable+0x9a/0xb0
> <4> [<ffffffffa007813a>] virtnet_napi_enable+0x4a/0x60 [virtio_net]
> <4> [<ffffffffa0078ebf>] virtnet_open+0x4f/0x60 [virtio_net]
> <4> [<ffffffff81467691>] dev_open+0xa1/0x100
> <4> [<ffffffff81466751>] dev_change_flags+0xa1/0x1d0
> <4> [<ffffffff81474a59>] do_setlink+0x169/0x8b0
> <4> [<ffffffff814770b6>] ? rtnl_fill_ifinfo+0x946/0xcb0
> <4> [<ffffffff812a3d24>] ? nla_parse+0x34/0x110
> <4> [<ffffffff8147659e>] rtnl_setlink+0xee/0x130
> <4> [<ffffffff81475b67>] rtnetlink_rcv_msg+0x2d7/0x340
> <4> [<ffffffff81231e14>] ? socket_has_perm+0x74/0x90
> <4> [<ffffffff81475890>] ? rtnetlink_rcv_msg+0x0/0x340
> <4> [<ffffffff814910a9>] netlink_rcv_skb+0xa9/0xd0
> <4> [<ffffffff81475875>] rtnetlink_rcv+0x25/0x40
> <4> [<ffffffff81490cdb>] netlink_unicast+0x2db/0x320
> <4> [<ffffffff81491750>] netlink_sendmsg+0x2c0/0x3d0
> <4> [<ffffffff814520c3>] sock_sendmsg+0x123/0x150
> <4> [<ffffffff81453d73>] ? sock_recvmsg+0x133/0x160
> <4> [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
> <4> [<ffffffff81136941>] ? lru_cache_add_lru+0x21/0x40
> <4> [<ffffffff8115522d>] ? page_add_new_anon_rmap+0x9d/0xf0
> <4> [<ffffffff8114aeef>] ? handle_pte_fault+0x4af/0xb00
> <4> [<ffffffff81451f14>] ? move_addr_to_kernel+0x64/0x70
> <4> [<ffffffff814538b6>] __sys_sendmsg+0x406/0x420
> <4> [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
> <4> [<ffffffff814523d9>] ? sys_sendto+0x139/0x190
> <4> [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
> <4> [<ffffffff81453ad9>] sys_sendmsg+0x49/0x90
> <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> <4>Code: 83 e0 00 00 00 00 10 00 00 48 03 93 d0 00 00 00 66 83 42 04 01 8b 93 cc 00 00 00 48 8b b3 d0 00 00 00 80 4c 16 10 20 44 2b 68 0c <4d> 8b 76 10 75 89 e9 d1 fd ff ff 0f 1f 40 00 a8 02 74 0d 0f b6
> <1>RIP [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
> <4> RSP <ffff880028203e48>
> <4>CR2: 0000000000000010
>
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>
^ permalink raw reply
* Re: [PATCH] sh_eth: Fix access to TRSCER register
From: Geert Uytterhoeven @ 2015-01-07 8:42 UTC (permalink / raw)
To: Nobuhiro Iwamatsu
Cc: netdev@vger.kernel.org, Yoshihiro Shimoda, Linux-sh list
In-Reply-To: <1420608947-6861-1-git-send-email-nobuhiro.iwamatsu.yj@renesas.com>
Hi Iwamatsu-san,
On Wed, Jan 7, 2015 at 6:35 AM, Nobuhiro Iwamatsu
<nobuhiro.iwamatsu.yj@renesas.com> wrote:
> TRSCER register is configured differently by SoCs. TRSCER of R-Car is
> RINT8 bit only valid, other bits are reserved bits.
> This removes access to TRSCER register reserve bit.
>
> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
> ---
> drivers/net/ethernet/renesas/sh_eth.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> index c29ba80..59ee457 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -1294,7 +1294,11 @@ static int sh_eth_dev_init(struct net_device *ndev, bool start)
> /* Frame recv control (enable multiple-packets per rx irq) */
> sh_eth_write(ndev, RMCR_RNC, RMCR);
>
> - sh_eth_write(ndev, DESC_I_RINT8 | DESC_I_RINT5 | DESC_I_TINT2, TRSCER);
> + if (mdp->cd->register_type == SH_ETH_REG_FAST_RCAR)
This catches both the R-Car Gen1 and R-Car Gen2 cases.
According to the datasheets, r8a7778/9 do have the other bits?
Only R-Car Gen2 doesn't have them.
> + sh_eth_write(ndev, DESC_I_RINT8, TRSCER);
> + else
> + sh_eth_write(ndev, DESC_I_RINT8 | DESC_I_RINT5 | DESC_I_TINT2,
> + TRSCER);
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* [PATCH] vhost/net: length miscalculation
From: Michael S. Tsirkin @ 2015-01-07 8:55 UTC (permalink / raw)
To: linux-kernel; +Cc: Alex Williamson, Greg Kurz, kvm, virtualization, netdev
commit 8b38694a2dc8b18374310df50174f1e4376d6824
vhost/net: virtio 1.0 byte swap
had this chunk:
- heads[headcount - 1].len += datalen;
+ heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
This adds datalen with the wrong sign, causing guest panics.
Fixes: 8b38694a2dc8b18374310df50174f1e4376d6824
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Alex, could you please confirm this fixes the crash for you?
drivers/vhost/net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 14419a8..d415d69 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -538,7 +538,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
++headcount;
seg += in;
}
- heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
+ heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
*iovcount = seg;
if (unlikely(log))
*log_num = nlogs;
--
MST
^ permalink raw reply related
* Re: [PATCH v8 34/50] vhost/net: virtio 1.0 byte swap
From: Michael S. Tsirkin @ 2015-01-07 8:57 UTC (permalink / raw)
To: Greg Kurz
Cc: Alex Williamson, thuth, kvm, rusty, netdev, linux-kernel,
virtualization, dahi, pbonzini, David Miller
In-Reply-To: <20150107093105.6ef918f1@bahia.local>
On Wed, Jan 07, 2015 at 09:31:05AM +0100, Greg Kurz wrote:
> On Tue, 06 Jan 2015 16:55:30 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
>
> > On Mon, 2014-12-01 at 18:05 +0200, Michael S. Tsirkin wrote:
> > > I had to add an explicit tag to suppress compiler warning:
> > > gcc isn't smart enough to notice that
> > > len is always initialized since function is called with size > 0.
> >
> > I'm getting a panic inside a guest when this change is applied on the
> > host. I identified this patch via bisect and confirmed by reverting it
> > from v3.19-rc2. Guest is centos6. Thanks,
> >
> > Alex
> >
> > commit 8b38694a2dc8b18374310df50174f1e4376d6824
> > Author: Michael S. Tsirkin <mst@redhat.com>
> > Date: Fri Oct 24 14:19:48 2014 +0300
> >
> > vhost/net: virtio 1.0 byte swap
> >
> > I had to add an explicit tag to suppress compiler warning:
> > gcc isn't smart enough to notice that
> > len is always initialized since function is called with size > 0.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >
>
> This chunk looks suspicious:
>
> - heads[headcount - 1].len += datalen;
> + heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
>
> s/len - datalen/len + datalen/ ?
Indeed!
I just sent a patch fixing this, thanks a lot.
> > XML chunk:
> >
> > <interface type='direct'>
> > <mac address='52:54:00:64:f3:34'/>
> > <source dev='iscsinet0' mode='bridge'/>
> > <model type='virtio'/>
> > <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
> > </interface>
> >
> > Panic log:
> >
> > <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > <1>IP: [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
> > <4>PGD 1aa2f4067 PUD 1aa2f5067 PMD 0
> > <4>Oops: 0000 [#1] SMP
> > <4>last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/virtio0/net/eth9/ifindex
> > <4>CPU 0
> > <4>Modules linked in: 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput microcode snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc igbvf nvidia(P)(U) i2c_core tg3 ptp pps_core virtio_balloon virtio_net virtio_console ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> > <4>
> > <4>Pid: 1374, comm: NetworkManager Tainted: P --------------- 2.6.32-431.23.3.el6.centos.plus.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
> > <4>RIP: 0010:[<ffffffffa0079469>] [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
> > <4>RSP: 0018:ffff880028203e48 EFLAGS: 00010246
> > <4>RAX: ffff8801a3383d00 RBX: ffff8801a6aaf480 RCX: ffff8801aa20b6e0
> > <4>RDX: 00000000000000c0 RSI: ffff8801a3383c00 RDI: ffff8801a3383cc0
> > <4>RBP: ffff880028203ed8 R08: 000000000000009e R09: ffff8801aa1d800c
> > <4>R10: 0000000000000218 R11: 0000000000000000 R12: ffff8801aa20b6e0
> > <4>R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > <4>FS: 00007febf114d800(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> > <4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>CR2: 0000000000000010 CR3: 00000001aa793000 CR4: 00000000000006f0
> > <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > <4>Process NetworkManager (pid: 1374, threadinfo ffff8801a74ba000, task ffff8801a8d56040)
> > <4>Stack:
> > <4> ffff8801aa1d8000 000000000000009e ffff8801aa20b6e0 ffff8801aa20b718
> > <4><d> ffff8801aa20b780 ffff8801aa1d800c ffff8801a6aaf4b8 ffff8801aa20b020
> > <4><d> 0000000000000080 ffff8801aa20b708 0000000000000001 00001f5981a830c8
> > <4>Call Trace:
> > <4> <IRQ>
> > <4> [<ffffffff8146ae33>] net_rx_action+0x103/0x2f0
> > <4> [<ffffffff8107a5f1>] __do_softirq+0xc1/0x1e0
> > <4> [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
> > <4> [<ffffffff8100c30c>] call_softirq+0x1c/0x30
> > <4> <EOI>
> > <4> [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
> > <4> [<ffffffff8107b2ea>] local_bh_enable+0x9a/0xb0
> > <4> [<ffffffffa007813a>] virtnet_napi_enable+0x4a/0x60 [virtio_net]
> > <4> [<ffffffffa0078ebf>] virtnet_open+0x4f/0x60 [virtio_net]
> > <4> [<ffffffff81467691>] dev_open+0xa1/0x100
> > <4> [<ffffffff81466751>] dev_change_flags+0xa1/0x1d0
> > <4> [<ffffffff81474a59>] do_setlink+0x169/0x8b0
> > <4> [<ffffffff814770b6>] ? rtnl_fill_ifinfo+0x946/0xcb0
> > <4> [<ffffffff812a3d24>] ? nla_parse+0x34/0x110
> > <4> [<ffffffff8147659e>] rtnl_setlink+0xee/0x130
> > <4> [<ffffffff81475b67>] rtnetlink_rcv_msg+0x2d7/0x340
> > <4> [<ffffffff81231e14>] ? socket_has_perm+0x74/0x90
> > <4> [<ffffffff81475890>] ? rtnetlink_rcv_msg+0x0/0x340
> > <4> [<ffffffff814910a9>] netlink_rcv_skb+0xa9/0xd0
> > <4> [<ffffffff81475875>] rtnetlink_rcv+0x25/0x40
> > <4> [<ffffffff81490cdb>] netlink_unicast+0x2db/0x320
> > <4> [<ffffffff81491750>] netlink_sendmsg+0x2c0/0x3d0
> > <4> [<ffffffff814520c3>] sock_sendmsg+0x123/0x150
> > <4> [<ffffffff81453d73>] ? sock_recvmsg+0x133/0x160
> > <4> [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
> > <4> [<ffffffff81136941>] ? lru_cache_add_lru+0x21/0x40
> > <4> [<ffffffff8115522d>] ? page_add_new_anon_rmap+0x9d/0xf0
> > <4> [<ffffffff8114aeef>] ? handle_pte_fault+0x4af/0xb00
> > <4> [<ffffffff81451f14>] ? move_addr_to_kernel+0x64/0x70
> > <4> [<ffffffff814538b6>] __sys_sendmsg+0x406/0x420
> > <4> [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
> > <4> [<ffffffff814523d9>] ? sys_sendto+0x139/0x190
> > <4> [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
> > <4> [<ffffffff81453ad9>] sys_sendmsg+0x49/0x90
> > <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> > <4>Code: 83 e0 00 00 00 00 10 00 00 48 03 93 d0 00 00 00 66 83 42 04 01 8b 93 cc 00 00 00 48 8b b3 d0 00 00 00 80 4c 16 10 20 44 2b 68 0c <4d> 8b 76 10 75 89 e9 d1 fd ff ff 0f 1f 40 00 a8 02 74 0d 0f b6
> > <1>RIP [<ffffffffa0079469>] virtnet_poll+0x4f9/0x910 [virtio_net]
> > <4> RSP <ffff880028203e48>
> > <4>CR2: 0000000000000010
> >
> >
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> >
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox