* [PATCH net-next 1/8] tipc: factor stats struct out of the larger link struct
From: Paul Gortmaker @ 2012-07-12 16:39 UTC (permalink / raw)
To: davem; +Cc: netdev, Jon Maloy, Erik Hugne, ying.xue, Paul Gortmaker
In-Reply-To: <1342111201-9426-1-git-send-email-paul.gortmaker@windriver.com>
This is done to improve readability, and so that we can give
the struct a name that will allow us to declare a local
pointer to it in code, instead of having to always redirect
through the link struct to get to it.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
net/tipc/link.h | 62 ++++++++++++++++++++++++++++---------------------------
1 file changed, 32 insertions(+), 30 deletions(-)
diff --git a/net/tipc/link.h b/net/tipc/link.h
index d6a60a9..8024a56 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -63,6 +63,37 @@
*/
#define MAX_PKT_DEFAULT 1500
+struct tipc_stats {
+ u32 sent_info; /* used in counting # sent packets */
+ u32 recv_info; /* used in counting # recv'd packets */
+ u32 sent_states;
+ u32 recv_states;
+ u32 sent_probes;
+ u32 recv_probes;
+ u32 sent_nacks;
+ u32 recv_nacks;
+ u32 sent_acks;
+ u32 sent_bundled;
+ u32 sent_bundles;
+ u32 recv_bundled;
+ u32 recv_bundles;
+ u32 retransmitted;
+ u32 sent_fragmented;
+ u32 sent_fragments;
+ u32 recv_fragmented;
+ u32 recv_fragments;
+ u32 link_congs; /* # port sends blocked by congestion */
+ u32 bearer_congs;
+ u32 deferred_recv;
+ u32 duplicates;
+ u32 max_queue_sz; /* send queue size high water mark */
+ u32 accu_queue_sz; /* used for send queue size profiling */
+ u32 queue_sz_counts; /* used for send queue size profiling */
+ u32 msg_length_counts; /* used for message length profiling */
+ u32 msg_lengths_total; /* used for message length profiling */
+ u32 msg_length_profile[7]; /* used for msg. length profiling */
+};
+
/**
* struct tipc_link - TIPC link data structure
* @addr: network address of link's peer node
@@ -175,36 +206,7 @@ struct tipc_link {
struct sk_buff *defragm_buf;
/* Statistics */
- struct {
- u32 sent_info; /* used in counting # sent packets */
- u32 recv_info; /* used in counting # recv'd packets */
- u32 sent_states;
- u32 recv_states;
- u32 sent_probes;
- u32 recv_probes;
- u32 sent_nacks;
- u32 recv_nacks;
- u32 sent_acks;
- u32 sent_bundled;
- u32 sent_bundles;
- u32 recv_bundled;
- u32 recv_bundles;
- u32 retransmitted;
- u32 sent_fragmented;
- u32 sent_fragments;
- u32 recv_fragmented;
- u32 recv_fragments;
- u32 link_congs; /* # port sends blocked by congestion */
- u32 bearer_congs;
- u32 deferred_recv;
- u32 duplicates;
- u32 max_queue_sz; /* send queue size high water mark */
- u32 accu_queue_sz; /* used for send queue size profiling */
- u32 queue_sz_counts; /* used for send queue size profiling */
- u32 msg_length_counts; /* used for message length profiling */
- u32 msg_lengths_total; /* used for message length profiling */
- u32 msg_length_profile[7]; /* used for msg. length profiling */
- } stats;
+ struct tipc_stats stats;
};
struct tipc_port;
--
1.7.9.7
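The benefit named in the changelog, being able to declare a local pointer to the statistics, can be illustrated with a minimal userspace sketch. The struct and function names below are hypothetical, trimmed stand-ins and are not part of the patch; only two of the counters are kept.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Trimmed stand-in for struct tipc_stats; only two counters are kept. */
struct tipc_stats_sketch {
	u32 sent_info;
	u32 retransmitted;
};

/* Trimmed stand-in for struct tipc_link. */
struct tipc_link_sketch {
	struct tipc_stats_sketch stats;
};

/* Hypothetical helper: because the stats type now has a name, a local
 * pointer can be declared instead of spelling out l_ptr->stats each time.
 */
static void count_retransmit(struct tipc_link_sketch *l_ptr, u32 pkts)
{
	struct tipc_stats_sketch *s = &l_ptr->stats;

	s->sent_info += pkts;
	s->retransmitted += pkts;
}
```

With the anonymous struct, the same function would have to dereference through the link pointer on every access.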
* [PATCH net-next 0/8] tipc: kill off struct print_buf/log
From: Paul Gortmaker @ 2012-07-12 16:39 UTC (permalink / raw)
To: davem; +Cc: netdev, Jon Maloy, Erik Hugne, ying.xue, Paul Gortmaker
Dave,
The main thrust of what happens in this series is to deal with
the request you'd made a while ago about making it so TIPC did
not have its own specific (and complex) logging infrastructure.
It used to have tipc_printf taking an arg of this struct print_buf
thing, with a length field in it. There was also a function to
set/validate string lengths etc. So the approach was to first
wean off as many users of tipc_printf as possible (e.g. by deleting
old debug code, etc.) and then changing tipc_printf into something
much simpler that conveyed string lengths via normal return values
one would expect from snprintf-like functions. Finally, with the
core code no longer using the print_buf struct, it and the log
code associated with it are removed.
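As a rough illustration of "string lengths via normal return values", here is a userspace sketch of an snprintf-like helper whose return value is the number of characters actually stored. The names sketch_snprintf and format_link_stats are hypothetical and this is not the code from the series; it only assumes the standard vsnprintf().

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of characters actually stored
 * in buf (excluding the trailing NUL), so a caller can advance an offset
 * without any separate length bookkeeping.
 */
static int sketch_snprintf(char *buf, int len, const char *fmt, ...)
{
	va_list args;
	int n;

	if (len <= 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, (size_t)len, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	/* vsnprintf reports the untruncated length; clamp to what fit. */
	return n >= len ? len - 1 : n;
}

/* Hypothetical caller: builds a report by accumulating return values. */
static int format_link_stats(char *buf, int len, unsigned sent, unsigned recv)
{
	int ret = 0;

	ret += sketch_snprintf(buf + ret, len - ret, "sent: %u\n", sent);
	ret += sketch_snprintf(buf + ret, len - ret, "recv: %u\n", recv);
	return ret;
}
```

Because each call returns only what was stored, the running offset never passes the end of the buffer, even when the output is truncated.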
A side bonus is that we get rid of a couple TIPC related Kconfig
options along the way, and about 600 lines of code too.
The folks at Ericsson did most of the work here, and I just
refactored things a bit and made a few suggestions here and there.
I've also run the server/client tests on this series atop of the
net-next baseline of 48ee3569f "ipv6: Move ipv6 twsk accessors
outside of CONFIG_IPV6 ifdefs."
If there are no obvious problems spotted by anyone in the next day
or so, I would like to issue a pull request for these.
Thanks,
Paul.
---
Erik Hugne (5):
tipc: use standard printk shortcut macros (pr_err etc.)
tipc: remove TIPC packet debugging functions and macros
tipc: simplify print buffer handling in tipc_printf
tipc: phase out most of the struct print_buf usage
tipc: remove print_buf and deprecated log buffer code
Paul Gortmaker (3):
tipc: factor stats struct out of the larger link struct
tipc: limit error messages relating to memory leak to one line
tipc: simplify link_print by divorcing it from using tipc_printf
include/linux/tipc_config.h | 4 +-
net/tipc/Kconfig | 25 ----
net/tipc/bcast.c | 65 +++++-----
net/tipc/bearer.c | 62 +++++----
net/tipc/bearer.h | 2 +-
net/tipc/config.c | 41 +++---
net/tipc/core.c | 15 +--
net/tipc/core.h | 63 +--------
net/tipc/discover.c | 10 +-
net/tipc/handler.c | 4 +-
net/tipc/link.c | 297 ++++++++++++++++++------------------------
net/tipc/link.h | 63 ++++-----
net/tipc/log.c | 302 ++-----------------------------------------
net/tipc/log.h | 66 ----------
net/tipc/msg.c | 242 ----------------------------------
net/tipc/name_distr.c | 25 ++--
net/tipc/name_table.c | 132 ++++++++++---------
net/tipc/net.c | 8 +-
net/tipc/netlink.c | 2 +-
net/tipc/node.c | 23 ++--
net/tipc/node_subscr.c | 3 +-
net/tipc/port.c | 66 +++++-----
net/tipc/ref.c | 10 +-
net/tipc/socket.c | 4 +-
net/tipc/subscr.c | 14 +-
25 files changed, 430 insertions(+), 1118 deletions(-)
delete mode 100644 net/tipc/log.h
--
1.7.9.7
* Re: [PATCH 2/2] ipvs: generalize app registration in netns
From: Pablo Neira Ayuso @ 2012-07-12 16:22 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer
In-Reply-To: <1341966327-16606-3-git-send-email-horms@verge.net.au>
On Wed, Jul 11, 2012 at 09:25:27AM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
>
> Get rid of the ftp_app pointer and allow applications
> to be registered without adding fields in the netns_ipvs structure.
>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
> include/net/ip_vs.h | 5 ++--
> net/netfilter/ipvs/ip_vs_app.c | 61 +++++++++++++++++++++++++++++++-----------
> net/netfilter/ipvs/ip_vs_ftp.c | 21 ++++-----------
> 3 files changed, 52 insertions(+), 35 deletions(-)
>
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index d6146b4..6cb4699 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -808,8 +808,6 @@ struct netns_ipvs {
> struct list_head rs_table[IP_VS_RTAB_SIZE];
> /* ip_vs_app */
> struct list_head app_list;
> - /* ip_vs_ftp */
> - struct ip_vs_app *ftp_app;
> /* ip_vs_proto */
> #define IP_VS_PROTO_TAB_SIZE 32 /* must be power of 2 */
> struct ip_vs_proto_data *proto_data_table[IP_VS_PROTO_TAB_SIZE];
> @@ -1179,7 +1177,8 @@ extern void ip_vs_service_net_cleanup(struct net *net);
> * (from ip_vs_app.c)
> */
> #define IP_VS_APP_MAX_PORTS 8
> -extern int register_ip_vs_app(struct net *net, struct ip_vs_app *app);
> +extern struct ip_vs_app *register_ip_vs_app(struct net *net,
> + struct ip_vs_app *app);
> extern void unregister_ip_vs_app(struct net *net, struct ip_vs_app *app);
> extern int ip_vs_bind_app(struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
> extern void ip_vs_unbind_app(struct ip_vs_conn *cp);
> diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
> index 64f9e8f..11caaea 100644
> --- a/net/netfilter/ipvs/ip_vs_app.c
> +++ b/net/netfilter/ipvs/ip_vs_app.c
> @@ -180,22 +180,41 @@ register_ip_vs_app_inc(struct net *net, struct ip_vs_app *app, __u16 proto,
> }
>
>
> -/*
> - * ip_vs_app registration routine
> - */
> -int register_ip_vs_app(struct net *net, struct ip_vs_app *app)
> +/* Register application for netns */
> +struct ip_vs_app *register_ip_vs_app(struct net *net, struct ip_vs_app *app)
> {
> struct netns_ipvs *ipvs = net_ipvs(net);
> - /* increase the module use count */
> - ip_vs_use_count_inc();
> + struct ip_vs_app *a;
> + int err = 0;
> +
> + if (!ipvs)
> + return ERR_PTR(-ENOENT);
>
> mutex_lock(&__ip_vs_app_mutex);
>
> - list_add(&app->a_list, &ipvs->app_list);
> + list_for_each_entry(a, &ipvs->app_list, a_list) {
> + if (!strcmp(app->name, a->name)) {
> + err = -EEXIST;
> + break;
> + }
> + }
> + if (!err) {
> + a = kmemdup(app, sizeof(*app), GFP_KERNEL);
> + if (!a)
> + err = -ENOMEM;
> + }
> + if (!err) {
> + INIT_LIST_HEAD(&a->incs_list);
> + list_add(&a->a_list, &ipvs->app_list);
> + /* increase the module use count */
> + ip_vs_use_count_inc();
> + }
I think this code will look better if you use something like:
+ if (!strcmp(app->name, a->name)) {
+ err = -EEXIST;
+ goto err_unlock;
+ }
err_unlock:
mutex_unlock(...)
>
> mutex_unlock(&__ip_vs_app_mutex);
>
> - return 0;
> + if (err)
> + return ERR_PTR(err);
> + return a;
For the three lines above, you can use:
return err ? ERR_PTR(err) : a;
> }
>
>
> @@ -205,20 +224,29 @@ int register_ip_vs_app(struct net *net, struct ip_vs_app *app)
> */
> void unregister_ip_vs_app(struct net *net, struct ip_vs_app *app)
> {
> - struct ip_vs_app *inc, *nxt;
> + struct netns_ipvs *ipvs = net_ipvs(net);
> + struct ip_vs_app *a, *anxt, *inc, *nxt;
> +
> + if (!ipvs)
> + return;
>
> mutex_lock(&__ip_vs_app_mutex);
>
> - list_for_each_entry_safe(inc, nxt, &app->incs_list, a_list) {
> - ip_vs_app_inc_release(net, inc);
> - }
> + list_for_each_entry_safe(a, anxt, &ipvs->app_list, a_list) {
> + if (app && strcmp(app->name, a->name))
> + continue;
> + list_for_each_entry_safe(inc, nxt, &a->incs_list, a_list) {
> + ip_vs_app_inc_release(net, inc);
> + }
>
> - list_del(&app->a_list);
> + list_del(&a->a_list);
> + kfree(a);
>
> - mutex_unlock(&__ip_vs_app_mutex);
> + /* decrease the module use count */
> + ip_vs_use_count_dec();
> + }
>
> - /* decrease the module use count */
> - ip_vs_use_count_dec();
> + mutex_unlock(&__ip_vs_app_mutex);
> }
>
>
> @@ -586,5 +614,6 @@ int __net_init ip_vs_app_net_init(struct net *net)
>
> void __net_exit ip_vs_app_net_cleanup(struct net *net)
> {
> + unregister_ip_vs_app(net, NULL /* all */);
> proc_net_remove(net, "ip_vs_app");
> }
> diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
> index b20b29c..ad70b7e 100644
> --- a/net/netfilter/ipvs/ip_vs_ftp.c
> +++ b/net/netfilter/ipvs/ip_vs_ftp.c
> @@ -441,16 +441,10 @@ static int __net_init __ip_vs_ftp_init(struct net *net)
>
> if (!ipvs)
> return -ENOENT;
> - app = kmemdup(&ip_vs_ftp, sizeof(struct ip_vs_app), GFP_KERNEL);
> - if (!app)
> - return -ENOMEM;
> - INIT_LIST_HEAD(&app->a_list);
> - INIT_LIST_HEAD(&app->incs_list);
> - ipvs->ftp_app = app;
>
> - ret = register_ip_vs_app(net, app);
> - if (ret)
> - goto err_exit;
> + app = register_ip_vs_app(net, &ip_vs_ftp);
> + if (IS_ERR(app))
> + return PTR_ERR(app);
>
> for (i = 0; i < ports_count; i++) {
> if (!ports[i])
> @@ -464,9 +458,7 @@ static int __net_init __ip_vs_ftp_init(struct net *net)
> return 0;
>
> err_unreg:
> - unregister_ip_vs_app(net, app);
> -err_exit:
> - kfree(ipvs->ftp_app);
> + unregister_ip_vs_app(net, &ip_vs_ftp);
> return ret;
> }
> /*
> @@ -474,10 +466,7 @@ err_exit:
> */
> static void __ip_vs_ftp_exit(struct net *net)
> {
> - struct netns_ipvs *ipvs = net_ipvs(net);
> -
> - unregister_ip_vs_app(net, ipvs->ftp_app);
> - kfree(ipvs->ftp_app);
> + unregister_ip_vs_app(net, &ip_vs_ftp);
> }
>
> static struct pernet_operations ip_vs_ftp_ops = {
> --
> 1.7.10.2.484.gcd07cc5
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
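The two review suggestions in this message (a single goto exit that drops the mutex, and a ternary return) can be put together in a userspace sketch. Everything here is a stand-in: the SK_* macros only imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), a pthread mutex replaces the kernel mutex, and the list is a plain singly linked list. It is not the ipvs code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define SK_ERR_PTR(err)	((void *)(long)(err))
#define SK_PTR_ERR(ptr)	((long)(ptr))
#define SK_IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-4095)

/* Hypothetical, trimmed application descriptor. */
struct app_sketch {
	const char *name;
	struct app_sketch *next;
};

static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct app_sketch *app_list;

/* Registers a private copy of app; returns the copy or an error pointer. */
static struct app_sketch *register_app_sketch(const struct app_sketch *app)
{
	struct app_sketch *a;
	int err = 0;

	pthread_mutex_lock(&app_mutex);

	for (a = app_list; a; a = a->next) {
		if (!strcmp(app->name, a->name)) {
			err = -EEXIST;
			goto err_unlock;
		}
	}

	a = malloc(sizeof(*a));
	if (!a) {
		err = -ENOMEM;
		goto err_unlock;
	}
	memcpy(a, app, sizeof(*a));
	a->next = app_list;
	app_list = a;

err_unlock:
	/* Reached on success as well; the lock is dropped in one place. */
	pthread_mutex_unlock(&app_mutex);
	return err ? SK_ERR_PTR(err) : a;
}
```

The early exits remove the repeated "if (!err)" blocks, which is the readability gain the review is asking for.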
* Re: [PATCH net-next 01/11] sfc: Implement 128-bit writes for efx_writeo_page
From: Ben Hutchings @ 2012-07-12 16:17 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <20120712.081300.624240435828585132.davem@davemloft.net>
On Thu, 2012-07-12 at 08:13 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Thu, 12 Jul 2012 00:15:18 +0100
>
> > Add support for writing a TX descriptor to the NIC in one PCIe
> > transaction on x86_64 machines.
> >
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
>
> This absolutely does not belong in a driver.
OK. My assumption was that 128-bit MMIO on a data path would be so rare
that this would not be generically useful. (Really long MMIOs are
likely to be done with write-combining, so that the data width of
instructions doesn't matter much.)
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* RE: [PATCH net-next 01/11] sfc: Implement 128-bit writes for efx_writeo_page
From: Ben Hutchings @ 2012-07-12 16:10 UTC (permalink / raw)
To: David Laight; +Cc: David Miller, netdev, linux-net-drivers
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B6F7D@saturn3.aculab.com>
On Thu, 2012-07-12 at 09:45 +0100, David Laight wrote:
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Ben Hutchings
> > Sent: 12 July 2012 00:15
> > To: David Miller
> > Cc: netdev@vger.kernel.org; linux-net-drivers@solarflare.com
> > Subject: [PATCH net-next 01/11] sfc: Implement 128-bit writes for
> > efx_writeo_page
> >
> > Add support for writing a TX descriptor to the NIC in one PCIe
> > transaction on x86_64 machines.
> ...
> > +static inline void _efx_writeo(struct efx_nic *efx, efx_le_128 value,
> > + unsigned int reg)
> > +{
> ...
> > +}
>
> Wouldn't it be better to put this code in some generic header
> where it can be used by other drivers?
I can do that, but I'm not sure that it's all that generic.
> Some architectures/cpus have dma engines associated with the
> PCIe interface that can be used to request longer PCIe transactions.
>
> So you probably need a transfer length as well.
>
> I suspect that it is never worth using an interrupt for
> the completion of such dma - although splitting the request
> and wait would allow the caller to overlap operations.
>
> With a dma interface it is worth updating multiple ring
> entries at once, and reading multiple entries for status.
We're trying to reduce latency. You seem to be talking about batching
to increase efficiency at the cost of latency.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: [PATCH] tc: man: change man page and comment to confirm to code's behavior.
From: Stephen Hemminger @ 2012-07-12 16:06 UTC (permalink / raw)
To: Li Wei; +Cc: netdev
In-Reply-To: <4FFE2EE9.8030808@cn.fujitsu.com>
On Thu, 12 Jul 2012 09:56:57 +0800
Li Wei <lw@cn.fujitsu.com> wrote:
>
> Since the get_rate() code interprets a bare number incorrectly, its
> behavior does not match what the man page and comment describe.
>
> Change the man page and comment to match the code, keeping compatibility
> with existing usage by scripts.
> ---
> man/man8/tc.8 | 7 +++++--
> tc/tc_util.c | 2 +-
> 2 files changed, 6 insertions(+), 3 deletions(-)
Thanks for fixing. Accepted.
* Re: [PATCH iproute2] Ability to compile iproute2 as shared library
From: Stephen Hemminger @ 2012-07-12 16:03 UTC (permalink / raw)
To: hamid jafarian; +Cc: netdev
In-Reply-To: <1342093501.19963.11.camel@gol>
On Thu, 12 Jul 2012 16:14:54 +0430
hamid jafarian <hamid.jafarian@pdnsoft.com> wrote:
> Hi,
>
> This is an attempt to compile iproute2 as a shared library with
> minimal changes.
> Some functions used when iproute2 is compiled as a shared library
> have been defined in "ip.c".
> The NIC caching strategy has also changed: when "libiproute2.so" is
> in use, the system NIC list may change, so all NICs are re-cached
> on each call to the NIC manipulation functions.
> Some calls to "exit(*)" have also been changed to "return *".
> HOWTO Make: # export LIBIPROUTE2_SO=y; make
>
> The attached files include a simple wrapper for working with
> libiproute2.so ...
>
Thank you for the contribution. I can see how this could be useful
in some limited embedded applications, but at this moment it doesn't
seem to be generally worth adding to the upstream code.
The patch does contain some cleanup bits for places where the existing
code is not freeing memory. This should be fixed, independent of
whether library support is added, because it would reduce the number
of leaks reported when testing under valgrind. Could you resubmit
that part, please?
* Re: [patch net-next 0/3] team: couple of patches
From: Jiri Pirko @ 2012-07-12 15:53 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120712.081443.1373285885474644661.davem@davemloft.net>
Thu, Jul 12, 2012 at 05:14:43PM CEST, davem@davemloft.net wrote:
>From: David Miller <davem@davemloft.net>
>Date: Thu, 12 Jul 2012 08:11:24 -0700 (PDT)
>
>> From: Jiri Pirko <jpirko@redhat.com>
>> Date: Wed, 11 Jul 2012 17:34:01 +0200
>>
>>> Jiri Pirko (3):
>>> team: use function team_port_txable() for determing enabled and up
>>> port
>>> team: add broadcast mode
>>> team: make team_port_enabled() and team_port_txable() static inline
>>
>> All applied, thanks.
>
>Jiri, btw, any chance I can convince you to remove the EXPERIMENTAL
>Kconfig dependency?
>
>Code I've written and checked in myself over the past few days is
>several orders of magnitude more experimental than the team driver
>is :-)
Hehe :) Nevertheless, I would like to keep this flag for some more time.
I will remove that once I have all planned basic functionality in.
Jirka
* Re: [net-next:master 90/102] net/ipv4/route.c:1283:9: warning: unused variable 'saddr'
From: Dan Carpenter @ 2012-07-12 15:47 UTC (permalink / raw)
To: David Miller; +Cc: fengguang.wu, kernel-janitors, netdev
In-Reply-To: <20120712.074058.753681400854318989.davem@davemloft.net>
On Thu, Jul 12, 2012 at 07:40:58AM -0700, David Miller wrote:
>
> There's no need to report these to kernel-janitors if it's a
> net-next specific issue and I'm going to fix it up 5 minutes
> after you report it.
The kernel-janitors list is CC'd to prevent people from sending
duplicate messages. This has happened in the past and it's
annoying for everyone.
regards,
dan carpenter
* Re: [PATCH 1/2] ipvs: ip_vs_ftp depends on nf_conntrack_ftp helper
From: Pablo Neira Ayuso @ 2012-07-12 15:39 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer
In-Reply-To: <1341966327-16606-2-git-send-email-horms@verge.net.au>
On Wed, Jul 11, 2012 at 09:25:26AM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
>
> The FTP application indirectly depends on the
> nf_conntrack_ftp helper for proper NAT support. If the
> module is not loaded, IPVS can resize the packets for the
> command connection, eg. PASV response but the SEQ adjustment
> logic in ipv4_confirm is not called without helper.
>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
> net/netfilter/ipvs/Kconfig | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
> index f987138..8b2cffd 100644
> --- a/net/netfilter/ipvs/Kconfig
> +++ b/net/netfilter/ipvs/Kconfig
> @@ -250,7 +250,8 @@ comment 'IPVS application helper'
>
> config IP_VS_FTP
> tristate "FTP protocol helper"
> - depends on IP_VS_PROTO_TCP && NF_CONNTRACK && NF_NAT
> + depends on IP_VS_PROTO_TCP && NF_CONNTRACK && NF_NAT && \
> + NF_CONNTRACK_FTP
If you require FTP NAT support, then this depends on NF_NAT_FTP
instead of NF_CONNTRACK_FTP.
* Re: [RFC PATCH 1/2] net: Add new network device function to allow for MMIO batching
From: Alexander Duyck @ 2012-07-12 15:39 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, davem, jeffrey.t.kirsher, edumazet, bhutchings, therbert,
alexander.duyck
In-Reply-To: <1342077259.3265.8232.camel@edumazet-glaptop>
On 07/12/2012 12:14 AM, Eric Dumazet wrote:
> On Wed, 2012-07-11 at 17:26 -0700, Alexander Duyck wrote:
>> This change adds capabilities to the driver for batching the MMIO write
>> involved with transmits. Most of the logic is based off of the code for
>> the qdisc scheduling.
>>
>> What I did is break the transmit path into two parts. We already had the
>> ndo_start_xmit function which has been there all along. The part I added
>> was ndo_complete_xmit which is meant to handle notifying the hardware that
>> frames are ready for delivery.
>>
>> To control all of this I added a net sysfs value for the Tx queues called
>> dispatch_limit. When 0 it indicates that all frames will notify hardware
>> immediately. When 1 or more the netdev_complete_xmit call will queue up to
>> that number of packets, and when the value is exceeded it will notify the
>> hardware and reset the pending frame dispatch count.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
> The idea is good, but do we really need such a complex scheme?
>
> Most of the transmits are done from __qdisc_run()
>
> We could add logic in __qdisc_run()/qdisc_restart()
>
> qdisc_run_end() would then have to call ndo_complete_xmit() to make
> sure the MMIO is done.
The problem is that in both of the cases where I have seen the issue,
the qdisc is actually empty.
In the case of pktgen it does not use the qdisc layer at all. It just
directly calls ndo_start_xmit.
In the standard networking case we never fill the qdisc because the MMIO
write stalls the entire CPU so the application never gets a chance to
get ahead of the hardware. From what I can tell the only case in which
the qdisc_run solution would work is if the ndo_start_xmit was called on
a different CPU from the application that is doing the transmitting.
Thanks,
Alex
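The dispatch_limit behaviour described in the quoted changelog can be sketched as a small counter model. The struct, field, and function names below are hypothetical and the doorbell is only simulated; this is a reading of the description, not the RFC code.

```c
#include <stdbool.h>

/* Hypothetical per-queue state; field names are illustrative only. */
struct txq_sketch {
	unsigned int dispatch_limit;	/* 0 = notify hardware on every frame */
	unsigned int dispatch_pending;	/* frames queued since the last notify */
	unsigned int doorbells;		/* counts simulated MMIO tail writes */
};

/* Stand-in for the MMIO write that tells the NIC about new frames. */
static void ring_doorbell(struct txq_sketch *q)
{
	q->doorbells++;
	q->dispatch_pending = 0;
}

/* Called once per transmitted frame; returns true if hardware was notified. */
static bool complete_xmit_sketch(struct txq_sketch *q)
{
	if (q->dispatch_limit == 0) {
		ring_doorbell(q);
		return true;
	}
	/* Queue up to dispatch_limit frames; notify once it is exceeded. */
	if (++q->dispatch_pending > q->dispatch_limit) {
		ring_doorbell(q);
		return true;
	}
	return false;
}
```

A real implementation would presumably also need a way to flush frames still pending when the queue goes idle; that part is not modeled here.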
* Re: [PATCH] sch_sfb: Fix missing NULL check
From: David Miller @ 2012-07-12 15:33 UTC (permalink / raw)
To: eric.dumazet; +Cc: alan, netdev
In-Reply-To: <1342101021.3265.8261.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 12 Jul 2012 15:50:21 +0200
> On Thu, 2012-07-12 at 06:25 -0700, David Miller wrote:
>> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
>> Date: Thu, 12 Jul 2012 14:39:11 +0100
>>
>> > Signed-off-by: Alan Cox <alna@linux.intel.com>
>> ^^^^
>>
>> I'm truly astonished that you type in signoffs by hand Alan.
>
> Well, I do the same ;)
You guys are weird :-)
> Feel free to add my
>
> Acked-by: Eric Dumazet <edumazet@google.com>
Applied, with signoff typo fixed too :-)
* Re: [PATCH 2/2] net: Update alloc frag to reduce get/put page usage and recycle pages
From: Alexander Duyck @ 2012-07-12 15:33 UTC (permalink / raw)
To: Eric Dumazet
Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher, Eric Dumazet
In-Reply-To: <1342069601.3265.8218.camel@edumazet-glaptop>
On 07/11/2012 10:06 PM, Eric Dumazet wrote:
> On Wed, 2012-07-11 at 19:02 -0700, Alexander Duyck wrote:
>
>> The gain will be minimal if any with the 1500 byte allocations, however
>> there shouldn't be a performance degradation.
>>
>> I was thinking more of the ixgbe case where we are working with only 256
>> byte allocations and can recycle pages in the case of GRO or TCP. For
>> ixgbe the advantages are significant since we drop a number of the
>> get_page calls and get the advantage of the page recycling. So for
>> example with GRO enabled we should only have to allocate 1 page for
>> headers every 16 buffers, and the 6 slots we use in that page have a
>> good likelihood of being warm in the cache since we just keep looping on
>> the same page.
>>
> It's not possible to get 16 buffers per 4096 bytes page.
Actually I was talking about buffers from the device, not buffers from
the page. However, it is possible to get 16 head_frag buffers from the
same 4K page if we consider recycling. In the case of GRO we will end
up with the first buffer keeping the head_frag, and all of the remaining
head_frags will be freed before we call netdev_alloc_frag again. So
what will end up happening is that each GRO assembled frame from ixgbe
would start with a recycled page used for the previously freed
head_frags, the page will be dropped from netdev_alloc_frag after we run
out of space, a new page will be allocated for use as head_frags, and
finally those head_frags will be freed and recycled until we hit the end
of the GRO frame and start over. So if you count them all then we end
up using the page up to 16 times, maybe even more depending on how the
page offset reset aligns with the start of the GRO frame.
> sizeof(struct skb_shared_info)=0x140 320
>
> Add 192 bytes (NET_SKB_PAD + 128)
>
> That's a minimum of 512 bytes (but ixgbe uses more) per skb.
>
> In practice, for ixgbe it's:
>
> #define IXGBE_RXBUFFER_512 512 /* Used for packet split */
> #define IXGBE_RX_HDR_SIZE IXGBE_RXBUFFER_512
>
> skb = netdev_alloc_skb_ip_align(rx_ring->netdev, IXGBE_RX_HDR_SIZE)
>
> So 4 buffers per PAGE
>
> Maybe you plan to use IXGBE_RXBUFFER_256 or IXGBE_RXBUFFER_128 ?
I have a patch that is in testing in Jeff Kirsher's tree that uses
IXGBE_RXBUFFER_256. With your recent changes it didn't make sense to
use 512 when we would only copy 256 bytes into the head. With the size
set to 256 we will get 6 buffers per page without any recycling.
Thanks,
Alex
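The buffers-per-page figures in this exchange can be reproduced with a little arithmetic. The constants below are assumptions taken from the numbers quoted in the thread (4K pages, 320-byte skb_shared_info, NET_SKB_PAD of 64, NET_IP_ALIGN of 0, 64-byte cache lines); they vary by architecture and kernel version, and the function names are hypothetical.

```c
#include <stddef.h>

/* Values assumed from the discussion; not universal. */
#define SK_PAGE_SIZE	4096u
#define SK_CACHE_BYTES	64u	/* assumed cache line size */
#define SK_NET_SKB_PAD	64u	/* assumed NET_SKB_PAD, NET_IP_ALIGN taken as 0 */
#define SK_SHINFO_SIZE	320u	/* sizeof(struct skb_shared_info) as quoted */

/* Round n up to a multiple of the assumed cache line size. */
static size_t sk_align(size_t n)
{
	return (n + SK_CACHE_BYTES - 1) & ~(size_t)(SK_CACHE_BYTES - 1);
}

/* Bytes one head fragment consumes for a given receive header size. */
static size_t frag_size(size_t hdr_len)
{
	return sk_align(hdr_len + SK_NET_SKB_PAD) + sk_align(SK_SHINFO_SIZE);
}

/* Head fragments that fit in one page without any recycling. */
static size_t frags_per_page(size_t hdr_len)
{
	return SK_PAGE_SIZE / frag_size(hdr_len);
}
```

Under these assumptions a 512-byte header buffer costs 896 bytes, giving 4 per page, and a 256-byte header buffer costs 640 bytes, giving 6 per page, which matches the two figures quoted in the thread.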
* Re: [PATCH 02/16] ipv4: Deliver ICMP redirects to sockets too.
From: Hiroaki SHIMODA @ 2012-07-12 15:21 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120712.080653.1463195798230664640.davem@davemloft.net>
On Thu, 12 Jul 2012 08:06:53 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> Date: Thu, 12 Jul 2012 23:58:37 +0900
>
> > So, I think the checks on skb->len deleted above need to move to
> > ping_err() in case packets are malformed.
>
> You would be wrong, the check belongs in icmp_socket_deliver().
Ah, OK. Thanks ;)
* Re: pull-request: can-next 2012-07-11
From: David Miller @ 2012-07-12 15:19 UTC (permalink / raw)
To: mkl; +Cc: netdev, linux-can
In-Reply-To: <4FFD8825.2060109@pengutronix.de>
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 11 Jul 2012 16:05:25 +0200
> the fourth pull request for upcoming v3.6 net-next consists of a series
> of can_gw netlink cleanups done by Thomas Graf.
Pulled, thanks.
* Re: [PATCH net-next 0/7] be2net updates
From: David Miller @ 2012-07-12 15:16 UTC (permalink / raw)
To: padmanabh.ratnakar; +Cc: netdev
In-Reply-To: <a8f2c513-d398-4be6-a059-242dfc2c052a@exht1.ad.emulex.com>
From: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Date: Thu, 12 Jul 2012 19:25:01 +0530
> Padmanabh Ratnakar (7):
> be2net: Fix error while toggling autoneg of pause parameters
> be2net : Fix die temperature stat for Lancer
> be2net: Fix initialization sequence for Lancer
> be2net: Activate new FW after FW download for Lancer
> be2net: Fix cleanup path when EQ creation fails
> be2net: Fix port name in message during driver load
> be2net: Enable RSS UDP hashing for Lancer and Skyhawk
All applied, but like others have said you should document in
the driver exactly what the chip uses in its RSS calculations
and in what circumstances.
* Re: [patch net-next 0/3] team: couple of patches
From: David Miller @ 2012-07-12 15:14 UTC (permalink / raw)
To: jpirko; +Cc: netdev
In-Reply-To: <20120712.081124.1207448876900334978.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Thu, 12 Jul 2012 08:11:24 -0700 (PDT)
> From: Jiri Pirko <jpirko@redhat.com>
> Date: Wed, 11 Jul 2012 17:34:01 +0200
>
>> Jiri Pirko (3):
>> team: use function team_port_txable() for determing enabled and up
>> port
>> team: add broadcast mode
>> team: make team_port_enabled() and team_port_txable() static inline
>
> All applied, thanks.
Jiri, btw, any chance I can convince you to remove the EXPERIMENTAL
Kconfig dependency?
Code I've written and checked in myself over the past few days is
several orders of magnitude more experimental than the team driver
is :-)
* Re: [PATCH net-next 01/11] sfc: Implement 128-bit writes for efx_writeo_page
From: David Miller @ 2012-07-12 15:13 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048518.2613.60.camel@bwh-desktop.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 12 Jul 2012 00:15:18 +0100
> Add support for writing a TX descriptor to the NIC in one PCIe
> transaction on x86_64 machines.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This absolutely does not belong in a driver.
* Re: [patch net-next 0/3] team: couple of patches
From: David Miller @ 2012-07-12 15:11 UTC (permalink / raw)
To: jpirko; +Cc: netdev
In-Reply-To: <1342020844-3547-1-git-send-email-jpirko@redhat.com>
From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 11 Jul 2012 17:34:01 +0200
> Jiri Pirko (3):
> team: use function team_port_txable() for determing enabled and up
> port
> team: add broadcast mode
> team: make team_port_enabled() and team_port_txable() static inline
All applied, thanks.
* Re: [PATCH 02/16] ipv4: Deliver ICMP redirects to sockets too.
From: David Miller @ 2012-07-12 15:06 UTC (permalink / raw)
To: shimoda.hiroaki; +Cc: netdev
In-Reply-To: <20120712235837.4d611326830a16f9a035dd75@gmail.com>
From: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Date: Thu, 12 Jul 2012 23:58:37 +0900
> So, I think the checks on skb->len deleted above need to move to
> ping_err() in case packets are malformed.
You would be wrong, the check belongs in icmp_socket_deliver().
====================
>From f0a70e902f483295a8b6d74ef4393bc577b703d7 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Thu, 12 Jul 2012 08:06:04 -0700
Subject: [PATCH] ipv4: Put proper checks into icmp_socket_deliver().
All handler->err() routines expect that we've done a pskb_may_pull()
test to make sure that IP header length + 8 bytes can be safely
pulled.
Reported-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/icmp.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index d01aeb4..ea3a996 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -640,6 +640,12 @@ static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
const struct net_protocol *ipprot;
int protocol = iph->protocol;
+ /* Checkin full IP header plus 8 bytes of protocol to
+ * avoid additional coding at protocol handlers.
+ */
+ if (!pskb_may_pull(skb, iph->ihl * 4 + 8))
+ return;
+
raw_icmp_error(skb, protocol, info);
rcu_read_lock();
@@ -733,12 +739,6 @@ static void icmp_unreach(struct sk_buff *skb)
goto out;
}
- /* Checkin full IP header plus 8 bytes of protocol to
- * avoid additional coding at protocol handlers.
- */
- if (!pskb_may_pull(skb, iph->ihl * 4 + 8))
- goto out;
-
icmp_socket_deliver(skb, info);
out:
--
1.7.10.4
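The length condition that the patch moves into icmp_socket_deliver() can be modeled in userspace as follows. This sketch only covers the arithmetic (embedded IP header length plus 8 bytes); pskb_may_pull() additionally makes the data linear, which is not modeled. The function name is hypothetical, and the ihl < 5 guard is an addition for the sketch rather than part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if avail bytes starting at inner cover the embedded IPv4
 * header plus the first 8 bytes of the protocol header behind it.
 */
static bool inner_hdr_ok(const uint8_t *inner, size_t avail)
{
	size_t ihl_bytes;

	if (avail < 1)
		return false;
	/* Low nibble of the first byte is the header length in 32-bit words. */
	ihl_bytes = (size_t)(inner[0] & 0x0f) * 4;
	if (ihl_bytes < 20)
		return false;	/* extra guard, not in the patch */
	return avail >= ihl_bytes + 8;
}
```

Doing the test once before the handlers run is what lets each handler->err() routine assume those bytes are present.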
* Re: [net-next 0/5][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-07-12 15:01 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1341997769-22034-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 11 Jul 2012 02:09:24 -0700
> This series contains updates to ixgbe.
>
> The following are changes since commit 4715213d9cf40285492fff4092bb1fa8e982f632:
> bridge: fix endian
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
>
> Alexander Duyck (5):
> ixgbe: count q_vectors instead of MSI-X vectors
> ixgbe: Add upper limit to ring features
> ixgbe: Add feature offset value to ring features
> ixgbe: Clean up a useless switch statement and dead code in
> configure_srrctl
> ixgbe: Merge RSS and flow director ring register caching and
> configuration
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [PATCH 1/1] net: sched: add ipset ematch
From: David Miller @ 2012-07-12 15:00 UTC (permalink / raw)
To: fw; +Cc: netdev, kadlec
In-Reply-To: <1342040217-5637-1-git-send-email-fw@strlen.de>
From: Florian Westphal <fw@strlen.de>
Date: Wed, 11 Jul 2012 22:56:57 +0200
> Can be used to match packets against netfilter ip sets created via ipset(8).
> skb->skb_iif is used as 'incoming interface', skb->dev is 'outgoing interface'.
>
> Since ipset is usually called from netfilter, the ematch
> initializes a fake xt_action_param, pulls the ip header into the
> linear area and also sets skb->data to the IP header (otherwise
> matching Layer 4 set types doesn't work).
>
> Tested-by: Mr Dash Four <mr.dash.four@googlemail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH 02/16] ipv4: Deliver ICMP redirects to sockets too.
From: Hiroaki SHIMODA @ 2012-07-12 14:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120712.011049.831106026936792516.davem@davemloft.net>
On Thu, 12 Jul 2012 01:10:49 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
>
> And thus, we can remove the ping_err() hack.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
> net/ipv4/icmp.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 18e39d1..5885146 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -782,13 +782,7 @@ static void icmp_redirect(struct sk_buff *skb)
> break;
> }
>
> - /* Ping wants to see redirects.
> - * Let's pretend they are errors of sorts... */
> - if (iph->protocol == IPPROTO_ICMP &&
> - iph->ihl >= 5 &&
> - pskb_may_pull(skb, (iph->ihl<<2)+8)) {
> - ping_err(skb, icmp_hdr(skb)->un.gateway);
> - }
> + icmp_socket_deliver(skb, icmp_hdr(skb)->un.gateway);
icmp_redirect() only checks that skb->len is at least
sizeof(struct iphdr) before ping_err() is called.
In ping_err(), icmph is derived by the code below without any
sanity check of skb->len. So I think the skb->len checks deleted
above need to move into ping_err(), in case packets are malformed.
	struct icmphdr *icmph = (struct icmphdr *)(skb->data+(iph->ihl<<2));
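
For illustration only, here is a minimal userspace sketch of the kind of length check being discussed: validate the IP header length plus 8 bytes before computing a pointer to the inner ICMP header. The function name inner_icmp_hdr() and the raw byte buffer are assumptions made for this sketch; the kernel code works on a struct sk_buff and uses pskb_may_pull() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_IPV4_HDR_LEN 20u	/* IPv4 header with no options */
#define ICMP_HDR_LEN      8u	/* bytes of protocol header we want to read */

/*
 * Hypothetical helper, not a kernel function: returns a pointer to the
 * ICMP header that follows the IPv4 header in data[0..len), or NULL if
 * the buffer is too short or the IHL field is invalid.
 */
const uint8_t *inner_icmp_hdr(const uint8_t *data, size_t len)
{
	size_t ihl_bytes;

	assert(data != NULL);

	/* We must be able to read the first byte and a minimal header. */
	if (len < MIN_IPV4_HDR_LEN)
		return NULL;

	/* IHL is the low nibble of byte 0, counted in 32-bit words. */
	ihl_bytes = (size_t)(data[0] & 0x0f) * 4;
	if (ihl_bytes < MIN_IPV4_HDR_LEN)
		return NULL;

	/* Only now is data + ihl_bytes safe to dereference for 8 bytes. */
	if (len < ihl_bytes + ICMP_HDR_LEN)
		return NULL;

	return data + ihl_bytes;
}
```

The point of the sketch is only the ordering: the pointer is derived after both the IHL sanity check and the length check have passed, never before.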
^ permalink raw reply
* Re: [PATCH net-next] netxen: fix link notification order
From: David Miller @ 2012-07-12 14:57 UTC (permalink / raw)
To: fbl; +Cc: netdev, sony.chacko, rajesh.borundia
In-Reply-To: <1342033015-31442-1-git-send-email-fbl@redhat.com>
From: Flavio Leitner <fbl@redhat.com>
Date: Wed, 11 Jul 2012 15:56:55 -0300
> First update the adapter variables with the current speed and
> mode before firing the notification. Otherwise, get_settings()
> may return old values.
>
> Signed-off-by: Flavio Leitner <fbl@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next v2 0/7] ieee802.15.4 general fixes
From: David Miller @ 2012-07-12 14:56 UTC (permalink / raw)
To: alex.bluesman.smirnov; +Cc: eric.dumazet, netdev
In-Reply-To: <1341991368-11800-1-git-send-email-alex.bluesman.smirnov@gmail.com>
From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Date: Wed, 11 Jul 2012 11:22:41 +0400
> Dear David, Eric,
>
> this patch-set is mostly intended to fix sparse and LOCKDEP warnings.
> It mostly contains some of my previous patches, reworked and extended according
> to the hints from Eric Dumazet and Fengguang Wu. Many thanks to them!
>
> Changes since v1:
> 1. A new patch from Tony Cheneau was added. Fragmentation stops working
> after some number of packets have been sent. This patch fixes the issue.
> 2. 6lowpan fragment deleting routine: I removed the spinlocks from the timer_expired
> handler and now use spin_lock_bh to avoid races with the timer interrupt.
> 3. The at86rf230 irq handler was slightly modified.
Series applied, but you don't need to grab a spinlock just to
load one integer from some data structure. I mean:
	lock();
	ret = p->foo;
	unlock();
is completely pointless.
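
As a rough userspace sketch of the alternative (not code from the series): a single integer can be read with one tear-free load and no lock, since a lock would not stop the value from changing immediately after unlock() anyway. The struct name dev_state, the helper read_foo() and the use of C11 atomics are assumptions for this example; kernel code would typically use a plain load or ACCESS_ONCE() here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical structure holding one integer that other threads update. */
struct dev_state {
	atomic_int foo;
};

/*
 * Read the integer with a single relaxed atomic load. No lock is taken:
 * the load itself cannot tear, and a lock would add nothing because the
 * caller only ever sees a snapshot of the value.
 */
int read_foo(struct dev_state *p)
{
	assert(p != NULL);
	return atomic_load_explicit(&p->foo, memory_order_relaxed);
}
```

A lock is still needed when several fields must be read consistently with each other; the sketch only covers the single-integer case described above.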
^ permalink raw reply