* Re: [-next April 8] eHEA driver failure on powerpc
From: Sachin Sant @ 2010-04-09 17:48 UTC (permalink / raw)
To: Linux/PPC Development
Cc: linux-next, netdev, HERING2, Anton Blanchard,
Benjamin Herrenschmidt
In-Reply-To: <4BBDC447.3040901@in.ibm.com>
Sachin Sant wrote:
> With today's next release, eHEA network interface on couple
> of power6 boxes fails to initialize.
>
> # modprobe ehea
> IBM eHEA ethernet device driver (Release EHEA_0102)
> alloc irq_desc for 256 on node 0
> alloc kstat_irqs on node 0
> irq: irq 590080 on host null mapped to virtual irq 256
> ehea: Error in ehea_plpar_hcall_norets: opcode=26c
> ret=fffffffffffffffc arg1=8000000003000000 arg2=0
> arg3=7000000000050400 arg4=fc9b0000 arg5=200 arg6=0 arg7=0
> ehea: Error in ehea_reg_mr_section: register_rpage_mr failed
> ehea: Error in ehea_reg_kernel_mr: registering mr failed
> ehea: Error in ehea_setup_ports: creating MR failed
> ehea 23c00200.lhea: setup_ports failed
> ehea: probe of 23c00200.lhea failed with error -5
I tracked this problem to the following commit.
commit 7545ba6f82924d4523f8f8a2baf2e517a750265d
powerpc/mm: Bump SECTION_SIZE_BITS from 16MB to 256MB
If i revert this commit, the network interface is initialized
properly. Verified this on two different power6 boxes.
Thanks
-Sachin
--
---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
^ permalink raw reply
* Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.
From: Rick Jones @ 2010-04-09 17:13 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: Michael S. Tsirkin, Tom Lendacky, netdev, kvm@vger.kernel.org
In-Reply-To: <1270827580.25555.13.camel@w-sridhar.beaverton.ibm.com>
Sridhar Samudrala wrote:
> On Thu, 2010-04-08 at 17:14 -0700, Rick Jones wrote:
>
>>>Here are the results with netperf TCP_STREAM 64K guest to host on a
>>>8-cpu Nehalem system.
>>
>>I presume you mean 8 core Nehalem-EP, or did you mean 8 processor Nehalem-EX?
>
>
> Yes. It is a 2 socket quad-core Nehalem. so i guess it is a 8 core
> Nehalem-EP.
>
>>Don't get me wrong, I *like* the netperf 64K TCP_STREAM test, I lik it a lot!-)
>>but I find it incomplete and also like to run things like single-instance TCP_RR
>>and multiple-instance, multiple "transaction" (./configure --enable-burst)
>>TCP_RR tests, particularly when concerned with "scaling" issues.
>
>
> Can we run multiple instance and multiple transaction tests with a
> single netperf commandline?
Do you count a shell for loop as a single command line?
> Is there any easy way to get consolidated throughput when a netserver on
> the host is servicing netperf clients from multiple guests?
I tend to use a script such as:
ftp://ftp.netperf.org/netperf/misc/runemomniagg2.sh
which presumes that netperf/netserver have been built with:
./configure --enable-omni --enable-burst ...
and uses the CSV output format of the omni tests. When I want sums I then turn
to a spreadsheet, or I suppose I could turn to awk etc.
The TCP_RR test can be flipped around request size for response size etc, so
when I have a single sustem under test, I initiate the netperf commands on it,
targetting netservers on the clients. If I want inbound bulk throughput I use
the TCP_MAERTS test rather than the TCP_STREAM test.
happy benchmarking,
rick jones
^ permalink raw reply
* Re: pull request: wireless-2.6 2010-04-09
From: David Miller @ 2010-04-09 17:03 UTC (permalink / raw)
To: linville
Cc: jeff.chua.linux, shanyu.zhao, reinette.chatre, stable,
linux-kernel, torvalds, viro, wey-yi.w.guy, linux-wireless,
netdev
In-Reply-To: <20100409153807.GA3014@tuxdriver.com>
From: "John W. Linville" <linville@tuxdriver.com>
Date: Fri, 9 Apr 2010 11:38:07 -0400
> This fix is intended for 2.6.34. It resolves an issue involving an
> Oops on boxes w/ iwl4965 hardware, apparently introduced by another
> recent patch. The thread describing the issue and the resolution
> is here:
>
> http://marc.info/?l=linux-kernel&m=127074721531649&w=2
>
> In order to avoid reverting that patch, please accept this fix instead.
> As usual, please let me know if there are problems!
Pulled, thanks John.
^ permalink raw reply
* Re: [PATCH 1/1] add ethtool loopback support
From: Jeff Garzik @ 2010-04-09 17:01 UTC (permalink / raw)
To: Laurent Chavey; +Cc: Ben Hutchings, davem, netdev, therbert
In-Reply-To: <x2o97949e3e1004090943o4b6b29e5pd261e2cb4c7f421d@mail.gmail.com>
On 04/09/2010 12:43 PM, Laurent Chavey wrote:
> isn't the existing ETHTOOL_TEST ioctl use for something like self test ?
>
> the intent of this patch is to enable a mode whereby one could run
> netperf / iperf and other application and have the packets sent and
> received by the driver.
I said "ethtool flags interface", which is ETHTOOL_[GS]FLAGS.
ethtool private flags interface would also work, ETHTOOL_[GS]PFLAGS.
Both are interfaces enabling user setting/clearing of 32 on/off switches
(bits).
Jeff
^ permalink raw reply
* Re: [PATCH 1/1] add ethtool loopback support
From: Ben Hutchings @ 2010-04-09 16:55 UTC (permalink / raw)
To: Laurent Chavey; +Cc: Jeff Garzik, davem, netdev, therbert
In-Reply-To: <x2o97949e3e1004090943o4b6b29e5pd261e2cb4c7f421d@mail.gmail.com>
Don't top-post.
On Fri, 2010-04-09 at 09:43 -0700, Laurent Chavey wrote:
> isn't the existing ETHTOOL_TEST ioctl use for something like self test ?
>
> the intent of this patch is to enable a mode whereby one could run
> netperf / iperf and other application and have the packets sent and
> received by the driver.
[...]
If you send to a local address, the traffic will be routed over the
internal loopback interface. Applications will not use a network
interface in loopback unless they override routing or send raw packets.
netperf and iperf don't allow you todo that, so far as I'm ware.
Also those applications are *performance* tests; they are not very
useful for fault-finding.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH 1/1] add ethtool loopback support
From: Laurent Chavey @ 2010-04-09 16:43 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Ben Hutchings, davem, netdev, therbert
In-Reply-To: <4BBE78E2.2000709@garzik.org>
isn't the existing ETHTOOL_TEST ioctl use for something like self test ?
the intent of this patch is to enable a mode whereby one could run
netperf / iperf and other application and have the packets sent and
received by the driver.
On Thu, Apr 8, 2010 at 5:46 PM, Jeff Garzik <jeff@garzik.org> wrote:
> On 04/08/2010 06:43 PM, Laurent Chavey wrote:
>>
>> On Thu, Apr 8, 2010 at 12:35 PM, Ben Hutchings
>> <bhutchings@solarflare.com> wrote:
>>>
>>> On Thu, 2010-04-08 at 12:17 -0700, Laurent Chavey wrote:
>>>>
>>>> On Thu, Apr 8, 2010 at 11:29 AM, Ben Hutchings
>>>> <bhutchings@solarflare.com> wrote:
>>>>>
>>>>> On Thu, 2010-04-08 at 10:35 -0700, chavey@google.com wrote:
>>>
>>> [...]
>>>>>>
>>>>>> +enum ethtool_loopback_type {
>>>>>> + ETH_MAC = 0x00000001,
>>>>>> + ETH_PHY_INT = 0x00000002,
>>>>>> + ETH_PHY_EXT = 0x00000004
>>>>>> +};
>>>>>
>>>>> [...]
>>>>>
>>>>> There are many different places you can loop back within a MAC or PHY,
>>>>> not to mention bypassing the MAC altogether. See
>>>>> drivers/net/sfc/mcdi_pcol.h, starting from the line
>>>>> '#define MC_CMD_LOOPBACK_NONE 0'. I believe we implement all of those
>>>>> loopback modes on at least one board.
>>>>>
>>>>> Also are these supposed to be an enumeration or flags? In theory you
>>>>
>>>> those are enums that can be or together.
>>>
>>> I.e. they are flags. So how do you answer this:
>>>
>>>>> could use wire-side and host-side loopback at the same time if they
>>>>> don't overlap, but it's probably too much trouble to bother with. Any
>>>>> other combination is meaningless.
>>
>> since the intent is to enable the sending and receiving of packets at
>> the hw/driver interfaces, a simple loopback mode on/off is sufficient
>> and the ethtool_loopback_type are not necessary. the implementor can
>> choose
>> how to implement the loopback. From drivers/net/sfc/mcdi_pcol.h it is
>> clear
>> that unless ethtool_loopback_type as defined are meaningless.
>
> If an off/on switch is sufficient, the existing ethtool flags interface
> should work just fine.
>
> Jeff
>
>
>
>
^ permalink raw reply
* Re: HTB - What's the minimal value for 'rate' parameter?
From: Antonio Almeida @ 2010-04-09 15:40 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev, kaber, davem, devik
In-Reply-To: <4BBE4BB4.1060209@gmail.com>
So, what about the rate limit miss?
As you can see the ceil of class 1:2 is set to 4096Kbit but its
sending rate is actually 8071Kbit!
It looks like classes 1:10 and 1:11 are ignoring hierarchical rate
restrictions of class 1:2
Here:
class htb 1:2 parent 1:1 rate 4096Kbit ceil 4096Kbit burst 3655b cburst 3655b
Sent 84285894 bytes 55671 pkt (dropped 0, overlimits 0 requeues 0)
rate 8071Kbit 666pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: -937499999 ctokens: -937499999
On Thu, Apr 8, 2010 at 10:33 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> Antonio Almeida wrote, On 04/08/2010 01:07 PM:
>
>> Hi!
>
> Hi!
>
>> I've been using HTB for a while, and we've already sent some e-mails
>> each other when resolving HTB accuracy issue.
>> When using HTB, I realised that for some configurations the rate limit
>> doesn't work.
>> I suspect that the problem is the minimum value of rate parameter,
>> which I cant figure out what is.
>>
>> I simple configuration that turns out to be wrong is as fallows: The
>> root (1:1) gets the link bandwidth configuration; the second (1:2) is
>> set to 4096Kbit; then I have two branches (1:10 and 1:11) with rate
>> 1024Kbit and ceil 4096Kbit; and finally a leaf class in each branch
>> (1:111 below 1:11, and 1:101 below 1:10) with rate 8bit and ceil
>> 4096Kbit, and the same priority.
>> I don't want to have sustained rate, and since I must configure 'rate'
>> parameter I decide to set it to 8bits - which is the minimal accepted
>> value. My cue goes for 'rate' parameter. If I set 'rate' parameter to
>> 1Kbit for instance, the problem disappears and the shaping is done
>> perfectly.
>>
>> So, I'm looking for help to find out if the problem is actually in
>> this parameter configuration or if it's just coincidence and I'll get
>> the same problem ahead :(
>> What's the minimal value for 'rate' parameter using HTB qdisc?
>
>
> I think "reasonable" or "minimally useful" (for common use) should be
> enough, and 8bits meaning one 1500 byte packet per 25 minutes or
> something, doesn't look like this to me.
>
> This changelog:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a4a710c4a7490587406462bf1d54504b7783d7d7
> mentions ~2 minutes as max time for accounting, so 1 max packet
> per 2 minutes should give such a minimal rate, I guess, but I'd
> still multiply it a few times to call it useful.
>
> Regards,
> Jarek P.
>
>
>
^ permalink raw reply
* Re: [PATCH] vhost: Make it more scalable by creating a vhost thread per device.
From: Sridhar Samudrala @ 2010-04-09 15:39 UTC (permalink / raw)
To: Rick Jones; +Cc: Michael S. Tsirkin, Tom Lendacky, netdev, kvm@vger.kernel.org
In-Reply-To: <4BBE716B.7050904@hp.com>
On Thu, 2010-04-08 at 17:14 -0700, Rick Jones wrote:
> > Here are the results with netperf TCP_STREAM 64K guest to host on a
> > 8-cpu Nehalem system.
>
> I presume you mean 8 core Nehalem-EP, or did you mean 8 processor Nehalem-EX?
Yes. It is a 2 socket quad-core Nehalem. so i guess it is a 8 core
Nehalem-EP.
>
> Don't get me wrong, I *like* the netperf 64K TCP_STREAM test, I lik it a lot!-)
> but I find it incomplete and also like to run things like single-instance TCP_RR
> and multiple-instance, multiple "transaction" (./configure --enable-burst)
> TCP_RR tests, particularly when concerned with "scaling" issues.
Can we run multiple instance and multiple transaction tests with a
single netperf commandline?
Is there any easy way to get consolidated throughput when a netserver on
the host is servicing netperf clients from multiple guests?
Thanks
Sridhar
>
> happy benchmarking,
>
> rick jones
>
> > It shows cumulative bandwidth in Mbps and host
> > CPU utilization.
> >
> > Current default single vhost thread
> > -----------------------------------
> > 1 guest: 12500 37%
> > 2 guests: 12800 46%
> > 3 guests: 12600 47%
> > 4 guests: 12200 47%
> > 5 guests: 12000 47%
> > 6 guests: 11700 47%
> > 7 guests: 11340 47%
> > 8 guests: 11200 48%
> >
> > vhost thread per cpu
> > --------------------
> > 1 guest: 4900 25%
> > 2 guests: 10800 49%
> > 3 guests: 17100 67%
> > 4 guests: 20400 84%
> > 5 guests: 21000 90%
> > 6 guests: 22500 92%
> > 7 guests: 23500 96%
> > 8 guests: 24500 99%
> >
> > vhost thread per guest interface
> > --------------------------------
> > 1 guest: 12500 37%
> > 2 guests: 21000 72%
> > 3 guests: 21600 79%
> > 4 guests: 21600 85%
> > 5 guests: 22500 89%
> > 6 guests: 22800 94%
> > 7 guests: 24500 98%
> > 8 guests: 26400 99%
> >
> > Thanks
> > Sridhar
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* pull request: wireless-2.6 2010-04-09
From: John W. Linville @ 2010-04-09 15:38 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: Jeff Chua, Zhao, Shanyu, Chatre, Reinette,
stable-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, Linux Kernel,
Linus Torvalds, Al Viro, Guy, Wey-Yi,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1270762107.20845.3.camel@wwguy-ubuntu>
Dave,
This fix is intended for 2.6.34. It resolves an issue involving an
Oops on boxes w/ iwl4965 hardware, apparently introduced by another
recent patch. The thread describing the issue and the resolution
is here:
http://marc.info/?l=linux-kernel&m=127074721531649&w=2
In order to avoid reverting that patch, please accept this fix instead.
As usual, please let me know if there are problems!
Thanks,
John
---
The following changes since commit 2626419ad5be1a054d350786b684b41d23de1538:
David S. Miller (1):
tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git master
Wey-Yi Guy (1):
iwlwifi: need check for valid qos packet before free
drivers/net/wireless/iwlwifi/iwl-4965.c | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/iwlwifi/iwl-4965.c b/drivers/net/wireless/iwlwifi/iwl-4965.c
index 83c52a6..8972166 100644
--- a/drivers/net/wireless/iwlwifi/iwl-4965.c
+++ b/drivers/net/wireless/iwlwifi/iwl-4965.c
@@ -2015,7 +2015,9 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
IWL_DEBUG_TX_REPLY(priv, "Retry scheduler reclaim scd_ssn "
"%d index %d\n", scd_ssn , index);
freed = iwl_tx_queue_reclaim(priv, txq_id, index);
- iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
+ if (qc)
+ iwl_free_tfds_in_queue(priv, sta_id,
+ tid, freed);
if (priv->mac80211_registered &&
(iwl_queue_space(&txq->q) > txq->q.low_mark) &&
@@ -2041,14 +2043,17 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
tx_resp->failure_frame);
freed = iwl_tx_queue_reclaim(priv, txq_id, index);
- iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
+ if (qc && likely(sta_id != IWL_INVALID_STATION))
+ iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
+ else if (sta_id == IWL_INVALID_STATION)
+ IWL_DEBUG_TX_REPLY(priv, "Station not known\n");
if (priv->mac80211_registered &&
(iwl_queue_space(&txq->q) > txq->q.low_mark))
iwl_wake_queue(priv, txq_id);
}
^ permalink raw reply related
* Re: net-next: 2.6.34-rc1 regression: panic when running diagnostic on interface with IPv6
From: Stephen Hemminger @ 2010-04-09 15:07 UTC (permalink / raw)
To: Emil S Tantilov; +Cc: netdev, David Miller
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A1365FE4F6B39F@rrsmsx501.amr.corp.intel.com>
Send me your kernel config. And are you running tests online or offline
^ permalink raw reply
* Re: mmotm 2010-04-05-16-09 uploaded
From: Patrick McHardy @ 2010-04-09 14:49 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, David S. Miller,
linux-kernel, netfilter-devel, netdev
In-Reply-To: <8356.1270774206@localhost>
[-- Attachment #1: Type: text/plain, Size: 970 bytes --]
Valdis.Kletnieks@vt.edu wrote:
> On Thu, 08 Apr 2010 17:36:07 +0200, Patrick McHardy said:
>
>> Valdis.Kletnieks@vt.edu wrote:
>
>>> Well, it *changed* it. Does the rcu_defererence_check() only fire on the
>>> first time it hits something, so we've fixed the first one and now we get to
>>> see the second one?
>> It appears that way, otherwise you should have seen a second warning in
>> nf_conntrack_ecache the last time.
>>
>>> (For what it's worth, if this is going to be one-at-a-time whack-a-mole, I'm
>>> OK on that, just want to know up front.)
>> I went through the other files and I believe this should be it.
>> We already removed most of these incorrect rcu_dereference()
>> calls a while back.
>
> Confirming - the second version of the patch fixes all the network-related
> RCU complaints I've been able to trigger...
Thanks. I've added the attached commit to the nf-next tree. I'll push
it to Dave shortly so this can get included in the next tree.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 4009 bytes --]
>From ed86308f6179d8fa6151c2d0f652aad0091548e2 Mon Sep 17 00:00:00 2001
From: Patrick McHardy <kaber@trash.net>
Date: Fri, 9 Apr 2010 16:42:15 +0200
Subject: [PATCH] netfilter: remove invalid rcu_dereference() calls
The CONFIG_PROVE_RCU option discovered a few invalid uses of
rcu_dereference() in netfilter. In all these cases, the code code
intends to check whether a pointer is already assigned when
performing registration or whether the assigned pointer matches
when performing unregistration. The entire registration/
unregistration is protected by a mutex, so we don't need the
rcu_dereference() calls.
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
net/netfilter/nf_conntrack_ecache.c | 18 ++++--------------
net/netfilter/nf_log.c | 8 ++------
2 files changed, 6 insertions(+), 20 deletions(-)
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index d5a9bcd..849614a 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -81,11 +81,9 @@ EXPORT_SYMBOL_GPL(nf_ct_deliver_cached_events);
int nf_conntrack_register_notifier(struct nf_ct_event_notifier *new)
{
int ret = 0;
- struct nf_ct_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
- notify = rcu_dereference(nf_conntrack_event_cb);
- if (notify != NULL) {
+ if (nf_conntrack_event_cb != NULL) {
ret = -EBUSY;
goto out_unlock;
}
@@ -101,11 +99,8 @@ EXPORT_SYMBOL_GPL(nf_conntrack_register_notifier);
void nf_conntrack_unregister_notifier(struct nf_ct_event_notifier *new)
{
- struct nf_ct_event_notifier *notify;
-
mutex_lock(&nf_ct_ecache_mutex);
- notify = rcu_dereference(nf_conntrack_event_cb);
- BUG_ON(notify != new);
+ BUG_ON(nf_conntrack_event_cb != new);
rcu_assign_pointer(nf_conntrack_event_cb, NULL);
mutex_unlock(&nf_ct_ecache_mutex);
}
@@ -114,11 +109,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_unregister_notifier);
int nf_ct_expect_register_notifier(struct nf_exp_event_notifier *new)
{
int ret = 0;
- struct nf_exp_event_notifier *notify;
mutex_lock(&nf_ct_ecache_mutex);
- notify = rcu_dereference(nf_expect_event_cb);
- if (notify != NULL) {
+ if (nf_expect_event_cb != NULL) {
ret = -EBUSY;
goto out_unlock;
}
@@ -134,11 +127,8 @@ EXPORT_SYMBOL_GPL(nf_ct_expect_register_notifier);
void nf_ct_expect_unregister_notifier(struct nf_exp_event_notifier *new)
{
- struct nf_exp_event_notifier *notify;
-
mutex_lock(&nf_ct_ecache_mutex);
- notify = rcu_dereference(nf_expect_event_cb);
- BUG_ON(notify != new);
+ BUG_ON(nf_expect_event_cb != new);
rcu_assign_pointer(nf_expect_event_cb, NULL);
mutex_unlock(&nf_ct_ecache_mutex);
}
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 015725a..908f599 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -35,7 +35,6 @@ static struct nf_logger *__find_logger(int pf, const char *str_logger)
/* return EEXIST if the same logger is registred, 0 on success. */
int nf_log_register(u_int8_t pf, struct nf_logger *logger)
{
- const struct nf_logger *llog;
int i;
if (pf >= ARRAY_SIZE(nf_loggers))
@@ -52,8 +51,7 @@ int nf_log_register(u_int8_t pf, struct nf_logger *logger)
} else {
/* register at end of list to honor first register win */
list_add_tail(&logger->list[pf], &nf_loggers_l[pf]);
- llog = rcu_dereference(nf_loggers[pf]);
- if (llog == NULL)
+ if (nf_loggers[pf] == NULL)
rcu_assign_pointer(nf_loggers[pf], logger);
}
@@ -65,13 +63,11 @@ EXPORT_SYMBOL(nf_log_register);
void nf_log_unregister(struct nf_logger *logger)
{
- const struct nf_logger *c_logger;
int i;
mutex_lock(&nf_log_mutex);
for (i = 0; i < ARRAY_SIZE(nf_loggers); i++) {
- c_logger = rcu_dereference(nf_loggers[i]);
- if (c_logger == logger)
+ if (nf_loggers[i] == logger)
rcu_assign_pointer(nf_loggers[i], NULL);
list_del(&logger->list[i]);
}
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCH] net/wireless/libertas: do not call wiphy_unregister() w/o wiphy_register()
From: Holger Schurig @ 2010-04-09 13:51 UTC (permalink / raw)
To: John W. Linville
Cc: Dan Williams, Daniel Mack, libertas-dev, netdev, linux-wireless,
linux-kernel
In-Reply-To: <20100408190358.GB2999@tuxdriver.com>
> Ping?
Pong.
I'm a bit swamped with other stuff, so I can't do that right now. That's the
pity of someone who can only use this-and-then on kernel projects.
^ permalink raw reply
* Re: [PATCH v3] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-09 13:50 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <1270797457.2623.19.camel@edumazet-laptop>
>> * @mc_ttl - Multicasting TTL
>> * @is_icsk - is this an inet_connection_sock?
>> @@ -124,6 +126,9 @@ struct inet_sock {
>> __u16 cmsg_flags;
>> __be16 inet_sport;
>> __u16 inet_id;
>> +#ifdef CONFIG_RPS
>> + __u32 rxhash;
>> +#endif
>
> I am a bit worried, because dirtying this cache line might hurt non RPS
> setups (if network interrupts are balanced to all cpus)
>
The rxhash should only be written when it changes. So as long as
device or lower stack provide a consistent rxhash for a connection
this should be okay.
> Best place would be to put rxhash close to sk_refcnt (because we dirty
> it to get a reference on rcu sk lookups)
>
In sock_common?... I don't know if we need this in every socket yet.
> I believe we have a 32bits hole on 64bit arches for this :)
>
>
> While testint latest net-nex-2.6 on my nehalem machine, I got a crash
> (in RPS I am afraid...)
>
> I am going to correct this crash before testing RFS and let you know the
> results.
Thanks for doing that.
>
> Thanks
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [Patch 1/3] sysctl: refactor integer handling proc code
From: Octavian Purdila @ 2010-04-09 13:40 UTC (permalink / raw)
To: Changli Gao
Cc: Amerigo Wang, linux-kernel, Eric Dumazet, netdev, Neil Horman,
David Miller, ebiederm
In-Reply-To: <n2j412e6f7f1004090349q37cb01f0u177aaa4fc16664ec@mail.gmail.com>
Hi and thanks for reviewing.
On Friday 09 April 2010 13:49:12 you wrote:
> > + *
> > + * In case of success 0 is returned and buf and size are updated with
> > + * the amount of bytes read. If tr is non NULL and a trailing
> > + * character exist (size is non zero after returning from this
> > + * function) tr is updated with the trailing character.
> > + */
> > +static int proc_get_ulong(char __user **buf, size_t *size,
> > + unsigned long *val, bool *neg,
> > + const char *perm_tr, unsigned perm_tr_len, char
> > *tr) +{
> > + int len;
> > + char *p, tmp[TMPBUFLEN];
> > +
> > + if (!*size)
> > + return -EINVAL;
> > +
> > + len = *size;
> > + if (len > TMPBUFLEN-1)
> > + len = TMPBUFLEN-1;
> > +
> > + if (copy_from_user(tmp, *buf, len))
> > + return -EFAULT;
> > +
> > + tmp[len] = 0;
> > + p = tmp;
> > + if (*p == '-' && *size > 1) {
> > + *neg = 1;
> > + p++;
> > + } else
> > + *neg = 0;
>
> the function name implies that it is used to parse unsigned long, so
> negative value should not be supported.
>
My intention was to signal that the argument is unsigned long and that the
sign come separate in neg, but I am OK with changing the function name to
proc_get_long() if you think that is better.
> > + if (!isdigit(*p))
> > + return -EINVAL;
>
> It seems that ledding white space should be allowed, so this check
> isn't needed, and simple_strtoul can handle it.
>
Leading white space is skipped with proc_skip_space before calling this
function. AFAICS simple_strtoul does not handle whitespaces.
> > +
> > + *val = simple_strtoul(p, &p, 0);
> > +
> > + len = p - tmp;
> > +
> > + /* We don't know if the next char is whitespace thus we may
> > accept + * invalid integers (e.g. 1234...a) or two integers
> > instead of one + * (e.g. 123...1). So lets not allow such large
> > numbers. */ + if (len == TMPBUFLEN - 1)
> > + return -EINVAL;
> > +
> > + if (len < *size && perm_tr_len && !isanyof(*p, perm_tr,
> > perm_tr_len)) + return -EINVAL;
>
> is strspn() better?
>
I don't think it will work out, \0 is an accepted trailer for many of the
function which use this function.
> > +
> > + if (tr && (len < *size))
> > + *tr = *p;
> > +
> > + *buf += len;
> > + *size -= len;
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * proc_put_ulong - coverts an integer to a decimal ASCII formated
> > string + *
> > + * @buf - the user buffer
> > + * @size - the size of the user buffer
> > + * @val - the integer to be converted
> > + * @neg - sign of the number, %TRUE for negative
> > + * @first - if %FALSE will insert a separator character before the
> > number + * @separator - the separator character
> > + *
> > + * In case of success 0 is returned and buf and size are updated with
> > + * the amount of bytes read.
> > + */
> > +static int proc_put_ulong(char __user **buf, size_t *size, unsigned long
> > val, + bool neg, bool first, char separator)
> > +{
> > + int len;
> > + char tmp[TMPBUFLEN], *p = tmp;
> > +
> > + if (!first)
> > + *p++ = separator;
> > + sprintf(p, "%s%lu", neg ? "-" : "", val);
>
> negative should not be supported too.
>
We need negatives in proc_dointvec, again we can change the function name if
it will clear things up.
<snip>
> > int val = *valp;
> > unsigned long lval;
> > if (val < 0) {
> > - *negp = -1;
> > + *negp = 1;
> > lval = (unsigned long)-val;
> > } else {
> > *negp = 0;
>
> These functions have so much lines of code. I think you can make them
> less. Please refer to strsep().
>
Hmm, the input its pretty permissive and maybe this is why it looks so fat, we
need to account for quite a few cases.
Or maybe I spent too much time on this code already and I can't see the simple
solution :)
Thanks,
tavi
^ permalink raw reply
* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-09 13:29 UTC (permalink / raw)
To: Benny Amorsen; +Cc: netdev
In-Reply-To: <m3vdc0ztyc.fsf@ursa.amorsen.dk>
Le vendredi 09 avril 2010 à 14:33 +0200, Benny Amorsen a écrit :
> Thank you very much for the help! I will report back whether it was the
> hash buckets.
OK
You could try :
ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100
(to reduce tx completion irqs)
Before buying multiqueue devices, you also could try net-next-2.6 kernel,
because RPS (Remote Packet Steering) is in.
In your setup, this might help a bit, distribute the packets to all cpus,
with appropriate cache handling.
^ permalink raw reply
* Re: [Patch 3/3] net: reserve ports for applications using fixed port numbers
From: Tetsuo Handa @ 2010-04-09 13:21 UTC (permalink / raw)
To: amwang, linux-kernel
Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm
In-Reply-To: <20100409101513.5051.97926.sendpatchset@localhost.localdomain>
Hello.
Amerigo Wang wrote:
> Index: linux-2.6/drivers/infiniband/core/cma.c
> ===================================================================
> --- linux-2.6.orig/drivers/infiniband/core/cma.c
> +++ linux-2.6/drivers/infiniband/core/cma.c
> @@ -1980,6 +1980,8 @@ retry:
> /* FIXME: add proper port randomization per like inet_csk_get_port */
> do {
> ret = idr_get_new_above(ps, bind_list, next_port, &port);
> + if (inet_is_reserved_local_port(port))
> + ret = -EAGAIN;
You should not overwrite ret with -EAGAIN when idr_get_new_above() returned
-ENOSPC. I don't know about idr, thus I don't know whether
if (!ret && inet_is_reserved_local_port(port))
ret = -EAGAIN;
is correct or not.
> } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>
> if (ret)
> @@ -2996,10 +2998,13 @@ static int __init cma_init(void)
> {
> int ret, low, high, remaining;
>
> - get_random_bytes(&next_port, sizeof next_port);
> inet_get_local_port_range(&low, &high);
> +again:
> + get_random_bytes(&next_port, sizeof next_port);
> remaining = (high - low) + 1;
> next_port = ((unsigned int) next_port % remaining) + low;
> + if (inet_is_reserved_local_port(next_port))
> + goto again;
>
You should not unconditionally "goto again;".
If all ports were reserved, it will loop forever (CPU stalls).
> cma_wq = create_singlethread_workqueue("rdma_cm");
> if (!cma_wq)
> Index: linux-2.6/net/sctp/socket.c
> ===================================================================
> --- linux-2.6.orig/net/sctp/socket.c
> +++ linux-2.6/net/sctp/socket.c
> @@ -5436,6 +5436,8 @@ static long sctp_get_port_local(struct s
> rover++;
> if ((rover < low) || (rover > high))
> rover = low;
> + if (inet_is_reserved_local_port(rover))
> + continue;
This one needs to be
if (inet_is_reserved_local_port(rover))
goto next_nolock;
> index = sctp_phashfn(rover);
> head = &sctp_port_hashtable[index];
> sctp_spin_lock(&head->lock);
next:
sctp_spin_unlock(&head->lock);
+next_nolock:
} while (--remaining > 0);
otherwise, it will loop forever if all ports were reserved.
^ permalink raw reply
* Re: [Patch 2/3] sysctl: add proc_do_large_bitmap
From: Octavian Purdila @ 2010-04-09 12:35 UTC (permalink / raw)
To: Changli Gao
Cc: Amerigo Wang, linux-kernel, ebiederm, Eric Dumazet, netdev,
Neil Horman, David Miller
In-Reply-To: <s2w412e6f7f1004090333g3b23eb94udb1e6cc3939a07e5@mail.gmail.com>
On Friday 09 April 2010 13:33:29 you wrote:
> On Fri, Apr 9, 2010 at 6:11 PM, Amerigo Wang <amwang@redhat.com> wrote:
> > From: Octavian Purdila <opurdila@ixiacom.com>
> >
> > The new function can be used to read/write large bitmaps via /proc. A
> > comma separated range format is used for compact output and input
> > (e.g. 1,3-4,10-10).
> >
> > Writing into the file will first reset the bitmap then update it
> > based on the given input.
>
> We have bitmap_scnprintf() and bitmap_parse_user(), why invent a new suite?
>
A decimal comma separated ranges seems the best option for this feature, and
unfortunately both of the above functions only support hexadecimal and no
ranges.
^ permalink raw reply
* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-09 12:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1270813662.2623.85.camel@edumazet-laptop>
Eric Dumazet <eric.dumazet@gmail.com> writes:
> might be micro bursts, check 'ethtool -g eth0' RX parameters (increase
> RX ring from 200 to 511 if you want more buffers ?)
I tried that already actually. (I didn't expect it to cause traffic
interruption, but it did. Oh well.)
It didn't make a difference, at least not one I could detect from the
number of packet drops and the CPU utilization.
> cat /proc/net/softnet_stat
000002d9 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
42bc8143 00000000 0000024c 00000000 00000000 00000000 00000000 00000000 00000000
0000031b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1c5a35e9 00000000 000005f7 00000000 00000000 00000000 00000000 00000000 00000000
I am not quite sure how to interpret that...
> cat /proc/interrupts
79: 1240 4050590849 1253 1263 PCI-MSI-edge eth0
80: 12 9 14 3613521843 PCI-MSI-edge eth1
> (check eth0 IRQS are delivered to one cpu)
Yes CPU1 handles eth0 and CPU3 handles eth1.
> grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*
nf_conntrack_acct:1
nf_conntrack_buckets:8192
nf_conntrack_checksum:1
nf_conntrack_count:49311
nf_conntrack_events:1
nf_conntrack_events_retry_timeout:15
nf_conntrack_expect_max:2048
nf_conntrack_generic_timeout:600
nf_conntrack_icmp_timeout:30
nf_conntrack_log_invalid:1
nf_conntrack_max:1048576
nf_conntrack_tcp_be_liberal:0
nf_conntrack_tcp_loose:1
nf_conntrack_tcp_max_retrans:3
nf_conntrack_tcp_timeout_close:10
nf_conntrack_tcp_timeout_close_wait:60
nf_conntrack_tcp_timeout_established:432000
nf_conntrack_tcp_timeout_fin_wait:120
nf_conntrack_tcp_timeout_last_ack:30
nf_conntrack_tcp_timeout_max_retrans:300
nf_conntrack_tcp_timeout_syn_recv:60
nf_conntrack_tcp_timeout_syn_sent:120
nf_conntrack_tcp_timeout_time_wait:120
nf_conntrack_tcp_timeout_unacknowledged:300
nf_conntrack_udp_timeout:30
nf_conntrack_udp_timeout_stream:180
> (might need to increase ip_conntrack_buckets)
You got me there. I had forgotten nf_conntrack.hashsize=1048576
and nf_conntrack.expect_hashsize=32768 on the kernel command line. It
was on the hot standby firewall, but not on the primary one. I will do a
failover to the hot standby sometime during the weekend.
It still isn't possible to change without a reboot, is it?
> ethtool -c eth0
> (might change coalesce params to reduce number of irqs)
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 20
rx-frames: 5
rx-usecs-irq: 0
rx-frames-irq: 5
tx-usecs: 72
tx-frames: 53
tx-usecs-irq: 0
tx-frames-irq: 5
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
I played quite a lot with the parameters but it did not seem to make any
difference. I didn't try adaptive though, but the load is fairly static
so it didn't seem appropriate.
> ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 511
RX Mini: 0
RX Jumbo: 0
TX: 511
Current hardware settings:
RX: 200
RX Mini: 0
RX Jumbo: 0
TX: 511
Right now RX is 200, but when it was 511 it didn't seem to make a
difference.
Thank you very much for the help! I will report back whether it was the
hash buckets.
/Benny
^ permalink raw reply
* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-09 11:47 UTC (permalink / raw)
To: Benny Amorsen; +Cc: netdev
In-Reply-To: <m339z50x1l.fsf@ursa.amorsen.dk>
Le vendredi 09 avril 2010 à 11:56 +0200, Benny Amorsen a écrit :
> I have a netfilter-box which is dropping packets. ethtool -S counts
> 10-20 rx_discards per second on the interface.
>
> The switch does not have flow control enabled; with flow control enabled
> the rx_discards turn into tx_on_sent which ultimately cause the same
> problem (the load is pretty constant so the switch has to drop the
> packets instead).
>
> perf top shows something like:
> 5201.00 - 6.7% : _spin_unlock_irqrestore
> 4232.00 - 5.5% : finish_task_switch
> 3597.00 - 4.6% : tg3_poll [tg3]
> 3257.00 - 4.2% : handle_IRQ_event
> 2515.00 - 3.2% : tick_nohz_restart_sched_tick
> 1947.00 - 2.5% : nf_ct_tuple_equal
> 1927.00 - 2.5% : tg3_start_xmit [tg3]
> 1879.00 - 2.4% : kmem_cache_alloc_node
> 1625.00 - 2.1% : tick_nohz_stop_sched_tick
> 1619.00 - 2.1% : ipt_do_table
> 1595.00 - 2.1% : ip_route_input
> 1547.00 - 2.0% : kmem_cache_free
> 1474.00 - 1.9% : __alloc_skb
> 1424.00 - 1.8% : fget_light
> 1391.00 - 1.8% : nf_iterate
>
> The rule set is quite large (more than 4000 rules), but organized so
> that each packet only has to traverse a few rules before getting
> accepted or rejected.
>
> When the problem started we were using a different server, an old
> two-socket 32-bit Xeon with hyperthreading. CPU usage often hit 100% on
> one CPU with that server. After replacing the server with a ProLiant
> DL160 G5 with a quad-core Xeon (without hyperthreading) the CPU usage
> rarely exceeds 10% on any CPU, but the packet loss persists.
>
might be micro bursts, check 'ethtool -g eth0' RX parameters (increase
RX ring from 200 to 511 if you want more buffers ?)
> We're using the built-in dual Broadcom Corporation NetXtreme BCM5722 Gigabit
> Ethernet PCI Express nics, and the kernel is
> kernel-2.6.32.9-70.fc12.x86_64 from Fedora. Next step is probably
> installing a better ethernet card, perhaps an Intel 82576-based one, so
> that we can get multiqueue support.
>
Sure, but before this, could you check
cat /proc/net/softnet_stat
cat /proc/interrupts
(check eth0 IRQS are delivered to one cpu)
grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*
(might need to increase ip_conntrack_buckets)
ethtool -c eth0
(might change coalesce params to reduce number of irqs)
ethtool -g eth0
^ permalink raw reply
* [PATCH 1/2] PHY: fix typo in bcm63xx PHY driver table
From: Florian Fainelli @ 2010-04-09 11:04 UTC (permalink / raw)
To: netdev; +Cc: David Miller
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
---
diff --git a/drivers/net/phy/bcm63xx.c b/drivers/net/phy/bcm63xx.c
index ac5e498..c128156 100644
--- a/drivers/net/phy/bcm63xx.c
+++ b/drivers/net/phy/bcm63xx.c
@@ -137,4 +137,4 @@ static struct mdio_device_id bcm63xx_tbl[] = {
{ }
};
-MODULE_DEVICE_TABLE(mdio, bcm64xx_tbl);
+MODULE_DEVICE_TABLE(mdio, bcm63xx_tbl);
^ permalink raw reply related
* [PATCH 2/2] bcm63xx_enet: do not overwrite ENET_CTL_REG value
From: Florian Fainelli @ 2010-04-09 11:04 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Maxime Bizon
bcm_enet_hw_preinit will correctly set values in ENET_CTL_REG for internal
or external MII operations, however, bcm_enet_open will blindly overwrite the
ENET_CTL_REG register value and thus we will loose any changes to it that
were made in bcm_enet_hw_preinit, rendering external MII operations non-working.
This would lead to the driver not being able to check for link availability on
external PHY setups, and thus we would never get to sending packets because
link was down from the driver side.
This was completely un-noticed because all boards out there but BCM6338-based
ones use internal phy on their enet0 interface.
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
---
diff --git a/drivers/net/bcm63xx_enet.c b/drivers/net/bcm63xx_enet.c
index 5173340..14ab4dc 100644
--- a/drivers/net/bcm63xx_enet.c
+++ b/drivers/net/bcm63xx_enet.c
@@ -958,7 +958,9 @@ static int bcm_enet_open(struct net_device *dev)
/* all set, enable mac and interrupts, start dma engine and
* kick rx dma channel */
wmb();
- enet_writel(priv, ENET_CTL_ENABLE_MASK, ENET_CTL_REG);
+ val = enet_readl(priv, ENET_CTL_REG);
+ val |= ENET_CTL_ENABLE_MASK;
+ enet_writel(priv, val, ENET_CTL_REG);
enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG);
enet_dma_writel(priv, ENETDMA_CHANCFG_EN_MASK,
ENETDMA_CHANCFG_REG(priv->rx_chan));
^ permalink raw reply related
* Re: [RFC PATCH 0/2] netdev: Add tracepoint to network/driver interface
From: Neil Horman @ 2010-04-09 11:04 UTC (permalink / raw)
To: Koki Sanagi; +Cc: netdev, izumi.taku, kaneshige.kenji, davem
In-Reply-To: <4BBED951.8040406@jp.fujitsu.com>
On Fri, Apr 09, 2010 at 04:37:53PM +0900, Koki Sanagi wrote:
> These patches add tracepoints to network/driver interface.
>
> These tracepoints are helpful to investigate whether a packet passes or not.
> For example, when Heart Beat is disconnected, that information is helpful
> to investigate the cause is whether driver/device side or not.
>
> An output is below.
>
> sshd-2443 [001] 68238.415621: netdev_start_xmit: dev=eth3 skbaddr=f3db5138 len=114
> <idle>-0 [001] 68238.417058: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=52
> <idle>-0 [001] 68238.704363: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=100
> sshd-2443 [001] 68238.705459: netdev_start_xmit: dev=eth3 skbaddr=f3db5138 len=114
> <idle>-0 [001] 68238.706891: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=52
> <idle>-0 [001] 68238.878736: netdev_receive_skb: dev=eth3 skbaddr=f3c81540 len=100
> sshd-2443 [001] 68238.880361: netdev_start_xmit: dev=eth3 skbaddr=f3db5138 len=114
>
> As other use case I have, we can get throughput per interface with some sort of
> perf scripts. I plan to create it.
>
> Thanks
> Koki Sanagi
>
You can get a reasonable estimate of per-interface throughput using ethtool or
even ifconfig in a script. What are the tracepoints needed for that? Don't get
me wrong, I think these tracepoints could have some potential use thats not
covered by other tools, I just don't see the above as a conclusive reason to add
them.
Regards
Neil
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [Patch 1/3] sysctl: refactor integer handling proc code
From: Changli Gao @ 2010-04-09 10:49 UTC (permalink / raw)
To: Amerigo Wang
Cc: linux-kernel, Octavian Purdila, Eric Dumazet, netdev, Neil Horman,
David Miller, ebiederm
In-Reply-To: <20100409101452.5051.74050.sendpatchset@localhost.localdomain>
On Fri, Apr 9, 2010 at 6:11 PM, Amerigo Wang <amwang@redhat.com> wrote:
>
> From: Octavian Purdila <opurdila@ixiacom.com>
>
> As we are about to add another integer handling proc function a little
> bit of cleanup is in order: add a few helper functions to improve code
> readability and decrease code duplication.
>
> In the process a bug is also fixed: if the user specifies a number
> with more then 20 digits it will be interpreted as two integers
> (e.g. 10000...13 will be interpreted as 100.... and 13).
>
> Behavior for EFAULT handling was changed as well. Previous to this
> patch, when an EFAULT error occurred in the middle of a write
> operation, although some of the elements were set, that was not
> acknowledged to the user (by shorting the write and returning the
> number of bytes accepted). EFAULT is now treated just like any other
> errors by acknowledging the amount of bytes accepted.
>
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> Signed-off-by: WANG Cong <amwang@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>
> Index: linux-2.6/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.orig/kernel/sysctl.c
> +++ linux-2.6/kernel/sysctl.c
> @@ -2040,8 +2040,148 @@ int proc_dostring(struct ctl_table *tabl
> buffer, lenp, ppos);
> }
>
> +static int proc_skip_wspace(char __user **buf, size_t *size)
> +{
> + char c;
> +
> + while (*size) {
> + if (get_user(c, *buf))
> + return -EFAULT;
> + if (!isspace(c))
> + break;
> + (*size)--;
> + (*buf)++;
> + }
> +
> + return 0;
> +}
> +
> +static bool isanyof(char c, const char *v, unsigned len)
> +{
> + int i;
> +
> + if (!len)
> + return false;
> +
> + for (i = 0; i < len; i++)
> + if (c == v[i])
> + break;
> + if (i == len)
> + return false;
> +
> + return true;
> +}
> +
> +#define TMPBUFLEN 22
> +/**
> + * proc_get_ulong - reads an ASCII formated integer from a user buffer
> + *
> + * @buf - user buffer
> + * @size - size of the user buffer
> + * @val - this is where the number will be stored
> + * @neg - set to %TRUE if number is negative
> + * @perm_tr - a vector which contains the allowed trailers
> + * @perm_tr_len - size of the perm_tr vector
> + * @tr - pointer to store the trailer character
> + *
> + * In case of success 0 is returned and buf and size are updated with
> + * the amount of bytes read. If tr is non NULL and a trailing
> + * character exist (size is non zero after returning from this
> + * function) tr is updated with the trailing character.
> + */
> +static int proc_get_ulong(char __user **buf, size_t *size,
> + unsigned long *val, bool *neg,
> + const char *perm_tr, unsigned perm_tr_len, char *tr)
> +{
> + int len;
> + char *p, tmp[TMPBUFLEN];
> +
> + if (!*size)
> + return -EINVAL;
> +
> + len = *size;
> + if (len > TMPBUFLEN-1)
> + len = TMPBUFLEN-1;
> +
> + if (copy_from_user(tmp, *buf, len))
> + return -EFAULT;
> +
> + tmp[len] = 0;
> + p = tmp;
> + if (*p == '-' && *size > 1) {
> + *neg = 1;
> + p++;
> + } else
> + *neg = 0;
the function name implies that it is used to parse unsigned long, so
negative value should not be supported.
> + if (!isdigit(*p))
> + return -EINVAL;
It seems that ledding white space should be allowed, so this check
isn't needed, and simple_strtoul can handle it.
> +
> + *val = simple_strtoul(p, &p, 0);
> +
> + len = p - tmp;
> +
> + /* We don't know if the next char is whitespace thus we may accept
> + * invalid integers (e.g. 1234...a) or two integers instead of one
> + * (e.g. 123...1). So lets not allow such large numbers. */
> + if (len == TMPBUFLEN - 1)
> + return -EINVAL;
> +
> + if (len < *size && perm_tr_len && !isanyof(*p, perm_tr, perm_tr_len))
> + return -EINVAL;
is strspn() better?
> +
> + if (tr && (len < *size))
> + *tr = *p;
> +
> + *buf += len;
> + *size -= len;
> +
> + return 0;
> +}
> +
> +/**
> + * proc_put_ulong - coverts an integer to a decimal ASCII formated string
> + *
> + * @buf - the user buffer
> + * @size - the size of the user buffer
> + * @val - the integer to be converted
> + * @neg - sign of the number, %TRUE for negative
> + * @first - if %FALSE will insert a separator character before the number
> + * @separator - the separator character
> + *
> + * In case of success 0 is returned and buf and size are updated with
> + * the amount of bytes read.
> + */
> +static int proc_put_ulong(char __user **buf, size_t *size, unsigned long val,
> + bool neg, bool first, char separator)
> +{
> + int len;
> + char tmp[TMPBUFLEN], *p = tmp;
> +
> + if (!first)
> + *p++ = separator;
> + sprintf(p, "%s%lu", neg ? "-" : "", val);
negative should not be supported too.
> + len = strlen(tmp);
> + if (len > *size)
> + len = *size;
> + if (copy_to_user(*buf, tmp, len))
> + return -EFAULT;
> + *size -= len;
> + *buf += len;
> + return 0;
> +}
> +#undef TMPBUFLEN
> +
> +static int proc_put_char(char __user **buf, size_t *size, char c)
> +{
> + if (*size) {
> + if (put_user(c, *buf))
> + return -EFAULT;
> + (*size)--, (*buf)++;
> + }
> + return 0;
> +}
>
> -static int do_proc_dointvec_conv(int *negp, unsigned long *lvalp,
> +static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
> int *valp,
> int write, void *data)
> {
> @@ -2050,7 +2190,7 @@ static int do_proc_dointvec_conv(int *ne
> } else {
> int val = *valp;
> if (val < 0) {
> - *negp = -1;
> + *negp = 1;
> *lvalp = (unsigned long)-val;
> } else {
> *negp = 0;
> @@ -2060,20 +2200,18 @@ static int do_proc_dointvec_conv(int *ne
> return 0;
> }
>
> +static const char proc_wspace_sep[] = { ' ', '\t', '\n', 0 };
> +
> static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
> - int write, void __user *buffer,
> + int write, void __user *_buffer,
> size_t *lenp, loff_t *ppos,
> - int (*conv)(int *negp, unsigned long *lvalp, int *valp,
> + int (*conv)(bool *negp, unsigned long *lvalp, int *valp,
> int write, void *data),
> void *data)
> {
> -#define TMPBUFLEN 21
> - int *i, vleft, first = 1, neg;
> - unsigned long lval;
> - size_t left, len;
> -
> - char buf[TMPBUFLEN], *p;
> - char __user *s = buffer;
> + int *i, vleft, first = 1, err = 0;
> + size_t left;
> + char __user *buffer = (char __user *) _buffer;
>
> if (!tbl_data || !table->maxlen || !*lenp ||
> (*ppos && !write)) {
> @@ -2089,88 +2227,48 @@ static int __do_proc_dointvec(void *tbl_
> conv = do_proc_dointvec_conv;
>
> for (; left && vleft--; i++, first=0) {
> - if (write) {
> - while (left) {
> - char c;
> - if (get_user(c, s))
> - return -EFAULT;
> - if (!isspace(c))
> - break;
> - left--;
> - s++;
> - }
> - if (!left)
> - break;
> - neg = 0;
> - len = left;
> - if (len > sizeof(buf) - 1)
> - len = sizeof(buf) - 1;
> - if (copy_from_user(buf, s, len))
> - return -EFAULT;
> - buf[len] = 0;
> - p = buf;
> - if (*p == '-' && left > 1) {
> - neg = 1;
> - p++;
> - }
> - if (*p < '0' || *p > '9')
> - break;
> -
> - lval = simple_strtoul(p, &p, 0);
> + unsigned long lval;
> + bool neg;
>
> - len = p-buf;
> - if ((len < left) && *p && !isspace(*p))
> + if (write) {
> + err = proc_skip_wspace(&buffer, &left);
> + if (err)
> + return err;
> + err = proc_get_ulong(&buffer, &left, &lval, &neg,
> + proc_wspace_sep,
> + sizeof(proc_wspace_sep), NULL);
> + if (err)
> break;
> - s += len;
> - left -= len;
> -
> - if (conv(&neg, &lval, i, 1, data))
> + if (conv(&neg, &lval, i, 1, data)) {
> + err = -EINVAL;
> break;
> + }
> } else {
> - p = buf;
> - if (!first)
> - *p++ = '\t';
> -
> - if (conv(&neg, &lval, i, 0, data))
> + if (conv(&neg, &lval, i, 0, data)) {
> + err = -EINVAL;
> break;
> -
> - sprintf(p, "%s%lu", neg ? "-" : "", lval);
> - len = strlen(buf);
> - if (len > left)
> - len = left;
> - if(copy_to_user(s, buf, len))
> - return -EFAULT;
> - left -= len;
> - s += len;
> - }
> - }
> -
> - if (!write && !first && left) {
> - if(put_user('\n', s))
> - return -EFAULT;
> - left--, s++;
> - }
> - if (write) {
> - while (left) {
> - char c;
> - if (get_user(c, s++))
> - return -EFAULT;
> - if (!isspace(c))
> + }
> + err = proc_put_ulong(&buffer, &left, lval, neg, first,
> + '\t');
> + if (err)
> break;
> - left--;
> }
> }
> +
> + if (!write && !first && left && !err)
> + err = proc_put_char(&buffer, &left, '\n');
> + if (write && !err)
> + err = proc_skip_wspace(&buffer, &left);
> if (write && first)
> - return -EINVAL;
> + return err ? : -EINVAL;
> *lenp -= left;
> *ppos += *lenp;
> return 0;
> -#undef TMPBUFLEN
> }
>
> static int do_proc_dointvec(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp, loff_t *ppos,
> - int (*conv)(int *negp, unsigned long *lvalp, int *valp,
> + int (*conv)(bool *negp, unsigned long *lvalp, int *valp,
> int write, void *data),
> void *data)
> {
> @@ -2238,8 +2336,8 @@ struct do_proc_dointvec_minmax_conv_para
> int *max;
> };
>
> -static int do_proc_dointvec_minmax_conv(int *negp, unsigned long *lvalp,
> - int *valp,
> +static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
> + int *valp,
> int write, void *data)
> {
> struct do_proc_dointvec_minmax_conv_param *param = data;
> @@ -2252,7 +2350,7 @@ static int do_proc_dointvec_minmax_conv(
> } else {
> int val = *valp;
> if (val < 0) {
> - *negp = -1;
> + *negp = 1;
> *lvalp = (unsigned long)-val;
> } else {
> *negp = 0;
> @@ -2290,17 +2388,15 @@ int proc_dointvec_minmax(struct ctl_tabl
> }
>
> static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int write,
> - void __user *buffer,
> + void __user *_buffer,
> size_t *lenp, loff_t *ppos,
> unsigned long convmul,
> unsigned long convdiv)
> {
> -#define TMPBUFLEN 21
> - unsigned long *i, *min, *max, val;
> - int vleft, first=1, neg;
> - size_t len, left;
> - char buf[TMPBUFLEN], *p;
> - char __user *s = buffer;
> + unsigned long *i, *min, *max;
> + int vleft, first = 1, err = 0;
> + size_t left;
> + char __user *buffer = (char __user *) _buffer;
>
> if (!data || !table->maxlen || !*lenp ||
> (*ppos && !write)) {
> @@ -2315,82 +2411,42 @@ static int __do_proc_doulongvec_minmax(v
> left = *lenp;
>
> for (; left && vleft--; i++, min++, max++, first=0) {
> + unsigned long val;
> +
> if (write) {
> - while (left) {
> - char c;
> - if (get_user(c, s))
> - return -EFAULT;
> - if (!isspace(c))
> - break;
> - left--;
> - s++;
> - }
> - if (!left)
> - break;
> - neg = 0;
> - len = left;
> - if (len > TMPBUFLEN-1)
> - len = TMPBUFLEN-1;
> - if (copy_from_user(buf, s, len))
> - return -EFAULT;
> - buf[len] = 0;
> - p = buf;
> - if (*p == '-' && left > 1) {
> - neg = 1;
> - p++;
> - }
> - if (*p < '0' || *p > '9')
> - break;
> - val = simple_strtoul(p, &p, 0) * convmul / convdiv ;
> - len = p-buf;
> - if ((len < left) && *p && !isspace(*p))
> + bool neg;
> +
> + err = proc_skip_wspace(&buffer, &left);
> + if (err)
> + return err;
> + err = proc_get_ulong(&buffer, &left, &val, &neg,
> + proc_wspace_sep,
> + sizeof(proc_wspace_sep), NULL);
> + if (err)
> break;
> if (neg)
> - val = -val;
> - s += len;
> - left -= len;
> -
> - if(neg)
> continue;
> if ((min && val < *min) || (max && val > *max))
> continue;
> *i = val;
> } else {
> - p = buf;
> - if (!first)
> - *p++ = '\t';
> - sprintf(p, "%lu", convdiv * (*i) / convmul);
> - len = strlen(buf);
> - if (len > left)
> - len = left;
> - if(copy_to_user(s, buf, len))
> - return -EFAULT;
> - left -= len;
> - s += len;
> - }
> - }
> -
> - if (!write && !first && left) {
> - if(put_user('\n', s))
> - return -EFAULT;
> - left--, s++;
> - }
> - if (write) {
> - while (left) {
> - char c;
> - if (get_user(c, s++))
> - return -EFAULT;
> - if (!isspace(c))
> + val = convdiv * (*i) / convmul;
> + err = proc_put_ulong(&buffer, &left, val, 0, first,
> + '\t');
> + if (err)
> break;
> - left--;
> }
> }
> +
> + if (!write && !first && left && !err)
> + err = proc_put_char(&buffer, &left, '\n');
> + if (write && !err)
> + err = proc_skip_wspace(&buffer, &left);
> if (write && first)
> - return -EINVAL;
> + return err ? : -EINVAL;
> *lenp -= left;
> *ppos += *lenp;
> return 0;
> -#undef TMPBUFLEN
> }
>
> static int do_proc_doulongvec_minmax(struct ctl_table *table, int write,
> @@ -2451,7 +2507,7 @@ int proc_doulongvec_ms_jiffies_minmax(st
> }
>
>
> -static int do_proc_dointvec_jiffies_conv(int *negp, unsigned long *lvalp,
> +static int do_proc_dointvec_jiffies_conv(bool *negp, unsigned long *lvalp,
> int *valp,
> int write, void *data)
> {
> @@ -2463,7 +2519,7 @@ static int do_proc_dointvec_jiffies_conv
> int val = *valp;
> unsigned long lval;
> if (val < 0) {
> - *negp = -1;
> + *negp = 1;
> lval = (unsigned long)-val;
> } else {
> *negp = 0;
> @@ -2474,7 +2530,7 @@ static int do_proc_dointvec_jiffies_conv
> return 0;
> }
>
> -static int do_proc_dointvec_userhz_jiffies_conv(int *negp, unsigned long *lvalp,
> +static int do_proc_dointvec_userhz_jiffies_conv(bool *negp, unsigned long *lvalp,
> int *valp,
> int write, void *data)
> {
> @@ -2486,7 +2542,7 @@ static int do_proc_dointvec_userhz_jiffi
> int val = *valp;
> unsigned long lval;
> if (val < 0) {
> - *negp = -1;
> + *negp = 1;
> lval = (unsigned long)-val;
> } else {
> *negp = 0;
> @@ -2497,7 +2553,7 @@ static int do_proc_dointvec_userhz_jiffi
> return 0;
> }
>
> -static int do_proc_dointvec_ms_jiffies_conv(int *negp, unsigned long *lvalp,
> +static int do_proc_dointvec_ms_jiffies_conv(bool *negp, unsigned long *lvalp,
> int *valp,
> int write, void *data)
> {
> @@ -2507,7 +2563,7 @@ static int do_proc_dointvec_ms_jiffies_c
> int val = *valp;
> unsigned long lval;
> if (val < 0) {
> - *negp = -1;
> + *negp = 1;
> lval = (unsigned long)-val;
> } else {
> *negp = 0;
These functions have so much lines of code. I think you can make them
less. Please refer to strsep().
--
Regards,
Changli Gao(xiaosuo@gmail.com)
^ permalink raw reply
* Re: [Patch 2/3] sysctl: add proc_do_large_bitmap
From: Changli Gao @ 2010-04-09 10:33 UTC (permalink / raw)
To: Amerigo Wang
Cc: linux-kernel, Octavian Purdila, ebiederm, Eric Dumazet, netdev,
Neil Horman, David Miller
In-Reply-To: <20100409101503.5051.3805.sendpatchset@localhost.localdomain>
On Fri, Apr 9, 2010 at 6:11 PM, Amerigo Wang <amwang@redhat.com> wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
>
> The new function can be used to read/write large bitmaps via /proc. A
> comma separated range format is used for compact output and input
> (e.g. 1,3-4,10-10).
>
> Writing into the file will first reset the bitmap then update it
> based on the given input.
>
We have bitmap_scnprintf() and bitmap_parse_user(), why invent a new suite?
--
Regards,
Changli Gao(xiaosuo@gmail.com)
^ permalink raw reply
* [PATCH] stmmac: updated the drv module version
From: Giuseppe CAVALLARO @ 2010-04-09 10:24 UTC (permalink / raw)
To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1270808662-7115-7-git-send-email-peppe.cavallaro@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/stmmac/stmmac.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index 1a6eb7b..ebebc64 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -20,7 +20,7 @@
Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
*******************************************************************************/
-#define DRV_MODULE_VERSION "Jan_2010"
+#define DRV_MODULE_VERSION "Apr_2010"
#include <linux/stmmac.h>
#include "common.h"
--
1.6.0.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox