Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
From: Junchang Wang @ 2010-12-17  6:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Fenghua Yu, David S. Miller, John Fastabend, Xinan Tang, netdev,
	linux-kernel
In-Reply-To: <1292475627.2603.39.camel@edumazet-laptop>

On Thu, Dec 16, 2010 at 1:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 16 décembre 2010 à 09:52 +0800, Junchang Wang a écrit :
>> Commit 564824b0c52c34692d had been used in the experiments, but the problem
>> remained unsolved.
>>
>> SLUB was used, and both servers were equipped with 8G physical memory.
>> Is there any
>> additional information I can provide?
>>
>
> Yes, sure, you could provide a description of the bench you used, and
> data you gathered to make the conclusion that NUMA was a problem.
>
Under the current circumstances (1Mpps), we can hardly see side effects
from memory allocator. With higher speed (say, 5Mpps with this patch set),
the problem emerged.

I'll continue this work after the patch set is done.


Thanks.
-- 
--Junchang

^ permalink raw reply

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
From: Junchang Wang @ 2010-12-17  6:22 UTC (permalink / raw)
  To: Changli Gao
  Cc: Fenghua Yu, Eric Dumazet, David S. Miller, Fastabend, John R,
	Tang, Xinan, netdev, linux-kernel
In-Reply-To: <AANLkTingt6SM+uasicwPSJRE5vYpvJ=GyFf3bbKhO72G@mail.gmail.com>

On Thu, Dec 16, 2010 at 9:28 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> On Thu, Dec 16, 2010 at 9:14 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
>>
>> SKF_AD_QUEUE doesn't know number of rx queues. Thus user application can't
>> specify right SKF_AD_QUEUE.
>
> It is wrong. AFAIK, you can get the queue number through
> /sys/class/net/eth*/queues/ or /proc/interrupts
>

Valuable comment. Thanks.

>
> If you turn to SKF_AD_QUEUE, I think no patch for kernel is needed.
>
This patch set is about parallelization of socket interfaces to gain
performance boost (say, from 1Mpps to around 5Mpps), rather than
simply bounding socket to cpu/queue. Therefore, it does worth having.


Thanks.
-- 
--Junchang

^ permalink raw reply

* Re: [PATCH V7 3/8] posix clocks: introduce dynamic clocks
From: Richard Cochran @ 2010-12-17  6:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-api, netdev, Alan Cox, Arnd Bergmann,
	Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
	Peter Zijlstra, Rodolfo Giometti
In-Reply-To: <alpine.LFD.2.00.1012161939340.12146@localhost6.localdomain6>

On Thu, Dec 16, 2010 at 09:56:38PM +0100, Thomas Gleixner wrote:
> > +void *posix_clock_private(struct file *fp);
> 
> Leftover ? There is neither a caller nor an implementation

Yes, you are right. Sorry.

This patch series has been through several contortions, and this
current one won't be the last!

As to the other comments, thanks for your careful review. I will adapt
the patch set accordingly.

Richard

^ permalink raw reply

* Re: [PATCH 1/3] Kernel interfaces for multiqueue aware socket
From: Eric Dumazet @ 2010-12-17  6:50 UTC (permalink / raw)
  To: Junchang Wang
  Cc: Changli Gao, Fenghua Yu, David S. Miller, Fastabend, John R,
	Tang, Xinan, netdev, linux-kernel
In-Reply-To: <AANLkTi=oXCqPHYs9reF9WL46Wxi_b8vL_NMSsAJJ34wD@mail.gmail.com>

Le vendredi 17 décembre 2010 à 14:22 +0800, Junchang Wang a écrit :
> On Thu, Dec 16, 2010 at 9:28 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> > On Thu, Dec 16, 2010 at 9:14 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> >>
> >> SKF_AD_QUEUE doesn't know number of rx queues. Thus user application can't
> >> specify right SKF_AD_QUEUE.
> >
> > It is wrong. AFAIK, you can get the queue number through
> > /sys/class/net/eth*/queues/ or /proc/interrupts
> >
> 
> Valuable comment. Thanks.
> 
> >
> > If you turn to SKF_AD_QUEUE, I think no patch for kernel is needed.
> >
> This patch set is about parallelization of socket interfaces to gain
> performance boost (say, from 1Mpps to around 5Mpps), rather than
> simply bounding socket to cpu/queue. Therefore, it does worth having.
> 

Definitely, but this needs to be designed so that it can be used by even
dumb applications :)




^ permalink raw reply

* [NETWORK] Firmware file for tehuti
From: Joe Jin @ 2010-12-17  7:04 UTC (permalink / raw)
  To: Alexander Indenbaum, Andy Gospodarek
  Cc: netdev, linux-kernel, Guru Anbalagane, greg.marsden@oracle.com,
	DuanZhenzhong, Joe Jin

Hi,

Regarding firmware for driver tehuti, it request firmware name is
tehuti/firmware.bin,
but from kernel source found no such file but have tehuti/bdx.bin.ihex,
so the firmware
should be bdx.bin rather than firmware.bin?

Thanks,
Joe

-- 
Oracle <http://www.oracle.com>
Joe Jin | Team Leader, Software Development | +8610.8278.6295
ORACLE | Linux and Virtualization
Incubator Building 2-A ZPark | Beijing China, 100094

^ permalink raw reply

* Re: [PATCH V7 4/8] posix clocks: hook dynamic clocks into system calls
From: Richard Cochran @ 2010-12-17  7:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-api, netdev, Alan Cox, Arnd Bergmann,
	Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
	Peter Zijlstra, Rodolfo Giometti
In-Reply-To: <alpine.LFD.2.00.1012162156580.12146@localhost6.localdomain6>

On Fri, Dec 17, 2010 at 12:20:44AM +0100, Thomas Gleixner wrote:
> On Thu, 16 Dec 2010, Richard Cochran wrote:
> > +
> > +static inline bool clock_is_static(const clockid_t id)
> > +{
> > +	if (0 == (id & ~CLOCKFD_MASK))
> > +		return true;
> > +	if (CLOCKFD == (id & CLOCKFD_MASK))
> > +		return false;
> 
> Please use the usual kernel notation.

Sorry, what do you mean?

> >  	return CLOCK_DISPATCH(which_clock, clock_set, (which_clock, &new_tp));
> 
> I really start to wonder whether we should cleanup that whole
> CLOCK_DISPATCH macro magic and provide real inline functions for
> each of the clock functions instead.

I'm glad you suggested that.

IMHO, the CLOCK_DISPATCH thing is calling out to be eliminated, but I
didn't dare to take that upon myself.

> Please don't tell me now that we could even hack this into
> CLOCK_DISPATCH. *shudder*

;^)

> > +int pc_clock_adjtime(struct posix_clock *clk, struct timex *tx)
> > +{
> > +	int err;
> > +
> > +	mutex_lock(&clk->mux);
> > +
> > +	if (clk->zombie)
> 
> Uuurgh. That explains the zombie thing. That's really horrible and we
> can be more clever. When we leave the file lookup to the pc_timer_*
> functions, then we can simply do:
...
> That get's rid of your life time problems, of clk->mutex, clock->zombie
> and makes the code simpler and more robust. And it avoids the whole
> mess in posix-timers.c as well.

The code in the patch set is modeled after a USB driver, namely
drivers/usb/class/usbtmc.c. Alan Cox had brought up the example of a
PTP Hardware Clock appearing as a USB device. So the device might
suddenly disappear, and the zombie field is supposed to cover the case
where the hardware no longer exists, but the file pointer is still
valid.

I'll take a closer look at your suggestion...

> The whole point of this exercise is to provide dynamic clock ids and
> make them available via the standard posix timer interface and aside
> of that have standard file operations for these clocks to provide
> special purpose ioctls, mmap and whatever. And to make it a bit more
> complex these must be removable modules.

Yes, and hot pluggable, too.

> I need to read back the previous discussions, but maybe someone can
> fill me in why we can't make these dynamic things not just live in
> posix_clocks[MAX_CLOCKS ... MAX_DYNAMIC_CLOCKS] (there is no
> requirement to have an unlimited number of those) and just get a
> mapping from the fd to the clock_id? That creates a different set of
> life time problems, but no real horrible ones.

I would summarize the discussion like this:

Alan Cox was strongly in favor of using a regular file descriptor as
the reference to the dynamic clock.

John Stultz thought it wouldn't be too bad to cycle though a number of
static ids, being careful not to every assign the same static id (in
series) to two different clocks.

I implemented Alan's idea, since it seemed like maintaining the
mapping between clocks and static ids would be extra work, but without
any real benefit.

Thanks again for your review,
Richard

^ permalink raw reply

* Possible sequence number corruption? 2.6.32-24
From: George B. @ 2010-12-17  7:20 UTC (permalink / raw)
  To: netdev

I have a box running an Ubuntu kernel (2.6.32-24-server #43-Ubuntu SMP)

It has 4 IP addresses on a bond interface.  We noticed today a strange
problem with a load balancer attempting to connect to one of the
addresses.  We would see a syn from the balancer, the syn-ack goes
back from the Linux box and then an immediate reset from the balancer
(Citrix Netscaler running 8.1).

After taking a packet capture on both machines I see this:

Netscaler sends (or thinks it sends)

Transmission Control Protocol, Src Port: 33860 (33860), Dst Port: 7080
(7080), Seq: 4246152065, Len: 0

Linux box sees this:

Transmission Control Protocol, Src Port: 33860 (33860), Dst Port: 7080
(7080), Seq: 1098970366, Len: 0

Note the sequence number has changed.  This is a flat layer2 network
between them.

Linux replies with:

Transmission Control Protocol, Src Port: 7080 (7080), Dst Port: 33860
(33860), Seq: 1187929616, Ack: 1098970367, Len: 0

And the Netscaler, seeing the out of whack sequence number sends a RST.

Now if I connect to that same IP/Port from another Linux box, it works
fine.  If I connect from the load balancer to a different IP on the
same box it works fine.  If I connect from a different load balancer
to the troublesome IP/port, I get the same result.  It seems unlikely
that two different load balancers would be scrambled in exactly the
same fashion and only to one IP on the box.  In fact, there are two
different servers exhibiting the same problem when either of the load
balancers attempts to connect and it works great to other IP addresses
on those same two servers.  We have not experienced this problem with
these load balancers or kernel version on Linux before but it could be
some "magic" combination of numbers or something, I have seen stranger
things.

The linux interface is a bond interface in balance-xor mode with the
default transmit hash..  The NICs are bnx2 driver v2.0.2

I am not physically at the network so I can not mirror a port and see
what exactly is on the wire but I have taken packet captures using
tcpdump on the machines at both ends that I can provide if desired.
Has anyone else seen or heard of any odd sequence number corruption in
a similar configuration?

Thanks,

George

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: force a fresh timestamp for ingress packets
From: Jarek Poplawski @ 2010-12-17  7:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Changli Gao, netdev, Patrick McHardy
In-Reply-To: <1292542305.2655.25.camel@edumazet-laptop>

On Fri, Dec 17, 2010 at 12:31:45AM +0100, Eric Dumazet wrote:
> Le jeudi 16 décembre 2010 ?? 23:42 +0100, Jarek Poplawski a écrit :
> > On Thu, Dec 16, 2010 at 11:26:03PM +0100, Eric Dumazet wrote:
> > > Le jeudi 16 décembre 2010 ?? 23:08 +0100, Jarek Poplawski a écrit :
> > > 
> > > > Hmm... Do you expect more people start debugging SFQ or I missed your
> > > > point? ;-) Maybe such a change would be reasonable on a cloned skb?
> > > 
> > > My point was to permit an admin to check his ingress shaping works or
> > > not. In this respect, netem was a specialization of a general problem.
> > > 
> > > We had a prior discussion on timestamping skb in RX path, when RPS came
> > > in : Should we take timestamp before RPS or after. At that time we added
> > > a sysctl. Not sure it was the right choice :-(
> > > 
> > > I feel tcpdump (on TX side) should really display time of packet right
> > > before being given to device, but this is probably a matter of taste.
> > 
> > It might be reasonable unless it changes data for later users. That's
> > why I mentioned cloning.
> > 
> > > 
> > > Before commit 8caf153974f2 (net: sch_netem: Fix an inconsistency in
> > > ingress netem timestamps.), this is what was done.
> > 
> > Then why don't you try to let turn it off in netem, where it only
> > matters?

Forget this advice - I lost the point how you tested that.

> Because, if you read again my patch, you'll see :
> 
> #ifdef CONFIG_NET_CLS_ACT
>        if (!skb->tstamp.tv64 || (G_TC_FROM(skb->tc_verd) & AT_INGRESS))
>                 net_timestamp_set(skb);
> #else
> 
> So : 
> 
> If we are handling an INGRESS packet, we force a timestamp renew.

It is wrong because it brings back the inconsistency with ping etc.
described by Alex Sidorenko in the changelog of netem patch, but
use normal (not netem) ingress scheduling (ping results shouldn't
depend on sniffers).

> 
> Therefore :
> 
> We dont need in netem_dequeue() to force :
> 
> -#ifdef CONFIG_NET_CLS_ACT
> -                       /*
> -                        * If it's at ingress let's pretend the delay is
> -                        * from the network (tstamp will be updated).
> -                        */
> -                       if (G_TC_FROM(skb->tc_verd) & AT_INGRESS)
> -                               skb->tstamp.tv64 = 0;
> -#endif
> 
> Since :
> 
> We are going to renew timestamp anyway.
> 
> Conclusion :
> 
> I am eliminating dead code.
> 
> Is that OK ?

Not to me.

Thanks,
Jarek P.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: force a fresh timestamp for ingress packets
From: Eric Dumazet @ 2010-12-17  8:08 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Changli Gao, netdev, Patrick McHardy
In-Reply-To: <20101217073015.GA6907@ff.dom.local>

Le vendredi 17 décembre 2010 à 07:30 +0000, Jarek Poplawski a écrit :

> It is wrong because it brings back the inconsistency with ping etc.
> described by Alex Sidorenko in the changelog of netem patch, but
> use normal (not netem) ingress scheduling (ping results shouldn't
> depend on sniffers).

Care to explain why only netem should take care of this delaying
problem ?
If a packet is delayed on other qdisc, dont we have the same problem ?

Right now, ping lies, giving different results if a sniffer is active or
not.

Care to suggest an alternative patch, because I am lost at this point ?

Thanks

^ permalink raw reply

* Re: [PATCH] via-velocity: fix the WOL bug on 1000M full duplex forced mode
From: David Lv @ 2010-12-17  8:25 UTC (permalink / raw)
  To: Jan Ceuleers, netdev, romieu
In-Reply-To: <4D0A5BBF.7000400@computer.org>

I am very Sorry to cause you some trouble.
There are some differences with this version and previous version.

} else if (SPD_DPX_1000_FULL != pInfo->hw.sOpts.spd_dpx) {
+		if (SPD_DPX_AUTO == pInfo->hw.sOpts.spd_dpx) {

changed to

} else if (SPD_DPX_1000_FULL != vptr->options.spd_dpx) {
+		if (SPD_DPX_AUTO == vptr->options.spd_dpx) {




2010/12/17 Jan Ceuleers <jan.ceuleers@computer.org>:
> On 16/12/10 10:33, David Lv wrote:
>>
>> This patch isn't based on the first patch of the sleep speed 10M.
>> The VIA velocity card can't be waken up by WOL tool on 1000M full
>> duplex forced mode.
>> This patch fixes the bug.
>> Thanks!
>
> This patch is whitespace-damaged. Your previous version (14 December)
> wasn't. Why did you resend? Is there a difference?
>
> Thx, Jan
>

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: force a fresh timestamp for ingress packets
From: Jarek Poplawski @ 2010-12-17  8:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Changli Gao, netdev, Patrick McHardy
In-Reply-To: <1292573297.2655.42.camel@edumazet-laptop>

On Fri, Dec 17, 2010 at 09:08:17AM +0100, Eric Dumazet wrote:
> Le vendredi 17 décembre 2010 ?? 07:30 +0000, Jarek Poplawski a écrit :
> 
> > It is wrong because it brings back the inconsistency with ping etc.
> > described by Alex Sidorenko in the changelog of netem patch, but
> > use normal (not netem) ingress scheduling (ping results shouldn't
> > depend on sniffers).
> 
> Care to explain why only netem should take care of this delaying
> problem ?
> If a packet is delayed on other qdisc, dont we have the same problem ?

netem was treated as a special case just to pretend (lie) about the
net outside (still not depending on sniffers), but it's hard to
believe there are "normal" ping users after netem.

> 
> Right now, ping lies, giving different results if a sniffer is active or
> not.
> 
> Care to suggest an alternative patch, because I am lost at this point ?

Just what I wrote earlier: consider one additional cloning in
dev_queue_xmit_nit or maybe resetting the timestamp with act_skbedit?

Thanks,
Jarek P.

^ permalink raw reply

* [PATCH NEXT 3/3] qlcnic: reset pci function unconditionally during probe
From: Amit Kumar Salecha @ 2010-12-17  8:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Rajesh Borundia
In-Reply-To: <1292576342-1359-1-git-send-email-amit.salecha@qlogic.com>

From: Rajesh Borundia <rajesh.borundia@qlogic.com>

Some boot code drivers dont have cleanup routine, so pci function
remains in unknown state prior to driver load. So during driver load
issue FLR unconditionally.

Update driver version to 5.0.14.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    4 ++--
 drivers/net/qlcnic/qlcnic_main.c |    5 +----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 4028d0c..9c2a02d 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -34,8 +34,8 @@
 
 #define _QLCNIC_LINUX_MAJOR 5
 #define _QLCNIC_LINUX_MINOR 0
-#define _QLCNIC_LINUX_SUBVERSION 13
-#define QLCNIC_LINUX_VERSIONID  "5.0.13"
+#define _QLCNIC_LINUX_SUBVERSION 14
+#define QLCNIC_LINUX_VERSIONID  "5.0.14"
 #define QLCNIC_DRV_IDC_VER  0x01
 #define QLCNIC_DRIVER_VERSION  ((_QLCNIC_LINUX_MAJOR << 16) |\
 		 (_QLCNIC_LINUX_MINOR << 8) | (_QLCNIC_LINUX_SUBVERSION))
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 788850e..11e3a46 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1468,7 +1468,6 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	uint8_t revision_id;
 	uint8_t pci_using_dac;
 	char brd_name[QLCNIC_MAX_BOARD_NAME_LEN];
-	u32 val;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -1530,9 +1529,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto err_out_iounmap;
 
-	val = QLCRD32(adapter, QLCNIC_CRB_DRV_ACTIVE);
-	if (QLC_DEV_CHECK_ACTIVE(val, adapter->portnum))
-		adapter->flags |= QLCNIC_NEED_FLR;
+	adapter->flags |= QLCNIC_NEED_FLR;
 
 	err = adapter->nic_ops->start_firmware(adapter);
 	if (err) {
-- 
1.7.3.2


^ permalink raw reply related

* [PATCH NEXT 2/3] qlcnic: fix ocm window register offset calculation
From: Amit Kumar Salecha @ 2010-12-17  8:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Rajesh Borundia
In-Reply-To: <1292576342-1359-1-git-send-email-amit.salecha@qlogic.com>

From: Rajesh Borundia <rajesh.borundia@qlogic.com>

OCM window register offset was calculated incorrectly for
pci function greater than zero.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_hdr.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index 19328e0..726ef55 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -621,7 +621,7 @@ enum {
 #define PCIX_INT_MASK		(0x10104)
 
 #define PCIX_OCM_WINDOW		(0x10800)
-#define PCIX_OCM_WINDOW_REG(func)	(PCIX_OCM_WINDOW + 0x20 * (func))
+#define PCIX_OCM_WINDOW_REG(func)	(PCIX_OCM_WINDOW + 0x4 * (func))
 
 #define PCIX_TARGET_STATUS	(0x10118)
 #define PCIX_TARGET_STATUS_F1	(0x10160)
-- 
1.7.3.2


^ permalink raw reply related

* [PATCH NEXT 0/3]qlcnic: bug fixes
From: Amit Kumar Salecha @ 2010-12-17  8:58 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty


Hi
  Series of 3 bug fixes, apply them in net-next branch.

-Amit

^ permalink raw reply

* [PATCH NEXT 1/3] qlcnic: fix LED test when interface is down.
From: Amit Kumar Salecha @ 2010-12-17  8:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Sucheta Chakraborty
In-Reply-To: <1292576342-1359-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

When interface is down, create temporary context to config LED.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h         |    1 +
 drivers/net/qlcnic/qlcnic_ethtool.c |   27 ++++++++++++++++++++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index f267da4..4028d0c 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -925,6 +925,7 @@ struct qlcnic_ipaddr {
 
 #define QLCNIC_INTERRUPT_TEST		1
 #define QLCNIC_LOOPBACK_TEST		2
+#define QLCNIC_LED_TEST		3
 
 #define QLCNIC_FILTER_AGE	80
 #define QLCNIC_READD_AGE	20
diff --git a/drivers/net/qlcnic/qlcnic_ethtool.c b/drivers/net/qlcnic/qlcnic_ethtool.c
index 0eaf31b..1e7af70 100644
--- a/drivers/net/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/qlcnic/qlcnic_ethtool.c
@@ -834,16 +834,27 @@ static int qlcnic_set_tso(struct net_device *dev, u32 data)
 static int qlcnic_blink_led(struct net_device *dev, u32 val)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(dev);
+	int max_sds_rings = adapter->max_sds_rings;
+	int dev_down = 0;
 	int ret;
 
-	if (!test_bit(__QLCNIC_DEV_UP, &adapter->state))
-		return -EIO;
+	if (!test_bit(__QLCNIC_DEV_UP, &adapter->state)) {
+		dev_down = 1;
+		if (test_and_set_bit(__QLCNIC_RESETTING, &adapter->state))
+			return -EIO;
+
+		ret = qlcnic_diag_alloc_res(dev, QLCNIC_LED_TEST);
+		if (ret) {
+			clear_bit(__QLCNIC_RESETTING, &adapter->state);
+			return ret;
+		}
+	}
 
 	ret = adapter->nic_ops->config_led(adapter, 1, 0xf);
 	if (ret) {
 		dev_err(&adapter->pdev->dev,
 			"Failed to set LED blink state.\n");
-		return ret;
+		goto done;
 	}
 
 	msleep_interruptible(val * 1000);
@@ -852,10 +863,16 @@ static int qlcnic_blink_led(struct net_device *dev, u32 val)
 	if (ret) {
 		dev_err(&adapter->pdev->dev,
 			"Failed to reset LED blink state.\n");
-		return ret;
+		goto done;
 	}
 
-	return 0;
+done:
+	if (dev_down) {
+		qlcnic_diag_free_res(dev, max_sds_rings);
+		clear_bit(__QLCNIC_RESETTING, &adapter->state);
+	}
+	return ret;
+
 }
 
 static void
-- 
1.7.3.2


^ permalink raw reply related

* Re: [PATCH net-next-2.6] net: force a fresh timestamp for ingress packets
From: Jarek Poplawski @ 2010-12-17  8:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Changli Gao, netdev, Patrick McHardy
In-Reply-To: <20101217083413.GB6907@ff.dom.local>

On Fri, Dec 17, 2010 at 08:34:13AM +0000, Jarek Poplawski wrote:
> On Fri, Dec 17, 2010 at 09:08:17AM +0100, Eric Dumazet wrote:
> > Care to suggest an alternative patch, because I am lost at this point ?
> 
> Just what I wrote earlier: consider one additional cloning in
> dev_queue_xmit_nit

Actually, it's enough to get time once and change it for all current
clones in the loop. 

Jarek P.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: force a fresh timestamp for ingress packets
From: Eric Dumazet @ 2010-12-17  9:26 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Changli Gao, netdev, Patrick McHardy
In-Reply-To: <20101217085937.GC6907@ff.dom.local>

Le vendredi 17 décembre 2010 à 08:59 +0000, Jarek Poplawski a écrit :
> On Fri, Dec 17, 2010 at 08:34:13AM +0000, Jarek Poplawski wrote:
> > On Fri, Dec 17, 2010 at 09:08:17AM +0100, Eric Dumazet wrote:
> > > Care to suggest an alternative patch, because I am lost at this point ?
> > 
> > Just what I wrote earlier: consider one additional cloning in
> > dev_queue_xmit_nit
> 
> Actually, it's enough to get time once and change it for all current
> clones in the loop. 
> 
> Jarek P.

I think we can add this after latest Changli patch :

He does one skb_clone() before calling the sniffers.
We could set timestamp on this clone, instead of original skb.

Problem solved.




^ permalink raw reply

* Re: [PATCH 0/15] RFC: create drivers/net/legacy for ISA, EISA, MCA drivers
From: Jeff Kirsher @ 2010-12-17  9:51 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: paul.gortmaker, Joe Perches, Maciej W. Rozycki, netdev
In-Reply-To: <alpine.LNX.2.01.1012161321500.6484@obet.zrqbmnf.qr>

On Thu, Dec 16, 2010 at 04:22, Jan Engelhardt <jengelh@medozas.de> wrote:
> [adding missing cc:netdev]
>
>
> A few comments, since I have just been made aware of the Netconf slides,
>
>
> Paul Gortmaker wrote:
>>
>>If in fact this series gets agreement, I'm figuring it makes sense to
>>have it go in either at the beginning of a dev cycle, or at the very
>>end
>
> I think I have seen git properly coping with renames, if your and other
> developers' branches are git-merged (patchwise application of course
> leads to rejects).
>
>
>>classifying drivers according to the physical layer they support or if
>>multiple are, such as with the Ethernet that is backwards compatible,
>>the newest variation they do?
>
> Above all I would probably like to see
>
> - getting rid of the "1000 Mbit" and "10000 Mbit" submenus. jme.ko for
> example is put under 1000 Mbit, but the jme chip I have ("197b:0260 (rev
> 02) JMicron Technology Corp. JMC260 PCI Express Fast Ethernet
> Controller") does not do 1000.

Organization of the Kconfig menus/sub-menus is also goal in this
organization of /drivers/net.  I like the 10/100, 1000, and 10GbE
sub-menus, but if we find that there are issues with drivers that
support hardware in more than one category, then we should look at the
best way to handle those drivers or organization of the Ethernet
sub-menus.

>
> - getting rid of CONFIG_NET_PCI and move dependencies to individual
> driver config options. It looks odd that only drivers from the 10/100
> Mbit category — and then, not even all — depend on this. Of course you
> probably won't see Gbit adapters for ISA, but SUN, 3com, HP cards are
> also available on PCI.

Agreed, this is fixed in some of the preliminary patches that Joe and
I have been working on.

>
> Jeff Kirsher puts forward in:
>>
>>http://vger.kernel.org/netconf2010_slides/netconf-jtk.pdf
>>
>>Create /drivers/net/sw for vlan, 8021q, bonding, bridging, etc drivers
>
> That does not seem too nice. Currently, bridge is at net/bridge/, and
> moving it into drivers/net/sw/bridge/ is just elongating the path name
> for a rename that is.. a bit disputable as far as I read the thread.

The reasoning behind this idea was that we are trying to organize
/drivers/net so that we have /drivers/net/<L2 technologies> for
example:
/drivers/net/appletalk
/drivers/net/arcnet
/drivers/net/ethernet
/drivers/net/tokenring
/drivers/net/wimax
/drivers/net/wireless

Stack drivers like vlan, bonding, bridging do not fall into this and
so we thought it best to create a /drivers/net/sw directory for stack
drivers like this.  For now, we planned on keeping stack drivers in
/drivers/net/.  IMHO I still think that moving stack drivers to
/drivers/net/sw is still a good idea but it does not have to happen
right away.

>
> What is in net/, leave it there for now.
> drivers/net/ethernet/, I am fine with.

Most of the drivers in /drivers/net are Ethernet drivers and should
reside in /drivers/net/ethernet (if they are Ethernet drivers) when
the directory structure exists.

-- 
Cheers,
Jeff

^ permalink raw reply

* Re: [PATCH V7 4/8] posix clocks: hook dynamic clocks into system calls
From: Thomas Gleixner @ 2010-12-17 10:03 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	John Stultz, Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti
In-Reply-To: <20101217070450.GB2982-7KxsofuKt4IfAd9E5cN8NEzG7cXyKsk/@public.gmane.org>

On Fri, 17 Dec 2010, Richard Cochran wrote:
> On Fri, Dec 17, 2010 at 12:20:44AM +0100, Thomas Gleixner wrote:
> > > +static inline bool clock_is_static(const clockid_t id)
> > > +{
> > > +	if (0 == (id & ~CLOCKFD_MASK))
> > > +		return true;
> > > +	if (CLOCKFD == (id & CLOCKFD_MASK))
> > > +		return false;
> > 
> > Please use the usual kernel notation.
>
> Sorry, what do you mean?

  if (!id & ~CLOCKFD_MASK)
  if ((id & CLOCKFD_MASK) == CLOCKFD)

Not a real problem. I just always stumble over that coding style.

> The code in the patch set is modeled after a USB driver, namely
> drivers/usb/class/usbtmc.c. Alan Cox had brought up the example of a
> PTP Hardware Clock appearing as a USB device. So the device might
> suddenly disappear, and the zombie field is supposed to cover the case
> where the hardware no longer exists, but the file pointer is still
> valid.

Hmm ok, so you use clk->mutex to prevent the underlying device from
going away while you call a function and clk->zombie indicates that
it's gone and clk just kept around for an open file descriptor. Now it
makes sense, but that wants a big fat comment in the code. :)

Doesn't the same problem exist for the file operations of patch 3? I
think you want the very same protection there unless you want to do
the same magic in your USB driver as well.

So you could do the following:

static int get_fd_clk(struct posix_clock_descr *cd, struct file *fp)
{
	cd->fp = fp;
	cd->clk = fp->private_data;
	mutex_lock(&cd->clk->mutex);
	if (!cd->clk->zombie)
		return 0;
	mutex_unlock(&cd->clk->mutex);
	return -ENODEV;
}

static void put_fd_clk(struct posix_clock_descr *cd)
{
	mutex_unlock(&cd->clk->mutex);
}

static int get_posix_clock(const clockid_t id, struct posix_clock_descr *cd)
{
	struct file *fp = fget(CLOCKID_TO_FD(id));
	int ret;

	if (!fp || fp->f_op->open != posix_clock_open || !fp->private_data)
	   	return -ENODEV;
	ret = get_fd_clk(cd, fp);
	if (ret)
		fput(fp);
	return ret;
}

static void put_posix_clock(struct posix_clock_descr *cd)
{
	put_fd_clk(cd);
	fput(cd->fp);
}

So your fops would call get_fd_clk() and put_fd_clk() and the pc_timer
ones get_posix_clock() and put_posix_clock() or whatever sensible
function names you come up with.

Thoughts ?

> I would summarize the discussion like this:
> 
> Alan Cox was strongly in favor of using a regular file descriptor as
> the reference to the dynamic clock.
> 
> John Stultz thought it wouldn't be too bad to cycle though a number of
> static ids, being careful not to every assign the same static id (in
> series) to two different clocks.
> 
> I implemented Alan's idea, since it seemed like maintaining the
> mapping between clocks and static ids would be extra work, but without
> any real benefit.

Yes, he's right. Was too tired to think about the clock ids assingment
when devices are removed and plugged. The fd mapping makes that all go
away.

Thanks,

	tglx

^ permalink raw reply

* [BUG] 2.6.37-rc5 Memory leak in net/ipv4/udp.c
From: Lothar Waßmann @ 2010-12-17 10:18 UTC (permalink / raw)
  To: netdev

Hi,

the kernel memory leak detector spews the message:
|kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
cat /sys/kernel/debug/kmemleak
|unreferenced object 0xc7a1c000 (size 5120):
|  comm "swapper", pid 1, jiffies 4294937513 (age 2320.120s)
|  hex dump (first 32 bytes):
|    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
|    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
|  backtrace:
|    [<c00112b8>] alloc_large_system_hash+0x188/0x224
|    [<c001be14>] udp_table_init+0x44/0x180
|    [<c001bf64>] udp_init+0x14/0x78
|    [<c001c620>] inet_init+0x138/0x240
|    [<c0030368>] do_one_initcall+0x58/0x1a8
|    [<c00083c8>] kernel_init+0x98/0x14c
|    [<c0037714>] kernel_thread_exit+0x0/0x8
|    [<ffffffff>] 0xffffffff
|unreferenced object 0xc7a26000 (size 5120):
|  comm "swapper", pid 1, jiffies 4294937513 (age 2320.130s)
|  hex dump (first 32 bytes):
|    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
|    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
|  backtrace:
|    [<c00112b8>] alloc_large_system_hash+0x188/0x224
|    [<c001be14>] udp_table_init+0x44/0x180
|    [<c001bff0>] udplite4_register+0x10/0x94
|    [<c001c624>] inet_init+0x13c/0x240
|    [<c0030368>] do_one_initcall+0x58/0x1a8
|    [<c00083c8>] kernel_init+0x98/0x14c
|    [<c0037714>] kernel_thread_exit+0x0/0x8
|    [<ffffffff>] 0xffffffff

The offending code in net/ipv4/udp.c is:
|void __init udp_table_init(struct udp_table *table, const char *name)
|{
|	unsigned int i;
|
|	if (!CONFIG_BASE_SMALL)
|		table->hash = alloc_large_system_hash(name,
|			2 * sizeof(struct udp_hslot),
|			uhash_entries,
|			21, /* one slot per 2 MB */
|			0,
|			&table->log,
|			&table->mask,
|			64 * 1024);
|	/*
|	 * Make sure hash table has the minimum size
|	 */
|	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
|		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
|				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
In case of !CONFIG_BASE_SMALL and 'table->mask < UDP_HTABLE_SIZE_MIN - 1)'
the memory allocated in the previous if clause becomes inacessible!

Shouldn't this be:
|	if (!CONFIG_BASE_SMALL && table->mask >= UDP_HTABLE_SIZE_MIN - 1) {
|		table->hash = alloc_large_system_hash(name,
|			2 * sizeof(struct udp_hslot),
|			uhash_entries,
|			21, /* one slot per 2 MB */
|			0,
|			&table->log,
|			&table->mask,
|			64 * 1024);
|	} else {
|		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
|				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
[...]



Lothar Waßmann
-- 
___________________________________________________________

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | info@karo-electronics.de
___________________________________________________________

^ permalink raw reply

* Re: [BUG] 2.6.37-rc5 Memory leak in net/ipv4/udp.c
From: Eric Dumazet @ 2010-12-17 10:35 UTC (permalink / raw)
  To: Lothar Waßmann; +Cc: netdev
In-Reply-To: <19723.14557.349975.821418@ipc1.ka-ro>

Le vendredi 17 décembre 2010 à 11:18 +0100, Lothar Waßmann a écrit :
> Hi,
> 
> the kernel memory leak detector spews the message:
> |kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> cat /sys/kernel/debug/kmemleak
> |unreferenced object 0xc7a1c000 (size 5120):
> |  comm "swapper", pid 1, jiffies 4294937513 (age 2320.120s)
> |  hex dump (first 32 bytes):
> |    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> |    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> |  backtrace:
> |    [<c00112b8>] alloc_large_system_hash+0x188/0x224
> |    [<c001be14>] udp_table_init+0x44/0x180
> |    [<c001bf64>] udp_init+0x14/0x78
> |    [<c001c620>] inet_init+0x138/0x240
> |    [<c0030368>] do_one_initcall+0x58/0x1a8
> |    [<c00083c8>] kernel_init+0x98/0x14c
> |    [<c0037714>] kernel_thread_exit+0x0/0x8
> |    [<ffffffff>] 0xffffffff
> |unreferenced object 0xc7a26000 (size 5120):
> |  comm "swapper", pid 1, jiffies 4294937513 (age 2320.130s)
> |  hex dump (first 32 bytes):
> |    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> |    aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> |  backtrace:
> |    [<c00112b8>] alloc_large_system_hash+0x188/0x224
> |    [<c001be14>] udp_table_init+0x44/0x180
> |    [<c001bff0>] udplite4_register+0x10/0x94
> |    [<c001c624>] inet_init+0x13c/0x240
> |    [<c0030368>] do_one_initcall+0x58/0x1a8
> |    [<c00083c8>] kernel_init+0x98/0x14c
> |    [<c0037714>] kernel_thread_exit+0x0/0x8
> |    [<ffffffff>] 0xffffffff
> 
> The offending code in net/ipv4/udp.c is:
> |void __init udp_table_init(struct udp_table *table, const char *name)
> |{
> |	unsigned int i;
> |
> |	if (!CONFIG_BASE_SMALL)
> |		table->hash = alloc_large_system_hash(name,
> |			2 * sizeof(struct udp_hslot),
> |			uhash_entries,
> |			21, /* one slot per 2 MB */
> |			0,
> |			&table->log,
> |			&table->mask,
> |			64 * 1024);
> |	/*
> |	 * Make sure hash table has the minimum size
> |	 */
> |	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
> |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> In case of !CONFIG_BASE_SMALL and 'table->mask < UDP_HTABLE_SIZE_MIN - 1)'
> the memory allocated in the previous if clause becomes inacessible!
> 
> Shouldn't this be:
> |	if (!CONFIG_BASE_SMALL && table->mask >= UDP_HTABLE_SIZE_MIN - 1) {
> |		table->hash = alloc_large_system_hash(name,
> |			2 * sizeof(struct udp_hslot),
> |			uhash_entries,
> |			21, /* one slot per 2 MB */
> |			0,
> |			&table->log,
> |			&table->mask,
> |			64 * 1024);
> |	} else {
> |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> [...]
> 
> 
> 
> Lothar Waßmann

Nothing we can do about it, there is no API to reverse the
alloc_large_system_hash() effect. We could call kmemleak api to at least
avoid this false alarm.

We really want a minimum size for the UDP hash table, because our algos
depend on this.

Why on your config alloc_large_system_hash() is allocating 5120 bytes, I
dont know :(



^ permalink raw reply

* Re: e1000: more than two seconds to get the flag RUNNING
From: Nicolas Dichtel @ 2010-12-17 10:43 UTC (permalink / raw)
  To: Ronciak, John
  Cc: Kirsher, Jeffrey T, Brandeburg, Jesse, Allan, Bruce W,
	Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Waskiewicz Jr, Peter P, Duyck, Alexander H, netdev
In-Reply-To: <DDC57477F5D6F845A0DDCB99D3C4812D0D2E664702@orsmsx510.amr.corp.intel.com>

Le 16.12.2010 17:44, Ronciak, John a écrit :
> I'm assuming when you say "get the flag RUNNING" you mean to get link.  Is that right?  Are you connected to a switch?  It is entirely possible to wait 2 seconds to get link on a 1000Base-T (RJ45) link.  By specification link can take as much as 4 seconds.  Do you have spanning tree enabled on the switch?  That could be delaying link as well.
Yes, I mean 'get the link'. I was connected to a switch, but no spanning tree.

> 
> To check this connect it back to back to another system.  With this old of an adapter you may need to connect the two systems with a cross-over cable.
I get the same issue.


Regards,
Nicolas

> 
> Cheers,
> John
> 
> 
>> -----Original Message-----
>> From: Nicolas Dichtel [mailto:nicolas.dichtel@6wind.com]
>> Sent: Thursday, December 16, 2010 8:17 AM
>> To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Wyborny,
>> Carolyn; Skidmore, Donald C; Rose, Gregory V; Waskiewicz Jr, Peter P;
>> Duyck, Alexander H; Ronciak, John
>> Cc: netdev
>> Subject: e1000: more than two seconds to get the flag RUNNING
>>
>> Hi,
>>
>> maybe this problem has already been discussed, but I didn't find the
>> thread.
>> When I put an interface managed by the e1000 driver up, down and up
>> again, I must wait more than 2 seconds to get the flag running again.
>>
>> Here is the sequence:
>>
>> shelby:~# uname -a
>> Linux shelby 2.6.37-rc5+ #9 SMP Wed Dec 15 13:16:10 EST 2010 i686
>> GNU/Linux shelby:~# lsmod | grep e1000
>> e1000                  76543  0
>> shelby:~# lspci | grep Gigabit
>> 01:09.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet
>> Controller (rev 02) shelby:~# cat check_link_state.sh #!/bin/bash
>>
>> sleep_time=$1
>> ip link set eth0 up
>> ip link set eth0 down
>> ip link set eth0 up
>> while [ $sleep_time -gt 0 ] ; do
>>      date
>>      #ip link show eth0
>>      ifconfig eth0 | grep MULTICAST
>>      sleep 1
>>      echo ""
>>      sleep_time=`expr $sleep_time - 1`
>> done
>> shelby:~# ifconfig eth0
>> eth0      Link encap:Ethernet  HWaddr 00:30:1b:b4:dc:88
>>            inet addr:10.16.0.72  Bcast:10.16.0.255  Mask:255.255.255.0
>>            inet6 addr: fe80::230:1bff:feb4:dc88/64 Scope:Link
>>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>            RX packets:22051 errors:0 dropped:0 overruns:0 frame:0
>>            TX packets:63 errors:0 dropped:0 overruns:0 carrier:0
>>            collisions:0 txqueuelen:1000
>>            RX bytes:2480685 (2.3 MiB)  TX bytes:5242 (5.1 KiB)
>>
>> shelby:~# ./check_link_state.sh 3
>> [83270.080175] ADDRCONF(NETDEV_UP): eth0: link is not ready Thu Dec 16
>> 12:45:56 EST 2010
>>            UP BROADCAST MULTICAST  MTU:1500  Metric:1
>>
>> Thu Dec 16 12:45:57 EST 2010
>>            UP BROADCAST MULTICAST  MTU:1500  Metric:1 [83271.828371]
>> e1000: eth0 NIC Link is Up 100 Mbps Half Duplex, Flow Control: None
>> [83271.835878] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>
>> Thu Dec 16 12:45:58 EST 2010
>>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>
>> shelby:~#
>>
>> I get the same result with a 2.6.15, so it seems that the problem is
>> here since a long time.
>> Has anyone an input for this problem?
>>
>>
>> Regards,
>> Nicolas

-- 
Nicolas DICHTEL
6WIND
R&D Engineer

Tel: +33 1 39 30 92 10
Fax: +33 1 39 30 92 11
nicolas.dichtel@6wind.com
www.6wind.com
Join the Multicore Packet Processing Forum: www.multicorepacketprocessing.com

Ce courriel ainsi que toutes les pièces jointes, est uniquement destiné à son ou
ses destinataires. Il contient des informations confidentielles qui sont la
propriété de 6WIND. Toute révélation, distribution ou copie des informations
qu'il contient est strictement interdite. Si vous avez reçu ce message par
erreur, veuillez immédiatement le signaler à l'émetteur et détruire toutes les
données reçues.

This e-mail message, including any attachments, is for the sole use of the
intended recipient(s) and contains information that is confidential and
proprietary to 6WIND. All unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender
by reply e-mail and destroy all copies of the original message.


^ permalink raw reply

* Re: [BUG] 2.6.37-rc5 Memory leak in net/ipv4/udp.c
From: Lothar Waßmann @ 2010-12-17 11:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1292582116.2906.5.camel@edumazet-laptop>

Hi,

Eric Dumazet writes:
> Le vendredi 17 décembre 2010 à 11:18 +0100, Lothar Waßmann a écrit :
> > The offending code in net/ipv4/udp.c is:
> > |void __init udp_table_init(struct udp_table *table, const char *name)
> > |{
> > |	unsigned int i;
> > |
> > |	if (!CONFIG_BASE_SMALL)
> > |		table->hash = alloc_large_system_hash(name,
> > |			2 * sizeof(struct udp_hslot),
> > |			uhash_entries,
> > |			21, /* one slot per 2 MB */
> > |			0,
> > |			&table->log,
> > |			&table->mask,
> > |			64 * 1024);
> > |	/*
> > |	 * Make sure hash table has the minimum size
> > |	 */
> > |	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
> > |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> > |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> > In case of !CONFIG_BASE_SMALL and 'table->mask < UDP_HTABLE_SIZE_MIN - 1)'
> > the memory allocated in the previous if clause becomes inacessible!
> > 
> > Shouldn't this be:
> > |	if (!CONFIG_BASE_SMALL && table->mask >= UDP_HTABLE_SIZE_MIN - 1) {
> > |		table->hash = alloc_large_system_hash(name,
> > |			2 * sizeof(struct udp_hslot),
> > |			uhash_entries,
> > |			21, /* one slot per 2 MB */
> > |			0,
> > |			&table->log,
> > |			&table->mask,
> > |			64 * 1024);
> > |	} else {
> > |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> > |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> > [...]
> > 
> 
> Nothing we can do about it, there is no API to reverse the
> alloc_large_system_hash() effect. We could call kmemleak api to at least
> avoid this false alarm.
> 
Do you have to call it at all in case of table->mask < UDP_HTABLE_SIZE_MIN - 1?

> We really want a minimum size for the UDP hash table, because our algos
> depend on this.
> 
I can't see why this could not be achieved by doing _either_
alloc_large_system_hash() _OR_ kmalloc() as stated above, but not
both.


Lothar Waßmann
-- 
___________________________________________________________

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | info@karo-electronics.de
___________________________________________________________

^ permalink raw reply

* Re: [BUG] 2.6.37-rc5 Memory leak in net/ipv4/udp.c
From: Eric Dumazet @ 2010-12-17 11:32 UTC (permalink / raw)
  To: Lothar Waßmann; +Cc: netdev
In-Reply-To: <19723.17775.241784.993744@ipc1.ka-ro>

Le vendredi 17 décembre 2010 à 12:11 +0100, Lothar Waßmann a écrit :
> Hi,
> 
> Eric Dumazet writes:
> > Le vendredi 17 décembre 2010 à 11:18 +0100, Lothar Waßmann a écrit :
> > > The offending code in net/ipv4/udp.c is:
> > > |void __init udp_table_init(struct udp_table *table, const char *name)
> > > |{
> > > |	unsigned int i;
> > > |
> > > |	if (!CONFIG_BASE_SMALL)
> > > |		table->hash = alloc_large_system_hash(name,
> > > |			2 * sizeof(struct udp_hslot),
> > > |			uhash_entries,
> > > |			21, /* one slot per 2 MB */
> > > |			0,
> > > |			&table->log,
> > > |			&table->mask,
> > > |			64 * 1024);
> > > |	/*
> > > |	 * Make sure hash table has the minimum size
> > > |	 */
> > > |	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
> > > |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> > > |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> > > In case of !CONFIG_BASE_SMALL and 'table->mask < UDP_HTABLE_SIZE_MIN - 1)'
> > > the memory allocated in the previous if clause becomes inacessible!
> > > 
> > > Shouldn't this be:
> > > |	if (!CONFIG_BASE_SMALL && table->mask >= UDP_HTABLE_SIZE_MIN - 1) {
> > > |		table->hash = alloc_large_system_hash(name,
> > > |			2 * sizeof(struct udp_hslot),
> > > |			uhash_entries,
> > > |			21, /* one slot per 2 MB */
> > > |			0,
> > > |			&table->log,
> > > |			&table->mask,
> > > |			64 * 1024);
> > > |	} else {
> > > |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> > > |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> > > [...]
> > > 
> > 
> > Nothing we can do about it, there is no API to reverse the
> > alloc_large_system_hash() effect. We could call kmemleak api to at least
> > avoid this false alarm.
> > 
> Do you have to call it at all in case of table->mask < UDP_HTABLE_SIZE_MIN - 1?
> 

We call alloc_large_system_hash() asking it to size the table _itself_.
We give some hints : 

- How many slots per MB of avail memory.
- An upper limit (64*1024 slots because we only handle 65536 udp ports)
- but not a lower limit (not available in the API)

Problem is in your case, alloc_large_system_hash() allocates a very
small area. Then we catch the problem, seeing table->mask is too small
for our needs. We prefer to 'lost' this too small memory than crashing
kernel later.


> > We really want a minimum size for the UDP hash table, because our algos
> > depend on this.
> > 
> I can't see why this could not be achieved by doing _either_
> alloc_large_system_hash() _OR_ kmalloc() as stated above, but not
> both.

We definitly want alloc_large_system_hash() for the general case
(nice NUMA spread, while kmalloc() would allocate the hash table on a
single memory node. Not so nice)

One way to handle this problem would be to add a new parameter to
alloc_large_system_hash() to specify a lower limit.
alloc_large_system_hash() would not even try to allocate a too small
array.

Is your system so small ?




^ permalink raw reply

* Re: [BUG] 2.6.37-rc5 Memory leak in net/ipv4/udp.c
From: Lothar Waßmann @ 2010-12-17 11:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1292585534.2906.12.camel@edumazet-laptop>

Eric Dumazet writes:
> Le vendredi 17 décembre 2010 à 12:11 +0100, Lothar Waßmann a écrit :
> > Hi,
> > 
> > Eric Dumazet writes:
> > > Le vendredi 17 décembre 2010 à 11:18 +0100, Lothar Waßmann a écrit :
> > > > The offending code in net/ipv4/udp.c is:
> > > > |void __init udp_table_init(struct udp_table *table, const char *name)
> > > > |{
> > > > |	unsigned int i;
> > > > |
> > > > |	if (!CONFIG_BASE_SMALL)
> > > > |		table->hash = alloc_large_system_hash(name,
> > > > |			2 * sizeof(struct udp_hslot),
> > > > |			uhash_entries,
> > > > |			21, /* one slot per 2 MB */
> > > > |			0,
> > > > |			&table->log,
> > > > |			&table->mask,
> > > > |			64 * 1024);
> > > > |	/*
> > > > |	 * Make sure hash table has the minimum size
> > > > |	 */
> > > > |	if (CONFIG_BASE_SMALL || table->mask < UDP_HTABLE_SIZE_MIN - 1) {
> > > > |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> > > > |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> > > > In case of !CONFIG_BASE_SMALL and 'table->mask < UDP_HTABLE_SIZE_MIN - 1)'
> > > > the memory allocated in the previous if clause becomes inacessible!
> > > > 
> > > > Shouldn't this be:
> > > > |	if (!CONFIG_BASE_SMALL && table->mask >= UDP_HTABLE_SIZE_MIN - 1) {
> > > > |		table->hash = alloc_large_system_hash(name,
> > > > |			2 * sizeof(struct udp_hslot),
> > > > |			uhash_entries,
> > > > |			21, /* one slot per 2 MB */
> > > > |			0,
> > > > |			&table->log,
> > > > |			&table->mask,
> > > > |			64 * 1024);
> > > > |	} else {
> > > > |		table->hash = kmalloc(UDP_HTABLE_SIZE_MIN *
> > > > |				      2 * sizeof(struct udp_hslot), GFP_KERNEL);
> > > > [...]
> > > > 
> > > 
> > > Nothing we can do about it, there is no API to reverse the
> > > alloc_large_system_hash() effect. We could call kmemleak api to at least
> > > avoid this false alarm.
> > > 
> > Do you have to call it at all in case of table->mask < UDP_HTABLE_SIZE_MIN - 1?
> > 
> 
> We call alloc_large_system_hash() asking it to size the table _itself_.
> We give some hints : 
> 
> - How many slots per MB of avail memory.
> - An upper limit (64*1024 slots because we only handle 65536 udp ports)
> - but not a lower limit (not available in the API)
> 
> Problem is in your case, alloc_large_system_hash() allocates a very
> small area. Then we catch the problem, seeing table->mask is too small
> for our needs. We prefer to 'lost' this too small memory than crashing
> kernel later.
> 
table->mask is not altered by alloc_large_system_hash(), so you could
detect the situation beforhand and avoid calling that function in this
case. As far as I can tell there is no need for
alloc_large_system_hash() if you later decide to use kmalloc'ed memory
instead.

The current situation is
if (!CONFIG_BASE_SMALL)
	call alloc_large_system_hash()
if (CONFIG_BASE_SMALL || table->mask < MIN)
	call kmalloc() dropping evnetually allocated memory from the
	previous if clause

My proposal was:
if (!CONFIG_BASE_SMALL && table->mask >= MIN)
	call alloc_large_system_hash()
else
	call kmalloc()

which is functionally equivalent except for the missing call to
alloc_large_system_hash() if the memory allocated by that function is
not used.

> > > We really want a minimum size for the UDP hash table, because our algos
> > > depend on this.
> > > 
> > I can't see why this could not be achieved by doing _either_
> > alloc_large_system_hash() _OR_ kmalloc() as stated above, but not
> > both.
> 
> We definitly want alloc_large_system_hash() for the general case
> (nice NUMA spread, while kmalloc() would allocate the hash table on a
> single memory node. Not so nice)
> 
That would still be the case with my proposed solution.


Lothar Waßmann
-- 
___________________________________________________________

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | info@karo-electronics.de
___________________________________________________________

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox