Netdev List
* Re: SPLICE_F_NONBLOCK semantics...
From: Jens Axboe @ 2009-10-02  7:47 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, eric.dumazet, jgunthorpe, vl, opurdila, netdev,
	linux-kernel
In-Reply-To: <20091001.152717.187318570.davem@davemloft.net>

On Thu, Oct 01 2009, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Thu, 1 Oct 2009 15:21:44 -0700 (PDT)
> 
> > On Thu, 1 Oct 2009, David Miller wrote:
> >> 
> >> It depends upon our interpretation of how you intended the
> >> SPLICE_F_NONBLOCK flag to work when you added it way back
> >> when.
> >> 
> >> Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
> >> (splice: add SPLICE_F_NONBLOCK flag)
> >> 
> >>   It doesn't make the splice itself necessarily nonblocking (because the
> >>   actual file descriptors that are spliced from/to may block unless they
> >>   have the O_NONBLOCK flag set), but it makes the splice pipe operations
> >>   nonblocking.
> >> 
> >> Linus's intention was clear: let SPLICE_F_NONBLOCK control the splice pipe mode only.
> > 
> > Ack. The original intent was for the flag to affect the buffering, not the 
> > end points.
> 
> Great, thanks for reviewing.
> 
> > Although the more I think about it, the more I suspect that the
> > whole NONBLOCK thing should probably have been two bits, and simply
> > been about "nonblocking input" vs "nonblocking output" (so that you
> > could control both sides on a call-by-call basis).
> 
> I think we could still extend things in this way if we wanted to.
> So if you specify the explicit input and/or output nonblock flag,
> it takes precedence over the SPLICE_F_NONBLOCK thing.

Yes I agree, thank god for having a 'flags' parameter for the syscalls
:-). I'll make a note to add and test bidirectional nonblock hints.

The net patch looks fine and correct to me, feel free to add my acked-by
if you want.

-- 
Jens Axboe



* Re: 2.6.32-rc1-git2: Reported regressions from 2.6.31
From: Jaswinder Singh Rajput @ 2009-10-02  7:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <9UCePxij8cB.A.VCG.-3SxKB@chimera>

Hello Rafael,

On Thu, 2009-10-01 at 21:26 +0200, Rafael J. Wysocki wrote:
> [Notes:
> 
>  * Here's the first summary report of known regressions from 2.6.31.  There's
>    not too many of them at the moment, which is nice.
> 
>  * We're still getting quite a number of reports of regressions from 2.6.30 and
>    it's been that way since 2.6.31 was released.  For details please see the
>    summary report of regressions 2.6.30 -> 2.6.31 that will follow shortly.]
> 
> This message contains a list of some regressions from 2.6.31, for which there
> are no fixes in the mainline I know of.  If any of them have been fixed already,
> please let me know.
> 
> If you know of any other unresolved regressions from 2.6.31, please let me know
> as well and I'll add them to the list.  Also, please let me know if any of the
> entries below are invalid.
> 
> Each entry from the list will be sent additionally in an automatic reply to
> this message with CCs to the people involved in reporting and handling the
> issue.
> 
> 
> Listed regressions statistics:
> 
>   Date          Total  Pending  Unresolved
>   ----------------------------------------
>   2009-10-02       22       15           9
> 
> 
> Unresolved regressions
> ----------------------
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14299
> Subject		: oops in wireless, iwl3945 related?
> Submitter	: Pavel Machek <pavel@ucw.cz>
> Date		: 2009-09-29 17:12 (3 days old)
> References	: http://marc.info/?l=linux-kernel&m=125424439725743&w=4
> 

It would be great if you added one more entry, say "Suspected commit :",
since that would help resolve regressions much faster. You can ask the
submitter to find the suspected commit with git bisect and to include the
git bisect links (for more information about git bisect, see
http://kerneltrap.org/node/11753).

Thanks,
--
JSR


* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Jarek Poplawski @ 2009-10-02  7:32 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, kaber, netdev
In-Reply-To: <20091002070819.GA9694@ff.dom.local>

On Fri, Oct 02, 2009 at 07:08:19AM +0000, Jarek Poplawski wrote:
> On 01-10-2009 23:21, Jarek Poplawski wrote:
...
> To make my point clare: [...]

Am I clair? ;-)

Jarek P.


* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Jarek Poplawski @ 2009-10-02  7:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, kaber, netdev
In-Reply-To: <4AC5A7F9.3000005@gmail.com>

On Fri, Oct 02, 2009 at 09:12:57AM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> 
> > To make my point clare: why not something like this?:
> > 
> > static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
> >                          u32 pid, u32 seq, u16 flags, int event)
> > {
> > 	...
> > 	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
> > 	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
> >              gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
> >             gnet_stats_copy_queue(&d, &q->qstats) < 0)
> >                 goto nla_put_failure;
> > 
> > BTW, I'm not sure we need to change the user-visible API for this.
> > (Is it really expected to work after updating gen_stats.h only in
> > iproute?)
> > 
> 
> That would be better indeed; do you want to work on it, or shall I do it?

I want you to work on it.

Thanks,
Jarek P.


* [net-2.6 PATCH] e1000e/igb/ixgbe: Don't report an error if devices don't support AER
From: Jeff Kirsher @ 2009-10-02  7:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Frans Pop, Jeff Kirsher

From: Frans Pop <elendil@planet.nl>

The only error returned by pci_{en,dis}able_pcie_error_reporting() is
-EIO which simply means that Advanced Error Reporting is not supported.
There is no need to report that, so remove the error check from e1000e,
igb and ixgbe.

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/netdev.c    |   13 ++-----------
 drivers/net/igb/igb_main.c     |   13 ++-----------
 drivers/net/ixgbe/ixgbe_main.c |   13 ++-----------
 3 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 16c193a..0687c6a 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4982,12 +4982,7 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 		goto err_pci_reg;
 
 	/* AER (Advanced Error Reporting) hooks */
-	err = pci_enable_pcie_error_reporting(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "pci_enable_pcie_error_reporting failed "
-		        "0x%x\n", err);
-		/* non-fatal, continue */
-	}
+	pci_enable_pcie_error_reporting(pdev);
 
 	pci_set_master(pdev);
 	/* PCI config space info */
@@ -5263,7 +5258,6 @@ static void __devexit e1000_remove(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
-	int err;
 
 	/*
 	 * flush_scheduled work may reschedule our watchdog task, so
@@ -5299,10 +5293,7 @@ static void __devexit e1000_remove(struct pci_dev *pdev)
 	free_netdev(netdev);
 
 	/* AER disable */
-	err = pci_disable_pcie_error_reporting(pdev);
-	if (err)
-		dev_err(&pdev->dev,
-		        "pci_disable_pcie_error_reporting failed 0x%x\n", err);
+	pci_disable_pcie_error_reporting(pdev);
 
 	pci_disable_device(pdev);
 }
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 5d6c153..714c3a4 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1246,12 +1246,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_pci_reg;
 
-	err = pci_enable_pcie_error_reporting(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "pci_enable_pcie_error_reporting failed "
-		        "0x%x\n", err);
-		/* non-fatal, continue */
-	}
+	pci_enable_pcie_error_reporting(pdev);
 
 	pci_set_master(pdev);
 	pci_save_state(pdev);
@@ -1628,7 +1623,6 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
-	int err;
 
 	/* flush_scheduled work may reschedule our watchdog task, so
 	 * explicitly disable watchdog tasks from being rescheduled  */
@@ -1682,10 +1676,7 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 
 	free_netdev(netdev);
 
-	err = pci_disable_pcie_error_reporting(pdev);
-	if (err)
-		dev_err(&pdev->dev,
-		        "pci_disable_pcie_error_reporting failed 0x%x\n", err);
+	pci_disable_pcie_error_reporting(pdev);
 
 	pci_disable_device(pdev);
 }
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 1cbc6a3..28fbb9d 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -5507,12 +5507,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 		goto err_pci_reg;
 	}
 
-	err = pci_enable_pcie_error_reporting(pdev);
-	if (err) {
-		dev_err(&pdev->dev, "pci_enable_pcie_error_reporting failed "
-		                    "0x%x\n", err);
-		/* non-fatal, continue */
-	}
+	pci_enable_pcie_error_reporting(pdev);
 
 	pci_set_master(pdev);
 	pci_save_state(pdev);
@@ -5821,7 +5816,6 @@ static void __devexit ixgbe_remove(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
 
 	set_bit(__IXGBE_DOWN, &adapter->state);
 	/* clear the module not found bit to make sure the worker won't
@@ -5872,10 +5866,7 @@ static void __devexit ixgbe_remove(struct pci_dev *pdev)
 
 	free_netdev(netdev);
 
-	err = pci_disable_pcie_error_reporting(pdev);
-	if (err)
-		dev_err(&pdev->dev,
-		        "pci_disable_pcie_error_reporting failed 0x%x\n", err);
+	pci_disable_pcie_error_reporting(pdev);
 
 	pci_disable_device(pdev);
 }



* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Eric Dumazet @ 2009-10-02  7:12 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, kaber, netdev
In-Reply-To: <20091002070819.GA9694@ff.dom.local>

Jarek Poplawski a écrit :

> To make my point clare: why not something like this?:
> 
> static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
>                          u32 pid, u32 seq, u16 flags, int event)
> {
> 	...
> 	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
> 	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
>              gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
>             gnet_stats_copy_queue(&d, &q->qstats) < 0)
>                 goto nla_put_failure;
> 
> BTW, I'm not sure we need to change the user-visible API for this.
> (Is it really expected to work after updating gen_stats.h only in
> iproute?)
> 

That would be better indeed; do you want to work on it, or shall I do it?

Thanks


* Re: [RFC] pkt_sched: gen_estimator: Dont report fake rate estimators
From: Jarek Poplawski @ 2009-10-02  7:08 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, kaber, netdev
In-Reply-To: <4AC51D3D.8010700@gmail.com>

On 01-10-2009 23:21, Jarek Poplawski wrote:
> David Miller wrote, On 10/01/2009 11:14 PM:
> 
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Thu, 01 Oct 2009 23:05:53 +0200
>>
>>> Since you ask... I wonder about this whole int plus quite a bit of
>>> struct unreadability for one flag only. Maybe it could be queried
>>> on qdisc level (with a flag if necessary), and additional parameter
>>> of gnet_stats_copy_rate_est()? (Qdiscs should have no problem with
>>> setting this param for their classes too.)
>> Certainly, that's another approach to this problem.
>>
>> But logically, just like we wouldn't emit a block of RED scheduler
>> data to 'tc' unless RED is actually configured, it seems consistent to
>> not emit estimator data when no estimator is even there.
> 
> Sure! I've exaggerated with this additional parameter. ;-)

To make my point clare: why not something like this?:

static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
                         u32 pid, u32 seq, u16 flags, int event)
{
	...
	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
	    (gen_estimator_active(&q->bstats, &q->rate_est) &&
             gnet_stats_copy_rate_est(&d, &q->rate_est) < 0) ||
            gnet_stats_copy_queue(&d, &q->qstats) < 0)
                goto nla_put_failure;

BTW, I'm not sure we need to change the user-visible API for this.
(Is it really expected to work after updating gen_stats.h only in
iproute?)

Jarek P.


* Re: [Question]: reqsk table size limited to 16?
From: Eric Dumazet @ 2009-10-02  6:50 UTC (permalink / raw)
  To: Gerrit Renker, netdev
In-Reply-To: <20091002062532.GA15755@gerrit.erg.abdn.ac.uk>

Gerrit Renker a écrit :
> Please forget the posting, this is correct; the clamping is
> 
>   8 <= nr_table_entries <=  sysctl_max_syn_backlog,
> 
> i.e. the minimum table size is 16.
>

Yes, agreed, 8+1 -> 16



* Re: [Question]: reqsk table size limited to 16?
From: Eric Dumazet @ 2009-10-02  6:49 UTC (permalink / raw)
  To: Gerrit Renker, netdev
In-Reply-To: <20091002061134.GC5646@gerrit.erg.abdn.ac.uk>

Gerrit Renker a écrit :
> Can someone please have a look, it may be that I am missing something?
> 
> It seems that in the following the maximum number of table entries is set
> to always 16, despite sysctl_max_syn_backlog (tcp_max_syn_backlog), 
> overriding the 'backlog' parameter to listen(2).

False alarm ;)

> 
> net/core/request_sock.c
> -----------------------
> 
> int reqsk_queue_alloc(struct request_sock_queue *queue,
>                       unsigned int nr_table_entries)
> {
>         size_t lopt_size = sizeof(struct listen_sock);
>         struct listen_sock *lopt;
> 
> 	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);

Here we take the _minimum_ value.
If you have nr_table_entries=4096 and sysctl_max_syn_backlog=1024, the
result is 1024.

>         nr_table_entries = max_t(u32, nr_table_entries, 8);

Here we take the _maximum_ of nr_table_entries and 8.

-> 1024

The deal is: we want at least 8 slots, even if the user called listen(fd, 1).

(Later, the user can change their mind and call listen(fd, 1024).)

We don't resize the hash table yet, so we guarantee at least 8 slots for pathological cases.

>         nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
> 
> 	//...
> 	for (lopt->max_qlen_log = 3;
>              (1 << lopt->max_qlen_log) < nr_table_entries;
>              lopt->max_qlen_log++);
> 
>  	//...
> 	lopt->nr_table_entries = nr_table_entries;
> 	
> 	//...
> 	return 0;
> }
> 
> The function is called with an argument 'nr_table_entries', which is then clamped as
> 
>    sysctl_max_syn_backlog <= nr_table_entries <= 8
> 
> If nr_table_entries = 8, then roundup_pow_of_two(8 + 1) = 16.
> 
> The sysctl value is set to a much higher value (default 128 or 1024, net/ipv4/tcp.c).
> 
> The reqsk_queue_alloc() gets 'nr_table_entries' passed directly from inet_csk_listen_start(),
> which in turn gets its 'nr_table_entries' as the 'backlog' argument to listen(2) via
>  * net/dccp/proto.c   (dccp_listen_start) or
>  * net/ipv4/af_inet.c (inet_listen).



* [BUG net-2.6] bluetooth/rfcomm : sleeping function called from invalid context at mm/slub.c:1719
From: Oliver Hartkopp @ 2009-10-02  6:28 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: Linux Netdev List, linux-bluetooth-u79uwXL29TY76Z2rM5mHXA

Hello Marcel,

with current net-2.6 tree ...

While starting my PPP Bluetooth dialup networking, I got this:

[  722.461549] PPP generic driver version 2.4.2
[  722.477519] BUG: sleeping function called from invalid context at
mm/slub.c:1719
[  722.477530] in_atomic(): 1, irqs_disabled(): 0, pid: 4677, name: pppd
[  722.477537] 3 locks held by pppd/4677:
[  722.477542]  #0:  (rfcomm_mutex){+.+.+.}, at: [<fa5df2a1>]
rfcomm_dlc_open+0x28/0x2d6 [rfcomm]
[  722.477568]  #1:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at:
[<fa5414f8>] l2cap_sock_connect+0x62/0x2c6 [l2cap]
[  722.477589]  #2:  (&hdev->lock){+...+.}, at: [<fa5415b4>]
l2cap_sock_connect+0x11e/0x2c6 [l2cap]
[  722.477613] Pid: 4677, comm: pppd Not tainted 2.6.31-08939-gdb8abec-dirty #21
[  722.477619] Call Trace:
[  722.477633]  [<c1042a2b>] ? __debug_show_held_locks+0x1e/0x20
[  722.477644]  [<c10212a1>] __might_sleep+0xc9/0xce
[  722.477655]  [<c1078b62>] __kmalloc+0x6d/0xfb
[  722.477666]  [<c119e739>] ? kzalloc+0xb/0xd
[  722.477674]  [<c119e739>] kzalloc+0xb/0xd
[  722.477683]  [<c119ef1a>] device_private_init+0x15/0x3d
[  722.477693]  [<c11a0e1b>] dev_set_drvdata+0x18/0x26
[  722.477718]  [<f8b7ca1b>] hci_conn_init_sysfs+0x3d/0xc7 [bluetooth]
[  722.477737]  [<f8b791b3>] hci_conn_add+0x1c0/0x1d5 [bluetooth]
[  722.477756]  [<f8b79360>] hci_connect+0x71/0x17d [bluetooth]
[  722.477769]  [<fa54162c>] l2cap_sock_connect+0x196/0x2c6 [l2cap]
[  722.477782]  [<c1246e3d>] kernel_connect+0xd/0x12
[  722.477795]  [<fa5df3c3>] rfcomm_dlc_open+0x14a/0x2d6 [rfcomm]
[  722.477810]  [<fa5e10fa>] ? rfcomm_tty_open+0x73/0x227 [rfcomm]
[  722.477825]  [<fa5e1130>] rfcomm_tty_open+0xa9/0x227 [rfcomm]
[  722.477836]  [<c1022e3f>] ? default_wake_function+0x0/0xd
[  722.477847]  [<c1180c79>] tty_open+0x29e/0x399
[  722.477858]  [<c107e9bd>] chrdev_open+0x13f/0x156
[  722.477868]  [<c107b0d3>] __dentry_open+0x11b/0x20f
[  722.477878]  [<c107b261>] nameidata_to_filp+0x2c/0x43
[  722.477888]  [<c107e87e>] ? chrdev_open+0x0/0x156
[  722.477898]  [<c1084e9e>] do_filp_open+0x3c6/0x70a
[  722.477910]  [<c108d3e4>] ? alloc_fd+0xc8/0xd2
[  722.477920]  [<c108d3e4>] ? alloc_fd+0xc8/0xd2
[  722.477930]  [<c107aebc>] do_sys_open+0x4a/0xe7
[  722.477940]  [<c1002acc>] ? restore_all_notrace+0x0/0x18
[  722.477950]  [<c107af9b>] sys_open+0x1e/0x26
[  722.477959]  [<c1002a18>] sysenter_do_call+0x12/0x36
[  729.658613] PPP BSD Compression module registered
[  729.684789] PPP Deflate Compression module registered

Any idea?

Regards,
Oliver


* Re: [Question]: reqsk table size limited to 16?
From: Gerrit Renker @ 2009-10-02  6:25 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20091002061134.GC5646@gerrit.erg.abdn.ac.uk>

Please forget the posting, this is correct; the clamping is

  8 <= nr_table_entries <=  sysctl_max_syn_backlog,

i.e. the minimum table size is 16.

Quoting Gerrit:
| Can someone please have a look, it may be that I am missing something?
| 
| It seems that in the following the maximum number of table entries is set
| to always 16, despite sysctl_max_syn_backlog (tcp_max_syn_backlog), 
| overriding the 'backlog' parameter to listen(2).
| 
| net/core/request_sock.c
| -----------------------
| 
| int reqsk_queue_alloc(struct request_sock_queue *queue,
|                       unsigned int nr_table_entries)
| {
|         size_t lopt_size = sizeof(struct listen_sock);
|         struct listen_sock *lopt;
| 
| 	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
|         nr_table_entries = max_t(u32, nr_table_entries, 8);
|         nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
| 
| 	//...
| 	for (lopt->max_qlen_log = 3;
|              (1 << lopt->max_qlen_log) < nr_table_entries;
|              lopt->max_qlen_log++);
| 
|  	//...
| 	lopt->nr_table_entries = nr_table_entries;
| 	
| 	//...
| 	return 0;
| }
| 
| The function is called with an argument 'nr_table_entries', which is then clamped as
| 
|    sysctl_max_syn_backlog <= nr_table_entries <= 8
| 
| If nr_table_entries = 8, then roundup_pow_of_two(8 + 1) = 16.
| 
| The sysctl value is set to a much higher value (default 128 or 1024, net/ipv4/tcp.c).
| 
| The reqsk_queue_alloc() gets 'nr_table_entries' passed directly from inet_csk_listen_start(),
| which in turn gets its 'nr_table_entries' as the 'backlog' argument to listen(2) via
|  * net/dccp/proto.c   (dccp_listen_start) or
|  * net/ipv4/af_inet.c (inet_listen).

-- 


* Re: [PATCH] connector: Fix regression introduced by sid connector
From: Christian Borntraeger @ 2009-10-02  6:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: oleg, scott, zbr, linux-kernel, matthltc, davem, netdev
In-Reply-To: <20091001141426.2c1a0139.akpm@linux-foundation.org>

Sorry about that. I don't know how this escaped.  It was probably hidden among
all the sparse warnings I get in kernel/*, helped by my lack of knowledge in
this area.
Here is a new version:



Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add event
for process becoming session leader) we have the following warning:
Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
 [<000000000048a946>] sk_filter+0x9a/0xd0
 [<000000000049d938>] netlink_broadcast+0x2c0/0x53c
 [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
 [<00000000003baef0>] proc_sid_connector+0xc4/0xd4
 [<0000000000142604>] __set_special_pids+0x58/0x90
 [<0000000000159938>] sys_setsid+0xb4/0xd8
 [<00000000001187fe>] sysc_noemu+0x10/0x16
 [<00000041616cb266>] 0x41616cb266

The warning is
--->    WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling
the connector. 
After a discussion we agreed that we can move proc_sid_connector
from __set_special_pids to sys_setsid.
We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CCed: Scott James Remnant <scott@ubuntu.com>
CCed: Matt Helsley <matthltc@us.ibm.com>
CCed: David S. Miller <davem@davemloft.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
---
 kernel/exit.c |    4 +---
 kernel/sys.c  |    2 ++
 2 files changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/exit.c
===================================================================
--- linux-2.6.orig/kernel/exit.c
+++ linux-2.6/kernel/exit.c
@@ -359,10 +359,8 @@ void __set_special_pids(struct pid *pid)
 {
 	struct task_struct *curr = current->group_leader;
 
-	if (task_session(curr) != pid) {
+	if (task_session(curr) != pid)
 		change_pid(curr, PIDTYPE_SID, pid);
-		proc_sid_connector(curr);
-	}
 
 	if (task_pgrp(curr) != pid)
 		change_pid(curr, PIDTYPE_PGID, pid);
Index: linux-2.6/kernel/sys.c
===================================================================
--- linux-2.6.orig/kernel/sys.c
+++ linux-2.6/kernel/sys.c
@@ -1110,6 +1110,8 @@ SYSCALL_DEFINE0(setsid)
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
+	if (err > 0)
+		proc_sid_connector(group_leader);
 	return err;
 }
 


* [Question]: reqsk table size limited to 16?
From: Gerrit Renker @ 2009-10-02  6:11 UTC (permalink / raw)
  To: netdev

Can someone please have a look, it may be that I am missing something?

It seems that in the following the maximum number of table entries is set
to always 16, despite sysctl_max_syn_backlog (tcp_max_syn_backlog), 
overriding the 'backlog' parameter to listen(2).

net/core/request_sock.c
-----------------------

int reqsk_queue_alloc(struct request_sock_queue *queue,
                      unsigned int nr_table_entries)
{
        size_t lopt_size = sizeof(struct listen_sock);
        struct listen_sock *lopt;

	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
        nr_table_entries = max_t(u32, nr_table_entries, 8);
        nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

	//...
	for (lopt->max_qlen_log = 3;
             (1 << lopt->max_qlen_log) < nr_table_entries;
             lopt->max_qlen_log++);

 	//...
	lopt->nr_table_entries = nr_table_entries;
	
	//...
	return 0;
}

The function is called with an argument 'nr_table_entries', which is then clamped as

   sysctl_max_syn_backlog <= nr_table_entries <= 8

If nr_table_entries = 8, then roundup_pow_of_two(8 + 1) = 16.

The sysctl value is set to a much higher value (default 128 or 1024, net/ipv4/tcp.c).

The reqsk_queue_alloc() gets 'nr_table_entries' passed directly from inet_csk_listen_start(),
which in turn gets its 'nr_table_entries' as the 'backlog' argument to listen(2) via
 * net/dccp/proto.c   (dccp_listen_start) or
 * net/ipv4/af_inet.c (inet_listen).


* [PATCH] cnic: Fix NETDEV_UP event processing.
From: Michael Chan @ 2009-10-02  6:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, michaelc, Michael Chan, Benjamin Li

This fixes the problem of not handling the NETDEV_UP event properly
during hot-plug or modprobe of bnx2 after cnic.  The handling was
skipped by mistakenly using "else if" to check for the event.

Also update version to 2.0.1.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/cnic.c    |    3 ++-
 drivers/net/cnic_if.h |    4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cnic.c b/drivers/net/cnic.c
index 211c8e9..46c87ec 100644
--- a/drivers/net/cnic.c
+++ b/drivers/net/cnic.c
@@ -2733,7 +2733,8 @@ static int cnic_netdev_event(struct notifier_block *this, unsigned long event,
 			cnic_ulp_init(dev);
 		else if (event == NETDEV_UNREGISTER)
 			cnic_ulp_exit(dev);
-		else if (event == NETDEV_UP) {
+
+		if (event == NETDEV_UP) {
 			if (cnic_register_netdev(dev) != 0) {
 				cnic_put(dev);
 				goto done;
diff --git a/drivers/net/cnic_if.h b/drivers/net/cnic_if.h
index a492357..d8b09ef 100644
--- a/drivers/net/cnic_if.h
+++ b/drivers/net/cnic_if.h
@@ -12,8 +12,8 @@
 #ifndef CNIC_IF_H
 #define CNIC_IF_H
 
-#define CNIC_MODULE_VERSION	"2.0.0"
-#define CNIC_MODULE_RELDATE	"May 21, 2009"
+#define CNIC_MODULE_VERSION	"2.0.1"
+#define CNIC_MODULE_RELDATE	"Oct 01, 2009"
 
 #define CNIC_ULP_RDMA		0
 #define CNIC_ULP_ISCSI		1
-- 
1.6.4.GIT




* Re: [PATCH] Use sk_mark for routing lookup in more places
From: Eric Dumazet @ 2009-10-02  6:08 UTC (permalink / raw)
  Cc: David Miller, atis, panther, netdev
In-Reply-To: <4AC58C46.8080408@gmail.com>

Eric Dumazet a écrit :
> Here is a followup on this area, thanks.
> 
> [RFC] af_packet: fill skb->mark at xmit
> 
> skb->mark may be used by classifiers, so fill it in case the user
> set the SO_MARK option on the socket.
> 

Maybe a more generic way to handle this for various protocols
would be to fill skb->mark in sock_alloc_send_pskb().




* Re: query: adding a sysctl
From: Stephen Hemminger @ 2009-10-02  5:57 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev
In-Reply-To: <4AC57AC5.3080703@gmail.com>

On Fri, 02 Oct 2009 00:00:05 -0400
William Allen Simpson <william.allen.simpson@gmail.com> wrote:

> [My first post here, hopefully not a FAQ, as I've googled it, but cannot find
> the definitive answer.]
> 
> I've been trying to add a sysctl, and I've noticed this message:
> 
> sysctl table check failed: /net/ipv4/tcp_cookie_size .3.5.126 Unknown sysctl binary path
> 
> I modeled the code on sysctl_tcp_syncookies, and apparently I'm missing some
> additional magic?  Or does something need to be done other than C?

The sysctl table check code is in kernel/sysctl.c; it maps numerical
sysctl values to /proc paths so that the permission checks on the numeric
sysctl match those of the /proc file involved.

Hint: the easiest way to find things out is to use git grep
to see how any related sysctl is implemented.

BUT numbered sysctl values are deprecated and should no longer be added.
The current way is to use CTL_UNNUMBERED instead; if you use CTL_UNNUMBERED,
the table does not need to be changed.

-- 


* Re: [PATCH 00/31] Swap over NFS -v20
From: Neil Brown @ 2009-10-02  5:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <20091001174201.GA30068@infradead.org>

On Thursday October 1, hch@infradead.org wrote:
> 
> The other really big one is adding a proper method for safe, page-backed
> kernelspace I/O on files.  That is not something like the grotty
> swap-tied address_space operations in this patch, but more something in
> the direction of the kernel direct I/O patches that Jens Axboe did
> for use in the loop driver.  But even those aren't complete as they
> don't touch the locking issue yet.

Do you have a problem with the proposed address_space operations apart
from their names including the word "swap"?  Would something like:
  direct_on, direct_off, direct_read, direct_write
be better?
The semantics being that the read and write:
  - bypass the page cache (invalidation is up to the caller)
  - must not make a blocking non-emergency memory allocation
direct_on does any pre-allocation and pre-reading needed to ensure those
semantics can be provided.

I have wondered if an extra flag along the lines of "I don't care
about this data after a crash" would be useful.
It would be set for swap, but not set for other users.  Thus
e.g. RAID1 could easily avoid resyncing an area that was used only for
swap.

The only thing of Jens' that I could find used bmap - is there
something more recent I should look for?

> 
> Especially the latter is an absolutely essential step to make any
> progress here, and an excellent patch series of its own as there are
> multiple users for this, like making swap safe on btrfs files, making
> the MD bitmap code actually safe or improving the loop driver.

100% agree.

Thanks,
NeilBrown



* Re: [PATCH 2/5] Implement loss counting on TFRC-SP receiver
From: Gerrit Renker @ 2009-10-01 20:40 UTC (permalink / raw)
  To: Ivo Calado; +Cc: dccp, netdev
In-Reply-To: <cb00fa210909231843q7f13b2c3i32672e883a017b7b@mail.gmail.com>

| >> The following code would be correct then?
| >>
| >>	 if ((len <= 0) ||
| >>	     (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval)))
| > {
| >> +		 cur->li_losses += rh->num_losses;
| >> + 		 rh->num_losses  = 0;
| >> 		 return false;
| >> With this change I suppose the problem could be fixed. With that,
| >> rh->num_losses couldn't be added twice. Am I correct?
| >>
| >>
| > The function tfrc_lh_interval_add() is called when
| >  * __two_after_loss() returns true (a new loss is detected) or
| >  * a data packet is ECN-CE marked.
| >
| > I am still not sure about the 'len <= 0' case; this would be true
| > if an ECN-marked packet arrives whose sequence number is 'before'
| > the start of the current loss interval, or if a loss is detected
| > which is older than the start of the current loss interval.
| >
| > The other case (tfrc_lh_closed_check) returns 1 if the current loss
| > interval is 'closed' according to RFC 4342, 10.2.
| >
| > Intuitively, in the first case it refers to the preceding loss
| > interval (i.e. not cur->...), in the second case it seems correct.
| >
| > Doing the first case is complicated due to going back in history.
| > The simplest solution I can think of at the moment is to ignore
| > the exception-case of reordered packets and do something like
| >
| >  if (len <= 0) {
| >     /* FIXME: this belongs into the previous loss interval */
| >     tfrc_pr_debug("Warning: ignoring loss due to reordering");
| >     return false;
| > }
| >  if (!tfrc_lh_closed_check(...)) {
| >     // your code from above
| > }
| 
| Okay, I'll add your suggestion. But I don't know whether this would fix it at all.
|
If it doesn't we will just do another iteration and fix it.
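
For reference, the split into two checks discussed above can be sketched
in plain C as below. This is only an illustration, not the DCCP code: the
struct layouts, the function name interval_add_sketch() and the 'closed'
parameter (standing in for the result of tfrc_lh_closed_check()) are
assumptions; only li_losses and num_losses are taken from the quoted hunk.

```c
#include <stdbool.h>

/* Simplified stand-ins for the real structures (assumed layout). */
struct loss_interval_sketch { unsigned int li_losses; };
struct rx_hist_sketch       { unsigned int num_losses; };

/*
 * Returns true when a new loss interval should be opened, false when the
 * event is ignored or folded into the current interval.
 */
static bool interval_add_sketch(struct loss_interval_sketch *cur,
				struct rx_hist_sketch *rh,
				long len, bool closed)
{
	if (len <= 0) {
		/* Event is older than the start of the current interval
		 * (reordering); ignored for now, losses are left untouched. */
		return false;
	}
	if (!closed) {
		/* Current interval still open: account the losses here and
		 * reset the counter so they cannot be added twice. */
		cur->li_losses += rh->num_losses;
		rh->num_losses = 0;
		return false;
	}
	return true;
}
```

The point of resetting num_losses in the second branch is that a later
call cannot count the same losses again.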



| > So it is necessary to decide whether to go the full way, which means
| >  * support Loss Intervals and Dropped Packets alike
| >  * modify TFRC library (it will be a redesign)
| >  * modify receiver code
| >  * modify sender code,
| >    or to use the present approach where
| >  * the receiver computes the Loss Rate and
| >  * a Mandatory Send Loss Event Rate feature is present during feature
| >    negotiation, to avoid problems with incompatible senders
| >   (there is a comment explaining this, in net/dccp/feat.c).
| >
| > Thoughts?
| 
<snip>

| I believe that the first way is better (to "support Loss Intervals and
| Dropped Packets alike..."), because RFC requires loss intervals option
| to be sent. And so, proceed and implement dropped packets option for
| TFRC-SP. You are right, this would need a redesign and rewrite of
| sender and receiver code.
| 
Agree, then let's do that. It requires some coordination on how to arrange
the patches, but we can simplify the process by using the test tree to 
store all intermediate results (i.e. use a separate tree for the rewrite
until it is sufficiently stable/useful).

^ permalink raw reply

* Re: [PATCH 1/7] mlx4: Added interrupts test support
From: David Miller @ 2009-10-02  5:27 UTC (permalink / raw)
  To: rdreier; +Cc: yevgenyp, netdev
In-Reply-To: <adaeipmqxmv.fsf@cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Thu, 01 Oct 2009 20:32:08 -0700

> This feels like a pretty risky thing to do while the device might be
> handling all sorts of other traffic at the same time.  Are you sure
> there are no races you expose here?  Have you actually seen cases where
> the interrupt test during initialization works but then this test
> catches a problem?  (My experience has been that if any MSI-X interrupts
> work from a device, then they'll all work)

I would suggest only allowing the test while the interface is down.
That way the test has exclusive control of the IRQ.
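
A minimal sketch of that guard, as a userspace model rather than mlx4
code. The struct nic_model type, its fields and run_irq_selftest() are
all hypothetical names used only for illustration; a real driver would
check the netdev state instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a device: 'running' means the interface is up. */
struct nic_model {
	bool running;
	int  irq_events;
};

/* Refuse the interrupt test unless the interface is down. */
static int run_irq_selftest(struct nic_model *nic)
{
	if (nic->running)
		return -EBUSY;	/* traffic may be in flight, do not touch the IRQ */

	nic->irq_events++;	/* stand-in for firing and observing a test IRQ */
	return 0;
}
```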

^ permalink raw reply

* Re: [PATCH 01/31] mm: serialize access to min_free_kbytes
From: Neil Brown @ 2009-10-02  5:20 UTC (permalink / raw)
  To: David Rientjes
  Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <alpine.DEB.1.00.0910011330430.27559@chino.kir.corp.google.com>

On Thursday October 1, rientjes@google.com wrote:
> On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
> 
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl> 
> > 
> > There is a small race between the procfs caller and the memory hotplug caller
> > of setup_per_zone_wmarks(). Not a big deal, but the next patch will add yet
> > another caller. Time to close the gap.
> > 
> 
> By "next patch," you mean "mm: emergency pool" (patch 08/31)?

:-)  It is always safer to say "a subsequent patch", isn't it....

> 
> If so, can't you eliminate var_free_mutex entirely from that patch and 
> take min_free_lock in adjust_memalloc_reserve() instead?

adjust_memalloc_reserve does a test alloc/free cycle under a lock.
That cannot be done under a spin-lock because the allocation may
sleep; it must be a mutex.
So I don't think you can eliminate var_free_mutex.

Thanks,
NeilBrown

> 
>  [ __adjust_memalloc_reserve() would call __setup_per_zone_wmarks()
>    under lock instead, now. ]


^ permalink raw reply

* Re: [PATCH] Use sk_mark for routing lookup in more places
From: Eric Dumazet @ 2009-10-02  5:14 UTC (permalink / raw)
  To: David Miller; +Cc: atis, panther, netdev
In-Reply-To: <20091001.151823.263194343.davem@davemloft.net>

Here is a followup on this area, thanks.

[RFC] af_packet: fill skb->mark at xmit

skb->mark may be used by classifiers, so fill it in case user 
set a SO_MARK option on socket.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/packet/af_packet.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d7ecca0..610f150 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -490,6 +490,7 @@ static int packet_sendmsg_spkt(struct kiocb *iocb, struct socket *sock,
 	skb->protocol = proto;
 	skb->dev = dev;
 	skb->priority = sk->sk_priority;
+	skb->mark = sk->sk_mark;
 	if (err)
 		goto out_free;
 
@@ -856,6 +857,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->protocol = proto;
 	skb->dev = dev;
 	skb->priority = po->sk.sk_priority;
+	skb->mark = po->sk.sk_mark;
 	skb_shinfo(skb)->destructor_arg = ph.raw;
 
 	switch (po->tp_version) {
@@ -1125,6 +1127,7 @@ static int packet_snd(struct socket *sock,
 	skb->protocol = proto;
 	skb->dev = dev;
 	skb->priority = sk->sk_priority;
+	skb->mark = sk->sk_mark;
 
 	/*
 	 *	Now send it


^ permalink raw reply related

* Re: [PATCH 03/31] mm: expose gfp_to_alloc_flags()
From: Neil Brown @ 2009-10-02  5:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <alpine.DEB.1.00.0910011355230.32006@chino.kir.corp.google.com>

On Thursday October 1, rientjes@google.com wrote:
> On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
> 
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl> 
> > 
> > Expose the gfp to alloc_flags mapping, so we can use it in other parts
> > of the vm.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
> 
> Nack, these flags are internal to the page allocator and exporting them to 
> generic VM code is unnecessary.
> 
> The only bit you actually use in your patchset is ALLOC_NO_WATERMARKS to 
> determine whether a particular allocation can use memory reserves.  I'd 
> suggest adding a bool function that returns whether the current context is 
> given access to reserves including your new __GFP_MEMALLOC flag and 
> exporting that instead.

That sounds like a very appropriate suggestion, thanks.

So something like this?
Then change every occurrence of
+		if (!(gfp_to_alloc_flags(gfpflags) & ALLOC_NO_WATERMARKS))
to
+		if (!(gfp_has_no_watermarks(gfpflags)))

??

Thanks,
NeilBrown



diff --git a/mm/internal.h b/mm/internal.h
index 22ec8d2..7ff78d6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -195,6 +195,8 @@ static inline struct page *mem_map_next(struct page *iter,
 #define __paginginit __init
 #endif
 
+int gfp_has_no_watermarks(gfp_t gfp_mask);
+
 /* Memory initialisation debug and verification */
 enum mminit_level {
 	MMINIT_WARNING,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf72055..4b4292a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1782,6 +1782,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	return alloc_flags;
 }
 
+int gfp_has_no_watermarks(gfp_t gfp_mask)
+{
+	return (gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
+}
+
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,


^ permalink raw reply related

* Re: [PATCH 30/31] Fix use of uninitialized variable in cache_grow()
From: Neil Brown @ 2009-10-02  4:54 UTC (permalink / raw)
  To: David Rientjes
  Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <alpine.DEB.1.00.0910011341280.27559@chino.kir.corp.google.com>

On Thursday October 1, rientjes@google.com wrote:
> On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
> 
> > From: Miklos Szeredi <mszeredi@suse.cz>
> > 
> > This fixes a bug in reserve-slub.patch.
> > 
> > If cache_grow() was called with objp != NULL then the 'reserve' local
> > variable wasn't initialized. This resulted in ac->reserve being set to
> > a rubbish value.  Due to this in some circumstances huge amounts of
> > slab pages were allocated (due to slab_force_alloc() returning true),
> > which caused atomic page allocation failures and slowdown of the
> > system.
> > 
> > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> > Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
> > ---
> >  mm/slab.c |    5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > Index: mmotm/mm/slab.c
> > ===================================================================
> > --- mmotm.orig/mm/slab.c
> > +++ mmotm/mm/slab.c
> > @@ -2760,7 +2760,7 @@ static int cache_grow(struct kmem_cache
> >  	size_t offset;
> >  	gfp_t local_flags;
> >  	struct kmem_list3 *l3;
> > -	int reserve;
> > +	int reserve = -1;
> >  
> >  	/*
> >  	 * Be lazy and only check for valid flags here,  keeping it out of the
> > @@ -2816,7 +2816,8 @@ static int cache_grow(struct kmem_cache
> >  	if (local_flags & __GFP_WAIT)
> >  		local_irq_disable();
> >  	check_irq_off();
> > -	slab_set_reserve(cachep, reserve);
> > +	if (reserve != -1)
> > +		slab_set_reserve(cachep, reserve);
> >  	spin_lock(&l3->list_lock);
> >  
> >  	/* Make slab active. */
> 
> Given the patch description, shouldn't this be a test for objp != NULL 
> instead, then?

In between those two patch hunks, cache_grow contains the code:
	if (!objp)
		objp = kmem_getpages(cachep, local_flags, nodeid, &reserve);
	if (!objp)
		goto failed;

We can no longer test if objp was NULL on entry to the function.
We could take a copy of objp on entry to the function, and test it
here.  But initialising 'reserve' to an invalid value is easier.
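
The sentinel approach can be shown with a small standalone C model. It is
only a sketch under assumptions: RESERVE_UNSET, fake_getpages() and
grow_sketch() are made-up names, and fake_getpages() merely stands in for
kmem_getpages() reporting a reserve value through its out-parameter.

```c
#include <stddef.h>

#define RESERVE_UNSET (-1)	/* hypothetical name for the invalid value */

/* Stand-in for kmem_getpages(): returns a page and reports reserve use. */
static void *fake_getpages(int *reserve)
{
	static char page[64];

	*reserve = 1;
	return page;
}

/*
 * Model of the relevant part of cache_grow(): *cache_reserve is updated
 * only when this call actually obtained the page itself.
 */
static int grow_sketch(void *objp, int *cache_reserve)
{
	int reserve = RESERVE_UNSET;

	if (!objp)
		objp = fake_getpages(&reserve);
	if (!objp)
		return -1;

	/* objp may have been non-NULL on entry; the sentinel tells us. */
	if (reserve != RESERVE_UNSET)
		*cache_reserve = reserve;
	return 0;
}
```

When the caller passes in a page, the stored reserve value is left alone;
when the function allocates the page itself, the value is updated.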



> 
> If so, it doesn't make sense because reserve will only be initialized when 
> objp == NULL in the call to kmem_getpages() from cache_grow().
> 
> 
> The title of the patch suggests this is just dealing with an uninitialized 
> auto variable so the anticipated change would be from "int reserve" to 
> "int uninitialized_var(reserve)".

That change is only appropriate when the compiler is issuing a
warning that the variable is used before it is initialised, but we
know that not to be the case.
In this situation, we know it *is* being used before it is
initialised, and so we need to initialise it to something.

Thanks,
NeilBrown


^ permalink raw reply

* Re: [PATCH 04/31] mm: tag reseve pages
From: Neil Brown @ 2009-10-02  4:43 UTC (permalink / raw)
  To: David Rientjes
  Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
	trond.myklebust
In-Reply-To: <alpine.DEB.1.00.0910011407390.32006@chino.kir.corp.google.com>

On Thursday October 1, rientjes@google.com wrote:
> On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
> 
> > Index: mmotm/mm/page_alloc.c
> > ===================================================================
> > --- mmotm.orig/mm/page_alloc.c
> > +++ mmotm/mm/page_alloc.c
> > @@ -1501,8 +1501,10 @@ zonelist_scan:
> >  try_this_zone:
> >  		page = buffered_rmqueue(preferred_zone, zone, order,
> >  						gfp_mask, migratetype);
> > -		if (page)
> > +		if (page) {
> > +			page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> >  			break;
> > +		}
> >  this_zone_full:
> >  		if (NUMA_BUILD)
> >  			zlc_mark_zone_full(zonelist, z);
> 
> page->reserve won't necessarily indicate that access to reserves was 
> _necessary_ for the allocation to succeed, though.  This will mark any 
> page being allocated under PF_MEMALLOC as reserve when all zones may be 
> well above their min watermarks.

Normally if zones are above their watermarks, page->reserve will not
be set.
This is because __alloc_pages_nodemask (which seems to be the main
non-inline entrypoint) first calls get_page_from_freelist with
alloc_flags set to ALLOC_WMARK_LOW|ALLOC_CPUSET.
Only if this fails does __alloc_pages_nodemask call
__alloc_pages_slowpath which potentially sets ALLOC_NO_WATERMARKS in
alloc_flags.

So page->reserve being set actually tells us:
  PF_MEMALLOC or __GFP_MEMALLOC were used, and
  a WMARK_LOW allocation attempt failed very recently

which is close enough to "the emergency reserves were used" I think.
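
A toy model of that two-stage behaviour, in plain C. This is a sketch
under assumptions, not the page allocator: the flag values, struct
zone_model, struct page_model, try_alloc() and alloc_model() are all
invented for illustration, and the real slow path does considerably more
(reclaim, other watermark levels) than shown here.

```c
#include <stdbool.h>

/* Illustrative flag values only; not the kernel's definitions. */
enum { ALLOC_WMARK_LOW = 0x1, ALLOC_NO_WATERMARKS = 0x2 };

struct zone_model { long free_pages; long wmark_low; };
struct page_model { bool reserve; };

/* Stand-in for get_page_from_freelist() on a single zone. */
static bool try_alloc(struct zone_model *z, int alloc_flags,
		      struct page_model *pg)
{
	if (z->free_pages <= 0)
		return false;
	if (!(alloc_flags & ALLOC_NO_WATERMARKS) &&
	    z->free_pages <= z->wmark_low)
		return false;

	z->free_pages--;
	pg->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
	return true;
}

/* Fast path first; watermarks are ignored only after it has failed. */
static bool alloc_model(struct zone_model *z, bool memalloc,
			struct page_model *pg)
{
	if (try_alloc(z, ALLOC_WMARK_LOW, pg))
		return true;
	if (memalloc)	/* models PF_MEMALLOC or __GFP_MEMALLOC */
		return try_alloc(z, ALLOC_NO_WATERMARKS, pg);
	return false;
}
```

In this model a memalloc caller gets reserve set only when the zone is
already at or below its low watermark, which is the property described
above.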

Thanks,
NeilBrown


^ permalink raw reply

* Re: [PATCH] tg3: Remove prev_vlan_tag from struct tx_ring_info
From: Eric Dumazet @ 2009-10-02  4:16 UTC (permalink / raw)
  To: David Miller; +Cc: mcarlson, netdev, mchan
In-Reply-To: <20091001.143859.53379358.davem@davemloft.net>

David Miller a écrit :
> 
> Applied, thanks.
> 
> Eric, I had to apply this by hand because:
> 
>>> @@ -2412,7 +2412,6 @@ struct ring_info {
>>>  
>>>  struct tx_ring_info {
>>>  	struct sk_buff                  *skb;
>>> -	u32                             prev_vlan_tag;
>>>  };
> 
> Your email client changed tabs into spaces.

Oops, I'm sorry Dave, I did a copy/paste and forgot about tabs.

Thanks

^ permalink raw reply

