Netdev List
* [PATCH] netdev: smc91x: drop Blackfin cruft
From: Mike Frysinger @ 2009-09-14 18:03 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-kernel, Michael Hennerich

From: Michael Hennerich <michael.hennerich@analog.com>

Now that all Blackfin boards are using the board resources, we don't need
to keep the arch/board specific crap in the driver header.

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/smc91x.h |   28 ----------------------------
 1 files changed, 0 insertions(+), 28 deletions(-)

diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h
index 57a159f..9c8c6ed 100644
--- a/drivers/net/smc91x.h
+++ b/drivers/net/smc91x.h
@@ -83,34 +83,6 @@ static inline void SMC_outw(u16 val, void __iomem *ioaddr, int reg)
 	}
 }
 
-#elif defined(CONFIG_BLACKFIN)
-
-#define SMC_IRQ_FLAGS		IRQF_TRIGGER_HIGH
-#define RPC_LSA_DEFAULT		RPC_LED_100_10
-#define RPC_LSB_DEFAULT		RPC_LED_TX_RX
-
-#define SMC_CAN_USE_8BIT	0
-#define SMC_CAN_USE_16BIT	1
-# if defined(CONFIG_BF561)
-#define SMC_CAN_USE_32BIT	1
-# else
-#define SMC_CAN_USE_32BIT	0
-# endif
-#define SMC_IO_SHIFT		0
-#define SMC_NOWAIT      	1
-#define SMC_USE_BFIN_DMA	0
-
-#define SMC_inw(a, r)		readw((a) + (r))
-#define SMC_outw(v, a, r)	writew(v, (a) + (r))
-#define SMC_insw(a, r, p, l)	readsw((a) + (r), p, l)
-#define SMC_outsw(a, r, p, l)	writesw((a) + (r), p, l)
-# if SMC_CAN_USE_32BIT
-#define SMC_inl(a, r)		readl((a) + (r))
-#define SMC_outl(v, a, r)	writel(v, (a) + (r))
-#define SMC_insl(a, r, p, l)	readsl((a) + (r), p, l)
-#define SMC_outsl(a, r, p, l)	writesl((a) + (r), p, l)
-# endif
-
 #elif defined(CONFIG_REDWOOD_5) || defined(CONFIG_REDWOOD_6)
 
 /* We can only do 16-bit reads and writes in the static memory space. */
-- 
1.6.4.2

^ permalink raw reply related

* Re: [PATCH] pkt_sched: Fix qdisc_graft WRT ingress qdisc
From: Patrick McHardy @ 2009-09-14 17:58 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev
In-Reply-To: <20090914083544.GA10444@ff.dom.local>

Jarek Poplawski wrote:
> After the recent mq change using ingress qdisc overwrites dev->qdisc;
> there is also a wrong old qdisc pointer passed to notify_and_destroy.

Good catch, thanks.

^ permalink raw reply

* [net-next PATCH 3/3] ixgbe: Create separate media type for CX4 adapters
From: Jeff Kirsher @ 2009-09-14 17:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Peter P Waskiewicz Jr, Jeff Kirsher, Don Skidmore
In-Reply-To: <20090914174659.4560.28814.stgit@localhost.localdomain>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

Currently the media type detection for CX4 adapters lumps them into a
type of fiber.  This causes some strange fallout when firmware verification
is done on the NIC, and certain fiber NIC rules get enforced incorrectly.

This patch introduces a new media type for CX4, and puts both 82598 and
82599 CX4 adapters into this bucket.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
---

 drivers/net/ixgbe/ixgbe_82598.c |    6 ++++--
 drivers/net/ixgbe/ixgbe_82599.c |    2 +-
 drivers/net/ixgbe/ixgbe_type.h  |    1 +
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_82598.c b/drivers/net/ixgbe/ixgbe_82598.c
index cb7f0c3..56b12f3 100644
--- a/drivers/net/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ixgbe/ixgbe_82598.c
@@ -322,14 +322,16 @@ static enum ixgbe_media_type ixgbe_get_media_type_82598(struct ixgbe_hw *hw)
 		break;
 	case IXGBE_DEV_ID_82598AF_DUAL_PORT:
 	case IXGBE_DEV_ID_82598AF_SINGLE_PORT:
-	case IXGBE_DEV_ID_82598EB_CX4:
-	case IXGBE_DEV_ID_82598_CX4_DUAL_PORT:
 	case IXGBE_DEV_ID_82598_DA_DUAL_PORT:
 	case IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM:
 	case IXGBE_DEV_ID_82598EB_XF_LR:
 	case IXGBE_DEV_ID_82598EB_SFP_LOM:
 		media_type = ixgbe_media_type_fiber;
 		break;
+	case IXGBE_DEV_ID_82598EB_CX4:
+	case IXGBE_DEV_ID_82598_CX4_DUAL_PORT:
+		media_type = ixgbe_media_type_cx4;
+		break;
 	case IXGBE_DEV_ID_82598AT:
 	case IXGBE_DEV_ID_82598AT2:
 		media_type = ixgbe_media_type_copper;
diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c
index c9006bb..2ec58dc 100644
--- a/drivers/net/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ixgbe/ixgbe_82599.c
@@ -338,7 +338,7 @@ static enum ixgbe_media_type ixgbe_get_media_type_82599(struct ixgbe_hw *hw)
 		media_type = ixgbe_media_type_fiber;
 		break;
 	case IXGBE_DEV_ID_82599_CX4:
-		media_type = ixgbe_media_type_fiber;
+		media_type = ixgbe_media_type_cx4;
 		break;
 	default:
 		media_type = ixgbe_media_type_unknown;
diff --git a/drivers/net/ixgbe/ixgbe_type.h b/drivers/net/ixgbe/ixgbe_type.h
index 37303a1..8761d78 100644
--- a/drivers/net/ixgbe/ixgbe_type.h
+++ b/drivers/net/ixgbe/ixgbe_type.h
@@ -2144,6 +2144,7 @@ enum ixgbe_media_type {
 	ixgbe_media_type_fiber,
 	ixgbe_media_type_copper,
 	ixgbe_media_type_backplane,
+	ixgbe_media_type_cx4,
 	ixgbe_media_type_virtual
 };
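
The effect of the new media type can be sketched in plain C outside the
driver. This is only an illustration: media_type_82599() and
fiber_only_rule_applies() are hypothetical stand-ins that take a bare device
ID instead of a struct ixgbe_hw, and the position of
ixgbe_media_type_unknown at the start of the enum is an assumption. The two
device IDs are the ones that appear in this series.

```c
#include <stdint.h>

/* Device IDs as they appear in the patches of this series. */
#define IXGBE_DEV_ID_82599_CX4	0x10F9
#define IXGBE_DEV_ID_82599_SFP	0x10FB

/* Enum after this patch; "unknown" as the first member is an assumption. */
enum ixgbe_media_type {
	ixgbe_media_type_unknown = 0,
	ixgbe_media_type_fiber,
	ixgbe_media_type_copper,
	ixgbe_media_type_backplane,
	ixgbe_media_type_cx4,
	ixgbe_media_type_virtual
};

/* Hypothetical stand-in for ixgbe_get_media_type_82599(). */
enum ixgbe_media_type media_type_82599(uint16_t device_id)
{
	switch (device_id) {
	case IXGBE_DEV_ID_82599_SFP:
		return ixgbe_media_type_fiber;
	case IXGBE_DEV_ID_82599_CX4:
		return ixgbe_media_type_cx4;	/* was fiber before this patch */
	default:
		return ixgbe_media_type_unknown;
	}
}

/* Hypothetical consumer: a rule meant for fiber NICs only. */
int fiber_only_rule_applies(uint16_t device_id)
{
	return media_type_82599(device_id) == ixgbe_media_type_fiber;
}
```

With CX4 in its own bucket, a fiber-only check like the one above no longer
matches CX4 parts, which is the kind of fallout the commit message describes.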
 


^ permalink raw reply related

* [net-next PATCH 2/3] ixgbe: Add support for 82599-based CX4 adapters
From: Jeff Kirsher @ 2009-09-14 17:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Peter P Waskiewicz Jr, Jeff Kirsher, Don Skidmore
In-Reply-To: <20090914174659.4560.28814.stgit@localhost.localdomain>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

This patch adds support for CX4 adapters based on 82599.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
---

 drivers/net/ixgbe/ixgbe_82599.c |    3 +++
 drivers/net/ixgbe/ixgbe_main.c  |    2 ++
 drivers/net/ixgbe/ixgbe_type.h  |    1 +
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c
index 61af47e..c9006bb 100644
--- a/drivers/net/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ixgbe/ixgbe_82599.c
@@ -337,6 +337,9 @@ static enum ixgbe_media_type ixgbe_get_media_type_82599(struct ixgbe_hw *hw)
 	case IXGBE_DEV_ID_82599_SFP:
 		media_type = ixgbe_media_type_fiber;
 		break;
+	case IXGBE_DEV_ID_82599_CX4:
+		media_type = ixgbe_media_type_fiber;
+		break;
 	default:
 		media_type = ixgbe_media_type_unknown;
 		break;
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index b47aaa8..59ad959 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -97,6 +97,8 @@ static struct pci_device_id ixgbe_pci_tbl[] = {
 	 board_82599 },
 	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP),
 	 board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_CX4),
+	 board_82599 },
 
 	/* required last entry */
 	{0, }
diff --git a/drivers/net/ixgbe/ixgbe_type.h b/drivers/net/ixgbe/ixgbe_type.h
index 8ba90ee..37303a1 100644
--- a/drivers/net/ixgbe/ixgbe_type.h
+++ b/drivers/net/ixgbe/ixgbe_type.h
@@ -49,6 +49,7 @@
 #define IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM      0x10E1
 #define IXGBE_DEV_ID_82598EB_XF_LR       0x10F4
 #define IXGBE_DEV_ID_82599_KX4           0x10F7
+#define IXGBE_DEV_ID_82599_CX4           0x10F9
 #define IXGBE_DEV_ID_82599_SFP           0x10FB
 #define IXGBE_DEV_ID_82599_XAUI_LOM      0x10FC
 


^ permalink raw reply related

* [net-next PATCH 1/3] ixgbe: Properly disable packet split per-ring when globally disabled
From: Jeff Kirsher @ 2009-09-14 17:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Peter P Waskiewicz Jr, Jeff Kirsher, Don Skidmore

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

The packet split feature was recently moved out of the adapter-wide flags
feature field and into a per-Rx ring feature field.  In the process, packet
split isn't properly disabled in the Rx ring if the adapter has it globally
disabled, followed by a device reset.

This won't impact the driver today, since it's always in packet split mode.
However, this will prevent any pitfalls if someone disables packet split on
the adapter in the future and doesn't disable it in each ring.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
---

 drivers/net/ixgbe/ixgbe_main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 45bf8b9..b47aaa8 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -2055,6 +2055,8 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter)
 
 		if (adapter->flags & IXGBE_FLAG_RX_PS_ENABLED)
 			rx_ring->flags |= IXGBE_RING_RX_PS_ENABLED;
+		else
+			rx_ring->flags &= ~IXGBE_RING_RX_PS_ENABLED;
 
 #ifdef IXGBE_FCOE
 		if (netdev->features & NETIF_F_FCOE_MTU) {
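
A minimal userspace sketch of why the else branch matters, assuming a ring
whose flags survive a reset. The struct, the function names and the bit
values are illustrative only; just the two flag names come from the patch.

```c
/* Bit values are illustrative, not the driver's real definitions. */
#define IXGBE_FLAG_RX_PS_ENABLED	(1u << 0)	/* adapter-wide flag */
#define IXGBE_RING_RX_PS_ENABLED	(1u << 0)	/* per-ring flag */

struct model_ring {
	unsigned int flags;
};

/* Before the patch: the ring bit is only ever set, never cleared. */
void configure_ring_old(unsigned int adapter_flags, struct model_ring *ring)
{
	if (adapter_flags & IXGBE_FLAG_RX_PS_ENABLED)
		ring->flags |= IXGBE_RING_RX_PS_ENABLED;
}

/* After the patch: the ring bit always follows the adapter setting. */
void configure_ring_new(unsigned int adapter_flags, struct model_ring *ring)
{
	if (adapter_flags & IXGBE_FLAG_RX_PS_ENABLED)
		ring->flags |= IXGBE_RING_RX_PS_ENABLED;
	else
		ring->flags &= ~IXGBE_RING_RX_PS_ENABLED;
}
```

Calling configure_ring_old() once with the adapter flag set and again with it
cleared leaves the ring bit stale; configure_ring_new() clears it on the
second call.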


^ permalink raw reply related

* Release of iptables-1.4.5
From: Patrick McHardy @ 2009-09-14 17:05 UTC (permalink / raw)
  To: Netfilter Development Mailinglist, Linux Netdev List,
	'netfilter@vger.kernel.org', netfilter

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

The netfilter coreteam presents:

    iptables version 1.4.5

the iptables release for the 2.6.31 kernel. Changes include:

- support for nfnetlink_queue queue balancing

- support for revision 2 of the conntrack match, which fixes a
  regression in state matching

- support for multiple IP address specifications with -s/-d

- various internal cleanups and improvements

- manpage updates

Version 1.4.5 can be obtained from:

http://www.netfilter.org/projects/iptables/downloads.html
ftp://ftp.netfilter.org/pub/iptables/
git://git.netfilter.org/iptables.git

On behalf of the Netfilter Core Team.
Happy firewalling!


[-- Attachment #2: changes-iptables-1.4.5.txt --]
[-- Type: text/plain, Size: 1335 bytes --]

Florian Westphal (1):
      libxt_NFQUEUE: add new v1 version with queue-balance option

Jan Engelhardt (18):
      xt_conntrack: revision 2 for enlarged state_mask member
      libxt_helper: fix invalid passed option to check_inverse
      libiptc: split v4 and v6
      extensions: collapse registration structures
      iptables: allow for parse-less extensions
      iptables: allow for help-less extensions
      extensions: remove empty help and parse functions
      xtables: add multi-registration functions
      extensions: collapse data variables to use multi-reg calls
      xtables: warn of missing version identifier in extensions
      COMMIT_NOTES: notice to check for soversion bumps
      build: order of dependent libs is sensitive
      multi binary: allow subcommand via argv[1]
      build: fix struct size mismatch
      build: combine iptables-multi and iptables-static
      build: build only iptables-multi
      Merge branch 'stable'
      manpages: more fixes to minuses, hyphens, dashes

Laurence J. Lane (1):
      manpage: fix lintian warnings

Michael Granzow (1):
      iptables: accept multiple IP address specifications for -s, -d

Patrick McHardy (2):
      man: fix incorrect plural in libipt_set.man
      Bump version number to 1.4.5

Trent W. Buck (1):
      ipt_set: fix a typo in the manpage


^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-09-14 16:53 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ira W. Snyder, netdev, virtualization, kvm, linux-kernel, mingo,
	linux-mm, akpm, hpa, Rusty Russell, s.hetze
In-Reply-To: <4AAE6A97.7090808@gmail.com>

On Mon, Sep 14, 2009 at 12:08:55PM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Fri, Sep 11, 2009 at 12:00:21PM -0400, Gregory Haskins wrote:
> >> FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
> >> the memory is not assumed to be a userspace address.  Rather, it is a
> >> memctx-specific address, which can be userspace, or any other type
> >> (including hardware, dma-engine, etc).  As long as the memctx knows how
> >> to translate it, it will work.
> > 
> > How would permissions be handled?
> 
> Same as anything else, really.  Read on for details.
> 
> > it's easy to allow an app to pass in virtual addresses in its own address space.
> 
> Agreed, and this is what I do.
> 
> The guest always passes its own physical addresses (using things like
> __pa() in linux).  This address passed is memctx specific, but generally
> would fall into the category of "virtual-addresses" from the host's
> perspective.
> 
> For a KVM/AlacrityVM guest example, the addresses are GPAs, accessed
> internally to the context via a gfn_to_hva conversion (you can see this
> occurring in the citation links I sent).
> 
> For Ira's example, the addresses would represent a physical address on
> the PCI boards, and would follow any kind of relevant rules for
> converting a "GPA" to a host accessible address (even if indirectly, via
> a dma controller).

So vbus can let an application access either its own virtual memory or
physical memory on a PCI device.  My question is: is any application
that's allowed to do the former also granted rights to do the latter?

> >  But we can't let the guest specify physical addresses.
> 
> Agreed.  Neither your proposal nor mine operate this way afaict.
> 
> HTH
> 
> Kind Regards,
> -Greg
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-09-14 16:47 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ira W. Snyder, netdev, virtualization, kvm, linux-kernel, mingo,
	linux-mm, akpm, hpa, Rusty Russell, s.hetze
In-Reply-To: <4AAE6A97.7090808@gmail.com>

On Mon, Sep 14, 2009 at 12:08:55PM -0400, Gregory Haskins wrote:
> For Ira's example, the addresses would represent a physical address on
> the PCI boards, and would follow any kind of relevant rules for
> converting a "GPA" to a host accessible address (even if indirectly, via
> a dma controller).

I don't think limiting addresses to PCI physical addresses will work
well.  From what I remember, Ira's x86 cannot initiate burst
transactions on PCI, and it's the ppc that initiates all DMA.

> 
> >  But we can't let the guest specify physical addresses.
> 
> Agreed.  Neither your proposal nor mine operate this way afaict.

But this seems to be what Ira needs.

> HTH
> 
> Kind Regards,
> -Greg
> 



^ permalink raw reply

* Re: INGO Why you remove  set_user_nice() from kernel/kthread.c
From: Mike Galbraith @ 2009-09-14 16:36 UTC (permalink / raw)
  To: Chris Friesen
  Cc: pavel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Linux Network Development list
In-Reply-To: <4AAE6FD7.5070401@nortel.com>

On Mon, 2009-09-14 at 10:31 -0600, Chris Friesen wrote:

> I've added netdev to the CC list.  Maybe some of the people there have
> an opinion on what the default priority should be for ksoftirqd.

Good idea.

	-Mike

^ permalink raw reply

* Re: INGO Why you remove  set_user_nice() from kernel/kthread.c
From: Chris Friesen @ 2009-09-14 16:31 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: pavel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Linux Network Development list
In-Reply-To: <1252943123.12986.70.camel@marge.simson.net>

On 09/14/2009 09:45 AM, Mike Galbraith wrote:
> On Mon, 2009-09-14 at 09:12 -0600, Chris Friesen wrote:
>> On 09/14/2009 08:05 AM, Mike Galbraith wrote:

>>> I did that, not Ingo, and did so because with kthreads that use
>>> diddly-spit CPU (every one I see), it's just a waste of math.  What
>>> kthreads are you seeing using so much CPU that their weight is a factor?
>>> They _should_ be able to preempt and get their work done just fine
>>> without a boost.
>>
>> Under heavy network load ksoftirqd can use significant amounts of cpu.
> 
> OK, that may justify a weight adjustment, since it is a proxy for many.
> Question is, does it really need it?

I guess it depends how we want it to behave by default.  Likely anyone
that really cares is going to fine-tune the ksoftirqd priority level
anyways.

I've added netdev to the CC list.  Maybe some of the people there have
an opinion on what the default priority should be for ksoftirqd.

Chris

^ permalink raw reply

* Re: ipv4 regression in 2.6.31 ?
From: Stephen Hemminger @ 2009-09-14 16:31 UTC (permalink / raw)
  To: Stephan von Krawczynski
  Cc: Eric Dumazet, linux-kernel, davem, Linux Netdev List
In-Reply-To: <20090914175505.a3f132ee.skraw@ithnet.com>

On Mon, 14 Sep 2009 17:55:05 +0200
Stephan von Krawczynski <skraw@ithnet.com> wrote:

> On Mon, 14 Sep 2009 15:57:03 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Stephan von Krawczynski a écrit :
> > > Hello all,
> > > 
> > > today we experienced some sort of regression in 2.6.31 ipv4 implementation, or
> > > at least some incompatibility with former 2.6.30.X kernels.
> > > 
> > > We have the following situation:
> > > 
> > >                                        ---------- vlan1@eth0 192.168.2.1/24
> > >                                       /
> > > host A 192.168.1.1/24 eth0  -------<router>            host B
> > >                                       \
> > >                                        ---------- eth1 192.168.3.1/24
> > > 
> > > 
> > > Now, if you route 192.168.1.0/24 via interface vlan1@eth0 on host B and let
> > > host A ping 192.168.2.1 everything works. But if you route 192.168.1.0/24 via
> > > interface eth1 on host B and let host A ping 192.168.2.1 you get no reply.
> > > With tcpdump we see the icmp packets arrive at vlan1@eth0, but no icmp echo
> > > reply being generated neither on vlan1 nor eth1.
> > > Kernels 2.6.30.X and below do not show this behaviour.
> > > Is this intended? Do we need to reconfigure something to restore the old
> > > behaviour?
> > > 
> > 
> > Asymmetric routing?
> > 
> > Check your rp_filter settings
> > 
> > grep . `find /proc/sys/net -name rp_filter`
> > 
> > rp_filter - INTEGER
> >         0 - No source validation.
> >         1 - Strict mode as defined in RFC3704 Strict Reverse Path
> >             Each incoming packet is tested against the FIB and if the interface
> >             is not the best reverse path the packet check will fail.
> >             By default failed packets are discarded.
> >         2 - Loose mode as defined in RFC3704 Loose Reverse Path
> >             Each incoming packet's source address is also tested against the FIB
> >             and if the source address is not reachable via any interface
> >             the packet check will fail.
> > 
> >         Current recommended practice in RFC3704 is to enable strict mode
> >         to prevent IP spoofing from DDos attacks. If using asymmetric routing
> >         or other complicated routing, then loose mode is recommended.
> > 
> >         conf/all/rp_filter must also be set to non-zero to do source validation
> >         on the interface
> > 
> >         Default value is 0. Note that some distributions enable it
> >         in startup scripts.
> 
> Ok, here you can see 2.6.31 values from the discussed box:
> (remember, no ping reply in this setup)
> 
> /proc/sys/net/ipv4/conf/all/rp_filter:1
> /proc/sys/net/ipv4/conf/default/rp_filter:0
> /proc/sys/net/ipv4/conf/lo/rp_filter:0
> /proc/sys/net/ipv4/conf/eth2/rp_filter:0
> /proc/sys/net/ipv4/conf/eth0/rp_filter:0
> /proc/sys/net/ipv4/conf/eth1/rp_filter:0
> /proc/sys/net/ipv4/conf/vlan1/rp_filter:0
> 
> 
> And these are from the same box with 2.6.30.5:
> (ping reply works)
> 
> /proc/sys/net/ipv4/conf/all/rp_filter:1
> /proc/sys/net/ipv4/conf/default/rp_filter:0
> /proc/sys/net/ipv4/conf/lo/rp_filter:0
> /proc/sys/net/ipv4/conf/eth2/rp_filter:0
> /proc/sys/net/ipv4/conf/eth0/rp_filter:0
> /proc/sys/net/ipv4/conf/eth1/rp_filter:0
> /proc/sys/net/ipv4/conf/vlan1/rp_filter:0
> 
> As you can see they're all the same. Does this mean that rp_filter never
> really worked as intended before 2.6.31? Or does it mean that rp_filter=0
> (eth1 and vlan1) gets overridden by all/rp_filter=1 in 2.6.31 and not before?

RP filter did not work correctly in 2.6.30. The code added for the loose
mode introduced a bug; the rp_filter value was being computed as:
  rp_filter = interface_value & all_value;
So in order to get reverse path filtering, both would have to be set.

In 2.6.31 this was changed to:
   rp_filter = max(interface_value, all_value);

This was the intended behaviour: if the user wants rp filtering turned on
for all interfaces, set /proc/sys/net/ipv4/conf/all/rp_filter = 1;
to turn it on for just one interface, set it for just that interface.

Sorry for any confusion this caused.
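
The two formulas can be compared directly in a small userspace sketch. It
takes the expressions in this message at face value; the function names are
made up for the illustration and are not kernel symbols.

```c
/* 2.6.30 behaviour as described above: bitwise AND of the two sysctls. */
int rp_filter_2_6_30(int interface_value, int all_value)
{
	return interface_value & all_value;
}

/* 2.6.31 behaviour as described above: the larger of the two sysctls. */
int rp_filter_2_6_31(int interface_value, int all_value)
{
	return interface_value > all_value ? interface_value : all_value;
}
```

For the reported box (all=1, interface=0) the old formula yields 0, so no
filtering, while the new one yields 1, strict mode. The old formula also
turns interface=2 with all=1 into 0, which is the loose-mode problem the fix
addresses.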



-- 

^ permalink raw reply

* [PATCH alt] sky2: Make sure both ports initialize correctly
From: Stephen Hemminger @ 2009-09-14 16:22 UTC (permalink / raw)
  To: Mike McCormack, David Miller; +Cc: netdev
In-Reply-To: <4AAD7B86.6060300@ring3k.org>

Sorry Mike, I sent you off the wrong way. The second port is different
enough in setup (because of NAPI) that the following is simpler.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/sky2.c	2009-09-14 09:16:34.313731803 -0700
+++ b/drivers/net/sky2.c	2009-09-14 09:19:48.240726224 -0700
@@ -4550,16 +4550,18 @@ static int __devinit sky2_probe(struct p
 	if (hw->ports > 1) {
 		struct net_device *dev1;
 
+		err = -ENOMEM;
 		dev1 = sky2_init_netdev(hw, 1, using_dac, wol_default);
-		if (!dev1)
-			dev_warn(&pdev->dev, "allocation for second device failed\n");
-		else if ((err = register_netdev(dev1))) {
+		if (dev1 && (err = register_netdev(dev1)) == 0)
+			sky2_show_addr(dev1);
+		else {
 			dev_warn(&pdev->dev,
 				 "register of second port failed (%d)\n", err);
 			hw->dev[1] = NULL;
-			free_netdev(dev1);
-		} else
-			sky2_show_addr(dev1);
+			hw->ports = 1;
+			if (dev1)
+				free_netdev(dev1);
+		}
 	}
 
 	setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
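
The control flow of the new hunk can be modelled in userspace as below. This
is a sketch under assumptions: struct model_hw and model_second_port() are
hypothetical, dev1 stands for the result of sky2_init_netdev() (NULL on
allocation failure), reg_err stands for the return value of
register_netdev(), and free() stands in for free_netdev().

```c
#include <errno.h>
#include <stdlib.h>

struct model_hw {
	int ports;
	void *dev[2];
};

/* Returns 0 on success, else the error the dev_warn() would report. */
int model_second_port(struct model_hw *hw, void *dev1, int reg_err)
{
	int err = -ENOMEM;	/* preset, so a NULL dev1 reports -ENOMEM */

	hw->dev[1] = dev1;
	if (dev1 && (err = reg_err) == 0)
		return 0;	/* the patch calls sky2_show_addr() here */

	/* Single failure path for both allocation and registration errors. */
	hw->dev[1] = NULL;
	hw->ports = 1;		/* carry on as a single-port device */
	free(dev1);		/* free(NULL) is a no-op; the patch checks dev1 */
	return err;
}
```

Both failure causes end in the same state: the second slot is empty and the
port count is reduced to one.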



-- 

^ permalink raw reply

* [PATCH alt] sky2: transmit ring accounting
From: Stephen Hemminger @ 2009-09-14 16:12 UTC (permalink / raw)
  To: Mike McCormack, David Miller; +Cc: netdev
In-Reply-To: <4AAD7B3D.7010403@ring3k.org>

Be more accurate about number of transmit list elements required.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Note: this is an optimization, the old code guaranteed more space
than necessary (MAX_SKB_TX_LE was always bigger than needed).

--- a/drivers/net/sky2.c	2009-09-14 08:45:55.730729606 -0700
+++ b/drivers/net/sky2.c	2009-09-14 09:09:42.395712810 -0700
@@ -65,8 +65,8 @@
 #define RX_DEF_PENDING		RX_MAX_PENDING
 
 /* This is the worst case number of transmit list elements for a single skb:
-   VLAN + TSO + CKSUM + Data + skb_frags * DMA */
-#define MAX_SKB_TX_LE	(4 + (sizeof(dma_addr_t)/sizeof(u32))*MAX_SKB_FRAGS)
+   VLAN:GSO + CKSUM + Data + skb_frags * DMA */
+#define MAX_SKB_TX_LE	(2 + (sizeof(dma_addr_t)/sizeof(u32))*(MAX_SKB_FRAGS+1))
 #define TX_MIN_PENDING		(MAX_SKB_TX_LE+1)
 #define TX_MAX_PENDING		4096
 #define TX_DEF_PENDING		127
@@ -1567,11 +1567,13 @@ static unsigned tx_le_req(const struct s
 {
 	unsigned count;
 
-	count = sizeof(dma_addr_t) / sizeof(u32);
-	count += skb_shinfo(skb)->nr_frags * count;
+	count = (skb_shinfo(skb)->nr_frags + 1)
+		* (sizeof(dma_addr_t) / sizeof(u32));
 
 	if (skb_is_gso(skb))
 		++count;
+	else if (sizeof(dma_addr_t) == sizeof(u32))
+		++count;	/* possible vlan */
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
 		++count;
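
The arithmetic in this patch can be checked with a userspace sketch. The
function names are made up; dma_words stands for
sizeof(dma_addr_t)/sizeof(u32), so 1 or 2, and max_frags stands for
MAX_SKB_FRAGS, whose value depends on the kernel configuration.

```c
/* Old and new worst-case bounds, as in the two MAX_SKB_TX_LE definitions. */
unsigned max_skb_tx_le_old(unsigned dma_words, unsigned max_frags)
{
	return 4 + dma_words * max_frags;
}

unsigned max_skb_tx_le_new(unsigned dma_words, unsigned max_frags)
{
	return 2 + dma_words * (max_frags + 1);
}

/* Per-skb requirement before the patch. */
unsigned tx_le_req_old(unsigned nr_frags, unsigned dma_words,
		       int is_gso, int csum_partial)
{
	unsigned count = dma_words;

	count += nr_frags * count;
	if (is_gso)
		++count;
	if (csum_partial)
		++count;
	return count;
}

/* Per-skb requirement after the patch. */
unsigned tx_le_req_new(unsigned nr_frags, unsigned dma_words,
		       int is_gso, int csum_partial)
{
	unsigned count = (nr_frags + 1) * dma_words;

	if (is_gso)
		++count;
	else if (dma_words == 1)
		++count;	/* possible vlan */
	if (csum_partial)
		++count;
	return count;
}
```

The new bound minus the old one is dma_words - 2, so it is one element
smaller with a 32-bit dma_addr_t and unchanged with a 64-bit one. A GSO skb
with CHECKSUM_PARTIAL and the maximum number of fragments needs exactly
max_skb_tx_le_new() elements under the new accounting. With an illustrative
max_frags of 18, that is 21 elements (32-bit) or 40 elements (64-bit).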

^ permalink raw reply

* Re: ipv4 regression in 2.6.31 ?
From: Eric Dumazet @ 2009-09-14 16:10 UTC (permalink / raw)
  To: Stephan von Krawczynski
  Cc: Stephen Hemminger, linux-kernel, davem, Linux Netdev List
In-Reply-To: <20090914175505.a3f132ee.skraw@ithnet.com>

Stephan von Krawczynski a écrit :
> On Mon, 14 Sep 2009 15:57:03 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Stephan von Krawczynski a écrit :
>>> Hello all,
>>>
>>> today we experienced some sort of regression in 2.6.31 ipv4 implementation, or
>>> at least some incompatibility with former 2.6.30.X kernels.
>>>
>>> We have the following situation:
>>>
>>>                                        ---------- vlan1@eth0 192.168.2.1/24
>>>                                       /
>>> host A 192.168.1.1/24 eth0  -------<router>            host B
>>>                                       \
>>>                                        ---------- eth1 192.168.3.1/24
>>>
>>>
>>> Now, if you route 192.168.1.0/24 via interface vlan1@eth0 on host B and let
>>> host A ping 192.168.2.1 everything works. But if you route 192.168.1.0/24 via
>>> interface eth1 on host B and let host A ping 192.168.2.1 you get no reply.
>>> With tcpdump we see the icmp packets arrive at vlan1@eth0, but no icmp echo
>>> reply being generated neither on vlan1 nor eth1.
>>> Kernels 2.6.30.X and below do not show this behaviour.
>>> Is this intended? Do we need to reconfigure something to restore the old
>>> behaviour?
>>>
>> Asymmetric routing?
>>
>> Check your rp_filter settings
>>
>> grep . `find /proc/sys/net -name rp_filter`
>>
>> rp_filter - INTEGER
>>         0 - No source validation.
>>         1 - Strict mode as defined in RFC3704 Strict Reverse Path
>>             Each incoming packet is tested against the FIB and if the interface
>>             is not the best reverse path the packet check will fail.
>>             By default failed packets are discarded.
>>         2 - Loose mode as defined in RFC3704 Loose Reverse Path
>>             Each incoming packet's source address is also tested against the FIB
>>             and if the source address is not reachable via any interface
>>             the packet check will fail.
>>
>>         Current recommended practice in RFC3704 is to enable strict mode
>>         to prevent IP spoofing from DDos attacks. If using asymmetric routing
>>         or other complicated routing, then loose mode is recommended.
>>
>>         conf/all/rp_filter must also be set to non-zero to do source validation
>>         on the interface
>>
>>         Default value is 0. Note that some distributions enable it
>>         in startup scripts.
> 
> Ok, here you can see 2.6.31 values from the discussed box:
> (remember, no ping reply in this setup)
> 
> /proc/sys/net/ipv4/conf/all/rp_filter:1
> /proc/sys/net/ipv4/conf/default/rp_filter:0
> /proc/sys/net/ipv4/conf/lo/rp_filter:0
> /proc/sys/net/ipv4/conf/eth2/rp_filter:0
> /proc/sys/net/ipv4/conf/eth0/rp_filter:0
> /proc/sys/net/ipv4/conf/eth1/rp_filter:0
> /proc/sys/net/ipv4/conf/vlan1/rp_filter:0
> 
> 
> And these are from the same box with 2.6.30.5:
> (ping reply works)
> 
> /proc/sys/net/ipv4/conf/all/rp_filter:1
> /proc/sys/net/ipv4/conf/default/rp_filter:0
> /proc/sys/net/ipv4/conf/lo/rp_filter:0
> /proc/sys/net/ipv4/conf/eth2/rp_filter:0
> /proc/sys/net/ipv4/conf/eth0/rp_filter:0
> /proc/sys/net/ipv4/conf/eth1/rp_filter:0
> /proc/sys/net/ipv4/conf/vlan1/rp_filter:0
> 
> As you can see they're all the same. Does this mean that rp_filter never
> really worked as intended before 2.6.31? Or does it mean that rp_filter=0
> (eth1 and vlan1) gets overridden by all/rp_filter=1 in 2.6.31 and not before?
>

Yes, previous kernels effectively ignored the /proc/sys/net/ipv4/conf/all/rp_filter
value unless the per-interface value was also set; it was a bug.

commit 27fed4175acf81ddd91d9a4ee2fd298981f60295
Author: Stephen Hemminger <shemminger@vyatta.com>
Date:   Mon Jul 27 18:39:45 2009 -0700

    ip: fix logic of reverse path filter sysctl

    Even though reverse path filter was changed from simple boolean to
    trinary control, the loose mode only works if both all and device are
    configured because of this logic error.

    Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


In your case, you *need*
echo 0 >/proc/sys/net/ipv4/conf/all/rp_filter
or
echo 2 >/proc/sys/net/ipv4/conf/all/rp_filter
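
Why 0 or 2 helps here while 1 does not can be shown with a simplified model
of the reverse path check. This is a sketch, not the kernel's FIB code: the
route table, lookup() and rp_check() are hypothetical, and the modes follow
the rp_filter description quoted above (0 off, 1 strict, 2 loose).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct route {
	uint32_t prefix;	/* network address, host byte order */
	int plen;		/* prefix length in bits, 0..32 */
	const char *ifname;	/* outgoing interface */
};

static uint32_t mask_of(int plen)
{
	return plen == 0 ? 0 : ~(uint32_t)0 << (32 - plen);
}

/* Longest-prefix match; returns NULL when no route covers addr. */
const struct route *lookup(const struct route *tbl, size_t n, uint32_t addr)
{
	const struct route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((addr & mask_of(tbl[i].plen)) != tbl[i].prefix)
			continue;
		if (!best || tbl[i].plen > best->plen)
			best = &tbl[i];
	}
	return best;
}

/* Returns 1 if a packet from saddr arriving on in_if passes the check. */
int rp_check(const struct route *tbl, size_t n, int mode,
	     uint32_t saddr, const char *in_if)
{
	const struct route *r;

	if (mode == 0)
		return 1;			/* no source validation */
	r = lookup(tbl, n, saddr);
	if (!r)
		return 0;			/* source unreachable */
	if (mode == 2)
		return 1;			/* loose: any interface will do */
	return strcmp(r->ifname, in_if) == 0;	/* strict: best reverse path */
}
```

In the reported setup host B routes 192.168.1.0/24 via eth1, but the echo
request from 192.168.1.1 arrives on vlan1. The strict check rejects it, while
loose mode or no filtering accepts it.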


^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Gregory Haskins @ 2009-09-14 16:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ira W. Snyder, netdev, virtualization, kvm, linux-kernel, mingo,
	linux-mm, akpm, hpa, Rusty Russell, s.hetze
In-Reply-To: <20090913120140.GA31218@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

Michael S. Tsirkin wrote:
> On Fri, Sep 11, 2009 at 12:00:21PM -0400, Gregory Haskins wrote:
>> FWIW: VBUS handles this situation via the "memctx" abstraction.  IOW,
>> the memory is not assumed to be a userspace address.  Rather, it is a
>> memctx-specific address, which can be userspace, or any other type
>> (including hardware, dma-engine, etc).  As long as the memctx knows how
>> to translate it, it will work.
> 
> How would permissions be handled?

Same as anything else, really.  Read on for details.

> it's easy to allow an app to pass in virtual addresses in its own address space.

Agreed, and this is what I do.

The guest always passes its own physical addresses (using things like
__pa() in linux).  This address passed is memctx specific, but generally
would fall into the category of "virtual-addresses" from the host's
perspective.

For a KVM/AlacrityVM guest example, the addresses are GPAs, accessed
internally to the context via a gfn_to_hva conversion (you can see this
occurring in the citation links I sent).

For Ira's example, the addresses would represent a physical address on
the PCI boards, and would follow any kind of relevant rules for
converting a "GPA" to a host accessible address (even if indirectly, via
a dma controller).


>  But we can't let the guest specify physical addresses.

Agreed.  Neither your proposal nor mine operate this way afaict.

HTH

Kind Regards,
-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 267 bytes --]

^ permalink raw reply

* Re: ipv4 regression in 2.6.31 ?
From: Stephan von Krawczynski @ 2009-09-14 15:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, davem, Linux Netdev List
In-Reply-To: <4AAE4BAF.2010406@gmail.com>

On Mon, 14 Sep 2009 15:57:03 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephan von Krawczynski a écrit :
> > Hello all,
> > 
> > today we experienced some sort of regression in 2.6.31 ipv4 implementation, or
> > at least some incompatibility with former 2.6.30.X kernels.
> > 
> > We have the following situation:
> > 
> >                                        ---------- vlan1@eth0 192.168.2.1/24
> >                                       /
> > host A 192.168.1.1/24 eth0  -------<router>            host B
> >                                       \
> >                                        ---------- eth1 192.168.3.1/24
> > 
> > 
> > Now, if you route 192.168.1.0/24 via interface vlan1@eth0 on host B and let
> > host A ping 192.168.2.1 everything works. But if you route 192.168.1.0/24 via
> > interface eth1 on host B and let host A ping 192.168.2.1 you get no reply.
> > With tcpdump we see the icmp packets arrive at vlan1@eth0, but no icmp echo
> > reply being generated neither on vlan1 nor eth1.
> > Kernels 2.6.30.X and below do not show this behaviour.
> > Is this intended? Do we need to reconfigure something to restore the old
> > behaviour?
> > 
> 
> Asymmetric routing?
> 
> Check your rp_filter settings
> 
> grep . `find /proc/sys/net -name rp_filter`
> 
> rp_filter - INTEGER
>         0 - No source validation.
>         1 - Strict mode as defined in RFC3704 Strict Reverse Path
>             Each incoming packet is tested against the FIB and if the interface
>             is not the best reverse path the packet check will fail.
>             By default failed packets are discarded.
>         2 - Loose mode as defined in RFC3704 Loose Reverse Path
>             Each incoming packet's source address is also tested against the FIB
>             and if the source address is not reachable via any interface
>             the packet check will fail.
> 
>         Current recommended practice in RFC3704 is to enable strict mode
>         to prevent IP spoofing from DDos attacks. If using asymmetric routing
>         or other complicated routing, then loose mode is recommended.
> 
>         conf/all/rp_filter must also be set to non-zero to do source validation
>         on the interface
> 
>         Default value is 0. Note that some distributions enable it
>         in startup scripts.

Ok, here you can see 2.6.31 values from the discussed box:
(remember, no ping reply in this setup)

/proc/sys/net/ipv4/conf/all/rp_filter:1
/proc/sys/net/ipv4/conf/default/rp_filter:0
/proc/sys/net/ipv4/conf/lo/rp_filter:0
/proc/sys/net/ipv4/conf/eth2/rp_filter:0
/proc/sys/net/ipv4/conf/eth0/rp_filter:0
/proc/sys/net/ipv4/conf/eth1/rp_filter:0
/proc/sys/net/ipv4/conf/vlan1/rp_filter:0


And these are from the same box with 2.6.30.5:
(ping reply works)

/proc/sys/net/ipv4/conf/all/rp_filter:1
/proc/sys/net/ipv4/conf/default/rp_filter:0
/proc/sys/net/ipv4/conf/lo/rp_filter:0
/proc/sys/net/ipv4/conf/eth2/rp_filter:0
/proc/sys/net/ipv4/conf/eth0/rp_filter:0
/proc/sys/net/ipv4/conf/eth1/rp_filter:0
/proc/sys/net/ipv4/conf/vlan1/rp_filter:0

As you can see, they are all the same. Does this mean that rp_filter never
really worked as intended before 2.6.31? Or does it mean that rp_filter=0
(eth1 and vlan1) gets overridden by all/rp_filter=1 in 2.6.31 and not before?
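
The second question can be made concrete by comparing two hypothetical
rules for combining conf/all/rp_filter with a per-interface value. The
sketch below is not taken from the kernel source, and the names and_rule,
max_rule and effective are assumptions made up for illustration. It only
shows that the identical sysctl values listed above would behave
differently on vlan1 depending on which rule a kernel applies.

```python
# Hypothetical model of how conf/all/rp_filter and a per-interface
# rp_filter value might be combined. Neither rule is claimed to be what
# a particular kernel release does; both are assumptions for comparison.

def and_rule(all_value: int, iface_value: int) -> int:
    """Filtering is active only if BOTH values are non-zero."""
    return 1 if (all_value and iface_value) else 0


def max_rule(all_value: int, iface_value: int) -> int:
    """The larger of the two values wins."""
    return max(all_value, iface_value)


# Values reported in this message (identical on 2.6.30.5 and 2.6.31).
settings = {
    "all": 1, "default": 0, "lo": 0,
    "eth2": 0, "eth0": 0, "eth1": 0, "vlan1": 0,
}


def effective(rule, iface: str) -> int:
    """Effective rp_filter mode for iface under the given combination rule."""
    return rule(settings["all"], settings[iface])


if __name__ == "__main__":
    for iface in ("eth1", "vlan1"):
        print(iface,
              "and_rule:", effective(and_rule, iface),
              "max_rule:", effective(max_rule, iface))
```

Under and_rule no source validation would happen on vlan1. Under max_rule
strict mode would apply there, and with 192.168.1.0/24 routed via eth1 the
echo requests arriving on vlan1 would fail the reverse path check.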

--
Regards,
Stephan

^ permalink raw reply

* Re: [PATCH RFC] tun: export underlying socket
From: Michael S. Tsirkin @ 2009-09-14 15:40 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, netdev, herbert
In-Reply-To: <4AAE4DFC.9080500@voltaire.com>

On Mon, Sep 14, 2009 at 05:06:52PM +0300, Or Gerlitz wrote:
> Michael S. Tsirkin wrote:
>>> how  would the use case with vhost will look like?
>> - Configure bridge and tun using existing scripts
>> - pass tun fd to vhost via an ioctl
>> - vhost calls tun_get_socket
>> - from this point, guest networking just goes faster
>
> let me see I am with you:
>
> 1. vhost gets from user space through ioctl packet socket fd OR tun fd -  
> but never both

Right

> 2. for packet socket fd
> VM.TX is translated by vhost to sendmsg which goes through the NIC
> NIC RX  makes the fd poll to signal and then recvmsg is called on the  
> fd, then vhost places the packet in a virtq
>
> 3. for tun fd
> VM.TX is translated by vhost to sendmsg which is translated by tun to  
> netif_rx which is then handled by the bridge
> NIC RX  goes to the bridge which xmits the packet a tun interface, now  
> what makes tun provide this packet to vhost and how it is done?

Same as above. vhost polls tun and calls recvmsg on the socket.
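
As a rough userspace analogy of that receive path (wait for readability,
then call recvmsg), here is a minimal Python sketch. It is not vhost code:
a datagram socketpair stands in for the tun socket, and the names
poll_and_receive and demo are hypothetical.

```python
import select
import socket


def poll_and_receive(sock: socket.socket, timeout_ms: int = 1000) -> bytes:
    """Wait until sock is readable, then read one datagram with recvmsg."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    if not poller.poll(timeout_ms):
        raise TimeoutError("no packet became available")
    data, _ancdata, _flags, _addr = sock.recvmsg(65535)
    return data


def demo() -> bytes:
    # The socketpair is only a stand-in for the tun socket (an assumption
    # for illustration); one end plays the producer, the other the consumer.
    producer, consumer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        producer.send(b"frame-from-bridge")
        return poll_and_receive(consumer)
    finally:
        producer.close()
        consumer.close()
```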

>
>> A lot of people have asked for tun support in vhost, because qemu
>> currently uses tun.  With this scheme existing code and scripts can
>> be used to configure both tun and bridge.  You also can utilize
>> virtualization-specific features in tun.

(I broke the too-long lines up. Please do not merge them.)

> Tun has code to support some virtualization-specific features, however,  
> it has also some inherent problems, I think, for example, you don't know  
> over which NIC eventually a packet will be sent and as such, the feature  
> advertising to the guest (virtio-net) NIC is problematic,
> for example,  
> TSO. With vhost, since you are directly attached to a NIC and assuming  
> its a PF or VF NIC and not something like macvlan/veth you can actually  
> know what features are supported by this NIC.
>
> Or.

Herbert addressed the TSO example.

Generally, feature negotiation does become more complicated in bridged
configurations, but some users require bridging. So with vhost, feature
negotiation is mostly done in userspace (e.g. vhost does not expose a
TSO capability, devices do this already); vhost itself only cares about
virtio features such as mergeable buffers.
Policy decisions, including whether to use packet socket or
tun+bridge, are up to the user.

-- 
MST

^ permalink raw reply

* Re: ipv4 regression in 2.6.31 ?
From: Eric Dumazet @ 2009-09-14 15:21 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel, davem, Linux Netdev List
In-Reply-To: <20090914171001.47371b3d.skraw@ithnet.com>

Stephan von Krawczynski wrote:
> On Mon, 14 Sep 2009 15:57:03 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Stephan von Krawczynski wrote:
>>> Hello all,
>>>
>>> today we experienced some sort of regression in 2.6.31 ipv4 implementation, or
>>> at least some incompatibility with former 2.6.30.X kernels.
>>>
>>> We have the following situation:
>>>
>>>                                        ---------- vlan1@eth0 192.168.2.1/24
>>>                                       /
>>> host A 192.168.1.1/24 eth0  -------<router>            host B
>>>                                       \
>>>                                        ---------- eth1 192.168.3.1/24
>>>
>>>
>>> Now, if you route 192.168.1.0/24 via interface vlan1@eth0 on host B and let
>>> host A ping 192.168.2.1 everything works. But if you route 192.168.1.0/24 via
>>> interface eth1 on host B and let host A ping 192.168.2.1 you get no reply.
>>> With tcpdump we see the icmp packets arrive at vlan1@eth0, but no icmp echo
>>> reply being generated neither on vlan1 nor eth1.
>>> Kernels 2.6.30.X and below do not show this behaviour.
>>> Is this intended? Do we need to reconfigure something to restore the old
>>> behaviour?
>>>
>> Asymetric routing ?
>>
>> Check your rp_filter settings
>>
>> grep . `find /proc/sys/net -name rp_filter`
>>
>> rp_filter - INTEGER
>>         0 - No source validation.
>>         1 - Strict mode as defined in RFC3704 Strict Reverse Path
>>             Each incoming packet is tested against the FIB and if the interface
>>             is not the best reverse path the packet check will fail.
>>             By default failed packets are discarded.
>>         2 - Loose mode as defined in RFC3704 Loose Reverse Path
>>             Each incoming packet's source address is also tested against the FIB
>>             and if the source address is not reachable via any interface
>>             the packet check will fail.
>>
>>         Current recommended practice in RFC3704 is to enable strict mode
>>         to prevent IP spoofing from DDos attacks. If using asymmetric routing
>>         or other complicated routing, then loose mode is recommended.
>>
>>         conf/all/rp_filter must also be set to non-zero to do source validation
>>         on the interface
>>
>>         Default value is 0. Note that some distributions enable it
>>         in startup scripts.
> 
> Problem is this:
> Kernel 2.6.30.X and below work flawlessly in this setup, only kernel 2.6.31
> acts different. Is this an intended change in policy?
> 

Here, it depends only on the rp_filter settings, whether the kernel is
2.6.30 or 2.6.31.

Please give your settings for all hosts involved, for further investigation.

^ permalink raw reply

* Re: ipv4 regression in 2.6.31 ?
From: Stephan von Krawczynski @ 2009-09-14 15:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, davem, Linux Netdev List
In-Reply-To: <4AAE4BAF.2010406@gmail.com>

On Mon, 14 Sep 2009 15:57:03 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephan von Krawczynski wrote:
> > Hello all,
> > 
> > today we experienced some sort of regression in 2.6.31 ipv4 implementation, or
> > at least some incompatibility with former 2.6.30.X kernels.
> > 
> > We have the following situation:
> > 
> >                                        ---------- vlan1@eth0 192.168.2.1/24
> >                                       /
> > host A 192.168.1.1/24 eth0  -------<router>            host B
> >                                       \
> >                                        ---------- eth1 192.168.3.1/24
> > 
> > 
> > Now, if you route 192.168.1.0/24 via interface vlan1@eth0 on host B and let
> > host A ping 192.168.2.1 everything works. But if you route 192.168.1.0/24 via
> > interface eth1 on host B and let host A ping 192.168.2.1 you get no reply.
> > With tcpdump we see the icmp packets arrive at vlan1@eth0, but no icmp echo
> > reply being generated neither on vlan1 nor eth1.
> > Kernels 2.6.30.X and below do not show this behaviour.
> > Is this intended? Do we need to reconfigure something to restore the old
> > behaviour?
> > 
> 
> Asymetric routing ?
> 
> Check your rp_filter settings
> 
> grep . `find /proc/sys/net -name rp_filter`
> 
> rp_filter - INTEGER
>         0 - No source validation.
>         1 - Strict mode as defined in RFC3704 Strict Reverse Path
>             Each incoming packet is tested against the FIB and if the interface
>             is not the best reverse path the packet check will fail.
>             By default failed packets are discarded.
>         2 - Loose mode as defined in RFC3704 Loose Reverse Path
>             Each incoming packet's source address is also tested against the FIB
>             and if the source address is not reachable via any interface
>             the packet check will fail.
> 
>         Current recommended practice in RFC3704 is to enable strict mode
>         to prevent IP spoofing from DDos attacks. If using asymmetric routing
>         or other complicated routing, then loose mode is recommended.
> 
>         conf/all/rp_filter must also be set to non-zero to do source validation
>         on the interface
> 
>         Default value is 0. Note that some distributions enable it
>         in startup scripts.

The problem is this:
Kernel 2.6.30.X and below work flawlessly in this setup; only kernel 2.6.31
acts differently. Is this an intended change in policy?

-- 
Regards,
Stephan

^ permalink raw reply

* Re: [PATCH RFC] tun: export underlying socket
From: Herbert Xu @ 2009-09-14 15:03 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Michael S. Tsirkin, David Miller, netdev
In-Reply-To: <4AAE4DFC.9080500@voltaire.com>

On Mon, Sep 14, 2009 at 05:06:52PM +0300, Or Gerlitz wrote:
>
>> A lot of people have asked for tun support in vhost, because qemu currently uses tun.  With this scheme existing code and scripts can be used to configure both tun and bridge.  You also can utilize virtualization-specific features in tun.
> Tun has code to support some virtualization-specific features, however,  
> it has also some inherent problems, I think, for example, you don't know  
> over which NIC eventually a packet will be sent and as such, the feature  
> advertising to the guest (virtio-net) NIC is problematic, for example,  
> TSO. With vhost, since you are directly attached to a NIC and assuming  
> its a PF or VF NIC and not something like macvlan/veth you can actually  
> know what features are supported by this NIC.

TSO is not a problem because we provide a software fallback when
the hardware does not support it.  So guests should always enable
TSO if they support it and not worry about the physical NIC.
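
The idea of a software fallback can be illustrated with a toy Python
sketch: if the NIC cannot segment, the large buffer is split into
MSS-sized pieces before transmission. This is not the kernel's
implementation, and the names segment and transmit are hypothetical.

```python
from typing import List


def segment(payload: bytes, mss: int) -> List[bytes]:
    """Split payload into chunks of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def transmit(payload: bytes, mss: int, nic_supports_tso: bool) -> List[bytes]:
    """Return the buffers that would be handed to the NIC."""
    if nic_supports_tso:
        # Hardware segments the large buffer itself.
        return [payload]
    # Software fallback: segment before handing the data to the NIC.
    return segment(payload, mss)
```

Either way the sender may submit one large buffer, which is why the guest
does not need to know what the physical NIC supports.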

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* RE: [PATCH 29/29] ioat2, 3: cacheline align software descriptor allocations
From: Sosnowski, Maciej @ 2009-09-14 15:02 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20090904023257.32667.53926.stgit@dwillia2-linux.ch.intel.com>

Williams, Dan J wrote:
> All the necessary fields for handling an ioat2,3 ring entry can fit into
> one cacheline.  Move ->len prior to ->txd in struct ioat_ring_ent, and
> move allocation of these entries to a hw-cache-aligned kmem cache to
> reduce the number of cachelines dirtied for descriptor management.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>

^ permalink raw reply

* RE: [PATCH 28/29] dmaengine: kill tx_list
From: Sosnowski, Maciej @ 2009-09-14 15:01 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20090904023252.32667.51136.stgit@dwillia2-linux.ch.intel.com>

Williams, Dan J wrote:
> The tx_list attribute of struct dma_async_tx_descriptor is common to
> most, but not all dma driver implementations.  None of the upper level
> code (dmaengine/async_tx) uses it, so allow drivers to implement it
> locally if they need it.  This saves sizeof(struct list_head) bytes for
> drivers that do not manage descriptors with a linked list (e.g.: ioatdma
> v2,3).
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>

^ permalink raw reply

* RE: [PATCH 26/29] ioat: implement a private tx_list
From: Sosnowski, Maciej @ 2009-09-14 15:01 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20090904023242.32667.27473.stgit@dwillia2-linux.ch.intel.com>

Williams, Dan J wrote:
> Drop ioatdma's use of tx_list from struct dma_async_tx_descriptor in
> preparation for removal of this field.
> 
> Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>

^ permalink raw reply

* RE: [PATCH 22/29] net_dma: poll for a descriptor after allocation failure
From: Sosnowski, Maciej @ 2009-09-14 15:00 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20090904023221.32667.70000.stgit@dwillia2-linux.ch.intel.com>

Williams, Dan J wrote:
> Handle descriptor allocation failures by polling for a descriptor.  The
> driver will force forward progress when polled.  In the best case this
> polling interval will be the time it takes for one dma memcpy
> transaction to complete.  In the worst case, channel hang, we will need
> to wait 100ms for the cleanup watchdog to fire (ioatdma driver).
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>

^ permalink raw reply

* RE: [PATCH 21/29] ioat2,3: dynamically resize descriptor ring
From: Sosnowski, Maciej @ 2009-09-14 15:00 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20090904023216.32667.55942.stgit@dwillia2-linux.ch.intel.com>

Williams, Dan J wrote:
> Increment the allocation order of the descriptor ring every time we run
> out of descriptors up to a maximum of allocation order specified by the
> module parameter 'ioat_max_alloc_order'.  After each idle period
> decrement the allocation order to a minimum order of
> 'ioat_ring_alloc_order' (i.e. the default ring size, tunable as a module
> parameter).
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
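
For reference, the resize policy described in the changelog can be
modelled with a short Python sketch. It is not driver code: the class and
method names are hypothetical, and the ring size being 2**order is an
assumption.

```python
class RingOrder:
    """Toy model: grow on descriptor exhaustion, shrink after idle periods."""

    def __init__(self, min_order: int, max_order: int) -> None:
        if min_order > max_order:
            raise ValueError("min_order must not exceed max_order")
        self.min_order = min_order  # plays the role of ioat_ring_alloc_order
        self.max_order = max_order  # plays the role of the maximum order
        self.order = min_order

    @property
    def size(self) -> int:
        # Assumption: the ring holds 2**order descriptors.
        return 1 << self.order

    def on_out_of_descriptors(self) -> bool:
        """Grow by one order unless already at the maximum."""
        if self.order < self.max_order:
            self.order += 1
            return True
        return False

    def on_idle(self) -> bool:
        """Shrink by one order unless already at the minimum."""
        if self.order > self.min_order:
            self.order -= 1
            return True
        return False
```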

Just one thing:

> +static int ioat_ring_max_alloc_order = IOAT_MAX_ORDER;
> +module_param(ioat_ring_max_alloc_order, int, 0644);
> +MODULE_PARM_DESC(ioat_ring_max_alloc_order,
> +		 "ioat2+: upper limit for dynamic ring resizing (default: n=16)");
[...]
> --- a/drivers/dma/ioat/dma_v2.h
> +++ b/drivers/dma/ioat/dma_v2.h
> @@ -37,6 +37,8 @@ extern int ioat_pending_level;
>  #define IOAT_MAX_ORDER 16
>  #define ioat_get_alloc_order() \
>  	(min(ioat_ring_alloc_order, IOAT_MAX_ORDER))
> +#define ioat_get_max_alloc_order() \
> +	(min(ioat_ring_max_alloc_order, IOAT_MAX_ORDER))

Making max_alloc_order a module parameter gives the impression
that it can be modified by a user, including making it larger than the default.
The default, however, is its maximum value, which may be confusing.
Why not use the parameter only as the upper limit?
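
The clamping can be seen by modelling the quoted macro in Python. This is
a sketch that mirrors ioat_get_max_alloc_order() from the hunk above, not
driver code, and the Python function name is hypothetical.

```python
IOAT_MAX_ORDER = 16  # value from the quoted dma_v2.h hunk


def get_max_alloc_order(ioat_ring_max_alloc_order: int) -> int:
    """Python model of min(ioat_ring_max_alloc_order, IOAT_MAX_ORDER)."""
    return min(ioat_ring_max_alloc_order, IOAT_MAX_ORDER)
```

A value above 16 is silently clamped to 16, so the parameter can only
lower the limit, never raise it.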

Thanks,
Maciej

^ permalink raw reply

