Netdev List


* Resend: [PATCH] TCP Early Retransmit: reduce required dupacks for triggering fast retrans
From: Christian Samsel @ 2009-09-22  8:59 UTC (permalink / raw)
  To: netdev

This patch implements draft-ietf-tcpm-early-rexmt. The early retransmit 
mechanism allows the transport to reduce the number of duplicate
acknowledgments required to trigger a fast retransmission when we
don't expect enough dupacks (e.g. because there are not enough
packets in flight and no new data to send). This allows the transport to use
fast retransmit to recover packet losses that would otherwise require
a lengthy retransmission timeout.

See: http://tools.ietf.org/html/draft-ietf-tcpm-early-rexmt-01
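
As a rough illustration of the rule described above, here is a minimal
user-space sketch of the threshold selection. It is not the kernel code:
early_rexmt_dupthresh() and its parameters are hypothetical names, and the
clamp to a minimum of one dupack is an assumption added here so that zero or
one outstanding packet cannot underflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel function.
 * in_flight:      packets currently outstanding
 * can_send_new:   unsent data is queued AND the window allows sending it
 * default_thresh: the normal dupack threshold (3 in standard TCP) */
static uint32_t early_rexmt_dupthresh(uint32_t in_flight, bool can_send_new,
                                      uint32_t default_thresh)
{
    assert(default_thresh >= 1);

    /* Enough packets outstanding, or new data can still go out and
     * trigger more dupacks: keep the normal threshold. */
    if (in_flight >= 4 || can_send_new)
        return default_thresh;

    /* Assumption of this sketch: never drop below one dupack, which
     * also avoids unsigned underflow for in_flight == 0. */
    if (in_flight < 2)
        return 1;

    return in_flight - 1;
}
```

With three packets outstanding and nothing new to send, the sketch asks for
two dupacks instead of three, which is the case the draft is aimed at.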

Signed-off-by: Christian Samsel <christian.samsel@rwth-aachen.de>

---
 net/ipv4/tcp_input.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af6d6fa..c0cc4fd 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2913,6 +2913,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
  int do_lost = is_dupack || ((flag & FLAG_DATA_SACKED) &&
                                     (tcp_fackets_out(tp) > tp->reordering));
  int fast_rexmit = 0, mib_idx;
+ u32 in_flight;
 
  if (WARN_ON(!tp->packets_out && tp->sacked_out))
          tp->sacked_out = 0;
@@ -3062,6 +3063,21 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
  if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
          tcp_update_scoreboard(sk, fast_rexmit);
  tcp_cwnd_down(sk, flag);
+       
+
+ /* draft-ietf-tcpm-early-rexmt: lower the dupack threshold to avoid an
+         * RTO when we cannot expect enough dupacks, i.e. when fewer than
+         * four packets are outstanding and either no unsent data is ready
+         * for transmission or the advertised window does not permit new
+         * segments.
+         */
+ in_flight = tcp_packets_in_flight(tp);
+ if ( in_flight < 4 && (skb_queue_empty(&sk->sk_write_queue) ||
+         tcp_may_send_now(sk) == 0) )
+         tp->reordering = in_flight - 1;
+ else if (tp->reordering != sysctl_tcp_reordering)
+         tp->reordering = sysctl_tcp_reordering;
+
  tcp_xmit_retransmit_queue(sk);
 }
 
-- 
1.6.4.1


^ permalink raw reply related

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Avi Kivity @ 2009-09-22  9:43 UTC (permalink / raw)
  To: Ira W. Snyder
  Cc: Gregory Haskins, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <20090921214312.GJ7182@ovro.caltech.edu>

On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
>
>> Sure, virtio-ira and he is on his own to make a bus-model under that, or
>> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
>> model can work, I agree.
>>
>>      
> Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
> virtio-s390. It isn't especially easy. I can steal lots of code from the
> lguest bus model, but sometimes it is good to generalize, especially
> after the fourth implementation or so. I think this is what GHaskins tried
> to do.
>    

Yes.  vbus is more finely layered so there is less code duplication.

The virtio layering was more or less dictated by Xen which doesn't have 
shared memory (it uses grant references instead).  As a matter of fact 
lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
part is duplicated.  It's probably possible to add a virtio-shmem.ko 
library that people who do have shared memory can reuse.

> I've given it some thought, and I think that running vhost-net (or
> similar) on the ppc boards, with virtio-net on the x86 crate server will
> work. The virtio-ring abstraction is almost good enough to work for this
> situation, but I had to re-invent it to work with my boards.
>
> I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
> Remember that this is the "host" system. I used each 4K block as a
> "device descriptor" which contains:
>
> 1) the type of device, config space, etc. for virtio
> 2) the "desc" table (virtio memory descriptors, see virtio-ring)
> 3) the "avail" table (available entries in the desc table)
>    

Won't access to this memory from x86 be slow?  (On the other hand, if you 
change it to main memory, access from ppc will be slow...  it really 
depends on how your system is tuned.)

> Parts 2 and 3 are repeated three times, to allow for a maximum of three
> virtqueues per device. This is good enough for all current drivers.
>    

The plan is to switch to multiqueue soon.  Will not affect you if your 
boards are uniprocessor or small smp.

> I've gotten plenty of email about this from lots of interested
> developers. There are people who would like this kind of system to just
> work, while having to write just some glue for their device, just like a
> network driver. I have a hunch most people have created some proprietary mess
> that basically works, and left it at that.
>    

So long as you keep the system-dependent features hookable or 
configurable, it should work.

> So, here is a desperate cry for help. I'd like to make this work, and
> I'd really like to see it in mainline. I'm trying to give back to the
> community from which I've taken plenty.
>    

Not sure who you're crying for help to.  Once you get this working, post 
patches.  If the patches are reasonably clean and don't impact 
performance for the main use case, and if you can show the need, I 
expect they'll be merged.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
From: Michael S. Tsirkin @ 2009-09-22 10:38 UTC (permalink / raw)
  To: Chris Wright
  Cc: Stephen Hemminger, Rusty Russell, virtualization, Xin, Xiaohui,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, hpa@zytor.com,
	mingo@elte.hu, akpm@linux-foundation.org
In-Reply-To: <20090921162718.GM26034@sequoia.sous-sol.org>

On Mon, Sep 21, 2009 at 09:27:18AM -0700, Chris Wright wrote:
> * Stephen Hemminger (shemminger@vyatta.com) wrote:
> > On Mon, 21 Sep 2009 16:37:22 +0930
> > Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 
> > > > > Actually this framework can apply to traditional network adapters which have
> > > > > just one tx/rx queue pair. And applications using the same user/kernel interface
> > > > > can utilize this framework to send/receive network traffic directly thru a tx/rx
> > > > > queue pair in a network adapter.
> > > > > 
> > 
> > More importantly, when virtualization is used with multi-queue
> > NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > NIC should preserve the parallelism (lock free) using multiple
> > receive/transmit queues. The number of queues should equal the
> > number of CPUs.
> 
> Yup, multiqueue virtio is on todo list ;-)
> 
> thanks,
> -chris

Note we'll need multiqueue tap for that to help.

-- 
MST


^ permalink raw reply

* [PATCH] drivers/net/wireless: Use usb_endpoint_dir_out
From: Julia Lawall @ 2009-09-22 11:45 UTC (permalink / raw)
  To: John W. Linville, Ulrich Kunitz, Daniel Drake, linux-wireless,
	netdev, linux-kernel

From: Julia Lawall <julia@diku.dk>

Use the usb_endpoint_dir_out API function.  Note that the use of
USB_TYPE_MASK in the original code is incorrect; it masks the request-type
bits rather than the direction bit, so the test does not depend on the
endpoint direction at all.
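
To make the masking problem concrete, here is a small user-space sketch.
The constant values are reproduced from memory of include/linux/usb/ch9.h
and should be treated as assumptions; old_test() and dir_out_test() are
illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values (as recalled from include/linux/usb/ch9.h). */
#define USB_DIR_OUT            0x00
#define USB_DIR_IN             0x80
#define USB_TYPE_MASK          (0x03 << 5)  /* request-type bits 5..6 */
#define USB_ENDPOINT_DIR_MASK  0x80         /* direction bit 7 */

static_assert((USB_TYPE_MASK & USB_ENDPOINT_DIR_MASK) == 0,
              "the two masks select disjoint bits");

/* The test as written in the original driver code. */
static bool old_test(uint8_t addr)
{
    return (addr & USB_TYPE_MASK) == USB_DIR_OUT;
}

/* What a direction check has to look at instead. */
static bool dir_out_test(uint8_t addr)
{
    return (addr & USB_ENDPOINT_DIR_MASK) == USB_DIR_OUT;
}
```

For endpoint 1, the OUT address 0x01 and the IN address 0x81 give the same
answer from old_test(), while dir_out_test() tells them apart.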

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
struct usb_endpoint_descriptor *endpoint;
expression E;
@@

- (endpoint->bEndpointAddress & E) == USB_DIR_OUT
+ usb_endpoint_dir_out(endpoint)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wireless/zd1211rw/zd_usb.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -u -p a/drivers/net/wireless/zd1211rw/zd_usb.c b/drivers/net/wireless/zd1211rw/zd_usb.c
--- a/drivers/net/wireless/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zd1211rw/zd_usb.c
@@ -1070,7 +1070,7 @@ static int eject_installer(struct usb_in
 
 	/* Find bulk out endpoint */
 	endpoint = &iface_desc->endpoint[1].desc;
-	if ((endpoint->bEndpointAddress & USB_TYPE_MASK) == USB_DIR_OUT &&
+	if (usb_endpoint_dir_out(endpoint) &&
 	    usb_endpoint_xfer_bulk(endpoint)) {
 		bulk_out_ep = endpoint->bEndpointAddress;
 	} else {

^ permalink raw reply

* Re: r8169 64-bit DMA support
From: Francois Romieu @ 2009-09-22 11:53 UTC (permalink / raw)
  To: Robert Hancock; +Cc: netdev
In-Reply-To: <4AB6BCEC.3070001@gmail.com>

Robert Hancock <hancockrwd@gmail.com> :
[...]
> It's not clear (from the mails I've read) exactly what was going on in  
> the case that caused this to be added.

Some AMD + r8169 systems simply did not work.

> Normally these days the PCI subsystem is supposed to detect that DAC
> isn't usable on a machine and refuse setting 64-bit DMA masks, it's
> not the driver's responsibility to handle this.
> I'm guessing that when this change was made that detection didn't exist
> though.

Not exactly. It was required for DAC to be explicitly enabled through
the CPlusCmd register.

> Thoughts on whether this default can be changed now ?

The 8168 does not seem to need the CPlusCmd stuff. I'll check it but it
should be possible to enable high DMA without condition for it.

-- 
Ueimor

^ permalink raw reply

* Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
From: Arnd Bergmann @ 2009-09-22 11:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Chris Wright, Stephen Hemminger, Rusty Russell, virtualization,
	Xin, Xiaohui, kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, hpa@zytor.com,
	mingo@elte.hu, akpm@linux-foundation.org
In-Reply-To: <20090922103807.GA2555@redhat.com>

On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > More importantly, when virtualization is used with multi-queue
> > > NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > > NIC should preserve the parallelism (lock free) using multiple
> > > receive/transmit queues. The number of queues should equal the
> > > number of CPUs.
> > 
> > Yup, multiqueue virtio is on todo list ;-)
> > 
> 
> Note we'll need multiqueue tap for that to help.

My idea for that was to open multiple file descriptors to the same
macvtap device and let the kernel figure out the right thing to
do with that. You can do the same with raw packet sockets in case
of vhost_net, but I wouldn't want to add more complexity to the
tun/tap driver for this.

	Arnd <><


^ permalink raw reply

* [PATCH] skge: request IRQ on activating the interface
From: Michal Schmidt @ 2009-09-22 12:01 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

skge requests IRQ in its probe function. This causes a problem in
the following real-life scenario with two different NICs in the machine:

1. modprobe skge
   The card is detected as eth0 and requests IRQ 17. Directory
   /proc/irq/17/eth0 is created.
2. There is a udev rule which says this interface should be called
   eth1, so udev renames eth0 -> eth1.
3. modprobe 8139too
   The Realtek card is detected as eth0. It will be using IRQ 17 too.
4. ip link set eth0 up
   Now 8139too requests IRQ 17.

The result is:
WARNING: at fs/proc/generic.c:590 proc_register ...
proc_dir_entry '17/eth0' already registered
...

And "ls /proc/irq/17" shows two subdirectories, both called eth0.

Fix it by requesting the IRQ in skge when the interface is activated.
This works, because interfaces can be renamed only while they are down.
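
The name collision can be modelled in user space. This is only a toy
sketch: struct irq_dir and the functions below are hypothetical stand-ins
for the per-IRQ proc directory, and unlike the kernel (which warns and
still creates the entry) the model simply rejects the duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Toy stand-in for the entries under /proc/irq/<n>/; not kernel code. */
struct irq_dir {
    char names[MAX_ENTRIES][16];
    int count;
};

/* Record a name under the IRQ; false if that name is already present
 * or the table is full. */
static bool irq_dir_register(struct irq_dir *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return false;
    if (d->count == MAX_ENTRIES)
        return false;
    snprintf(d->names[d->count], sizeof(d->names[0]), "%s", name);
    d->count++;
    return true;
}

/* IRQ requested in probe, while skge is still called eth0. The later
 * rename to eth1 does not touch the recorded name. */
static bool probe_time_scenario(void)
{
    struct irq_dir irq17 = { .count = 0 };
    irq_dir_register(&irq17, "eth0");        /* skge probe */
    return irq_dir_register(&irq17, "eth0"); /* 8139too brought up */
}

/* IRQ requested when the interface is brought up, after the rename. */
static bool open_time_scenario(void)
{
    struct irq_dir irq17 = { .count = 0 };
    irq_dir_register(&irq17, "eth1");        /* skge, already renamed */
    return irq_dir_register(&irq17, "eth0"); /* 8139too brought up */
}
```

In the model, probe_time_scenario() hits the duplicate and
open_time_scenario() does not.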

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---

 drivers/net/skge.c |   27 +++++++++++++++------------
 1 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 62e852e..7e90f27 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -105,6 +105,7 @@ static void yukon_init(struct skge_hw *hw, int port);
 static void genesis_mac_init(struct skge_hw *hw, int port);
 static void genesis_link_up(struct skge_port *skge);
 static void skge_set_multicast(struct net_device *dev);
+static irqreturn_t skge_intr(int irq, void *dev_id);
 
 /* Avoid conditionals by using array */
 static const int txqaddr[] = { Q_XA1, Q_XA2 };
@@ -2572,18 +2573,26 @@ static int skge_up(struct net_device *dev)
 	if (netif_msg_ifup(skge))
 		printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
 
+	err = request_irq(dev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
+	if (err) {
+		dev_err(&hw->pdev->dev, "%s: cannot assign irq %d\n",
+			dev->name, dev->irq);
+		return err;
+	}
+
 	if (dev->mtu > RX_BUF_SIZE)
 		skge->rx_buf_size = dev->mtu + ETH_HLEN;
 	else
 		skge->rx_buf_size = RX_BUF_SIZE;
 
-
 	rx_size = skge->rx_ring.count * sizeof(struct skge_rx_desc);
 	tx_size = skge->tx_ring.count * sizeof(struct skge_tx_desc);
 	skge->mem_size = tx_size + rx_size;
 	skge->mem = pci_alloc_consistent(hw->pdev, skge->mem_size, &skge->dma);
-	if (!skge->mem)
-		return -ENOMEM;
+	if (!skge->mem) {
+		err = -ENOMEM;
+		goto free_irq;
+	}
 
 	BUG_ON(skge->dma & 7);
 
@@ -2646,6 +2655,8 @@ static int skge_up(struct net_device *dev)
  free_pci_mem:
 	pci_free_consistent(hw->pdev, skge->mem_size, skge->mem, skge->dma);
 	skge->mem = NULL;
+ free_irq:
+	free_irq(dev->irq, hw);
 
 	return err;
 }
@@ -2733,6 +2744,7 @@ static int skge_down(struct net_device *dev)
 	kfree(skge->tx_ring.start);
 	pci_free_consistent(hw->pdev, skge->mem_size, skge->mem, skge->dma);
 	skge->mem = NULL;
+	free_irq(dev->irq, hw);
 	return 0;
 }
 
@@ -3974,12 +3986,6 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 		goto err_out_free_netdev;
 	}
 
-	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
-	if (err) {
-		dev_err(&pdev->dev, "%s: cannot assign irq %d\n",
-		       dev->name, pdev->irq);
-		goto err_out_unregister;
-	}
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {
@@ -3996,8 +4002,6 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 
 	return 0;
 
-err_out_unregister:
-	unregister_netdev(dev);
 err_out_free_netdev:
 	free_netdev(dev);
 err_out_led_off:
@@ -4041,7 +4045,6 @@ static void __devexit skge_remove(struct pci_dev *pdev)
 	skge_write16(hw, B0_LED, LED_STAT_OFF);
 	skge_write8(hw, B0_CTST, CS_RST_SET);
 
-	free_irq(pdev->irq, hw);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
 	if (dev1)


^ permalink raw reply related

* [PATCH] 8139cp: fix duplicate loglevel in module load message
From: Alan Jenkins @ 2009-09-22 14:05 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, Alexander Beregalov

This was introduced by b93d58 "8139*: convert printk() to pr_<foo>()":

[ 2256252443 ] <6>8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)

The "version" string is printed using pr_info(), so it doesn't need to
include a loglevel.
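
A small user-space sketch of why the marker shows up in the log. It
assumes the textual "<N>" loglevel prefix used by kernels of this era;
toy_printk() and toy_pr_info() are illustrative stand-ins, not the real
kernel functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KERN_INFO "<6>"  /* assumption: textual "<N>" level prefix */

/* Toy printk: consumes ONE leading "<N>" marker, copies the rest. */
static void toy_printk(char *out, size_t outlen, const char *msg)
{
    assert(out != NULL && outlen > 0 && msg != NULL);
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        msg += 3;
    snprintf(out, outlen, "%s", msg);
}

/* Toy pr_info for a preformatted string: prepend the level, then print. */
static void toy_pr_info(char *out, size_t outlen, const char *text)
{
    char buf[128];

    snprintf(buf, sizeof(buf), KERN_INFO "%s", text);
    toy_printk(out, outlen, buf);
}
```

Passing a string that already starts with KERN_INFO leaves a literal "<6>"
in the output, because only the first marker is consumed.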

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
CC: Alexander Beregalov <a.beregalov@gmail.com>
---
 drivers/net/8139cp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
index d0dbbf3..6841a9a 100644
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -87,7 +87,7 @@
 
 /* These identify the driver base version and may not be removed. */
 static char version[] =
-KERN_INFO DRV_NAME ": 10/100 PCI Ethernet driver v" DRV_VERSION " (" DRV_RELDATE ")\n";
+DRV_NAME ": 10/100 PCI Ethernet driver v" DRV_VERSION " (" DRV_RELDATE ")\n";
 
 MODULE_AUTHOR("Jeff Garzik <jgarzik@pobox.com>");
 MODULE_DESCRIPTION("RealTek RTL-8139C+ series 10/100 PCI Ethernet driver");
-- 
1.6.3.2

^ permalink raw reply related

* [PATCH] smsc95xx: fix transmission where ZLP is expected
From: Steve Glendinning @ 2009-09-22 14:00 UTC (permalink / raw)
  To: netdev; +Cc: Ian Saturley, David Miller, Vlad Lyalikov, Steve Glendinning

The usbnet framework assumes USB hardware doesn't handle zero length
packets, but SMSC LAN95xx requires these to be sent for correct
operation.

This patch fixes an easily reproducible tx lockup when sending a frame
that results in exactly 512 bytes in a USB transmission (e.g. a UDP
frame with 458 data bytes, due to IP headers and our USB headers).  It
adds an extra flag to usbnet for the hardware driver to indicate that
it can handle and requires the zero length packets.
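
The decision this patch changes can be sketched in isolation as below.
needs_pad_byte() is a hypothetical helper written for illustration; only
the FLAG_SEND_ZLP value and the modulo test are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_SEND_ZLP 0x0200  /* value taken from the patch */

/* True when usbnet would append the one-byte pad in place of a
 * zero-length packet. */
static bool needs_pad_byte(size_t length, size_t maxpacket,
                           unsigned int flags)
{
    assert(maxpacket != 0);

    /* Hardware that requires real ZLPs must not get the pad byte. */
    if (flags & FLAG_SEND_ZLP)
        return false;

    /* Transfer ends exactly on a packet boundary. */
    return (length % maxpacket) == 0;
}
```

For a 512-byte transfer with maxpacket 512, the helper returns true without
the flag and false with it.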

This patch should not affect other usbnet users.  Please also consider
it for -stable.

Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
---
 drivers/net/usb/smsc95xx.c |    2 +-
 drivers/net/usb/usbnet.c   |    2 +-
 include/linux/usb/usbnet.h |    1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 938fb35..6e9410f 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1227,7 +1227,7 @@ static const struct driver_info smsc95xx_info = {
 	.rx_fixup	= smsc95xx_rx_fixup,
 	.tx_fixup	= smsc95xx_tx_fixup,
 	.status		= smsc95xx_status,
-	.flags		= FLAG_ETHER,
+	.flags		= FLAG_ETHER | FLAG_SEND_ZLP,
 };
 
 static const struct usb_device_id products[] = {
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 24b36f7..ca5ca5a 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1049,7 +1049,7 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 	 * NOTE:  strictly conforming cdc-ether devices should expect
 	 * the ZLP here, but ignore the one-byte packet.
 	 */
-	if ((length % dev->maxpacket) == 0) {
+	if (!(info->flags & FLAG_SEND_ZLP) && (length % dev->maxpacket) == 0) {
 		urb->transfer_buffer_length++;
 		if (skb_tailroom(skb)) {
 			skb->data[skb->len] = 0;
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index bb69e25..f814730 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -89,6 +89,7 @@ struct driver_info {
 #define FLAG_FRAMING_AX 0x0040		/* AX88772/178 packets */
 #define FLAG_WLAN	0x0080		/* use "wlan%d" names */
 #define FLAG_AVOID_UNLINK_URBS 0x0100	/* don't unlink urbs at usbnet_stop() */
+#define FLAG_SEND_ZLP	0x0200		/* hw requires ZLPs are sent */
 
 
 	/* init device ... can sleep, or cause probe() failure */
-- 
1.6.2.5


^ permalink raw reply related

* Re: [PATCH 13/13] TProxy: use the interface primary IP address as a default value for --on-ip
From: Brian Haley @ 2009-09-22 14:17 UTC (permalink / raw)
  To: Balazs Scheidler; +Cc: netfilter-devel, netdev
In-Reply-To: <1253601509.6883.5.camel@bzorp.balabit>

Balazs Scheidler wrote:
> On Mon, 2009-09-21 at 14:00 -0400, Brian Haley wrote:
>> Balazs Scheidler wrote: 
>>>  #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>>> +
>>> +static inline const struct in6_addr *
>>> +tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr, const struct in6_addr *daddr)
>>> +{
>>> +	struct inet6_dev *indev;
>>> +	struct inet6_ifaddr *ifa;
>>> +	struct in6_addr *laddr;
>>> +	
>>> +        if (!ipv6_addr_any(user_laddr))
>>> +                return user_laddr;
>>> +	
>>> +        laddr = NULL;
>>> +        rcu_read_lock();
>>> +        indev = __in6_dev_get(skb->dev);
>>> +        if (indev && (ifa = indev->addr_list)) {
>>> +		laddr = &ifa->addr;
>>> +	}
>>> +        rcu_read_unlock();
>>> +        
>>> +        return laddr ? laddr : daddr;
>>> +}
>> You should call ipv6_dev_get_saddr() to get a source address based on the target
>> destination address.
> 
> Thanks for this hint, however this is not selecting a source address for
> a given destination, rather it selects the address where tproxy is
> redirecting the connection in case the user specified no --on-ip
> parameter.
> 
> e.g. 
> 
> ip6tables -A PREROUTING -p tcp --dport 80 -j TPROXY --on-port 50080
> 
> This should redirect the connection to the primary IP address of the
> incoming interface. In fact I spent 2 hours to figure out how to find
> the proper address, and at the end I used the first IP address
> configured to the interface, seeing that those addresses are sorted in
> 'scope' order, e.g. link-local and site-local addresses are at the end
> of the list, thus the front should be ok.

Yes, the addresses are sorted by scope, but just because they're in the
list doesn't mean they can be used, for example that address might have
failed DAD or be Deprecated.  ipv6_dev_get_saddr() will follow the rules
from RFC 3484 in picking the best address to use, or none if there isn't
anything appropriate.

> Since I'm not that much into IPv6, I'd appreciate some help, is
> ipv6_dev_get_saddr(client_ip_address) indeed the best solution here?

Probably.  An alternative might be to use ip6_dst_lookup() (see tcp_v6_connect()),
but that would be a lot more code for you.

-Brian

^ permalink raw reply

* Re: fanotify as syscalls
From: Davide Libenzi @ 2009-09-22 14:51 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Andreas Gruenbacher, Eric Paris, Linus Torvalds, Evgeniy Polyakov,
	David Miller, Linux Kernel Mailing List, linux-fsdevel, netdev,
	viro, alan, hch
In-Reply-To: <20090921231227.GJ14700@shareable.org>

On Tue, 22 Sep 2009, Jamie Lokier wrote:

> I don't mind at all if fanotify is replaced by a general purpose "take
> over the system call table" solution ...

That was not what I meant ;)
You'd register/unregister as syscall interceptor, receiving syscall number 
and parameters, you'd be able to return status/error codes directly, and 
you'd have the ability to eventually change the parameters. All this 
should be pretty trivial code, and at the same time give full syscall 
visibility to the modules.
The complexity would be left to the interceptors, as they already do it 
today.
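
To make the shape of that interface concrete, here is a purely
hypothetical user-space model. None of these names exist in the kernel;
the sketch only shows register/unregister, parameter rewriting, and
returning an error code directly.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INTERCEPTORS 4

struct sys_call { long nr; long args[6]; };

/* Nonzero return: stop and hand *result back to the caller.
 * Zero return: continue (the interceptor may have edited call->args). */
typedef int (*interceptor_fn)(struct sys_call *call, long *result);

static interceptor_fn chain[MAX_INTERCEPTORS];

static int interceptor_register(interceptor_fn fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < MAX_INTERCEPTORS; i++)
        if (!chain[i]) { chain[i] = fn; return 0; }
    return -1; /* no free slot */
}

static void interceptor_unregister(interceptor_fn fn)
{
    for (size_t i = 0; i < MAX_INTERCEPTORS; i++)
        if (chain[i] == fn) chain[i] = NULL;
}

static long dispatch(struct sys_call *call,
                     long (*real_call)(const struct sys_call *))
{
    long result;

    for (size_t i = 0; i < MAX_INTERCEPTORS; i++)
        if (chain[i] && chain[i](call, &result))
            return result;
    return real_call(call);
}

/* Example interceptors and a fake syscall body, for illustration only. */
static int deny_nr_2(struct sys_call *c, long *result)
{
    if (c->nr != 2) return 0;
    *result = -13; /* stands in for -EACCES */
    return 1;
}

static int cap_first_arg(struct sys_call *c, long *result)
{
    (void)result;
    if (c->args[0] > 100) c->args[0] = 100;
    return 0;
}

static long echo_first_arg(const struct sys_call *c) { return c->args[0]; }
```

All policy (whitelists, caches, userspace communication) would live inside
the interceptor functions; the dispatch loop stays trivial.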

> But I can't help noticing that we _already_ have quite well placed
> hooks for intercepting system calls, called security_this and
> security_that (SELinux etc), ...

That has "some" limits WRT non-GPL modules and relative static linkage.

> However, being a little kinder, I suspect even the anti-malware
> vendors would rather not slow down everything with race-prone
> complicated tracking of everything every process does...  which is why
> fanotify allows its "interest set" to be reduced from everything to a
> subset of files, and its results to be cached, and let the races be
> handled in the normal way by VFS.

They are already doing it today, since they are forced to literally find 
and hack the syscall table.
They already have things like process whitelists, path whitelists, scan 
caches, and all the whistles, in their code.
Of course, some of them might be interested in pushing that complexity 
inside the kernel, since they won't have to maintain it.
Some others, OTOH, might be interested in keeping a syscall-based access, 
since they already have working code based on that abstraction.
The good part of this would be that all the userspace communication API, 
whitelists, caches, etc...  would be left to the module implementors, and 
not pushed inside the kernel.
That, and the flexibility of being able to intercept all the userspace 
entrances into the kernel.

- Davide

^ permalink raw reply

* [PATCH] smsc95xx: add additional USB product IDs
From: Steve Glendinning @ 2009-09-22 15:13 UTC (permalink / raw)
  To: netdev; +Cc: Ian Saturley, David Miller, Vlad Lyalikov, Steve Glendinning

Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
---
 drivers/net/usb/smsc95xx.c |   65 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 938fb35..3aafebd 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1237,10 +1237,75 @@ static const struct usb_device_id products[] = {
 		.driver_info = (unsigned long) &smsc95xx_info,
 	},
 	{
+		/* SMSC9505 USB Ethernet Device */
+		USB_DEVICE(0x0424, 0x9505),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9500A USB Ethernet Device */
+		USB_DEVICE(0x0424, 0x9E00),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9505A USB Ethernet Device */
+		USB_DEVICE(0x0424, 0x9E01),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
 		/* SMSC9512/9514 USB Hub & Ethernet Device */
 		USB_DEVICE(0x0424, 0xec00),
 		.driver_info = (unsigned long) &smsc95xx_info,
 	},
+	{
+		/* SMSC9500 USB Ethernet Device (SAL10) */
+		USB_DEVICE(0x0424, 0x9900),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9505 USB Ethernet Device (SAL10) */
+		USB_DEVICE(0x0424, 0x9901),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9500A USB Ethernet Device (SAL10) */
+		USB_DEVICE(0x0424, 0x9902),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9505A USB Ethernet Device (SAL10) */
+		USB_DEVICE(0x0424, 0x9903),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9512/9514 USB Hub & Ethernet Device (SAL10) */
+		USB_DEVICE(0x0424, 0x9904),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9500A USB Ethernet Device (HAL) */
+		USB_DEVICE(0x0424, 0x9905),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9505A USB Ethernet Device (HAL) */
+		USB_DEVICE(0x0424, 0x9906),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9500 USB Ethernet Device (Alternate ID) */
+		USB_DEVICE(0x0424, 0x9907),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9500A USB Ethernet Device (Alternate ID) */
+		USB_DEVICE(0x0424, 0x9908),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC9512/9514 USB Hub & Ethernet Device (Alternate ID) */
+		USB_DEVICE(0x0424, 0x9909),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
 	{ },		/* END */
 };
 MODULE_DEVICE_TABLE(usb, products);
-- 
1.6.2.5


^ permalink raw reply related

* [PATCH] net: xilinx_emaclite: Fix problem with first incoming packet
From: John Linn @ 2009-09-22 15:24 UTC (permalink / raw)
  To: netdev, davem, linuxppc-dev, grant.likely, jwboyer,
	sadanand.mutyala
  Cc: Michal Simek

From: Michal Simek <monstr@monstr.eu>

Without this fix you can't ping the board or connect to it until the
board itself has sent a packet.

Tested-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Acked-by: John Linn <john.linn@xilinx.com>
---
 drivers/net/xilinx_emaclite.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xilinx_emaclite.c b/drivers/net/xilinx_emaclite.c
index dc22782..83a044d 100644
--- a/drivers/net/xilinx_emaclite.c
+++ b/drivers/net/xilinx_emaclite.c
@@ -134,18 +134,15 @@ static void xemaclite_enable_interrupts(struct net_local *drvdata)
 	}
 
 	/* Enable the Rx interrupts for the first buffer */
-	reg_data = in_be32(drvdata->base_addr + XEL_RSR_OFFSET);
 	out_be32(drvdata->base_addr + XEL_RSR_OFFSET,
-		 reg_data | XEL_RSR_RECV_IE_MASK);
+		 XEL_RSR_RECV_IE_MASK);
 
 	/* Enable the Rx interrupts for the second Buffer if
 	 * configured in HW */
 	if (drvdata->rx_ping_pong != 0) {
-		reg_data = in_be32(drvdata->base_addr + XEL_BUFFER_OFFSET +
-				   XEL_RSR_OFFSET);
 		out_be32(drvdata->base_addr + XEL_BUFFER_OFFSET +
 			 XEL_RSR_OFFSET,
-			 reg_data | XEL_RSR_RECV_IE_MASK);
+			 XEL_RSR_RECV_IE_MASK);
 	}
 
 	/* Enable the Global Interrupt Enable */
-- 
1.6.2.1






^ permalink raw reply related

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Ira W. Snyder @ 2009-09-22 15:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gregory Haskins, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <4AB89C48.4020903@redhat.com>

On Tue, Sep 22, 2009 at 12:43:36PM +0300, Avi Kivity wrote:
> On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
> >
> >> Sure, virtio-ira and he is on his own to make a bus-model under that, or
> >> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
> >> model can work, I agree.
> >>
> >>      
> > Yes, I'm having to create my own bus model, a-la lguest, virtio-pci, and
> > virtio-s390. It isn't especially easy. I can steal lots of code from the
> > lguest bus model, but sometimes it is good to generalize, especially
> > after the fourth implementation or so. I think this is what GHaskins tried
> > to do.
> >    
> 
> Yes.  vbus is more finely layered so there is less code duplication.
> 
> The virtio layering was more or less dictated by Xen which doesn't have 
> shared memory (it uses grant references instead).  As a matter of fact 
> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
> part is duplicated.  It's probably possible to add a virtio-shmem.ko 
> library that people who do have shared memory can reuse.
> 

Seems like a nice benefit of vbus.

> > I've given it some thought, and I think that running vhost-net (or
> > similar) on the ppc boards, with virtio-net on the x86 crate server will
> > work. The virtio-ring abstraction is almost good enough to work for this
> > situation, but I had to re-invent it to work with my boards.
> >
> > I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
> > Remember that this is the "host" system. I used each 4K block as a
> > "device descriptor" which contains:
> >
> > 1) the type of device, config space, etc. for virtio
> > 2) the "desc" table (virtio memory descriptors, see virtio-ring)
> > 3) the "avail" table (available entries in the desc table)
> >    
> 
> Won't access to this memory from x86 be slow?  (On the other hand, if you 
> change it to main memory, access from ppc will be slow...  it really 
> depends on how your system is tuned.)
> 

Writes across the bus are fast, reads across the bus are slow. These are
just the descriptor tables for memory buffers, not the physical memory
buffers themselves.

These only need to be written by the guest (x86), and read by the host
(ppc). The host never changes the tables, so we can cache a copy in the
guest, for a fast detach_buf() implementation (see virtio-ring, which
I'm copying the design from).

The only accesses are writes across the PCI bus. There is never a need
to do a read (except for slow-path configuration).
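
The write-only fast path with a cached copy can be sketched as follows.
This is a simplified user-space model, not my driver code: RING_SIZE and
the function names are made up for illustration, and struct desc is
modelled after the virtio-ring descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16  /* hypothetical ring size */

/* Modelled after the virtio-ring descriptor. */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct shadowed_ring {
    volatile struct desc *remote;  /* table in the host's BAR: only written */
    struct desc shadow[RING_SIZE]; /* guest-local copy: used for all reads */
};

/* Update the local copy, then push the same values across the bus. */
static void ring_write_desc(struct shadowed_ring *r, unsigned int idx,
                            struct desc d)
{
    assert(idx < RING_SIZE);
    r->shadow[idx] = d;
    r->remote[idx].addr = d.addr;
    r->remote[idx].len = d.len;
    r->remote[idx].flags = d.flags;
    r->remote[idx].next = d.next;
}

/* Reads (e.g. when detaching a buffer) never touch the remote table. */
static struct desc ring_read_desc(const struct shadowed_ring *r,
                                  unsigned int idx)
{
    assert(idx < RING_SIZE);
    return r->shadow[idx];
}
```

Because the host never modifies the tables, the shadow copy cannot go
stale, which is what makes the read-free fast path safe.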

> > Parts 2 and 3 are repeated three times, to allow for a maximum of three
> > virtqueues per device. This is good enough for all current drivers.
> >    
> 
> The plan is to switch to multiqueue soon.  Will not affect you if your 
> boards are uniprocessor or small smp.
> 

Everything I have is UP. I don't need extreme performance, either.
40MB/sec is the minimum I need to reach, though I'd like to have some
headroom.

For reference, using the CPU to handle data transfers, I get ~2MB/sec
transfers. Using the DMA engine, I've hit about 60MB/sec with my
"crossed-wires" virtio-net.

> > I've gotten plenty of email about this from lots of interested
> > developers. There are people who would like this kind of system to just
> > work, while having to write just some glue for their device, just like a
> > network driver. My hunch is that most people have created some proprietary mess
> > that basically works, and left it at that.
> >    
> 
> So long as you keep the system-dependent features hookable or 
> configurable, it should work.
> 
> > So, here is a desperate cry for help. I'd like to make this work, and
> > I'd really like to see it in mainline. I'm trying to give back to the
> > community from which I've taken plenty.
> >    
> 
> Not sure who you're crying for help to.  Once you get this working, post 
> patches.  If the patches are reasonably clean and don't impact 
> performance for the main use case, and if you can show the need, I 
> expect they'll be merged.
> 

In the spirit of "post early and often", I'm making my code available,
that's all. I'm asking anyone interested for some review, before I have
to re-code this for about the fifth time now. I'm trying to avoid
Haskins' situation, where he's invented and debugged a lot of new code,
and then been told to do it completely differently.

Yes, the code I posted is only compile-tested, because quite a lot of
code (kernel and userspace) must be working before anything works at
all. I hate to design the whole thing, then be told that something
fundamental about it is wrong, and have to completely re-write it.

Thanks for the comments,
Ira

> -- 
> error compiling committee.c: too many arguments to function
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: fanotify as syscalls
From: Andreas Gruenbacher @ 2009-09-22 15:31 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Jamie Lokier, Eric Paris, Linus Torvalds, Evgeniy Polyakov,
	David Miller, Linux Kernel Mailing List, linux-fsdevel, netdev,
	viro, alan, hch
In-Reply-To: <alpine.DEB.2.00.0909211816020.1116@makko.or.mcafeemobile.com>

On Tuesday, 22 September 2009 16:51:39 Davide Libenzi wrote:
> On Tue, 22 Sep 2009, Jamie Lokier wrote:
> > I don't mind at all if fanotify is replaced by a general purpose "take
> > over the system call table" solution ...
>
> That was not what I meant ;)
> You'd register/unregister as syscall interceptor, receiving syscall number
> and parameters, you'd be able to return status/error codes directly, and
> you'd have the ability to eventually change the parameters. All this
> should be pretty trivial code, and at the same time give full syscall
> visibility to the modules.

The fatal flaw of syscall interception is race conditions: you look up a 
pathname in your interception layer; then when you call into the proper 
syscall, the kernel again looks up the same pathname. There is no way to 
guarantee that you end up at the same object in both lookups. The security 
and fsnotify hooks are placed in the appropriate spots to avoid exactly that.
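
The double-lookup problem can be reproduced from userspace. The sketch
below is only an illustration (the function name and the temporary paths
are made up for it): it resolves a pathname once, as an interception
layer would, replaces the file behind that name, and resolves it again,
as the real syscall would. The two lookups end up at different inodes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if two lookups of the same pathname reached different
 * objects, 0 if they reached the same one, -1 on error.
 */
int lookup_race_demo(void)
{
	char dir[] = "/tmp/toctou-XXXXXX";
	char victim[64], other[64];
	struct stat first, second;
	int fd, ret = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(victim, sizeof(victim), "%s/victim", dir);
	snprintf(other, sizeof(other), "%s/other", dir);

	fd = open(victim, O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		goto out;
	close(fd);
	fd = open(other, O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		goto out;
	close(fd);

	/* Lookup 1: what the interception layer would inspect. */
	if (stat(victim, &first))
		goto out;
	/* What a racing task could do between the two lookups. */
	if (rename(other, victim))
		goto out;
	/* Lookup 2: what the real syscall would then operate on. */
	if (stat(victim, &second))
		goto out;

	ret = first.st_dev != second.st_dev || first.st_ino != second.st_ino;
out:
	unlink(victim);
	unlink(other);
	rmdir(dir);
	return ret;
}
```

Here the swap is done deliberately between the two calls; in the real
race it is another task that wins the window.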

Andreas

^ permalink raw reply

* Re: igb VF allocation with quirk_i82576_sriov
From: Alexander Duyck @ 2009-09-22 15:41 UTC (permalink / raw)
  To: Chris Wright
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	Ronciak, John
In-Reply-To: <20090922051910.GC1035@sequoia.sous-sol.org>

Chris Wright wrote:
> Is this known to work?  During recent virt testing for upcoming Fedora 12,
> a box w/out SR-IOV support in BIOS was using quirk to create VF BAR space,
> VF allocation worked enough to assign a device to the guest, but igbvf
> was not actually functioning properly in the guest.
> 
> Is it worth debugging this further, or is it already a known issue?

You could be experiencing one of a couple different issues.

First when you say you started SR-IOV on a box w/out SR-IOV support I 
assume you are using "pci=assign-busses" in order to reserve the bus 
space for the VFs, is that correct?  Also while your system may not 
support SR-IOV does it at least support VT-d?  Without VT-d support you 
won't be able to assign a device to the guest.

My recommendations for further testing would be to test a VF on the host 
kernel to see if that works.  If it does then you could also try direct 
assigning an entire port to see if that works.  If the entire port 
doesn't work then you probably don't have VT-d enabled.

Thanks,

Alex


^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Avi Kivity @ 2009-09-22 15:56 UTC (permalink / raw)
  To: Ira W. Snyder
  Cc: Gregory Haskins, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <20090922152520.GA9154@ovro.caltech.edu>

On 09/22/2009 06:25 PM, Ira W. Snyder wrote:
>
>> Yes.  vbus is more finely layered so there is less code duplication.
>>
>> The virtio layering was more or less dictated by Xen which doesn't have
>> shared memory (it uses grant references instead).  As a matter of fact
>> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that
>> part is duplicated.  It's probably possible to add a virtio-shmem.ko
>> library that people who do have shared memory can reuse.
>>
>>      
> Seems like a nice benefit of vbus.
>    

Yes, it is.  With some work virtio can gain that too (virtio-shmem.ko).

>>> I've given it some thought, and I think that running vhost-net (or
>>> similar) on the ppc boards, with virtio-net on the x86 crate server will
>>> work. The virtio-ring abstraction is almost good enough to work for this
>>> situation, but I had to re-invent it to work with my boards.
>>>
>>> I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
>>> Remember that this is the "host" system. I used each 4K block as a
>>> "device descriptor" which contains:
>>>
>>> 1) the type of device, config space, etc. for virtio
>>> 2) the "desc" table (virtio memory descriptors, see virtio-ring)
>>> 3) the "avail" table (available entries in the desc table)
>>>
>>>        
>> Won't access from x86 to this memory be slow? (On the other hand, if you
>> change it to main memory, access from ppc will be slow... it really depends
>> on how your system is tuned.)
>>
>>      
> Writes across the bus are fast, reads across the bus are slow. These are
> just the descriptor tables for memory buffers, not the physical memory
> buffers themselves.
>
> These only need to be written by the guest (x86), and read by the host
> (ppc). The host never changes the tables, so we can cache a copy in the
> guest, for a fast detach_buf() implementation (see virtio-ring, which
> I'm copying the design from).
>
> The only accesses are writes across the PCI bus. There is never a need
> to do a read (except for slow-path configuration).
>    

Okay, sounds like what you're doing is optimal then.

> In the spirit of "post early and often", I'm making my code available,
> that's all. I'm asking anyone interested for some review, before I have
> to re-code this for about the fifth time now. I'm trying to avoid
> Haskins' situation, where he's invented and debugged a lot of new code,
> and then been told to do it completely differently.
>
> Yes, the code I posted is only compile-tested, because quite a lot of
> code (kernel and userspace) must be working before anything works at
> all. I hate to design the whole thing, then be told that something
> fundamental about it is wrong, and have to completely re-write it.
>    

Understood.  Best to get a review from Rusty then.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply

* [PATCH] netfilter: nf_nat_helper: tidy up adjust_tcp_sequence
From: Hannes Eder @ 2009-09-22 16:00 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, linux-kernel

The variable 'other_way' gets initialized but is not read afterwards,
so remove it.  Pass the right arguments to a pr_debug call.

While at it, tidy up a bit and fix this checkpatch warning:
  WARNING: suspect code indent for conditional statements

Signed-off-by: Hannes Eder <heder@google.com>

 net/ipv4/netfilter/nf_nat_helper.c |   21 ++++++++-------------
 1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_helper.c b/net/ipv4/netfilter/nf_nat_helper.c
index 09172a6..56daa1b 100644
--- a/net/ipv4/netfilter/nf_nat_helper.c
+++ b/net/ipv4/netfilter/nf_nat_helper.c
@@ -41,18 +41,13 @@ adjust_tcp_sequence(u32 seq,
 		    struct nf_conn *ct,
 		    enum ip_conntrack_info ctinfo)
 {
-	int dir;
-	struct nf_nat_seq *this_way, *other_way;
+	int dir = CTINFO2DIR(ctinfo);
 	struct nf_conn_nat *nat = nfct_nat(ct);
+	struct nf_nat_seq *this_way = &nat->seq[dir];
 
-	pr_debug("adjust_tcp_sequence: seq = %u, sizediff = %d\n", seq, seq);
-
-	dir = CTINFO2DIR(ctinfo);
-
-	this_way = &nat->seq[dir];
-	other_way = &nat->seq[!dir];
+	pr_debug("%s(): seq = %u, sizediff = %d\n", __func__, seq, sizediff);
 
-	pr_debug("nf_nat_resize_packet: Seq_offset before: ");
+	pr_debug("%s(): Seq_offset before: ", __func__);
 	DUMP_OFFSET(this_way);
 
 	spin_lock_bh(&nf_nat_seqofs_lock);
@@ -63,13 +58,13 @@ adjust_tcp_sequence(u32 seq,
 	 * retransmit */
 	if (this_way->offset_before == this_way->offset_after ||
 	    before(this_way->correction_pos, seq)) {
-		   this_way->correction_pos = seq;
-		   this_way->offset_before = this_way->offset_after;
-		   this_way->offset_after += sizediff;
+		this_way->correction_pos = seq;
+		this_way->offset_before = this_way->offset_after;
+		this_way->offset_after += sizediff;
 	}
 	spin_unlock_bh(&nf_nat_seqofs_lock);
 
-	pr_debug("nf_nat_resize_packet: Seq_offset after: ");
+	pr_debug("%s(): Seq_offset after: ", __func__);
 	DUMP_OFFSET(this_way);
 }
 


^ permalink raw reply related

* Re: fanotify as syscalls
From: Davide Libenzi @ 2009-09-22 16:04 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Jamie Lokier, Eric Paris, Linus Torvalds, Evgeniy Polyakov,
	David Miller, Linux Kernel Mailing List, linux-fsdevel, netdev,
	viro, alan, hch
In-Reply-To: <200909221731.34717.agruen@suse.de>

On Tue, 22 Sep 2009, Andreas Gruenbacher wrote:

> The fatal flaw of syscall interception is race conditions: you look up a 
> pathname in your interception layer; then when you call into the proper 
> syscall, the kernel again looks up the same pathname. There is no way to 
> guarantee that you end up at the same object in both lookups. The security 
> and fsnotify hooks are placed in the appropriate spots to avoid exactly that.

Fatal? You mean, for this corner case that the anti-malware industry has 
lived with for so long (in Linux and Windows), you're prepared to push 
all the logic that is currently implemented in their modules into the 
kernel?
This includes process whitelisting, path whitelisting, caches, userspace 
access API definition, and so on? On top of providing a generally more 
limited interception.
Why don't we instead offer a lower and broader level of interception, 
letting the users decide in their modules whether such a fatal flaw needs 
to be addressed or not?
They get a broader interception layer, with the option to decide whether 
or not to address certain scenarios, and we get less code inside the kernel.
A win/win situation, if you ask me.

- Davide

^ permalink raw reply

* Re: fanotify as syscalls
From: Eric Paris @ 2009-09-22 16:11 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Davide Libenzi, Jamie Lokier, Linus Torvalds, Evgeniy Polyakov,
	David Miller, Linux Kernel Mailing List, linux-fsdevel, netdev,
	viro, alan, hch
In-Reply-To: <200909221731.34717.agruen@suse.de>

On Tue, 2009-09-22 at 17:31 +0200, Andreas Gruenbacher wrote:
> On Tuesday, 22 September 2009 16:51:39 Davide Libenzi wrote:
> > On Tue, 22 Sep 2009, Jamie Lokier wrote:
> > > I don't mind at all if fanotify is replaced by a general purpose "take
> > > over the system call table" solution ...
> >
> > That was not what I meant ;)
> > You'd register/unregister as syscall interceptor, receiving syscall number
> > and parameters, you'd be able to return status/error codes directly, and
> > you'd have the ability to eventually change the parameters. All this
> > should be pretty trivial code, and at the same time give full syscall
> > visibility to the modules.
> 
> The fatal flaw of syscall interception is race conditions: 

That's not the fatal flaw.  The fatal flaw is that I am not going to
write 90% of a rootkit and make it easy to use.  Not going to happen.
There's a reason we went to the trouble to mark the syscall table RO, we
don't export it, and we don't want people playing with it.  It clearly
would have been the quickest, easiest, and fastest way to make
anti-virus companies happy, but it doesn't really solve a good problem
and it leaves all of us in a worse position than we are today.  Easy !=
Good.

-Eric 

^ permalink raw reply

* Re: fanotify as syscalls
From: Jamie Lokier @ 2009-09-22 16:27 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Davide Libenzi, Linus Torvalds,
	Evgeniy Polyakov, David Miller, Linux Kernel Mailing List,
	linux-fsdevel, netdev, viro, alan, hch
In-Reply-To: <1253635918.2747.5.camel@dhcp231-106.rdu.redhat.com>

Eric Paris wrote:
> That's not the fatal flaw.  The fatal flaw is that I am not going to
> write 90% of a rootkit and make it easy to use.

I hate to point out the obvious, but fanotify's ability to intercept
every file access and rewrite the file before the access proceeds is
also 90% of a rootkit...

But fortunately both fanotify and syscall rewriting require root in
the first place.

I think that makes the rootkit argument moot.  As long as fanotify
doesn't have a non-root flavour... which really would be handy for
rootkits :-)

> Easy != Good.

I agree.

-- Jamie

^ permalink raw reply

* Re: [PATCH] skge: request IRQ on activating the interface
From: Stephen Hemminger @ 2009-09-22 16:28 UTC (permalink / raw)
  To: Michal Schmidt; +Cc: netdev
In-Reply-To: <20090922120127.14242.71353.stgit@localhost.localdomain>

On Tue, 22 Sep 2009 14:01:31 +0200
Michal Schmidt <mschmidt@redhat.com> wrote:

> skge requests IRQ in its probe function. This causes a problem in
> the following real-life scenario with two different NICs in the machine:
> 
> 1. modprobe skge
>    The card is detected as eth0 and requests IRQ 17. Directory
>    /proc/irq/17/eth0 is created.
> 2. There is an udev rule which says this interface should be called
>    eth1, so udev renames eth0 -> eth1.
> 3. modprobe 8139too
>    The Realtek card is detected as eth0. It will be using IRQ 17 too.
> 4. ip link set eth0 up
>    Now 8139too requests IRQ 17.
> 
> The result is:
> WARNING: at fs/proc/generic.c:590 proc_register ...
> proc_dir_entry '17/eth0' already registered
> ...
> 
> And "ls /proc/irq/17" shows two subdirectories, both called eth0.
> 
> Fix it by requesting the IRQ in skge when the interface is activated.
> This works, because interfaces can be renamed only while they are down.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

No, because two-port cards have a single IRQ for both ports.
The choice of ethX in the IRQ name was made because irqbalance looks for it.
It is probably better to change skge/sky2 and other devices with the same
issue to use skge-N ... for request_irq, and teach irqbalance how to deal
with it.
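
To make the naming suggestion concrete, here is a toy userspace model
(not kernel code; irq_dir, irq_dir_add and skge_irq_name are invented
for this sketch). It treats the per-IRQ proc directory as a list of
names and shows that a name derived from the adapter instance, rather
than from the renamable interface, cannot collide with a later "eth0".

```c
#include <stdio.h>
#include <string.h>

#define MAX_ACTIONS 4
#define NAME_LEN    16

/* Toy stand-in for the entries under /proc/irq/<n>/. */
struct irq_dir {
	char names[MAX_ACTIONS][NAME_LEN];
	int count;
};

/* Returns 0 on success, -1 if the name is already present or no room. */
int irq_dir_add(struct irq_dir *d, const char *name)
{
	int i;

	for (i = 0; i < d->count; i++)
		if (strcmp(d->names[i], name) == 0)
			return -1; /* the "already registered" case */
	if (d->count == MAX_ACTIONS)
		return -1;
	snprintf(d->names[d->count++], NAME_LEN, "%s", name);
	return 0;
}

/* Hypothetical helper: name the IRQ after the adapter, not the netdev. */
void skge_irq_name(char *buf, size_t len, int adapter)
{
	snprintf(buf, len, "skge-%d", adapter);
}
```

In this model, registering "eth0" twice on the same IRQ fails, while
"skge-0" followed by "eth0" does not. As noted above, irqbalance would
still have to learn the new naming scheme.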

^ permalink raw reply

* Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM
From: Stephen Hemminger @ 2009-09-22 16:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Michael S. Tsirkin, Chris Wright, Rusty Russell, virtualization,
	Xin, Xiaohui, kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, hpa@zytor.com,
	mingo@elte.hu, akpm@linux-foundation.org
In-Reply-To: <200909221350.54847.arnd@arndb.de>

On Tue, 22 Sep 2009 13:50:54 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > > More importantly, when virtualization is used with multi-queue
> > > > NICs the virtio-net NIC is a single CPU bottleneck. The virtio-net
> > > > NIC should preserve the parallelism (lock free) using multiple
> > > > receive/transmit queues. The number of queues should equal the
> > > > number of CPUs.
> > > 
> > > Yup, multiqueue virtio is on todo list ;-)
> > > 
> > 
> > Note we'll need multiqueue tap for that to help.
> 
> My idea for that was to open multiple file descriptors to the same
> macvtap device and let the kernel figure out the right thing to
> do with that. You can do the same with raw packet sockets in case
> of vhost_net, but I wouldn't want to add more complexity to the
> tun/tap driver for this.
> 
> 	Arnd <><


Or get tap out of the way entirely. The packets should not have
to go out to user space at all (see veth).


^ permalink raw reply

* Re: [PATCH] net: xilinx_emaclite: Fix problem with first incoming packet
From: Grant Likely @ 2009-09-22 16:53 UTC (permalink / raw)
  To: John Linn; +Cc: Michal Simek, sadanand.mutyala, netdev, linuxppc-dev, davem
In-Reply-To: <fac40d47-5b19-4225-9fee-f7a058851fc0@SG2EHSMHS017.ehs.local>

On Tue, Sep 22, 2009 at 8:24 AM, John Linn <john.linn@xilinx.com> wrote:
> From: Michal Simek <monstr@monstr.eu>
>
> You can't ping the board or connect to it until you have sent
> a packet out from the board.
>
> Tested-by: John Williams <john.williams@petalogix.com>
> Signed-off-by: Michal Simek <monstr@monstr.eu>
> Acked-by: John Linn <john.linn@xilinx.com>

John, since this patch is being *sent* by you, you should use a
"Signed-off-by" tag instead, because it actually passed through your
hands.

Oh, and:
Acked-by: Grant Likely <grant.likely@secretlab.ca>

> ---
>  drivers/net/xilinx_emaclite.c |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/xilinx_emaclite.c b/drivers/net/xilinx_emaclite.c
> index dc22782..83a044d 100644
> --- a/drivers/net/xilinx_emaclite.c
> +++ b/drivers/net/xilinx_emaclite.c
> @@ -134,18 +134,15 @@ static void xemaclite_enable_interrupts(struct net_local *drvdata)
>        }
>
>        /* Enable the Rx interrupts for the first buffer */
> -       reg_data = in_be32(drvdata->base_addr + XEL_RSR_OFFSET);
>        out_be32(drvdata->base_addr + XEL_RSR_OFFSET,
> -                reg_data | XEL_RSR_RECV_IE_MASK);
> +                XEL_RSR_RECV_IE_MASK);
>
>        /* Enable the Rx interrupts for the second Buffer if
>         * configured in HW */
>        if (drvdata->rx_ping_pong != 0) {
> -               reg_data = in_be32(drvdata->base_addr + XEL_BUFFER_OFFSET +
> -                                  XEL_RSR_OFFSET);
>                out_be32(drvdata->base_addr + XEL_BUFFER_OFFSET +
>                         XEL_RSR_OFFSET,
> -                        reg_data | XEL_RSR_RECV_IE_MASK);
> +                        XEL_RSR_RECV_IE_MASK);
>        }
>
>        /* Enable the Global Interrupt Enable */
> --
> 1.6.2.1
>
>
>
> This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.
>
>
>



-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* RE: [PATCH] net: xilinx_emaclite: Fix problem with first incoming packet
From: John Linn @ 2009-09-22 16:56 UTC (permalink / raw)
  To: Grant Likely; +Cc: Michal Simek, linuxppc-dev, netdev, Sadanand Mutyala, davem
In-Reply-To: <fa686aa40909220953g708445d9s9c25bd839cc2dd2e@mail.gmail.com>

Thanks Grant, I wondered about that myself.

> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Grant Likely
> Sent: Tuesday, September 22, 2009 10:54 AM
> To: John Linn
> Cc: netdev@vger.kernel.org; davem@davemloft.net; linuxppc-dev@ozlabs.org; jwboyer@linux.vnet.ibm.com;
> Sadanand Mutyala; Michal Simek
> Subject: Re: [PATCH] net: xilinx_emaclite: Fix problem with first incoming packet
> 
> On Tue, Sep 22, 2009 at 8:24 AM, John Linn <john.linn@xilinx.com> wrote:
> > From: Michal Simek <monstr@monstr.eu>
> >
> > > You can't ping the board or connect to it until you have sent
> > > a packet out from the board.
> >
> > Tested-by: John Williams <john.williams@petalogix.com>
> > Signed-off-by: Michal Simek <monstr@monstr.eu>
> > Acked-by: John Linn <john.linn@xilinx.com>
> 
> > John, since this patch is being *sent* by you, you should use a
> > "Signed-off-by" tag instead, because it actually passed through your
> > hands.
> 
> Oh, and:
> Acked-by: Grant Likely <grant.likely@secretlab.ca>
> 
> > ---
> >  drivers/net/xilinx_emaclite.c |    7 ++-----
> >  1 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/xilinx_emaclite.c b/drivers/net/xilinx_emaclite.c
> > index dc22782..83a044d 100644
> > --- a/drivers/net/xilinx_emaclite.c
> > +++ b/drivers/net/xilinx_emaclite.c
> > @@ -134,18 +134,15 @@ static void xemaclite_enable_interrupts(struct net_local *drvdata)
> >        }
> >
> >        /* Enable the Rx interrupts for the first buffer */
> > -       reg_data = in_be32(drvdata->base_addr + XEL_RSR_OFFSET);
> >        out_be32(drvdata->base_addr + XEL_RSR_OFFSET,
> > -                reg_data | XEL_RSR_RECV_IE_MASK);
> > +                XEL_RSR_RECV_IE_MASK);
> >
> >        /* Enable the Rx interrupts for the second Buffer if
> >         * configured in HW */
> >        if (drvdata->rx_ping_pong != 0) {
> > -               reg_data = in_be32(drvdata->base_addr + XEL_BUFFER_OFFSET +
> > -                                  XEL_RSR_OFFSET);
> >                out_be32(drvdata->base_addr + XEL_BUFFER_OFFSET +
> >                         XEL_RSR_OFFSET,
> > -                        reg_data | XEL_RSR_RECV_IE_MASK);
> > +                        XEL_RSR_RECV_IE_MASK);
> >        }
> >
> >        /* Enable the Global Interrupt Enable */
> > --
> > 1.6.2.1
> >
> >
> >
> >
> >
> >
> 
> 
> 
> --
> Grant Likely, B.Sc., P.Eng.
> Secret Lab Technologies Ltd.



^ permalink raw reply
