Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Ben Hutchings @ 2012-09-06 23:39 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Richard Cochran, Kirsher, Jeffrey T, davem@davemloft.net,
	Vick, Matthew, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB580785FA77@ORSMSX105.amr.corp.intel.com>

On Thu, 2012-09-06 at 23:24 +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > Sent: Thursday, September 06, 2012 4:23 PM
> > To: Keller, Jacob E; Richard Cochran
> > Cc: Kirsher, Jeffrey T; davem@davemloft.net; Vick, Matthew;
> > netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com
> > Subject: RE: [net-next 4/6] igb: Store the MAC address in the name in the
> > PTP struct.
> > 
> > On Thu, 2012-09-06 at 22:59 +0000, Keller, Jacob E wrote:
> > > > -----Original Message-----
> > > > From: netdev-owner@vger.kernel.org
> > > > [mailto:netdev-owner@vger.kernel.org]
> > > > On Behalf Of Richard Cochran
> > > > Sent: Thursday, September 06, 2012 1:06 AM
> > > > To: Kirsher, Jeffrey T
> > > > Cc: davem@davemloft.net; Vick, Matthew; netdev@vger.kernel.org;
> > > > gospo@redhat.com; sassmann@redhat.com
> > > > Subject: Re: [net-next 4/6] igb: Store the MAC address in the name
> > > > in the PTP struct.
> > > >
> > > > On Wed, Sep 05, 2012 at 04:35:04PM -0700, Jeff Kirsher wrote:
> > > > > From: Matthew Vick <matthew.vick@intel.com>
> > > > >
> > > > > Change the name of the adapter in the PTP struct to enable easier
> > > > > correlation between interface and PTP device.
> > > >
> > > > Still think it is better to have the driver name, not the MAC address.
> > > > The correlation is clear by using the ethtool method.
> > > >
> > >
> > > I found a method to correct the init code so that the device name can
> > > appear here, and I would prefer that over either the driver name or
> > > MAC address. It requires moving the ptp initialization into the netdev
> > > open handler though.
> > 
> > I don't think you should use the net device name for this, as net devices
> > can be renamed after creation.  (Though you could update the struct
> > ptp_clock_info whenever this happens.)
> > 
> > I'm starting to wonder what use there is for a 'clock name' distinct from
> > the clock device name.  What would be more useful is to give it a parent.
> > Richard, is there any reason why ptp_clock_register() doesn't allow
> > specifying the parent device?  You can then make the clock_name attribute
> > contain the parent device's driver name or whatever it is you prefer.
> > 
> > Ben.
> 
> Because ideally the ptp device belongs to multiple ethernet devices...
> 
> Sadly current intel hardware doesn't support this.

The PTP clock is instantiated by some driver, bound to some device,
whether that's a PCI device or a platform device or something else.  I
think that device (not a net device) should be the parent of the clock
device.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Keller, Jacob E @ 2012-09-06 23:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Richard Cochran, Kirsher, Jeffrey T, davem@davemloft.net,
	Vick, Matthew, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <1346974770.2714.64.camel@bwh-desktop.uk.solarflarecom.com>



> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Thursday, September 06, 2012 4:40 PM
> To: Keller, Jacob E
> Cc: Richard Cochran; Kirsher, Jeffrey T; davem@davemloft.net; Vick,
> Matthew; netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com
> Subject: RE: [net-next 4/6] igb: Store the MAC address in the name in the
> PTP struct.
> 
> On Thu, 2012-09-06 at 23:24 +0000, Keller, Jacob E wrote:
> > > -----Original Message-----
> > > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > > Sent: Thursday, September 06, 2012 4:23 PM
> > > To: Keller, Jacob E; Richard Cochran
> > > Cc: Kirsher, Jeffrey T; davem@davemloft.net; Vick, Matthew;
> > > netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com
> > > Subject: RE: [net-next 4/6] igb: Store the MAC address in the name
> > > in the PTP struct.
> > >
> > > On Thu, 2012-09-06 at 22:59 +0000, Keller, Jacob E wrote:
> > > > > -----Original Message-----
> > > > > From: netdev-owner@vger.kernel.org
> > > > > [mailto:netdev-owner@vger.kernel.org]
> > > > > On Behalf Of Richard Cochran
> > > > > Sent: Thursday, September 06, 2012 1:06 AM
> > > > > To: Kirsher, Jeffrey T
> > > > > Cc: davem@davemloft.net; Vick, Matthew; netdev@vger.kernel.org;
> > > > > gospo@redhat.com; sassmann@redhat.com
> > > > > Subject: Re: [net-next 4/6] igb: Store the MAC address in the
> > > > > name in the PTP struct.
> > > > >
> > > > > On Wed, Sep 05, 2012 at 04:35:04PM -0700, Jeff Kirsher wrote:
> > > > > > From: Matthew Vick <matthew.vick@intel.com>
> > > > > >
> > > > > > Change the name of the adapter in the PTP struct to enable
> > > > > > easier correlation between interface and PTP device.
> > > > >
> > > > > Still think it is better to have the driver name, not the MAC
> address.
> > > > > The correlation is clear by using the ethtool method.
> > > > >
> > > >
> > > > I found a method to correct the init code so that the device name
> > > > can appear here, and I would prefer that over either the driver
> > > > name or MAC address. It requires moving the ptp initialization
> > > > into the netdev open handler though.
> > >
> > > I don't think you should use the net device name for this, as net
> > > devices can be renamed after creation.  (Though you could update the
> > > struct ptp_clock_info whenever this happens.)
> > >
> > > I'm starting to wonder what use there is for a 'clock name' distinct
> > > from the clock device name.  What would be more useful is to give it a
> parent.
> > > Richard, is there any reason why ptp_clock_register() doesn't allow
> > > specifying the parent device?  You can then make the clock_name
> > > attribute contain the parent device's driver name or whatever it is
> you prefer.
> > >
> > > Ben.
> >
> > Because ideally the ptp device belongs to multiple ethernet devices...
> >
> > Sadly current intel hardware doesn't support this.
> 
> The PTP clock is instantiated by some driver, bound to some device,
> whether that's a PCI device or a platform device or something else.  I
> think that device (not a net device) should be the parent of the clock
> device.
> 

Agreed, that would make the most sense.

> Ben.
> 
> --
> Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer;
> that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCHv3] virtio-spec: virtio network device multiqueue support
From: Sasha Levin @ 2012-09-07  0:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, kvm, virtualization
In-Reply-To: <20120906120828.GA1534@redhat.com>

Hi Michael,

On 09/06/2012 02:08 PM, Michael S. Tsirkin wrote:
> Add multiqueue support to virtio network device.
> Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new
> configuration field max_virtqueue_pairs to detect supported number of
> virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program
> packet steering.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Some comments about the change:

 - "The following four read-only fields only exists if VIRTIO_NET_F_MULTIQUEUE
is set." => Should be "exist" (I think).

 - "When rule is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are
steered by driver to the first (param+1) multiqueue virtqueues
transmitq1...transmitqN;" - Why param+1?  I thought we ignore the default
transmit/receive in this case.

 - "As selecting a specific steering ais n optimization feature" - "is an".

 - It's mentioned several times that the ability to read the steering rule from
the virtio-net config is there for debug reasons. Is it really necessary? I
think it's the first time I see debug features go in as part of the spec.

 - I'm slightly confused, why are there both receive and transmit steering? I
can't find a difference in the way to configure the rule for transmit and
receive. Is it a plan for the future to allow different rules for tx and rx? If
so, shouldn't we use different ctrl commands (
VIRTIO_NET_CTRL_TX_STEERING/VIRTIO_NET_CTRL_RX_STEERING)?

 - "When rule is set to VIRTIO_NET_CTRL_STEERING_SINGLE all packets are steered
to the default virtqueue receveq (0);" - "receiveq (0)"



Thanks,
Sasha

^ permalink raw reply

* Re: [PATCHv3] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-09-07  0:26 UTC (permalink / raw)
  To: Sasha Levin; +Cc: netdev, kvm, virtualization
In-Reply-To: <50493D04.1090408@gmail.com>

On Fri, Sep 07, 2012 at 02:17:08AM +0200, Sasha Levin wrote:
> Hi Michael,
> 
> On 09/06/2012 02:08 PM, Michael S. Tsirkin wrote:
> > Add multiqueue support to virtio network device.
> > Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new
> > configuration field max_virtqueue_pairs to detect supported number of
> > virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program
> > packet steering.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Some comments about the change:
> 
>  - "The following four read-only fields only exists if VIRTIO_NET_F_MULTIQUEUE
> is set." => Should be "exist" (I think).
> 
>  - "When rule is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are
> steered by driver to the first (param+1) multiqueue virtqueues
> transmitq1...transmitqN;" - Why param+1?  I thought we ignore the default
> transmit/receive in this case.

This is so that all values are valid. E.g. param=0 -> use q1.

> 
>  - "As selecting a specific steering ais n optimization feature" - "is an".
> 
>  - It's mentioned several times that the ability to read the steering rule from
> the virtio-net config is there for debug reasons. Is it really necessary? I
> think it's the first time I see debug features go in as part of the spec.

I actually think we need to work on more debug features.

>  - I'm slightly confused, why are there both receive and transmit steering? I
> can't find a difference in the way to configure the rule for transmit and
> receive. Is it a plan for the future to allow different rules for tx and rx? If
> so, shouldn't we use different ctrl commands (
> VIRTIO_NET_CTRL_TX_STEERING/VIRTIO_NET_CTRL_RX_STEERING)?

No. Receive steering is done by device. Documented for driver writer's
benefit. xmit streering by driver. documented for device benefit.
They must match for tcp to work well so I doubt we will have
commands to control them separately.

>  - "When rule is set to VIRTIO_NET_CTRL_STEERING_SINGLE all packets are steered
> to the default virtqueue receveq (0);" - "receiveq (0)"
> 
> 
> 
> Thanks,
> Sasha


Thanks for the comments will address!

^ permalink raw reply

* [PATCH] ip autoconfig: allow specifying a list of devices rather than one or all
From: Chris Friesen @ 2012-09-07  0:31 UTC (permalink / raw)
  To: netdev
In-Reply-To: <50492547.8070307@genband.com>

On 09/06/2012 04:35 PM, Chris Friesen wrote:

> We need to be able to boot off of both the gigE ports for redundancy, so 
> it seems like it might be a reasonable idea to extend the "ip=" boot arg 
> to allow specifying a list of devices rather than choosing between a 
> single device or all of them.
> 
> Thoughts? Anyone ever done this?

Just for kicks I tried coding it up.  This seems to work, so I include it
as fuel for discussion.  This applies to linus' current git.

Chris




Allow the user to specify up to four device names as part of IP
autoconfiguration.  This allows a degree of redundancy while also
allowing the user to limit which devices are opened by the kernel
during boot (this may be useful if some devices are very high traffic
or need special configuration).


Signed-off-by: Chris Friesen <chris.friesen@genband.com>


diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 67e8a6b..4d24ad4 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -157,7 +157,9 @@ static u8 ic_domain[64];		/* DNS (not NIS) domain name */
  */
 
 /* Name of user-selected boot device */
-static char user_dev_name[IFNAMSIZ] __initdata = { 0, };
+#define NUM_NAMED_BOOTDEV 4
+static int user_dev_count __initdata;
+static char user_dev_name[NUM_NAMED_BOOTDEV][IFNAMSIZ] __initdata;
 
 /* Protocols supported by available interfaces */
 static int ic_proto_have_if __initdata = 0;
@@ -189,16 +191,27 @@ struct ic_device {
 static struct ic_device *ic_first_dev __initdata = NULL;/* List of open device */
 static struct net_device *ic_dev __initdata = NULL;	/* Selected device */
 
+static int __init dev_in_userdevlist(struct net_device *dev)
+{
+	int i;
+	for(i=0;i<NUM_NAMED_BOOTDEV;i++) {
+		if (!strcmp(dev->name, user_dev_name[i]))
+			return 1;
+	}
+	return 0;
+}
+
 static bool __init ic_is_init_dev(struct net_device *dev)
 {
 	if (dev->flags & IFF_LOOPBACK)
 		return false;
-	return user_dev_name[0] ? !strcmp(dev->name, user_dev_name) :
+	return user_dev_count ? dev_in_userdevlist(dev) :
 	    (!(dev->flags & IFF_LOOPBACK) &&
 	     (dev->flags & (IFF_POINTOPOINT|IFF_BROADCAST)) &&
 	     strncmp(dev->name, "dummy", 5));
 }
 
+
 static int __init ic_open_devs(void)
 {
 	struct ic_device *d, **last;
@@ -274,9 +287,8 @@ have_carrier:
 	*last = NULL;
 
 	if (!ic_first_dev) {
-		if (user_dev_name[0])
-			pr_err("IP-Config: Device `%s' not found\n",
-			       user_dev_name);
+		if (user_dev_count)
+			pr_err("IP-Config: Specified boot device(s) not found\n");
 		else
 			pr_err("IP-Config: No network devices available\n");
 		return -ENODEV;
@@ -1605,7 +1617,16 @@ static int __init ip_auto_config_setup(char *addrs)
 				ic_host_name_set = 1;
 				break;
 			case 5:
-				strlcpy(user_dev_name, ip, sizeof(user_dev_name));
+				do {
+					dp = strchr(ip, ',');
+					if (dp)
+						*dp++ = '\0';
+					strlcpy(user_dev_name[user_dev_count], ip,
+						sizeof(user_dev_name[0]));
+					user_dev_count++;
+					if (dp)
+						ip = dp;
+				} while (dp && (user_dev_count < NUM_NAMED_BOOTDEV));
 				break;
 			case 6:
 				if (ic_proto_name(ip) == 0 &&

^ permalink raw reply related

* Re: [PATCH v2 2/6] net: pxaficp_ir: add irq resources
From: Eric Miao @ 2012-09-07  3:54 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-arm-kernel, Russell King, Arnd Bergmann, Olof Johansson,
	Linus Walleij, Rob Herring, Samuel Ortiz, Haojian Zhuang, netdev
In-Reply-To: <1346960563-18689-3-git-send-email-robherring2@gmail.com>

On Fri, Sep 7, 2012 at 3:42 AM, Rob Herring <robherring2@gmail.com> wrote:
> From: Rob Herring <rob.herring@calxeda.com>
>
> In order to remove dependency on mach/irqs.h, add platform device
> resources for irqs.
>

Looks good to me, we shall have this change landed long before this, thanks Rob

Acked-by: Eric Miao <eric.y.miao@gmail.com>

> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> Cc: Samuel Ortiz <samuel@sortiz.org>
> Cc: Eric Miao <eric.y.miao@gmail.com>
> Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
> Cc: netdev@vger.kernel.org
> ---
>  arch/arm/mach-pxa/devices.c   |   15 +++++++++++++++
>  drivers/net/irda/pxaficp_ir.c |   28 +++++++++++++++++-----------
>  2 files changed, 32 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/mach-pxa/devices.c b/arch/arm/mach-pxa/devices.c
> index 166eee5..339eb2a 100644
> --- a/arch/arm/mach-pxa/devices.c
> +++ b/arch/arm/mach-pxa/devices.c
> @@ -384,9 +384,24 @@ struct platform_device pxa_device_asoc_platform = {
>
>  static u64 pxaficp_dmamask = ~(u32)0;
>
> +static struct resource pxa_ir_resources[] = {
> +       [0] = {
> +               .start  = IRQ_STUART,
> +               .end    = IRQ_STUART,
> +               .flags  = IORESOURCE_IRQ,
> +       },
> +       [1] = {
> +               .start  = IRQ_ICP,
> +               .end    = IRQ_ICP,
> +               .flags  = IORESOURCE_IRQ,
> +       },
> +};
> +
>  struct platform_device pxa_device_ficp = {
>         .name           = "pxa2xx-ir",
>         .id             = -1,
> +       .num_resources  = ARRAY_SIZE(pxa_ir_resources),
> +       .resource       = pxa_ir_resources,
>         .dev            = {
>                 .dma_mask = &pxaficp_dmamask,
>                 .coherent_dma_mask = 0xffffffff,
> diff --git a/drivers/net/irda/pxaficp_ir.c b/drivers/net/irda/pxaficp_ir.c
> index 8d54767..cb0a5d3 100644
> --- a/drivers/net/irda/pxaficp_ir.c
> +++ b/drivers/net/irda/pxaficp_ir.c
> @@ -29,8 +29,8 @@
>
>  #include <mach/dma.h>
>  #include <mach/irda.h>
> -#include <mach/regs-uart.h>
>  #include <mach/regs-ost.h>
> +#include <mach/regs-uart.h>
>
>  #define FICP           __REG(0x40800000)  /* Start of FICP area */
>  #define ICCR0          __REG(0x40800000)  /* ICP Control Register 0 */
> @@ -112,6 +112,9 @@ struct pxa_irda {
>         int                     txdma;
>         int                     rxdma;
>
> +       int                     uart_irq;
> +       int                     icp_irq;
> +
>         struct irlap_cb         *irlap;
>         struct qos_info         qos;
>
> @@ -672,19 +675,19 @@ static int pxa_irda_start(struct net_device *dev)
>
>         si->speed = 9600;
>
> -       err = request_irq(IRQ_STUART, pxa_irda_sir_irq, 0, dev->name, dev);
> +       err = request_irq(si->uart_irq, pxa_irda_sir_irq, 0, dev->name, dev);
>         if (err)
>                 goto err_irq1;
>
> -       err = request_irq(IRQ_ICP, pxa_irda_fir_irq, 0, dev->name, dev);
> +       err = request_irq(si->icp_irq, pxa_irda_fir_irq, 0, dev->name, dev);
>         if (err)
>                 goto err_irq2;
>
>         /*
>          * The interrupt must remain disabled for now.
>          */
> -       disable_irq(IRQ_STUART);
> -       disable_irq(IRQ_ICP);
> +       disable_irq(si->uart_irq);
> +       disable_irq(si->icp_irq);
>
>         err = -EBUSY;
>         si->rxdma = pxa_request_dma("FICP_RX",DMA_PRIO_LOW, pxa_irda_fir_dma_rx_irq, dev);
> @@ -720,8 +723,8 @@ static int pxa_irda_start(struct net_device *dev)
>         /*
>          * Now enable the interrupt and start the queue
>          */
> -       enable_irq(IRQ_STUART);
> -       enable_irq(IRQ_ICP);
> +       enable_irq(si->uart_irq);
> +       enable_irq(si->icp_irq);
>         netif_start_queue(dev);
>
>         printk(KERN_DEBUG "pxa_ir: irda driver opened\n");
> @@ -738,9 +741,9 @@ err_dma_rx_buff:
>  err_tx_dma:
>         pxa_free_dma(si->rxdma);
>  err_rx_dma:
> -       free_irq(IRQ_ICP, dev);
> +       free_irq(si->icp_irq, dev);
>  err_irq2:
> -       free_irq(IRQ_STUART, dev);
> +       free_irq(si->uart_irq, dev);
>  err_irq1:
>
>         return err;
> @@ -760,8 +763,8 @@ static int pxa_irda_stop(struct net_device *dev)
>                 si->irlap = NULL;
>         }
>
> -       free_irq(IRQ_STUART, dev);
> -       free_irq(IRQ_ICP, dev);
> +       free_irq(si->uart_irq, dev);
> +       free_irq(si->icp_irq, dev);
>
>         pxa_free_dma(si->rxdma);
>         pxa_free_dma(si->txdma);
> @@ -851,6 +854,9 @@ static int pxa_irda_probe(struct platform_device *pdev)
>         si->dev = &pdev->dev;
>         si->pdata = pdev->dev.platform_data;
>
> +       si->uart_irq = platform_get_irq(pdev, 0);
> +       si->icp_irq = platform_get_irq(pdev, 1);
> +
>         si->sir_clk = clk_get(&pdev->dev, "UARTCLK");
>         si->fir_clk = clk_get(&pdev->dev, "FICPCLK");
>         if (IS_ERR(si->sir_clk) || IS_ERR(si->fir_clk)) {
> --
> 1.7.9.5
>

^ permalink raw reply

* Re: changing usbnet's API to better deal with cdc-ncm
From: Ming Lei @ 2012-09-07  3:55 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Bjørn Mork, alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1676538.CEPj2RtrPe-ugxBuEnWX9yG/4A2pS7c2Q@public.gmane.org>

On Fri, Sep 7, 2012 at 1:56 AM, Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org> wrote:
> On Friday 07 September 2012 00:09:13 Ming Lei wrote:
>> On Thu, Sep 6, 2012 at 4:30 PM, Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org> wrote:
>> > Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> >> Looks the introduced .tx_bundle is not necessary since .tx_fixup is OK.
>> >
>> > The minidriver does not have any information about tx in progress.  The
>>
>> Inside .tx_fixup, the low level driver will get the tx progress information.
>
> That information changes after tx_fixup

You mean the low level drive can't see if the later transmission
for the aggregated frame is successful?  If so, there is no any complete
notification to all low level drivers (with aggregation capability
or not) now, isn't there?

If no aggregated frame is ready for transmission, the information
can't be changed after current tx_fixup and before the next .tx_fixup.

>
>> > strategy above, which is what cdc_ncm uses today, is fine as long as you
>> > always want to wait as long as you always want to fill as many frames as
>> > possible in each packet.  But what if the queue is empty and the device
>>
>> For cdc_ncm, the wait time is controlled by cdc_ncm driver, see
>> cdc_ncm_tx_timeout_start().
>
> Well, that is the mistake. Using a timer is a bad idea.

Why is a bad idea?  Without a timer, how can usbnet or
low level driver aggregate the later coming transmit frames to
form a biger aggregated frame?

>
>> If we can abstract some common things about aggregation, it should be
>> meaningful. As far as I know, most of aggregation protocol is very different,
>> so almost all aggregation work is only done by low level driver, such as
>> cdc_ncm.
>>
>> If we want to implement some aggregation framework, maybe below is
>> one solution, correct me if it is wrong.
>
> It isn't so much wrong as incomplete.
>
> It seems to me we can have
>
> - can queue, buffer not full -> do nothing more

As I said above, without a timeout timer, the queued skb
might not be sent out even some long time passed if no
frame comes later.

So could you add the wait for more mechanism to your patch
for further discussion?

> - can queue, buffer full -> transmit
> - cannot queue, buffer full -> transmit and then try again to queue

This might not happen, the buffer full should have been observed
in last xmit path.

>
> and an error case
>
> - cannot queue, buffer not full



Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Increased multicast packet drops in 3.4
From: Shawn Bohrer @ 2012-09-07  4:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1346937667.2484.33.camel@edumazet-glaptop>

On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> kfree_skb() can free a list of skb, and we use a generic function to do
> so, without forwarding the drop/notdrop status. So its unfortunate, but
> adding extra parameters just for the sake of drop_monitor is not worth
> it.  skb_drop_fraglist() doesnt know if the parent skb is dropped or
> only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()

I understand that this means that dropwatch or the skb:kfree_skb
tracepoint won't know if the packet was really dropped, but do we
know in this case from the context of the stack trace?  I'm assuming
since we didn't receive an error that the packet was delivered and
these aren't real drops.

> Are you receiving fragmented UDP frames ?

I looked at the sending application and it yes it is possible it is
sending fragmented frames.

> I ask this because with latest kernels (linux-3.5), we should no longer
> build a list of skb, but a single skb with page fragments.
> 
> commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Sat May 19 03:02:20 2012 +0000
> 
>     ipv4: use skb coalescing in defragmentation
>     
>     ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
>     reducing memory used by them (truesize), and reducing number of cache
>     line misses and overhead for the consumer.
>     
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

I'll have to give 3.5 a try tomorrow and see if it has the same
problem.  After backporting all of your patches to convert kfree_skb()
to consume_skb() to 3.4 I actually don't have that many different
places hitting the skb:kfree_skb tracepoint at the time of the drop.
Here are some of the ones I have left that might be relevant.

          <idle>-0     [001] 11933.738797: kfree_skb:            skbaddr=0xffff8805ebcf9500 protocol=2048 location=0xffffffff81404e33
          <idle>-0     [001] 11933.738801: kernel_stack:         <stack trace>
=> ip_rcv (ffffffff81404e33)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

My IPSTATS_MIB_INHDRERRORS, IPSTATS_MIB_INDISCARDS, and
IPSTATS_MIB_INTRUNCATEDPKTS counters are all 0 so maybe this is from
skb->pkt_type == PACKET_OTHERHOST?

          <idle>-0     [001] 11933.937378: kfree_skb:            skbaddr=0xffff8805ebcf8c00 protocol=2048 location=0xffffffff81404660
          <idle>-0     [001] 11933.937385: kernel_stack:         <stack trace>
=> ip_rcv_finish (ffffffff81404660)
=> ip_rcv (ffffffff81404f61)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

I see two places here that I might be hitting that don't increment any
counters.  I can try instrumenting these to see which one I hit.

          <idle>-0     [003] 11932.454375: kfree_skb:            skbaddr=0xffff880584843700 protocol=4 location=0xffffffffa00492d4
          <idle>-0     [003] 11932.454382: kernel_stack:         <stack trace>
=> llc_rcv (ffffffffa00492d4)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

This is protocol=4 so I don't know if it is really relevant but then
again I don't know what this is.

          <idle>-0     [003] 11914.266635: kfree_skb:            skbaddr=0xffff880584843b00 protocol=2048 location=0xffffffff8143bfa8
          <idle>-0     [003] 11914.266641: kernel_stack:         <stack trace>
=> igmp_rcv (ffffffff8143bfa8)
=> ip_local_deliver_finish (ffffffff814049ed)
=> ip_local_deliver (ffffffff81404d1a)
=> ip_rcv_finish (ffffffff814046ad)
=> ip_rcv (ffffffff81404f61)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> mlx4_en_process_rx_cq (ffffffffa010a4fe)
=> mlx4_en_poll_rx_cq (ffffffffa010a9ef)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

Also don't know if this one is relevant.  This looks like an igmp
packet so probably not my drop, but I am receiving multicast packets
in this case so maybe it is somehow related.

If any of these spark any ideas let me know otherwise I'll keep
digging and try 3.5/3.6 tomorrow.

Thanks,
Shawn


-- 

---------------------------------------------------------------
This email, along with any attachments, is confidential. If you 
believe you received this message in error, please contact the 
sender immediately and delete all copies of the message.  
Thank you.

^ permalink raw reply

* [PATCH net-next] scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie.
From: Eric W. Biederman @ 2012-09-07  4:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Serge E. Hallyn, Eric Dumazet


Passing uids and gids on NETLINK_CB from a process in one user
namespace to a process in another user namespace can result in the
wrong uid or gid being presented to userspace.  Avoid that problem by
passing kuids and kgids instead.

- define struct scm_creds for use in scm_cookie and netlink_skb_parms
  that holds uid and gid information in kuid_t and kgid_t.

- Modify scm_set_cred to fill out scm_creds by heand instead of using
  cred_to_ucred to fill out struct ucred.  This conversion ensures
  userspace does not get incorrect uid or gid values to look at.

- Modify scm_recv to convert from struct scm_creds to struct ucred
  before copying credential values to userspace.

- Modify __scm_send to populate struct scm_creds on in the scm_cookie,
  instead of just copying struct ucred from userspace.

- Modify netlink_sendmsg to copy scm_creds instead of struct ucred
  into the NETLINK_CB.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/netlink.h  |    3 ++-
 include/net/scm.h        |   23 +++++++++++++++++++----
 net/core/scm.c           |   17 +++++++++++------
 net/netlink/af_netlink.c |    2 +-
 4 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index c9fdde2..df73cf4 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -153,6 +153,7 @@ struct nlattr {
 
 #include <linux/capability.h>
 #include <linux/skbuff.h>
+#include <net/scm.h>
 
 struct net;
 
@@ -162,7 +163,7 @@ static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)
 }
 
 struct netlink_skb_parms {
-	struct ucred		creds;		/* Skb credentials	*/
+	struct scm_creds	creds;		/* Skb credentials	*/
 	__u32			pid;
 	__u32			dst_group;
 	struct sock		*ssk;
diff --git a/include/net/scm.h b/include/net/scm.h
index 079d788..000b6f7 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -12,6 +12,12 @@
  */
 #define SCM_MAX_FD	253
 
+struct scm_creds {
+	u32	pid;
+	kuid_t	uid;
+	kgid_t	gid;
+};
+
 struct scm_fp_list {
 	short			count;
 	short			max;
@@ -22,7 +28,7 @@ struct scm_cookie {
 	struct pid		*pid;		/* Skb credentials */
 	const struct cred	*cred;
 	struct scm_fp_list	*fp;		/* Passed files		*/
-	struct ucred		creds;		/* Skb credentials	*/
+	struct scm_creds	creds;		/* Skb credentials	*/
 #ifdef CONFIG_SECURITY_NETWORK
 	u32			secid;		/* Passed security ID 	*/
 #endif
@@ -49,7 +55,9 @@ static __inline__ void scm_set_cred(struct scm_cookie *scm,
 {
 	scm->pid  = get_pid(pid);
 	scm->cred = cred ? get_cred(cred) : NULL;
-	cred_to_ucred(pid, cred, &scm->creds);
+	scm->creds.pid = pid_vnr(pid);
+	scm->creds.uid = cred ? cred->euid : INVALID_UID;
+	scm->creds.gid = cred ? cred->egid : INVALID_GID;
 }
 
 static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
@@ -110,8 +118,15 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 		return;
 	}
 
-	if (test_bit(SOCK_PASSCRED, &sock->flags))
-		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
+	if (test_bit(SOCK_PASSCRED, &sock->flags)) {
+		struct user_namespace *current_ns = current_user_ns();
+		struct ucred ucreds = {
+			.pid = scm->creds.pid,
+			.uid = from_kuid_munged(current_ns, scm->creds.uid),
+			.gid = from_kgid_munged(current_ns, scm->creds.gid),
+		};
+		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
+	}
 
 	scm_destroy_cred(scm);
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 5472ae7..64b41d8 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -155,19 +155,21 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			break;
 		case SCM_CREDENTIALS:
 		{
+			struct ucred creds;
 			kuid_t uid;
 			kgid_t gid;
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
 				goto error;
-			memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
-			err = scm_check_creds(&p->creds);
+			memcpy(&creds, CMSG_DATA(cmsg), sizeof(struct ucred));
+			err = scm_check_creds(&creds);
 			if (err)
 				goto error;
 
-			if (!p->pid || pid_vnr(p->pid) != p->creds.pid) {
+			p->creds.pid = creds.pid;
+			if (!p->pid || pid_vnr(p->pid) != creds.pid) {
 				struct pid *pid;
 				err = -ESRCH;
-				pid = find_get_pid(p->creds.pid);
+				pid = find_get_pid(creds.pid);
 				if (!pid)
 					goto error;
 				put_pid(p->pid);
@@ -175,11 +177,14 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			}
 
 			err = -EINVAL;
-			uid = make_kuid(current_user_ns(), p->creds.uid);
-			gid = make_kgid(current_user_ns(), p->creds.gid);
+			uid = make_kuid(current_user_ns(), creds.uid);
+			gid = make_kgid(current_user_ns(), creds.gid);
 			if (!uid_valid(uid) || !gid_valid(gid))
 				goto error;
 
+			p->creds.uid = uid;
+			p->creds.gid = gid;
+
 			if (!p->cred ||
 			    !uid_eq(p->cred->euid, uid) ||
 			    !gid_eq(p->cred->egid, gid)) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7cb7867..6473267 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1398,7 +1398,7 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
 
 	NETLINK_CB(skb).pid	= nlk->pid;
 	NETLINK_CB(skb).dst_group = dst_group;
-	memcpy(NETLINK_CREDS(skb), &siocb->scm->creds, sizeof(struct ucred));
+	NETLINK_CB(skb).creds	= siocb->scm->creds;
 
 	err = -EFAULT;
 	if (memcpy_fromiovec(skb_put(skb, len), msg->msg_iov, len)) {
-- 
1.7.5.4

^ permalink raw reply related

* Re: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Richard Cochran @ 2012-09-07  5:51 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Ben Hutchings, Kirsher, Jeffrey T, davem@davemloft.net,
	Vick, Matthew, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB580785FAB4@ORSMSX105.amr.corp.intel.com>

On Thu, Sep 06, 2012 at 11:40:23PM +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > 
> > The PTP clock is instantiated by some driver, bound to some device,
> > whether that's a PCI device or a platform device or something else.  I
> > think that device (not a net device) should be the parent of the clock
> > device.
> > 
> 
> Agreed, that would make the most sense.

Okay, makes sense, but that will require a change to the phc core.

For the present, I can buy Jacob's argument that the MAC address is a
way to clarify to the user that there are "unfortunate" multiple clock
instances.

Thanks,
Richard

^ permalink raw reply

* Re: Increased multicast packet drops in 3.4
From: Eric Dumazet @ 2012-09-07  6:08 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: netdev
In-Reply-To: <20120907040043.GA2714@BohrerMBP.rgmadvisors.com>

On Thu, 2012-09-06 at 23:00 -0500, Shawn Bohrer wrote:
> On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> > kfree_skb() can free a list of skb, and we use a generic function to do
> > so, without forwarding the drop/notdrop status. So its unfortunate, but
> > adding extra parameters just for the sake of drop_monitor is not worth
> > it.  skb_drop_fraglist() doesnt know if the parent skb is dropped or
> > only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()
> 
> I understand that this means that dropwatch or the skb:kfree_skb
> tracepoint won't know if the packet was really dropped, but do we
> know in this case from the context of the stack trace?  I'm assuming
> since we didn't receive an error that the packet was delivered and
> these aren't real drops.

I am starting to believe this is an application error.

This application uses recvmmsg() to fetch a lot of messages in one
syscall, and it might well be it throws out a batch of 50+ messages
because of an application bug. Yes, this starts with 3.4, but it can b
triggered by a timing difference or something that is not a proper
kernel bug...

> 
> > Are you receiving fragmented UDP frames ?
> 
> I looked at the sending application and it yes it is possible it is
> sending fragmented frames.
> 
> > I ask this because with latest kernels (linux-3.5), we should no longer
> > build a list of skb, but a single skb with page fragments.
> > 
> > commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Sat May 19 03:02:20 2012 +0000
> > 
> >     ipv4: use skb coalescing in defragmentation
> >     
> >     ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
> >     reducing memory used by them (truesize), and reducing number of cache
> >     line misses and overhead for the consumer.
> >     
> >     Signed-off-by: Eric Dumazet <edumazet@google.com>
> >     Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I'll have to give 3.5 a try tomorrow and see if it has the same
> problem.  After backporting all of your patches to convert kfree_skb()
> to consume_skb() to 3.4 I actually don't have that many different
> places hitting the skb:kfree_skb tracepoint at the time of the drop.
> Here are some of the ones I have left that might be relevant.
> 
>           <idle>-0     [001] 11933.738797: kfree_skb:            skbaddr=0xffff8805ebcf9500 protocol=2048 location=0xffffffff81404e33
>           <idle>-0     [001] 11933.738801: kernel_stack:         <stack trace>
> => ip_rcv (ffffffff81404e33)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> My IPSTATS_MIB_INHDRERRORS, IPSTATS_MIB_INDISCARDS, and
> IPSTATS_MIB_INTRUNCATEDPKTS counters are all 0 so maybe this is from
> skb->pkt_type == PACKET_OTHERHOST?
> 
>           <idle>-0     [001] 11933.937378: kfree_skb:            skbaddr=0xffff8805ebcf8c00 protocol=2048 location=0xffffffff81404660
>           <idle>-0     [001] 11933.937385: kernel_stack:         <stack trace>
> => ip_rcv_finish (ffffffff81404660)
> => ip_rcv (ffffffff81404f61)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> I see two places here that I might be hitting that don't increment any
> counters.  I can try instrumenting these to see which one I hit.
> 
>           <idle>-0     [003] 11932.454375: kfree_skb:            skbaddr=0xffff880584843700 protocol=4 location=0xffffffffa00492d4
>           <idle>-0     [003] 11932.454382: kernel_stack:         <stack trace>
> => llc_rcv (ffffffffa00492d4)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> This is protocol=4 so I don't know if it is really relevant but then
> again I don't know what this is.

You can ignore this

> 
>           <idle>-0     [003] 11914.266635: kfree_skb:            skbaddr=0xffff880584843b00 protocol=2048 location=0xffffffff8143bfa8
>           <idle>-0     [003] 11914.266641: kernel_stack:         <stack trace>
> => igmp_rcv (ffffffff8143bfa8)
> => ip_local_deliver_finish (ffffffff814049ed)
> => ip_local_deliver (ffffffff81404d1a)
> => ip_rcv_finish (ffffffff814046ad)
> => ip_rcv (ffffffff81404f61)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => mlx4_en_process_rx_cq (ffffffffa010a4fe)
> => mlx4_en_poll_rx_cq (ffffffffa010a9ef)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> Also don't know if this one is relevant.  This looks like an igmp
> packet so probably not my drop, but I am receiving multicast packets
> in this case so maybe it is somehow related.

Yes, we need to change igmp to call consume_skb() for all frames that
were properly handled.

So you can ignore this as well.

^ permalink raw reply

* [PATCH net-next] igmp: avoid drop_monitor false positives
From: Eric Dumazet @ 2012-09-07  6:37 UTC (permalink / raw)
  To: David Miller; +Cc: Shawn Bohrer, netdev

From: Eric Dumazet <edumazet@google.com>

igmp should call consume_skb() for all correctly processed packets,
to avoid false dropwatch/drop_monitor false positives.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/igmp.c |   30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 0b5580c..3479b98 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -815,14 +815,15 @@ static int igmp_marksources(struct ip_mc_list *pmc, int nsrcs, __be32 *srcs)
 	return 1;
 }
 
-static void igmp_heard_report(struct in_device *in_dev, __be32 group)
+/* return true if packet was dropped */
+static bool igmp_heard_report(struct in_device *in_dev, __be32 group)
 {
 	struct ip_mc_list *im;
 
 	/* Timers are only set for non-local groups */
 
 	if (group == IGMP_ALL_HOSTS)
-		return;
+		return false;
 
 	rcu_read_lock();
 	for_each_pmc_rcu(in_dev, im) {
@@ -832,9 +833,11 @@ static void igmp_heard_report(struct in_device *in_dev, __be32 group)
 		}
 	}
 	rcu_read_unlock();
+	return false;
 }
 
-static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
+/* return true if packet was dropped */
+static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 	int len)
 {
 	struct igmphdr 		*ih = igmp_hdr(skb);
@@ -866,7 +869,7 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 		/* clear deleted report items */
 		igmpv3_clear_delrec(in_dev);
 	} else if (len < 12) {
-		return;	/* ignore bogus packet; freed by caller */
+		return true;	/* ignore bogus packet; freed by caller */
 	} else if (IGMP_V1_SEEN(in_dev)) {
 		/* This is a v3 query with v1 queriers present */
 		max_delay = IGMP_Query_Response_Interval;
@@ -883,13 +886,13 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 			max_delay = 1;	/* can't mod w/ 0 */
 	} else { /* v3 */
 		if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
-			return;
+			return true;
 
 		ih3 = igmpv3_query_hdr(skb);
 		if (ih3->nsrcs) {
 			if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)
 					   + ntohs(ih3->nsrcs)*sizeof(__be32)))
-				return;
+				return true;
 			ih3 = igmpv3_query_hdr(skb);
 		}
 
@@ -901,9 +904,9 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 			in_dev->mr_qrv = ih3->qrv;
 		if (!group) { /* general query */
 			if (ih3->nsrcs)
-				return;	/* no sources allowed */
+				return false;	/* no sources allowed */
 			igmp_gq_start_timer(in_dev);
-			return;
+			return false;
 		}
 		/* mark sources to include, if group & source-specific */
 		mark = ih3->nsrcs != 0;
@@ -939,6 +942,7 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 			igmp_mod_timer(im, max_delay);
 	}
 	rcu_read_unlock();
+	return false;
 }
 
 /* called in rcu_read_lock() section */
@@ -948,6 +952,7 @@ int igmp_rcv(struct sk_buff *skb)
 	struct igmphdr *ih;
 	struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
 	int len = skb->len;
+	bool dropped = true;
 
 	if (in_dev == NULL)
 		goto drop;
@@ -969,7 +974,7 @@ int igmp_rcv(struct sk_buff *skb)
 	ih = igmp_hdr(skb);
 	switch (ih->type) {
 	case IGMP_HOST_MEMBERSHIP_QUERY:
-		igmp_heard_query(in_dev, skb, len);
+		dropped = igmp_heard_query(in_dev, skb, len);
 		break;
 	case IGMP_HOST_MEMBERSHIP_REPORT:
 	case IGMPV2_HOST_MEMBERSHIP_REPORT:
@@ -979,7 +984,7 @@ int igmp_rcv(struct sk_buff *skb)
 		/* don't rely on MC router hearing unicast reports */
 		if (skb->pkt_type == PACKET_MULTICAST ||
 		    skb->pkt_type == PACKET_BROADCAST)
-			igmp_heard_report(in_dev, ih->group);
+			dropped = igmp_heard_report(in_dev, ih->group);
 		break;
 	case IGMP_PIM:
 #ifdef CONFIG_IP_PIMSM_V1
@@ -997,7 +1002,10 @@ int igmp_rcv(struct sk_buff *skb)
 	}
 
 drop:
-	kfree_skb(skb);
+	if (dropped)
+		kfree_skb(skb);
+	else
+		consume_skb(skb);
 	return 0;
 }
 

^ permalink raw reply related

* Congratulations: You Have Won.
From: Mr.Tan Wong @ 2012-09-06 20:40 UTC (permalink / raw)
  To: info

Congratulations: You Have Won

The MICROSOFT EMAIL PROMO TEAM is glad to announce that
after a successful completion of the PROMO DRAWS held
on the 1st September 2012 your e-mail address,attached
to winning numbers: (11) (80) (12)(96) (09) (43) won in
the Tenth lottery category.

You have therefore been approved to claim a total sum
of? 450,000,00 Pounds Sterling in cash credited to REF
NO:MICRO-L/2009-END10.

All participants were selected through our Microsoft
computer ballot system drawn from 167,000 Names, as
part of our International "E-MAIL" Promotion Program
for our prominent MS-WORD users all over the world and
for the continuous use of the Internet. You are advised
to contact the claims-processor with the details below
via his e-mail address :

NAME:  Mr. Harris Conklin
EMAIL: mrharrisconklin1@yahoo.com.hk

PLEASE NOTE YOU ARE TO SEND THE FOLLOWING INFORMATION
TO CLAIM YOUR PRIZE:

1.Full Name:.......
2.Address:.......
3.Phone:.......
4.Country:.......
5.Sex/Gender:.......

YOURS SINCERELY,
Mr. Harris Conklin.

^ permalink raw reply

* Re: changing usbnet's API to better deal with cdc-ncm
From: Oliver Neukum @ 2012-09-07  7:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Bjørn Mork, alexey.orishko, netdev, linux-usb
In-Reply-To: <CACVXFVM9OdUbmOFvf1KG4ePoz7wCxLRwCET3cgL=EynNBLx2AA@mail.gmail.com>

On Friday 07 September 2012 11:55:53 Ming Lei wrote:
> On Fri, Sep 7, 2012 at 1:56 AM, Oliver Neukum <oneukum@suse.de> wrote:
> > On Friday 07 September 2012 00:09:13 Ming Lei wrote:
> >> On Thu, Sep 6, 2012 at 4:30 PM, Bjørn Mork <bjorn@mork.no> wrote:
> >> > Ming Lei <tom.leiming@gmail.com> writes:
> >
> >> >> Looks the introduced .tx_bundle is not necessary since .tx_fixup is OK.
> >> >
> >> > The minidriver does not have any information about tx in progress.  The
> >>
> >> Inside .tx_fixup, the low level driver will get the tx progress information.
> >
> > That information changes after tx_fixup
> 
> You mean the low level drive can't see if the later transmission
> for the aggregated frame is successful?  If so, there is no any complete

Not successful, necessary.

> notification to all low level drivers (with aggregation capability
> or not) now, isn't there?

There doesn't need to be.

> > Well, that is the mistake. Using a timer is a bad idea.
> 
> Why is a bad idea?  Without a timer, how can usbnet or
> low level driver aggregate the later coming transmit frames to
> form a biger aggregated frame?

By registering demand in an abstract sense. The timer makes sense
only when no other buffers are being transfered.

The timer means that we transmit if no other more packages are coming
down from the upper layers. Now, either the device is still busy or it is not.
While it is busy, we might just as well wait, because the upper layer may
change its mind.
If the device is not busy we worsen latency by waiting for the timer.

In addition the timer causes complications with recursing into usbnet
and hassle with suspend/resume and disconnection.

> >> If we can abstract some common things about aggregation, it should be
> >> meaningful. As far as I know, most of aggregation protocol is very different,
> >> so almost all aggregation work is only done by low level driver, such as
> >> cdc_ncm.
> >>
> >> If we want to implement some aggregation framework, maybe below is
> >> one solution, correct me if it is wrong.
> >
> > It isn't so much wrong as incomplete.
> >
> > It seems to me we can have
> >
> > - can queue, buffer not full -> do nothing more
> 
> As I said above, without a timeout timer, the queued skb
> might not be sent out even some long time passed if no
> frame comes later.

Therefore we check whether the device is still busy.

> So could you add the wait for more mechanism to your patch
> for further discussion?

I am attaching a version that is algorithmically complete
except for error handling.

> > - can queue, buffer full -> transmit
> > - cannot queue, buffer full -> transmit and then try again to queue
> 
> This might not happen, the buffer full should have been observed
> in last xmit path.

Possibly, but the driver cannot know in advance how large the next package
will be.

	Reagrds
		Oliver

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 4cd582a..56ef743 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -135,9 +135,6 @@ struct cdc_ncm_ctx {
 	u16 connected;
 };
 
-static void cdc_ncm_txpath_bh(unsigned long param);
-static void cdc_ncm_tx_timeout_start(struct cdc_ncm_ctx *ctx);
-static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *hr_timer);
 static struct usb_driver cdc_ncm_driver;
 
 static void
@@ -464,10 +461,6 @@ static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf)
 	if (ctx == NULL)
 		return -ENODEV;
 
-	hrtimer_init(&ctx->tx_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	ctx->tx_timer.function = &cdc_ncm_tx_timer_cb;
-	ctx->bh.data = (unsigned long)ctx;
-	ctx->bh.func = cdc_ncm_txpath_bh;
 	atomic_set(&ctx->stop, 0);
 	spin_lock_init(&ctx->mtx);
 	ctx->netdev = dev->net;
@@ -650,7 +643,7 @@ static void cdc_ncm_zero_fill(u8 *ptr, u32 first, u32 end, u32 max)
 	memset(ptr + first, 0, end - first);
 }
 
-static struct sk_buff *
+static int
 cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 {
 	struct sk_buff *skb_out;
@@ -659,12 +652,7 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 	u32 last_offset;
 	u16 n = 0, index;
 	u8 ready2send = 0;
-
-	/* if there is a remaining skb, it gets priority */
-	if (skb != NULL)
-		swap(skb, ctx->tx_rem_skb);
-	else
-		ready2send = 1;
+	u8 error = 0;
 
 	/*
 	 * +----------------+
@@ -690,7 +678,7 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 				dev_kfree_skb_any(skb);
 				ctx->netdev->stats.tx_dropped++;
 			}
-			goto exit_no_skb;
+			return -EBUSY;
 		}
 
 		/* make room for NTH and NDP */
@@ -719,28 +707,15 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 		/* compute maximum buffer size */
 		rem = ctx->tx_max - offset;
 
-		if (skb == NULL) {
-			skb = ctx->tx_rem_skb;
-			ctx->tx_rem_skb = NULL;
-
-			/* check for end of skb */
-			if (skb == NULL)
-				break;
-		}
-
 		if (skb->len > rem) {
 			if (n == 0) {
 				/* won't fit, MTU problem? */
 				dev_kfree_skb_any(skb);
 				skb = NULL;
 				ctx->netdev->stats.tx_dropped++;
+				error = 1;
 			} else {
-				/* no room for skb - store for later */
-				if (ctx->tx_rem_skb != NULL) {
-					dev_kfree_skb_any(ctx->tx_rem_skb);
-					ctx->netdev->stats.tx_dropped++;
-				}
-				ctx->tx_rem_skb = skb;
+
 				skb = NULL;
 				ready2send = 1;
 			}
@@ -768,13 +743,6 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 		skb = NULL;
 	}
 
-	/* free up any dangling skb */
-	if (skb != NULL) {
-		dev_kfree_skb_any(skb);
-		skb = NULL;
-		ctx->netdev->stats.tx_dropped++;
-	}
-
 	ctx->tx_curr_frame_num = n;
 
 	if (n == 0) {
@@ -791,9 +759,7 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 		ctx->tx_curr_skb = skb_out;
 		ctx->tx_curr_offset = offset;
 		ctx->tx_curr_last_offset = last_offset;
-		/* set the pending count */
-		if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
-			ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
+
 		goto exit_no_skb;
 
 	} else {
@@ -874,71 +840,37 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 	/* return skb */
 	ctx->tx_curr_skb = NULL;
 	ctx->netdev->stats.tx_packets += ctx->tx_curr_frame_num;
-	return skb_out;
 
-exit_no_skb:
-	/* Start timer, if there is a remaining skb */
-	if (ctx->tx_curr_skb != NULL)
-		cdc_ncm_tx_timeout_start(ctx);
-	return NULL;
-}
-
-static void cdc_ncm_tx_timeout_start(struct cdc_ncm_ctx *ctx)
-{
-	/* start timer, if not already started */
-	if (!(hrtimer_active(&ctx->tx_timer) || atomic_read(&ctx->stop)))
-		hrtimer_start(&ctx->tx_timer,
-				ktime_set(0, CDC_NCM_TIMER_INTERVAL),
-				HRTIMER_MODE_REL);
-}
+	if (error)
+		return -EBUSY;
+	if (ready2send)
+		return -EBUSY;
+	else
+		return 0;
 
-static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *timer)
-{
-	struct cdc_ncm_ctx *ctx =
-			container_of(timer, struct cdc_ncm_ctx, tx_timer);
+exit_no_skb:
 
-	if (!atomic_read(&ctx->stop))
-		tasklet_schedule(&ctx->bh);
-	return HRTIMER_NORESTART;
+	return -EAGAIN;
 }
 
-static void cdc_ncm_txpath_bh(unsigned long param)
+static int cdc_ncm_tx_bundle(struct usbnet *dev, struct sk_buff *skb, gfp_t flags)
 {
-	struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)param;
-
-	spin_lock_bh(&ctx->mtx);
-	if (ctx->tx_timer_pending != 0) {
-		ctx->tx_timer_pending--;
-		cdc_ncm_tx_timeout_start(ctx);
-		spin_unlock_bh(&ctx->mtx);
-	} else if (ctx->netdev != NULL) {
-		spin_unlock_bh(&ctx->mtx);
-		netif_tx_lock_bh(ctx->netdev);
-		usbnet_start_xmit(NULL, ctx->netdev);
-		netif_tx_unlock_bh(ctx->netdev);
-	}
+	int err;
+	struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)dev->data[0];
+	
+	err = cdc_ncm_fill_tx_frame(ctx, skb);
+	return err;
 }
 
 static struct sk_buff *
 cdc_ncm_tx_fixup(struct usbnet *dev, struct sk_buff *skb, gfp_t flags)
 {
-	struct sk_buff *skb_out;
 	struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)dev->data[0];
 
-	/*
-	 * The Ethernet API we are using does not support transmitting
-	 * multiple Ethernet frames in a single call. This driver will
-	 * accumulate multiple Ethernet frames and send out a larger
-	 * USB frame when the USB buffer is full or when a single jiffies
-	 * timeout happens.
-	 */
 	if (ctx == NULL)
 		goto error;
 
-	spin_lock_bh(&ctx->mtx);
-	skb_out = cdc_ncm_fill_tx_frame(ctx, skb);
-	spin_unlock_bh(&ctx->mtx);
-	return skb_out;
+	return ctx->tx_curr_skb;
 
 error:
 	if (skb != NULL)
@@ -1197,6 +1129,7 @@ static const struct driver_info cdc_ncm_info = {
 	.manage_power = cdc_ncm_manage_power,
 	.status = cdc_ncm_status,
 	.rx_fixup = cdc_ncm_rx_fixup,
+	.tx_bundle = cdc_ncm_tx_bundle,
 	.tx_fixup = cdc_ncm_tx_fixup,
 };
 
@@ -1211,6 +1144,7 @@ static const struct driver_info wwan_info = {
 	.manage_power = cdc_ncm_manage_power,
 	.status = cdc_ncm_status,
 	.rx_fixup = cdc_ncm_rx_fixup,
+	.tx_bundle = cdc_ncm_tx_bundle,
 	.tx_fixup = cdc_ncm_tx_fixup,
 };
 
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 8531c1c..429b08b 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -86,6 +86,8 @@ static u8	node_id [ETH_ALEN];
 
 static const char driver_name [] = "usbnet";
 
+static void tx_complete(struct urb *urb);
+
 /* use ethtool to change the level for any given device */
 static int msg_level = -1;
 module_param (msg_level, int, 0);
@@ -1016,17 +1018,76 @@ skip_reset:
 		netdev_dbg(dev->net, "kevent done, flags = 0x%lx\n", dev->flags);
 }
 
+static int submit_next_bundle(struct usbnet *dev, struct sk_buff *skb)
+{
+	struct skb_data *entry;
+	int length = skb->len;
+	int retval;
+	struct driver_info *info = dev->driver_info;
+	struct urb *urb;
+
+	urb = usb_alloc_urb(0, GFP_ATOMIC);
+	if (!urb)
+		return -ENOMEM;
+
+	entry = (struct skb_data *) skb->cb;
+	entry->urb = urb;
+	entry->dev = dev;
+	entry->length = length;
+
+	usb_fill_bulk_urb (urb, dev->udev, dev->out,
+			skb->data, skb->len, tx_complete, skb);
+
+	/* don't assume the hardware handles USB_ZERO_PACKET
+	 * NOTE:  strictly conforming cdc-ether devices should expect
+	 * the ZLP here, but ignore the one-byte packet.
+	 * NOTE2: CDC NCM specification is different from CDC ECM when
+	 * handling ZLP/short packets, so cdc_ncm driver will make short
+	 * packet itself if needed.
+	 */
+	if (length % dev->maxpacket == 0) {
+		if (!(info->flags & FLAG_SEND_ZLP)) {
+			if (!(info->flags & FLAG_MULTI_PACKET)) {
+				urb->transfer_buffer_length++;
+				if (skb_tailroom(skb)) {
+					skb->data[skb->len] = 0;
+					__skb_put(skb, 1);
+				}
+			}
+		} else
+			urb->transfer_flags |= URB_ZERO_PACKET;
+	}
+
+	atomic_inc(&dev->tx_in_flight);
+	retval = usb_submit_urb(urb, GFP_ATOMIC);
+	if (retval < 0)
+		atomic_dec(&dev->tx_in_flight);
+	//FIXME: full error handling
+
+	return retval;
+}
 /*-------------------------------------------------------------------------*/
 
 static void tx_complete (struct urb *urb)
 {
 	struct sk_buff		*skb = (struct sk_buff *) urb->context;
+	struct sk_buff		*skb2;
 	struct skb_data		*entry = (struct skb_data *) skb->cb;
 	struct usbnet		*dev = entry->dev;
+	struct driver_info	*info = dev->driver_info;
+	int			in_flight;
 
+	in_flight = atomic_dec_return(&dev->tx_in_flight);
 	if (urb->status == 0) {
-		if (!(dev->driver_info->flags & FLAG_MULTI_PACKET))
+		if (!(dev->driver_info->flags & FLAG_MULTI_PACKET)) {
 			dev->net->stats.tx_packets++;
+		} else {
+			if (in_flight < 2) {
+				skb2 = info->tx_fixup(dev, NULL, GFP_ATOMIC);
+				if (skb2)
+					submit_next_bundle(dev, skb2);
+			}
+		}
 		dev->net->stats.tx_bytes += entry->length;
 	} else {
 		dev->net->stats.tx_errors++;
@@ -1089,23 +1150,50 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 	struct urb		*urb = NULL;
 	struct skb_data		*entry;
 	struct driver_info	*info = dev->driver_info;
+	struct sk_buff		*skb_old = NULL;
 	unsigned long		flags;
 	int retval;
+	int transmit_now = 1;
+	int bundle_again = 0;
 
 	if (skb)
 		skb_tx_timestamp(skb);
 
+	/*
+	 * first we allow drivers to bundle packets together
+	 * maintainance of the buffer is the responsibility
+	 * of the lower layer
+	 */
+rebundle:
+	if (info->tx_bundle) {
+		bundle_again = 0;
+		retval = info->tx_bundle(dev, skb, GFP_ATOMIC);
+
+		switch (retval) {
+		case 0: /* the package has been bundled */
+			if (atomic_read(&dev->tx_in_flight) < 2)
+				transmit_now = 1;
+			else
+				transmit_now = 0;
+			break;
+		case -EAGAIN:
+			transmit_now = 1;
+			bundle_again = 1;
+			skb_old = skb;
+			break;
+		case -EBUSY:
+			transmit_now = 1;
+			break;
+		}
+	}
 	// some devices want funky USB-level framing, for
 	// win32 driver (usually) and/or hardware quirks
-	if (info->tx_fixup) {
+	if (transmit_now && info->tx_fixup) {
 		skb = info->tx_fixup (dev, skb, GFP_ATOMIC);
 		if (!skb) {
 			if (netif_msg_tx_err(dev)) {
 				netif_dbg(dev, tx_err, dev->net, "can't tx_fixup skb\n");
 				goto drop;
-			} else {
-				/* cdc_ncm collected packet; waits for more */
-				goto not_drop;
 			}
 		}
 	}
@@ -1164,14 +1252,17 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 	}
 #endif
 
+	atomic_inc(&dev->tx_in_flight);
 	switch ((retval = usb_submit_urb (urb, GFP_ATOMIC))) {
 	case -EPIPE:
 		netif_stop_queue (net);
 		usbnet_defer_kevent (dev, EVENT_TX_HALT);
+		atomic_dec(&dev->tx_in_flight);
 		usb_autopm_put_interface_async(dev->intf);
 		break;
 	default:
 		usb_autopm_put_interface_async(dev->intf);
+		atomic_dec(&dev->tx_in_flight);
 		netif_dbg(dev, tx_err, dev->net,
 			  "tx: submit urb err %d\n", retval);
 		break;
@@ -1187,7 +1278,6 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 		netif_dbg(dev, tx_err, dev->net, "drop, code %d\n", retval);
 drop:
 		dev->net->stats.tx_dropped++;
-not_drop:
 		if (skb)
 			dev_kfree_skb_any (skb);
 		usb_free_urb (urb);
@@ -1197,6 +1287,10 @@ not_drop:
 #ifdef CONFIG_PM
 deferred:
 #endif
+	if (bundle_again) {
+		skb = skb_old;
+		goto rebundle;
+	}
 	return NETDEV_TX_OK;
 }
 EXPORT_SYMBOL_GPL(usbnet_start_xmit);
@@ -1393,6 +1487,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 	dev->delay.data = (unsigned long) dev;
 	init_timer (&dev->delay);
 	mutex_init (&dev->phy_mutex);
+	atomic_set(&dev->tx_in_flight, 0); 
 
 	dev->net = net;
 	strcpy (net->name, "usb%d");
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index f87cf62..bb2f622 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -33,6 +33,7 @@ struct usbnet {
 	wait_queue_head_t	*wait;
 	struct mutex		phy_mutex;
 	unsigned char		suspend_count;
+	atomic_t		tx_in_flight;
 
 	/* i/o info: pipes etc */
 	unsigned		in, out;
@@ -133,6 +134,12 @@ struct driver_info {
 	/* fixup rx packet (strip framing) */
 	int	(*rx_fixup)(struct usbnet *dev, struct sk_buff *skb);
 
+	/* bundle individual package for transmission as
+	 * larger package. This cannot sleep
+	 */
+	int	(*tx_bundle)(struct usbnet *dev,
+				struct sk_buff *skb, gfp_t flags);
+
 	/* fixup tx packet (add framing) */
 	struct sk_buff	*(*tx_fixup)(struct usbnet *dev,
 				struct sk_buff *skb, gfp_t flags);

^ permalink raw reply related

* MBIM (was: Re: changing usbnet's API to better deal with cdc-ncm)
From: Oliver Neukum @ 2012-09-07  7:40 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87ehmf36qd.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

On Thursday 06 September 2012 10:17:46 Bjørn Mork wrote:
> Not really related, but I am still worrying how MBIM is going to fit
> into the usbnet model where you have a strict relation between one
> netdev and one USB interface.  Do you see any way usbnet could be
> extended to manage a list of netdevs?
> 
> All the netdev related functions doing
> 
>         struct usbnet           *dev = netdev_priv(net);
> 
> would still work.  But all the USB related functions using dev->net to
> get the netdev would fail.  Maybe this will be too difficult to
> implement within the usbnet framework at all?

It depends on how much support we get from upper layers.
You can give two IPs to an ethernet card as well.

It would be best if you could come up with some preliminary code
at least.

	Reagrds
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: MBIM
From: Bjørn Mork @ 2012-09-07  8:53 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: alexey.orishko, netdev, linux-usb
In-Reply-To: <3121990.L41g6gGrVV@linux-lqwf.site>

Oliver Neukum <oneukum@suse.de> writes:
> On Thursday 06 September 2012 10:17:46 Bjørn Mork wrote:
>> Not really related, but I am still worrying how MBIM is going to fit
>> into the usbnet model where you have a strict relation between one
>> netdev and one USB interface.  Do you see any way usbnet could be
>> extended to manage a list of netdevs?
>> 
>> All the netdev related functions doing
>> 
>>         struct usbnet           *dev = netdev_priv(net);
>> 
>> would still work.  But all the USB related functions using dev->net to
>> get the netdev would fail.  Maybe this will be too difficult to
>> implement within the usbnet framework at all?
>
> It depends on how much support we get from upper layers.
> You can give two IPs to an ethernet card as well.

Yes, of course.  But I don't think that fits the MBIM model, where
multiple independent connections (Internet, VoIP, MMS, VPN1, VPN2, etc)
to different APNs will be multiplexed over the same USB endpoints.  And
to make this not fly at all: Some of these may be IPv4, some IPv6 and
some dual stack.  You cannot support that on a single netdev.  Mixing
them all together on the host will not do.  When the driver sees an IPv6
multicast packet (RS, ND, whatever) on the netdev, which MBIM session
should it forward that packet too?

If I understand MBIM correctly, we will see device configurations with
_one_ CDC data interface carrying all data traffic for multiple device
functions.  A MBIM driver must de-multiplex this, and each IP function
should be routed to its own netdev IMHO.

> It would be best if you could come up with some preliminary code
> at least.

Yup.  Finally got a device.  Will start playing.

Bjørn

^ permalink raw reply

* [PATCH net-next] ipv4/route: arg delay is useless in rt_cache_flush()
From: Nicolas Dichtel @ 2012-09-07  9:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, Nicolas Dichtel

Since route cache deletion (89aef8921bfbac22f), delay is no
more used. Remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/route.h     |  2 +-
 net/ipv4/arp.c          |  2 +-
 net/ipv4/devinet.c      |  6 +++---
 net/ipv4/fib_frontend.c | 20 ++++++++++----------
 net/ipv4/fib_rules.c    |  2 +-
 net/ipv4/fib_trie.c     |  6 +++---
 net/ipv4/route.c        | 14 +++-----------
 7 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 776a27f..da22243 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -108,7 +108,7 @@ extern struct ip_rt_acct __percpu *ip_rt_acct;
 
 struct in_device;
 extern int		ip_rt_init(void);
-extern void		rt_cache_flush(struct net *net, int how);
+extern void		rt_cache_flush(struct net *net);
 extern void		rt_flush_dev(struct net_device *dev);
 extern struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
 extern struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 77e87af..4780045 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1225,7 +1225,7 @@ static int arp_netdev_event(struct notifier_block *this, unsigned long event,
 	switch (event) {
 	case NETDEV_CHANGEADDR:
 		neigh_changeaddr(&arp_tbl, dev);
-		rt_cache_flush(dev_net(dev), 0);
+		rt_cache_flush(dev_net(dev));
 		break;
 	default:
 		break;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index adf273f..f14eff5 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1500,7 +1500,7 @@ static int devinet_conf_proc(ctl_table *ctl, int write,
 		if (i == IPV4_DEVCONF_ACCEPT_LOCAL - 1 ||
 		    i == IPV4_DEVCONF_ROUTE_LOCALNET - 1)
 			if ((new_value == 0) && (old_value != 0))
-				rt_cache_flush(net, 0);
+				rt_cache_flush(net);
 	}
 
 	return ret;
@@ -1534,7 +1534,7 @@ static int devinet_sysctl_forward(ctl_table *ctl, int write,
 				dev_disable_lro(idev->dev);
 			}
 			rtnl_unlock();
-			rt_cache_flush(net, 0);
+			rt_cache_flush(net);
 		}
 	}
 
@@ -1551,7 +1551,7 @@ static int ipv4_doint_and_flush(ctl_table *ctl, int write,
 	struct net *net = ctl->extra2;
 
 	if (write && *valp != val)
-		rt_cache_flush(net, 0);
+		rt_cache_flush(net);
 
 	return ret;
 }
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index acdee32..8a7cd79 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -148,7 +148,7 @@ static void fib_flush(struct net *net)
 	}
 
 	if (flushed)
-		rt_cache_flush(net, -1);
+		rt_cache_flush(net);
 }
 
 /*
@@ -999,11 +999,11 @@ static void nl_fib_lookup_exit(struct net *net)
 	net->ipv4.fibnl = NULL;
 }
 
-static void fib_disable_ip(struct net_device *dev, int force, int delay)
+static void fib_disable_ip(struct net_device *dev, int force)
 {
 	if (fib_sync_down_dev(dev, force))
 		fib_flush(dev_net(dev));
-	rt_cache_flush(dev_net(dev), delay);
+	rt_cache_flush(dev_net(dev));
 	arp_ifdown(dev);
 }
 
@@ -1020,7 +1020,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
 		fib_sync_up(dev);
 #endif
 		atomic_inc(&net->ipv4.dev_addr_genid);
-		rt_cache_flush(dev_net(dev), -1);
+		rt_cache_flush(dev_net(dev));
 		break;
 	case NETDEV_DOWN:
 		fib_del_ifaddr(ifa, NULL);
@@ -1029,9 +1029,9 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
 			/* Last address was deleted from this interface.
 			 * Disable IP.
 			 */
-			fib_disable_ip(dev, 1, 0);
+			fib_disable_ip(dev, 1);
 		} else {
-			rt_cache_flush(dev_net(dev), -1);
+			rt_cache_flush(dev_net(dev));
 		}
 		break;
 	}
@@ -1045,7 +1045,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	struct net *net = dev_net(dev);
 
 	if (event == NETDEV_UNREGISTER) {
-		fib_disable_ip(dev, 2, -1);
+		fib_disable_ip(dev, 2);
 		rt_flush_dev(dev);
 		return NOTIFY_DONE;
 	}
@@ -1061,14 +1061,14 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		fib_sync_up(dev);
 #endif
 		atomic_inc(&net->ipv4.dev_addr_genid);
-		rt_cache_flush(net, -1);
+		rt_cache_flush(net);
 		break;
 	case NETDEV_DOWN:
-		fib_disable_ip(dev, 0, 0);
+		fib_disable_ip(dev, 0);
 		break;
 	case NETDEV_CHANGEMTU:
 	case NETDEV_CHANGE:
-		rt_cache_flush(net, 0);
+		rt_cache_flush(net);
 		break;
 	}
 	return NOTIFY_DONE;
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index a83d74e..274309d 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -259,7 +259,7 @@ static size_t fib4_rule_nlmsg_payload(struct fib_rule *rule)
 
 static void fib4_rule_flush_cache(struct fib_rules_ops *ops)
 {
-	rt_cache_flush(ops->fro_net, -1);
+	rt_cache_flush(ops->fro_net);
 }
 
 static const struct fib_rules_ops __net_initdata fib4_rules_ops_template = {
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3c820da..8d76610 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1286,7 +1286,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
 			fib_release_info(fi_drop);
 			if (state & FA_S_ACCESSED)
-				rt_cache_flush(cfg->fc_nlinfo.nl_net, -1);
+				rt_cache_flush(cfg->fc_nlinfo.nl_net);
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
 
@@ -1333,7 +1333,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	list_add_tail_rcu(&new_fa->fa_list,
 			  (fa ? &fa->fa_list : fa_head));
 
-	rt_cache_flush(cfg->fc_nlinfo.nl_net, -1);
+	rt_cache_flush(cfg->fc_nlinfo.nl_net);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 succeeded:
@@ -1711,7 +1711,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 		trie_leaf_remove(t, l);
 
 	if (fa->fa_state & FA_S_ACCESSED)
-		rt_cache_flush(cfg->fc_nlinfo.nl_net, -1);
+		rt_cache_flush(cfg->fc_nlinfo.nl_net);
 
 	fib_release_info(fa->fa_info);
 	alias_free_mem_rcu(fa);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dc9549b..915c592 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -461,11 +461,7 @@ static void rt_cache_invalidate(struct net *net)
 	atomic_add(shuffle + 1U, &net->ipv4.rt_genid);
 }
 
-/*
- * delay < 0  : invalidate cache (fast : entries will be deleted later)
- * delay >= 0 : invalidate & flush cache (can be long)
- */
-void rt_cache_flush(struct net *net, int delay)
+void rt_cache_flush(struct net *net)
 {
 	rt_cache_invalidate(net);
 }
@@ -2345,7 +2341,7 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 
 void ip_rt_multicast_event(struct in_device *in_dev)
 {
-	rt_cache_flush(dev_net(in_dev->dev), 0);
+	rt_cache_flush(dev_net(in_dev->dev));
 }
 
 #ifdef CONFIG_SYSCTL
@@ -2354,16 +2350,12 @@ static int ipv4_sysctl_rtcache_flush(ctl_table *__ctl, int write,
 					size_t *lenp, loff_t *ppos)
 {
 	if (write) {
-		int flush_delay;
 		ctl_table ctl;
 		struct net *net;
 
 		memcpy(&ctl, __ctl, sizeof(ctl));
-		ctl.data = &flush_delay;
-		proc_dointvec(&ctl, write, buffer, lenp, ppos);
-
 		net = (struct net *)__ctl->extra1;
-		rt_cache_flush(net, flush_delay);
+		rt_cache_flush(net);
 		return 0;
 	}
 
-- 
1.7.12

^ permalink raw reply related

* Re: [PATCH net-next] ipv4/route: arg delay is useless in rt_cache_flush()
From: Eric Dumazet @ 2012-09-07 10:29 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: davem, netdev
In-Reply-To: <1347010284-3419-1-git-send-email-nicolas.dichtel@6wind.com>

On Fri, 2012-09-07 at 11:31 +0200, Nicolas Dichtel wrote:
> Since route cache deletion (89aef8921bfbac22f), delay is no
> more used. Remove it.
> 
...

>  #ifdef CONFIG_SYSCTL
> @@ -2354,16 +2350,12 @@ static int ipv4_sysctl_rtcache_flush(ctl_table *__ctl, int write,
>  					size_t *lenp, loff_t *ppos)
>  {
>  	if (write) {
> -		int flush_delay;
>  		ctl_table ctl;
>  		struct net *net;
>  
>  		memcpy(&ctl, __ctl, sizeof(ctl));
> -		ctl.data = &flush_delay;
> -		proc_dointvec(&ctl, write, buffer, lenp, ppos);
> -
>  		net = (struct net *)__ctl->extra1;
> -		rt_cache_flush(net, flush_delay);
> +		rt_cache_flush(net);
>  		return 0;
>  	}
>  

Why do you keep ctl then ?

	if (write) {
		rt_cache_flush((struct net *)__ctl->extra1);
		return 0;
	}

^ permalink raw reply

* Re: [PATCH net-next] ipv4/route: arg delay is useless in rt_cache_flush()
From: Nicolas Dichtel @ 2012-09-07 10:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <1347013794.2484.488.camel@edumazet-glaptop>

Le 07/09/2012 12:29, Eric Dumazet a écrit :
> On Fri, 2012-09-07 at 11:31 +0200, Nicolas Dichtel wrote:
>> Since route cache deletion (89aef8921bfbac22f), delay is no
>> more used. Remove it.
>>
> ...
>
>>   #ifdef CONFIG_SYSCTL
>> @@ -2354,16 +2350,12 @@ static int ipv4_sysctl_rtcache_flush(ctl_table *__ctl, int write,
>>   					size_t *lenp, loff_t *ppos)
>>   {
>>   	if (write) {
>> -		int flush_delay;
>>   		ctl_table ctl;
>>   		struct net *net;
>>
>>   		memcpy(&ctl, __ctl, sizeof(ctl));
>> -		ctl.data = &flush_delay;
>> -		proc_dointvec(&ctl, write, buffer, lenp, ppos);
>> -
>>   		net = (struct net *)__ctl->extra1;
>> -		rt_cache_flush(net, flush_delay);
>> +		rt_cache_flush(net);
>>   		return 0;
>>   	}
>>
>
> Why do you keep ctl then ?
>
> 	if (write) {
> 		rt_cache_flush((struct net *)__ctl->extra1);
> 		return 0;
> 	}
>
>
Right, there is no reason.


Regards,
Nicolas

^ permalink raw reply

* [PATCH net-next v2] ipv4/route: arg delay is useless in rt_cache_flush()
From: Nicolas Dichtel @ 2012-09-07 10:45 UTC (permalink / raw)
  To: davem, eric.dumazet; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <1347013794.2484.488.camel@edumazet-glaptop>

Since route cache deletion (89aef8921bfbac22f), delay is no
more used. Remove it.

v2: remove ctl in ipv4_sysctl_rtcache_flush()

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/route.h     |  2 +-
 net/ipv4/arp.c          |  2 +-
 net/ipv4/devinet.c      |  6 +++---
 net/ipv4/fib_frontend.c | 20 ++++++++++----------
 net/ipv4/fib_rules.c    |  2 +-
 net/ipv4/fib_trie.c     |  6 +++---
 net/ipv4/route.c        | 19 +++----------------
 7 files changed, 22 insertions(+), 35 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 776a27f..da22243 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -108,7 +108,7 @@ extern struct ip_rt_acct __percpu *ip_rt_acct;
 
 struct in_device;
 extern int		ip_rt_init(void);
-extern void		rt_cache_flush(struct net *net, int how);
+extern void		rt_cache_flush(struct net *net);
 extern void		rt_flush_dev(struct net_device *dev);
 extern struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
 extern struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 77e87af..4780045 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1225,7 +1225,7 @@ static int arp_netdev_event(struct notifier_block *this, unsigned long event,
 	switch (event) {
 	case NETDEV_CHANGEADDR:
 		neigh_changeaddr(&arp_tbl, dev);
-		rt_cache_flush(dev_net(dev), 0);
+		rt_cache_flush(dev_net(dev));
 		break;
 	default:
 		break;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index adf273f..f14eff5 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1500,7 +1500,7 @@ static int devinet_conf_proc(ctl_table *ctl, int write,
 		if (i == IPV4_DEVCONF_ACCEPT_LOCAL - 1 ||
 		    i == IPV4_DEVCONF_ROUTE_LOCALNET - 1)
 			if ((new_value == 0) && (old_value != 0))
-				rt_cache_flush(net, 0);
+				rt_cache_flush(net);
 	}
 
 	return ret;
@@ -1534,7 +1534,7 @@ static int devinet_sysctl_forward(ctl_table *ctl, int write,
 				dev_disable_lro(idev->dev);
 			}
 			rtnl_unlock();
-			rt_cache_flush(net, 0);
+			rt_cache_flush(net);
 		}
 	}
 
@@ -1551,7 +1551,7 @@ static int ipv4_doint_and_flush(ctl_table *ctl, int write,
 	struct net *net = ctl->extra2;
 
 	if (write && *valp != val)
-		rt_cache_flush(net, 0);
+		rt_cache_flush(net);
 
 	return ret;
 }
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index acdee32..8a7cd79 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -148,7 +148,7 @@ static void fib_flush(struct net *net)
 	}
 
 	if (flushed)
-		rt_cache_flush(net, -1);
+		rt_cache_flush(net);
 }
 
 /*
@@ -999,11 +999,11 @@ static void nl_fib_lookup_exit(struct net *net)
 	net->ipv4.fibnl = NULL;
 }
 
-static void fib_disable_ip(struct net_device *dev, int force, int delay)
+static void fib_disable_ip(struct net_device *dev, int force)
 {
 	if (fib_sync_down_dev(dev, force))
 		fib_flush(dev_net(dev));
-	rt_cache_flush(dev_net(dev), delay);
+	rt_cache_flush(dev_net(dev));
 	arp_ifdown(dev);
 }
 
@@ -1020,7 +1020,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
 		fib_sync_up(dev);
 #endif
 		atomic_inc(&net->ipv4.dev_addr_genid);
-		rt_cache_flush(dev_net(dev), -1);
+		rt_cache_flush(dev_net(dev));
 		break;
 	case NETDEV_DOWN:
 		fib_del_ifaddr(ifa, NULL);
@@ -1029,9 +1029,9 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
 			/* Last address was deleted from this interface.
 			 * Disable IP.
 			 */
-			fib_disable_ip(dev, 1, 0);
+			fib_disable_ip(dev, 1);
 		} else {
-			rt_cache_flush(dev_net(dev), -1);
+			rt_cache_flush(dev_net(dev));
 		}
 		break;
 	}
@@ -1045,7 +1045,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	struct net *net = dev_net(dev);
 
 	if (event == NETDEV_UNREGISTER) {
-		fib_disable_ip(dev, 2, -1);
+		fib_disable_ip(dev, 2);
 		rt_flush_dev(dev);
 		return NOTIFY_DONE;
 	}
@@ -1061,14 +1061,14 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		fib_sync_up(dev);
 #endif
 		atomic_inc(&net->ipv4.dev_addr_genid);
-		rt_cache_flush(net, -1);
+		rt_cache_flush(net);
 		break;
 	case NETDEV_DOWN:
-		fib_disable_ip(dev, 0, 0);
+		fib_disable_ip(dev, 0);
 		break;
 	case NETDEV_CHANGEMTU:
 	case NETDEV_CHANGE:
-		rt_cache_flush(net, 0);
+		rt_cache_flush(net);
 		break;
 	}
 	return NOTIFY_DONE;
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index a83d74e..274309d 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -259,7 +259,7 @@ static size_t fib4_rule_nlmsg_payload(struct fib_rule *rule)
 
 static void fib4_rule_flush_cache(struct fib_rules_ops *ops)
 {
-	rt_cache_flush(ops->fro_net, -1);
+	rt_cache_flush(ops->fro_net);
 }
 
 static const struct fib_rules_ops __net_initdata fib4_rules_ops_template = {
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3c820da..8d76610 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1286,7 +1286,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 
 			fib_release_info(fi_drop);
 			if (state & FA_S_ACCESSED)
-				rt_cache_flush(cfg->fc_nlinfo.nl_net, -1);
+				rt_cache_flush(cfg->fc_nlinfo.nl_net);
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
 
@@ -1333,7 +1333,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	list_add_tail_rcu(&new_fa->fa_list,
 			  (fa ? &fa->fa_list : fa_head));
 
-	rt_cache_flush(cfg->fc_nlinfo.nl_net, -1);
+	rt_cache_flush(cfg->fc_nlinfo.nl_net);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 succeeded:
@@ -1711,7 +1711,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 		trie_leaf_remove(t, l);
 
 	if (fa->fa_state & FA_S_ACCESSED)
-		rt_cache_flush(cfg->fc_nlinfo.nl_net, -1);
+		rt_cache_flush(cfg->fc_nlinfo.nl_net);
 
 	fib_release_info(fa->fa_info);
 	alias_free_mem_rcu(fa);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dc9549b..ab1e7c7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -461,11 +461,7 @@ static void rt_cache_invalidate(struct net *net)
 	atomic_add(shuffle + 1U, &net->ipv4.rt_genid);
 }
 
-/*
- * delay < 0  : invalidate cache (fast : entries will be deleted later)
- * delay >= 0 : invalidate & flush cache (can be long)
- */
-void rt_cache_flush(struct net *net, int delay)
+void rt_cache_flush(struct net *net)
 {
 	rt_cache_invalidate(net);
 }
@@ -2345,7 +2341,7 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 
 void ip_rt_multicast_event(struct in_device *in_dev)
 {
-	rt_cache_flush(dev_net(in_dev->dev), 0);
+	rt_cache_flush(dev_net(in_dev->dev));
 }
 
 #ifdef CONFIG_SYSCTL
@@ -2354,16 +2350,7 @@ static int ipv4_sysctl_rtcache_flush(ctl_table *__ctl, int write,
 					size_t *lenp, loff_t *ppos)
 {
 	if (write) {
-		int flush_delay;
-		ctl_table ctl;
-		struct net *net;
-
-		memcpy(&ctl, __ctl, sizeof(ctl));
-		ctl.data = &flush_delay;
-		proc_dointvec(&ctl, write, buffer, lenp, ppos);
-
-		net = (struct net *)__ctl->extra1;
-		rt_cache_flush(net, flush_delay);
+		rt_cache_flush((struct net *)__ctl->extra1);
 		return 0;
 	}
 
-- 
1.7.12

^ permalink raw reply related

* Re: [PATCH 8/10] net/mac80211/scan.c: removes unnecessary semicolon
From: Johannes Berg @ 2012-09-07 11:30 UTC (permalink / raw)
  To: Peter Senna Tschudin
  Cc: John W. Linville, kernel-janitors, David S. Miller,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <1346947757-10481-9-git-send-email-peter.senna@gmail.com>

On Thu, 2012-09-06 at 18:09 +0200, Peter Senna Tschudin wrote:
> From: Peter Senna Tschudin <peter.senna@gmail.com>
> 
> removes unnecessary semicolon
> 
> Found by Coccinelle: http://coccinelle.lip6.fr/

Applied. It'd be nice to have the spatch, I guess :)

johannes

^ permalink raw reply

* RE: changing usbnet's API to better deal with cdc-ncm
From: Alexey ORISHKO @ 2012-09-07 12:01 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Bjørn Mork, Ming Lei,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1446113.XC2Qbo4flT-ugxBuEnWX9yG/4A2pS7c2Q@public.gmane.org>

> -----Original Message-----
> From: Oliver Neukum [mailto:oneukum-l3A5Bk7waGM@public.gmane.org]
> Sent: Friday, September 07, 2012 9:36 AM

> > > Well, that is the mistake. Using a timer is a bad idea.
> >
> > Why is a bad idea?  Without a timer, how can usbnet or low level
> > driver aggregate the later coming transmit frames to form a biger
> > aggregated frame?
> 
> By registering demand in an abstract sense. The timer makes sense only
> when no other buffers are being transfered.
> 
> The timer means that we transmit if no other more packages are coming
> down from the upper layers. Now, either the device is still busy or it
> is not.
> While it is busy, we might just as well wait, because the upper layer
> may change its mind.
> If the device is not busy we worsen latency by waiting for the timer.

...

> > As I said above, without a timeout timer, the queued skb might not be
> > sent out even some long time passed if no frame comes later.
> 
> Therefore we check whether the device is still busy.
> 

The purpose of NCM is to reduce the amount of interrupts and the number
of DMA jobs that must be handled by the device. It is most efficient to
send and especially to receive NTBs of agreed size, padded if needed.

Misunderstanding of the reason why NCM protocol is doing aggregation
of several ETH frames into a single NTB leads to crippled implementation. 
Look at NCM gadget, which is not really NCM device at all, it's kinda ECM
device with NCM framing.

The experience from early implementations and prototyping of NCM was that
using NCM with 4-8kB NTB increased max throughput in loop-mode by a factor
of 5-6 even 8-10 times compared to using ECM. One real-world example was
modem for 21+6Mbit/s what used 100% CPU with ECM responsible for approx. 40%
of the MIPS used. Using NCM instead CPU was only at approx. 65% utilization.
Which allowed multiple other functions to be added and significantly increased
the usability and value of the modem.

NCM efficiency is gained either by accessing TX queue of upper layer or
by using timer. 

These findings were later confirmed by multiple major industry players
(names withheld), and demonstrated during multiple NCM interoperability
workshops using multiple device HW platforms, multiple operating systems
and multiple host HW ranging from Beagleboard to latest quad-core x86.    

Until we do something with network device framework in order to get access
to upper layer Tx queue we need to utilize timer.

Regards,
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] sctp: check dst validity after IPsec operations
From: Nicolas Dichtel @ 2012-09-07 12:07 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: sri, linux-sctp, netdev
In-Reply-To: <5048C984.3030306@gmail.com>

Le 06/09/2012 18:04, Vlad Yasevich a écrit :
> On 09/06/2012 01:40 PM, Nicolas Dichtel wrote:
>> dst stored in struct sctp_transport needs to be recalculated when ipsec policy
>> are updated. We use flow_cache_genid for that.
>>
>> For example, if a SCTP connection is established and then an IPsec policy is
>> set, the old SCTP flow will not be updated and thus will not use the new
>> IPsec policy.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>
> why doesn't this need to be done for TCP?  What makes SCTP special in this case?
>
> ip_queue_xmit does an __sk_dst_check() which is essentially what
> sctp_transport_dst_check() does.  That should determine if the currently cached
> route is valid or not.
>
> Looks like sctp may need to change to using ip_route_output_ports() call
> as ip_route_output_key may not do all that is necessary
I try, but it doesn't solve the problem. In fact, it seems better to use 
ip_route_output_ports(), would you like me to send a patch?


Regards,
Nicolas

^ permalink raw reply

* Re: [PATCH] sctp: check dst validity after IPsec operations
From: Nicolas Dichtel @ 2012-09-07 12:24 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: sri, linux-sctp, netdev
In-Reply-To: <5048D774.3020008@gmail.com>

Le 06/09/2012 19:03, Vlad Yasevich a écrit :
> On 09/06/2012 12:40 PM, Nicolas Dichtel wrote:
>> Le 06/09/2012 18:04, Vlad Yasevich a écrit :
>>> On 09/06/2012 01:40 PM, Nicolas Dichtel wrote:
>>>> dst stored in struct sctp_transport needs to be recalculated when
>>>> ipsec policy
>>>> are updated. We use flow_cache_genid for that.
>>>>
>>>> For example, if a SCTP connection is established and then an IPsec
>>>> policy is
>>>> set, the old SCTP flow will not be updated and thus will not use the new
>>>> IPsec policy.
>>>>
>>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>>
>>> why doesn't this need to be done for TCP?  What makes SCTP special in
>>> this case?
>> Tests prove that the pb does not exist with TCP. I made the patch some
>> times ago, I will look again deeply to find the difference.
>>
>
> TCP appears to cache the flowi and uses that to re-route the packet.
> However, re-route still seems predicated on dst_check()...
Yes ... but I don't find the difference. Re-route is not done immediately in 
TCP, it takes few seconds.

>
>>>
>>> ip_queue_xmit does an __sk_dst_check() which is essentially what
>>> sctp_transport_dst_check() does.  That should determine if the
>>> currently cached
>>> route is valid or not.
>> The problem is that route will not be invalidated, because dst->check()
>> has no xfrm path so xfrm_dst_check() will never be called.
>>
>
> Shouldn't the cache be invalidated in this case?  If the cache is invalidated,
> that should cause a new lookup.  If the cache isn't invalidated, then any
> established connections that may now be impacted by the policy will not pick it up.
Yes, you're right. If I flush the cache manually (with the sysctl), route are 
correctly updated.
I will send a new proposal.


Regards,
Nicolas

^ permalink raw reply

* Re: [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab
From: Mel Gorman @ 2012-09-07 12:55 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, LKML, David Miller,
	Chuck Lever, Pekka Enberg, David Rientjes, Christoph Lameter
In-Reply-To: <CAAmzW4MfFUH1Mi447sQvPNeae_BShEmbECUaK9eoX-8ughEdJw@mail.gmail.com>

On Fri, Sep 07, 2012 at 03:05:39AM +0900, JoonSoo Kim wrote:
> Correct Pekka's mail address and resend.
> Sorry.
> 
> Add "Cc" to "Christoph Lameter" <cl@linux.com>
> 
> 2012/9/5 Mel Gorman <mgorman@suse.de>:
> > Right now, we call ClearSlabPfmemalloc() for first page of slab when we
> > clear SlabPfmemalloc flag. This is fine for most swap-over-network use
> > cases as it is expected that order-0 pages are in use. Unfortunately it
> > is possible that that __ac_put_obj() checks SlabPfmemalloc on a tail page
> > and while this is harmless, it is sloppy. This patch ensures that the head
> > page is always used.
> >
> > This problem was originally identified by Joonsoo Kim.
> >
> > [js1304@gmail.com: Original implementation and problem identification]
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  mm/slab.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 811af03..d34a903 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
> >                 l3 = cachep->nodelists[numa_mem_id()];
> >                 if (!list_empty(&l3->slabs_free) && force_refill) {
> >                         struct slab *slabp = virt_to_slab(objp);
> > -                       ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
> > +                       ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
> >                         clear_obj_pfmemalloc(&objp);
> >                         recheck_pfmemalloc_active(cachep, ac);
> >                         return objp;
> 
> We assume that slabp->s_mem's address is always in head page, so
> "virt_to_head_page" is not needed.
> 

Fair point. I thought it would be more "obvious" later that we really
always intended to use the head page but it is unnecessary.

> > @@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
> >  {
> >         if (unlikely(pfmemalloc_active)) {
> >                 /* Some pfmemalloc slabs exist, check if this is one */
> > -               struct page *page = virt_to_page(objp);
> > +               struct page *page = virt_to_head_page(objp);
> >                 if (PageSlabPfmemalloc(page))
> >                         set_obj_pfmemalloc(&objp);
> >         }
> > --
> > 1.7.9.2
> >
> 
> If we always use head page, following suggestion is more good to me.
> How about you?
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index f8b0d53..ce70989 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache
> *cachep, struct array_cache *ac,
>  {
>         if (unlikely(pfmemalloc_active)) {
>                 /* Some pfmemalloc slabs exist, check if this is one */
> -               struct page *page = virt_to_page(objp);
> +               struct page *page = virt_to_head_page(objp);
>                 if (PageSlabPfmemalloc(page))
>                         set_obj_pfmemalloc(&objp);
>         }

ok.

> @@ -1921,10 +1921,9 @@ static void *kmem_getpages(struct kmem_cache
> *cachep, gfp_t flags, int nodeid)
>                         NR_SLAB_UNRECLAIMABLE, nr_pages);
>         for (i = 0; i < nr_pages; i++) {
>                 __SetPageSlab(page + i);
> -
> -               if (page->pfmemalloc)
> -                       SetPageSlabPfmemalloc(page + i);
>         }
> +       if (page->pfmemalloc)
> +               SetPageSlabPfmemalloc(page);
> 
>         if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
>                 kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);

ok.

> @@ -1943,26 +1942,26 @@ static void *kmem_getpages(struct kmem_cache
> *cachep, gfp_t flags, int nodeid)
>   */
>  static void kmem_freepages(struct kmem_cache *cachep, void *addr)
>  {
> -       unsigned long i = (1 << cachep->gfporder);
> +       int nr_pages = (1 << cachep->gfporder);
> +       int i;
>         struct page *page = virt_to_page(addr);
> -       const unsigned long nr_freed = i;
> 
>         kmemcheck_free_shadow(page, cachep->gfporder);
> 
>         if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
>                 sub_zone_page_state(page_zone(page),
> -                               NR_SLAB_RECLAIMABLE, nr_freed);
> +                               NR_SLAB_RECLAIMABLE, nr_pages);
>         else
>                 sub_zone_page_state(page_zone(page),
> -                               NR_SLAB_UNRECLAIMABLE, nr_freed);
> -       while (i--) {
> -               BUG_ON(!PageSlab(page));
> -               __ClearPageSlabPfmemalloc(page);
> -               __ClearPageSlab(page);
> -               page++;
> +                               NR_SLAB_UNRECLAIMABLE, nr_pages);
> +       for (i = 0; i < nr_pages; i++) {
> +               BUG_ON(!PageSlab(page + i));
> +               __ClearPageSlab(page + i);
>         }
> +       __ClearPageSlabPfmemalloc(page);
> +
>         if (current->reclaim_state)
> -               current->reclaim_state->reclaimed_slab += nr_freed;
> +               current->reclaim_state->reclaimed_slab += nr_pages;
>         free_pages((unsigned long)addr, cachep->gfporder);
>  }

This churns code a lot more than is necessary. How about this as a
replacement patch?

---8<---
From: Joonsoo Kim <js1304@gmail.com>
Subject: [PATCH] slab: do ClearSlabPfmemalloc() for all pages of slab

Right now, we call ClearSlabPfmemalloc() for first page of slab when we
clear SlabPfmemalloc flag. This is fine for most swap-over-network use
cases as it is expected that order-0 pages are in use. Unfortunately it
is possible that that __ac_put_obj() checks SlabPfmemalloc on a tail page
and while this is harmless, it is sloppy. This patch ensures that the head
page is always used.

[mgorman@suse.de: Easier implementation, changelog cleanup]
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/slab.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 811af03..590d52a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
-		struct page *page = virt_to_page(objp);
+		struct page *page = virt_to_head_page(objp);
 		if (PageSlabPfmemalloc(page))
 			set_obj_pfmemalloc(&objp);
 	}
@@ -1919,12 +1919,10 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 	else
 		add_zone_page_state(page_zone(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
-	for (i = 0; i < nr_pages; i++) {
+	for (i = 0; i < nr_pages; i++)
 		__SetPageSlab(page + i);
-
-		if (page->pfmemalloc)
-			SetPageSlabPfmemalloc(page + i);
-	}
+	if (page->pfmemalloc)
+		SetPageSlabPfmemalloc(page);
 
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
 		kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
@@ -1955,9 +1953,9 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 	else
 		sub_zone_page_state(page_zone(page),
 				NR_SLAB_UNRECLAIMABLE, nr_freed);
+	__ClearPageSlabPfmemalloc(page);
 	while (i--) {
 		BUG_ON(!PageSlab(page));
-		__ClearPageSlabPfmemalloc(page);
 		__ClearPageSlab(page);
 		page++;
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox