Netdev List

Netdev List
 help / color / mirror / Atom feed

* MBIM (was: Re: changing usbnet's API to better deal with cdc-ncm)
From: Oliver Neukum @ 2012-09-07  7:40 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87ehmf36qd.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

On Thursday 06 September 2012 10:17:46 Bjørn Mork wrote:
> Not really related, but I am still worrying how MBIM is going to fit
> into the usbnet model where you have a strict relation between one
> netdev and one USB interface.  Do you see any way usbnet could be
> extended to manage a list of netdevs?
> 
> All the netdev related functions doing
> 
>         struct usbnet           *dev = netdev_priv(net);
> 
> would still work.  But all the USB related functions using dev->net to
> get the netdev would fail.  Maybe this will be too difficult to
> implement within the usbnet framework at all?

It depends on how much support we get from upper layers.
You can give two IPs to an ethernet card as well.

It would be best if you could come up with some preliminary code
at least.

	Reagrds
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: changing usbnet's API to better deal with cdc-ncm
From: Oliver Neukum @ 2012-09-07  7:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Bjørn Mork, alexey.orishko, netdev, linux-usb
In-Reply-To: <CACVXFVM9OdUbmOFvf1KG4ePoz7wCxLRwCET3cgL=EynNBLx2AA@mail.gmail.com>

On Friday 07 September 2012 11:55:53 Ming Lei wrote:
> On Fri, Sep 7, 2012 at 1:56 AM, Oliver Neukum <oneukum@suse.de> wrote:
> > On Friday 07 September 2012 00:09:13 Ming Lei wrote:
> >> On Thu, Sep 6, 2012 at 4:30 PM, Bjørn Mork <bjorn@mork.no> wrote:
> >> > Ming Lei <tom.leiming@gmail.com> writes:
> >
> >> >> Looks the introduced .tx_bundle is not necessary since .tx_fixup is OK.
> >> >
> >> > The minidriver does not have any information about tx in progress.  The
> >>
> >> Inside .tx_fixup, the low level driver will get the tx progress information.
> >
> > That information changes after tx_fixup
> 
> You mean the low level drive can't see if the later transmission
> for the aggregated frame is successful?  If so, there is no any complete

Not successful, necessary.

> notification to all low level drivers (with aggregation capability
> or not) now, isn't there?

There doesn't need to be.

> > Well, that is the mistake. Using a timer is a bad idea.
> 
> Why is a bad idea?  Without a timer, how can usbnet or
> low level driver aggregate the later coming transmit frames to
> form a biger aggregated frame?

By registering demand in an abstract sense. The timer makes sense
only when no other buffers are being transfered.

The timer means that we transmit if no other more packages are coming
down from the upper layers. Now, either the device is still busy or it is not.
While it is busy, we might just as well wait, because the upper layer may
change its mind.
If the device is not busy we worsen latency by waiting for the timer.

In addition the timer causes complications with recursing into usbnet
and hassle with suspend/resume and disconnection.

> >> If we can abstract some common things about aggregation, it should be
> >> meaningful. As far as I know, most of aggregation protocol is very different,
> >> so almost all aggregation work is only done by low level driver, such as
> >> cdc_ncm.
> >>
> >> If we want to implement some aggregation framework, maybe below is
> >> one solution, correct me if it is wrong.
> >
> > It isn't so much wrong as incomplete.
> >
> > It seems to me we can have
> >
> > - can queue, buffer not full -> do nothing more
> 
> As I said above, without a timeout timer, the queued skb
> might not be sent out even some long time passed if no
> frame comes later.

Therefore we check whether the device is still busy.

> So could you add the wait for more mechanism to your patch
> for further discussion?

I am attaching a version that is algorithmically complete
except for error handling.

> > - can queue, buffer full -> transmit
> > - cannot queue, buffer full -> transmit and then try again to queue
> 
> This might not happen, the buffer full should have been observed
> in last xmit path.

Possibly, but the driver cannot know in advance how large the next package
will be.

	Reagrds
		Oliver

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 4cd582a..56ef743 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -135,9 +135,6 @@ struct cdc_ncm_ctx {
 	u16 connected;
 };
 
-static void cdc_ncm_txpath_bh(unsigned long param);
-static void cdc_ncm_tx_timeout_start(struct cdc_ncm_ctx *ctx);
-static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *hr_timer);
 static struct usb_driver cdc_ncm_driver;
 
 static void
@@ -464,10 +461,6 @@ static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf)
 	if (ctx == NULL)
 		return -ENODEV;
 
-	hrtimer_init(&ctx->tx_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	ctx->tx_timer.function = &cdc_ncm_tx_timer_cb;
-	ctx->bh.data = (unsigned long)ctx;
-	ctx->bh.func = cdc_ncm_txpath_bh;
 	atomic_set(&ctx->stop, 0);
 	spin_lock_init(&ctx->mtx);
 	ctx->netdev = dev->net;
@@ -650,7 +643,7 @@ static void cdc_ncm_zero_fill(u8 *ptr, u32 first, u32 end, u32 max)
 	memset(ptr + first, 0, end - first);
 }
 
-static struct sk_buff *
+static int
 cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 {
 	struct sk_buff *skb_out;
@@ -659,12 +652,7 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 	u32 last_offset;
 	u16 n = 0, index;
 	u8 ready2send = 0;
-
-	/* if there is a remaining skb, it gets priority */
-	if (skb != NULL)
-		swap(skb, ctx->tx_rem_skb);
-	else
-		ready2send = 1;
+	u8 error = 0;
 
 	/*
 	 * +----------------+
@@ -690,7 +678,7 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 				dev_kfree_skb_any(skb);
 				ctx->netdev->stats.tx_dropped++;
 			}
-			goto exit_no_skb;
+			return -EBUSY;
 		}
 
 		/* make room for NTH and NDP */
@@ -719,28 +707,15 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 		/* compute maximum buffer size */
 		rem = ctx->tx_max - offset;
 
-		if (skb == NULL) {
-			skb = ctx->tx_rem_skb;
-			ctx->tx_rem_skb = NULL;
-
-			/* check for end of skb */
-			if (skb == NULL)
-				break;
-		}
-
 		if (skb->len > rem) {
 			if (n == 0) {
 				/* won't fit, MTU problem? */
 				dev_kfree_skb_any(skb);
 				skb = NULL;
 				ctx->netdev->stats.tx_dropped++;
+				error = 1;
 			} else {
-				/* no room for skb - store for later */
-				if (ctx->tx_rem_skb != NULL) {
-					dev_kfree_skb_any(ctx->tx_rem_skb);
-					ctx->netdev->stats.tx_dropped++;
-				}
-				ctx->tx_rem_skb = skb;
+
 				skb = NULL;
 				ready2send = 1;
 			}
@@ -768,13 +743,6 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 		skb = NULL;
 	}
 
-	/* free up any dangling skb */
-	if (skb != NULL) {
-		dev_kfree_skb_any(skb);
-		skb = NULL;
-		ctx->netdev->stats.tx_dropped++;
-	}
-
 	ctx->tx_curr_frame_num = n;
 
 	if (n == 0) {
@@ -791,9 +759,7 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 		ctx->tx_curr_skb = skb_out;
 		ctx->tx_curr_offset = offset;
 		ctx->tx_curr_last_offset = last_offset;
-		/* set the pending count */
-		if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
-			ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
+
 		goto exit_no_skb;
 
 	} else {
@@ -874,71 +840,37 @@ cdc_ncm_fill_tx_frame(struct cdc_ncm_ctx *ctx, struct sk_buff *skb)
 	/* return skb */
 	ctx->tx_curr_skb = NULL;
 	ctx->netdev->stats.tx_packets += ctx->tx_curr_frame_num;
-	return skb_out;
 
-exit_no_skb:
-	/* Start timer, if there is a remaining skb */
-	if (ctx->tx_curr_skb != NULL)
-		cdc_ncm_tx_timeout_start(ctx);
-	return NULL;
-}
-
-static void cdc_ncm_tx_timeout_start(struct cdc_ncm_ctx *ctx)
-{
-	/* start timer, if not already started */
-	if (!(hrtimer_active(&ctx->tx_timer) || atomic_read(&ctx->stop)))
-		hrtimer_start(&ctx->tx_timer,
-				ktime_set(0, CDC_NCM_TIMER_INTERVAL),
-				HRTIMER_MODE_REL);
-}
+	if (error)
+		return -EBUSY;
+	if (ready2send)
+		return -EBUSY;
+	else
+		return 0;
 
-static enum hrtimer_restart cdc_ncm_tx_timer_cb(struct hrtimer *timer)
-{
-	struct cdc_ncm_ctx *ctx =
-			container_of(timer, struct cdc_ncm_ctx, tx_timer);
+exit_no_skb:
 
-	if (!atomic_read(&ctx->stop))
-		tasklet_schedule(&ctx->bh);
-	return HRTIMER_NORESTART;
+	return -EAGAIN;
 }
 
-static void cdc_ncm_txpath_bh(unsigned long param)
+static int cdc_ncm_tx_bundle(struct usbnet *dev, struct sk_buff *skb, gfp_t flags)
 {
-	struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)param;
-
-	spin_lock_bh(&ctx->mtx);
-	if (ctx->tx_timer_pending != 0) {
-		ctx->tx_timer_pending--;
-		cdc_ncm_tx_timeout_start(ctx);
-		spin_unlock_bh(&ctx->mtx);
-	} else if (ctx->netdev != NULL) {
-		spin_unlock_bh(&ctx->mtx);
-		netif_tx_lock_bh(ctx->netdev);
-		usbnet_start_xmit(NULL, ctx->netdev);
-		netif_tx_unlock_bh(ctx->netdev);
-	}
+	int err;
+	struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)dev->data[0];
+	
+	err = cdc_ncm_fill_tx_frame(ctx, skb);
+	return err;
 }
 
 static struct sk_buff *
 cdc_ncm_tx_fixup(struct usbnet *dev, struct sk_buff *skb, gfp_t flags)
 {
-	struct sk_buff *skb_out;
 	struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)dev->data[0];
 
-	/*
-	 * The Ethernet API we are using does not support transmitting
-	 * multiple Ethernet frames in a single call. This driver will
-	 * accumulate multiple Ethernet frames and send out a larger
-	 * USB frame when the USB buffer is full or when a single jiffies
-	 * timeout happens.
-	 */
 	if (ctx == NULL)
 		goto error;
 
-	spin_lock_bh(&ctx->mtx);
-	skb_out = cdc_ncm_fill_tx_frame(ctx, skb);
-	spin_unlock_bh(&ctx->mtx);
-	return skb_out;
+	return ctx->tx_curr_skb;
 
 error:
 	if (skb != NULL)
@@ -1197,6 +1129,7 @@ static const struct driver_info cdc_ncm_info = {
 	.manage_power = cdc_ncm_manage_power,
 	.status = cdc_ncm_status,
 	.rx_fixup = cdc_ncm_rx_fixup,
+	.tx_bundle = cdc_ncm_tx_bundle,
 	.tx_fixup = cdc_ncm_tx_fixup,
 };
 
@@ -1211,6 +1144,7 @@ static const struct driver_info wwan_info = {
 	.manage_power = cdc_ncm_manage_power,
 	.status = cdc_ncm_status,
 	.rx_fixup = cdc_ncm_rx_fixup,
+	.tx_bundle = cdc_ncm_tx_bundle,
 	.tx_fixup = cdc_ncm_tx_fixup,
 };
 
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 8531c1c..429b08b 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -86,6 +86,8 @@ static u8	node_id [ETH_ALEN];
 
 static const char driver_name [] = "usbnet";
 
+static void tx_complete(struct urb *urb);
+
 /* use ethtool to change the level for any given device */
 static int msg_level = -1;
 module_param (msg_level, int, 0);
@@ -1016,17 +1018,76 @@ skip_reset:
 		netdev_dbg(dev->net, "kevent done, flags = 0x%lx\n", dev->flags);
 }
 
+static int submit_next_bundle(struct usbnet *dev, struct sk_buff *skb)
+{
+	struct skb_data *entry;
+	int length = skb->len;
+	int retval;
+	struct driver_info *info = dev->driver_info;
+	struct urb *urb;
+
+	urb = usb_alloc_urb(0, GFP_ATOMIC);
+	if (!urb)
+		return -ENOMEM;
+
+	entry = (struct skb_data *) skb->cb;
+	entry->urb = urb;
+	entry->dev = dev;
+	entry->length = length;
+
+	usb_fill_bulk_urb (urb, dev->udev, dev->out,
+			skb->data, skb->len, tx_complete, skb);
+
+	/* don't assume the hardware handles USB_ZERO_PACKET
+	 * NOTE:  strictly conforming cdc-ether devices should expect
+	 * the ZLP here, but ignore the one-byte packet.
+	 * NOTE2: CDC NCM specification is different from CDC ECM when
+	 * handling ZLP/short packets, so cdc_ncm driver will make short
+	 * packet itself if needed.
+	 */
+	if (length % dev->maxpacket == 0) {
+		if (!(info->flags & FLAG_SEND_ZLP)) {
+			if (!(info->flags & FLAG_MULTI_PACKET)) {
+				urb->transfer_buffer_length++;
+				if (skb_tailroom(skb)) {
+					skb->data[skb->len] = 0;
+					__skb_put(skb, 1);
+				}
+			}
+		} else
+			urb->transfer_flags |= URB_ZERO_PACKET;
+	}
+
+	atomic_inc(&dev->tx_in_flight);
+	retval = usb_submit_urb(urb, GFP_ATOMIC);
+	if (retval < 0)
+		atomic_dec(&dev->tx_in_flight);
+	//FIXME: full error handling
+
+	return retval;
+}
 /*-------------------------------------------------------------------------*/
 
 static void tx_complete (struct urb *urb)
 {
 	struct sk_buff		*skb = (struct sk_buff *) urb->context;
+	struct sk_buff		*skb2;
 	struct skb_data		*entry = (struct skb_data *) skb->cb;
 	struct usbnet		*dev = entry->dev;
+	struct driver_info	*info = dev->driver_info;
+	int			in_flight;
 
+	in_flight = atomic_dec_return(&dev->tx_in_flight);
 	if (urb->status == 0) {
-		if (!(dev->driver_info->flags & FLAG_MULTI_PACKET))
+		if (!(dev->driver_info->flags & FLAG_MULTI_PACKET)) {
 			dev->net->stats.tx_packets++;
+		} else {
+			if (in_flight < 2) {
+				skb2 = info->tx_fixup(dev, NULL, GFP_ATOMIC);
+				if (skb2)
+					submit_next_bundle(dev, skb2);
+			}
+		}
 		dev->net->stats.tx_bytes += entry->length;
 	} else {
 		dev->net->stats.tx_errors++;
@@ -1089,23 +1150,50 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 	struct urb		*urb = NULL;
 	struct skb_data		*entry;
 	struct driver_info	*info = dev->driver_info;
+	struct sk_buff		*skb_old = NULL;
 	unsigned long		flags;
 	int retval;
+	int transmit_now = 1;
+	int bundle_again = 0;
 
 	if (skb)
 		skb_tx_timestamp(skb);
 
+	/*
+	 * first we allow drivers to bundle packets together
+	 * maintainance of the buffer is the responsibility
+	 * of the lower layer
+	 */
+rebundle:
+	if (info->tx_bundle) {
+		bundle_again = 0;
+		retval = info->tx_bundle(dev, skb, GFP_ATOMIC);
+
+		switch (retval) {
+		case 0: /* the package has been bundled */
+			if (atomic_read(&dev->tx_in_flight) < 2)
+				transmit_now = 1;
+			else
+				transmit_now = 0;
+			break;
+		case -EAGAIN:
+			transmit_now = 1;
+			bundle_again = 1;
+			skb_old = skb;
+			break;
+		case -EBUSY:
+			transmit_now = 1;
+			break;
+		}
+	}
 	// some devices want funky USB-level framing, for
 	// win32 driver (usually) and/or hardware quirks
-	if (info->tx_fixup) {
+	if (transmit_now && info->tx_fixup) {
 		skb = info->tx_fixup (dev, skb, GFP_ATOMIC);
 		if (!skb) {
 			if (netif_msg_tx_err(dev)) {
 				netif_dbg(dev, tx_err, dev->net, "can't tx_fixup skb\n");
 				goto drop;
-			} else {
-				/* cdc_ncm collected packet; waits for more */
-				goto not_drop;
 			}
 		}
 	}
@@ -1164,14 +1252,17 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 	}
 #endif
 
+	atomic_inc(&dev->tx_in_flight);
 	switch ((retval = usb_submit_urb (urb, GFP_ATOMIC))) {
 	case -EPIPE:
 		netif_stop_queue (net);
 		usbnet_defer_kevent (dev, EVENT_TX_HALT);
+		atomic_dec(&dev->tx_in_flight);
 		usb_autopm_put_interface_async(dev->intf);
 		break;
 	default:
 		usb_autopm_put_interface_async(dev->intf);
+		atomic_dec(&dev->tx_in_flight);
 		netif_dbg(dev, tx_err, dev->net,
 			  "tx: submit urb err %d\n", retval);
 		break;
@@ -1187,7 +1278,6 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 		netif_dbg(dev, tx_err, dev->net, "drop, code %d\n", retval);
 drop:
 		dev->net->stats.tx_dropped++;
-not_drop:
 		if (skb)
 			dev_kfree_skb_any (skb);
 		usb_free_urb (urb);
@@ -1197,6 +1287,10 @@ not_drop:
 #ifdef CONFIG_PM
 deferred:
 #endif
+	if (bundle_again) {
+		skb = skb_old;
+		goto rebundle;
+	}
 	return NETDEV_TX_OK;
 }
 EXPORT_SYMBOL_GPL(usbnet_start_xmit);
@@ -1393,6 +1487,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 	dev->delay.data = (unsigned long) dev;
 	init_timer (&dev->delay);
 	mutex_init (&dev->phy_mutex);
+	atomic_set(&dev->tx_in_flight, 0); 
 
 	dev->net = net;
 	strcpy (net->name, "usb%d");
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index f87cf62..bb2f622 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -33,6 +33,7 @@ struct usbnet {
 	wait_queue_head_t	*wait;
 	struct mutex		phy_mutex;
 	unsigned char		suspend_count;
+	atomic_t		tx_in_flight;
 
 	/* i/o info: pipes etc */
 	unsigned		in, out;
@@ -133,6 +134,12 @@ struct driver_info {
 	/* fixup rx packet (strip framing) */
 	int	(*rx_fixup)(struct usbnet *dev, struct sk_buff *skb);
 
+	/* bundle individual package for transmission as
+	 * larger package. This cannot sleep
+	 */
+	int	(*tx_bundle)(struct usbnet *dev,
+				struct sk_buff *skb, gfp_t flags);
+
 	/* fixup tx packet (add framing) */
 	struct sk_buff	*(*tx_fixup)(struct usbnet *dev,
 				struct sk_buff *skb, gfp_t flags);

^ permalink raw reply related

* Congratulations: You Have Won.
From: Mr.Tan Wong @ 2012-09-06 20:40 UTC (permalink / raw)
  To: info

Congratulations: You Have Won

The MICROSOFT EMAIL PROMO TEAM is glad to announce that
after a successful completion of the PROMO DRAWS held
on the 1st September 2012 your e-mail address,attached
to winning numbers: (11) (80) (12)(96) (09) (43) won in
the Tenth lottery category.

You have therefore been approved to claim a total sum
of? 450,000,00 Pounds Sterling in cash credited to REF
NO:MICRO-L/2009-END10.

All participants were selected through our Microsoft
computer ballot system drawn from 167,000 Names, as
part of our International "E-MAIL" Promotion Program
for our prominent MS-WORD users all over the world and
for the continuous use of the Internet. You are advised
to contact the claims-processor with the details below
via his e-mail address :

NAME:  Mr. Harris Conklin
EMAIL: mrharrisconklin1@yahoo.com.hk

PLEASE NOTE YOU ARE TO SEND THE FOLLOWING INFORMATION
TO CLAIM YOUR PRIZE:

1.Full Name:.......
2.Address:.......
3.Phone:.......
4.Country:.......
5.Sex/Gender:.......

YOURS SINCERELY,
Mr. Harris Conklin.

^ permalink raw reply

* [PATCH net-next] igmp: avoid drop_monitor false positives
From: Eric Dumazet @ 2012-09-07  6:37 UTC (permalink / raw)
  To: David Miller; +Cc: Shawn Bohrer, netdev

From: Eric Dumazet <edumazet@google.com>

igmp should call consume_skb() for all correctly processed packets,
to avoid false dropwatch/drop_monitor false positives.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/igmp.c |   30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 0b5580c..3479b98 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -815,14 +815,15 @@ static int igmp_marksources(struct ip_mc_list *pmc, int nsrcs, __be32 *srcs)
 	return 1;
 }
 
-static void igmp_heard_report(struct in_device *in_dev, __be32 group)
+/* return true if packet was dropped */
+static bool igmp_heard_report(struct in_device *in_dev, __be32 group)
 {
 	struct ip_mc_list *im;
 
 	/* Timers are only set for non-local groups */
 
 	if (group == IGMP_ALL_HOSTS)
-		return;
+		return false;
 
 	rcu_read_lock();
 	for_each_pmc_rcu(in_dev, im) {
@@ -832,9 +833,11 @@ static void igmp_heard_report(struct in_device *in_dev, __be32 group)
 		}
 	}
 	rcu_read_unlock();
+	return false;
 }
 
-static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
+/* return true if packet was dropped */
+static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 	int len)
 {
 	struct igmphdr 		*ih = igmp_hdr(skb);
@@ -866,7 +869,7 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 		/* clear deleted report items */
 		igmpv3_clear_delrec(in_dev);
 	} else if (len < 12) {
-		return;	/* ignore bogus packet; freed by caller */
+		return true;	/* ignore bogus packet; freed by caller */
 	} else if (IGMP_V1_SEEN(in_dev)) {
 		/* This is a v3 query with v1 queriers present */
 		max_delay = IGMP_Query_Response_Interval;
@@ -883,13 +886,13 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 			max_delay = 1;	/* can't mod w/ 0 */
 	} else { /* v3 */
 		if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
-			return;
+			return true;
 
 		ih3 = igmpv3_query_hdr(skb);
 		if (ih3->nsrcs) {
 			if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)
 					   + ntohs(ih3->nsrcs)*sizeof(__be32)))
-				return;
+				return true;
 			ih3 = igmpv3_query_hdr(skb);
 		}
 
@@ -901,9 +904,9 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 			in_dev->mr_qrv = ih3->qrv;
 		if (!group) { /* general query */
 			if (ih3->nsrcs)
-				return;	/* no sources allowed */
+				return false;	/* no sources allowed */
 			igmp_gq_start_timer(in_dev);
-			return;
+			return false;
 		}
 		/* mark sources to include, if group & source-specific */
 		mark = ih3->nsrcs != 0;
@@ -939,6 +942,7 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 			igmp_mod_timer(im, max_delay);
 	}
 	rcu_read_unlock();
+	return false;
 }
 
 /* called in rcu_read_lock() section */
@@ -948,6 +952,7 @@ int igmp_rcv(struct sk_buff *skb)
 	struct igmphdr *ih;
 	struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
 	int len = skb->len;
+	bool dropped = true;
 
 	if (in_dev == NULL)
 		goto drop;
@@ -969,7 +974,7 @@ int igmp_rcv(struct sk_buff *skb)
 	ih = igmp_hdr(skb);
 	switch (ih->type) {
 	case IGMP_HOST_MEMBERSHIP_QUERY:
-		igmp_heard_query(in_dev, skb, len);
+		dropped = igmp_heard_query(in_dev, skb, len);
 		break;
 	case IGMP_HOST_MEMBERSHIP_REPORT:
 	case IGMPV2_HOST_MEMBERSHIP_REPORT:
@@ -979,7 +984,7 @@ int igmp_rcv(struct sk_buff *skb)
 		/* don't rely on MC router hearing unicast reports */
 		if (skb->pkt_type == PACKET_MULTICAST ||
 		    skb->pkt_type == PACKET_BROADCAST)
-			igmp_heard_report(in_dev, ih->group);
+			dropped = igmp_heard_report(in_dev, ih->group);
 		break;
 	case IGMP_PIM:
 #ifdef CONFIG_IP_PIMSM_V1
@@ -997,7 +1002,10 @@ int igmp_rcv(struct sk_buff *skb)
 	}
 
 drop:
-	kfree_skb(skb);
+	if (dropped)
+		kfree_skb(skb);
+	else
+		consume_skb(skb);
 	return 0;
 }
 

^ permalink raw reply related

* Re: Increased multicast packet drops in 3.4
From: Eric Dumazet @ 2012-09-07  6:08 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: netdev
In-Reply-To: <20120907040043.GA2714@BohrerMBP.rgmadvisors.com>

On Thu, 2012-09-06 at 23:00 -0500, Shawn Bohrer wrote:
> On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> > kfree_skb() can free a list of skb, and we use a generic function to do
> > so, without forwarding the drop/notdrop status. So its unfortunate, but
> > adding extra parameters just for the sake of drop_monitor is not worth
> > it.  skb_drop_fraglist() doesnt know if the parent skb is dropped or
> > only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()
> 
> I understand that this means that dropwatch or the skb:kfree_skb
> tracepoint won't know if the packet was really dropped, but do we
> know in this case from the context of the stack trace?  I'm assuming
> since we didn't receive an error that the packet was delivered and
> these aren't real drops.

I am starting to believe this is an application error.

This application uses recvmmsg() to fetch a lot of messages in one
syscall, and it might well be it throws out a batch of 50+ messages
because of an application bug. Yes, this starts with 3.4, but it can b
triggered by a timing difference or something that is not a proper
kernel bug...

> 
> > Are you receiving fragmented UDP frames ?
> 
> I looked at the sending application and it yes it is possible it is
> sending fragmented frames.
> 
> > I ask this because with latest kernels (linux-3.5), we should no longer
> > build a list of skb, but a single skb with page fragments.
> > 
> > commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Sat May 19 03:02:20 2012 +0000
> > 
> >     ipv4: use skb coalescing in defragmentation
> >     
> >     ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
> >     reducing memory used by them (truesize), and reducing number of cache
> >     line misses and overhead for the consumer.
> >     
> >     Signed-off-by: Eric Dumazet <edumazet@google.com>
> >     Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I'll have to give 3.5 a try tomorrow and see if it has the same
> problem.  After backporting all of your patches to convert kfree_skb()
> to consume_skb() to 3.4 I actually don't have that many different
> places hitting the skb:kfree_skb tracepoint at the time of the drop.
> Here are some of the ones I have left that might be relevant.
> 
>           <idle>-0     [001] 11933.738797: kfree_skb:            skbaddr=0xffff8805ebcf9500 protocol=2048 location=0xffffffff81404e33
>           <idle>-0     [001] 11933.738801: kernel_stack:         <stack trace>
> => ip_rcv (ffffffff81404e33)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> My IPSTATS_MIB_INHDRERRORS, IPSTATS_MIB_INDISCARDS, and
> IPSTATS_MIB_INTRUNCATEDPKTS counters are all 0 so maybe this is from
> skb->pkt_type == PACKET_OTHERHOST?
> 
>           <idle>-0     [001] 11933.937378: kfree_skb:            skbaddr=0xffff8805ebcf8c00 protocol=2048 location=0xffffffff81404660
>           <idle>-0     [001] 11933.937385: kernel_stack:         <stack trace>
> => ip_rcv_finish (ffffffff81404660)
> => ip_rcv (ffffffff81404f61)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> I see two places here that I might be hitting that don't increment any
> counters.  I can try instrumenting these to see which one I hit.
> 
>           <idle>-0     [003] 11932.454375: kfree_skb:            skbaddr=0xffff880584843700 protocol=4 location=0xffffffffa00492d4
>           <idle>-0     [003] 11932.454382: kernel_stack:         <stack trace>
> => llc_rcv (ffffffffa00492d4)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => process_responses (ffffffffa018486c)
> => napi_rx_handler (ffffffffa0185606)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> This is protocol=4 so I don't know if it is really relevant but then
> again I don't know what this is.

You can ignore this

> 
>           <idle>-0     [003] 11914.266635: kfree_skb:            skbaddr=0xffff880584843b00 protocol=2048 location=0xffffffff8143bfa8
>           <idle>-0     [003] 11914.266641: kernel_stack:         <stack trace>
> => igmp_rcv (ffffffff8143bfa8)
> => ip_local_deliver_finish (ffffffff814049ed)
> => ip_local_deliver (ffffffff81404d1a)
> => ip_rcv_finish (ffffffff814046ad)
> => ip_rcv (ffffffff81404f61)
> => __netif_receive_skb (ffffffff813ce123)
> => netif_receive_skb (ffffffff813d0da1)
> => mlx4_en_process_rx_cq (ffffffffa010a4fe)
> => mlx4_en_poll_rx_cq (ffffffffa010a9ef)
> => net_rx_action (ffffffff813d2449)
> => __do_softirq (ffffffff8103bfd0)
> => call_softirq (ffffffff8148a14c)
> => do_softirq (ffffffff81003e85)
> => irq_exit (ffffffff8103c3a5)
> => do_IRQ (ffffffff8148a693)
> => ret_from_intr (ffffffff814814a7)
> => cpu_idle (ffffffff8100ac16)
> => start_secondary (ffffffff81af5e66)
> 
> Also don't know if this one is relevant.  This looks like an igmp
> packet so probably not my drop, but I am receiving multicast packets
> in this case so maybe it is somehow related.

Yes, we need to change igmp to call consume_skb() for all frames that
were properly handled.

So you can ignore this as well.

^ permalink raw reply

* Re: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Richard Cochran @ 2012-09-07  5:51 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Ben Hutchings, Kirsher, Jeffrey T, davem@davemloft.net,
	Vick, Matthew, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB580785FAB4@ORSMSX105.amr.corp.intel.com>

On Thu, Sep 06, 2012 at 11:40:23PM +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > 
> > The PTP clock is instantiated by some driver, bound to some device,
> > whether that's a PCI device or a platform device or something else.  I
> > think that device (not a net device) should be the parent of the clock
> > device.
> > 
> 
> Agreed, that would make the most sense.

Okay, makes sense, but that will require a change to the phc core.

For the present, I can buy Jacob's argument that the MAC address is a
way to clarify to the user that there are "unfortunate" multiple clock
instances.

Thanks,
Richard

^ permalink raw reply

* [PATCH net-next] scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie.
From: Eric W. Biederman @ 2012-09-07  4:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Serge E. Hallyn, Eric Dumazet


Passing uids and gids on NETLINK_CB from a process in one user
namespace to a process in another user namespace can result in the
wrong uid or gid being presented to userspace.  Avoid that problem by
passing kuids and kgids instead.

- define struct scm_creds for use in scm_cookie and netlink_skb_parms
  that holds uid and gid information in kuid_t and kgid_t.

- Modify scm_set_cred to fill out scm_creds by heand instead of using
  cred_to_ucred to fill out struct ucred.  This conversion ensures
  userspace does not get incorrect uid or gid values to look at.

- Modify scm_recv to convert from struct scm_creds to struct ucred
  before copying credential values to userspace.

- Modify __scm_send to populate struct scm_creds on in the scm_cookie,
  instead of just copying struct ucred from userspace.

- Modify netlink_sendmsg to copy scm_creds instead of struct ucred
  into the NETLINK_CB.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/netlink.h  |    3 ++-
 include/net/scm.h        |   23 +++++++++++++++++++----
 net/core/scm.c           |   17 +++++++++++------
 net/netlink/af_netlink.c |    2 +-
 4 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index c9fdde2..df73cf4 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -153,6 +153,7 @@ struct nlattr {
 
 #include <linux/capability.h>
 #include <linux/skbuff.h>
+#include <net/scm.h>
 
 struct net;
 
@@ -162,7 +163,7 @@ static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)
 }
 
 struct netlink_skb_parms {
-	struct ucred		creds;		/* Skb credentials	*/
+	struct scm_creds	creds;		/* Skb credentials	*/
 	__u32			pid;
 	__u32			dst_group;
 	struct sock		*ssk;
diff --git a/include/net/scm.h b/include/net/scm.h
index 079d788..000b6f7 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -12,6 +12,12 @@
  */
 #define SCM_MAX_FD	253
 
+struct scm_creds {
+	u32	pid;
+	kuid_t	uid;
+	kgid_t	gid;
+};
+
 struct scm_fp_list {
 	short			count;
 	short			max;
@@ -22,7 +28,7 @@ struct scm_cookie {
 	struct pid		*pid;		/* Skb credentials */
 	const struct cred	*cred;
 	struct scm_fp_list	*fp;		/* Passed files		*/
-	struct ucred		creds;		/* Skb credentials	*/
+	struct scm_creds	creds;		/* Skb credentials	*/
 #ifdef CONFIG_SECURITY_NETWORK
 	u32			secid;		/* Passed security ID 	*/
 #endif
@@ -49,7 +55,9 @@ static __inline__ void scm_set_cred(struct scm_cookie *scm,
 {
 	scm->pid  = get_pid(pid);
 	scm->cred = cred ? get_cred(cred) : NULL;
-	cred_to_ucred(pid, cred, &scm->creds);
+	scm->creds.pid = pid_vnr(pid);
+	scm->creds.uid = cred ? cred->euid : INVALID_UID;
+	scm->creds.gid = cred ? cred->egid : INVALID_GID;
 }
 
 static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
@@ -110,8 +118,15 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 		return;
 	}
 
-	if (test_bit(SOCK_PASSCRED, &sock->flags))
-		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
+	if (test_bit(SOCK_PASSCRED, &sock->flags)) {
+		struct user_namespace *current_ns = current_user_ns();
+		struct ucred ucreds = {
+			.pid = scm->creds.pid,
+			.uid = from_kuid_munged(current_ns, scm->creds.uid),
+			.gid = from_kgid_munged(current_ns, scm->creds.gid),
+		};
+		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
+	}
 
 	scm_destroy_cred(scm);
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 5472ae7..64b41d8 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -155,19 +155,21 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			break;
 		case SCM_CREDENTIALS:
 		{
+			struct ucred creds;
 			kuid_t uid;
 			kgid_t gid;
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
 				goto error;
-			memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
-			err = scm_check_creds(&p->creds);
+			memcpy(&creds, CMSG_DATA(cmsg), sizeof(struct ucred));
+			err = scm_check_creds(&creds);
 			if (err)
 				goto error;
 
-			if (!p->pid || pid_vnr(p->pid) != p->creds.pid) {
+			p->creds.pid = creds.pid;
+			if (!p->pid || pid_vnr(p->pid) != creds.pid) {
 				struct pid *pid;
 				err = -ESRCH;
-				pid = find_get_pid(p->creds.pid);
+				pid = find_get_pid(creds.pid);
 				if (!pid)
 					goto error;
 				put_pid(p->pid);
@@ -175,11 +177,14 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			}
 
 			err = -EINVAL;
-			uid = make_kuid(current_user_ns(), p->creds.uid);
-			gid = make_kgid(current_user_ns(), p->creds.gid);
+			uid = make_kuid(current_user_ns(), creds.uid);
+			gid = make_kgid(current_user_ns(), creds.gid);
 			if (!uid_valid(uid) || !gid_valid(gid))
 				goto error;
 
+			p->creds.uid = uid;
+			p->creds.gid = gid;
+
 			if (!p->cred ||
 			    !uid_eq(p->cred->euid, uid) ||
 			    !gid_eq(p->cred->egid, gid)) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7cb7867..6473267 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1398,7 +1398,7 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
 
 	NETLINK_CB(skb).pid	= nlk->pid;
 	NETLINK_CB(skb).dst_group = dst_group;
-	memcpy(NETLINK_CREDS(skb), &siocb->scm->creds, sizeof(struct ucred));
+	NETLINK_CB(skb).creds	= siocb->scm->creds;
 
 	err = -EFAULT;
 	if (memcpy_fromiovec(skb_put(skb, len), msg->msg_iov, len)) {
-- 
1.7.5.4

^ permalink raw reply related

* Re: Increased multicast packet drops in 3.4
From: Shawn Bohrer @ 2012-09-07  4:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1346937667.2484.33.camel@edumazet-glaptop>

On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> kfree_skb() can free a list of skb, and we use a generic function to do
> so, without forwarding the drop/notdrop status. So its unfortunate, but
> adding extra parameters just for the sake of drop_monitor is not worth
> it.  skb_drop_fraglist() doesnt know if the parent skb is dropped or
> only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()

I understand that this means that dropwatch or the skb:kfree_skb
tracepoint won't know if the packet was really dropped, but do we
know in this case from the context of the stack trace?  I'm assuming
since we didn't receive an error that the packet was delivered and
these aren't real drops.

> Are you receiving fragmented UDP frames ?

I looked at the sending application and it yes it is possible it is
sending fragmented frames.

> I ask this because with latest kernels (linux-3.5), we should no longer
> build a list of skb, but a single skb with page fragments.
> 
> commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Sat May 19 03:02:20 2012 +0000
> 
>     ipv4: use skb coalescing in defragmentation
>     
>     ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
>     reducing memory used by them (truesize), and reducing number of cache
>     line misses and overhead for the consumer.
>     
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

I'll have to give 3.5 a try tomorrow and see if it has the same
problem.  After backporting all of your patches to convert kfree_skb()
to consume_skb() to 3.4 I actually don't have that many different
places hitting the skb:kfree_skb tracepoint at the time of the drop.
Here are some of the ones I have left that might be relevant.

          <idle>-0     [001] 11933.738797: kfree_skb:            skbaddr=0xffff8805ebcf9500 protocol=2048 location=0xffffffff81404e33
          <idle>-0     [001] 11933.738801: kernel_stack:         <stack trace>
=> ip_rcv (ffffffff81404e33)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

My IPSTATS_MIB_INHDRERRORS, IPSTATS_MIB_INDISCARDS, and
IPSTATS_MIB_INTRUNCATEDPKTS counters are all 0 so maybe this is from
skb->pkt_type == PACKET_OTHERHOST?

          <idle>-0     [001] 11933.937378: kfree_skb:            skbaddr=0xffff8805ebcf8c00 protocol=2048 location=0xffffffff81404660
          <idle>-0     [001] 11933.937385: kernel_stack:         <stack trace>
=> ip_rcv_finish (ffffffff81404660)
=> ip_rcv (ffffffff81404f61)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

I see two places here that I might be hitting that don't increment any
counters.  I can try instrumenting these to see which one I hit.

          <idle>-0     [003] 11932.454375: kfree_skb:            skbaddr=0xffff880584843700 protocol=4 location=0xffffffffa00492d4
          <idle>-0     [003] 11932.454382: kernel_stack:         <stack trace>
=> llc_rcv (ffffffffa00492d4)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> process_responses (ffffffffa018486c)
=> napi_rx_handler (ffffffffa0185606)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

This is protocol=4 so I don't know if it is really relevant but then
again I don't know what this is.

          <idle>-0     [003] 11914.266635: kfree_skb:            skbaddr=0xffff880584843b00 protocol=2048 location=0xffffffff8143bfa8
          <idle>-0     [003] 11914.266641: kernel_stack:         <stack trace>
=> igmp_rcv (ffffffff8143bfa8)
=> ip_local_deliver_finish (ffffffff814049ed)
=> ip_local_deliver (ffffffff81404d1a)
=> ip_rcv_finish (ffffffff814046ad)
=> ip_rcv (ffffffff81404f61)
=> __netif_receive_skb (ffffffff813ce123)
=> netif_receive_skb (ffffffff813d0da1)
=> mlx4_en_process_rx_cq (ffffffffa010a4fe)
=> mlx4_en_poll_rx_cq (ffffffffa010a9ef)
=> net_rx_action (ffffffff813d2449)
=> __do_softirq (ffffffff8103bfd0)
=> call_softirq (ffffffff8148a14c)
=> do_softirq (ffffffff81003e85)
=> irq_exit (ffffffff8103c3a5)
=> do_IRQ (ffffffff8148a693)
=> ret_from_intr (ffffffff814814a7)
=> cpu_idle (ffffffff8100ac16)
=> start_secondary (ffffffff81af5e66)

Also don't know if this one is relevant.  This looks like an igmp
packet so probably not my drop, but I am receiving multicast packets
in this case so maybe it is somehow related.

If any of these spark any ideas let me know otherwise I'll keep
digging and try 3.5/3.6 tomorrow.

Thanks,
Shawn


-- 

---------------------------------------------------------------
This email, along with any attachments, is confidential. If you 
believe you received this message in error, please contact the 
sender immediately and delete all copies of the message.  
Thank you.

^ permalink raw reply

* Re: changing usbnet's API to better deal with cdc-ncm
From: Ming Lei @ 2012-09-07  3:55 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Bjørn Mork, alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1676538.CEPj2RtrPe-ugxBuEnWX9yG/4A2pS7c2Q@public.gmane.org>

On Fri, Sep 7, 2012 at 1:56 AM, Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org> wrote:
> On Friday 07 September 2012 00:09:13 Ming Lei wrote:
>> On Thu, Sep 6, 2012 at 4:30 PM, Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org> wrote:
>> > Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> >> Looks the introduced .tx_bundle is not necessary since .tx_fixup is OK.
>> >
>> > The minidriver does not have any information about tx in progress.  The
>>
>> Inside .tx_fixup, the low level driver will get the tx progress information.
>
> That information changes after tx_fixup

You mean the low level drive can't see if the later transmission
for the aggregated frame is successful?  If so, there is no any complete
notification to all low level drivers (with aggregation capability
or not) now, isn't there?

If no aggregated frame is ready for transmission, the information
can't be changed after current tx_fixup and before the next .tx_fixup.

>
>> > strategy above, which is what cdc_ncm uses today, is fine as long as you
>> > always want to wait as long as you always want to fill as many frames as
>> > possible in each packet.  But what if the queue is empty and the device
>>
>> For cdc_ncm, the wait time is controlled by cdc_ncm driver, see
>> cdc_ncm_tx_timeout_start().
>
> Well, that is the mistake. Using a timer is a bad idea.

Why is a bad idea?  Without a timer, how can usbnet or
low level driver aggregate the later coming transmit frames to
form a biger aggregated frame?

>
>> If we can abstract some common things about aggregation, it should be
>> meaningful. As far as I know, most of aggregation protocol is very different,
>> so almost all aggregation work is only done by low level driver, such as
>> cdc_ncm.
>>
>> If we want to implement some aggregation framework, maybe below is
>> one solution, correct me if it is wrong.
>
> It isn't so much wrong as incomplete.
>
> It seems to me we can have
>
> - can queue, buffer not full -> do nothing more

As I said above, without a timeout timer, the queued skb
might not be sent out even some long time passed if no
frame comes later.

So could you add the wait for more mechanism to your patch
for further discussion?

> - can queue, buffer full -> transmit
> - cannot queue, buffer full -> transmit and then try again to queue

This might not happen, the buffer full should have been observed
in last xmit path.

>
> and an error case
>
> - cannot queue, buffer not full



Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 2/6] net: pxaficp_ir: add irq resources
From: Eric Miao @ 2012-09-07  3:54 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-arm-kernel, Russell King, Arnd Bergmann, Olof Johansson,
	Linus Walleij, Rob Herring, Samuel Ortiz, Haojian Zhuang, netdev
In-Reply-To: <1346960563-18689-3-git-send-email-robherring2@gmail.com>

On Fri, Sep 7, 2012 at 3:42 AM, Rob Herring <robherring2@gmail.com> wrote:
> From: Rob Herring <rob.herring@calxeda.com>
>
> In order to remove dependency on mach/irqs.h, add platform device
> resources for irqs.
>

Looks good to me, we shall have this change landed long before this, thanks Rob

Acked-by: Eric Miao <eric.y.miao@gmail.com>

> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> Cc: Samuel Ortiz <samuel@sortiz.org>
> Cc: Eric Miao <eric.y.miao@gmail.com>
> Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
> Cc: netdev@vger.kernel.org
> ---
>  arch/arm/mach-pxa/devices.c   |   15 +++++++++++++++
>  drivers/net/irda/pxaficp_ir.c |   28 +++++++++++++++++-----------
>  2 files changed, 32 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/mach-pxa/devices.c b/arch/arm/mach-pxa/devices.c
> index 166eee5..339eb2a 100644
> --- a/arch/arm/mach-pxa/devices.c
> +++ b/arch/arm/mach-pxa/devices.c
> @@ -384,9 +384,24 @@ struct platform_device pxa_device_asoc_platform = {
>
>  static u64 pxaficp_dmamask = ~(u32)0;
>
> +static struct resource pxa_ir_resources[] = {
> +       [0] = {
> +               .start  = IRQ_STUART,
> +               .end    = IRQ_STUART,
> +               .flags  = IORESOURCE_IRQ,
> +       },
> +       [1] = {
> +               .start  = IRQ_ICP,
> +               .end    = IRQ_ICP,
> +               .flags  = IORESOURCE_IRQ,
> +       },
> +};
> +
>  struct platform_device pxa_device_ficp = {
>         .name           = "pxa2xx-ir",
>         .id             = -1,
> +       .num_resources  = ARRAY_SIZE(pxa_ir_resources),
> +       .resource       = pxa_ir_resources,
>         .dev            = {
>                 .dma_mask = &pxaficp_dmamask,
>                 .coherent_dma_mask = 0xffffffff,
> diff --git a/drivers/net/irda/pxaficp_ir.c b/drivers/net/irda/pxaficp_ir.c
> index 8d54767..cb0a5d3 100644
> --- a/drivers/net/irda/pxaficp_ir.c
> +++ b/drivers/net/irda/pxaficp_ir.c
> @@ -29,8 +29,8 @@
>
>  #include <mach/dma.h>
>  #include <mach/irda.h>
> -#include <mach/regs-uart.h>
>  #include <mach/regs-ost.h>
> +#include <mach/regs-uart.h>
>
>  #define FICP           __REG(0x40800000)  /* Start of FICP area */
>  #define ICCR0          __REG(0x40800000)  /* ICP Control Register 0 */
> @@ -112,6 +112,9 @@ struct pxa_irda {
>         int                     txdma;
>         int                     rxdma;
>
> +       int                     uart_irq;
> +       int                     icp_irq;
> +
>         struct irlap_cb         *irlap;
>         struct qos_info         qos;
>
> @@ -672,19 +675,19 @@ static int pxa_irda_start(struct net_device *dev)
>
>         si->speed = 9600;
>
> -       err = request_irq(IRQ_STUART, pxa_irda_sir_irq, 0, dev->name, dev);
> +       err = request_irq(si->uart_irq, pxa_irda_sir_irq, 0, dev->name, dev);
>         if (err)
>                 goto err_irq1;
>
> -       err = request_irq(IRQ_ICP, pxa_irda_fir_irq, 0, dev->name, dev);
> +       err = request_irq(si->icp_irq, pxa_irda_fir_irq, 0, dev->name, dev);
>         if (err)
>                 goto err_irq2;
>
>         /*
>          * The interrupt must remain disabled for now.
>          */
> -       disable_irq(IRQ_STUART);
> -       disable_irq(IRQ_ICP);
> +       disable_irq(si->uart_irq);
> +       disable_irq(si->icp_irq);
>
>         err = -EBUSY;
>         si->rxdma = pxa_request_dma("FICP_RX",DMA_PRIO_LOW, pxa_irda_fir_dma_rx_irq, dev);
> @@ -720,8 +723,8 @@ static int pxa_irda_start(struct net_device *dev)
>         /*
>          * Now enable the interrupt and start the queue
>          */
> -       enable_irq(IRQ_STUART);
> -       enable_irq(IRQ_ICP);
> +       enable_irq(si->uart_irq);
> +       enable_irq(si->icp_irq);
>         netif_start_queue(dev);
>
>         printk(KERN_DEBUG "pxa_ir: irda driver opened\n");
> @@ -738,9 +741,9 @@ err_dma_rx_buff:
>  err_tx_dma:
>         pxa_free_dma(si->rxdma);
>  err_rx_dma:
> -       free_irq(IRQ_ICP, dev);
> +       free_irq(si->icp_irq, dev);
>  err_irq2:
> -       free_irq(IRQ_STUART, dev);
> +       free_irq(si->uart_irq, dev);
>  err_irq1:
>
>         return err;
> @@ -760,8 +763,8 @@ static int pxa_irda_stop(struct net_device *dev)
>                 si->irlap = NULL;
>         }
>
> -       free_irq(IRQ_STUART, dev);
> -       free_irq(IRQ_ICP, dev);
> +       free_irq(si->uart_irq, dev);
> +       free_irq(si->icp_irq, dev);
>
>         pxa_free_dma(si->rxdma);
>         pxa_free_dma(si->txdma);
> @@ -851,6 +854,9 @@ static int pxa_irda_probe(struct platform_device *pdev)
>         si->dev = &pdev->dev;
>         si->pdata = pdev->dev.platform_data;
>
> +       si->uart_irq = platform_get_irq(pdev, 0);
> +       si->icp_irq = platform_get_irq(pdev, 1);
> +
>         si->sir_clk = clk_get(&pdev->dev, "UARTCLK");
>         si->fir_clk = clk_get(&pdev->dev, "FICPCLK");
>         if (IS_ERR(si->sir_clk) || IS_ERR(si->fir_clk)) {
> --
> 1.7.9.5
>

^ permalink raw reply

* [PATCH] ip autoconfig: allow specifying a list of devices rather than one or all
From: Chris Friesen @ 2012-09-07  0:31 UTC (permalink / raw)
  To: netdev
In-Reply-To: <50492547.8070307@genband.com>

On 09/06/2012 04:35 PM, Chris Friesen wrote:

> We need to be able to boot off of both the gigE ports for redundancy, so 
> it seems like it might be a reasonable idea to extend the "ip=" boot arg 
> to allow specifying a list of devices rather than choosing between a 
> single device or all of them.
> 
> Thoughts? Anyone ever done this?

Just for kicks I tried coding it up.  This seems to work, so I include it
as fuel for discussion.  This applies to linus' current git.

Chris




Allow the user to specify up to four device names as part of IP
autoconfiguration.  This allows a degree of redundancy while also
allowing the user to limit which devices are opened by the kernel
during boot (this may be useful if some devices are very high traffic
or need special configuration).


Signed-off-by: Chris Friesen <chris.friesen@genband.com>


diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 67e8a6b..4d24ad4 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -157,7 +157,9 @@ static u8 ic_domain[64];		/* DNS (not NIS) domain name */
  */
 
 /* Name of user-selected boot device */
-static char user_dev_name[IFNAMSIZ] __initdata = { 0, };
+#define NUM_NAMED_BOOTDEV 4
+static int user_dev_count __initdata;
+static char user_dev_name[NUM_NAMED_BOOTDEV][IFNAMSIZ] __initdata;
 
 /* Protocols supported by available interfaces */
 static int ic_proto_have_if __initdata = 0;
@@ -189,16 +191,27 @@ struct ic_device {
 static struct ic_device *ic_first_dev __initdata = NULL;/* List of open device */
 static struct net_device *ic_dev __initdata = NULL;	/* Selected device */
 
+static int __init dev_in_userdevlist(struct net_device *dev)
+{
+	int i;
+	for(i=0;i<NUM_NAMED_BOOTDEV;i++) {
+		if (!strcmp(dev->name, user_dev_name[i]))
+			return 1;
+	}
+	return 0;
+}
+
 static bool __init ic_is_init_dev(struct net_device *dev)
 {
 	if (dev->flags & IFF_LOOPBACK)
 		return false;
-	return user_dev_name[0] ? !strcmp(dev->name, user_dev_name) :
+	return user_dev_count ? dev_in_userdevlist(dev) :
 	    (!(dev->flags & IFF_LOOPBACK) &&
 	     (dev->flags & (IFF_POINTOPOINT|IFF_BROADCAST)) &&
 	     strncmp(dev->name, "dummy", 5));
 }
 
+
 static int __init ic_open_devs(void)
 {
 	struct ic_device *d, **last;
@@ -274,9 +287,8 @@ have_carrier:
 	*last = NULL;
 
 	if (!ic_first_dev) {
-		if (user_dev_name[0])
-			pr_err("IP-Config: Device `%s' not found\n",
-			       user_dev_name);
+		if (user_dev_count)
+			pr_err("IP-Config: Specified boot device(s) not found\n");
 		else
 			pr_err("IP-Config: No network devices available\n");
 		return -ENODEV;
@@ -1605,7 +1617,16 @@ static int __init ip_auto_config_setup(char *addrs)
 				ic_host_name_set = 1;
 				break;
 			case 5:
-				strlcpy(user_dev_name, ip, sizeof(user_dev_name));
+				do {
+					dp = strchr(ip, ',');
+					if (dp)
+						*dp++ = '\0';
+					strlcpy(user_dev_name[user_dev_count], ip,
+						sizeof(user_dev_name[0]));
+					user_dev_count++;
+					if (dp)
+						ip = dp;
+				} while (dp && (user_dev_count < NUM_NAMED_BOOTDEV));
 				break;
 			case 6:
 				if (ic_proto_name(ip) == 0 &&

^ permalink raw reply related

* Re: [PATCHv3] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-09-07  0:26 UTC (permalink / raw)
  To: Sasha Levin; +Cc: netdev, kvm, virtualization
In-Reply-To: <50493D04.1090408@gmail.com>

On Fri, Sep 07, 2012 at 02:17:08AM +0200, Sasha Levin wrote:
> Hi Michael,
> 
> On 09/06/2012 02:08 PM, Michael S. Tsirkin wrote:
> > Add multiqueue support to virtio network device.
> > Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new
> > configuration field max_virtqueue_pairs to detect supported number of
> > virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program
> > packet steering.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Some comments about the change:
> 
>  - "The following four read-only fields only exists if VIRTIO_NET_F_MULTIQUEUE
> is set." => Should be "exist" (I think).
> 
>  - "When rule is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are
> steered by driver to the first (param+1) multiqueue virtqueues
> transmitq1...transmitqN;" - Why param+1?  I thought we ignore the default
> transmit/receive in this case.

This is so that all values are valid. E.g. param=0 -> use q1.

> 
>  - "As selecting a specific steering ais n optimization feature" - "is an".
> 
>  - It's mentioned several times that the ability to read the steering rule from
> the virtio-net config is there for debug reasons. Is it really necessary? I
> think it's the first time I see debug features go in as part of the spec.

I actually think we need to work on more debug features.

>  - I'm slightly confused, why are there both receive and transmit steering? I
> can't find a difference in the way to configure the rule for transmit and
> receive. Is it a plan for the future to allow different rules for tx and rx? If
> so, shouldn't we use different ctrl commands (
> VIRTIO_NET_CTRL_TX_STEERING/VIRTIO_NET_CTRL_RX_STEERING)?

No. Receive steering is done by device. Documented for driver writer's
benefit. xmit streering by driver. documented for device benefit.
They must match for tcp to work well so I doubt we will have
commands to control them separately.

>  - "When rule is set to VIRTIO_NET_CTRL_STEERING_SINGLE all packets are steered
> to the default virtqueue receveq (0);" - "receiveq (0)"
> 
> 
> 
> Thanks,
> Sasha


Thanks for the comments will address!

^ permalink raw reply

* Re: [PATCHv3] virtio-spec: virtio network device multiqueue support
From: Sasha Levin @ 2012-09-07  0:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, kvm, virtualization
In-Reply-To: <20120906120828.GA1534@redhat.com>

Hi Michael,

On 09/06/2012 02:08 PM, Michael S. Tsirkin wrote:
> Add multiqueue support to virtio network device.
> Add a new feature flag VIRTIO_NET_F_MULTIQUEUE for this feature, a new
> configuration field max_virtqueue_pairs to detect supported number of
> virtqueues as well as a new command VIRTIO_NET_CTRL_STEERING to program
> packet steering.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Some comments about the change:

 - "The following four read-only fields only exists if VIRTIO_NET_F_MULTIQUEUE
is set." => Should be "exist" (I think).

 - "When rule is set to VIRTIO_NET_CTRL_STEERING_RX_FOLLOWS_TX packets are
steered by driver to the first (param+1) multiqueue virtqueues
transmitq1...transmitqN;" - Why param+1?  I thought we ignore the default
transmit/receive in this case.

 - "As selecting a specific steering ais n optimization feature" - "is an".

 - It's mentioned several times that the ability to read the steering rule from
the virtio-net config is there for debug reasons. Is it really necessary? I
think it's the first time I see debug features go in as part of the spec.

 - I'm slightly confused, why are there both receive and transmit steering? I
can't find a difference in the way to configure the rule for transmit and
receive. Is it a plan for the future to allow different rules for tx and rx? If
so, shouldn't we use different ctrl commands (
VIRTIO_NET_CTRL_TX_STEERING/VIRTIO_NET_CTRL_RX_STEERING)?

 - "When rule is set to VIRTIO_NET_CTRL_STEERING_SINGLE all packets are steered
to the default virtqueue receveq (0);" - "receiveq (0)"



Thanks,
Sasha

^ permalink raw reply

* RE: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Keller, Jacob E @ 2012-09-06 23:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Richard Cochran, Kirsher, Jeffrey T, davem@davemloft.net,
	Vick, Matthew, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <1346974770.2714.64.camel@bwh-desktop.uk.solarflarecom.com>



> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Thursday, September 06, 2012 4:40 PM
> To: Keller, Jacob E
> Cc: Richard Cochran; Kirsher, Jeffrey T; davem@davemloft.net; Vick,
> Matthew; netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com
> Subject: RE: [net-next 4/6] igb: Store the MAC address in the name in the
> PTP struct.
> 
> On Thu, 2012-09-06 at 23:24 +0000, Keller, Jacob E wrote:
> > > -----Original Message-----
> > > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > > Sent: Thursday, September 06, 2012 4:23 PM
> > > To: Keller, Jacob E; Richard Cochran
> > > Cc: Kirsher, Jeffrey T; davem@davemloft.net; Vick, Matthew;
> > > netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com
> > > Subject: RE: [net-next 4/6] igb: Store the MAC address in the name
> > > in the PTP struct.
> > >
> > > On Thu, 2012-09-06 at 22:59 +0000, Keller, Jacob E wrote:
> > > > > -----Original Message-----
> > > > > From: netdev-owner@vger.kernel.org
> > > > > [mailto:netdev-owner@vger.kernel.org]
> > > > > On Behalf Of Richard Cochran
> > > > > Sent: Thursday, September 06, 2012 1:06 AM
> > > > > To: Kirsher, Jeffrey T
> > > > > Cc: davem@davemloft.net; Vick, Matthew; netdev@vger.kernel.org;
> > > > > gospo@redhat.com; sassmann@redhat.com
> > > > > Subject: Re: [net-next 4/6] igb: Store the MAC address in the
> > > > > name in the PTP struct.
> > > > >
> > > > > On Wed, Sep 05, 2012 at 04:35:04PM -0700, Jeff Kirsher wrote:
> > > > > > From: Matthew Vick <matthew.vick@intel.com>
> > > > > >
> > > > > > Change the name of the adapter in the PTP struct to enable
> > > > > > easier correlation between interface and PTP device.
> > > > >
> > > > > Still think it is better to have the driver name, not the MAC
> address.
> > > > > The correlation is clear by using the ethtool method.
> > > > >
> > > >
> > > > I found a method to correct the init code so that the device name
> > > > can appear here, and I would prefer that over either the driver
> > > > name or MAC address. It requires moving the ptp initialization
> > > > into the netdev open handler though.
> > >
> > > I don't think you should use the net device name for this, as net
> > > devices can be renamed after creation.  (Though you could update the
> > > struct ptp_clock_info whenever this happens.)
> > >
> > > I'm starting to wonder what use there is for a 'clock name' distinct
> > > from the clock device name.  What would be more useful is to give it a
> parent.
> > > Richard, is there any reason why ptp_clock_register() doesn't allow
> > > specifying the parent device?  You can then make the clock_name
> > > attribute contain the parent device's driver name or whatever it is
> you prefer.
> > >
> > > Ben.
> >
> > Because ideally the ptp device belongs to multiple ethernet devices...
> >
> > Sadly current intel hardware doesn't support this.
> 
> The PTP clock is instantiated by some driver, bound to some device,
> whether that's a PCI device or a platform device or something else.  I
> think that device (not a net device) should be the parent of the clock
> device.
> 

Agreed, that would make the most sense.

> Ben.
> 
> --
> Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer;
> that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* RE: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Ben Hutchings @ 2012-09-06 23:39 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Richard Cochran, Kirsher, Jeffrey T, davem@davemloft.net,
	Vick, Matthew, netdev@vger.kernel.org, gospo@redhat.com,
	sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB580785FA77@ORSMSX105.amr.corp.intel.com>

On Thu, 2012-09-06 at 23:24 +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > Sent: Thursday, September 06, 2012 4:23 PM
> > To: Keller, Jacob E; Richard Cochran
> > Cc: Kirsher, Jeffrey T; davem@davemloft.net; Vick, Matthew;
> > netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com
> > Subject: RE: [net-next 4/6] igb: Store the MAC address in the name in the
> > PTP struct.
> > 
> > On Thu, 2012-09-06 at 22:59 +0000, Keller, Jacob E wrote:
> > > > -----Original Message-----
> > > > From: netdev-owner@vger.kernel.org
> > > > [mailto:netdev-owner@vger.kernel.org]
> > > > On Behalf Of Richard Cochran
> > > > Sent: Thursday, September 06, 2012 1:06 AM
> > > > To: Kirsher, Jeffrey T
> > > > Cc: davem@davemloft.net; Vick, Matthew; netdev@vger.kernel.org;
> > > > gospo@redhat.com; sassmann@redhat.com
> > > > Subject: Re: [net-next 4/6] igb: Store the MAC address in the name
> > > > in the PTP struct.
> > > >
> > > > On Wed, Sep 05, 2012 at 04:35:04PM -0700, Jeff Kirsher wrote:
> > > > > From: Matthew Vick <matthew.vick@intel.com>
> > > > >
> > > > > Change the name of the adapter in the PTP struct to enable easier
> > > > > correlation between interface and PTP device.
> > > >
> > > > Still think it is better to have the driver name, not the MAC address.
> > > > The correlation is clear by using the ethtool method.
> > > >
> > >
> > > I found a method to correct the init code so that the device name can
> > > appear here, and I would prefer that over either the driver name or
> > > MAC address. It requires moving the ptp initialization into the netdev
> > > open handler though.
> > 
> > I don't think you should use the net device name for this, as net devices
> > can be renamed after creation.  (Though you could update the struct
> > ptp_clock_info whenever this happens.)
> > 
> > I'm starting to wonder what use there is for a 'clock name' distinct from
> > the clock device name.  What would be more useful is to give it a parent.
> > Richard, is there any reason why ptp_clock_register() doesn't allow
> > specifying the parent device?  You can then make the clock_name attribute
> > contain the parent device's driver name or whatever it is you prefer.
> > 
> > Ben.
> 
> Because ideally the ptp device belongs to multiple ethernet devices...
> 
> Sadly current intel hardware doesn't support this.

The PTP clock is instantiated by some driver, bound to some device,
whether that's a PCI device or a platform device or something else.  I
think that device (not a net device) should be the parent of the clock
device.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Keller, Jacob E @ 2012-09-06 23:24 UTC (permalink / raw)
  To: Ben Hutchings, Richard Cochran
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, Vick, Matthew,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <1346973787.2714.62.camel@bwh-desktop.uk.solarflarecom.com>

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Thursday, September 06, 2012 4:23 PM
> To: Keller, Jacob E; Richard Cochran
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; Vick, Matthew;
> netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com
> Subject: RE: [net-next 4/6] igb: Store the MAC address in the name in the
> PTP struct.
> 
> On Thu, 2012-09-06 at 22:59 +0000, Keller, Jacob E wrote:
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org
> > > [mailto:netdev-owner@vger.kernel.org]
> > > On Behalf Of Richard Cochran
> > > Sent: Thursday, September 06, 2012 1:06 AM
> > > To: Kirsher, Jeffrey T
> > > Cc: davem@davemloft.net; Vick, Matthew; netdev@vger.kernel.org;
> > > gospo@redhat.com; sassmann@redhat.com
> > > Subject: Re: [net-next 4/6] igb: Store the MAC address in the name
> > > in the PTP struct.
> > >
> > > On Wed, Sep 05, 2012 at 04:35:04PM -0700, Jeff Kirsher wrote:
> > > > From: Matthew Vick <matthew.vick@intel.com>
> > > >
> > > > Change the name of the adapter in the PTP struct to enable easier
> > > > correlation between interface and PTP device.
> > >
> > > Still think it is better to have the driver name, not the MAC address.
> > > The correlation is clear by using the ethtool method.
> > >
> >
> > I found a method to correct the init code so that the device name can
> > appear here, and I would prefer that over either the driver name or
> > MAC address. It requires moving the ptp initialization into the netdev
> > open handler though.
> 
> I don't think you should use the net device name for this, as net devices
> can be renamed after creation.  (Though you could update the struct
> ptp_clock_info whenever this happens.)
> 
> I'm starting to wonder what use there is for a 'clock name' distinct from
> the clock device name.  What would be more useful is to give it a parent.
> Richard, is there any reason why ptp_clock_register() doesn't allow
> specifying the parent device?  You can then make the clock_name attribute
> contain the parent device's driver name or whatever it is you prefer.
> 
> Ben.

Because ideally the ptp device belongs to multiple ethernet devices...

Sadly current intel hardware doesn't support this.

- Jake

> 
> --
> Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer;
> that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* RE: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Ben Hutchings @ 2012-09-06 23:23 UTC (permalink / raw)
  To: Keller, Jacob E, Richard Cochran
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, Vick, Matthew,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB580785F9F4@ORSMSX105.amr.corp.intel.com>

On Thu, 2012-09-06 at 22:59 +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of Richard Cochran
> > Sent: Thursday, September 06, 2012 1:06 AM
> > To: Kirsher, Jeffrey T
> > Cc: davem@davemloft.net; Vick, Matthew; netdev@vger.kernel.org;
> > gospo@redhat.com; sassmann@redhat.com
> > Subject: Re: [net-next 4/6] igb: Store the MAC address in the name in the
> > PTP struct.
> > 
> > On Wed, Sep 05, 2012 at 04:35:04PM -0700, Jeff Kirsher wrote:
> > > From: Matthew Vick <matthew.vick@intel.com>
> > >
> > > Change the name of the adapter in the PTP struct to enable easier
> > > correlation between interface and PTP device.
> > 
> > Still think it is better to have the driver name, not the MAC address.
> > The correlation is clear by using the ethtool method.
> > 
> 
> I found a method to correct the init code so that the device name can
> appear here, and I would prefer that over either the driver name or
> MAC address. It requires moving the ptp initialization into the netdev
> open handler though.

I don't think you should use the net device name for this, as net
devices can be renamed after creation.  (Though you could update the
struct ptp_clock_info whenever this happens.)

I'm starting to wonder what use there is for a 'clock name' distinct
from the clock device name.  What would be more useful is to give it a
parent.  Richard, is there any reason why ptp_clock_register() doesn't
allow specifying the parent device?  You can then make the clock_name
attribute contain the parent device's driver name or whatever it is you
prefer.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCHv3] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-09-06 23:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: netdev, kvm, virtualization
In-Reply-To: <5048D2B8.4060003@redhat.com>

On Thu, Sep 06, 2012 at 06:43:36PM +0200, Paolo Bonzini wrote:
> Il 06/09/2012 14:08, Michael S. Tsirkin ha scritto:
> > +#define VIRTIO_NET_CTRL_STEERING_HOST  1
> 
> You didn't change this occurrence.

Good catch, thanks!

^ permalink raw reply

* RE: [net-next 13/13] igb: Store the MAC address in the name in the PTP struct.
From: Keller, Jacob E @ 2012-09-06 23:04 UTC (permalink / raw)
  To: Richard Cochran, Ben Hutchings
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, Vick, Matthew,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com,
	Stuart Hodgson
In-Reply-To: <20120824061956.GA2212@netboy.at.omicron.at>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Richard Cochran
> Sent: Thursday, August 23, 2012 11:20 PM
> To: Ben Hutchings
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; Vick, Matthew;
> netdev@vger.kernel.org; gospo@redhat.com; sassmann@redhat.com; Stuart
> Hodgson
> Subject: Re: [net-next 13/13] igb: Store the MAC address in the name in
> the PTP struct.
> 
> In an ideal world, there is only one clock device per system, so the
> original concept was to use the driver name. The hardware designers have
> ignored or not understood the "one clock per system" model, and so,
> unfortunately, we have to deal with it.
> 
> A clock device can and should be connected to more than one network device
> (see gianfar, dp83640), and to give it a MAC address is misleading.
> 
> The ethtool solution works perfectly to go from interface->clock, which is
> all the information that a user space PTP stack needs.
> 
> I think for sysfs and the ioctl it is nicer to have the driver name, since
> it is associated with the clock and its capabilities.
> 
Correct. MAC address is misleading. However driver name for our parts is
misleading because each driver creates one clock per port. In either case, one
driver handling multiple cards would also have to create multiple ptp devices
and that is just as misleading.

The ethtool method is definitely preferred, and is the correct way to identify
the clock device correlation.

- Jake

> Thanks,
> Richard
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [net-next 4/6] igb: Store the MAC address in the name in the PTP struct.
From: Keller, Jacob E @ 2012-09-06 22:59 UTC (permalink / raw)
  To: Richard Cochran, Kirsher, Jeffrey T
  Cc: davem@davemloft.net, Vick, Matthew, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <20120906080628.GD2550@netboy.at.omicron.at>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Richard Cochran
> Sent: Thursday, September 06, 2012 1:06 AM
> To: Kirsher, Jeffrey T
> Cc: davem@davemloft.net; Vick, Matthew; netdev@vger.kernel.org;
> gospo@redhat.com; sassmann@redhat.com
> Subject: Re: [net-next 4/6] igb: Store the MAC address in the name in the
> PTP struct.
> 
> On Wed, Sep 05, 2012 at 04:35:04PM -0700, Jeff Kirsher wrote:
> > From: Matthew Vick <matthew.vick@intel.com>
> >
> > Change the name of the adapter in the PTP struct to enable easier
> > correlation between interface and PTP device.
> 
> Still think it is better to have the driver name, not the MAC address.
> The correlation is clear by using the ethtool method.
> 

I found a method to correct the init code so that the device name can appear here, and I would prefer that over either the driver name or MAC address. It requires moving the ptp initialization into the netdev open handler though.

- Jake

> Thanks,
> Richard
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [RFC] ip autoconfig: allow specifying a list of devices rather than one or all
From: Chris Friesen @ 2012-09-06 22:35 UTC (permalink / raw)
  To: netdev

We have an embedded system that boots of the network.  This system has a 
pair of bonded gigE links used for booting and management, and a pair of 
bonded 10G links used for data traffic.

What we're seeing is that when ip_auto_config() calls
ic_open_devs() to open up all the network ports, the data ports 
immediately start getting *hammered* with traffic, we get "nf_conntrack: 
table full, dropping packet", errors, and we sometimes lose the DHCP 
responses because of this.

We need to be able to boot off of both the gigE ports for redundancy, so 
it seems like it might be a reasonable idea to extend the "ip=" boot arg 
to allow specifying a list of devices rather than choosing between a 
single device or all of them.

Thoughts?  Anyone ever done this?

Chris

-- 

Chris Friesen
Software Designer

3500 Carling Avenue
Ottawa, Ontario K2H 8E9
www.genband.com

^ permalink raw reply

* Re: [PATCH v2 06/10] cgroup: Assign subsystem IDs during compile time
From: Tejun Heo @ 2012-09-06 22:32 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	David S. Miller, Andrew Morton, Eric Dumazet, Gao feng,
	Glauber Costa, Jamal Hadi Salim, John Fastabend,
	Kamezawa Hiroyuki, Li Zefan, Neil Horman
In-Reply-To: <50390743.2090203-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello, Daniel.

On Sat, Aug 25, 2012 at 07:11:31PM +0200, Daniel Wagner wrote:
> On 25.08.2012 01:38, Tejun Heo wrote:
> >>task_cls_classid() and task_netprioidx() (when built as
> >>module) are protected by a jump label and therefore we can
> >>simply replace the subsystem index lookup with the enum.
> >
> >Can we put these in a separate patch?  ie. The first patch makes all
> >subsys IDs constant and then patches to simplify users.
> 
> Wouldn't this break bisection? I merged this step so that all steps
> in this series are able to compile and run.

I don't see why it should but maybe I'm missing something.  If so,
please enlighten me.

> >>+#define IS_SUBSYS_ENABLED(option) IS_MODULE(option)
> >>+#include <linux/cgroup_subsys.h>
> >>+#undef IS_SUBSYS_ENABLED
> >>+
> >
> >Why do we need to segregate in-kernel and modular ones at all?  What's
> >wrong with just defining them in one go?
> 
> I have done that but the result was a panic. There seems some code
> which expects this ordering. Let me dig into this and fix it.

Yes, please.

> >Hmm... patch sequence looks odd to me.  If you first make all IDs
> >constant, you can first remove module specific ones and then later add
> >jump labels as separate patches.  Wouldn't that be simpler?
> 
> As said above, I tried to keep all steps usable so bisection would
> work. I think your steps would lead to non working versions of the
> kernel.

Hmmm... Yes, it should be bisectable but again I don't see why being
biesectable interferes with the patch ordering here.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH v2 05/10] cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
From: Tejun Heo @ 2012-09-06 22:23 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Li Zefan, David S. Miller
In-Reply-To: <5039046D.1040402-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>

Hello, Daniel.

On Sat, Aug 25, 2012 at 06:59:25PM +0200, Daniel Wagner wrote:
> >Wouldn't it be better to explicitly state that a following patch would
> >reduce the SUBSYS_COUNT and stop putting builtin and module ones into
> >different sections?
> 
> Sure, can do that. I just to make sure I understand you correctly,
> What do you mean with different section? Do you refer to the enum sorting?

I meant that there's no reason to have all builtin ones contiguosly
and then the module ones using two cgroup_subsys.h inclusions.

Thanks!

-- 
tejun

^ permalink raw reply

* Re: [PATCH net-next] ipv6: Export nd_tbl to allow modules to support IPv6
From: Thomas Graf @ 2012-09-06 21:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120906.134738.425625608685741613.davem@davemloft.net>

On Thu, Sep 06, 2012 at 01:47:38PM -0400, David Miller wrote:
> Right now we're in a transition period where ipv4 is mostly refcount'less
> and ipv6 is not.
> 
> This is part of the reason I don't want to expose these things, the
> calling convention and locking requirements are going to be fluid for
> any interface you might propose.
> 
> arp_tbl was exported only for in-tree users, and I therefore say that
> we should only export nd_tbl for in-tree users as well, since that's
> the only way we can audit and update all the referencing callers as
> the semantics radically change.

Thanks for the explanations David. I'll pass this on to the involved
projects. We'll probably try to pick this up again as soon as the new
locking conventions have been put in place and are considered stable.

^ permalink raw reply

* Re: [ethtool 2/2] ethtool: allow setting MDI-X state
From: Jeff Kirsher @ 2012-09-06 20:57 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jesse Brandeburg, netdev, gospo, sassmann
In-Reply-To: <1346955923.2714.26.camel@bwh-desktop.uk.solarflarecom.com>

[-- Attachment #1: Type: text/plain, Size: 3085 bytes --]

On Thu, 2012-09-06 at 19:25 +0100, Ben Hutchings wrote:
> On Tue, 2012-08-21 at 01:37 -0700, Jeff Kirsher wrote:
> > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > 
> > A bit ago ethtool added support for reading MDI-X state, this
> > patch finishes the implementation, adding the complementary write
> > command.
> [...]
> 
> Applied.  I also commited the following changes:
> 
> ---
> Subject: ethtool.8: Mark-up mdix arguments properly
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
>  ethtool.8.in |   11 ++++++-----
>  1 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/ethtool.8.in b/ethtool.8.in
> index 3208d38..2799e25 100644
> --- a/ethtool.8.in
> +++ b/ethtool.8.in
> @@ -523,11 +523,12 @@ Sets full or half duplex mode.
>  Selects device port.
>  .TP
>  .A3 mdix auto on off
> -Selects MDI-X mode for port. May be used to override the automatic detection
> -feature of most adapters.  Auto means automatic detection of MDI status, on
> -forces MDI-X (crossover) mode, while off means MDI (straight through) mode.
> -The driver should guarantee that this command takes effect immediately, and
> -if necessary may reset the link to cause the change to take effect.
> +Selects MDI-X mode for port. May be used to override the automatic
> +detection feature of most adapters. An argument of \fBauto\fR means
> +automatic detection of MDI status, \fBon\fR forces MDI-X (crossover)
> +mode, while \fBoff\fR means MDI (straight through) mode.  The driver
> +should guarantee that this command takes effect immediately, and if
> +necessary may reset the link to cause the change to take effect.
>  .TP
>  .A2 autoneg on off
>  Specifies whether autonegotiation should be enabled. Autonegotiation 
> ---
> Subject: test-cmdline: Test -s mdix keyword
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
>  test-cmdline.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/test-cmdline.c b/test-cmdline.c
> index 6a60ed4..85b4ce0 100644
> --- a/test-cmdline.c
> +++ b/test-cmdline.c
> @@ -22,12 +22,14 @@ static struct test_case {
>  	{ 1, "16_char_devname!" },
>  	/* Argument parsing for -s is specialised */
>  	{ 0, "-s devname" },
> -	{ 0, "--change devname speed 100 duplex half" },
> +	{ 0, "--change devname speed 100 duplex half mdix auto" },
>  	{ 1, "-s devname speed foo" },
>  	{ 1, "--change devname speed" },
>  	{ 0, "-s devname duplex half" },
>  	{ 1, "--change devname duplex foo" },
>  	{ 1, "-s devname duplex" },
> +	{ 1, "--change devname mdix foo" },
> +	{ 1, "-s devname mdix" },
>  	{ 0, "--change devname port tp" },
>  	{ 1, "-s devname port foo" },
>  	{ 1, "--change devname port" },
> ---
> 
> Please include at least basic test cases like this for any new feature.
> I did it this time because I've already kept you waiting and didn't
> think it would be fair to request changes.
> 
> Ben.
> 

Thanks Ben, I will make sure that we cover that with any new features
going forward.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox