Netdev List

* Re: [PATCH 0/5] ARM: sunxi: Add support for A10 Ethernet controller
From: Oliver Schinagl @ 2013-03-19  9:47 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1363380605-6577-1-git-send-email-maxime.ripard@free-electrons.com>

Maxime Ripard <maxime.ripard <at> free-electrons.com> writes:

> 
> Hi,
> 
> The Allwinner A10 SoC has an ethernet that is said to be coming from
> Davicom embedded in it. This IP has no public documentation, so exact
> details are quite sparse, and this code come from refactored allwinner
> code.
As we discussed on #linux-sunxi, it may be Davicom IP, but it also may not be.
The register addresses do not match at all, so it could be only partially
Davicom IP at best.

I therefore suggest we put it in kernel/drivers/net/ethernet/allwinner/ and name
the modules/files sunxi-emac.c and sunxi-gmac.c (for the new gigabit MAC in the
newest SoCs)...


Thanks,
Oliver
> 
> Since we don't have any clock support yet for the Allwinner SoCs, we
> rely on the bootloader to enable the wemac clock.
> 
> Thanks,
> Maxime

^ permalink raw reply

* Re: [PATCH 3/4][net-next] gianfar: Remove redundant programming of [rt]xic registers
From: Sergei Shtylyov @ 2013-03-19 17:49 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: netdev, Paul Gortmaker, David S. Miller
In-Reply-To: <1363714805-9142-4-git-send-email-claudiu.manoil@freescale.com>

Hello.

On 19-03-2013 21:40, Claudiu Manoil wrote:

> For Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx coalescing registers [rt]xic
> are aliased with the [rt]xic0 registers (coalescing setting regs for Q0). This
> avoids programming twice in a row the coalescing registers for the Rx/Tx hw Q0.

> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
> ---
>   drivers/net/ethernet/freescale/gianfar.c | 24 ++++++++++++------------
>   1 file changed, 12 insertions(+), 12 deletions(-)

> diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
> index 3f07dbd..e28b3e6 100644
> --- a/drivers/net/ethernet/freescale/gianfar.c
> +++ b/drivers/net/ethernet/freescale/gianfar.c
> @@ -1821,20 +1821,9 @@ void gfar_configure_coalescing(struct gfar_private *priv,
[...]
>   	if (priv->mode == MQ_MG_MODE) {
> +		int i = 0;

    Empty line wouldn't hurt here, after declaration.

>   		baddr = &regs->txic0;
>   		for_each_set_bit(i, &tx_mask, priv->num_tx_queues) {
>   			gfar_write(baddr + i, 0);
> @@ -1848,6 +1837,17 @@ void gfar_configure_coalescing(struct gfar_private *priv,
>   			if (likely(priv->rx_queue[i]->rxcoalescing))
>   				gfar_write(baddr + i, priv->rx_queue[i]->rxic);
>   		}
> +	} else {
> +		/* Backward compatible case ---- even if we enable

    s/----/--/

WBR, Sergei

^ permalink raw reply

* [PATCH 4/4][net-next] gianfar: Refactor config coalescing calls for all queues
From: Claudiu Manoil @ 2013-03-19 17:40 UTC (permalink / raw)
  To: netdev; +Cc: Paul Gortmaker, David S. Miller
In-Reply-To: <1363714805-9142-1-git-send-email-claudiu.manoil@freescale.com>

The only place where gfar_configure_coalescing() is called
with an actual bitmask (other than 0xff) is gfar_poll()
(on the hot path). So make gfar_configure_coalescing()
static for the buffer processing path, and export
gfar_configure_coalescing_all() for the remaining cases,
which need to set coalescing for all the queues at once
(on the slow path).

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c         | 11 ++++++++---
 drivers/net/ethernet/freescale/gianfar.h         |  3 +--
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  2 +-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index e28b3e6..49ce83b 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -341,7 +341,7 @@ static void gfar_init_mac(struct net_device *ndev)
 	gfar_init_tx_rx_base(priv);
 
 	/* Configure the coalescing support */
-	gfar_configure_coalescing(priv, 0xFF, 0xFF);
+	gfar_configure_coalescing_all(priv);
 
 	/* set this when rx hw offload (TOE) functions are being used */
 	priv->uses_rxfcb = 0;
@@ -1816,7 +1816,7 @@ void gfar_start(struct net_device *dev)
 	dev->trans_start = jiffies; /* prevent tx timeout */
 }
 
-void gfar_configure_coalescing(struct gfar_private *priv,
+static void gfar_configure_coalescing(struct gfar_private *priv,
 			       unsigned long tx_mask, unsigned long rx_mask)
 {
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
@@ -1851,6 +1851,11 @@ void gfar_configure_coalescing(struct gfar_private *priv,
 	}
 }
 
+void gfar_configure_coalescing_all(struct gfar_private *priv)
+{
+	gfar_configure_coalescing(priv, 0xFF, 0xFF);
+}
+
 static int register_grp_irqs(struct gfar_priv_grp *grp)
 {
 	struct gfar_private *priv = grp->priv;
@@ -1940,7 +1945,7 @@ int startup_gfar(struct net_device *ndev)
 
 	phy_start(priv->phydev);
 
-	gfar_configure_coalescing(priv, 0xFF, 0xFF);
+	gfar_configure_coalescing_all(priv);
 
 	return 0;
 
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index b1d0c1c..eec87ea 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -1182,8 +1182,7 @@ extern void stop_gfar(struct net_device *dev);
 extern void gfar_halt(struct net_device *dev);
 extern void gfar_phy_test(struct mii_bus *bus, struct phy_device *phydev,
 		int enable, u32 regnum, u32 read);
-extern void gfar_configure_coalescing(struct gfar_private *priv,
-		unsigned long tx_mask, unsigned long rx_mask);
+extern void gfar_configure_coalescing_all(struct gfar_private *priv);
 void gfar_init_sysfs(struct net_device *dev);
 int gfar_set_features(struct net_device *dev, netdev_features_t features);
 extern void gfar_check_rx_parser_mode(struct gfar_private *priv);
diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 75e89ac..8248df7 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -436,7 +436,7 @@ static int gfar_scoalesce(struct net_device *dev,
 			gfar_usecs2ticks(priv, cvals->tx_coalesce_usecs));
 	}
 
-	gfar_configure_coalescing(priv, 0xFF, 0xFF);
+	gfar_configure_coalescing_all(priv);
 
 	return 0;
 }
-- 
1.7.11.3

^ permalink raw reply related
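
The refactoring in the patch above follows a common C pattern: the
general helper that takes explicit masks becomes file-local, and a thin
wrapper with the "all queues" arguments baked in is the only symbol
other files see. A minimal userspace sketch of that pattern follows.
The demo_* names and struct demo_priv are hypothetical stand-ins, not
the gianfar driver's types; only the 0xFF masks come from the patch.

```c
/* Hypothetical stand-in for struct gfar_private; not the driver's layout. */
struct demo_priv {
	unsigned long last_tx_mask;
	unsigned long last_rx_mask;
	unsigned int calls;
};

/* Hot-path helper: file-local, takes explicit per-group queue masks. */
static void demo_configure_coalescing(struct demo_priv *priv,
				      unsigned long tx_mask,
				      unsigned long rx_mask)
{
	/* A real driver would write registers here; the sketch only records. */
	priv->last_tx_mask = tx_mask;
	priv->last_rx_mask = rx_mask;
	priv->calls++;
}

/* Slow-path entry point: the only symbol other files need to see. */
void demo_configure_coalescing_all(struct demo_priv *priv)
{
	demo_configure_coalescing(priv, 0xFF, 0xFF);
}
```

Callers on the slow path then no longer repeat the 0xFF, 0xFF pair, and
the compiler is free to inline the static helper into the hot path.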

* [PATCH 0/4][net-next] gianfar: Address napi polling issues
From: Claudiu Manoil @ 2013-03-19 17:40 UTC (permalink / raw)
  To: netdev; +Cc: Paul Gortmaker, David S. Miller

Hello David, Paul,

These patches take on several issues in the current napi polling
routine, affecting mostly the newer Multi Queue Multi Group
(MQ_MG_MODE) devices. It seems that the code for these devices
has been neglected.

Tested on p1010 (single core, multi q), p1020 (dual core, multi q)
and p2020 (dual core, single q mode).

Thanks.

Claudiu Manoil (4):
  gianfar: Fix tx napi polling
  gianfar: Poll only active Rx queues
  gianfar: Remove redundant programming of [rt]xic registers
  gianfar: Refactor config coalescing calls for all queues

 drivers/net/ethernet/freescale/gianfar.c         | 135 +++++++++++++----------
 drivers/net/ethernet/freescale/gianfar.h         |   7 +-
 drivers/net/ethernet/freescale/gianfar_ethtool.c |   2 +-
 3 files changed, 84 insertions(+), 60 deletions(-)

-- 
1.7.11.3

^ permalink raw reply

* [PATCH 1/4][net-next] gianfar: Fix tx napi polling
From: Claudiu Manoil @ 2013-03-19 17:40 UTC (permalink / raw)
  To: netdev; +Cc: Paul Gortmaker, David S. Miller
In-Reply-To: <1363714805-9142-1-git-send-email-claudiu.manoil@freescale.com>

There are two issues with the current napi poll routine with regard
to Tx ring cleanup:
1) For multi-queue devices (MQ_MG_MODE), tx_bit_map may differ from
rx_bit_map. This is possible (and supported in h/w) if the DT property
"fsl,tx-bit-map" holds a different value than rx_bit_map. In that case
the current polling routine services the wrong Tx queues (i.e. the
interrupt group receives interrupts from Tx queues that it does not
service).
2) Tx cleanup completion consumes napi budget, whereas the napi budget
should be reserved for Rx work only.

The patch fixes these issues and provides a clean napi polling routine.
Napi poll completion is reached when all the Rx queues have been
serviced and there is no Tx work to do.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 82 ++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 1b468a8..1e555a7 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -132,7 +132,7 @@ static int gfar_poll(struct napi_struct *napi, int budget);
 static void gfar_netpoll(struct net_device *dev);
 #endif
 int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit);
-static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
+static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
 static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
 			       int amount_pull, struct napi_struct *napi);
 void gfar_halt(struct net_device *dev);
@@ -2468,7 +2468,7 @@ static void gfar_align_skb(struct sk_buff *skb)
 }
 
 /* Interrupt Handler for Transmit complete */
-static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
+static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 {
 	struct net_device *dev = tx_queue->dev;
 	struct netdev_queue *txq;
@@ -2570,8 +2570,6 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 	tx_queue->dirty_tx = bdp;
 
 	netdev_tx_completed_queue(txq, howmany, bytes_sent);
-
-	return howmany;
 }
 
 static void gfar_schedule_cleanup(struct gfar_priv_grp *gfargrp)
@@ -2834,62 +2832,72 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 	struct gfar __iomem *regs = gfargrp->regs;
 	struct gfar_priv_tx_q *tx_queue = NULL;
 	struct gfar_priv_rx_q *rx_queue = NULL;
-	int rx_cleaned = 0, budget_per_queue = 0, rx_cleaned_per_queue = 0;
-	int tx_cleaned = 0, i, left_over_budget = budget;
+	int work_done = 0, work_done_per_q = 0;
+	int i, budget_per_q;
+	int has_tx_work;
 	unsigned long serviced_queues = 0;
-	int num_queues = 0;
-
-	num_queues = gfargrp->num_rx_queues;
-	budget_per_queue = budget/num_queues;
+	int num_queues = gfargrp->num_rx_queues;
 
+	budget_per_q = budget/num_queues;
 	/* Clear IEVENT, so interrupts aren't called again
 	 * because of the packets that have already arrived
 	 */
 	gfar_write(&regs->ievent, IEVENT_RTX_MASK);
 
-	while (num_queues && left_over_budget) {
-		budget_per_queue = left_over_budget/num_queues;
-		left_over_budget = 0;
+	while (1) {
+		has_tx_work = 0;
+		for_each_set_bit(i, &gfargrp->tx_bit_map, priv->num_tx_queues) {
+			tx_queue = priv->tx_queue[i];
+			/* run Tx cleanup to completion */
+			if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
+				gfar_clean_tx_ring(tx_queue);
+				has_tx_work = 1;
+			}
+		}
 
 		for_each_set_bit(i, &gfargrp->rx_bit_map, priv->num_rx_queues) {
 			if (test_bit(i, &serviced_queues))
 				continue;
+
 			rx_queue = priv->rx_queue[i];
-			tx_queue = priv->tx_queue[rx_queue->qindex];
-
-			tx_cleaned += gfar_clean_tx_ring(tx_queue);
-			rx_cleaned_per_queue =
-				gfar_clean_rx_ring(rx_queue, budget_per_queue);
-			rx_cleaned += rx_cleaned_per_queue;
-			if (rx_cleaned_per_queue < budget_per_queue) {
-				left_over_budget = left_over_budget +
-					(budget_per_queue -
-					 rx_cleaned_per_queue);
+			work_done_per_q =
+				gfar_clean_rx_ring(rx_queue, budget_per_q);
+			work_done += work_done_per_q;
+
+			/* finished processing this queue */
+			if (work_done_per_q < budget_per_q) {
 				set_bit(i, &serviced_queues);
 				num_queues--;
+				if (!num_queues)
+					break;
+				/* recompute budget per Rx queue */
+				budget_per_q =
+					(budget - work_done) / num_queues;
 			}
 		}
-	}
 
-	if (tx_cleaned)
-		return budget;
+		if (work_done >= budget)
+			break;
 
-	if (rx_cleaned < budget) {
-		napi_complete(napi);
+		if (!num_queues && !has_tx_work) {
 
-		/* Clear the halt bit in RSTAT */
-		gfar_write(&regs->rstat, gfargrp->rstat);
+			napi_complete(napi);
 
-		gfar_write(&regs->imask, IMASK_DEFAULT);
+			/* Clear the halt bit in RSTAT */
+			gfar_write(&regs->rstat, gfargrp->rstat);
 
-		/* If we are coalescing interrupts, update the timer
-		 * Otherwise, clear it
-		 */
-		gfar_configure_coalescing(priv, gfargrp->rx_bit_map,
-					  gfargrp->tx_bit_map);
+			gfar_write(&regs->imask, IMASK_DEFAULT);
+
+			/* If we are coalescing interrupts, update the timer
+			 * Otherwise, clear it
+			 */
+			gfar_configure_coalescing(priv, gfargrp->rx_bit_map,
+						  gfargrp->tx_bit_map);
+			break;
+		}
 	}
 
-	return rx_cleaned;
+	return work_done;
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-- 
1.7.11.3

^ permalink raw reply related
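
The budget handling described in the patch above can be sketched in
userspace C. This is a simplified model under stated assumptions, not
the driver code: an Rx ring is reduced to a counter of pending packets,
Tx cleanup and napi completion are left out, and all demo_* names are
hypothetical. Unlike the patch, the sketch clamps each share to what is
left of the budget and tests the pending counter directly to decide
whether a queue is drained.

```c
#define DEMO_NUM_QUEUES 4

/* Hypothetical model of an Rx ring: just a count of pending packets. */
struct demo_rx_queue {
	int pending;
};

/* Process up to 'limit' packets from one queue; return how many were done. */
static int demo_clean_rx_ring(struct demo_rx_queue *q, int limit)
{
	int done = q->pending < limit ? q->pending : limit;

	q->pending -= done;
	return done;
}

/*
 * One poll call: split 'budget' across the queues that still have work and
 * re-split the remainder whenever a queue is drained.
 * Returns the total Rx work done, which does not exceed 'budget'.
 */
int demo_poll(struct demo_rx_queue *queues, int budget)
{
	unsigned long serviced = 0;
	int num_queues = DEMO_NUM_QUEUES;
	int budget_per_q = budget / num_queues;
	int work_done = 0;
	int i;

	while (num_queues && work_done < budget) {
		for (i = 0; i < DEMO_NUM_QUEUES; i++) {
			int limit, done;

			if (serviced & (1UL << i))
				continue;

			/* never hand out more than what is left of the budget */
			limit = budget_per_q > 0 ? budget_per_q : 1;
			if (limit > budget - work_done)
				limit = budget - work_done;

			done = demo_clean_rx_ring(&queues[i], limit);
			work_done += done;

			/* finished processing this queue */
			if (queues[i].pending == 0) {
				serviced |= 1UL << i;
				if (!--num_queues)
					break;
				/* recompute budget per remaining Rx queue */
				budget_per_q = (budget - work_done) / num_queues;
			}
		}
	}
	return work_done;
}
```

A return value below the budget means every queue was drained, which is
the condition under which a napi poll routine may complete; a return
value equal to the budget means the routine should be polled again.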

* [PATCH 2/4][net-next] gianfar: Poll only active Rx queues
From: Claudiu Manoil @ 2013-03-19 17:40 UTC (permalink / raw)
  To: netdev; +Cc: Paul Gortmaker, David S. Miller
In-Reply-To: <1363714805-9142-1-git-send-email-claudiu.manoil@freescale.com>

Split the napi budget fairly among the active queues only, instead
of dividing it by the total number of Rx queues assigned to the
given interrupt group.
Use the h/w indication field RXFi in rstat (receive status register)
to identify the active Rx queues of the current interrupt group
(i.e. a receive event occurred on ring i, if ring i is part of the
current interrupt group). This indication field, RXFi with i=0..7,
tells us which queues of the interrupt group have incoming traffic
once we have entered the polling routine for that group. After ring i
has been serviced, a 1 is written to the corresponding RXFi bit to
clear the active queue indication for that ring.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 28 +++++++++++++++++++---------
 drivers/net/ethernet/freescale/gianfar.h |  4 +++-
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 1e555a7..3f07dbd 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2835,15 +2835,20 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 	int work_done = 0, work_done_per_q = 0;
 	int i, budget_per_q;
 	int has_tx_work;
-	unsigned long serviced_queues = 0;
-	int num_queues = gfargrp->num_rx_queues;
+	unsigned long rstat_rxf;
+	int num_act_queues;
 
-	budget_per_q = budget/num_queues;
 	/* Clear IEVENT, so interrupts aren't called again
 	 * because of the packets that have already arrived
 	 */
 	gfar_write(&regs->ievent, IEVENT_RTX_MASK);
 
+	rstat_rxf = gfar_read(&regs->rstat) & RSTAT_RXF_MASK;
+
+	num_act_queues = bitmap_weight(&rstat_rxf, MAX_RX_QS);
+	if (num_act_queues)
+		budget_per_q = budget/num_act_queues;
+
 	while (1) {
 		has_tx_work = 0;
 		for_each_set_bit(i, &gfargrp->tx_bit_map, priv->num_tx_queues) {
@@ -2856,7 +2861,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 		}
 
 		for_each_set_bit(i, &gfargrp->rx_bit_map, priv->num_rx_queues) {
-			if (test_bit(i, &serviced_queues))
+			/* skip queue if not active */
+			if (!(rstat_rxf & (RSTAT_CLEAR_RXF0 >> i)))
 				continue;
 
 			rx_queue = priv->rx_queue[i];
@@ -2866,20 +2872,24 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 
 			/* finished processing this queue */
 			if (work_done_per_q < budget_per_q) {
-				set_bit(i, &serviced_queues);
-				num_queues--;
-				if (!num_queues)
+				/* clear active queue hw indication */
+				gfar_write(&regs->rstat,
+					   RSTAT_CLEAR_RXF0 >> i);
+				rstat_rxf &= ~(RSTAT_CLEAR_RXF0 >> i);
+				num_act_queues--;
+
+				if (!num_act_queues)
 					break;
 				/* recompute budget per Rx queue */
 				budget_per_q =
-					(budget - work_done) / num_queues;
+					(budget - work_done) / num_act_queues;
 			}
 		}
 
 		if (work_done >= budget)
 			break;
 
-		if (!num_queues && !has_tx_work) {
+		if (!num_act_queues && !has_tx_work) {
 
 			napi_complete(napi);
 
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 63a28d2..b1d0c1c 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -291,7 +291,9 @@ extern const char gfar_driver_version[];
 #define RCTRL_PADDING(x)	((x << 16) & RCTRL_PAL_MASK)
 
 
-#define RSTAT_CLEAR_RHALT       0x00800000
+#define RSTAT_CLEAR_RHALT	0x00800000
+#define RSTAT_CLEAR_RXF0	0x00000080
+#define RSTAT_RXF_MASK		0x000000ff
 
 #define TCTRL_IPCSEN		0x00004000
 #define TCTRL_TUCSEN		0x00002000
-- 
1.7.11.3

^ permalink raw reply related
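
The RXFi bit tests used in the patch above can be tried out in
userspace. The sketch below reuses the two constants the patch adds to
gianfar.h (0x00000080 for RXF0 and 0x000000ff for the whole field);
the DEMO_ prefix and the demo_* helper functions are hypothetical and
exist only for this illustration. Note that queue 0 maps to the most
significant bit of the field, so the bit for ring i is RXF0 >> i.

```c
#define DEMO_RSTAT_CLEAR_RXF0	0x00000080u	/* bit for Rx queue 0 */
#define DEMO_RSTAT_RXF_MASK	0x000000ffu	/* RXF0..RXF7 */

/* Return 1 if ring i has its RXFi bit set in the rstat snapshot. */
int demo_queue_is_active(unsigned int rstat, int i)
{
	unsigned int rxf = rstat & DEMO_RSTAT_RXF_MASK;

	return (rxf & (DEMO_RSTAT_CLEAR_RXF0 >> i)) != 0;
}

/* Count the active rings, i.e. the set bits in the RXF field. */
int demo_num_active_queues(unsigned int rstat)
{
	unsigned int rxf = rstat & DEMO_RSTAT_RXF_MASK;
	int n = 0;

	/* clear the lowest set bit on every iteration */
	for (; rxf; rxf &= rxf - 1)
		n++;
	return n;
}

/* Per-queue share of the budget; 0 when no ring is active. */
int demo_budget_per_queue(unsigned int rstat, int budget)
{
	int n = demo_num_active_queues(rstat);

	return n ? budget / n : 0;
}
```

With rings 0 and 7 active (RXF field 0x81) and a budget of 64, each
active ring gets a share of 32 instead of the 8 it would get if the
budget were divided by all eight rings.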

* [PATCH 3/4][net-next] gianfar: Remove redundant programming of [rt]xic registers
From: Claudiu Manoil @ 2013-03-19 17:40 UTC (permalink / raw)
  To: netdev; +Cc: Paul Gortmaker, David S. Miller
In-Reply-To: <1363714805-9142-1-git-send-email-claudiu.manoil@freescale.com>

In Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx coalescing registers [rt]xic
are aliased with the [rt]xic0 registers (the coalescing registers for Q0).
Program the [rt]xic registers only outside MQ_MG_MODE, which avoids programming
the coalescing registers for the Rx/Tx hw Q0 twice in a row.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 3f07dbd..e28b3e6 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1821,20 +1821,9 @@ void gfar_configure_coalescing(struct gfar_private *priv,
 {
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	u32 __iomem *baddr;
-	int i = 0;
-
-	/* Backward compatible case ---- even if we enable
-	 * multiple queues, there's only single reg to program
-	 */
-	gfar_write(&regs->txic, 0);
-	if (likely(priv->tx_queue[0]->txcoalescing))
-		gfar_write(&regs->txic, priv->tx_queue[0]->txic);
-
-	gfar_write(&regs->rxic, 0);
-	if (unlikely(priv->rx_queue[0]->rxcoalescing))
-		gfar_write(&regs->rxic, priv->rx_queue[0]->rxic);
 
 	if (priv->mode == MQ_MG_MODE) {
+		int i = 0;
 		baddr = &regs->txic0;
 		for_each_set_bit(i, &tx_mask, priv->num_tx_queues) {
 			gfar_write(baddr + i, 0);
@@ -1848,6 +1837,17 @@ void gfar_configure_coalescing(struct gfar_private *priv,
 			if (likely(priv->rx_queue[i]->rxcoalescing))
 				gfar_write(baddr + i, priv->rx_queue[i]->rxic);
 		}
+	} else {
+		/* Backward compatible case ---- even if we enable
+		 * multiple queues, there's only single reg to program
+		 */
+		gfar_write(&regs->txic, 0);
+		if (likely(priv->tx_queue[0]->txcoalescing))
+			gfar_write(&regs->txic, priv->tx_queue[0]->txic);
+
+		gfar_write(&regs->rxic, 0);
+		if (unlikely(priv->rx_queue[0]->rxcoalescing))
+			gfar_write(&regs->rxic, priv->rx_queue[0]->rxic);
 	}
 }
 
-- 
1.7.11.3

^ permalink raw reply related

* [PATCH 2/2] thermal: shorten too long mcast group name
From: Masatake YAMATO @ 2013-03-19 11:47 UTC (permalink / raw)
  To: netdev; +Cc: Masatake YAMATO
In-Reply-To: <1363693648-10015-1-git-send-email-yamato@redhat.com>

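The original name, "thermal_mc_group", is 16 characters long, which
leaves no room for the terminating NUL in a GENL_NAMSIZ (16) byte
name field.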

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 include/linux/thermal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index f0bd7f9..e3c0ae9 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -44,7 +44,7 @@
 /* Adding event notification support elements */
 #define THERMAL_GENL_FAMILY_NAME                "thermal_event"
 #define THERMAL_GENL_VERSION                    0x01
-#define THERMAL_GENL_MCAST_GROUP_NAME           "thermal_mc_group"
+#define THERMAL_GENL_MCAST_GROUP_NAME           "thermal_mc_grp"
 
 /* Default Thermal Governor */
 #if defined(CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE)
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 0/2] netlink: protection and workaround for too long mcast group name
From: Masatake YAMATO @ 2013-03-19 11:47 UTC (permalink / raw)
  To: netdev; +Cc: Masatake YAMATO

You will see garbage at the end of the line in the output of the following command:

	$ genl ctrl show | grep thermal_mc_group
        #1:  ID-0x2  name: thermal_mc_group^B

The type of structure field for "name" is char[16]:

    #define GENL_NAMSIZ	16	/* length of family name */
    ...
    struct genl_multicast_group {
            ...
	    char		name[GENL_NAMSIZ];
            ...
    };

strlen("thermal_mc_group") == 16, so the name fills the whole array and
leaves no room for the terminating NUL.

This patch series provides a protection (patch for genetlink) against this
kind of bug and a workaround (patch for thermal).

Masatake YAMATO (2):
  genetlink: trigger BUG_ON if a group name is too long
  thermal: shorten too long mcast group name

 include/linux/thermal.h | 2 +-
 net/netlink/genetlink.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

-- 
1.7.11.7

^ permalink raw reply
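
The problem described in the cover letter above, and the memchr() test
that the genetlink patch in this series relies on, can be reproduced
in userspace. This is a minimal sketch: struct demo_group and the
demo_* functions are hypothetical, and DEMO_NAMSIZ only mirrors the
GENL_NAMSIZ value of 16 quoted above.

```c
#include <string.h>

#define DEMO_NAMSIZ 16	/* same value as GENL_NAMSIZ quoted above */

/* Hypothetical stand-in for the name field of struct genl_multicast_group. */
struct demo_group {
	char name[DEMO_NAMSIZ];
};

/* Fill the fixed-size field; a name of 16 or more bytes leaves no NUL. */
void demo_set_name(struct demo_group *grp, const char *name)
{
	size_t len = strlen(name);

	if (len > sizeof(grp->name))
		len = sizeof(grp->name);
	memset(grp->name, 0, sizeof(grp->name));
	memcpy(grp->name, name, len);
}

/* Return 1 if the array holds a NUL-terminated string, 0 otherwise. */
int demo_name_is_terminated(const struct demo_group *grp)
{
	return memchr(grp->name, '\0', sizeof(grp->name)) != NULL;
}
```

With "thermal_mc_group" the check reports an unterminated field, which
is why a reader that treats the field as a C string runs past its end;
with "thermal_mc_grp" (14 characters) the check passes.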

* [PATCH 1/2] genetlink: trigger BUG_ON if a group name is too long
From: Masatake YAMATO @ 2013-03-19 11:47 UTC (permalink / raw)
  To: netdev; +Cc: Masatake YAMATO
In-Reply-To: <1363693648-10015-1-git-send-email-yamato@redhat.com>

Trigger BUG_ON if a group name, including its terminating NUL, does
not fit in GENL_NAMSIZ bytes.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 net/netlink/genetlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f2aabb6..5a55be3 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -142,6 +142,7 @@ int genl_register_mc_group(struct genl_family *family,
 	int err = 0;
 
 	BUG_ON(grp->name[0] == '\0');
+	BUG_ON(memchr(grp->name, '\0', GENL_NAMSIZ) == NULL);
 
 	genl_lock();
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH] Truncate MCAST_GRP_NAME and FAMILY_NAME char arrays as C strings
From: Masatake YAMATO @ 2013-03-19 12:27 UTC (permalink / raw)
  To: netdev; +Cc: Masatake YAMATO

This is a patch for genl command in iproute2.

You will see garbage at the end of the line in the output of the following command:

	$ genl ctrl show | grep thermal_mc_group
        #1:  ID-0x2  name: thermal_mc_group^B

The type of structure field for "name" is char[16] in kernel:

    #define GENL_NAMSIZ	16	/* length of family name */
    ...
    struct genl_multicast_group {
            ...
	    char		name[GENL_NAMSIZ];
            ...
    };

strlen("thermal_mc_group") == 16, so the name fills the whole array and
leaves no room for the terminating NUL.

This patch protects the genl process from this kind of bug by putting
a NUL char at the end of the array after receiving a message from the
kernel.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 genl/ctrl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/genl/ctrl.c b/genl/ctrl.c
index 7c42578..fd5a50a 100644
--- a/genl/ctrl.c
+++ b/genl/ctrl.c
@@ -168,6 +168,7 @@ static int print_ctrl_grp(FILE *fp, struct rtattr *arg, __u32 ctrl_ver)
 	}
 	if (tb[1]) {
 		char *name = RTA_DATA(tb[CTRL_ATTR_MCAST_GRP_NAME]);
+		name[GENL_NAMSIZ - 1] = '\0';
 		fprintf(fp, " name: %s ", name);
 	}
 	return 0;
@@ -214,6 +215,7 @@ static int print_ctrl(const struct sockaddr_nl *who, struct nlmsghdr *n,
 
 	if (tb[CTRL_ATTR_FAMILY_NAME]) {
 		char *name = RTA_DATA(tb[CTRL_ATTR_FAMILY_NAME]);
+		name[GENL_NAMSIZ - 1] = '\0';
 		fprintf(fp, "\nName: %s\n",name);
 	}
 	if (tb[CTRL_ATTR_FAMILY_ID]) {
-- 
1.7.11.7

^ permalink raw reply related
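
For comparison with the patch above, here is a userspace sketch of a
different way to guard the receiving side: instead of writing a NUL
into the received buffer, it bounds the number of characters printed.
This is an alternative shown under stated assumptions, not what the
patch does; demo_format_name() and DEMO_NAMSIZ are hypothetical names,
and the payload length is assumed to be known to the caller.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_NAMSIZ 16	/* mirrors GENL_NAMSIZ */

/*
 * Format a name received as 'len' raw bytes that may lack a terminating NUL.
 * At most DEMO_NAMSIZ - 1 characters are printed and the input is not modified.
 */
int demo_format_name(char *out, size_t outsz, const char *payload, size_t len)
{
	size_t max = len < DEMO_NAMSIZ - 1 ? len : DEMO_NAMSIZ - 1;
	size_t n = 0;

	/* stop at the first NUL if there is one inside the bound */
	while (n < max && payload[n] != '\0')
		n++;

	return snprintf(out, outsz, " name: %.*s ", (int)n, payload);
}
```

For the 16-character name followed by a stray byte, this prints the
first 15 characters, the same visible result as storing a NUL at index
GENL_NAMSIZ - 1, while never writing to or reading past the payload.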

* Re: sfc fixes for stable
From: Ben Hutchings @ 2013-03-19 17:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers, scrum-linux
In-Reply-To: <20130319.113955.1978808530160968786.davem@davemloft.net>

On Tue, 2013-03-19 at 11:39 -0400, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Tue, 19 Mar 2013 15:38:01 +0000
> 
> > The last is not yet in Linus's tree but I assume you will ask him to
> > pull from net soon.
> 
> You should really, as I do, wait until it actually hits Linus tree
> before proposing it for -stable.

Sorry, I'll re-send once that happens.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 1/1] connector: Added coredumping event to the process connector
From: Hannes Frederic Sowa @ 2013-03-19 17:09 UTC (permalink / raw)
  To: Jesper Derehag; +Cc: netdev@vger.kernel.org, zbr@ioremap.net
In-Reply-To: <DUB002-W153FA7D4CB9EB6CFF1689B3DDE90@phx.gbl>

On Tue, Mar 19, 2013 at 07:32:10AM +0000, Jesper Derehag wrote:
> > From: jderehag@hotmail.com
> > To: netdev@vger.kernel.org
> > Subject: RE: [PATCH 1/1] connector: Added coredumping event to the process connector
> > Date: Sat, 16 Mar 2013 19:08:03 +0000
> > 
> > ----------------------------------------
> > > Date: Sat, 16 Mar 2013 19:40:36 +0100
> > > From: hannes@stressinduktion.org
> > > To: jderehag@hotmail.com
> > > CC: zbr@ioremap.net; netdev@vger.kernel.org
> > > Subject: Re: [PATCH 1/1] connector: Added coredumping event to the process connector
> > >
> > > On Sat, Mar 16, 2013 at 05:57:20PM +0000, Jesper Derehag wrote:
> > > >
> > > >
> > > > > Date: Sat, 16 Mar 2013 18:03:48 +0100
> > > > > From: hannes@stressinduktion.org
> > > > > To: jderehag@hotmail.com
> > > > > CC: zbr@ioremap.net; netdev@vger.kernel.org
> > > > > Subject: Re: [PATCH 1/1] connector: Added coredumping event to the process connector
> > > > >
> > > > > On Sat, Mar 16, 2013 at 11:50:50AM +0100, Jesper Derehag wrote:
> > > > > > + ev->event_data.exit.exit_code = task->exit_code;
> > > > > > + ev->event_data.exit.exit_signal = task->exit_signal;
> > > > >
> > > > > Do these already contain meaningful values?
> > > > >
> > > >
> > > > I have to admit that they don't. And you are correct, I should add a new event struct specific to the coredump event instead of piggybacking on the exit struct. Will re-submit a patch.
> > >
> > > Hm, I am still unsure if such a patch is needed. Couldn't you test for
> > > coredump by inspecting exit_code on PROC_EVENT_EXIT?
> > 
> > *** resubmitted message due to it got dropped by vger.kernel.org ***
> > 
> > Well, what this patch adds is, I think, more a question of timing.
> > As an example, say you want to quickly detect process failures. If we only had the EXIT event, we would get notified after the dump is done, which could take minutes depending on how large the dump is.
> > If we instead watch for both EXIT and COREDUMP events, we would quickly catch any failing process, regardless of whether it is starting to coredump or has exited for some other reason.
> 
> 
> Any other comments on this before I send a v2 patch with the exit vs the coredump event struct change?

I would say just go on and submit the patch. Perhaps you can Cc someone
looking after the change in signal.c (maybe lkml). I just checked that
you don't hold any spinlocks while doing the netlink send.

^ permalink raw reply

* Re: [PATCH] ixp4xx_eth: set the device dma_coherent_mask
From: Mugunthan V N @ 2013-03-19 17:04 UTC (permalink / raw)
  To: David Miller; +Cc: aeschlimann, khc, netdev, linux-kernel, c.aeschlimann
In-Reply-To: <20130319.123359.2245221095583640859.davem@davemloft.net>

On 3/19/2013 10:03 PM, David Miller wrote:
> From: Christophe Aeschlimann<aeschlimann@gmail.com>
> Date: Tue, 19 Mar 2013 16:59:25 +0100
>
>> >Without the mask it is impossible to take the network interface up
>> >since it returns the following error:
>> >
>>> >>net eth1: coherent DMA mask is unset
>>> >>ifconfig: SIOCSIFFLAGS: Cannot allocate memory
>> >
>> >Tested on an out-of-tree ixp425 based board.
>> >
>> >Signed-off-by: Christophe Aeschlimann<c.aeschlimann@acn-group.ch>
>   ...
>> >@@ -1398,6 +1398,7 @@ static int eth_init_one(struct platform_device *pdev)
>> >  		return -ENOMEM;
>> >  
>> >  	SET_NETDEV_DEV(dev, &pdev->dev);
>> >+	dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
> Hmmm, shouldn't this be the default value, set by the bus layer or
> similar?
Neither the bus layer nor any platform code initializes this value. The same
issue applies to the CPSW driver. Previously this was done in the board or
device file, but that approach is obsolete now; we need to think about how
it can be resolved with the DT approach.

Regards
Mugunthan V N

^ permalink raw reply
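
As background for the thread above, the following userspace sketch
shows what a 32-bit coherent DMA mask expresses. DEMO_DMA_BIT_MASK
copies the idea of the kernel's DMA_BIT_MASK(n) macro; struct
demo_device and demo_dma_addr_ok() are hypothetical and only model the
check, they are not the kernel's DMA API.

```c
#include <stdint.h>

/* Userspace copy of the idea behind the kernel's DMA_BIT_MASK(n). */
#define DEMO_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device model: only the coherent mask matters here. */
struct demo_device {
	uint64_t coherent_dma_mask;	/* 0 means "unset" */
};

/* Return 1 if a buffer at bus address 'addr' is reachable by the device. */
int demo_dma_addr_ok(const struct demo_device *dev, uint64_t addr)
{
	if (!dev->coherent_dma_mask)
		return 0;	/* models the "coherent DMA mask is unset" case */
	return (addr & ~dev->coherent_dma_mask) == 0;
}
```

With the mask left at zero every allocation is refused, which matches
the symptom quoted in the thread; with a 32-bit mask any address below
4 GiB is accepted.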

* Re: [BUG][mvebu] mvneta: cannot request irq 25 on openblocks-ax3
From: Gregory CLEMENT @ 2013-03-19 16:43 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ezequiel Garcia, linux-arm-kernel, thomas.petazzoni, Jason Cooper,
	netdev, linux-kernel, yrl.pp-manager.tt@hitachi.com,
	Florian Fainelli
In-Reply-To: <514873FB.5050202@hitachi.com>

[-- Attachment #1: Type: text/plain, Size: 1401 bytes --]

On 03/19/2013 03:19 PM, Masami Hiramatsu wrote:
> Hi Ezequiel,
> 
> (2013/03/19 22:39), Ezequiel Garcia wrote:
>> Hi Masami,
>>
>> On Tue, Mar 19, 2013 at 10:12:37PM +0900, Masami Hiramatsu wrote:
>>>
>>> Here I've hit a bug on the recent kernel. As far as I know, this bug
>>> exists on 3.9-rc1 too.
>>>
>>> When I tried the latest mvebu for-next tree
>>> (git://git.infradead.org/users/jcooper/linux.git mvebu/for-next),
>>> I got below warning at bootup time and mvneta didn't work (link was never up).
>>> I ensured that "ifconfig ethX up" always caused that.
>>>
>>> Does anyone succeed to boot openblocks-ax3 recently or hit same
>>> trouble?
>>
>> This is a known bug. Gregory Clement already has a fix and he
>> will submit it soon. In case you need this fixed ASAP, I'm attaching
>> you a patch with a fix.
> 
> Thanks! I'll try that.
> 
>> Please note the attached patch is not ready for mainline inclusion,
>> as I said Gregory will submit a cleaner version soon.
> 
> Yeah, I look forward to it :)

Hi Masami,

You can try this patch if you want.
I don't have the hardware today so I didn't test it.
If you (and also Florian and Ezequiel) can test it and it fixes
the bug, then I will be able to send a proper email for it.

Thanks,
-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


[-- Attachment #2: 0001-net-mvneta-convert-to-local-interrupt.patch --]
[-- Type: text/x-diff, Size: 3140 bytes --]

From a82800cbd4f2ff34a4a03c8caa688149b8770ab7 Mon Sep 17 00:00:00 2001
From: Gregory CLEMENT <gregory.clement@free-electrons.com>
Date: Tue, 19 Mar 2013 15:11:48 +0100
Subject: [PATCH] net: mvneta: convert to local interrupt

Since commit 3a6f08a37 "arm: mvebu: Add support for local interrupt",
the mvneta interrupt is now managed as a local interrupt. That means
that the driver has to use the request_percpu_irq() function instead
of request_irq().

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvneta.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index cd345b8..ad64a50 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -256,6 +256,8 @@ struct mvneta_port {
 	unsigned int link;
 	unsigned int duplex;
 	unsigned int speed;
+
+	struct mvneta_port __percpu **percpu_pp;
 };
 
 /* The mvneta_tx_desc and mvneta_rx_desc structures describe the
@@ -1799,7 +1801,7 @@ static void mvneta_set_rx_mode(struct net_device *dev)
 /* Interrupt handling - the callback for request_irq() */
 static irqreturn_t mvneta_isr(int irq, void *dev_id)
 {
-	struct mvneta_port *pp = (struct mvneta_port *)dev_id;
+	struct mvneta_port *pp = *(struct mvneta_port **)dev_id;
 
 	/* Mask all interrupts */
 	mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
@@ -2371,8 +2373,19 @@ static void mvneta_mdio_remove(struct mvneta_port *pp)
 static int mvneta_open(struct net_device *dev)
 {
 	struct mvneta_port *pp = netdev_priv(dev);
+
 	int ret;
 
+	/* As the mvneta interrupts are local, we need to create a
+	 * percpu variable
+	 */
+	pp->percpu_pp = alloc_percpu(struct mvneta_port *);
+	if (!pp->percpu_pp) {
+		ret = -ENOMEM;
+		goto err_percpu_alloc;
+	}
+	*__this_cpu_ptr(pp->percpu_pp) = pp;
+
 	mvneta_mac_addr_set(pp, dev->dev_addr, rxq_def);
 
 	pp->pkt_size = MVNETA_RX_PKT_SIZE(pp->dev->mtu);
@@ -2385,13 +2398,15 @@ static int mvneta_open(struct net_device *dev)
 	if (ret)
 		goto err_cleanup_rxqs;
 
+
 	/* Connect to port interrupt line */
-	ret = request_irq(pp->dev->irq, mvneta_isr, 0,
-			  MVNETA_DRIVER_NAME, pp);
+	ret = request_percpu_irq(pp->dev->irq, mvneta_isr,
+				MVNETA_DRIVER_NAME, pp->percpu_pp);
 	if (ret) {
 		netdev_err(pp->dev, "cannot request irq %d\n", pp->dev->irq);
 		goto err_cleanup_txqs;
 	}
+	enable_percpu_irq(pp->dev->irq, 0);
 
 	/* In default link is down */
 	netif_carrier_off(pp->dev);
@@ -2407,11 +2422,13 @@ static int mvneta_open(struct net_device *dev)
 	return 0;
 
 err_free_irq:
+	free_percpu(pp->percpu_pp);
 	free_irq(pp->dev->irq, pp);
 err_cleanup_txqs:
 	mvneta_cleanup_txqs(pp);
 err_cleanup_rxqs:
 	mvneta_cleanup_rxqs(pp);
+err_percpu_alloc:
 	return ret;
 }
 
@@ -2422,7 +2439,8 @@ static int mvneta_stop(struct net_device *dev)
 
 	mvneta_stop_dev(pp);
 	mvneta_mdio_remove(pp);
-	free_irq(dev->irq, pp);
+	free_percpu_irq(dev->irq, pp->percpu_pp);
+	free_percpu(pp->percpu_pp);
 	mvneta_cleanup_rxqs(pp);
 	mvneta_cleanup_txqs(pp);
 	del_timer(&pp->tx_done_timer);
-- 
1.7.9.5



^ permalink raw reply related

* Re: [PATCH v3 net-next 4/4] filter: add minimal BPF JIT emitted image disassembler
From: Eric Dumazet @ 2013-03-19 16:43 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, davem, Eric Dumazet
In-Reply-To: <1363711172-9728-5-git-send-email-dborkman@redhat.com>

On Tue, 2013-03-19 at 17:39 +0100, Daniel Borkmann wrote:
> This is a minimal stand-alone user space helper that allows for debugging or
> verification of emitted BPF JIT images. This is in particular useful for
> emitted opcode debugging, since minor bugs in the JIT compiler can be fatal.
> The disassembler is architecture generic and uses libopcodes and libbfd.
> 
> How to get to the disassembly, example:
> 
>   1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
>   2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`)
>   3) Run e.g. `bpf_jit_disasm -o` to disassemble the most recent JIT code output
> 
> `bpf_jit_disasm -o` will also display the opcodes related to each
> instruction. Example for x86_64:
> 
> $ ./bpf_jit_disasm
> 94 bytes emitted from JIT compiler (pass:3, flen:9)
> ffffffffa0356000 + <x>:
>    0:	push   %rbp
>    1:	mov    %rsp,%rbp
>    4:	sub    $0x60,%rsp
>    8:	mov    %rbx,-0x8(%rbp)
>    c:	mov    0x68(%rdi),%r9d
>   10:	sub    0x6c(%rdi),%r9d
>   14:	mov    0xe0(%rdi),%r8
>   1b:	mov    $0xc,%esi
>   20:	callq  0xffffffffe0d01b71
>   25:	cmp    $0x86dd,%eax
>   2a:	jne    0x000000000000003d
>   2c:	mov    $0x14,%esi
>   31:	callq  0xffffffffe0d01b8d
>   36:	cmp    $0x6,%eax
> [...]
>   5c:	leaveq
>   5d:	retq
> 
> $ ./bpf_jit_disasm -o
> 94 bytes emitted from JIT compiler (pass:3, flen:9)
> ffffffffa0356000 + <x>:
>    0:	push   %rbp
> 	55
>    1:	mov    %rsp,%rbp
> 	48 89 e5
>    4:	sub    $0x60,%rsp
> 	48 83 ec 60
>    8:	mov    %rbx,-0x8(%rbp)
> 	48 89 5d f8
>    c:	mov    0x68(%rdi),%r9d
> 	44 8b 4f 68
>   10:	sub    0x6c(%rdi),%r9d
> 	44 2b 4f 6c
> [...]
>   5c:	leaveq
> 	c9
>   5d:	retq
> 	c3
> 
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  scripts/bpf_jit_disasm.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 216 insertions(+)
>  create mode 100644 scripts/bpf_jit_disasm.c

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH v3 net-next 3/4] filter: add ANC_PAY_OFFSET instruction for loading payload start offset
From: Eric Dumazet @ 2013-03-19 16:42 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, davem
In-Reply-To: <1363711172-9728-4-git-send-email-dborkman@redhat.com>

On Tue, 2013-03-19 at 17:39 +0100, Daniel Borkmann wrote:
> It is very useful to do dynamic truncation of packets. In particular,
> we're interested in pushing the necessary header bytes to user space and
> cutting off user payload that should probably not be transferred for some
> reason (e.g. privacy, speed, or others). With the ancillary extension
> PAY_OFFSET, we can load the payload start offset into the accumulator and
> return it. E.g. in bpfc syntax ...
> 
>         ld #poff        ; { 0x20, 0, 0, 0xfffff034 },
>         ret a           ; { 0x16, 0, 0, 0x00000000 },
> 
> ... as a filter will accomplish this without resorting to heavy hackery in
> the BPF filter itself. Follow-up JIT implementations are welcome.
> 
> Thanks to Eric Dumazet for suggesting and discussing this during the
> Netfilter Workshop in Copenhagen.
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---


Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH v3 net-next 2/4] net: flow_dissector: add __skb_get_poff to get a start offset to payload
From: Eric Dumazet @ 2013-03-19 16:42 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, davem
In-Reply-To: <1363711172-9728-3-git-send-email-dborkman@redhat.com>

On Tue, 2013-03-19 at 17:39 +0100, Daniel Borkmann wrote:
> __skb_get_poff() returns the offset to the payload as far as it could
> be dissected. The main user is currently BPF, so that we can dynamically
> truncate packets without needing to push actual payload to the user
> space and instead can analyze headers only.
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  include/linux/skbuff.h    |  2 ++
>  net/core/flow_dissector.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH v3 net-next 1/4] flow_keys: include thoff into flow_keys for later usage
From: Eric Dumazet @ 2013-03-19 16:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, davem
In-Reply-To: <1363711172-9728-2-git-send-email-dborkman@redhat.com>

On Tue, 2013-03-19 at 17:39 +0100, Daniel Borkmann wrote:
> In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
> doing the work here anyway, also store thoff for later use, e.g. in
> the BPF filter.
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  This patch also needs to go into the net tree, since Eric or Jason will
>  post a bug fix on top of this one.
> 
>  include/net/flow_keys.h   | 1 +
>  net/core/flow_dissector.c | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
> index 80461c1..bb8271d 100644
> --- a/include/net/flow_keys.h
> +++ b/include/net/flow_keys.h
> @@ -9,6 +9,7 @@ struct flow_keys {
>  		__be32 ports;
>  		__be16 port16[2];
>  	};
> +	u16 thoff;
>  	u8 ip_proto;
>  };
>  
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index f8d9e03..f4be293 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -151,6 +151,8 @@ ipv6:
>  			flow->ports = *ports;
>  	}
>  
> +	flow->thoff = (u16) nhoff;
> +
>  	return true;
>  }
>  EXPORT_SYMBOL(skb_flow_dissect);

Signed-off-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* [PATCH v3 net-next 4/4] filter: add minimal BPF JIT emitted image disassembler
From: Daniel Borkmann @ 2013-03-19 16:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, Eric Dumazet
In-Reply-To: <1363711172-9728-1-git-send-email-dborkman@redhat.com>

This is a minimal stand-alone user space helper that allows for debugging or
verification of emitted BPF JIT images. This is in particular useful for
emitted opcode debugging, since minor bugs in the JIT compiler can be fatal.
The disassembler is architecture generic and uses libopcodes and libbfd.

How to get to the disassembly, example:

  1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
  2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`)
  3) Run e.g. `bpf_jit_disasm -o` to disassemble the most recent JIT code output
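
For reference, the tool finds the image by scanning the kernel log for the
header line that the JIT prints when bpf_jit_enable is set to 2. A minimal
sketch of that parsing step, assuming the "flen=... proglen=... pass=...
image=..." layout that the sscanf() call in bpf_jit_disasm.c expects. The
names struct jit_hdr and parse_jit_hdr() are hypothetical and not part of
the patch:

```c
#include <stdio.h>

/* Hypothetical container for the fields of one JIT header line. */
struct jit_hdr {
	int flen;		/* number of BPF instructions in the filter */
	int proglen;		/* size of the emitted image in bytes */
	int pass;		/* JIT pass that produced the image */
	unsigned long image;	/* kernel address of the image */
};

/* Returns 1 if all four fields were parsed, 0 otherwise. */
static int parse_jit_hdr(const char *line, struct jit_hdr *hdr)
{
	int ret = sscanf(line, "flen=%d proglen=%d pass=%d image=%lx",
			 &hdr->flen, &hdr->proglen, &hdr->pass, &hdr->image);

	return ret == 4;
}
```

The hex dump lines that follow such a header in the log can then be read
byte by byte, e.g. with strtoul(), as the tool itself does.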

`bpf_jit_disasm -o` will also display the opcodes related to each
instruction. Example for x86_64:

$ ./bpf_jit_disasm
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
   0:	push   %rbp
   1:	mov    %rsp,%rbp
   4:	sub    $0x60,%rsp
   8:	mov    %rbx,-0x8(%rbp)
   c:	mov    0x68(%rdi),%r9d
  10:	sub    0x6c(%rdi),%r9d
  14:	mov    0xe0(%rdi),%r8
  1b:	mov    $0xc,%esi
  20:	callq  0xffffffffe0d01b71
  25:	cmp    $0x86dd,%eax
  2a:	jne    0x000000000000003d
  2c:	mov    $0x14,%esi
  31:	callq  0xffffffffe0d01b8d
  36:	cmp    $0x6,%eax
[...]
  5c:	leaveq
  5d:	retq

$ ./bpf_jit_disasm -o
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
   0:	push   %rbp
	55
   1:	mov    %rsp,%rbp
	48 89 e5
   4:	sub    $0x60,%rsp
	48 83 ec 60
   8:	mov    %rbx,-0x8(%rbp)
	48 89 5d f8
   c:	mov    0x68(%rdi),%r9d
	44 8b 4f 68
  10:	sub    0x6c(%rdi),%r9d
	44 2b 4f 6c
[...]
  5c:	leaveq
	c9
  5d:	retq
	c3

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 scripts/bpf_jit_disasm.c | 216 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 216 insertions(+)
 create mode 100644 scripts/bpf_jit_disasm.c

diff --git a/scripts/bpf_jit_disasm.c b/scripts/bpf_jit_disasm.c
new file mode 100644
index 0000000..1fe9fb5
--- /dev/null
+++ b/scripts/bpf_jit_disasm.c
@@ -0,0 +1,216 @@
+/*
+ * Minimal BPF JIT image disassembler
+ *
+ * Disassembles BPF JIT compiler emitted opcodes back to asm insn's for
+ * debugging or verification purposes.
+ *
+ * There is no Makefile. Compile with
+ *
+ *   `gcc -Wall -O2 bpf_jit_disasm.c -o bpf_jit_disasm -lopcodes -lbfd -ldl`
+ *
+ * or similar.
+ *
+ * To get the disassembly of the JIT code, do the following:
+ *
+ *  1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
+ *  2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`)
+ *  3) Run e.g. `./bpf_jit_disasm -o` to read out the last JIT code
+ *
+ * Copyright 2013 Daniel Borkmann <borkmann@redhat.com>
+ * Licensed under the GNU General Public License, version 2.0 (GPLv2)
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <unistd.h>
+#include <string.h>
+#include <bfd.h>
+#include <dis-asm.h>
+#include <sys/klog.h>
+#include <sys/types.h>
+#include <regex.h>
+
+#define VERSION_STRING	"1.0"
+
+static void get_exec_path(char *tpath, size_t size)
+{
+	char *path;
+	ssize_t len;
+
+	snprintf(tpath, size, "/proc/%d/exe", (int) getpid());
+	tpath[size - 1] = 0;
+
+	path = strdup(tpath);
+	assert(path);
+
+	len = readlink(path, tpath, size);
+	tpath[len] = 0;
+
+	free(path);
+}
+
+static void get_asm_insns(uint8_t *image, size_t len, unsigned long base,
+			  int opcodes)
+{
+	int count, i, pc = 0;
+	char tpath[256];
+	struct disassemble_info info;
+	disassembler_ftype disassemble;
+	bfd *bfdf;
+
+	memset(tpath, 0, sizeof(tpath));
+	get_exec_path(tpath, sizeof(tpath));
+
+	bfdf = bfd_openr(tpath, NULL);
+	assert(bfdf);
+	assert(bfd_check_format(bfdf, bfd_object));
+
+	init_disassemble_info(&info, stdout, (fprintf_ftype) fprintf);
+	info.arch = bfd_get_arch(bfdf);
+	info.mach = bfd_get_mach(bfdf);
+	info.buffer = image;
+	info.buffer_length = len;
+
+	disassemble_init_for_target(&info);
+
+	disassemble = disassembler(bfdf);
+	assert(disassemble);
+
+	do {
+		printf("%4x:\t", pc);
+
+		count = disassemble(pc, &info);
+
+		if (opcodes) {
+			printf("\n\t");
+			for (i = 0; i < count; ++i)
+				printf("%02x ", (uint8_t) image[pc + i]);
+		}
+		printf("\n");
+
+		pc += count;
+	} while(count > 0 && pc < len);
+
+	bfd_close(bfdf);
+}
+
+static char *get_klog_buff(int *klen)
+{
+	int ret, len = klogctl(10, NULL, 0);
+	char *buff = malloc(len);
+
+	assert(buff && klen);
+	ret = klogctl(3, buff, len);
+	assert(ret >= 0);
+	*klen = ret;
+
+	return buff;
+}
+
+static void put_klog_buff(char *buff)
+{
+	free(buff);
+}
+
+static int get_last_jit_image(char *haystack, size_t hlen,
+			      uint8_t *image, size_t ilen,
+			      unsigned long *base)
+{
+	char *ptr, *pptr, *tmp;
+	off_t off = 0;
+	int ret, flen, proglen, pass, ulen = 0;
+	regmatch_t pmatch[1];
+	regex_t regex;
+
+	if (hlen == 0)
+		return 0;
+
+	ret = regcomp(&regex, "flen=[[:alnum:]]+ proglen=[[:digit:]]+ "
+		      "pass=[[:digit:]]+ image=[[:xdigit:]]+", REG_EXTENDED);
+	assert(ret == 0);
+
+	ptr = haystack;
+	while (1) {
+		ret = regexec(&regex, ptr, 1, pmatch, 0);
+		if (ret == 0) {
+			ptr += pmatch[0].rm_eo;
+			off += pmatch[0].rm_eo;
+			assert(off < hlen);
+		} else
+			break;
+	}
+
+	ptr = haystack + off - (pmatch[0].rm_eo - pmatch[0].rm_so);
+	ret = sscanf(ptr, "flen=%d proglen=%d pass=%d image=%lx",
+		     &flen, &proglen, &pass, base);
+	if (ret != 4)
+		return 0;
+
+	tmp = ptr = haystack + off;
+	while ((ptr = strtok(tmp, "\n")) != NULL && ulen < ilen) {
+		tmp = NULL;
+		if (!strstr(ptr, "JIT code"))
+			continue;
+		pptr = ptr;
+		while ((ptr = strstr(pptr, ":")))
+			pptr = ptr + 1;
+		ptr = pptr;
+		do {
+			image[ulen++] = (uint8_t) strtoul(pptr, &pptr, 16);
+			if (ptr == pptr || ulen >= ilen) {
+				ulen--;
+				break;
+			}
+			ptr = pptr;
+		} while (1);
+	}
+
+	assert(ulen == proglen);
+	printf("%d bytes emitted from JIT compiler (pass:%d, flen:%d)\n",
+	       proglen, pass, flen);
+	printf("%lx + <x>:\n", *base);
+
+	regfree(&regex);
+	return ulen;
+}
+
+static void help(void)
+{
+	printf("Usage: bpf_jit_disasm [-ohv]\n");
+	printf("Version %s, written by Daniel Borkmann <borkmann@redhat.com>\n",
+	       VERSION_STRING);
+	printf("  -o                             Include opcodes in output\n");
+	printf("  -h|-v                          Show help/version\n");
+	exit(0);
+}
+
+int main(int argc, char **argv)
+{
+	int len, klen, opcodes = 0;
+	char *kbuff;
+	unsigned long base;
+	uint8_t image[4096];
+
+	if (argc > 1) {
+		if (!strncmp("-o", argv[argc - 1], 2))
+			opcodes = 1;
+		if (!strncmp("-h", argv[argc - 1], 2) ||
+		    !strncmp("-v", argv[argc - 1], 2))
+			help();
+	}
+
+	bfd_init();
+	memset(image, 0, sizeof(image));
+
+	kbuff = get_klog_buff(&klen);
+
+	len = get_last_jit_image(kbuff, klen, image, sizeof(image), &base);
+	if (len > 0 && base > 0)
+		get_asm_insns(image, len, base, opcodes);
+
+	put_klog_buff(kbuff);
+
+	return 0;
+}
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH v3 net-next 3/4] filter: add ANC_PAY_OFFSET instruction for loading payload start offset
From: Daniel Borkmann @ 2013-03-19 16:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet
In-Reply-To: <1363711172-9728-1-git-send-email-dborkman@redhat.com>

It is very useful to do dynamic truncation of packets. In particular,
we're interested in pushing the necessary header bytes to user space and
cutting off user payload that should probably not be transferred for some
reason (e.g. privacy, speed, or others). With the ancillary extension
PAY_OFFSET, we can load the payload start offset into the accumulator and
return it. E.g. in bpfc syntax ...

        ld #poff        ; { 0x20, 0, 0, 0xfffff034 },
        ret a           ; { 0x16, 0, 0, 0x00000000 },

... as a filter will accomplish this without resorting to heavy hackery in
the BPF filter itself. Follow-up JIT implementations are welcome.
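
As a rough user-space sketch, the two instructions above could be written
as a raw filter array like this. The struct bpf_insn type is a local
stand-in for struct sock_filter so the example builds without kernel
headers, and the array name poff_filter is made up for illustration. The
constant values are assumed to match include/uapi/linux/filter.h with this
series applied:

```c
#include <stdint.h>

/* Local mirror of struct sock_filter (hypothetical name), so that the
 * sketch does not depend on kernel headers.
 */
struct bpf_insn {
	uint16_t code;	/* opcode */
	uint8_t jt;	/* jump if true */
	uint8_t jf;	/* jump if false */
	uint32_t k;	/* generic operand */
};

/* Values assumed to match include/uapi/linux/filter.h after this patch. */
#define BPF_LD			0x00
#define BPF_W			0x00
#define BPF_ABS			0x20
#define BPF_RET			0x06
#define BPF_A			0x10
#define SKF_AD_OFF		(-0x1000)
#define SKF_AD_PAY_OFFSET	52

/* "ld #poff; ret a": return the payload offset as the snap length. */
static const struct bpf_insn poff_filter[] = {
	{ BPF_LD | BPF_W | BPF_ABS, 0, 0,
	  (uint32_t)(SKF_AD_OFF + SKF_AD_PAY_OFFSET) },
	{ BPF_RET | BPF_A, 0, 0, 0 },
};
```

On a kernel carrying this patch, an equivalent array of struct sock_filter
could be handed to setsockopt() with SO_ATTACH_FILTER via struct sock_fprog.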

Thanks to Eric Dumazet for suggesting and discussing this during the
Netfilter Workshop in Copenhagen.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/linux/filter.h      | 1 +
 include/uapi/linux/filter.h | 3 ++-
 net/core/filter.c           | 5 +++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index c45eabc..d2059cb 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -126,6 +126,7 @@ enum {
 	BPF_S_ANC_SECCOMP_LD_W,
 	BPF_S_ANC_VLAN_TAG,
 	BPF_S_ANC_VLAN_TAG_PRESENT,
+	BPF_S_ANC_PAY_OFFSET,
 };
 
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/filter.h b/include/uapi/linux/filter.h
index 9cfde69..8eb9cca 100644
--- a/include/uapi/linux/filter.h
+++ b/include/uapi/linux/filter.h
@@ -129,7 +129,8 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_ALU_XOR_X	40
 #define SKF_AD_VLAN_TAG	44
 #define SKF_AD_VLAN_TAG_PRESENT 48
-#define SKF_AD_MAX	52
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_MAX	56
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 2e20b55..dad2a17 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -348,6 +348,9 @@ load_b:
 		case BPF_S_ANC_VLAN_TAG_PRESENT:
 			A = !!vlan_tx_tag_present(skb);
 			continue;
+		case BPF_S_ANC_PAY_OFFSET:
+			A = __skb_get_poff(skb);
+			continue;
 		case BPF_S_ANC_NLATTR: {
 			struct nlattr *nla;
 
@@ -612,6 +615,7 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
 			ANCILLARY(ALU_XOR_X);
 			ANCILLARY(VLAN_TAG);
 			ANCILLARY(VLAN_TAG_PRESENT);
+			ANCILLARY(PAY_OFFSET);
 			}
 
 			/* ancillary operation unknown or unsupported */
@@ -814,6 +818,7 @@ static void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to)
 		[BPF_S_ANC_SECCOMP_LD_W] = BPF_LD|BPF_B|BPF_ABS,
 		[BPF_S_ANC_VLAN_TAG]	= BPF_LD|BPF_B|BPF_ABS,
 		[BPF_S_ANC_VLAN_TAG_PRESENT] = BPF_LD|BPF_B|BPF_ABS,
+		[BPF_S_ANC_PAY_OFFSET]	= BPF_LD|BPF_B|BPF_ABS,
 		[BPF_S_LD_W_LEN]	= BPF_LD|BPF_W|BPF_LEN,
 		[BPF_S_LD_W_IND]	= BPF_LD|BPF_W|BPF_IND,
 		[BPF_S_LD_H_IND]	= BPF_LD|BPF_H|BPF_IND,
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH v3 net-next 2/4] net: flow_dissector: add __skb_get_poff to get a start offset to payload
From: Daniel Borkmann @ 2013-03-19 16:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet
In-Reply-To: <1363711172-9728-1-git-send-email-dborkman@redhat.com>

__skb_get_poff() returns the offset to the payload as far as it could
be dissected. The main user is currently BPF, so that we can dynamically
truncate packets without needing to push actual payload to the user
space and instead can analyze headers only.
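
To illustrate the TCP case, here is a small user-space sketch of the offset
arithmetic only. It assumes the transport header offset (thoff) and the TCP
data offset field (doff) are already known. The function name
tcp_payload_offset() and the TCP_HDR_MIN_LEN constant are hypothetical, and
the real function reads doff from the skb via skb_header_pointer():

```c
#include <stdint.h>

#define TCP_HDR_MIN_LEN	20	/* size of a TCP header without options */

/* thoff: offset of the transport header in the packet.
 * doff:  TCP data offset field, counted in 32-bit words.
 * A bogus doff below 5 is clamped to the minimal header size,
 * mirroring the max_t() used in the patch.
 */
static uint32_t tcp_payload_offset(uint16_t thoff, uint8_t doff)
{
	uint32_t hdr_len = (uint32_t)doff * 4;

	if (hdr_len < TCP_HDR_MIN_LEN)
		hdr_len = TCP_HDR_MIN_LEN;

	return thoff + hdr_len;
}
```

For an Ethernet frame with a plain IPv4 header, thoff would be 34, so a TCP
header without options gives a payload offset of 54.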

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/linux/skbuff.h    |  2 ++
 net/core/flow_dissector.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eb2106f..0e84fd8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2835,6 +2835,8 @@ static inline void skb_checksum_none_assert(const struct sk_buff *skb)
 
 bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off);
 
+u32 __skb_get_poff(const struct sk_buff *skb);
+
 /**
  * skb_head_is_locked - Determine if the skb->head is locked down
  * @skb: skb to check
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f4be293..00ee068 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -5,6 +5,10 @@
 #include <linux/if_vlan.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
+#include <linux/igmp.h>
+#include <linux/icmp.h>
+#include <linux/sctp.h>
+#include <linux/dccp.h>
 #include <linux/if_tunnel.h>
 #include <linux/if_pppox.h>
 #include <linux/ppp_defs.h>
@@ -228,6 +232,59 @@ u16 __skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb,
 }
 EXPORT_SYMBOL(__skb_tx_hash);
 
+/* __skb_get_poff() returns the offset to the payload as far as it could
+ * be dissected. The main user is currently BPF, so that we can dynamically
+ * truncate packets without needing to push actual payload to the user
+ * space and can analyze headers only, instead.
+ */
+u32 __skb_get_poff(const struct sk_buff *skb)
+{
+	struct flow_keys keys;
+	u32 poff = 0;
+
+	if (!skb_flow_dissect(skb, &keys))
+		return 0;
+
+	poff += keys.thoff;
+	switch (keys.ip_proto) {
+	case IPPROTO_TCP: {
+		const struct tcphdr *tcph;
+		struct tcphdr _tcph;
+
+		tcph = skb_header_pointer(skb, poff, sizeof(_tcph), &_tcph);
+		if (!tcph)
+			return poff;
+
+		poff += max_t(u32, sizeof(struct tcphdr), tcph->doff * 4);
+		break;
+	}
+	case IPPROTO_UDP:
+	case IPPROTO_UDPLITE:
+		poff += sizeof(struct udphdr);
+		break;
+	/* For the rest, we do not really care about header
+	 * extensions at this point.
+	 */
+	case IPPROTO_ICMP:
+		poff += sizeof(struct icmphdr);
+		break;
+	case IPPROTO_ICMPV6:
+		poff += sizeof(struct icmp6hdr);
+		break;
+	case IPPROTO_IGMP:
+		poff += sizeof(struct igmphdr);
+		break;
+	case IPPROTO_DCCP:
+		poff += sizeof(struct dccp_hdr);
+		break;
+	case IPPROTO_SCTP:
+		poff += sizeof(struct sctphdr);
+		break;
+	}
+
+	return poff;
+}
+
 static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
 {
 	if (unlikely(queue_index >= dev->real_num_tx_queues)) {
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH v3 net-next 1/4] flow_keys: include thoff into flow_keys for later usage
From: Daniel Borkmann @ 2013-03-19 16:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet
In-Reply-To: <1363711172-9728-1-git-send-email-dborkman@redhat.com>

In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for later use, e.g. in
the BPF filter.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 This patch also needs to go into the net tree, since Eric or Jason will
 post a bug fix on top of this one.

 include/net/flow_keys.h   | 1 +
 net/core/flow_dissector.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index 80461c1..bb8271d 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -9,6 +9,7 @@ struct flow_keys {
 		__be32 ports;
 		__be16 port16[2];
 	};
+	u16 thoff;
 	u8 ip_proto;
 };
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f8d9e03..f4be293 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -151,6 +151,8 @@ ipv6:
 			flow->ports = *ports;
 	}
 
+	flow->thoff = (u16) nhoff;
+
 	return true;
 }
 EXPORT_SYMBOL(skb_flow_dissect);
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH v3 net-next 0/4] net: filter: BPF updates
From: Daniel Borkmann @ 2013-03-19 16:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet

This set adds i) an ancillary operation to the BPF engine and ii) a
BPF JIT image disassembler in order to verify or debug the BPF JIT
compilers under arch/*/net/.

v1 -> v2:
	- No need to reorder choke_skb_cb structure
v2 -> v3:
	- Do not touch nhoff, let it stay as is

Daniel Borkmann (4):
  flow_keys: include thoff into flow_keys for later usage
  net: flow_dissector: add __skb_get_poff to get a start offset to payload
  filter: add ANC_PAY_OFFSET instruction for loading payload start offset
  filter: add minimal BPF JIT emitted image disassembler

 include/linux/filter.h      |   1 +
 include/linux/skbuff.h      |   2 +
 include/net/flow_keys.h     |   1 +
 include/uapi/linux/filter.h |   3 +-
 net/core/filter.c           |   5 +
 net/core/flow_dissector.c   |  59 ++++++++++++
 scripts/bpf_jit_disasm.c    | 216 ++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 scripts/bpf_jit_disasm.c

-- 
1.7.11.7

^ permalink raw reply

* Re: [PATCH] ixp4xx_eth: set the device dma_coherent_mask
From: David Miller @ 2013-03-19 16:33 UTC (permalink / raw)
  To: aeschlimann; +Cc: khc, netdev, linux-kernel, c.aeschlimann
In-Reply-To: <1363708765-20778-1-git-send-email-c.aeschlimann@acn-group.ch>

From: Christophe Aeschlimann <aeschlimann@gmail.com>
Date: Tue, 19 Mar 2013 16:59:25 +0100

> Without the mask it is impossible to take the network interface up
> since it returns the following error:
> 
>> net eth1: coherent DMA mask is unset
>> ifconfig: SIOCSIFFLAGS: Cannot allocate memory
> 
> Tested on an out-of-tree ixp425 based board.
> 
> Signed-off-by: Christophe Aeschlimann <c.aeschlimann@acn-group.ch>
 ...
> @@ -1398,6 +1398,7 @@ static int eth_init_one(struct platform_device *pdev)
>  		return -ENOMEM;
>  
>  	SET_NETDEV_DEV(dev, &pdev->dev);
> +	dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);

Hmmm, shouldn't this be the default value, set by the bus layer or
similar?

^ permalink raw reply
