* Re: [PATCH] net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens
From: Shimoda, Yoshihiro @ 2012-06-21 1:26 UTC
To: Guennadi Liakhovetski; +Cc: netdev, SH-Linux
In-Reply-To: <Pine.LNX.4.64.1206201507260.20254@axis700.grange>
Hello Guennadi-san,
2012/06/20 22:10, Guennadi Liakhovetski wrote:
> Hello Shimoda-san
>
> On Tue, 29 May 2012, Shimoda, Yoshihiro wrote:
>
>> When Receive Descriptor Empty happens, the driver's rxdesc pointer
>> and the controller's actual next descriptor may not match.
>> This patch fixes it.
>
> Unfortunately, this patch breaks networking on ecovec (sh7724). Booting
> with dhcp and NFS-root progresses very slowly with lots of "nfs: server
> not responding / Ok" messages and never completes. Reverting it in current
> Linus' tree fixes the problem.
Thank you very much for the report.
The SH7724 doesn't set the RMCR register, so EDRRR is cleared every time
the controller receives a frame.
The "fix the rxdesc pointer" change therefore needed to check a condition first.
I wrote the following patch for the issue.
If possible, could you try it?
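For illustration, the condition can be modelled outside the driver roughly as below. This is only a sketch under assumptions: the demo_ names and the DEMO_EESR_RDE value are placeholders rather than the driver's definitions, and the 16-byte descriptor size is inferred from the ">> 4" in the patch.

```c
#include <stdint.h>

/* Placeholder bit for "receive descriptor empty"; the real definition
 * lives in the driver's header and may differ. */
#define DEMO_EESR_RDE   0x00020000u

/* Assumed descriptor size of 16 bytes, matching the ">> 4" in the patch. */
#define DEMO_DESC_SHIFT 4

/* Ring index of the descriptor the controller will fetch next, derived
 * from the fetch address (rdfar) and the ring base address (rdlar). */
static uint32_t demo_ring_index(uint32_t rdfar, uint32_t rdlar)
{
	return (rdfar - rdlar) >> DEMO_DESC_SHIFT;
}

/* Resynchronise the software index only when RDE was reported;
 * otherwise keep the driver's own value. */
static uint32_t demo_next_cur_rx(uint32_t cur_rx, uint32_t intr_status,
				 uint32_t rdfar, uint32_t rdlar)
{
	if (intr_status & DEMO_EESR_RDE)
		return demo_ring_index(rdfar, rdlar);
	return cur_rx;
}
```

With register values 0x1040 and 0x1000 the computed index is 4, and it only replaces cur_rx when the RDE bit is present in the interrupt status.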
Best regards,
Yoshihiro Shimoda
---
Subject: [PATCH] net: sh_eth: fix the condition to fix the cur_rx/dirty_rx
The following commit does not work if RMCR is not set to 1:
"net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens"
commit id 79fba9f51755c704c0a7d7b7f0df10874dc0a744
If RMCR is not set, the controller clears EDRRR after it receives
a frame. In this case, the driver doesn't need to fix the values of
cur_rx/dirty_rx. The driver only needs to do so when the controller
detects that the receive descriptors are empty.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
drivers/net/ethernet/renesas/sh_eth.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 667169b..79bf09b 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1011,7 +1011,7 @@ static int sh_eth_txfree(struct net_device *ndev)
}
/* Packet receive function */
-static int sh_eth_rx(struct net_device *ndev)
+static int sh_eth_rx(struct net_device *ndev, u32 intr_status)
{
struct sh_eth_private *mdp = netdev_priv(ndev);
struct sh_eth_rxdesc *rxdesc;
@@ -1102,9 +1102,11 @@ static int sh_eth_rx(struct net_device *ndev)
/* Restart Rx engine if stopped. */
/* If we don't need to check status, don't. -KDU */
if (!(sh_eth_read(ndev, EDRRR) & EDRRR_R)) {
- /* fix the values for the next receiving */
- mdp->cur_rx = mdp->dirty_rx = (sh_eth_read(ndev, RDFAR) -
- sh_eth_read(ndev, RDLAR)) >> 4;
+ /* fix the values for the next receiving if RDE is set */
+ if (intr_status & EESR_RDE)
+ mdp->cur_rx = mdp->dirty_rx =
+ (sh_eth_read(ndev, RDFAR) -
+ sh_eth_read(ndev, RDLAR)) >> 4;
sh_eth_write(ndev, EDRRR_R, EDRRR);
}
@@ -1273,7 +1275,7 @@ static irqreturn_t sh_eth_interrupt(int irq, void *netdev)
EESR_RTSF | /* short frame recv */
EESR_PRE | /* PHY-LSI recv error */
EESR_CERF)){ /* recv frame CRC error */
- sh_eth_rx(ndev);
+ sh_eth_rx(ndev, intr_status);
}
/* Tx Check */
--
1.7.1
* [PATCH 3/3 net-next] tg3: Add sysfs file to export sensor data
From: Michael Chan @ 2012-06-21 0:06 UTC
To: davem; +Cc: netdev, nsujir, mcarlson
In-Reply-To: <1340237192-30052-2-git-send-email-mchan@broadcom.com>
Some tg3 devices have management firmware that can export data such as
temperature and other real time diagnostics data. Export this data to
sysfs so that userspace can access this information.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/tg3.c | 241 +++++++++++++++++++++++++++++++++++
drivers/net/ethernet/broadcom/tg3.h | 60 +++++++++
2 files changed, 301 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index e93760c..f6c56ff 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -9538,6 +9538,182 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
return tg3_reset_hw(tp, reset_phy);
}
+static void tg3_sd_xfer(struct tg3 *tp, u32 off, u32 size)
+{
+ struct tg3_sd *sd = tp->sd;
+
+ if (!size)
+ return;
+
+ tg3_ape_scratchpad_read(tp, (u32 *) &sd->buf[off], off, size);
+}
+
+static void tg3_sd_update_host(struct tg3 *tp, struct tg3_sd_record *rec)
+{
+ tg3_sd_xfer(tp, rec->data_off, rec->data_len);
+ tg3_sd_xfer(tp, rec->hdr_off, rec->hdr_len);
+}
+
+static void tg3_sd_update_drvflags(struct tg3 *tp, bool unloading)
+{
+ struct tg3_sd *sd = tp->sd;
+ u32 flags;
+
+ if (!sd || !sd->sd_flags_off)
+ return;
+
+ tg3_ape_scratchpad_read(tp, &flags, sd->sd_flags_off, 4);
+
+ flags &= ~TG3_OCIR_DRVR_FEAT_MASK;
+
+ if (!unloading) {
+ u32 mask = NETIF_F_RXCSUM | NETIF_F_IP_CSUM |
+ NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM;
+
+ if (tp->dev->features & mask)
+ flags |= TG3_OCIR_DRVR_FEAT_CSUM;
+
+ if (tp->dev->features & NETIF_F_ALL_TSO)
+ flags |= TG3_OCIR_DRVR_FEAT_TSO;
+ }
+
+ tg3_ape_scratchpad_write(tp, sd->sd_flags_off, &flags, 4);
+}
+
+static void tg3_sd_scan_scratchpad(struct tg3 *tp, struct tg3_ocir *ocir)
+{
+ int i;
+
+ for (i = 0; i < TG3_SD_NUM_RECS; i++, ocir++) {
+ u32 off = i * TG3_OCIR_LEN, len = TG3_OCIR_LEN;
+
+ tg3_ape_scratchpad_read(tp, (u32 *) ocir, off, len);
+ off += len;
+
+ if (ocir->signature != TG3_OCIR_SIG_MAGIC ||
+ !(ocir->version_flags & TG3_OCIR_FLAG_ACTIVE))
+ memset(ocir, 0, TG3_OCIR_LEN);
+ }
+}
+
+static ssize_t tg3_sd_read(struct device *dev, struct device_attribute *attr,
+ char *buff)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct tg3 *tp = netdev_priv(netdev);
+ struct tg3_sd *sd = tp->sd;
+
+ memcpy(buff, sd->buf, sd->buf_size);
+
+ return sd->buf_size;
+}
+
+static DEVICE_ATTR(tg3_sd, 0400, tg3_sd_read, NULL);
+
+static int tg3_sd_init(struct tg3 *tp)
+{
+ int i;
+ u32 size = 0;
+ struct tg3_sd *sd;
+ struct tg3_ocir ocirs[TG3_SD_NUM_RECS];
+
+ if (!tg3_flag(tp, ENABLE_APE))
+ return 0;
+
+ tp->sd = kzalloc(sizeof(struct tg3_sd), GFP_KERNEL);
+ if (!tp->sd)
+ return -ENOMEM;
+
+ sd = tp->sd;
+ tg3_sd_scan_scratchpad(tp, ocirs);
+
+ for (i = 0; i < TG3_SD_NUM_RECS; i++) {
+ u32 val = 1;
+ struct tg3_sd_record *rec = &sd->rec[i];
+
+ if (!ocirs[i].src_data_length)
+ continue;
+
+ rec->hdr_len = ocirs[i].src_hdr_length;
+ rec->hdr_off = ocirs[i].src_hdr_offset;
+ rec->data_len = ocirs[i].src_data_length;
+ rec->data_off = ocirs[i].src_data_offset;
+
+ size += ocirs[i].src_hdr_length;
+ size += ocirs[i].src_data_length;
+
+ rec->utmr_off = i * TG3_OCIR_LEN + TG3_OCIR_UPDATE_TMR_OFF;
+ rec->rtmr_off = i * TG3_OCIR_LEN + TG3_OCIR_REFRESH_TMR_OFF;
+ rec->rtmr_int = ocirs[i].refresh_int;
+
+ /* Initialize utmr_off to non-zero so that we read the region
+ * at least once */
+ if (tg3_ape_scratchpad_write(tp, rec->utmr_off, &val, 4))
+ netdev_err(tp->dev, "write scratchpad error\n");
+
+ ocirs[i].update_tmr = 0;
+ }
+ if (!size) {
+ kfree(sd);
+ tp->sd = NULL;
+ return -ENODEV;
+ }
+
+ size += sizeof(ocirs);
+
+ sd->buf = kzalloc(size, GFP_KERNEL);
+ if (!sd->buf) {
+ kfree(sd);
+ tp->sd = NULL;
+ return -ENOMEM;
+ }
+
+ sd->buf_size = size;
+ memcpy(sd->buf, ocirs, sizeof(ocirs));
+
+ sd->sd_flags_off = 2 * TG3_OCIR_LEN +
+ (tp->pci_fn * sizeof(u32)) +
+ TG3_OCIR_PORT0_FLGS_OFF;
+
+ tg3_sd_update_drvflags(tp, false);
+ return 0;
+}
+
+static void tg3_sd_fini(struct tg3 *tp)
+{
+ struct tg3_sd *sd = tp->sd;
+
+ if (!sd)
+ return;
+
+ tg3_sd_update_drvflags(tp, true);
+
+ kfree(sd->buf);
+ kfree(sd);
+ tp->sd = NULL;
+}
+
+static void tg3_sd_close(struct tg3 *tp)
+{
+ struct tg3_sd *sd = tp->sd;
+
+ if (!sd)
+ return;
+
+ device_remove_file(&tp->pdev->dev, &dev_attr_tg3_sd);
+}
+
+static int tg3_sd_open(struct tg3 *tp)
+{
+ struct tg3_sd *sd = tp->sd;
+
+ if (!sd)
+ return -ENODEV;
+
+ return device_create_file(&tp->pdev->dev, &dev_attr_tg3_sd);
+}
+
#define TG3_STAT_ADD32(PSTAT, REG) \
do { u32 __val = tr32(REG); \
(PSTAT)->low += __val; \
@@ -9623,6 +9799,59 @@ static void tg3_chk_missed_msi(struct tg3 *tp)
}
}
+
+static void tg3_sd_timer(struct tg3 *tp)
+{
+ int i;
+ u32 val;
+ struct tg3_sd *sd = tp->sd;
+ struct tg3_ocir *ocirp = (struct tg3_ocir *) sd->buf;
+
+ if (!netif_running(tp->dev))
+ return;
+
+ for (i = 0; i < TG3_SD_NUM_RECS; i++, ocirp++) {
+ struct tg3_sd_record *rec = &sd->rec[i];
+
+ if (!rec->data_len)
+ continue;
+
+ tg3_ape_scratchpad_read(tp, &val, rec->utmr_off, 4);
+ /* Check if data has changed */
+ if (val) {
+
+ if (!rec->rtmr_int) {
+ tg3_sd_update_host(tp, rec);
+
+ rec->updated_seq++;
+ ocirp->update_tmr = rec->updated_seq;
+ } else {
+ u32 curr;
+ unsigned long tgt;
+
+ curr = tg3_ape_read32(tp, TG3_APE_STICKY_TMR);
+ tgt = rec->rtmr_val + rec->rtmr_int;
+ if (time_after((unsigned long) curr, tgt)) {
+ tg3_sd_update_host(tp, rec);
+
+ rec->rtmr_val = curr;
+ tg3_ape_scratchpad_write(tp,
+ rec->rtmr_off,
+ &curr, 4);
+
+ rec->updated_seq++;
+ ocirp->update_tmr = rec->updated_seq;
+ }
+ }
+
+ val = 0;
+ if (tg3_ape_scratchpad_write(tp, rec->utmr_off,
+ &val, 4))
+ netdev_err(tp->dev, "write scratchpad error\n");
+ }
+ }
+}
+
static void tg3_timer(unsigned long __opaque)
{
struct tg3 *tp = (struct tg3 *) __opaque;
@@ -9661,6 +9890,9 @@ static void tg3_timer(unsigned long __opaque)
if (tg3_flag(tp, 5705_PLUS))
tg3_periodic_fetch_stats(tp);
+ if (tp->sd)
+ tg3_sd_timer(tp);
+
if (tp->setlpicnt && !--tp->setlpicnt)
tg3_phy_eee_enable(tp);
@@ -10246,6 +10478,8 @@ static int tg3_open(struct net_device *dev)
tg3_phy_start(tp);
+ tg3_sd_open(tp);
+
tg3_full_lock(tp, 0);
tg3_timer_start(tp);
@@ -10295,6 +10529,8 @@ static int tg3_close(struct net_device *dev)
tg3_timer_stop(tp);
+ tg3_sd_close(tp);
+
tg3_phy_stop(tp);
tg3_full_lock(tp, 1);
@@ -15945,6 +16181,8 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
tg3_timer_init(tp);
+ tg3_sd_init(tp);
+
err = register_netdev(dev);
if (err) {
dev_err(&pdev->dev, "Cannot register net device, aborting\n");
@@ -16039,6 +16277,9 @@ static void __devexit tg3_remove_one(struct pci_dev *pdev)
}
unregister_netdev(dev);
+
+ tg3_sd_fini(tp);
+
if (tp->aperegs) {
iounmap(tp->aperegs);
tp->aperegs = NULL;
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index d167a1c..61a8f71 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -2379,6 +2379,18 @@
#define TG3_APE_LOCK_PHY3 5
#define TG3_APE_LOCK_GPIO 7
+/* SD flags */
+#define TG3_OCIR_SIG_MAGIC 0x5253434f
+#define TG3_OCIR_FLAG_ACTIVE 0x00000001
+
+#define TG3_OCIR_DRVR_FEAT_CSUM 0x00000001
+#define TG3_OCIR_DRVR_FEAT_TSO 0x00000002
+#define TG3_OCIR_DRVR_FEAT_MASK 0xff
+
+#define TG3_OCIR_REFRESH_TMR_OFF 0x00000008
+#define TG3_OCIR_UPDATE_TMR_OFF 0x0000000c
+#define TG3_OCIR_PORT0_FLGS_OFF 0x0000002c
+
#define TG3_EEPROM_SB_F1R2_MBA_OFF 0x10
@@ -2677,6 +2689,52 @@ struct tg3_hw_stats {
u8 __reserved4[0xb00-0x9c8];
};
+#define TG3_SD_NUM_RECS 3
+#define TG3_OCIR_LEN (sizeof(struct tg3_ocir))
+
+
+struct tg3_ocir {
+ u32 signature;
+ u16 version_flags;
+ u16 refresh_int;
+ u32 refresh_tmr;
+ u32 update_tmr;
+ u32 dst_base_addr;
+ u16 src_hdr_offset;
+ u16 src_hdr_length;
+ u16 src_data_offset;
+ u16 src_data_length;
+ u16 dst_hdr_offset;
+ u16 dst_data_offset;
+ u16 dst_reg_upd_offset;
+ u16 dst_sem_offset;
+ u32 reserved1[2];
+ u32 port0_flags;
+ u32 port1_flags;
+ u32 port2_flags;
+ u32 port3_flags;
+ u32 reserved2[1];
+};
+
+struct tg3_sd_record {
+ u16 hdr_off;
+ u16 hdr_len;
+ u16 data_off;
+ u16 data_len;
+ u32 updated_seq;
+ u16 utmr_off;
+ u16 rtmr_off;
+ u32 rtmr_val;
+ u16 rtmr_int;
+};
+
+struct tg3_sd {
+ struct tg3_sd_record rec[TG3_SD_NUM_RECS];
+ u32 sd_flags_off;
+ int buf_size;
+ u8 *buf;
+};
+
/* 'mapping' is superfluous as the chip does not write into
* the tx/rx post rings so we could just fetch it from there.
* But the cache behavior is better how we are doing it now.
@@ -3212,6 +3270,8 @@ struct tg3 {
const char *fw_needed;
const struct firmware *fw;
u32 fw_len; /* includes BSS */
+
+ struct tg3_sd *sd;
};
#endif /* !(_T3_H) */
--
1.7.1
* [PATCH 1/3 net-next] tg3: Add common function tg3_ape_event_lock()
From: Michael Chan @ 2012-06-21 0:06 UTC
To: davem; +Cc: netdev, nsujir, mcarlson
From: Matt Carlson <mcarlson@broadcom.com>
Add the common function tg3_ape_event_lock() by refactoring code in
tg3_ape_send_event(). The common function will be used in subsequent
patches.
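The timeout handling in tg3_ape_event_lock() can be sketched in userspace as follows. This is a simplified model, not the driver code: the demo_ names are hypothetical, the APE lock and register reads are replaced by a counter of how many polls the event stays pending, and the udelay(10) is omitted.

```c
#include <stdint.h>

/*
 * Model of the countdown loop: poll until the event is no longer pending
 * or the timeout budget (in microseconds) is used up.
 *
 * busy_polls is a hypothetical stand-in for the hardware: the number of
 * polls for which the event still reads as pending.
 * polls (out) counts how many times the status was checked.
 *
 * Returns 0 on success, -1 if the budget ran out (the driver uses -EBUSY).
 */
static int demo_event_wait(unsigned int busy_polls, uint32_t timeout_us,
			   unsigned int *polls)
{
	*polls = 0;

	while (timeout_us) {
		(*polls)++;

		if (busy_polls == 0)
			break;		/* event serviced with budget left */
		busy_polls--;

		/* The driver delays 10 us here; subtract without underflow. */
		timeout_us -= (timeout_us > 10) ? 10 : timeout_us;
	}

	return timeout_us ? 0 : -1;
}
```

A budget of 25 us with a 10 us step allows three polls in this model, since the final step subtracts only the remaining 5 us.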
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/tg3.c | 56 ++++++++++++++++++++---------------
1 files changed, 32 insertions(+), 24 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index e47ff8b..7c515db 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -730,44 +730,52 @@ static void tg3_ape_unlock(struct tg3 *tp, int locknum)
tg3_ape_write32(tp, gnt + 4 * locknum, bit);
}
-static void tg3_ape_send_event(struct tg3 *tp, u32 event)
+static int tg3_ape_event_lock(struct tg3 *tp, u32 timeout_us)
{
- int i;
u32 apedata;
- /* NCSI does not support APE events */
- if (tg3_flag(tp, APE_HAS_NCSI))
- return;
+ while (timeout_us) {
+ if (tg3_ape_lock(tp, TG3_APE_LOCK_MEM))
+ return -EBUSY;
+
+ apedata = tg3_ape_read32(tp, TG3_APE_EVENT_STATUS);
+ if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
+ break;
+
+ tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+
+ udelay(10);
+ timeout_us -= (timeout_us > 10) ? 10 : timeout_us;
+ }
+
+ return timeout_us ? 0 : -EBUSY;
+}
+
+static int tg3_ape_send_event(struct tg3 *tp, u32 event)
+{
+ int err;
+ u32 apedata;
apedata = tg3_ape_read32(tp, TG3_APE_SEG_SIG);
if (apedata != APE_SEG_SIG_MAGIC)
- return;
+ return -EAGAIN;
apedata = tg3_ape_read32(tp, TG3_APE_FW_STATUS);
if (!(apedata & APE_FW_STATUS_READY))
- return;
+ return -EAGAIN;
/* Wait for up to 1 millisecond for APE to service previous event. */
- for (i = 0; i < 10; i++) {
- if (tg3_ape_lock(tp, TG3_APE_LOCK_MEM))
- return;
-
- apedata = tg3_ape_read32(tp, TG3_APE_EVENT_STATUS);
-
- if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
- tg3_ape_write32(tp, TG3_APE_EVENT_STATUS,
- event | APE_EVENT_STATUS_EVENT_PENDING);
+ err = tg3_ape_event_lock(tp, 1000);
+ if (err)
+ return err;
- tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+ tg3_ape_write32(tp, TG3_APE_EVENT_STATUS,
+ event | APE_EVENT_STATUS_EVENT_PENDING);
- if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
- break;
+ tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+ tg3_ape_write32(tp, TG3_APE_EVENT, APE_EVENT_1);
- udelay(100);
- }
-
- if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
- tg3_ape_write32(tp, TG3_APE_EVENT, APE_EVENT_1);
+ return 0;
}
static void tg3_ape_driver_state_change(struct tg3 *tp, int kind)
--
1.7.1
* [PATCH 2/3 net-next] tg3: Add APE scratchpad read and write functions.
From: Michael Chan @ 2012-06-21 0:06 UTC
To: davem; +Cc: netdev, nsujir, mcarlson
In-Reply-To: <1340237192-30052-1-git-send-email-mchan@broadcom.com>
From: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/tg3.c | 137 +++++++++++++++++++++++++++++++++++
drivers/net/ethernet/broadcom/tg3.h | 10 ++-
2 files changed, 145 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 7c515db..e93760c 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -751,6 +751,143 @@ static int tg3_ape_event_lock(struct tg3 *tp, u32 timeout_us)
return timeout_us ? 0 : -EBUSY;
}
+static int tg3_ape_wait_for_event(struct tg3 *tp, u32 timeout_us)
+{
+ u32 i, apedata;
+
+ for (i = 0; i < timeout_us / 10; i++) {
+ apedata = tg3_ape_read32(tp, TG3_APE_EVENT_STATUS);
+
+ if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
+ break;
+
+ udelay(10);
+ }
+
+ return i == timeout_us / 10;
+}
+
+int tg3_ape_scratchpad_read(struct tg3 *tp, u32 *data, u32 base_off, u32 len)
+{
+ int err;
+ u32 i, bufoff, msgoff, maxlen, apedata;
+
+ if (!tg3_flag(tp, APE_HAS_NCSI))
+ return 0;
+
+ apedata = tg3_ape_read32(tp, TG3_APE_SEG_SIG);
+ if (apedata != APE_SEG_SIG_MAGIC)
+ return -ENODEV;
+
+ apedata = tg3_ape_read32(tp, TG3_APE_FW_STATUS);
+ if (!(apedata & APE_FW_STATUS_READY))
+ return -EAGAIN;
+
+ bufoff = tg3_ape_read32(tp, TG3_APE_SEG_MSG_BUF_OFF) +
+ TG3_APE_SHMEM_BASE;
+ msgoff = bufoff + 2 * sizeof(u32);
+ maxlen = tg3_ape_read32(tp, TG3_APE_SEG_MSG_BUF_LEN);
+
+ while (len) {
+ u32 length;
+
+ /* Cap xfer sizes to scratchpad limits. */
+ length = (len > maxlen) ? maxlen : len;
+ len -= length;
+
+ apedata = tg3_ape_read32(tp, TG3_APE_FW_STATUS);
+ if (!(apedata & APE_FW_STATUS_READY))
+ return -EAGAIN;
+
+ /* Wait for up to 1 msec for APE to service previous event. */
+ err = tg3_ape_event_lock(tp, 1000);
+ if (err)
+ return err;
+
+ apedata = APE_EVENT_STATUS_DRIVER_EVNT |
+ APE_EVENT_STATUS_SCRTCHPD_READ |
+ APE_EVENT_STATUS_EVENT_PENDING;
+ tg3_ape_write32(tp, TG3_APE_EVENT_STATUS, apedata);
+
+ tg3_ape_write32(tp, bufoff, base_off);
+ tg3_ape_write32(tp, bufoff + sizeof(u32), length);
+
+ tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+ tg3_ape_write32(tp, TG3_APE_EVENT, APE_EVENT_1);
+
+ base_off += length;
+
+ if (tg3_ape_wait_for_event(tp, 30000))
+ return -EAGAIN;
+
+ for (i = 0; length; i += 4, length -= 4) {
+ u32 val = tg3_ape_read32(tp, msgoff + i);
+ memcpy(data, &val, sizeof(u32));
+ data++;
+ }
+ }
+
+ return 0;
+}
+
+int tg3_ape_scratchpad_write(struct tg3 *tp, u32 dstoff, u32 *data, u32 len)
+{
+ u32 i, bufoff, msgoff, maxlen, apedata;
+
+ if (!tg3_flag(tp, APE_HAS_NCSI))
+ return 0;
+
+ apedata = tg3_ape_read32(tp, TG3_APE_SEG_SIG);
+ if (apedata != APE_SEG_SIG_MAGIC)
+ return -ENODEV;
+
+ apedata = tg3_ape_read32(tp, TG3_APE_FW_STATUS);
+ if (!(apedata & APE_FW_STATUS_READY))
+ return -EAGAIN;
+
+ bufoff = tg3_ape_read32(tp, TG3_APE_SEG_MSG_BUF_OFF) +
+ TG3_APE_SHMEM_BASE;
+ msgoff = bufoff + 2 * sizeof(u32);
+ maxlen = tg3_ape_read32(tp, TG3_APE_SEG_MSG_BUF_LEN);
+
+ while (len) {
+ int err;
+ u32 length;
+
+ /* Cap xfer sizes to scratchpad limits. */
+ length = (len > maxlen) ? maxlen : len;
+ len -= length;
+
+ /* Wait for up to 1 millisecond for
+ * APE to service previous event.
+ */
+ err = tg3_ape_event_lock(tp, 1000);
+ if (err)
+ return err;
+
+ tg3_ape_write32(tp, bufoff, dstoff);
+ tg3_ape_write32(tp, bufoff + sizeof(u32), length);
+ apedata = msgoff;
+
+ dstoff += length;
+
+ for (i = 0; length; i += 4, length -= sizeof(u32)) {
+ tg3_ape_write32(tp, apedata, *data++);
+ apedata += sizeof(u32);
+ }
+
+ apedata = APE_EVENT_STATUS_DRIVER_EVNT |
+ APE_EVENT_STATUS_SCRTCHPD_WRITE |
+ APE_EVENT_STATUS_EVENT_PENDING;
+ tg3_ape_write32(tp, TG3_APE_EVENT_STATUS, apedata);
+
+ tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+ tg3_ape_write32(tp, TG3_APE_EVENT, APE_EVENT_1);
+ }
+
+ return 0;
+}
+
static int tg3_ape_send_event(struct tg3 *tp, u32 event)
{
int err;
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 93865f8..d167a1c 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -2311,10 +2311,12 @@
#define APE_LOCK_REQ_DRIVER 0x00001000
#define TG3_APE_LOCK_GRANT 0x004c
#define APE_LOCK_GRANT_DRIVER 0x00001000
-#define TG3_APE_SEG_SIG 0x4000
-#define APE_SEG_SIG_MAGIC 0x41504521
+#define TG3_APE_STICKY_TMR 0x00b0
/* APE shared memory. Accessible through BAR1 */
+#define TG3_APE_SHMEM_BASE 0x4000
+#define TG3_APE_SEG_SIG 0x4000
+#define APE_SEG_SIG_MAGIC 0x41504521
#define TG3_APE_FW_STATUS 0x400c
#define APE_FW_STATUS_READY 0x00000100
#define TG3_APE_FW_FEATURES 0x4010
@@ -2327,6 +2329,8 @@
#define APE_FW_VERSION_REVMSK 0x0000ff00
#define APE_FW_VERSION_REVSFT 8
#define APE_FW_VERSION_BLDMSK 0x000000ff
+#define TG3_APE_SEG_MSG_BUF_OFF 0x401c
+#define TG3_APE_SEG_MSG_BUF_LEN 0x4020
#define TG3_APE_HOST_SEG_SIG 0x4200
#define APE_HOST_SEG_SIG_MAGIC 0x484f5354
#define TG3_APE_HOST_SEG_LEN 0x4204
@@ -2353,6 +2357,8 @@
#define APE_EVENT_STATUS_DRIVER_EVNT 0x00000010
#define APE_EVENT_STATUS_STATE_CHNGE 0x00000500
+#define APE_EVENT_STATUS_SCRTCHPD_READ 0x00001600
+#define APE_EVENT_STATUS_SCRTCHPD_WRITE 0x00001700
#define APE_EVENT_STATUS_STATE_START 0x00010000
#define APE_EVENT_STATUS_STATE_UNLOAD 0x00020000
#define APE_EVENT_STATUS_STATE_WOL 0x00030000
--
1.7.1
* accept_ra_rt_info_max_plen default value
From: Jiri Bohac @ 2012-06-20 23:07 UTC
To: yoshfuji; +Cc: Teran McKinney, Pekka Savola, David Miller, netdev
Hi,
I have been looking for the reason behind the default of
accept_ra_rt_info_max_plen being 0. No luck.
The feature was introduced by 930d6ff2 ([IPV6]: ROUTE: Add
accept_ra_rt_info_max_plen sysctl).
The only relevant discussion I found was
http://markmail.org/message/5m34bfzhox6y5lcf
with no explanation.
I imagine that the motivation for setting
accept_ra_rt_info_max_plen to 0 would be security concerns (?).
However, RFC 4191, section "6. Security Considerations", concludes
that the new features don't increase the risks already present:
A malicious node could send Router Advertisement messages, specifying
a High Default Router Preference or carrying specific routes, with
the effect of pulling traffic away from legitimate routers. However,
a malicious node could easily achieve this same effect in other ways.
For example, it could fabricate Router Advertisement messages with a
zero Router Lifetime from the other routers, causing hosts to stop
using the other routes. By advertising a specific prefix, this
attack could be carried out in a less noticeable way. However, this
attack has no significant incremental impact on Internet
infrastructure security.
RFC 6434 has been published since, and Section 5.3 says:
Small Office/Home Office (SOHO) deployments supported by routers
adhering to [RFC6204] use RFC 4191 to advertise routes to certain
local destinations. Consequently, nodes that will be deployed in
SOHO environments SHOULD implement RFC 4191.
Shouldn't the default value of accept_ra_rt_info_max_plen be
re-considered to comply with RFC 6434 by default? Any reason not
to make it 128?
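To make the effect of the default concrete, here is a minimal sketch of the acceptance check, assuming the sysctl acts as an upper bound on the prefix length of a Route Information option. The function name is made up for illustration and this is not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a Route Information option would be honoured.
 * prefix_len is the prefix length carried in the option (0..128);
 * max_plen models the accept_ra_rt_info_max_plen setting.
 * Assumption: options with a longer prefix than max_plen are ignored.
 */
static bool demo_accept_route_info(uint8_t prefix_len, int max_plen)
{
	if (prefix_len > 128)
		return false;	/* not a valid IPv6 prefix length */

	return (int)prefix_len <= max_plen;
}
```

Under this assumption a value of 0 accepts only a zero-length prefix, while 128 accepts every valid prefix length.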
Thanks,
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ
* RE: [PATCH net-next 2/6] bnx2x: link cleanup
From: Joe Perches @ 2012-06-20 22:41 UTC
To: Yuval Mintz
Cc: davem@davemloft.net, netdev@vger.kernel.org, Eilon Greenstein,
Yaniv Rosner
In-Reply-To: <979A8436335E3744ADCD3A9F2A2B68A5029F62@SJEXCHMB10.corp.ad.broadcom.com>
On Wed, 2012-06-20 at 17:50 +0000, Yuval Mintz wrote:
> > > 3. Change msleep(1) --> usleep_range(1000, 1000)
> >
> > I believe replacing msleep(small) with
> > usleep_range(small * 1000, small * 1000) is
> > not generally a good idea.
> >
> > Please give usleep_range an actual range to
> > work with and not a repeated single value.
> >
> > Please think a little more about what a
> > good upper range for the maximum time to
> > sleep should be.
> >
> > usleep_range(small * 1000, small * 2000)
> > or something similar maybe.
> >
>
> Sounds good. I'll change it and re-send the patch series.
Hi Yuval.
Here's a little script from a while ago that
does it by doubling the small value to get
the upper bound of the range.
http://kerneltrap.org/mailarchive/linux-netdev/2010/12/2/6290711
(replace [path] as appropriate)
$ grep -nPrl --include=*.[ch] "msleep\s*\(\s*1?\d\s*\)" [path] | \
xargs perl -p -i -e 's/msleep\s*\(\s*(1?\d)\s*\)/"usleep_range\(${1}000, " . scalar($1) * 2 . "000\)"/ge'
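In C terms, the substitution the script performs amounts to the following. This is a sketch only; the struct and function names are hypothetical and exist just to show the bounds that end up in usleep_range().

```c
/* Bounds, in microseconds, passed to usleep_range() after the conversion. */
struct demo_sleep_range {
	unsigned long min_us;
	unsigned long max_us;
};

/* msleep(ms) becomes usleep_range(ms * 1000, ms * 2000): the lower bound
 * keeps the original delay and the upper bound is double that. */
static struct demo_sleep_range demo_msleep_to_range(unsigned int ms)
{
	struct demo_sleep_range r;

	r.min_us = ms * 1000UL;
	r.max_us = ms * 2000UL;
	return r;
}
```

So msleep(1) turns into usleep_range(1000, 2000), which gives the timer code a real window to work with instead of a single repeated value.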
* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-20 22:29 UTC
To: eric.dumazet; +Cc: shemminger, netdev
In-Reply-To: <1340195920.4604.918.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 14:38:40 +0200
> Problem could happen if sk->sk_rx_dst is freed while some packets are
> still in napi or socket backlog (can happen with some network
> reordering)
>
> 1) Socket backlog must be flushed before sk->sk_rx_dst freeing
>
> 2) Even if we move rcu_read_lock() in net_rx_action(), we need some
> napi_gro_forcedstrefs() in case we softnet_break
>
> Or maybe just use napi_gro_flush() ?
Good catch, but I've just figured out a more fundamental issue
with doing this at the GRO layer.
The IPV4 input path is going to undo our early socket demux by
orphaning the SKB in ip_rcv(). So we'll end up looking up the
socket twice.
* Re: [net-next 0/9][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-06-20 22:26 UTC
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1340181903-16382-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 20 Jun 2012 01:44:54 -0700
> This series contains updates to e1000, igb and ixgbe
>
> The following are changes since commit 41063e9dd11956f2d285e12e4342e1d232ba0ea2:
> ipv4: Early TCP socket demux.
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
Pulled, thanks Jeff.
* Re: divide by 0 error in igbvf_set_coalesce - ab50a2a
From: David Ahern @ 2012-06-20 22:21 UTC
To: Williams, Mitch A; +Cc: netdev@vger.kernel.org
In-Reply-To: <AAEA33E297BCAC4B9BB20A7C2DF0AB8D15B30910@FMSMSX107.amr.corp.intel.com>
On 6/18/12 2:45 PM, Williams, Mitch A wrote:
> Thanks for letting me know, David. I'll look into it and get a patch out soon. Shouldn't be that big of a deal to fix.
Could you CC me on the patch so I know when it's fixed? I have enough
events to poll.
>
> In the meantime, my advice to you is, "Don't do that."
Uh, yea. Figured that part out. ;-)
Thanks,
David
* [PATCH] r8169: RxConfig hack for the 8168evl.
From: Francois Romieu @ 2012-06-20 22:09 UTC
To: Hayes Wang; +Cc: netdev, thomas.pi
The 8168evl (RTL_GIGA_MAC_VER_34) based Gigabyte GA-990FXA motherboards
are very prone to NETDEV watchdog problems without this change. See
https://bugzilla.kernel.org/show_bug.cgi?id=42899 for instance.
I don't know why it *works*. It's depressingly effective though.
For the record:
- the problem may go along with IOMMU (AMD-Vi) errors, but those really look
like a red herring.
- the patch sets the RX_MULTI_EN bit. If the 8168c doc is any guide,
the chipset now fetches several Rx descriptors at a time.
- long ago the driver ignored the RX_MULTI_EN bit.
e542a2269f232d61270ceddd42b73a4348dee2bb changed the RxConfig
settings. Whatever the problem it's now labeled a regression.
- Realtek's own driver can identify two different 8168evl devices
(CFG_METHOD_16 and CFG_METHOD_17) where the r8169 driver only
sees one. It sucks.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
Hayes, any hindsight ?
drivers/net/ethernet/realtek/r8169.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7260aa7..d7a04e0 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3894,6 +3894,7 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_22:
case RTL_GIGA_MAC_VER_23:
case RTL_GIGA_MAC_VER_24:
+ case RTL_GIGA_MAC_VER_34:
RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
break;
default:
--
1.7.10.2
* Re: [net 0/3][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-06-20 22:09 UTC
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1340181882-16333-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 20 Jun 2012 01:44:39 -0700
> This series contains fixes to igb, ixgbe and intel/Kconfig
>
> The following are changes since commit 2c995ff892313009e336ecc8ec3411022f5b1c39:
> batman-adv: fix skb->data assignment
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master
>
> Alexander Duyck (1):
> ixgbe: Fix memory leak in ixgbe when receiving traffic on DDP enabled
> rings
>
> Carolyn Wyborny (2):
> igb: Fix incorrect RAR address entries for i210/i211 device.
> Kconfig: Fix Kconfig for Intel ixgbe and igb PTP support.
Pulled, thanks Jeff.
* Re: [PATCH net-next] inetpeer: inetpeer_invalidate_tree() cleanup
From: David Miller @ 2012-06-20 21:39 UTC
To: eric.dumazet; +Cc: netdev, steffen.klassert
In-Reply-To: <1340200930.4604.1028.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 16:02:10 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> No need to use cmpxchg() in inetpeer_invalidate_tree() since we hold
> base lock.
>
> Also use correct rcu annotations to remove sparse errors
> (CONFIG_SPARSE_RCU_POINTER=y)
>
> net/ipv4/inetpeer.c:144:19: error: incompatible types in comparison
> expression (different address spaces)
> net/ipv4/inetpeer.c:149:20: error: incompatible types in comparison
> expression (different address spaces)
> net/ipv4/inetpeer.c:595:10: error: incompatible types in comparison
> expression (different address spaces)
>
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.
* Re: [patch net-next 0/2] team: two RCU fixups
From: David Miller @ 2012-06-20 21:27 UTC
To: eric.dumazet; +Cc: jpirko, netdev, jbrouer, paulmck, wfg
In-Reply-To: <1340227176.4604.1913.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 23:19:36 +0200
> On Wed, 2012-06-20 at 14:05 -0700, David Miller wrote:
>> From: Jiri Pirko <jpirko@redhat.com>
>> Date: Wed, 20 Jun 2012 17:31:59 +0200
>>
>> > Jiri Pirko (2):
>> > team: use rcu_access_pointer to access RCU pointer by writer
>> > team: use RCU_INIT_POINTER for NULL assignment of RCU pointer
>>
>> Applied, but this makes your subsequent patch not apply.
>
> I reviewed them and spotted problems, and you applied them...
>
> Then Jiri sent an update.
Sorry, I'll fix this up.
* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-20 21:26 UTC
To: eric.dumazet; +Cc: shemminger, bhutchings, netdev
In-Reply-To: <1340227076.4604.1905.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 23:17:56 +0200
> In most router setups I used, I had to disable GRO, because 64 Kbyte
> packets on the output path broke the tc setups (SFQ)
Then you speak of bugs and mis-features, rather than real fundamental
disadvantages of using GRO on a router :-)
> netfilter cost was hardly a problem, once correctly done.
But cost is not zero, and if you can divide it by N then you do it.
And GRO is what allows this.
Every demux, lookup, etc. is transaction cost.
Even routing cache lookup with no dst reference, which is _very_
cheap, takes up a serious amount of cpu cycles. Enough that we think
early demux is worth it, right?
And such a routing cache lookup is significantly cheaper than a trip
down into netfilter.
* Re: [patch net-next 0/2] team: two RCU fixups
From: Eric Dumazet @ 2012-06-20 21:19 UTC
To: David Miller; +Cc: jpirko, netdev, jbrouer, paulmck, wfg
In-Reply-To: <20120620.140516.2004640533824596305.davem@davemloft.net>
On Wed, 2012-06-20 at 14:05 -0700, David Miller wrote:
> From: Jiri Pirko <jpirko@redhat.com>
> Date: Wed, 20 Jun 2012 17:31:59 +0200
>
> > Jiri Pirko (2):
> > team: use rcu_access_pointer to access RCU pointer by writer
> > team: use RCU_INIT_POINTER for NULL assignment of RCU pointer
>
> Applied, but this makes your subsequent patch not apply.
I reviewed them and spotted problems, and you applied them...
Then Jiri sent an update.
* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: Eric Dumazet @ 2012-06-20 21:17 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, bhutchings, netdev
In-Reply-To: <20120620140454.36847c65@s6510.linuxnetplumber.net>
On Wed, 2012-06-20 at 14:04 -0700, Stephen Hemminger wrote:
> On Wed, 20 Jun 2012 14:01:21 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 20 Jun 2012 20:40:04 +0200
> >
> > > If someone wants to tune its linux router, he probably already disables
> > > GRO because of various issues with too big packets.
> > >
> > > GRO adds a significant cost to forwarding path.
> >
> > No, Ben is right Eric. GRO decreases the costs, because it means we
> > only need to make one forwarding/netfilter/classification decision for
> > N packets instead of 1.
>
> GRO is also important for routers that interact with VM's.
> It helps reduce the per-packet wakeup of the guest VM's.
I spoke of mere routers, I was _not_ saying GRO is useless.
In most router setups I used, I had to disable GRO, because 64Kbytes
packets on output path broke the tc setups (SFQ)
netfilter cost was hardly a problem, once correctly done.
* Re: [patch net-next 0/2] team: two RCU fixups
From: David Miller @ 2012-06-20 21:05 UTC (permalink / raw)
To: jpirko; +Cc: netdev, eric.dumazet, jbrouer, paulmck, wfg
In-Reply-To: <1340206321-5986-1-git-send-email-jpirko@redhat.com>
From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 20 Jun 2012 17:31:59 +0200
> Jiri Pirko (2):
> team: use rcu_access_pointer to access RCU pointer by writer
> team: use RCU_INIT_POINTER for NULL assignment of RCU pointer
Applied, but this makes your subsequent patch not apply.
* Re: [PATCH v3] ipv4: Early TCP socket demux.
From: Julian Anastasov @ 2012-06-20 21:10 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120620.023002.243497856926894946.davem@davemloft.net>
Hello,
On Wed, 20 Jun 2012, David Miller wrote:
> > Date: Wed, 20 Jun 2012 10:00:37 +0300 (EEST)
> >
> >> if (skb->dev != dst->dev)
> >> dst = NULL;
> >
> > That makes the most sense.
>
> Doesn't work, dst->dev is &net->loopback_dev for these locally
> destined input routes.
I see, correct.
> We have to instead check rt->rt_iif or similar.
Yes, rt_iif should be valid for packets with
skb->dst = NULL. It is incorrect only on loopback
traffic diverted to "lo", i.e. when skb->dst != NULL.
But it concerns UDP which is not handled by GRO yet.
When UDP support for GRO is implemented
dev_gro_receive() should additionally check skb_dst
to ignore the local copy of broadcast/multicast traffic sent via
ip_mc_output -> ip_dev_loopback_xmit because
in this case dst->dev and skb->dst can be eth0
where NETIF_F_GRO can be set.
Regards
--
Julian Anastasov <ja@ssi.bg>
* Re: [patch net-next] team: do RCU update path fixups
From: David Miller @ 2012-06-20 21:05 UTC (permalink / raw)
To: eric.dumazet; +Cc: jpirko, netdev, jbrouer, paulmck, wfg
In-Reply-To: <1340219643.4604.1641.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 21:14:03 +0200
> On Wed, 2012-06-20 at 20:39 +0200, Jiri Pirko wrote:
>> Use rcu_access_pointer and rcu_dereference_protected
>> to access RCU pointer by updater.
>> Use RCU_INIT_POINTER for NULL assignment of RCU pointer.
>>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>> ---
>> drivers/net/team/team_mode_activebackup.c | 8 ++++++--
>> drivers/net/team/team_mode_loadbalance.c | 14 ++++++++++----
>> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> Seems good to me, thanks.
>
> Acked-by: Eric Dumazet <edumazet@google.com>
This patch doesn't apply after the 2 patch set you sent right
before this one.
* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: Stephen Hemminger @ 2012-06-20 21:04 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, bhutchings, netdev
In-Reply-To: <20120620.140121.1603737472432326278.davem@davemloft.net>
On Wed, 20 Jun 2012 14:01:21 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 20 Jun 2012 20:40:04 +0200
>
> > If someone wants to tune its linux router, he probably already disables
> > GRO because of various issues with too big packets.
> >
> > GRO adds a significant cost to forwarding path.
>
> No, Ben is right Eric. GRO decreases the costs, because it means we
> only need to make one forwarding/netfilter/classification decision for
> N packets instead of 1.
GRO is also important for routers that interact with VM's.
It helps reduce the per-packet wakeup of the guest VM's.
* Re: [PATCH v2] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-20 21:01 UTC (permalink / raw)
To: eric.dumazet; +Cc: bhutchings, shemminger, netdev
In-Reply-To: <1340217604.4604.1569.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 20:40:04 +0200
> If someone wants to tune its linux router, he probably already disables
> GRO because of various issues with too big packets.
>
> GRO adds a significant cost to forwarding path.
No, Ben is right Eric. GRO decreases the costs, because it means we
only need to make one forwarding/netfilter/classification decision for
N packets instead of 1.
* Re: [PATCH v2] can: c_can_pci: fix compilation on non HAVE_CLK archs
From: David Miller @ 2012-06-20 20:56 UTC (permalink / raw)
To: mkl; +Cc: netdev, linux-can, federico.vaga
In-Reply-To: <1340208266-22098-1-git-send-email-mkl@pengutronix.de>
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 20 Jun 2012 18:04:26 +0200
> In commit:
>
> 5b92da0 c_can_pci: generic module for C_CAN/D_CAN on PCI
>
> the c_can_pci driver has been added. It uses clk_*() functions
> resulting in a link error on archs without clock support. This
> patch removes these clk_*() functions as these parts of the driver
> are not tested.
>
> Cc: Federico Vaga <federico.vaga@gmail.com>
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Applied.
* Re: [RFC net-next 00/14] default maximal number of RSS queues in mq drivers
From: Ben Hutchings @ 2012-06-20 20:48 UTC (permalink / raw)
To: Yuval Mintz
Cc: netdev, davem, eilong, Divy Le Ray, Or Gerlitz, Jon Mason,
Anirban Chakraborty, Jitendra Kalsaria, Ron Mercer, Jeff Kirsher,
Jon Mason, Andrew Gallatin, Sathya Perla, Subbu Seetharaman,
Ajit Khaparde, Matt Carlson, Michael Chan
In-Reply-To: <1340225015.2576.27.camel@bwh-desktop.uk.solarflarecom.com>
Also, I would recommend encapsulating the calculation of default number
of RSS queues in a function, rather than repeating it in every driver.
That will make it easier to replace with something more sophisticated
and configurable later on.
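A minimal sketch of what such a shared helper could look like, written as plain C so it can be compiled outside the kernel. The function name, the cap of 8, and passing the CPU count as a parameter are all assumptions made for illustration; the thread does not settle on a name or a value.

```c
/* Hypothetical cap; the RFC discussion does not fix a number here. */
#define DEFAULT_MAX_RSS_QUEUES 8u

/*
 * Default number of RSS queues for a machine with `online_cpus` CPUs:
 * one queue per CPU, bounded above by the cap and never zero.
 * A driver would call this instead of open-coding the calculation.
 */
static unsigned int default_rss_queues(unsigned int online_cpus)
{
	if (online_cpus == 0)
		return 1;	/* always keep at least one queue */
	return online_cpus < DEFAULT_MAX_RSS_QUEUES ?
	       online_cpus : DEFAULT_MAX_RSS_QUEUES;
}
```

Keeping the policy in one place means a later change (for example counting cores instead of threads, or reading a tunable) touches a single function rather than every driver.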
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: [RFC net-next 00/14] default maximal number of RSS queues in mq drivers
From: Ben Hutchings @ 2012-06-20 20:43 UTC (permalink / raw)
To: Yuval Mintz
Cc: netdev, davem, eilong, Divy Le Ray, Or Gerlitz, Jon Mason,
Anirban Chakraborty, Jitendra Kalsaria, Ron Mercer, Jeff Kirsher,
Jon Mason, Andrew Gallatin, Sathya Perla, Subbu Seetharaman,
Ajit Khaparde, Matt Carlson, Michael Chan
In-Reply-To: <1340118848-30978-1-git-send-email-yuvalmin@broadcom.com>
On Tue, 2012-06-19 at 18:13 +0300, Yuval Mintz wrote:
> Different vendors support different number of RSS queues by default. Today,
> there exists an ethtool API through which users can change the number of
> channels their driver supports; This enables us to pursue the goal of using
> a default number of RSS queues in various multi-queue drivers.
>
> This RFC intends to achieve the above default by upper-limiting the number
> of interrupts multi-queue drivers request (by default, not via the new API)
> with correlation to the number of cpus on the machine.
>
> After examining multi-queue drivers that call alloc_etherdev_mq[s],
> it became evident that most drivers allocate their devices using hard-coded
> values. Changing those defaults directly will most likely cause a regression.
>
> However, (most) multi-queue drivers look at the number of online cpus when
> requesting interrupts. We assume that the number of interrupts the
> driver manages to request is propagated across the driver, and the number
> of RSS queues it configures is based upon it.
>
> This RFC modifies said logic - if the number of cpus is large enough, use
> a smaller default value instead. This serves 2 main purposes:
> 1. A step toward unity in the number of RSS queues of various drivers.
> 2. It prevents wasteful requests for interrupts on machines with many cpus.
[...]
> Driver identified as multi-queue, no reference to number of online cpus found,
> and thus unhandled in this RFC:
[...]
> * sfc efx
[...]
In sfc we currently look at the CPU topology to count cores instead of
threads. The result is the same unless the system has hyperthreading
(or other SMT) enabled.
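To make the cores-versus-threads distinction concrete, here is a small user-space sketch that counts distinct physical cores from a list of logical CPUs. The struct and function names are hypothetical and this is not how sfc obtains the topology; it only shows why the two counts differ once SMT is enabled.

```c
#include <stddef.h>

/* Hypothetical description of one logical CPU (hardware thread). */
struct cpu_id {
	int package_id;	/* physical socket */
	int core_id;	/* core within that socket */
};

/*
 * Count distinct (package_id, core_id) pairs among `n` logical CPUs.
 * Sibling threads share both ids, so they are counted once.
 * Quadratic scan; fine for an illustration, not tuned for large n.
 */
static size_t count_cores(const struct cpu_id *cpus, size_t n)
{
	size_t cores = 0;

	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		for (size_t j = 0; j < i; j++) {
			if (cpus[j].package_id == cpus[i].package_id &&
			    cpus[j].core_id == cpus[i].core_id) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			cores++;
	}
	return cores;
}
```

With two threads per core the result is half the number of logical CPUs; without SMT the two counts are equal, matching the statement above.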
I've seen many diagnostic reports from customer support tickets where
there were 32 queue-sets and MSI-X vectors in use (the maximum currently
supported by the driver), but very few had a problem with that.
I would be interested in a scheme to use fewer queues for RSS but more
for flow steering (accelerated RFS, XPS and ethtool NFC). We had some
discussion of this at last year's netconf but sadly I've not yet found
time to work on it.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* [PATCH][RESEND] bonding: delete migrated IP addresses from the rlb hash table
From: Jiri Bohac @ 2012-06-20 20:37 UTC (permalink / raw)
To: Jay Vosburgh, Andy Gospodarek, netdev
Hi, this is a resend of the patch discussed here:
http://thread.gmane.org/gmane.linux.network/228076
It has been updated to apply to the latest net-next.
Bonding in balance-alb mode records information from ARP packets
passing through the bond in a hash table (rx_hashtbl).
In certain situations (e.g. a link change of a slave),
rlb_update_rx_clients() will send out ARP packets to update ARP
caches of other hosts on the network to achieve RX load
balancing.
The problem is that once an IP address is recorded in the hash
table, it stays there indefinitely. If this IP address is
migrated to a different host in the network, bonding still sends
out ARP packets that poison other systems' ARP caches with
invalid information.
This patch solves this by looking at all incoming ARP packets,
and checking if the source IP address is one of the source
addresses stored in the rx_hashtbl. If it is, but the MAC
addresses differ, the corresponding hash table entries are
removed. Thus, when an IP address is migrated, the first ARP
broadcast by its new owner will purge the offending entries of
rx_hashtbl.
The hash table is hashed by ip_dst. To be able to do the above
check efficiently (not walking the whole hash table), we need a
reverse mapping (by ip_src).
I added three new members in struct rlb_client_info:
rx_hashtbl[x].src_first will point to the start of a list of
entries for which hash(ip_src) == x.
The list is linked with src_next and src_prev.
When an incoming ARP packet arrives at rlb_arp_recv(),
rlb_purge_src_ip() can quickly walk only the entries on the
corresponding list, i.e. the entries that are likely to contain
the offending IP address.
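The reverse mapping described above can be sketched in user space as follows. This is a simplified stand-in for the patch's rlb_src_link() and rlb_src_unlink(), with no locking and a reduced entry struct; the names tbl, tbl_init, src_link, src_unlink and src_count are hypothetical. As in the patch, the first entry on a list stores the bucket index in src_prev, which is how the unlink step finds the list head to update.

```c
#include <stdint.h>

#define TBL_SIZE   256u
#define NULL_INDEX 0xffffffffu

/* Reduced entry: only the fields needed for the ip_src lists. */
struct entry {
	uint32_t ip_src;
	uint32_t src_first;	/* head of list for hash(ip_src) == this index */
	uint32_t src_next;
	uint32_t src_prev;	/* previous entry, or the bucket for the head */
};

static struct entry tbl[TBL_SIZE];

static void tbl_init(void)
{
	for (uint32_t i = 0; i < TBL_SIZE; i++) {
		tbl[i].ip_src = 0;
		tbl[i].src_first = NULL_INDEX;
		tbl[i].src_next = NULL_INDEX;
		tbl[i].src_prev = NULL_INDEX;
	}
}

/* Put entry idx at the head of the list rooted at bucket src_hash. */
static void src_link(uint32_t src_hash, uint32_t idx)
{
	uint32_t next = tbl[src_hash].src_first;

	tbl[idx].src_prev = src_hash;
	tbl[idx].src_next = next;
	if (next != NULL_INDEX)
		tbl[next].src_prev = idx;
	tbl[src_hash].src_first = idx;
}

/* Remove entry idx from whatever ip_src list it is on, if any. */
static void src_unlink(uint32_t idx)
{
	uint32_t next = tbl[idx].src_next;
	uint32_t prev = tbl[idx].src_prev;

	tbl[idx].src_next = NULL_INDEX;
	tbl[idx].src_prev = NULL_INDEX;
	if (next != NULL_INDEX)
		tbl[next].src_prev = prev;
	if (prev == NULL_INDEX)
		return;		/* entry was not linked */
	if (tbl[prev].src_first == idx)
		tbl[prev].src_first = next;	/* prev is the bucket */
	else
		tbl[prev].src_next = next;	/* prev is a list neighbour */
}

/* Walk one bucket's list and count the entries holding ip_src. */
static unsigned int src_count(uint32_t src_hash, uint32_t ip_src)
{
	unsigned int n = 0;

	for (uint32_t i = tbl[src_hash].src_first; i != NULL_INDEX;
	     i = tbl[i].src_next)
		if (tbl[i].ip_src == ip_src)
			n++;
	return n;
}
```

The point of the design is that a purge only visits the entries chained from one bucket instead of scanning all table slots; the cost is three extra u32 fields per entry.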
To avoid confusion, I renamed these existing fields of struct
rlb_client_info:
next -> used_next
prev -> used_prev
rx_hashtbl_head -> rx_hashtbl_used_head
(The current linked list is _not_ a list of hash table
entries with colliding ip_dst. It's a list of entries that are
being used; its purpose is to avoid walking the whole hash table
when looking for used entries.)
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index e15cc11..8505a24 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -84,6 +84,9 @@ static inline struct arp_pkt *arp_pkt(const struct sk_buff *skb)
/* Forward declaration */
static void alb_send_learning_packets(struct slave *slave, u8 mac_addr[]);
+static void rlb_purge_src_ip(struct bonding *bond, struct arp_pkt *arp);
+static void rlb_src_unlink(struct bonding *bond, u32 index);
+static void rlb_src_link(struct bonding *bond, u32 ip_src_hash, u32 ip_dst_hash);
static inline u8 _simple_hash(const u8 *hash_start, int hash_size)
{
@@ -354,6 +357,17 @@ static int rlb_arp_recv(const struct sk_buff *skb, struct bonding *bond,
if (!arp)
goto out;
+ /* We received an ARP from arp->ip_src.
+ * We might have used this IP address previously (on the bonding host
+ * itself or on a system that is bridged together with the bond).
+ * However, if arp->mac_src is different than what is stored in
+ * rx_hashtbl, some other host is now using the IP and we must prevent
+ * sending out client updates with this IP address and the old MAC address.
+ * Clean up all hash table entries that have this address as ip_src but
+ * have a different mac_src.
+ */
+ rlb_purge_src_ip(bond, arp);
+
if (arp->op_code == htons(ARPOP_REPLY)) {
/* update rx hash table for this ARP */
rlb_update_entry_from_arp(bond, arp);
@@ -432,9 +446,9 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave)
_lock_rx_hashtbl_bh(bond);
rx_hash_table = bond_info->rx_hashtbl;
- index = bond_info->rx_hashtbl_head;
+ index = bond_info->rx_hashtbl_used_head;
for (; index != RLB_NULL_INDEX; index = next_index) {
- next_index = rx_hash_table[index].next;
+ next_index = rx_hash_table[index].used_next;
if (rx_hash_table[index].slave == slave) {
struct slave *assigned_slave = rlb_next_rx_slave(bond);
@@ -519,8 +533,8 @@ static void rlb_update_rx_clients(struct bonding *bond)
_lock_rx_hashtbl_bh(bond);
- hash_index = bond_info->rx_hashtbl_head;
- for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
+ hash_index = bond_info->rx_hashtbl_used_head;
+ for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->used_next) {
client_info = &(bond_info->rx_hashtbl[hash_index]);
if (client_info->ntt) {
rlb_update_client(client_info);
@@ -548,8 +562,8 @@ static void rlb_req_update_slave_clients(struct bonding *bond, struct slave *sla
_lock_rx_hashtbl_bh(bond);
- hash_index = bond_info->rx_hashtbl_head;
- for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
+ hash_index = bond_info->rx_hashtbl_used_head;
+ for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->used_next) {
client_info = &(bond_info->rx_hashtbl[hash_index]);
if ((client_info->slave == slave) &&
@@ -578,8 +592,8 @@ static void rlb_req_update_subnet_clients(struct bonding *bond, __be32 src_ip)
_lock_rx_hashtbl(bond);
- hash_index = bond_info->rx_hashtbl_head;
- for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
+ hash_index = bond_info->rx_hashtbl_used_head;
+ for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->used_next) {
client_info = &(bond_info->rx_hashtbl[hash_index]);
if (!client_info->slave) {
@@ -625,6 +639,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
/* update mac address from arp */
memcpy(client_info->mac_dst, arp->mac_dst, ETH_ALEN);
}
+ memcpy(client_info->mac_src, arp->mac_src, ETH_ALEN);
assigned_slave = client_info->slave;
if (assigned_slave) {
@@ -647,6 +662,13 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
assigned_slave = rlb_next_rx_slave(bond);
if (assigned_slave) {
+ if (!(client_info->assigned && client_info->ip_src == arp->ip_src)) {
+ /* ip_src is going to be updated, fix the src hash list */
+ u32 hash_src = _simple_hash((u8 *)&arp->ip_src, sizeof(arp->ip_src));
+ rlb_src_unlink(bond, hash_index);
+ rlb_src_link(bond, hash_src, hash_index);
+ }
+
client_info->ip_src = arp->ip_src;
client_info->ip_dst = arp->ip_dst;
/* arp->mac_dst is broadcast for arp reqeusts.
@@ -654,6 +676,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
* upon receiving an arp reply.
*/
memcpy(client_info->mac_dst, arp->mac_dst, ETH_ALEN);
+ memcpy(client_info->mac_src, arp->mac_src, ETH_ALEN);
client_info->slave = assigned_slave;
if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) {
@@ -669,11 +692,11 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
}
if (!client_info->assigned) {
- u32 prev_tbl_head = bond_info->rx_hashtbl_head;
- bond_info->rx_hashtbl_head = hash_index;
- client_info->next = prev_tbl_head;
+ u32 prev_tbl_head = bond_info->rx_hashtbl_used_head;
+ bond_info->rx_hashtbl_used_head = hash_index;
+ client_info->used_next = prev_tbl_head;
if (prev_tbl_head != RLB_NULL_INDEX) {
- bond_info->rx_hashtbl[prev_tbl_head].prev =
+ bond_info->rx_hashtbl[prev_tbl_head].used_prev =
hash_index;
}
client_info->assigned = 1;
@@ -740,8 +763,8 @@ static void rlb_rebalance(struct bonding *bond)
_lock_rx_hashtbl_bh(bond);
ntt = 0;
- hash_index = bond_info->rx_hashtbl_head;
- for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
+ hash_index = bond_info->rx_hashtbl_used_head;
+ for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->used_next) {
client_info = &(bond_info->rx_hashtbl[hash_index]);
assigned_slave = rlb_next_rx_slave(bond);
if (assigned_slave && (client_info->slave != assigned_slave)) {
@@ -759,11 +782,113 @@ static void rlb_rebalance(struct bonding *bond)
}
/* Caller must hold rx_hashtbl lock */
+static void rlb_init_table_entry_dst(struct rlb_client_info *entry)
+{
+ entry->used_next = RLB_NULL_INDEX;
+ entry->used_prev = RLB_NULL_INDEX;
+ entry->assigned = 0;
+ entry->slave = NULL;
+ entry->tag = 0;
+}
+static void rlb_init_table_entry_src(struct rlb_client_info *entry)
+{
+ entry->src_first = RLB_NULL_INDEX;
+ entry->src_prev = RLB_NULL_INDEX;
+ entry->src_next = RLB_NULL_INDEX;
+}
+
static void rlb_init_table_entry(struct rlb_client_info *entry)
{
memset(entry, 0, sizeof(struct rlb_client_info));
- entry->next = RLB_NULL_INDEX;
- entry->prev = RLB_NULL_INDEX;
+ rlb_init_table_entry_dst(entry);
+ rlb_init_table_entry_src(entry);
+}
+
+static void rlb_delete_table_entry_dst(struct bonding *bond, u32 index)
+{
+ struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+ u32 next_index = bond_info->rx_hashtbl[index].used_next;
+ u32 prev_index = bond_info->rx_hashtbl[index].used_prev;
+
+ if (index == bond_info->rx_hashtbl_used_head)
+ bond_info->rx_hashtbl_used_head = next_index;
+ if (prev_index != RLB_NULL_INDEX)
+ bond_info->rx_hashtbl[prev_index].used_next = next_index;
+ if (next_index != RLB_NULL_INDEX)
+ bond_info->rx_hashtbl[next_index].used_prev = prev_index;
+}
+
+/* unlink a rlb hash table entry from the src list */
+static void rlb_src_unlink(struct bonding *bond, u32 index)
+{
+ struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+ u32 next_index = bond_info->rx_hashtbl[index].src_next;
+ u32 prev_index = bond_info->rx_hashtbl[index].src_prev;
+
+ bond_info->rx_hashtbl[index].src_next = RLB_NULL_INDEX;
+ bond_info->rx_hashtbl[index].src_prev = RLB_NULL_INDEX;
+
+ if (next_index != RLB_NULL_INDEX)
+ bond_info->rx_hashtbl[next_index].src_prev = prev_index;
+
+ if (prev_index == RLB_NULL_INDEX)
+ return;
+
+ /* is prev_index pointing to the head of this list? */
+ if (bond_info->rx_hashtbl[prev_index].src_first == index)
+ bond_info->rx_hashtbl[prev_index].src_first = next_index;
+ else
+ bond_info->rx_hashtbl[prev_index].src_next = next_index;
+
+}
+
+static void rlb_delete_table_entry(struct bonding *bond, u32 index)
+{
+ struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+ struct rlb_client_info *entry = &(bond_info->rx_hashtbl[index]);
+
+ rlb_delete_table_entry_dst(bond, index);
+ rlb_init_table_entry_dst(entry);
+
+ rlb_src_unlink(bond, index);
+}
+
+/* add the rx_hashtbl[ip_dst_hash] entry to the list
+ * of entries with identical ip_src_hash
+ */
+static void rlb_src_link(struct bonding *bond, u32 ip_src_hash, u32 ip_dst_hash)
+{
+ struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+ u32 next;
+
+ bond_info->rx_hashtbl[ip_dst_hash].src_prev = ip_src_hash;
+ next = bond_info->rx_hashtbl[ip_src_hash].src_first;
+ bond_info->rx_hashtbl[ip_dst_hash].src_next = next;
+ if (next != RLB_NULL_INDEX)
+ bond_info->rx_hashtbl[next].src_prev = ip_dst_hash;
+ bond_info->rx_hashtbl[ip_src_hash].src_first = ip_dst_hash;
+}
+
+/* deletes all rx_hashtbl entries with arp->ip_src if their mac_src does
+ * not match arp->mac_src */
+static void rlb_purge_src_ip(struct bonding *bond, struct arp_pkt *arp)
+{
+ struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+ u32 ip_src_hash = _simple_hash((u8*)&(arp->ip_src), sizeof(arp->ip_src));
+ u32 index;
+
+ _lock_rx_hashtbl_bh(bond);
+
+ index = bond_info->rx_hashtbl[ip_src_hash].src_first;
+ while (index != RLB_NULL_INDEX) {
+ struct rlb_client_info *entry = &(bond_info->rx_hashtbl[index]);
+ u32 next_index = entry->src_next;
+ if (entry->ip_src == arp->ip_src &&
+ !ether_addr_equal_64bits(arp->mac_src, entry->mac_src))
+ rlb_delete_table_entry(bond, index);
+ index = next_index;
+ }
+ _unlock_rx_hashtbl_bh(bond);
}
static int rlb_initialize(struct bonding *bond)
@@ -781,7 +906,7 @@ static int rlb_initialize(struct bonding *bond)
bond_info->rx_hashtbl = new_hashtbl;
- bond_info->rx_hashtbl_head = RLB_NULL_INDEX;
+ bond_info->rx_hashtbl_used_head = RLB_NULL_INDEX;
for (i = 0; i < RLB_HASH_TABLE_SIZE; i++) {
rlb_init_table_entry(bond_info->rx_hashtbl + i);
@@ -803,7 +928,7 @@ static void rlb_deinitialize(struct bonding *bond)
kfree(bond_info->rx_hashtbl);
bond_info->rx_hashtbl = NULL;
- bond_info->rx_hashtbl_head = RLB_NULL_INDEX;
+ bond_info->rx_hashtbl_used_head = RLB_NULL_INDEX;
_unlock_rx_hashtbl_bh(bond);
}
@@ -815,25 +940,13 @@ static void rlb_clear_vlan(struct bonding *bond, unsigned short vlan_id)
_lock_rx_hashtbl_bh(bond);
- curr_index = bond_info->rx_hashtbl_head;
+ curr_index = bond_info->rx_hashtbl_used_head;
while (curr_index != RLB_NULL_INDEX) {
struct rlb_client_info *curr = &(bond_info->rx_hashtbl[curr_index]);
- u32 next_index = bond_info->rx_hashtbl[curr_index].next;
- u32 prev_index = bond_info->rx_hashtbl[curr_index].prev;
-
- if (curr->tag && (curr->vlan_id == vlan_id)) {
- if (curr_index == bond_info->rx_hashtbl_head) {
- bond_info->rx_hashtbl_head = next_index;
- }
- if (prev_index != RLB_NULL_INDEX) {
- bond_info->rx_hashtbl[prev_index].next = next_index;
- }
- if (next_index != RLB_NULL_INDEX) {
- bond_info->rx_hashtbl[next_index].prev = prev_index;
- }
+ u32 next_index = bond_info->rx_hashtbl[curr_index].used_next;
- rlb_init_table_entry(curr);
- }
+ if (curr->tag && (curr->vlan_id == vlan_id))
+ rlb_delete_table_entry(bond, curr_index);
curr_index = next_index;
}
diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
index 90f140a..1fbc938 100644
--- a/drivers/net/bonding/bond_alb.h
+++ b/drivers/net/bonding/bond_alb.h
@@ -100,9 +100,18 @@ struct tlb_client_info {
struct rlb_client_info {
__be32 ip_src; /* the server IP address */
__be32 ip_dst; /* the client IP address */
+ u8 mac_src[ETH_ALEN]; /* the server MAC address */
u8 mac_dst[ETH_ALEN]; /* the client MAC address */
- u32 next; /* The next Hash table entry index */
- u32 prev; /* The previous Hash table entry index */
+
+ /* list of used hash table entries, starting at rx_hashtbl_used_head */
+ u32 used_next;
+ u32 used_prev;
+
+ /* ip_src based hashing */
+ u32 src_next; /* next entry with same hash(ip_src) */
+ u32 src_prev; /* prev entry with same hash(ip_src) */
+ u32 src_first; /* first entry with hash(ip_src) == this entry's index */
+
u8 assigned; /* checking whether this entry is assigned */
u8 ntt; /* flag - need to transmit client info */
struct slave *slave; /* the slave assigned to this client */
@@ -131,7 +140,7 @@ struct alb_bond_info {
int rlb_enabled;
struct rlb_client_info *rx_hashtbl; /* Receive hash table */
spinlock_t rx_hashtbl_lock;
- u32 rx_hashtbl_head;
+ u32 rx_hashtbl_used_head;
u8 rx_ntt; /* flag - need to transmit
* to all rx clients
*/
diff --git a/drivers/net/bonding/bond_debugfs.c b/drivers/net/bonding/bond_debugfs.c
index 3680aa2..a570843 100644
--- a/drivers/net/bonding/bond_debugfs.c
+++ b/drivers/net/bonding/bond_debugfs.c
@@ -31,8 +31,8 @@ static int bond_debug_rlb_hash_show(struct seq_file *m, void *v)
spin_lock_bh(&(BOND_ALB_INFO(bond).rx_hashtbl_lock));
- hash_index = bond_info->rx_hashtbl_head;
- for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->next) {
+ hash_index = bond_info->rx_hashtbl_used_head;
+ for (; hash_index != RLB_NULL_INDEX; hash_index = client_info->used_next) {
client_info = &(bond_info->rx_hashtbl[hash_index]);
seq_printf(m, "%-15pI4 %-15pI4 %-17pM %s\n",
&client_info->ip_src,
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ