* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
From: Kirill A. Shutemov @ 2014-12-03 14:14 UTC (permalink / raw)
To: Oleg Nesterov, David S. Miller, Linus Torvalds
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
Eric W. Biederman, Kirill A. Shutemov, Peter Zijlstra,
Sasha Levin, linux-fsdevel, linux-kernel, Alexey Dobriyan, netdev
In-Reply-To: <20140805194655.GA30728@redhat.com>
On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> A simple test-case from Kirill Shutemov
>
> cat /proc/self/maps >/dev/null
> chmod +x /proc/self/net/packet
> exec /proc/self/net/packet
>
> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> the opposite order.
Oleg, I see it again with almost the same test-case:
cat /proc/self/stack >/dev/null
chmod +x /proc/self/net/packet
exec /proc/self/net/packet
Looks like bunch of proc files were converted to use seq_file by Alexey
Dobriyan around the same time you've fixed the issue for /proc/pid/maps.
More generic test-case:
find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
chmod +x /proc/self/net/packet
exec /proc/self/net/packet
David, any justification for allowing chmod +x for files under
/proc/pid/net?
[ 2.042212] ======================================================
[ 2.042930] [ INFO: possible circular locking dependency detected ]
[ 2.043648] 3.18.0-rc7-00003-g3a18ca061311-dirty #237 Not tainted
[ 2.044350] -------------------------------------------------------
[ 2.045054] sh/94 is trying to acquire lock:
[ 2.045546] (&p->lock){+.+.+.}, at: [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[ 2.045781]
[ 2.045781] but task is already holding lock:
[ 2.045781] (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[ 2.045781]
[ 2.045781] which lock already depends on the new lock.
[ 2.045781]
[ 2.045781]
[ 2.045781] the existing dependency chain (in reverse order) is:
[ 2.045781]
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
[ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[ 2.045781] [<ffffffff81849da6>] mutex_lock_killable_nested+0x66/0x460
[ 2.045781] [<ffffffff81229de4>] lock_trace+0x24/0x70
[ 2.045781] [<ffffffff81229e8f>] proc_pid_stack+0x5f/0xe0
[ 2.045781] [<ffffffff81227244>] proc_single_show+0x54/0xa0
[ 2.045781] [<ffffffff811e13a0>] seq_read+0xe0/0x3e0
[ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
[ 2.045781] [<ffffffff811b9f5d>] SyS_read+0x4d/0xc0
[ 2.045781] [<ffffffff8184e492>] system_call_fastpath+0x12/0x17
[ 2.045781]
-> #0 (&p->lock){+.+.+.}:
[ 2.045781] [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[ 2.045781] [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[ 2.045781] [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff81226428>] proc_reg_read+0x48/0x70
[ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
[ 2.045781] [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[ 2.045781] [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[ 2.045781] [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[ 2.045781] [<ffffffff811c1818>] do_execve+0x18/0x20
[ 2.045781] [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[ 2.045781] [<ffffffff8184ea49>] stub_execve+0x69/0xa0
[ 2.045781]
[ 2.045781] other info that might help us debug this:
[ 2.045781]
[ 2.045781] Possible unsafe locking scenario:
[ 2.045781]
[ 2.045781] CPU0 CPU1
[ 2.045781] ---- ----
[ 2.045781] lock(&sig->cred_guard_mutex);
[ 2.045781] lock(&p->lock);
[ 2.045781] lock(&sig->cred_guard_mutex);
[ 2.045781] lock(&p->lock);
[ 2.045781]
[ 2.045781] *** DEADLOCK ***
[ 2.045781]
[ 2.045781] 1 lock held by sh/94:
[ 2.045781] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[ 2.045781]
[ 2.045781] stack backtrace:
[ 2.045781] CPU: 0 PID: 94 Comm: sh Not tainted 3.18.0-rc7-00003-g3a18ca061311-dirty #237
[ 2.045781] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2.045781] ffffffff82a48d50 ffff88085427bad8 ffffffff81844a85 0000000000000cac
[ 2.045781] ffffffff82a654a0 ffff88085427bb28 ffffffff810a1b03 0000000000000000
[ 2.045781] ffff88085427bb68 ffff88085427bb28 ffff8808547f1500 ffff8808547f1c40
[ 2.045781] Call Trace:
[ 2.045781] [<ffffffff81844a85>] dump_stack+0x4e/0x68
[ 2.045781] [<ffffffff810a1b03>] print_circular_bug+0x203/0x310
[ 2.045781] [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[ 2.045781] [<ffffffff8108fa76>] ? local_clock+0x16/0x30
[ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff8108f9f8>] ? sched_clock_cpu+0x98/0xc0
[ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[ 2.045781] [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[ 2.045781] [<ffffffff81226428>] proc_reg_read+0x48/0x70
[ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
[ 2.045781] [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[ 2.045781] [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[ 2.045781] [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[ 2.092142] tsc: Refined TSC clocksource calibration: 2693.484 MHz
[ 2.045781] [<ffffffff811c0fd3>] ? do_execve_common.isra.29+0x133/0x960
[ 2.045781] [<ffffffff8184f04d>] ? retint_swapgs+0xe/0x13
[ 2.045781] [<ffffffff811c1818>] do_execve+0x18/0x20
[ 2.045781] [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[ 2.045781] [<ffffffff8184ea49>] stub_execve+0x69/0xa0
--
Kirill A. Shutemov
^ permalink raw reply
* Re: [PATCH v2 02/19] kbuild: kselftest_install - add a new make target to install selftests
From: Shuah Khan @ 2014-12-03 14:14 UTC (permalink / raw)
To: Michal Marek, gregkh, akpm, davem, keescook, tranmanphong,
dh.herrmann, hughd, bobby.prani, ebiederm, serge.hallyn
Cc: linux-kbuild, linux-kernel, linux-api, netdev,
masami.hiramatsu.pt, Shuah Khan
In-Reply-To: <547EFD6F.2010204@suse.cz>
On 12/03/2014 05:09 AM, Michal Marek wrote:
> On 2014-12-01 17:39, Shuah Khan wrote:
>> On 12/01/2014 08:47 AM, Michal Marek wrote:
>>> On 2014-11-11 21:27, Shuah Khan wrote:
>>>> diff --git a/Makefile b/Makefile
>>>> index 05d67af..ccbd2e1 100644
>>>> --- a/Makefile
>>>> +++ b/Makefile
>>>> @@ -1071,12 +1071,26 @@ headers_check: headers_install
>>>> $(Q)$(MAKE) $(hdr-inst)=arch/$(hdr-arch)/include/uapi/asm $(hdr-dst) HDRCHECK=1
>>>>
>>>> # ---------------------------------------------------------------------------
>>>> -# Kernel selftest
>>>> +# Kernel selftest targets
>>>> +
>>>> +PHONY += __kselftest_configure
>>>> +INSTALL_KSFT_PATH=$(INSTALL_MOD_PATH)/lib/kselftest/$(KERNELRELEASE)
>>>> +export INSTALL_KSFT_PATH
>>>> +KSELFTEST=$(INSTALL_KSFT_PATH)/kselftest.sh
>>>> +export KSELFTEST
>>>
>>> Can this be moved to tools/testing/selftests/Makefile? It's only used in
>>> this part of the tree.
>>
>> I looked into doing that. KERNELRELEASE will have to be exported for
>> tools/testing/selftests/Makefile to use it? Does that sound okay?
>
> In fact, KERNELRELEASE is already exported. So go ahead.
Good.
>
>
>> Also, it might be easier to get this series in, if you can Ack the main
>> Makefile patch (when we are ready i.e), so I can take it through
>> kselftest tree.
>
> Sure. The Makefile change will only consist of redirecting the
> kselftest_install target to tools/testing/selftests, right?
>
Correct. I plan to keep the selftests specific work under the
selftests related Makefiles.
thanks,
-- Shuah
--
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978
^ permalink raw reply
* [PATCH net-next 4/4] cxgb4: Update firmware version after flashing it via ethtool
From: Hariprasad Shenai @ 2014-12-03 14:02 UTC (permalink / raw)
To: netdev; +Cc: davem, leedom, anish, nirranjan, kumaras, Hariprasad Shenai
In-Reply-To: <1417615374-12961-1-git-send-email-hariprasad@chelsio.com>
After successfully loading new firmware, reload the new firmware's version
number information so "ethtool -i", etc. will report the right value
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 3f8876f..a9323bd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -1213,6 +1213,8 @@ out:
if (ret)
dev_err(adap->pdev_dev, "firmware download failed, error %d\n",
ret);
+ else
+ ret = t4_get_fw_version(adap, &adap->params.fw_vers);
return ret;
}
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 3/4] cxgb4/cxgb4vf: Use new interfaces to calculate BAR2 SGE Queue Register addresses
From: Hariprasad Shenai @ 2014-12-03 14:02 UTC (permalink / raw)
To: netdev; +Cc: davem, leedom, anish, nirranjan, kumaras, Hariprasad Shenai
In-Reply-To: <1417615374-12961-1-git-send-email-hariprasad@chelsio.com>
Use BAR2 Going To Sleep (GTS) for T5 and later. Use new BAR2 User Doorbells for
T5 for both cxgb4 and cxgb4vf driver.
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 9 +-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 60 +++---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h | 7 +
drivers/net/ethernet/chelsio/cxgb4/sge.c | 148 ++++++++--------
drivers/net/ethernet/chelsio/cxgb4vf/adapter.h | 7 +
.../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 12 --
drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 185 +++++++++++++++++---
7 files changed, 288 insertions(+), 140 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 40905c6..c38a936 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -438,7 +438,8 @@ struct sge_fl { /* SGE free-buffer queue state */
struct rx_sw_desc *sdesc; /* address of SW Rx descriptor ring */
__be64 *desc; /* address of HW Rx descriptor ring */
dma_addr_t addr; /* bus address of HW ring start */
- u64 udb; /* BAR2 offset of User Doorbell area */
+ void __iomem *bar2_addr; /* address of BAR2 Queue registers */
+ unsigned int bar2_qid; /* Queue ID for BAR2 Queue registers */
};
/* A packet gather list */
@@ -468,7 +469,8 @@ struct sge_rspq { /* state for an SGE response queue */
u16 abs_id; /* absolute SGE id for the response q */
__be64 *desc; /* address of HW response ring */
dma_addr_t phys_addr; /* physical address of the ring */
- u64 udb; /* BAR2 offset of User Doorbell area */
+ void __iomem *bar2_addr; /* address of BAR2 Queue registers */
+ unsigned int bar2_qid; /* Queue ID for BAR2 Queue registers */
unsigned int iqe_len; /* entry size */
unsigned int size; /* capacity of response queue */
struct adapter *adap;
@@ -526,7 +528,8 @@ struct sge_txq {
int db_disabled;
unsigned short db_pidx;
unsigned short db_pidx_inc;
- u64 udb; /* BAR2 offset of User Doorbell area */
+ void __iomem *bar2_addr; /* address of BAR2 Queue registers */
+ unsigned int bar2_qid; /* Queue ID for BAR2 Queue registers */
};
struct sge_eth_txq { /* state for an SGE Ethernet Tx queue */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 3aea82b..c770cbf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3791,6 +3791,22 @@ u64 cxgb4_read_sge_timestamp(struct net_device *dev)
}
EXPORT_SYMBOL(cxgb4_read_sge_timestamp);
+int cxgb4_bar2_sge_qregs(struct net_device *dev,
+ unsigned int qid,
+ enum cxgb4_bar2_qtype qtype,
+ u64 *pbar2_qoffset,
+ unsigned int *pbar2_qid)
+{
+ return t4_bar2_sge_qregs(netdev2adap(dev),
+ qid,
+ (qtype == CXGB4_BAR2_QTYPE_EGRESS
+ ? T4_BAR2_QTYPE_EGRESS
+ : T4_BAR2_QTYPE_INGRESS),
+ pbar2_qoffset,
+ pbar2_qid);
+}
+EXPORT_SYMBOL(cxgb4_bar2_sge_qregs);
+
static struct pci_driver cxgb4_driver;
static void check_neigh_update(struct neighbour *neigh)
@@ -3973,31 +3989,18 @@ static void process_db_drop(struct work_struct *work)
u32 dropped_db = t4_read_reg(adap, 0x010ac);
u16 qid = (dropped_db >> 15) & 0x1ffff;
u16 pidx_inc = dropped_db & 0x1fff;
- unsigned int s_qpp;
- unsigned short udb_density;
- unsigned long qpshift;
- int page;
- u32 udb;
-
- dev_warn(adap->pdev_dev,
- "Dropped DB 0x%x qid %d bar2 %d coalesce %d pidx %d\n",
- dropped_db, qid,
- (dropped_db >> 14) & 1,
- (dropped_db >> 13) & 1,
- pidx_inc);
-
- drain_db_fifo(adap, 1);
+ u64 bar2_qoffset;
+ unsigned int bar2_qid;
+ int ret;
- s_qpp = QUEUESPERPAGEPF1 * adap->fn;
- udb_density = 1 << QUEUESPERPAGEPF0_GET(t4_read_reg(adap,
- SGE_EGRESS_QUEUES_PER_PAGE_PF) >> s_qpp);
- qpshift = PAGE_SHIFT - ilog2(udb_density);
- udb = qid << qpshift;
- udb &= PAGE_MASK;
- page = udb / PAGE_SIZE;
- udb += (qid - (page * udb_density)) * 128;
-
- writel(PIDX(pidx_inc), adap->bar2 + udb + 8);
+ ret = t4_bar2_sge_qregs(adap, qid, T4_BAR2_QTYPE_EGRESS,
+ &bar2_qoffset, &bar2_qid);
+ if (ret)
+ dev_err(adap->pdev_dev, "doorbell drop recovery: "
+ "qid=%d, pidx_inc=%d\n", qid, pidx_inc);
+ else
+ writel(PIDX_T5(pidx_inc) | QID(bar2_qid),
+ adap->bar2 + bar2_qoffset + SGE_UDB_KDOORBELL);
/* Re-enable BAR2 WC */
t4_set_reg_field(adap, 0x10b0, 1<<15, 1<<15);
@@ -4055,12 +4058,8 @@ static void uld_attach(struct adapter *adap, unsigned int uld)
lli.adapter_type = adap->params.chip;
lli.iscsi_iolen = MAXRXDATA_GET(t4_read_reg(adap, TP_PARA_REG2));
lli.cclk_ps = 1000000000 / adap->params.vpd.cclk;
- lli.udb_density = 1 << QUEUESPERPAGEPF0_GET(
- t4_read_reg(adap, SGE_EGRESS_QUEUES_PER_PAGE_PF) >>
- (adap->fn * 4));
- lli.ucq_density = 1 << QUEUESPERPAGEPF0_GET(
- t4_read_reg(adap, SGE_INGRESS_QUEUES_PER_PAGE_PF) >>
- (adap->fn * 4));
+ lli.udb_density = 1 << adap->params.sge.eq_qpp;
+ lli.ucq_density = 1 << adap->params.sge.iq_qpp;
lli.filt_mode = adap->params.tp.vlan_pri_map;
/* MODQ_REQ_MAP sets queues 0-3 to chan 0-3 */
for (i = 0; i < NCHAN; i++)
@@ -5912,6 +5911,7 @@ static int adap_init0(struct adapter *adap)
t4_load_mtus(adap, adap->params.mtus, adap->params.a_wnd,
adap->params.b_wnd);
}
+ t4_init_sge_params(adap);
t4_init_tp_params(adap);
adap->flags |= FW_OK;
return 0;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 4eba7cb..152b4c4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -305,4 +305,11 @@ void cxgb4_enable_db_coalescing(struct net_device *dev);
int cxgb4_read_tpte(struct net_device *dev, u32 stag, __be32 *tpte);
u64 cxgb4_read_sge_timestamp(struct net_device *dev);
+enum cxgb4_bar2_qtype { CXGB4_BAR2_QTYPE_EGRESS, CXGB4_BAR2_QTYPE_INGRESS };
+int cxgb4_bar2_sge_qregs(struct net_device *dev,
+ unsigned int qid,
+ enum cxgb4_bar2_qtype qtype,
+ u64 *pbar2_qoffset,
+ unsigned int *pbar2_qid);
+
#endif /* !__CXGB4_OFLD_H */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 433560b..f12debd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -527,14 +527,16 @@ static inline void ring_fl_db(struct adapter *adap, struct sge_fl *q)
val |= DBPRIO(1);
wmb();
- /* If we're on T4, use the old doorbell mechanism; otherwise
- * use the new BAR2 mechanism.
+ /* If we don't have access to the new User Doorbell (T5+), use
+ * the old doorbell mechanism; otherwise use the new BAR2
+ * mechanism.
*/
- if (is_t4(adap->params.chip)) {
+ if (unlikely(q->bar2_addr == NULL)) {
t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
val | QID(q->cntxt_id));
} else {
- writel(val, adap->bar2 + q->udb + SGE_UDB_KDOORBELL);
+ writel(val | QID(q->bar2_qid),
+ q->bar2_addr + SGE_UDB_KDOORBELL);
/* This Write memory Barrier will force the write to
* the User Doorbell area to be flushed.
@@ -850,14 +852,13 @@ static void write_sgl(const struct sk_buff *skb, struct sge_txq *q,
*end = 0;
}
-/* This function copies a tx_desc struct to memory mapped BAR2 space(user space
- * writes). For coalesced WR SGE, fetches data from the FIFO instead of from
- * Host.
+/* This function copies 64 byte coalesced work request to
+ * memory mapped BAR2 space. For coalesced WR SGE fetches
+ * data from the FIFO instead of from Host.
*/
-static void cxgb_pio_copy(u64 __iomem *dst, struct tx_desc *desc)
+static void cxgb_pio_copy(u64 __iomem *dst, u64 *src)
{
- int count = sizeof(*desc) / sizeof(u64);
- u64 *src = (u64 *)desc;
+ int count = 8;
while (count) {
writeq(*src, dst);
@@ -879,7 +880,10 @@ static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
{
wmb(); /* write descriptors before telling HW */
- if (is_t4(adap->params.chip)) {
+ /* If we don't have access to the new User Doorbell (T5+), use the old
+ * doorbell mechanism; otherwise use the new BAR2 mechanism.
+ */
+ if (unlikely(q->bar2_addr == NULL)) {
u32 val = PIDX(n);
unsigned long flags;
@@ -905,21 +909,22 @@ static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
*/
WARN_ON(val & DBPRIO(1));
- /* For T5 and later we use the Write-Combine mapped BAR2 User
- * Doorbell mechanism. If we're only writing a single TX
- * Descriptor and TX Write Combining hasn't been disabled, we
- * can use the Write Combining Gather Buffer; otherwise we use
- * the simple doorbell.
+ /* If we're only writing a single TX Descriptor and we can use
+ * Inferred QID registers, we can use the Write Combining
+ * Gather Buffer; otherwise we use the simple doorbell.
*/
- if (n == 1) {
+ if (n == 1 && q->bar2_qid == 0) {
int index = (q->pidx
? (q->pidx - 1)
: (q->size - 1));
+ u64 *wr = (u64 *)&q->desc[index];
- cxgb_pio_copy(adap->bar2 + q->udb + SGE_UDB_WCDOORBELL,
- q->desc + index);
+ cxgb_pio_copy((u64 __iomem *)
+ (q->bar2_addr + SGE_UDB_WCDOORBELL),
+ wr);
} else {
- writel(val, adap->bar2 + q->udb + SGE_UDB_KDOORBELL);
+ writel(val | QID(q->bar2_qid),
+ q->bar2_addr + SGE_UDB_KDOORBELL);
}
/* This Write Memory Barrier will force the write to the User
@@ -1997,11 +2002,16 @@ static int napi_rx_handler(struct napi_struct *napi, int budget)
params = QINTR_TIMER_IDX(7);
val = CIDXINC(work_done) | SEINTARM(params);
- if (is_t4(q->adap->params.chip)) {
+
+ /* If we don't have access to the new User GTS (T5+), use the old
+ * doorbell mechanism; otherwise use the new BAR2 mechanism.
+ */
+ if (unlikely(q->bar2_addr == NULL)) {
t4_write_reg(q->adap, MYPF_REG(SGE_PF_GTS),
val | INGRESSQID((u32)q->cntxt_id));
} else {
- writel(val, q->adap->bar2 + q->udb + SGE_UDB_GTS);
+ writel(val | INGRESSQID(q->bar2_qid),
+ q->bar2_addr + SGE_UDB_GTS);
wmb();
}
return work_done;
@@ -2047,11 +2057,16 @@ static unsigned int process_intrq(struct adapter *adap)
}
val = CIDXINC(credits) | SEINTARM(q->intr_params);
- if (is_t4(adap->params.chip)) {
+
+ /* If we don't have access to the new User GTS (T5+), use the old
+ * doorbell mechanism; otherwise use the new BAR2 mechanism.
+ */
+ if (unlikely(q->bar2_addr == NULL)) {
t4_write_reg(adap, MYPF_REG(SGE_PF_GTS),
val | INGRESSQID(q->cntxt_id));
} else {
- writel(val, adap->bar2 + q->udb + SGE_UDB_GTS);
+ writel(val | INGRESSQID(q->bar2_qid),
+ q->bar2_addr + SGE_UDB_GTS);
wmb();
}
spin_unlock(&adap->sge.intrq_lock);
@@ -2235,48 +2250,32 @@ static void sge_tx_timer_cb(unsigned long data)
}
/**
- * udb_address - return the BAR2 User Doorbell address for a Queue
- * @adap: the adapter
- * @cntxt_id: the Queue Context ID
- * @qpp: Queues Per Page (for all PFs)
+ * bar2_address - return the BAR2 address for an SGE Queue's Registers
+ * @adapter: the adapter
+ * @qid: the SGE Queue ID
+ * @qtype: the SGE Queue Type (Egress or Ingress)
+ * @pbar2_qid: BAR2 Queue ID or 0 for Queue ID inferred SGE Queues
*
- * Returns the BAR2 address of the user Doorbell associated with the
- * indicated Queue Context ID. Note that this is only applicable
- * for T5 and later.
- */
-static u64 udb_address(struct adapter *adap, unsigned int cntxt_id,
- unsigned int qpp)
-{
- u64 udb;
- unsigned int s_qpp;
- unsigned short udb_density;
- unsigned long qpshift;
- int page;
-
- BUG_ON(is_t4(adap->params.chip));
-
- s_qpp = (QUEUESPERPAGEPF0 +
- (QUEUESPERPAGEPF1 - QUEUESPERPAGEPF0) * adap->fn);
- udb_density = 1 << ((qpp >> s_qpp) & QUEUESPERPAGEPF0_MASK);
- qpshift = PAGE_SHIFT - ilog2(udb_density);
- udb = (u64)cntxt_id << qpshift;
- udb &= PAGE_MASK;
- page = udb / PAGE_SIZE;
- udb += (cntxt_id - (page * udb_density)) * SGE_UDB_SIZE;
-
- return udb;
-}
+ * Returns the BAR2 address for the SGE Queue Registers associated with
+ * @qid. If BAR2 SGE Registers aren't available, returns NULL. Also
+ * returns the BAR2 Queue ID to be used with writes to the BAR2 SGE
+ * Queue Registers. If the BAR2 Queue ID is 0, then "Inferred Queue ID"
+ * Registers are supported (e.g. the Write Combining Doorbell Buffer).
+ */
+static void __iomem *bar2_address(struct adapter *adapter,
+ unsigned int qid,
+ enum t4_bar2_qtype qtype,
+ unsigned int *pbar2_qid)
+{
+ u64 bar2_qoffset;
+ int ret;
-static u64 udb_address_eq(struct adapter *adap, unsigned int cntxt_id)
-{
- return udb_address(adap, cntxt_id,
- t4_read_reg(adap, SGE_EGRESS_QUEUES_PER_PAGE_PF));
-}
+ ret = t4_bar2_sge_qregs(adapter, qid, qtype,
+ &bar2_qoffset, pbar2_qid);
+ if (ret)
+ return NULL;
-static u64 udb_address_iq(struct adapter *adap, unsigned int cntxt_id)
-{
- return udb_address(adap, cntxt_id,
- t4_read_reg(adap, SGE_INGRESS_QUEUES_PER_PAGE_PF));
+ return adapter->bar2 + bar2_qoffset;
}
int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
@@ -2344,8 +2343,10 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
iq->next_intr_params = iq->intr_params;
iq->cntxt_id = ntohs(c.iqid);
iq->abs_id = ntohs(c.physiqid);
- if (!is_t4(adap->params.chip))
- iq->udb = udb_address_iq(adap, iq->cntxt_id);
+ iq->bar2_addr = bar2_address(adap,
+ iq->cntxt_id,
+ T4_BAR2_QTYPE_INGRESS,
+ &iq->bar2_qid);
iq->size--; /* subtract status entry */
iq->netdev = dev;
iq->handler = hnd;
@@ -2362,11 +2363,13 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
fl->alloc_failed = fl->large_alloc_failed = fl->starving = 0;
adap->sge.egr_map[fl->cntxt_id - adap->sge.egr_start] = fl;
- /* Note, we must initialize the Free List User Doorbell
- * address before refilling the Free List!
+ /* Note, we must initialize the BAR2 Free List User Doorbell
+ * information before refilling the Free List!
*/
- if (!is_t4(adap->params.chip))
- fl->udb = udb_address_eq(adap, fl->cntxt_id);
+ fl->bar2_addr = bar2_address(adap,
+ fl->cntxt_id,
+ T4_BAR2_QTYPE_EGRESS,
+ &fl->bar2_qid);
refill_fl(adap, fl, fl_cap(fl), GFP_KERNEL);
}
return 0;
@@ -2392,9 +2395,10 @@ err:
static void init_txq(struct adapter *adap, struct sge_txq *q, unsigned int id)
{
q->cntxt_id = id;
- if (!is_t4(adap->params.chip))
- q->udb = udb_address_eq(adap, q->cntxt_id);
-
+ q->bar2_addr = bar2_address(adap,
+ q->cntxt_id,
+ T4_BAR2_QTYPE_EGRESS,
+ &q->bar2_qid);
q->in_use = 0;
q->cidx = q->pidx = 0;
q->stops = q->restarts = 0;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 3d06e77..d00a751 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -138,6 +138,8 @@ struct sge_fl {
struct rx_sw_desc *sdesc; /* address of SW RX descriptor ring */
__be64 *desc; /* address of HW RX descriptor ring */
dma_addr_t addr; /* PCI bus address of hardware ring */
+ void __iomem *bar2_addr; /* address of BAR2 Queue registers */
+ unsigned int bar2_qid; /* Queue ID for BAR2 Queue registers */
};
/*
@@ -178,6 +180,8 @@ struct sge_rspq {
u16 abs_id; /* SGE abs QID for the response Q */
__be64 *desc; /* address of hardware response ring */
dma_addr_t phys_addr; /* PCI bus address of ring */
+ void __iomem *bar2_addr; /* address of BAR2 Queue registers */
+ unsigned int bar2_qid; /* Queue ID for BAR2 Queue registers */
unsigned int iqe_len; /* entry size */
unsigned int size; /* capcity of response Q */
struct adapter *adapter; /* our adapter */
@@ -240,6 +244,8 @@ struct sge_txq {
struct tx_sw_desc *sdesc; /* address of SW TX descriptor ring */
struct sge_qstat *stat; /* queue status entry */
dma_addr_t phys_addr; /* PCI bus address of hardware ring */
+ void __iomem *bar2_addr; /* address of BAR2 Queue registers */
+ unsigned int bar2_qid; /* Queue ID for BAR2 Queue registers */
};
/*
@@ -345,6 +351,7 @@ struct sge {
struct adapter {
/* PCI resources */
void __iomem *regs;
+ void __iomem *bar2;
struct pci_dev *pdev;
struct device *pdev_dev;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index c5425f0..aa74ec3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2095,7 +2095,6 @@ static int adap_init0(struct adapter *adapter)
unsigned int ethqsets;
int err;
u32 param, val = 0;
- unsigned int chipid;
/*
* Wait for the device to become ready before proceeding ...
@@ -2123,17 +2122,6 @@ static int adap_init0(struct adapter *adapter)
return err;
}
- adapter->params.chip = 0;
- switch (adapter->pdev->device >> 12) {
- case CHELSIO_T4:
- adapter->params.chip = CHELSIO_CHIP_CODE(CHELSIO_T4, 0);
- break;
- case CHELSIO_T5:
- chipid = G_REV(t4_read_reg(adapter, A_PL_VF_REV));
- adapter->params.chip |= CHELSIO_CHIP_CODE(CHELSIO_T5, chipid);
- break;
- }
-
/*
* Grab basic operational parameters. These will predominantly have
* been set up by the Physical Function Driver or will be hard coded
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index 045301d..f7fd131 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -525,19 +525,40 @@ static inline void ring_fl_db(struct adapter *adapter, struct sge_fl *fl)
{
u32 val;
- /*
- * The SGE keeps track of its Producer and Consumer Indices in terms
+ /* The SGE keeps track of its Producer and Consumer Indices in terms
* of Egress Queue Units so we can only tell it about integral numbers
* of multiples of Free List Entries per Egress Queue Units ...
*/
if (fl->pend_cred >= FL_PER_EQ_UNIT) {
- val = PIDX(fl->pend_cred / FL_PER_EQ_UNIT);
- if (!is_t4(adapter->params.chip))
- val |= DBTYPE(1);
+ if (is_t4(adapter->params.chip))
+ val = PIDX(fl->pend_cred / FL_PER_EQ_UNIT);
+ else
+ val = PIDX_T5(fl->pend_cred / FL_PER_EQ_UNIT) |
+ DBTYPE(1);
+ val |= DBPRIO(1);
+
+ /* Make sure all memory writes to the Free List queue are
+ * committed before we tell the hardware about them.
+ */
wmb();
- t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_KDOORBELL,
- DBPRIO(1) |
- QID(fl->cntxt_id) | val);
+
+ /* If we don't have access to the new User Doorbell (T5+), use
+ * the old doorbell mechanism; otherwise use the new BAR2
+ * mechanism.
+ */
+ if (unlikely(fl->bar2_addr == NULL)) {
+ t4_write_reg(adapter,
+ T4VF_SGE_BASE_ADDR + SGE_VF_KDOORBELL,
+ QID(fl->cntxt_id) | val);
+ } else {
+ writel(val | QID(fl->bar2_qid),
+ fl->bar2_addr + SGE_UDB_KDOORBELL);
+
+ /* This Write memory Barrier will force the write to
+ * the User Doorbell area to be flushed.
+ */
+ wmb();
+ }
fl->pend_cred %= FL_PER_EQ_UNIT;
}
}
@@ -949,14 +970,74 @@ static void write_sgl(const struct sk_buff *skb, struct sge_txq *tq,
static inline void ring_tx_db(struct adapter *adapter, struct sge_txq *tq,
int n)
{
- /*
- * Warn if we write doorbells with the wrong priority and write
- * descriptors before telling HW.
+ /* Make sure that all writes to the TX Descriptors are committed
+ * before we tell the hardware about them.
*/
- WARN_ON((QID(tq->cntxt_id) | PIDX(n)) & DBPRIO(1));
wmb();
- t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_KDOORBELL,
- QID(tq->cntxt_id) | PIDX(n));
+
+ /* If we don't have access to the new User Doorbell (T5+), use the old
+ * doorbell mechanism; otherwise use the new BAR2 mechanism.
+ */
+ if (unlikely(tq->bar2_addr == NULL)) {
+ u32 val = PIDX(n);
+
+ t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_KDOORBELL,
+ QID(tq->cntxt_id) | val);
+ } else {
+ u32 val = PIDX_T5(n);
+
+ /* T4 and later chips share the same PIDX field offset within
+ * the doorbell, but T5 and later shrank the field in order to
+ * gain a bit for Doorbell Priority. The field was absurdly
+ * large in the first place (14 bits) so we just use the T5
+ * and later limits and warn if a Queue ID is too large.
+ */
+ WARN_ON(val & DBPRIO(1));
+
+ /* If we're only writing a single Egress Unit and the BAR2
+ * Queue ID is 0, we can use the Write Combining Doorbell
+ * Gather Buffer; otherwise we use the simple doorbell.
+ */
+ if (n == 1 && tq->bar2_qid == 0) {
+ unsigned int index = (tq->pidx
+ ? (tq->pidx - 1)
+ : (tq->size - 1));
+ __be64 *src = (__be64 *)&tq->desc[index];
+ __be64 __iomem *dst = (__be64 *)(tq->bar2_addr +
+ SGE_UDB_WCDOORBELL);
+ unsigned int count = EQ_UNIT / sizeof(__be64);
+
+ /* Copy the TX Descriptor in a tight loop in order to
+ * try to get it to the adapter in a single Write
+ * Combined transfer on the PCI-E Bus. If the Write
+ * Combine fails (say because of an interrupt, etc.)
+ * the hardware will simply take the last write as a
+ * simple doorbell write with a PIDX Increment of 1
+ * and will fetch the TX Descriptor from memory via
+ * DMA.
+ */
+ while (count) {
+ writeq(*src, dst);
+ src++;
+ dst++;
+ count--;
+ }
+ } else
+ writel(val | QID(tq->bar2_qid),
+ tq->bar2_addr + SGE_UDB_KDOORBELL);
+
+ /* This Write Memory Barrier will force the write to the User
+ * Doorbell area to be flushed. This is needed to prevent
+ * writes on different CPUs for the same queue from hitting
+ * the adapter out of order. This is required when some Work
+ * Requests take the Write Combine Gather Buffer path (user
+ * doorbell area offset [SGE_UDB_WCDOORBELL..+63]) and some
+ * take the traditional path where we simply increment the
+ * PIDX (User Doorbell area SGE_UDB_KDOORBELL) and have the
+ * hardware DMA read the actual Work Request.
+ */
+ wmb();
+ }
}
/**
@@ -1782,6 +1863,7 @@ static int napi_rx_handler(struct napi_struct *napi, int budget)
unsigned int intr_params;
struct sge_rspq *rspq = container_of(napi, struct sge_rspq, napi);
int work_done = process_responses(rspq, budget);
+ u32 val;
if (likely(work_done < budget)) {
napi_complete(napi);
@@ -1793,11 +1875,16 @@ static int napi_rx_handler(struct napi_struct *napi, int budget)
if (unlikely(work_done == 0))
rspq->unhandled_irqs++;
- t4_write_reg(rspq->adapter,
- T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
- CIDXINC(work_done) |
- INGRESSQID((u32)rspq->cntxt_id) |
- SEINTARM(intr_params));
+ val = CIDXINC(work_done) | SEINTARM(intr_params);
+ if (is_t4(rspq->adapter->params.chip)) {
+ t4_write_reg(rspq->adapter,
+ T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
+ val | INGRESSQID((u32)rspq->cntxt_id));
+ } else {
+ writel(val | INGRESSQID(rspq->bar2_qid),
+ rspq->bar2_addr + SGE_UDB_GTS);
+ wmb();
+ }
return work_done;
}
@@ -1822,6 +1909,7 @@ static unsigned int process_intrq(struct adapter *adapter)
struct sge *s = &adapter->sge;
struct sge_rspq *intrq = &s->intrq;
unsigned int work_done;
+ u32 val;
spin_lock(&adapter->sge.intrq_lock);
for (work_done = 0; ; work_done++) {
@@ -1887,10 +1975,15 @@ static unsigned int process_intrq(struct adapter *adapter)
rspq_next(intrq);
}
- t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
- CIDXINC(work_done) |
- INGRESSQID(intrq->cntxt_id) |
- SEINTARM(intrq->intr_params));
+ val = CIDXINC(work_done) | SEINTARM(intrq->intr_params);
+ if (is_t4(adapter->params.chip))
+ t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
+ val | INGRESSQID(intrq->cntxt_id));
+ else {
+ writel(val | INGRESSQID(intrq->bar2_qid),
+ intrq->bar2_addr + SGE_UDB_GTS);
+ wmb();
+ }
spin_unlock(&adapter->sge.intrq_lock);
@@ -2036,6 +2129,35 @@ static void sge_tx_timer_cb(unsigned long data)
}
/**
+ * bar2_address - return the BAR2 address for an SGE Queue's Registers
+ * @adapter: the adapter
+ * @qid: the SGE Queue ID
+ * @qtype: the SGE Queue Type (Egress or Ingress)
+ * @pbar2_qid: BAR2 Queue ID or 0 for Queue ID inferred SGE Queues
+ *
+ * Returns the BAR2 address for the SGE Queue Registers associated with
+ * @qid. If BAR2 SGE Registers aren't available, returns NULL. Also
+ * returns the BAR2 Queue ID to be used with writes to the BAR2 SGE
+ * Queue Registers. If the BAR2 Queue ID is 0, then "Inferred Queue ID"
+ * Registers are supported (e.g. the Write Combining Doorbell Buffer).
+ */
+static void __iomem *bar2_address(struct adapter *adapter,
+ unsigned int qid,
+ enum t4_bar2_qtype qtype,
+ unsigned int *pbar2_qid)
+{
+ u64 bar2_qoffset;
+ int ret;
+
+ ret = t4_bar2_sge_qregs(adapter, qid, qtype,
+ &bar2_qoffset, pbar2_qid);
+ if (ret)
+ return NULL;
+
+ return adapter->bar2 + bar2_qoffset;
+}
+
+/**
* t4vf_sge_alloc_rxq - allocate an SGE RX Queue
* @adapter: the adapter
* @rspq: pointer to to the new rxq's Response Queue to be filled in
@@ -2166,6 +2288,10 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
rspq->gen = 1;
rspq->next_intr_params = rspq->intr_params;
rspq->cntxt_id = be16_to_cpu(rpl.iqid);
+ rspq->bar2_addr = bar2_address(adapter,
+ rspq->cntxt_id,
+ T4_BAR2_QTYPE_INGRESS,
+ &rspq->bar2_qid);
rspq->abs_id = be16_to_cpu(rpl.physiqid);
rspq->size--; /* subtract status entry */
rspq->adapter = adapter;
@@ -2184,6 +2310,15 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
fl->alloc_failed = 0;
fl->large_alloc_failed = 0;
fl->starving = 0;
+
+ /* Note, we must initialize the BAR2 Free List User Doorbell
+ * information before refilling the Free List!
+ */
+ fl->bar2_addr = bar2_address(adapter,
+ fl->cntxt_id,
+ T4_BAR2_QTYPE_EGRESS,
+ &fl->bar2_qid);
+
refill_fl(adapter, fl, fl_cap(fl), GFP_KERNEL);
}
@@ -2296,6 +2431,10 @@ int t4vf_sge_alloc_eth_txq(struct adapter *adapter, struct sge_eth_txq *txq,
txq->q.pidx = 0;
txq->q.stat = (void *)&txq->q.desc[txq->q.size];
txq->q.cntxt_id = FW_EQ_ETH_CMD_EQID_G(be32_to_cpu(rpl.eqid_pkd));
+ txq->q.bar2_addr = bar2_address(adapter,
+ txq->q.cntxt_id,
+ T4_BAR2_QTYPE_EGRESS,
+ &txq->q.bar2_qid);
txq->q.abs_id =
FW_EQ_ETH_CMD_PHYSEQID_G(be32_to_cpu(rpl.physeqid_pkd));
txq->txq = devq;
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 2/4] cxgb4/cxgb4vf: Add code to calculate T5 BAR2 Offsets for SGE Queue Registers
From: Hariprasad Shenai @ 2014-12-03 14:02 UTC (permalink / raw)
To: netdev; +Cc: davem, leedom, anish, nirranjan, kumaras, Hariprasad Shenai
In-Reply-To: <1417615374-12961-1-git-send-email-hariprasad@chelsio.com>
Add new Common Code facilities for calculating T5 BAR2 Offsets for SGE Queue
Registers. This new code can handle situations where
Queues Per Page * SGE BAR2 Queue Register Area Size > Page Size
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 16 +++
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 120 ++++++++++++++++++++
drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 12 +-
drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h | 8 ++
drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c | 98 ++++++++++++++++-
5 files changed, 247 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index e8b09bb..40905c6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -222,6 +222,12 @@ struct tp_err_stats {
u32 ofldCongDefer;
};
+struct sge_params {
+ u32 hps; /* host page size for our PF/VF */
+ u32 eq_qpp; /* egress queues/page for our PF/VF */
+ u32 iq_qpp; /* egress queues/page for our PF/VF */
+};
+
struct tp_params {
unsigned int ntxchan; /* # of Tx channels */
unsigned int tre; /* log2 of core clocks per TP tick */
@@ -285,6 +291,7 @@ enum chip_type {
};
struct adapter_params {
+ struct sge_params sge;
struct tp_params tp;
struct vpd_params vpd;
struct pci_params pci;
@@ -995,6 +1002,15 @@ int t4_prep_fw(struct adapter *adap, struct fw_info *fw_info,
const u8 *fw_data, unsigned int fw_size,
struct fw_hdr *card_fw, enum dev_state state, int *reset);
int t4_prep_adapter(struct adapter *adapter);
+
+enum t4_bar2_qtype { T4_BAR2_QTYPE_EGRESS, T4_BAR2_QTYPE_INGRESS };
+int t4_bar2_sge_qregs(struct adapter *adapter,
+ unsigned int qid,
+ enum t4_bar2_qtype qtype,
+ u64 *pbar2_qoffset,
+ unsigned int *pbar2_qid);
+
+int t4_init_sge_params(struct adapter *adapter);
int t4_init_tp_params(struct adapter *adap);
int t4_filter_field_shift(const struct adapter *adap, int filter_sel);
int t4_port_init(struct adapter *adap, int mbox, int pf, int vf);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 7975d26..3f8876f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -4003,6 +4003,126 @@ int t4_prep_adapter(struct adapter *adapter)
}
/**
+ * t4_bar2_sge_qregs - return BAR2 SGE Queue register information
+ * @adapter: the adapter
+ * @qid: the Queue ID
+ * @qtype: the Ingress or Egress type for @qid
+ * @pbar2_qoffset: BAR2 Queue Offset
+ * @pbar2_qid: BAR2 Queue ID or 0 for Queue ID inferred SGE Queues
+ *
+ * Returns the BAR2 SGE Queue Registers information associated with the
+ * indicated Absolute Queue ID. These are passed back in return value
+ * pointers. @qtype should be T4_BAR2_QTYPE_EGRESS for Egress Queue
+ * and T4_BAR2_QTYPE_INGRESS for Ingress Queues.
+ *
+ * This may return an error which indicates that BAR2 SGE Queue
+ * registers aren't available. If an error is not returned, then the
+ * following values are returned:
+ *
+ * *@pbar2_qoffset: the BAR2 Offset of the @qid Registers
+ * *@pbar2_qid: the BAR2 SGE Queue ID or 0 of @qid
+ *
+ * If the returned BAR2 Queue ID is 0, then BAR2 SGE registers which
+ * require the "Inferred Queue ID" ability may be used. E.g. the
+ * Write Combining Doorbell Buffer. If the BAR2 Queue ID is not 0,
+ * then these "Inferred Queue ID" register may not be used.
+ */
+int t4_bar2_sge_qregs(struct adapter *adapter,
+ unsigned int qid,
+ enum t4_bar2_qtype qtype,
+ u64 *pbar2_qoffset,
+ unsigned int *pbar2_qid)
+{
+ unsigned int page_shift, page_size, qpp_shift, qpp_mask;
+ u64 bar2_page_offset, bar2_qoffset;
+ unsigned int bar2_qid, bar2_qid_offset, bar2_qinferred;
+
+ /* T4 doesn't support BAR2 SGE Queue registers.
+ */
+ if (is_t4(adapter->params.chip))
+ return -EINVAL;
+
+ /* Get our SGE Page Size parameters.
+ */
+ page_shift = adapter->params.sge.hps + 10;
+ page_size = 1 << page_shift;
+
+ /* Get the right Queues per Page parameters for our Queue.
+ */
+ qpp_shift = (qtype == T4_BAR2_QTYPE_EGRESS
+ ? adapter->params.sge.eq_qpp
+ : adapter->params.sge.iq_qpp);
+ qpp_mask = (1 << qpp_shift) - 1;
+
+ /* Calculate the basics of the BAR2 SGE Queue register area:
+ * o The BAR2 page the Queue registers will be in.
+ * o The BAR2 Queue ID.
+ * o The BAR2 Queue ID Offset into the BAR2 page.
+ */
+ bar2_page_offset = ((qid >> qpp_shift) << page_shift);
+ bar2_qid = qid & qpp_mask;
+ bar2_qid_offset = bar2_qid * SGE_UDB_SIZE;
+
+ /* If the BAR2 Queue ID Offset is less than the Page Size, then the
+ * hardware will infer the Absolute Queue ID simply from the writes to
+ * the BAR2 Queue ID Offset within the BAR2 Page (and we need to use a
+ * BAR2 Queue ID of 0 for those writes). Otherwise, we'll simply
+ * write to the first BAR2 SGE Queue Area within the BAR2 Page with
+ * the BAR2 Queue ID and the hardware will infer the Absolute Queue ID
+ * from the BAR2 Page and BAR2 Queue ID.
+ *
+ * One important censequence of this is that some BAR2 SGE registers
+ * have a "Queue ID" field and we can write the BAR2 SGE Queue ID
+ * there. But other registers synthesize the SGE Queue ID purely
+ * from the writes to the registers -- the Write Combined Doorbell
+ * Buffer is a good example. These BAR2 SGE Registers are only
+ * available for those BAR2 SGE Register areas where the SGE Absolute
+ * Queue ID can be inferred from simple writes.
+ */
+ bar2_qoffset = bar2_page_offset;
+ bar2_qinferred = (bar2_qid_offset < page_size);
+ if (bar2_qinferred) {
+ bar2_qoffset += bar2_qid_offset;
+ bar2_qid = 0;
+ }
+
+ *pbar2_qoffset = bar2_qoffset;
+ *pbar2_qid = bar2_qid;
+ return 0;
+}
+
+/**
+ * t4_init_sge_params - initialize adap->params.sge
+ * @adapter: the adapter
+ *
+ * Initialize various fields of the adapter's SGE Parameters structure.
+ */
+int t4_init_sge_params(struct adapter *adapter)
+{
+ struct sge_params *sge_params = &adapter->params.sge;
+ u32 hps, qpp;
+ unsigned int s_hps, s_qpp;
+
+ /* Extract the SGE Page Size for our PF.
+ */
+ hps = t4_read_reg(adapter, SGE_HOST_PAGE_SIZE);
+ s_hps = (HOSTPAGESIZEPF0_S +
+ (HOSTPAGESIZEPF1_S - HOSTPAGESIZEPF0_S) * adapter->fn);
+ sge_params->hps = ((hps >> s_hps) & HOSTPAGESIZEPF0_M);
+
+ /* Extract the SGE Egress and Ingess Queues Per Page for our PF.
+ */
+ s_qpp = (QUEUESPERPAGEPF0_S +
+ (QUEUESPERPAGEPF1_S - QUEUESPERPAGEPF0_S) * adapter->fn);
+ qpp = t4_read_reg(adapter, SGE_EGRESS_QUEUES_PER_PAGE_PF);
+ sge_params->eq_qpp = ((qpp >> s_qpp) & QUEUESPERPAGEPF0_MASK);
+ qpp = t4_read_reg(adapter, SGE_INGRESS_QUEUES_PER_PAGE_PF);
+ sge_params->iq_qpp = ((qpp >> s_qpp) & QUEUESPERPAGEPF0_MASK);
+
+ return 0;
+}
+
+/**
* t4_init_tp_params - initialize adap->params.tp
* @adap: the adapter
*
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 1efc862..d7bd34e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -155,13 +155,13 @@
#define HOSTPAGESIZEPF2_SHIFT 8
#define HOSTPAGESIZEPF2(x) ((x) << HOSTPAGESIZEPF2_SHIFT)
-#define HOSTPAGESIZEPF1_MASK 0x0000000fU
-#define HOSTPAGESIZEPF1_SHIFT 4
-#define HOSTPAGESIZEPF1(x) ((x) << HOSTPAGESIZEPF1_SHIFT)
+#define HOSTPAGESIZEPF1_M 0x0000000fU
+#define HOSTPAGESIZEPF1_S 4
+#define HOSTPAGESIZEPF1(x) ((x) << HOSTPAGESIZEPF1_S)
-#define HOSTPAGESIZEPF0_MASK 0x0000000fU
-#define HOSTPAGESIZEPF0_SHIFT 0
-#define HOSTPAGESIZEPF0(x) ((x) << HOSTPAGESIZEPF0_SHIFT)
+#define HOSTPAGESIZEPF0_M 0x0000000fU
+#define HOSTPAGESIZEPF0_S 0
+#define HOSTPAGESIZEPF0(x) ((x) << HOSTPAGESIZEPF0_S)
#define SGE_EGRESS_QUEUES_PER_PAGE_PF 0x1010
#define SGE_EGRESS_QUEUES_PER_PAGE_VF_A 0x1014
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
index 6d6a956..8d3237f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
@@ -138,6 +138,7 @@ struct sge_params {
u32 sge_host_page_size; /* PF0-7 page sizes */
u32 sge_egress_queues_per_page; /* PF0-7 egress queues/page */
u32 sge_ingress_queues_per_page;/* PF0-7 ingress queues/page */
+ u32 sge_vf_hps; /* host page size for our vf */
u32 sge_vf_eq_qpp; /* egress queues/page for our VF */
u32 sge_vf_iq_qpp; /* ingress queues/page for our VF */
u32 sge_fl_buffer_size[16]; /* free list buffer sizes */
@@ -282,6 +283,13 @@ int t4vf_port_init(struct adapter *, int);
int t4vf_fw_reset(struct adapter *);
int t4vf_set_params(struct adapter *, unsigned int, const u32 *, const u32 *);
+enum t4_bar2_qtype { T4_BAR2_QTYPE_EGRESS, T4_BAR2_QTYPE_INGRESS };
+int t4_bar2_sge_qregs(struct adapter *adapter,
+ unsigned int qid,
+ enum t4_bar2_qtype qtype,
+ u64 *pbar2_qoffset,
+ unsigned int *pbar2_qid);
+
int t4vf_get_sge_params(struct adapter *);
int t4vf_get_vpd_params(struct adapter *);
int t4vf_get_dev_params(struct adapter *);
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
index 03ab19e..02e8833 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
@@ -430,6 +430,95 @@ int t4vf_set_params(struct adapter *adapter, unsigned int nparams,
}
/**
+ * t4_bar2_sge_qregs - return BAR2 SGE Queue register information
+ * @adapter: the adapter
+ * @qid: the Queue ID
+ * @qtype: the Ingress or Egress type for @qid
+ * @pbar2_qoffset: BAR2 Queue Offset
+ * @pbar2_qid: BAR2 Queue ID or 0 for Queue ID inferred SGE Queues
+ *
+ * Returns the BAR2 SGE Queue Registers information associated with the
+ * indicated Absolute Queue ID. These are passed back in return value
+ * pointers. @qtype should be T4_BAR2_QTYPE_EGRESS for Egress Queue
+ * and T4_BAR2_QTYPE_INGRESS for Ingress Queues.
+ *
+ * This may return an error which indicates that BAR2 SGE Queue
+ * registers aren't available. If an error is not returned, then the
+ * following values are returned:
+ *
+ * *@pbar2_qoffset: the BAR2 Offset of the @qid Registers
+ * *@pbar2_qid: the BAR2 SGE Queue ID or 0 of @qid
+ *
+ * If the returned BAR2 Queue ID is 0, then BAR2 SGE registers which
+ * require the "Inferred Queue ID" ability may be used. E.g. the
+ * Write Combining Doorbell Buffer. If the BAR2 Queue ID is not 0,
+ * then these "Inferred Queue ID" register may not be used.
+ */
+int t4_bar2_sge_qregs(struct adapter *adapter,
+ unsigned int qid,
+ enum t4_bar2_qtype qtype,
+ u64 *pbar2_qoffset,
+ unsigned int *pbar2_qid)
+{
+ unsigned int page_shift, page_size, qpp_shift, qpp_mask;
+ u64 bar2_page_offset, bar2_qoffset;
+ unsigned int bar2_qid, bar2_qid_offset, bar2_qinferred;
+
+ /* T4 doesn't support BAR2 SGE Queue registers.
+ */
+ if (is_t4(adapter->params.chip))
+ return -EINVAL;
+
+ /* Get our SGE Page Size parameters.
+ */
+ page_shift = adapter->params.sge.sge_vf_hps + 10;
+ page_size = 1 << page_shift;
+
+ /* Get the right Queues per Page parameters for our Queue.
+ */
+ qpp_shift = (qtype == T4_BAR2_QTYPE_EGRESS
+ ? adapter->params.sge.sge_vf_eq_qpp
+ : adapter->params.sge.sge_vf_iq_qpp);
+ qpp_mask = (1 << qpp_shift) - 1;
+
+ /* Calculate the basics of the BAR2 SGE Queue register area:
+ * o The BAR2 page the Queue registers will be in.
+ * o The BAR2 Queue ID.
+ * o The BAR2 Queue ID Offset into the BAR2 page.
+ */
+ bar2_page_offset = ((qid >> qpp_shift) << page_shift);
+ bar2_qid = qid & qpp_mask;
+ bar2_qid_offset = bar2_qid * SGE_UDB_SIZE;
+
+ /* If the BAR2 Queue ID Offset is less than the Page Size, then the
+ * hardware will infer the Absolute Queue ID simply from the writes to
+ * the BAR2 Queue ID Offset within the BAR2 Page (and we need to use a
+ * BAR2 Queue ID of 0 for those writes). Otherwise, we'll simply
+ * write to the first BAR2 SGE Queue Area within the BAR2 Page with
+ * the BAR2 Queue ID and the hardware will infer the Absolute Queue ID
+ * from the BAR2 Page and BAR2 Queue ID.
+ *
+ * One important censequence of this is that some BAR2 SGE registers
+ * have a "Queue ID" field and we can write the BAR2 SGE Queue ID
+ * there. But other registers synthesize the SGE Queue ID purely
+ * from the writes to the registers -- the Write Combined Doorbell
+ * Buffer is a good example. These BAR2 SGE Registers are only
+ * available for those BAR2 SGE Register areas where the SGE Absolute
+ * Queue ID can be inferred from simple writes.
+ */
+ bar2_qoffset = bar2_page_offset;
+ bar2_qinferred = (bar2_qid_offset < page_size);
+ if (bar2_qinferred) {
+ bar2_qoffset += bar2_qid_offset;
+ bar2_qid = 0;
+ }
+
+ *pbar2_qoffset = bar2_qoffset;
+ *pbar2_qid = bar2_qid;
+ return 0;
+}
+
+/**
* t4vf_get_sge_params - retrieve adapter Scatter gather Engine parameters
* @adapter: the adapter
*
@@ -507,7 +596,7 @@ int t4vf_get_sge_params(struct adapter *adapter)
*/
if (!is_t4(adapter->params.chip)) {
u32 whoami;
- unsigned int pf, s_qpp;
+ unsigned int pf, s_hps, s_qpp;
params[0] = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_REG) |
FW_PARAMS_PARAM_XYZ_V(
@@ -533,6 +622,13 @@ int t4vf_get_sge_params(struct adapter *adapter)
whoami = t4_read_reg(adapter,
T4VF_PL_BASE_ADDR + A_PL_VF_WHOAMI);
pf = SOURCEPF_GET(whoami);
+
+ s_hps = (HOSTPAGESIZEPF0_S +
+ (HOSTPAGESIZEPF1_S - HOSTPAGESIZEPF0_S) * pf);
+ sge_params->sge_vf_hps =
+ ((sge_params->sge_host_page_size >> s_hps)
+ & HOSTPAGESIZEPF0_M);
+
s_qpp = (QUEUESPERPAGEPF0_S +
(QUEUESPERPAGEPF1_S - QUEUESPERPAGEPF0_S) * pf);
sge_params->sge_vf_eq_qpp =
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 1/4] cxgb4vf: Add and initialize some sge params for VF driver
From: Hariprasad Shenai @ 2014-12-03 14:02 UTC (permalink / raw)
To: netdev; +Cc: davem, leedom, anish, nirranjan, kumaras, Hariprasad Shenai
In-Reply-To: <1417615374-12961-1-git-send-email-hariprasad@chelsio.com>
Add sge_vf_eq_qpp and sge_vf_iq_qpp to (struct sge_params), initialize
sge_queues_per_page and sge_vf_qpp in t4vf_get_sge_params(), add new
t4vf_prep_adapter() which initializes basic adapter parameters.
Grab both SGE_EGRESS_QUEUES_PER_PAGE_VF and SGE_INGRESS_QUEUES_PER_PAGE_VF
for VF Drivers since we need both to calculate the User Doorbell area
offsets for Egress and Ingress Queues.
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 10 ++-
.../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 27 +++++++
drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h | 11 ++-
drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c | 77 ++++++++++++++++++++
4 files changed, 120 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index ccdf8a7..1efc862 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -164,8 +164,13 @@
#define HOSTPAGESIZEPF0(x) ((x) << HOSTPAGESIZEPF0_SHIFT)
#define SGE_EGRESS_QUEUES_PER_PAGE_PF 0x1010
-#define QUEUESPERPAGEPF0_MASK 0x0000000fU
-#define QUEUESPERPAGEPF0_GET(x) ((x) & QUEUESPERPAGEPF0_MASK)
+#define SGE_EGRESS_QUEUES_PER_PAGE_VF_A 0x1014
+
+#define QUEUESPERPAGEPF1_S 4
+
+#define QUEUESPERPAGEPF0_S 0
+#define QUEUESPERPAGEPF0_MASK 0x0000000fU
+#define QUEUESPERPAGEPF0_GET(x) ((x) & QUEUESPERPAGEPF0_MASK)
#define QUEUESPERPAGEPF0 0
#define QUEUESPERPAGEPF1 4
@@ -323,6 +328,7 @@
#define SGE_DEBUG_DATA_LOW_INDEX_3 0x12cc
#define SGE_DEBUG_DATA_HIGH_INDEX_10 0x12a8
#define SGE_INGRESS_QUEUES_PER_PAGE_PF 0x10f4
+#define SGE_INGRESS_QUEUES_PER_PAGE_VF_A 0x10f8
#define S_HP_INT_THRESH 28
#define M_HP_INT_THRESH 0xfU
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ad88246..c5425f0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2594,6 +2594,27 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
goto err_free_adapter;
}
+ /* Wait for the device to become ready before proceeding ...
+ */
+ err = t4vf_prep_adapter(adapter);
+ if (err) {
+ dev_err(adapter->pdev_dev, "device didn't become ready:"
+ " err=%d\n", err);
+ goto err_unmap_bar0;
+ }
+
+ /* For T5 and later we want to use the new BAR-based User Doorbells,
+ * so we need to map BAR2 here ...
+ */
+ if (!is_t4(adapter->params.chip)) {
+ adapter->bar2 = ioremap_wc(pci_resource_start(pdev, 2),
+ pci_resource_len(pdev, 2));
+ if (!adapter->bar2) {
+ dev_err(adapter->pdev_dev, "cannot map BAR2 doorbells\n");
+ err = -ENOMEM;
+ goto err_unmap_bar0;
+ }
+ }
/*
* Initialize adapter level features.
*/
@@ -2786,6 +2807,10 @@ err_free_dev:
}
err_unmap_bar:
+ if (!is_t4(adapter->params.chip))
+ iounmap(adapter->bar2);
+
+err_unmap_bar0:
iounmap(adapter->regs);
err_free_adapter:
@@ -2856,6 +2881,8 @@ static void cxgb4vf_pci_remove(struct pci_dev *pdev)
free_netdev(netdev);
}
iounmap(adapter->regs);
+ if (!is_t4(adapter->params.chip))
+ iounmap(adapter->bar2);
kfree(adapter);
}
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
index a608c66..6d6a956 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
@@ -135,9 +135,11 @@ struct dev_params {
struct sge_params {
u32 sge_control; /* padding, boundaries, lengths, etc. */
u32 sge_control2; /* T5: more of the same */
- u32 sge_host_page_size; /* RDMA page sizes */
- u32 sge_queues_per_page; /* RDMA queues/page */
- u32 sge_user_mode_limits; /* limits for BAR2 user mode accesses */
+ u32 sge_host_page_size; /* PF0-7 page sizes */
+ u32 sge_egress_queues_per_page; /* PF0-7 egress queues/page */
+ u32 sge_ingress_queues_per_page;/* PF0-7 ingress queues/page */
+ u32 sge_vf_eq_qpp; /* egress queues/page for our VF */
+ u32 sge_vf_iq_qpp; /* ingress queues/page for our VF */
u32 sge_fl_buffer_size[16]; /* free list buffer sizes */
u32 sge_ingress_rx_threshold; /* RX counter interrupt threshold[4] */
u32 sge_congestion_control; /* congestion thresholds, etc. */
@@ -267,6 +269,8 @@ static inline int t4vf_wr_mbox_ns(struct adapter *adapter, const void *cmd,
return t4vf_wr_mbox_core(adapter, cmd, size, rpl, false);
}
+#define CHELSIO_PCI_ID_VER(dev_id) ((dev_id) >> 12)
+
static inline int is_t4(enum chip_type chip)
{
return CHELSIO_CHIP_VERSION(chip) == CHELSIO_T4;
@@ -309,5 +313,6 @@ int t4vf_iq_free(struct adapter *, unsigned int, unsigned int, unsigned int,
int t4vf_eth_eq_free(struct adapter *, unsigned int);
int t4vf_handle_fw_rpl(struct adapter *, const __be64 *);
+int t4vf_prep_adapter(struct adapter *);
#endif /* __T4VF_COMMON_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
index 624a213..03ab19e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
@@ -501,6 +501,48 @@ int t4vf_get_sge_params(struct adapter *adapter)
sge_params->sge_ingress_rx_threshold = vals[0];
sge_params->sge_congestion_control = vals[1];
+ /* For T5 and later we want to use the new BAR2 Doorbells.
+ * Unfortunately, older firmware didn't allow the this register to be
+ * read.
+ */
+ if (!is_t4(adapter->params.chip)) {
+ u32 whoami;
+ unsigned int pf, s_qpp;
+
+ params[0] = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_REG) |
+ FW_PARAMS_PARAM_XYZ_V(
+ SGE_EGRESS_QUEUES_PER_PAGE_VF_A));
+ params[1] = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_REG) |
+ FW_PARAMS_PARAM_XYZ_V(
+ SGE_INGRESS_QUEUES_PER_PAGE_VF_A));
+ v = t4vf_query_params(adapter, 2, params, vals);
+ if (v != FW_SUCCESS) {
+ dev_warn(adapter->pdev_dev,
+ "Unable to get VF SGE Queues/Page; "
+ "probably old firmware.\n");
+ return v;
+ }
+ sge_params->sge_egress_queues_per_page = vals[0];
+ sge_params->sge_ingress_queues_per_page = vals[1];
+
+ /* We need the Queues/Page for our VF. This is based on the
+ * PF from which we're instantiated and is indexed in the
+ * register we just read. Do it once here so other code in
+ * the driver can just use it.
+ */
+ whoami = t4_read_reg(adapter,
+ T4VF_PL_BASE_ADDR + A_PL_VF_WHOAMI);
+ pf = SOURCEPF_GET(whoami);
+ s_qpp = (QUEUESPERPAGEPF0_S +
+ (QUEUESPERPAGEPF1_S - QUEUESPERPAGEPF0_S) * pf);
+ sge_params->sge_vf_eq_qpp =
+ ((sge_params->sge_egress_queues_per_page >> s_qpp)
+ & QUEUESPERPAGEPF0_MASK);
+ sge_params->sge_vf_iq_qpp =
+ ((sge_params->sge_ingress_queues_per_page >> s_qpp)
+ & QUEUESPERPAGEPF0_MASK);
+ }
+
return 0;
}
@@ -1420,3 +1462,38 @@ int t4vf_handle_fw_rpl(struct adapter *adapter, const __be64 *rpl)
}
return 0;
}
+
+/**
+ */
+int t4vf_prep_adapter(struct adapter *adapter)
+{
+ int err;
+ unsigned int chipid;
+
+ /* Wait for the device to become ready before proceeding ...
+ */
+ err = t4vf_wait_dev_ready(adapter);
+ if (err)
+ return err;
+
+ /* Default port and clock for debugging in case we can't reach
+ * firmware.
+ */
+ adapter->params.nports = 1;
+ adapter->params.vfres.pmask = 1;
+ adapter->params.vpd.cclk = 50000;
+
+ adapter->params.chip = 0;
+ switch (CHELSIO_PCI_ID_VER(adapter->pdev->device)) {
+ case CHELSIO_T4:
+ adapter->params.chip |= CHELSIO_CHIP_CODE(CHELSIO_T4, 0);
+ break;
+
+ case CHELSIO_T5:
+ chipid = G_REV(t4_read_reg(adapter, A_PL_VF_REV));
+ adapter->params.chip |= CHELSIO_CHIP_CODE(CHELSIO_T5, chipid);
+ break;
+ }
+
+ return 0;
+}
--
1.7.1
^ permalink raw reply related
* [PATCH net-next 0/4] cxgb4/cxgb4vf: T5 BAR2 and ethtool related fixes
From: Hariprasad Shenai @ 2014-12-03 14:02 UTC (permalink / raw)
To: netdev; +Cc: davem, leedom, anish, nirranjan, kumaras, Hariprasad Shenai
Hi,
This series adds new interface to calculate BAR2 SGE queue register address for
cxgb4 and cxgb4vf driver and some more sge related fixes for T5. Also adds a
patch which updates the FW version displayed by ethtool after firmware flash.
The patches series is created against 'net-next' tree.
And includes patches on cxgb4 and cxgb4vf driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
Thanks
Hariprasad Shenai (4):
cxgb4vf: Add and initialize some sge params for VF driver
cxgb4/cxgb4vf: Add code to calculate T5 BAR2 Offsets for SGE Queue
Registers
cxgb4/cxgb4vf: Use new interfaces to calculate BAR2 SGE Queue
Register addresses
cxgb4: Update firmware version after flashing it via ethtool
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 25 +++-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 60 +++---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h | 7 +
drivers/net/ethernet/chelsio/cxgb4/sge.c | 148 ++++++++--------
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 122 +++++++++++++
drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 22 ++-
drivers/net/ethernet/chelsio/cxgb4vf/adapter.h | 7 +
.../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 39 +++--
drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 185 +++++++++++++++++---
drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h | 19 ++-
drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c | 173 ++++++++++++++++++
11 files changed, 656 insertions(+), 151 deletions(-)
^ permalink raw reply
* Re: [patch net-next 0/6] net_sched: cls_*: couple of fixes
From: Thomas Graf @ 2014-12-03 13:50 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, jhs
In-Reply-To: <1417539636-12710-1-git-send-email-jiri@resnulli.us>
On 12/02/14 at 06:00pm, Jiri Pirko wrote:
> Jiri Pirko (6):
> net_sched: cls_basic: remove unnecessary iteration and use passed arg
> net_sched: cls_bpf: remove unnecessary iteration and use passed arg
> net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcu
> net_sched: cls_flow: remove faulty use of list_for_each_entry_rcu
> net_sched: cls_flow: remove duplicate assignments
> net_sched: cls_cgroup: remove unnecessary if
Looks sane. RCU is obviously not needed as all cls_api calls
are holding RTNL by now.
Reviewed-by: Thomas Graf <tgraf@suug.ch>
^ permalink raw reply
* Re: [patch net-next 3/6] net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcu
From: Pablo Neira Ayuso @ 2014-12-03 13:50 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, jhs
In-Reply-To: <20141203132142.GG1860@nanopsycho.orion>
On Wed, Dec 03, 2014 at 02:21:42PM +0100, Jiri Pirko wrote:
> Wed, Dec 03, 2014 at 01:19:49PM CET, pablo@netfilter.org wrote:
> >On Tue, Dec 02, 2014 at 06:00:33PM +0100, Jiri Pirko wrote:
> >> rcu variant is not correct here. The code is called by updater (rtnl
> >> lock is held), not by reader (no rcu_read_lock is held).
> >>
> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >> ---
> >> net/sched/cls_bpf.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> >> index cbfaf6f..d0de979 100644
> >> --- a/net/sched/cls_bpf.c
> >> +++ b/net/sched/cls_bpf.c
> >> @@ -141,7 +141,7 @@ static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
> >> if (head == NULL)
> >> return 0UL;
> >>
> >> - list_for_each_entry_rcu(prog, &head->plist, link) {
> >> + list_for_each_entry(prog, &head->plist, link) {
> >> if (prog->handle == handle) {
> >> ret = (unsigned long) prog;
> >> break;
> >> @@ -337,7 +337,7 @@ static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
> >> struct cls_bpf_head *head = rtnl_dereference(tp->root);
> >> struct cls_bpf_prog *prog;
> >>
> >> - list_for_each_entry_rcu(prog, &head->plist, link) {
> >> + list_for_each_entry(prog, &head->plist, link) {
> >
> >We still need the _rcu here in the walk path. IIRC, this is called from the
> >dump path and we hold no rtnl_lock there.
>
> It is called from tc_dump_tfilter only. And tc_dump_tfilter is always
> called with rtnl_lock
Right, nlk->cb_mutex is set to rtnl_lock so netlink_dump() grabs it
for each recvmsg() invocation.
^ permalink raw reply
* [PATCH net-next] tipc: drop tx side permission checks
From: erik.hugne @ 2014-12-03 13:44 UTC (permalink / raw)
To: netdev; +Cc: jon.maloy, tipc-discussion, Erik
From: Erik Hugne <erik.hugne@ericsson.com>
Part of the old remote management feature is a piece of code
that checked permissions on the local system to see if a certain
operation was permitted, and if so pass the command to a remote
node. This serves no purpose after the removal of remote management
with commit 5902385a2440 ("tipc: obsolete the remote management
feature") so we remove it.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
---
net/tipc/socket.c | 38 +-------------------------------------
1 file changed, 1 insertion(+), 37 deletions(-)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 9658d9b..4731cad 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -827,39 +827,6 @@ exit:
return TIPC_OK;
}
-/**
- * dest_name_check - verify user is permitted to send to specified port name
- * @dest: destination address
- * @m: descriptor for message to be sent
- *
- * Prevents restricted configuration commands from being issued by
- * unauthorized users.
- *
- * Returns 0 if permission is granted, otherwise errno
- */
-static int dest_name_check(struct sockaddr_tipc *dest, struct msghdr *m)
-{
- struct tipc_cfg_msg_hdr hdr;
-
- if (unlikely(dest->addrtype == TIPC_ADDR_ID))
- return 0;
- if (likely(dest->addr.name.name.type >= TIPC_RESERVED_TYPES))
- return 0;
- if (likely(dest->addr.name.name.type == TIPC_TOP_SRV))
- return 0;
- if (likely(dest->addr.name.name.type != TIPC_CFG_SRV))
- return -EACCES;
-
- if (!m->msg_iovlen || (m->msg_iov[0].iov_len < sizeof(hdr)))
- return -EMSGSIZE;
- if (copy_from_user(&hdr, m->msg_iov[0].iov_base, sizeof(hdr)))
- return -EFAULT;
- if ((ntohs(hdr.tcm_type) & 0xC000) && (!capable(CAP_NET_ADMIN)))
- return -EACCES;
-
- return 0;
-}
-
static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p)
{
struct sock *sk = sock->sk;
@@ -912,7 +879,7 @@ static int tipc_sendmsg(struct kiocb *iocb, struct socket *sock,
struct tipc_name_seq *seq = &dest->addr.nameseq;
u32 mtu;
long timeo;
- int rc = -EINVAL;
+ int rc;
if (unlikely(!dest))
return -EDESTADDRREQ;
@@ -945,9 +912,6 @@ static int tipc_sendmsg(struct kiocb *iocb, struct socket *sock,
tsk->conn_instance = dest->addr.name.name.instance;
}
}
- rc = dest_name_check(dest, m);
- if (rc)
- goto exit;
timeo = sock_sndtimeo(sk, m->msg_flags & MSG_DONTWAIT);
--
2.1.3
------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
^ permalink raw reply related
* Re: [patch net-next 2/6] net_sched: cls_bpf: remove unnecessary iteration and use passed arg
From: Daniel Borkmann @ 2014-12-03 13:38 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, jhs
In-Reply-To: <1417539636-12710-3-git-send-email-jiri@resnulli.us>
On 12/02/2014 06:00 PM, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
^ permalink raw reply
* Re: [patch net-next 3/6] net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcu
From: Jiri Pirko @ 2014-12-03 13:26 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, davem
In-Reply-To: <547F0743.4040301@mojatatu.com>
Wed, Dec 03, 2014 at 01:51:15PM CET, jhs@mojatatu.com wrote:
>On 12/02/14 12:00, Jiri Pirko wrote:
>>rcu variant is not correct here. The code is called by updater (rtnl
>>lock is held), not by reader (no rcu_read_lock is held).
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>> net/sched/cls_bpf.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
>>index cbfaf6f..d0de979 100644
>>--- a/net/sched/cls_bpf.c
>>+++ b/net/sched/cls_bpf.c
>>@@ -141,7 +141,7 @@ static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
>> if (head == NULL)
>> return 0UL;
>>
>>- list_for_each_entry_rcu(prog, &head->plist, link) {
>>+ list_for_each_entry(prog, &head->plist, link) {
>> if (prog->handle == handle) {
>> ret = (unsigned long) prog;
>
>The above is ok i think - only one user space entrant at a time
>and datapath is not affected because no modification is happening.
>
>> break;
>>@@ -337,7 +337,7 @@ static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
>> struct cls_bpf_head *head = rtnl_dereference(tp->root);
>> struct cls_bpf_prog *prog;
>>
>>- list_for_each_entry_rcu(prog, &head->plist, link) {
>>+ list_for_each_entry(prog, &head->plist, link) {
>> if (arg->count < arg->skip)
>> goto skip;
>> if (arg->fn(tp, (unsigned long) prog, arg) < 0) {
>>
>
>I think this may be problematic. Doesnt a flush operation also use the
>walker?
I don't believe so. Just look at tc_dump_tfilter.
But even if that would the the case, _rcu variant is wrong (yep, it
would have to be replaced by _safe variant then).
>
>cheers,
>jamal
>
>
^ permalink raw reply
* Re: [patch net-next 3/6] net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcu
From: Jiri Pirko @ 2014-12-03 13:21 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netdev, davem, jhs
In-Reply-To: <20141203121949.GA24370@salvia>
Wed, Dec 03, 2014 at 01:19:49PM CET, pablo@netfilter.org wrote:
>On Tue, Dec 02, 2014 at 06:00:33PM +0100, Jiri Pirko wrote:
>> rcu variant is not correct here. The code is called by updater (rtnl
>> lock is held), not by reader (no rcu_read_lock is held).
>>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>> net/sched/cls_bpf.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
>> index cbfaf6f..d0de979 100644
>> --- a/net/sched/cls_bpf.c
>> +++ b/net/sched/cls_bpf.c
>> @@ -141,7 +141,7 @@ static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
>> if (head == NULL)
>> return 0UL;
>>
>> - list_for_each_entry_rcu(prog, &head->plist, link) {
>> + list_for_each_entry(prog, &head->plist, link) {
>> if (prog->handle == handle) {
>> ret = (unsigned long) prog;
>> break;
>> @@ -337,7 +337,7 @@ static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
>> struct cls_bpf_head *head = rtnl_dereference(tp->root);
>> struct cls_bpf_prog *prog;
>>
>> - list_for_each_entry_rcu(prog, &head->plist, link) {
>> + list_for_each_entry(prog, &head->plist, link) {
>
>We still need the _rcu here in the walk path. IIRC, this is called from the
>dump path and we hold no rtnl_lock there.
It is called from tc_dump_tfilter only. And tc_dump_tfilter is always
called with rtnl_lock
>
>> if (arg->count < arg->skip)
>> goto skip;
>> if (arg->fn(tp, (unsigned long) prog, arg) < 0) {
>> --
>> 1.9.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [patch net-next 6/6] net_sched: cls_cgroup: remove unnecessary if
From: Jiri Pirko @ 2014-12-03 13:18 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, davem
In-Reply-To: <547F0AFA.7090001@mojatatu.com>
Wed, Dec 03, 2014 at 02:07:06PM CET, jhs@mojatatu.com wrote:
>On 12/02/14 12:00, Jiri Pirko wrote:
>>since head->handle == handle (checked before), just assign handle.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>> net/sched/cls_cgroup.c | 6 +-----
>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>
>>diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
>>index d61a801..dbee65e 100644
>>--- a/net/sched/cls_cgroup.c
>>+++ b/net/sched/cls_cgroup.c
>>@@ -117,11 +117,7 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
>> return -ENOBUFS;
>>
>> tcf_exts_init(&new->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
>>- if (head)
>>- new->handle = head->handle;
>>- else
>>- new->handle = handle;
>>-
>>+ new->handle = handle;
>
>
>Hrm. head could be NULL, no?
Sure it can. But that is not a problem. Not sure what you are trying to
point at...
>
>cheers,
>jamal
>
^ permalink raw reply
* [patch net-next v2 2/2] rocker: fix eth_type type in struct rocker_ctrl
From: Jiri Pirko @ 2014-12-03 13:14 UTC (permalink / raw)
To: netdev; +Cc: davem, sfeldma
In-Reply-To: <1417612494-3788-1-git-send-email-jiri@resnulli.us>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
v1->v2:
-fixed typo in subject pointed out by Sergei
---
drivers/net/ethernet/rocker/rocker.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 4b060fb..5536435 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -2753,7 +2753,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
static struct rocker_ctrl {
const u8 *eth_dst;
const u8 *eth_dst_mask;
- u16 eth_type;
+ __be16 eth_type;
bool acl;
bool bridge;
bool term;
--
1.9.3
^ permalink raw reply related
* [patch net-next v2 1/2] rocker: introduce be put/get variants and use it when appropriate
From: Jiri Pirko @ 2014-12-03 13:14 UTC (permalink / raw)
To: netdev; +Cc: davem, sfeldma
This kills the sparse warnings.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
v1->v2:
-no change
---
drivers/net/ethernet/rocker/rocker.c | 79 ++++++++++++++++++++++--------------
1 file changed, 48 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index fded127..4b060fb 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -648,6 +648,11 @@ static u16 rocker_tlv_get_u16(const struct rocker_tlv *tlv)
return *(u16 *) rocker_tlv_data(tlv);
}
+static __be16 rocker_tlv_get_be16(const struct rocker_tlv *tlv)
+{
+ return *(__be16 *) rocker_tlv_data(tlv);
+}
+
static u32 rocker_tlv_get_u32(const struct rocker_tlv *tlv)
{
return *(u32 *) rocker_tlv_data(tlv);
@@ -726,12 +731,24 @@ static int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
}
+static int rocker_tlv_put_be16(struct rocker_desc_info *desc_info,
+ int attrtype, __be16 value)
+{
+ return rocker_tlv_put(desc_info, attrtype, sizeof(__be16), &value);
+}
+
static int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
int attrtype, u32 value)
{
return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
}
+static int rocker_tlv_put_be32(struct rocker_desc_info *desc_info,
+ int attrtype, __be32 value)
+{
+ return rocker_tlv_put(desc_info, attrtype, sizeof(__be32), &value);
+}
+
static int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
int attrtype, u64 value)
{
@@ -1343,7 +1360,7 @@ static int rocker_event_mac_vlan_seen(struct rocker *rocker,
port_number =
rocker_tlv_get_u32(attrs[ROCKER_TLV_EVENT_MAC_VLAN_LPORT]) - 1;
addr = rocker_tlv_data(attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAC]);
- vlan_id = rocker_tlv_get_u16(attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID]);
+ vlan_id = rocker_tlv_get_be16(attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID]);
if (port_number >= rocker->port_count)
return -EINVAL;
@@ -1717,18 +1734,18 @@ static int rocker_cmd_flow_tbl_add_vlan(struct rocker_desc_info *desc_info,
if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
entry->key.vlan.in_lport))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
- entry->key.vlan.vlan_id))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+ entry->key.vlan.vlan_id))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
- entry->key.vlan.vlan_id_mask))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+ entry->key.vlan.vlan_id_mask))
return -EMSGSIZE;
if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
entry->key.vlan.goto_tbl))
return -EMSGSIZE;
if (entry->key.vlan.untagged &&
- rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_NEW_VLAN_ID,
- entry->key.vlan.new_vlan_id))
+ rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_NEW_VLAN_ID,
+ entry->key.vlan.new_vlan_id))
return -EMSGSIZE;
return 0;
@@ -1743,8 +1760,8 @@ static int rocker_cmd_flow_tbl_add_term_mac(struct rocker_desc_info *desc_info,
if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
entry->key.term_mac.in_lport_mask))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
- entry->key.term_mac.eth_type))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+ entry->key.term_mac.eth_type))
return -EMSGSIZE;
if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
ETH_ALEN, entry->key.term_mac.eth_dst))
@@ -1752,11 +1769,11 @@ static int rocker_cmd_flow_tbl_add_term_mac(struct rocker_desc_info *desc_info,
if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
ETH_ALEN, entry->key.term_mac.eth_dst_mask))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
- entry->key.term_mac.vlan_id))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+ entry->key.term_mac.vlan_id))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
- entry->key.term_mac.vlan_id_mask))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+ entry->key.term_mac.vlan_id_mask))
return -EMSGSIZE;
if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
entry->key.term_mac.goto_tbl))
@@ -1773,14 +1790,14 @@ static int
rocker_cmd_flow_tbl_add_ucast_routing(struct rocker_desc_info *desc_info,
struct rocker_flow_tbl_entry *entry)
{
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
- entry->key.ucast_routing.eth_type))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+ entry->key.ucast_routing.eth_type))
return -EMSGSIZE;
- if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP,
- entry->key.ucast_routing.dst4))
+ if (rocker_tlv_put_be32(desc_info, ROCKER_TLV_OF_DPA_DST_IP,
+ entry->key.ucast_routing.dst4))
return -EMSGSIZE;
- if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP_MASK,
- entry->key.ucast_routing.dst4_mask))
+ if (rocker_tlv_put_be32(desc_info, ROCKER_TLV_OF_DPA_DST_IP_MASK,
+ entry->key.ucast_routing.dst4_mask))
return -EMSGSIZE;
if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
entry->key.ucast_routing.goto_tbl))
@@ -1804,8 +1821,8 @@ static int rocker_cmd_flow_tbl_add_bridge(struct rocker_desc_info *desc_info,
ETH_ALEN, entry->key.bridge.eth_dst_mask))
return -EMSGSIZE;
if (entry->key.bridge.vlan_id &&
- rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
- entry->key.bridge.vlan_id))
+ rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+ entry->key.bridge.vlan_id))
return -EMSGSIZE;
if (entry->key.bridge.tunnel_id &&
rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_TUNNEL_ID,
@@ -1846,14 +1863,14 @@ static int rocker_cmd_flow_tbl_add_acl(struct rocker_desc_info *desc_info,
if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
ETH_ALEN, entry->key.acl.eth_dst_mask))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
- entry->key.acl.eth_type))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+ entry->key.acl.eth_type))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
- entry->key.acl.vlan_id))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+ entry->key.acl.vlan_id))
return -EMSGSIZE;
- if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
- entry->key.acl.vlan_id_mask))
+ if (rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+ entry->key.acl.vlan_id_mask))
return -EMSGSIZE;
switch (ntohs(entry->key.acl.eth_type)) {
@@ -2002,8 +2019,8 @@ rocker_cmd_group_tbl_add_l2_rewrite(struct rocker_desc_info *desc_info,
ETH_ALEN, entry->l2_rewrite.eth_dst))
return -EMSGSIZE;
if (entry->l2_rewrite.vlan_id &&
- rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
- entry->l2_rewrite.vlan_id))
+ rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+ entry->l2_rewrite.vlan_id))
return -EMSGSIZE;
return 0;
@@ -2048,8 +2065,8 @@ rocker_cmd_group_tbl_add_l3_unicast(struct rocker_desc_info *desc_info,
ETH_ALEN, entry->l3_unicast.eth_dst))
return -EMSGSIZE;
if (entry->l3_unicast.vlan_id &&
- rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
- entry->l3_unicast.vlan_id))
+ rocker_tlv_put_be16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+ entry->l3_unicast.vlan_id))
return -EMSGSIZE;
if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_TTL_CHECK,
entry->l3_unicast.ttl_check))
--
1.9.3
^ permalink raw reply related
* Re: [patch net-next 6/6] net_sched: cls_cgroup: remove unnecessary if
From: Jamal Hadi Salim @ 2014-12-03 13:07 UTC (permalink / raw)
To: Jiri Pirko, netdev; +Cc: davem
In-Reply-To: <1417539636-12710-7-git-send-email-jiri@resnulli.us>
On 12/02/14 12:00, Jiri Pirko wrote:
> since head->handle == handle (checked before), just assign handle.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
> net/sched/cls_cgroup.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index d61a801..dbee65e 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -117,11 +117,7 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
> return -ENOBUFS;
>
> tcf_exts_init(&new->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
> - if (head)
> - new->handle = head->handle;
> - else
> - new->handle = handle;
> -
> + new->handle = handle;
Hrm. head could be NULL, no?
cheers,
jamal
^ permalink raw reply
* Re: [patch net-next 4/6] net_sched: cls_flow: remove faulty use of list_for_each_entry_rcu
From: Jamal Hadi Salim @ 2014-12-03 13:02 UTC (permalink / raw)
To: Jiri Pirko, netdev; +Cc: davem
In-Reply-To: <1417539636-12710-5-git-send-email-jiri@resnulli.us>
On 12/02/14 12:00, Jiri Pirko wrote:
> rcu variant is not correct here. The code is called by updater (rtnl
> lock is held), not by reader (no rcu_read_lock is held).
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Same comment as other one..
I actually i am not so sure about the get_ either.
The core may end up invoking that first before doing a del
so for consistency we need rcu version.
cheers,
jamal
^ permalink raw reply
* Re: [patch net-next 2/2] rocker: fix eth_type tybe in struct rocker_ctrl
From: Jiri Pirko @ 2014-12-03 12:57 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: netdev, davem, sfeldma
In-Reply-To: <547F0690.4070307@cogentembedded.com>
Wed, Dec 03, 2014 at 01:48:16PM CET, sergei.shtylyov@cogentembedded.com wrote:
>Hello.
>
>On 12/3/2014 1:32 PM, Jiri Pirko wrote:
>
> s/tybe/type in the subject.
>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>
>WBR, Sergei
>
Oh, will fix and send v2, thanks.
^ permalink raw reply
* [PATCH 12/12] netfilter: ipset: Explicitly add padding elements to hash:net, net and hash:net, port, net
From: Pablo Neira Ayuso @ 2014-12-03 12:55 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1417611342-25257-1-git-send-email-pablo@netfilter.org>
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
The elements must be u32 sized for the used hash function.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipset/ip_set_hash_netnet.c | 2 ++
net/netfilter/ipset/ip_set_hash_netportnet.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c b/net/netfilter/ipset/ip_set_hash_netnet.c
index da00284..ea8772a 100644
--- a/net/netfilter/ipset/ip_set_hash_netnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netnet.c
@@ -46,6 +46,7 @@ struct hash_netnet4_elem {
__be64 ipcmp;
};
u8 nomatch;
+ u8 padding;
union {
u8 cidr[2];
u16 ccmp;
@@ -271,6 +272,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
struct hash_netnet6_elem {
union nf_inet_addr ip[2];
u8 nomatch;
+ u8 padding;
union {
u8 cidr[2];
u16 ccmp;
diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
index b8053d6..bfaa94c 100644
--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
@@ -53,6 +53,7 @@ struct hash_netportnet4_elem {
u8 cidr[2];
u16 ccmp;
};
+ u16 padding;
u8 nomatch:1;
u8 proto;
};
@@ -324,6 +325,7 @@ struct hash_netportnet6_elem {
u8 cidr[2];
u16 ccmp;
};
+ u16 padding;
u8 nomatch:1;
u8 proto;
};
--
1.7.10.4
^ permalink raw reply related
* [PATCH 10/12] netfilter: ipset: Simplify cidr handling for hash:*net* types
From: Pablo Neira Ayuso @ 2014-12-03 12:55 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1417611342-25257-1-git-send-email-pablo@netfilter.org>
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipset/ip_set_hash_gen.h | 56 ++++++++++++++++-----------------
1 file changed, 28 insertions(+), 28 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 9428fa5..8ef9135 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -147,11 +147,17 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize)
#else
#define __CIDR(cidr, i) (cidr)
#endif
+
+/* cidr + 1 is stored in net_prefixes to support /0 */
+#define SCIDR(cidr, i) (__CIDR(cidr, i) + 1)
+
#ifdef IP_SET_HASH_WITH_NETS_PACKED
-/* When cidr is packed with nomatch, cidr - 1 is stored in the entry */
-#define CIDR(cidr, i) (__CIDR(cidr, i) + 1)
+/* When cidr is packed with nomatch, cidr - 1 is stored in the data entry */
+#define GCIDR(cidr, i) (__CIDR(cidr, i) + 1)
+#define NCIDR(cidr) (cidr)
#else
-#define CIDR(cidr, i) (__CIDR(cidr, i))
+#define GCIDR(cidr, i) (__CIDR(cidr, i))
+#define NCIDR(cidr) (cidr - 1)
#endif
#define SET_HOST_MASK(family) (family == AF_INET ? 32 : 128)
@@ -292,24 +298,22 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
int i, j;
/* Add in increasing prefix order, so larger cidr first */
- for (i = 0, j = -1; i < nets_length && h->nets[i].nets[n]; i++) {
+ for (i = 0, j = -1; i < nets_length && h->nets[i].cidr[n]; i++) {
if (j != -1)
continue;
else if (h->nets[i].cidr[n] < cidr)
j = i;
else if (h->nets[i].cidr[n] == cidr) {
- h->nets[i].nets[n]++;
+ h->nets[cidr - 1].nets[n]++;
return;
}
}
if (j != -1) {
- for (; i > j; i--) {
+ for (; i > j; i--)
h->nets[i].cidr[n] = h->nets[i - 1].cidr[n];
- h->nets[i].nets[n] = h->nets[i - 1].nets[n];
- }
}
h->nets[i].cidr[n] = cidr;
- h->nets[i].nets[n] = 1;
+ h->nets[cidr - 1].nets[n] = 1;
}
static void
@@ -320,16 +324,12 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
for (i = 0; i < nets_length; i++) {
if (h->nets[i].cidr[n] != cidr)
continue;
- if (h->nets[i].nets[n] > 1 || i == net_end ||
- h->nets[i + 1].nets[n] == 0) {
- h->nets[i].nets[n]--;
+ h->nets[cidr -1].nets[n]--;
+ if (h->nets[cidr -1].nets[n] > 0)
return;
- }
- for (j = i; j < net_end && h->nets[j].nets[n]; j++) {
+ for (j = i; j < net_end && h->nets[j].cidr[n]; j++)
h->nets[j].cidr[n] = h->nets[j + 1].cidr[n];
- h->nets[j].nets[n] = h->nets[j + 1].nets[n];
- }
- h->nets[j].nets[n] = 0;
+ h->nets[j].cidr[n] = 0;
return;
}
}
@@ -486,7 +486,7 @@ mtype_expire(struct ip_set *set, struct htype *h, u8 nets_length, size_t dsize)
pr_debug("expired %u/%u\n", i, j);
#ifdef IP_SET_HASH_WITH_NETS
for (k = 0; k < IPSET_NET_COUNT; k++)
- mtype_del_cidr(h, CIDR(data->cidr, k),
+ mtype_del_cidr(h, SCIDR(data->cidr, k),
nets_length, k);
#endif
ip_set_ext_destroy(set, data);
@@ -680,9 +680,9 @@ reuse_slot:
data = ahash_data(n, j, set->dsize);
#ifdef IP_SET_HASH_WITH_NETS
for (i = 0; i < IPSET_NET_COUNT; i++) {
- mtype_del_cidr(h, CIDR(data->cidr, i),
+ mtype_del_cidr(h, SCIDR(data->cidr, i),
NLEN(set->family), i);
- mtype_add_cidr(h, CIDR(d->cidr, i),
+ mtype_add_cidr(h, SCIDR(d->cidr, i),
NLEN(set->family), i);
}
#endif
@@ -699,7 +699,7 @@ reuse_slot:
data = ahash_data(n, n->pos++, set->dsize);
#ifdef IP_SET_HASH_WITH_NETS
for (i = 0; i < IPSET_NET_COUNT; i++)
- mtype_add_cidr(h, CIDR(d->cidr, i), NLEN(set->family),
+ mtype_add_cidr(h, SCIDR(d->cidr, i), NLEN(set->family),
i);
#endif
h->elements++;
@@ -760,7 +760,7 @@ mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext,
h->elements--;
#ifdef IP_SET_HASH_WITH_NETS
for (j = 0; j < IPSET_NET_COUNT; j++)
- mtype_del_cidr(h, CIDR(d->cidr, j), NLEN(set->family),
+ mtype_del_cidr(h, SCIDR(d->cidr, j), NLEN(set->family),
j);
#endif
ip_set_ext_destroy(set, data);
@@ -821,15 +821,15 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
u8 nets_length = NLEN(set->family);
pr_debug("test by nets\n");
- for (; j < nets_length && h->nets[j].nets[0] && !multi; j++) {
+ for (; j < nets_length && h->nets[j].cidr[0] && !multi; j++) {
#if IPSET_NET_COUNT == 2
mtype_data_reset_elem(d, &orig);
- mtype_data_netmask(d, h->nets[j].cidr[0], false);
- for (k = 0; k < nets_length && h->nets[k].nets[1] && !multi;
+ mtype_data_netmask(d, NCIDR(h->nets[j].cidr[0]), false);
+ for (k = 0; k < nets_length && h->nets[k].cidr[1] && !multi;
k++) {
- mtype_data_netmask(d, h->nets[k].cidr[1], true);
+ mtype_data_netmask(d, NCIDR(h->nets[k].cidr[1]), true);
#else
- mtype_data_netmask(d, h->nets[j].cidr[0]);
+ mtype_data_netmask(d, NCIDR(h->nets[j].cidr[0]));
#endif
key = HKEY(d, h->initval, t->htable_bits);
n = hbucket(t, key);
@@ -877,7 +877,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
/* If we test an IP address and not a network address,
* try all possible network sizes */
for (i = 0; i < IPSET_NET_COUNT; i++)
- if (CIDR(d->cidr, i) != SET_HOST_MASK(set->family))
+ if (GCIDR(d->cidr, i) != SET_HOST_MASK(set->family))
break;
if (i == IPSET_NET_COUNT) {
ret = mtype_test_cidrs(set, d, ext, mext, flags);
--
1.7.10.4
^ permalink raw reply related
* [PATCH 08/12] netfilter: ipset: Alignment problem between 64bit kernel 32bit userspace
From: Pablo Neira Ayuso @ 2014-12-03 12:55 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1417611342-25257-1-git-send-email-pablo@netfilter.org>
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Sven-Haegar Koch reported the issue:
sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.
In syslog:
x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32
which was introduced by the counter extension in ipset.
The patch fixes the alignment issue with introducing a new set match
revision with the fixed underlying 'struct ip_set_counter_match'
structure.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/uapi/linux/netfilter/ipset/ip_set.h | 8 ++-
include/uapi/linux/netfilter/xt_set.h | 13 ++++-
net/netfilter/xt_set.c | 73 +++++++++++++++++++++++++--
3 files changed, 88 insertions(+), 6 deletions(-)
diff --git a/include/uapi/linux/netfilter/ipset/ip_set.h b/include/uapi/linux/netfilter/ipset/ip_set.h
index ca03119..5ab4e60 100644
--- a/include/uapi/linux/netfilter/ipset/ip_set.h
+++ b/include/uapi/linux/netfilter/ipset/ip_set.h
@@ -256,11 +256,17 @@ enum {
IPSET_COUNTER_GT,
};
-struct ip_set_counter_match {
+/* Backward compatibility for set match v3 */
+struct ip_set_counter_match0 {
__u8 op;
__u64 value;
};
+struct ip_set_counter_match {
+ __aligned_u64 value;
+ __u8 op;
+};
+
/* Interface to iptables/ip6tables */
#define SO_IP_SET 83
diff --git a/include/uapi/linux/netfilter/xt_set.h b/include/uapi/linux/netfilter/xt_set.h
index d6a1df1..d4e0234 100644
--- a/include/uapi/linux/netfilter/xt_set.h
+++ b/include/uapi/linux/netfilter/xt_set.h
@@ -66,8 +66,8 @@ struct xt_set_info_target_v2 {
struct xt_set_info_match_v3 {
struct xt_set_info match_set;
- struct ip_set_counter_match packets;
- struct ip_set_counter_match bytes;
+ struct ip_set_counter_match0 packets;
+ struct ip_set_counter_match0 bytes;
__u32 flags;
};
@@ -81,4 +81,13 @@ struct xt_set_info_target_v3 {
__u32 timeout;
};
+/* Revision 4 match */
+
+struct xt_set_info_match_v4 {
+ struct xt_set_info match_set;
+ struct ip_set_counter_match packets;
+ struct ip_set_counter_match bytes;
+ __u32 flags;
+};
+
#endif /*_XT_SET_H*/
diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
index 5732cd6..0d47afe 100644
--- a/net/netfilter/xt_set.c
+++ b/net/netfilter/xt_set.c
@@ -157,7 +157,7 @@ set_match_v1_destroy(const struct xt_mtdtor_param *par)
/* Revision 3 match */
static bool
-match_counter(u64 counter, const struct ip_set_counter_match *info)
+match_counter0(u64 counter, const struct ip_set_counter_match0 *info)
{
switch (info->op) {
case IPSET_COUNTER_NONE:
@@ -192,14 +192,60 @@ set_match_v3(const struct sk_buff *skb, struct xt_action_param *par)
if (!(ret && opt.cmdflags & IPSET_FLAG_MATCH_COUNTERS))
return ret;
- if (!match_counter(opt.ext.packets, &info->packets))
+ if (!match_counter0(opt.ext.packets, &info->packets))
return 0;
- return match_counter(opt.ext.bytes, &info->bytes);
+ return match_counter0(opt.ext.bytes, &info->bytes);
}
#define set_match_v3_checkentry set_match_v1_checkentry
#define set_match_v3_destroy set_match_v1_destroy
+/* Revision 4 match */
+
+static bool
+match_counter(u64 counter, const struct ip_set_counter_match *info)
+{
+ switch (info->op) {
+ case IPSET_COUNTER_NONE:
+ return true;
+ case IPSET_COUNTER_EQ:
+ return counter == info->value;
+ case IPSET_COUNTER_NE:
+ return counter != info->value;
+ case IPSET_COUNTER_LT:
+ return counter < info->value;
+ case IPSET_COUNTER_GT:
+ return counter > info->value;
+ }
+ return false;
+}
+
+static bool
+set_match_v4(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ const struct xt_set_info_match_v4 *info = par->matchinfo;
+ ADT_OPT(opt, par->family, info->match_set.dim,
+ info->match_set.flags, info->flags, UINT_MAX);
+ int ret;
+
+ if (info->packets.op != IPSET_COUNTER_NONE ||
+ info->bytes.op != IPSET_COUNTER_NONE)
+ opt.cmdflags |= IPSET_FLAG_MATCH_COUNTERS;
+
+ ret = match_set(info->match_set.index, skb, par, &opt,
+ info->match_set.flags & IPSET_INV_MATCH);
+
+ if (!(ret && opt.cmdflags & IPSET_FLAG_MATCH_COUNTERS))
+ return ret;
+
+ if (!match_counter(opt.ext.packets, &info->packets))
+ return 0;
+ return match_counter(opt.ext.bytes, &info->bytes);
+}
+
+#define set_match_v4_checkentry set_match_v1_checkentry
+#define set_match_v4_destroy set_match_v1_destroy
+
/* Revision 0 interface: backward compatible with netfilter/iptables */
static unsigned int
@@ -573,6 +619,27 @@ static struct xt_match set_matches[] __read_mostly = {
.destroy = set_match_v3_destroy,
.me = THIS_MODULE
},
+ /* new revision for counters support: update, match */
+ {
+ .name = "set",
+ .family = NFPROTO_IPV4,
+ .revision = 4,
+ .match = set_match_v4,
+ .matchsize = sizeof(struct xt_set_info_match_v4),
+ .checkentry = set_match_v4_checkentry,
+ .destroy = set_match_v4_destroy,
+ .me = THIS_MODULE
+ },
+ {
+ .name = "set",
+ .family = NFPROTO_IPV6,
+ .revision = 4,
+ .match = set_match_v4,
+ .matchsize = sizeof(struct xt_set_info_match_v4),
+ .checkentry = set_match_v4_checkentry,
+ .destroy = set_match_v4_destroy,
+ .me = THIS_MODULE
+ },
};
static struct xt_target set_targets[] __read_mostly = {
--
1.7.10.4
^ permalink raw reply related
* [PATCH 00/12] Netfilter updates for net-next
From: Pablo Neira Ayuso @ 2014-12-03 12:55 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
Hi David,
The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:
1) Rise maximum number per IP address to be remembered in xt_recent
while retaining backward compatibility, from Florian Westphal.
2) Skip zeroing timer area in nf_conn objects, also from Florian.
3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using
using meta l4proto and transport layer header, from Alvaro Neira.
4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
and IP6_NF_IPTABLES=n.
And ipset updates from Jozsef Kadlecsik:
5) Support updating element extensions when the set is full (fixes
netfilter bugzilla id 880).
6) Fix set match with 32-bits userspace / 64-bits kernel.
7) Indicate explicitly when /0 networks are supported in ipset.
8) Simplify cidr handling for hash:*net* types.
9) Allocate the proper size of memory when /0 networks are supported.
10) Explicitly add padding elements to hash:net,net and hash:net,port,
because the elements must be u32 sized for the used hash function.
Jozsef is also cooking ipset RCU conversion which should land soon if
they reach the merge window in time.
You can pull these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git
Thanks!
----------------------------------------------------------------
The following changes since commit 3a611e26e958b0372d2e7600b87bbb4a84c7704b:
net/smsc911x: Add minimal runtime PM support (2014-11-24 16:41:50 -0500)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master
for you to fetch changes up to cac3763967362ace7996532ad3933f493a928a1b:
netfilter: ipset: Explicitly add padding elements to hash:net, net and hash:net, port, net (2014-12-03 12:43:36 +0100)
----------------------------------------------------------------
Alvaro Neira (2):
netfilter: nf_tables_bridge: export nft_reject_ip*hdr_validate functions
netfilter: nf_tables_bridge: set the pktinfo for IPv4/IPv6 traffic
Florian Westphal (2):
netfilter: xt_recent: relax ip_pkt_list_tot restrictions
netfilter: conntrack: avoid zeroing timer
Jozsef Kadlecsik (6):
netfilter: ipset: Support updating extensions when the set is full
netfilter: ipset: Alignment problem between 64bit kernel 32bit userspace
netfilter: ipset: Indicate when /0 networks are supported
netfilter: ipset: Simplify cidr handling for hash:*net* types
netfilter: ipset: Allocate the proper size of memory when /0 networks are supported
netfilter: ipset: Explicitly add padding elements to hash:net, net and hash:net, port, net
Pablo Neira Ayuso (1):
netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module
Steven Noonan (1):
netfilter: nf_log_ipv6: correct typo in module description
include/net/netfilter/ipv4/nf_nat_redirect.h | 9 --
include/net/netfilter/ipv6/nf_nat_redirect.h | 8 --
include/net/netfilter/nf_conntrack.h | 15 +--
include/net/netfilter/nf_nat_redirect.h | 12 +++
include/net/netfilter/nf_tables_bridge.h | 7 ++
include/uapi/linux/netfilter/ipset/ip_set.h | 8 +-
include/uapi/linux/netfilter/xt_set.h | 13 ++-
net/bridge/netfilter/nf_tables_bridge.c | 88 ++++++++++++++++-
net/bridge/netfilter/nft_reject_bridge.c | 52 +---------
net/ipv4/netfilter/Kconfig | 8 +-
net/ipv4/netfilter/Makefile | 1 -
net/ipv4/netfilter/nft_redir_ipv4.c | 2 +-
net/ipv6/netfilter/Kconfig | 8 +-
net/ipv6/netfilter/Makefile | 1 -
net/ipv6/netfilter/nf_log_ipv6.c | 2 +-
net/ipv6/netfilter/nf_nat_redirect_ipv6.c | 75 ---------------
net/ipv6/netfilter/nft_redir_ipv6.c | 2 +-
net/netfilter/Kconfig | 10 +-
net/netfilter/Makefile | 1 +
net/netfilter/ipset/ip_set_hash_gen.h | 101 +++++++++-----------
net/netfilter/ipset/ip_set_hash_netiface.c | 1 +
net/netfilter/ipset/ip_set_hash_netnet.c | 2 +
net/netfilter/ipset/ip_set_hash_netportnet.c | 2 +
net/netfilter/nf_conntrack_core.c | 11 +--
.../nf_nat_redirect.c} | 47 ++++++++-
net/netfilter/xt_REDIRECT.c | 3 +-
net/netfilter/xt_recent.c | 64 +++++++++----
net/netfilter/xt_set.c | 73 +++++++++++++-
28 files changed, 372 insertions(+), 254 deletions(-)
delete mode 100644 include/net/netfilter/ipv4/nf_nat_redirect.h
delete mode 100644 include/net/netfilter/ipv6/nf_nat_redirect.h
create mode 100644 include/net/netfilter/nf_nat_redirect.h
create mode 100644 include/net/netfilter/nf_tables_bridge.h
delete mode 100644 net/ipv6/netfilter/nf_nat_redirect_ipv6.c
rename net/{ipv4/netfilter/nf_nat_redirect_ipv4.c => netfilter/nf_nat_redirect.c} (67%)
^ permalink raw reply
* [PATCH 09/12] netfilter: ipset: Indicate when /0 networks are supported
From: Pablo Neira Ayuso @ 2014-12-03 12:55 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1417611342-25257-1-git-send-email-pablo@netfilter.org>
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipset/ip_set_hash_gen.h | 2 +-
net/netfilter/ipset/ip_set_hash_netiface.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index a12ee04..9428fa5 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -156,7 +156,7 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize)
#define SET_HOST_MASK(family) (family == AF_INET ? 32 : 128)
-#ifdef IP_SET_HASH_WITH_MULTI
+#ifdef IP_SET_HASH_WITH_NET0
#define NLEN(family) (SET_HOST_MASK(family) + 1)
#else
#define NLEN(family) SET_HOST_MASK(family)
diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c
index 35dd358..758b002 100644
--- a/net/netfilter/ipset/ip_set_hash_netiface.c
+++ b/net/netfilter/ipset/ip_set_hash_netiface.c
@@ -115,6 +115,7 @@ iface_add(struct rb_root *root, const char **iface)
#define IP_SET_HASH_WITH_NETS
#define IP_SET_HASH_WITH_RBTREE
#define IP_SET_HASH_WITH_MULTI
+#define IP_SET_HASH_WITH_NET0
#define STREQ(a, b) (strcmp(a, b) == 0)
--
1.7.10.4
^ permalink raw reply related
* [PATCH 07/12] netfilter: ipset: Support updating extensions when the set is full
From: Pablo Neira Ayuso @ 2014-12-03 12:55 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1417611342-25257-1-git-send-email-pablo@netfilter.org>
From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
When the set was full (hash type and maxelem reached), it was not
possible to update the extension part of already existing elements.
The patch removes this limitation.
Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipset/ip_set_hash_gen.h | 40 ++++++++++++++-------------------
1 file changed, 17 insertions(+), 23 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index fee7c64e..a12ee04 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -633,29 +633,6 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
bool flag_exist = flags & IPSET_FLAG_EXIST;
u32 key, multi = 0;
- if (h->elements >= h->maxelem && SET_WITH_FORCEADD(set)) {
- rcu_read_lock_bh();
- t = rcu_dereference_bh(h->table);
- key = HKEY(value, h->initval, t->htable_bits);
- n = hbucket(t,key);
- if (n->pos) {
- /* Choosing the first entry in the array to replace */
- j = 0;
- goto reuse_slot;
- }
- rcu_read_unlock_bh();
- }
- if (SET_WITH_TIMEOUT(set) && h->elements >= h->maxelem)
- /* FIXME: when set is full, we slow down here */
- mtype_expire(set, h, NLEN(set->family), set->dsize);
-
- if (h->elements >= h->maxelem) {
- if (net_ratelimit())
- pr_warn("Set %s is full, maxelem %u reached\n",
- set->name, h->maxelem);
- return -IPSET_ERR_HASH_FULL;
- }
-
rcu_read_lock_bh();
t = rcu_dereference_bh(h->table);
key = HKEY(value, h->initval, t->htable_bits);
@@ -680,6 +657,23 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
j != AHASH_MAX(h) + 1)
j = i;
}
+ if (h->elements >= h->maxelem && SET_WITH_FORCEADD(set) && n->pos) {
+ /* Choosing the first entry in the array to replace */
+ j = 0;
+ goto reuse_slot;
+ }
+ if (SET_WITH_TIMEOUT(set) && h->elements >= h->maxelem)
+ /* FIXME: when set is full, we slow down here */
+ mtype_expire(set, h, NLEN(set->family), set->dsize);
+
+ if (h->elements >= h->maxelem) {
+ if (net_ratelimit())
+ pr_warn("Set %s is full, maxelem %u reached\n",
+ set->name, h->maxelem);
+ ret = -IPSET_ERR_HASH_FULL;
+ goto out;
+ }
+
reuse_slot:
if (j != AHASH_MAX(h) + 1) {
/* Fill out reused slot */
--
1.7.10.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox