* Re: [PATCH v2] ipv6: Not to probe neighbourless routes
From: David Miller @ 2019-08-28 19:55 UTC (permalink / raw)
To: wang.yi59
Cc: kuznet, yoshfuji, netdev, linux-kernel, xue.zhihong, wang.liang82,
cheng.lin130
In-Reply-To: <1566958765-1686-1-git-send-email-wang.yi59@zte.com.cn>
I am tossing this patch.
Resubmit it when you test it properly on current kernels.
^ permalink raw reply
* Re: [PATCH net-next v4 3/3] dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the new phylink API
From: David Miller @ 2019-08-28 19:56 UTC (permalink / raw)
To: matthias.bgg
Cc: opensource, john, sean.wang, nelson.chang, netdev,
linux-arm-kernel, linux-mediatek, linux-mips, linux, frank-w, sr
In-Reply-To: <e45565b1-bb63-66af-16f6-5c7c1094dd67@gmail.com>
From: Matthias Brugger <matthias.bgg@gmail.com>
Date: Wed, 28 Aug 2019 11:29:45 +0200
> Thanks for taking this patch. For the next time, please make sure that dts[i]
> patches are independent from the binding description, as dts[i] should go
> through my tree. No problem for this round, just saying for the future.
That's not always possible nor reasonable, to be quite honest.
^ permalink raw reply
* Re: [PATCH net-next v2 3/3] dpaa2-eth: Add pause frame support
From: David Miller @ 2019-08-28 20:06 UTC (permalink / raw)
To: andrew; +Cc: ruxandra.radulescu, netdev, ioana.ciornei
In-Reply-To: <20190828115250.GA32178@lunn.ch>
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 28 Aug 2019 13:52:50 +0200
>> Clearing the ASYM_PAUSE flag only means we tell the firmware we want
>> both Rx and Tx pause to be enabled in the beginning. User can still set
>> an asymmetric config (i.e. only Rx pause or only Tx pause to be enabled)
>> if needed.
>>
>> The truth table is like this:
>>
>> PAUSE | ASYM_PAUSE | Rx pause | Tx pause
>> ----------------------------------------
>> 0 | 0 | disabled | disabled
>> 0 | 1 | disabled | enabled
>> 1 | 0 | enabled | enabled
>> 1 | 1 | enabled | disabled
>
> Hi Ioana
>
> Ah, that is not intuitive. Please add a comment, and maybe this table
> to the commit message.
Isn't this the same truth table as for the pause bits in the usual MII
registers?
^ permalink raw reply
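The truth table quoted in the message above reduces to two rules: Rx pause follows the PAUSE bit, and Tx pause is PAUSE XOR ASYM_PAUSE. A minimal C sketch under that reading of the table; the struct and function names are hypothetical and are not taken from the dpaa2-eth driver:

```c
#include <stdbool.h>

/* Hypothetical result type for illustration; not a dpaa2-eth structure. */
struct pause_cfg {
	bool rx_pause;
	bool tx_pause;
};

/*
 * Decode the PAUSE / ASYM_PAUSE flags according to the quoted truth table:
 *   Rx pause = PAUSE
 *   Tx pause = PAUSE xor ASYM_PAUSE
 */
static struct pause_cfg decode_pause(bool pause, bool asym_pause)
{
	struct pause_cfg cfg = {
		.rx_pause = pause,
		.tx_pause = (pause != asym_pause), /* logical XOR on booleans */
	};

	return cfg;
}
```

Calling decode_pause(true, false) gives both directions enabled, which corresponds to the row produced by clearing ASYM_PAUSE while PAUSE is set.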
* [PATCH] net: spider_net: Use struct_size() helper
From: Gustavo A. R. Silva @ 2019-08-28 20:21 UTC (permalink / raw)
To: Ishizaki Kou, David S. Miller; +Cc: netdev, linux-kernel, Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct spider_net_card {
...
struct spider_net_descr darray[0];
};
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
So, replace the following form:
sizeof(struct spider_net_card) + (tx_descriptors + rx_descriptors) * sizeof(struct spider_net_descr)
with:
struct_size(card, darray, tx_descriptors + rx_descriptors)
Notice that, in this case, variable alloc_size is not necessary, hence it
is removed.
Building: allmodconfig powerpc.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/ethernet/toshiba/spider_net.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/toshiba/spider_net.c b/drivers/net/ethernet/toshiba/spider_net.c
index 0f346761a2b2..538e70810d3d 100644
--- a/drivers/net/ethernet/toshiba/spider_net.c
+++ b/drivers/net/ethernet/toshiba/spider_net.c
@@ -2311,11 +2311,9 @@ spider_net_alloc_card(void)
{
struct net_device *netdev;
struct spider_net_card *card;
- size_t alloc_size;
- alloc_size = sizeof(struct spider_net_card) +
- (tx_descriptors + rx_descriptors) * sizeof(struct spider_net_descr);
- netdev = alloc_etherdev(alloc_size);
+ netdev = alloc_etherdev(struct_size(card, darray,
+ tx_descriptors + rx_descriptors));
if (!netdev)
return NULL;
--
2.23.0
^ permalink raw reply related
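For readers outside the kernel tree, the size calculation discussed in the patch above can be approximated in plain C. This is only a sketch: flex_size() and card_alloc_size() are hypothetical userspace stand-ins, the struct layouts are invented, and the saturate-to-SIZE_MAX behaviour is written out by hand to mimic what the kernel's struct_size() helper does on overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures. */
struct descr {
	uint32_t addr;
	uint32_t len;
};

struct card {
	int id;
	struct descr darray[]; /* C99 flexible array member */
};

/*
 * Size of a header plus n trailing elements.
 * Saturates to SIZE_MAX instead of wrapping around, so that a
 * subsequent allocation fails rather than being too small.
 */
static size_t flex_size(size_t header, size_t elem, size_t n)
{
	if (elem != 0 && n > (SIZE_MAX - header) / elem)
		return SIZE_MAX;

	return header + n * elem;
}

/* Allocation size for a card with n descriptors. */
static size_t card_alloc_size(size_t n)
{
	return flex_size(sizeof(struct card), sizeof(struct descr), n);
}
```

The open-coded form in the original driver has no such overflow check, which is one reason the helper is preferred over manual arithmetic.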
* [PATCH net-next v2 0/9] r8169: add support for RTL8125
From: Heiner Kallweit @ 2019-08-28 20:23 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
This series adds support for the 2.5Gbps chip RTL8125. It can be found
on PCIe network cards, and on an increasing number of consumer gaming
mainboards. Series is partially based on the r8125 vendor driver.
Tested with a Delock 89531 PCIe card against a Netgear GS110MX
Multi-Gig switch.
Firmware isn't strictly needed, but on some systems there may be
compatibility issues w/o firmware. Firmware has been submitted to
linux-firmware.
v2:
- split first patch into 6 smaller ones to facilitate bisecting
Heiner Kallweit (9):
r8169: change interrupt mask type to u32
r8169: restrict rtl_is_8168evl_up to RTL8168 chip versions
r8169: factor out reading MAC address from registers
r8169: move disabling interrupt coalescing to RTL8169/RTL8168 init
r8169: read common register for PCI commit
r8169: don't use bit LastFrag in tx descriptor after send
r8169: add support for RTL8125
r8169: add RTL8125 PHY initialization
r8169: add support for EEE on RTL8125
drivers/net/ethernet/realtek/Kconfig | 9 +-
drivers/net/ethernet/realtek/r8169_main.c | 464 ++++++++++++++++++++--
2 files changed, 443 insertions(+), 30 deletions(-)
--
2.23.0
^ permalink raw reply
* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
From: Andrii Nakryiko @ 2019-08-28 20:23 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Toke Høiland-Jørgensen, Stephen Hemminger,
Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
Yonghong Song, David Miller, Networking, bpf, Anton Protopopov,
Stanislav Fomichev, Yoel Caspersen
In-Reply-To: <20190823122713.73450a4b@carbon>
On Fri, Aug 23, 2019 at 3:27 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 21 Aug 2019 13:30:09 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > iproute2 uses its own bpf loader to load eBPF programs, which has
> > > evolved separately from libbpf. Since we are now standardising on
> > > libbpf, this becomes a problem as iproute2 is slowly accumulating
> > > feature incompatibilities with libbpf-based loaders. In particular,
> > > iproute2 has its own (expanded) version of the map definition struct,
> > > which makes it difficult to write programs that can be loaded with both
> > > custom loaders and iproute2.
> > >
> > > This series seeks to address this by converting iproute2 to using libbpf
> > > for all its bpf needs. This version is an early proof-of-concept RFC, to
> > > get some feedback on whether people think this is the right direction.
> > >
> > > What this series does is the following:
> > >
> > > - Updates the libbpf map definition struct to match that of iproute2
> > > (patch 1).
> >
> >
> > Hi Toke,
> >
> > Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> > totally in support of making iproute2 use libbpf to load/initialize
> > BPF programs. But I'm against adding iproute2-specific fields to
> > libbpf's bpf_map_def definitions to support this.
> >
> > I've proposed the plan of extending libbpf's supported features so
> > that it can be used to load iproute2-style BPF programs earlier,
> > please see discussions in [0] and [1]. I think instead of emulating
> > iproute2 way of matching everything based on user-specified internal
> > IDs, which doesn't provide good user experience and is quite easy to
> > get wrong, we should support same scenarios with better declarative
> > syntax and in a less error-prone way. I believe we can do that by
> > relying on BTF more heavily (again, please check some of my proposals
> > in [0], [1], and discussion with Daniel in those threads). It will
> > feel more natural and be more straightforward to follow. It would be
> > great if you can lend a hand in implementing pieces of that plan!
> >
> > I'm currently on vacation, so my availability is very sparse, but I'd
> > be happy to discuss this further, if need be.
> >
> > [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
> > [1] https://www.spinics.net/lists/bpf/msg03976.html
> >
> > > - Adds functionality to libbpf to support automatic pinning of maps when
> > > loading an eBPF program, while re-using pinned maps if they already
> > > exist (patches 2-3).
>
> For production use-cases, libbpf really need an easier higher-level API
> for re-using pinned maps, for establishing shared maps between
> programs. The existing libbpf API bpf_object__pin_maps() and
> bpf_object__unpin_maps(), which don't re-use pinned maps, are not
> really usable, because they pin/unpin ALL maps in the ELF file.
>
> What users really need is an easy way to specify, on a per map basis,
> what kind of pinning and reuse/sharing they want. E.g. like iproute2
> have, "global", "object-scope", and "no-pinning". ("ifindex-scope" would
> be nice for XDP).
I totally agree and I think this is easy to add both for BTF-defined
and "classic" bpf_map_def maps. Daniel mentioned in one of the
previous threads that in practice object-scope doesn't seem to be
used, so I'd say we should start with no-pinning + global pinning as
two initial supported values for pinning attribute. ifindex-scope is
interesting, but I'd love to hear a bit more about the use cases.
> Today users have to split/reimplement bpf_prog_load_xattr(), and
> use/add bpf_map__reuse_fd(), which is what I ended up doing for
Honestly, bpf_prog_load_xattr() existence seems redundant to me. It's
basically just bpf_object__open + bpf_object__load. There is a piece
in the middle with "guessing" program types, but it should just be
moved into bpf_object__open and happen by default. Using open + load
gives more control and isn't really harder than bpf_prog_load_xattr.
bpf_prog_load_xattr might be slightly more convenient for simple
use cases, but it falls apart immediately if you need to tune anything
before load.
> xdp-cpumap-tc[2] (used in production at ISP) resulting in 142 lines of
> extra code[3] that should have been hidden inside libbpf. And worse,
> in this solution[4] the maps for reuse-pinning are specified in the code
> by name. Thus, they cannot use a generic loader. That is why I want
> to mark the maps via a pinning member, like iproute2.
>
> I really hope this moves in a practical direction, as I have the next
> production request lined up (also from an ISP), and I hate to have to
> advise them to choose the same route as [3].
It seems to me that map pinning doesn't need much discussion at this
point, let's start with no-pinning + global pinning. To accommodate
pinning at custom root, bpf_object__open_xattr should accept extra
argument with non-default pinning root path. That should solve your
case completely, shouldn't it? Ultimately, with BTF-defined maps it
should be possible to specify custom pinning path on per-map basis for
cases where user needs ultimate non-uniform manual control.
>
>
> [2] https://github.com/xdp-project/xdp-cpumap-tc/
> [3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L262-L403
> [4] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L431-L441
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
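The "no-pinning + global pinning, with an optional custom root" proposal in the reply above can be illustrated with a small path-resolution helper. This is a sketch only: enum pin_mode, DEFAULT_PIN_ROOT and resolve_pin_path() are hypothetical names and are not libbpf API, and using /sys/fs/bpf as the default root is an assumption based on the conventional bpffs mount point.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-map pinning attribute; not a libbpf type. */
enum pin_mode {
	PIN_NONE = 0,   /* map is private to the loaded object */
	PIN_GLOBAL = 1, /* map is pinned (and reused) under a shared root */
};

/* Assumed default: the conventional bpffs mount point. */
#define DEFAULT_PIN_ROOT "/sys/fs/bpf"

/*
 * Compute where a map would be pinned.
 * Returns 1 if a path was written to buf, 0 if the map is not pinned,
 * and -1 on bad arguments or if the path does not fit in buf.
 */
static int resolve_pin_path(enum pin_mode mode, const char *root,
			    const char *map_name, char *buf, size_t len)
{
	int n;

	if (mode == PIN_NONE)
		return 0;
	if (!map_name || !buf || len == 0)
		return -1;
	if (!root)
		root = DEFAULT_PIN_ROOT; /* caller gave no custom root */

	n = snprintf(buf, len, "%s/%s", root, map_name);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 1;
}
```

A loader following this scheme would first try to open the resolved path and reuse the existing map, and only create and pin a new map if nothing is found there; that step is left out here because it depends on the actual libbpf calls.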
* Re: [PATCH v2] bonding: force enable lacp port after link state recovery for 802.3ad
From: David Miller @ 2019-08-28 20:28 UTC (permalink / raw)
To: zhangsha.zhang
Cc: j.vosburgh, vfalico, andy, netdev, linux-kernel, yuehaibing,
hunongda, alex.chen
In-Reply-To: <20190823034209.14596-1-zhangsha.zhang@huawei.com>
You've had enough time to respond to my feedback question.
I'm tossing this patch.
^ permalink raw reply
* [PATCH net-next v2 1/9] r8169: change interrupt mask type to u32
From: Heiner Kallweit @ 2019-08-28 20:24 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
RTL8125 uses a 32 bit interrupt mask even though only bits in the
lower 16 bits are used. Change interrupt mask size to u32 to be
prepared and reintroduce helper rtl_get_events.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index faa4041cf..bf00c3d8f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -645,7 +645,7 @@ struct rtl8169_private {
struct page *Rx_databuff[NUM_RX_DESC]; /* Rx data buffers */
struct ring_info tx_skb[NUM_TX_DESC]; /* Tx data buffers */
u16 cp_cmd;
- u16 irq_mask;
+ u32 irq_mask;
struct clk *clk;
struct {
@@ -1313,7 +1313,12 @@ static u8 rtl8168d_efuse_read(struct rtl8169_private *tp, int reg_addr)
RTL_R32(tp, EFUSEAR) & EFUSEAR_DATA_MASK : ~0;
}
-static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
+static u32 rtl_get_events(struct rtl8169_private *tp)
+{
+ return RTL_R16(tp, IntrStatus);
+}
+
+static void rtl_ack_events(struct rtl8169_private *tp, u32 bits)
{
RTL_W16(tp, IntrStatus, bits);
}
@@ -1337,7 +1342,7 @@ static void rtl_irq_enable(struct rtl8169_private *tp)
static void rtl8169_irq_mask_and_ack(struct rtl8169_private *tp)
{
rtl_irq_disable(tp);
- rtl_ack_events(tp, 0xffff);
+ rtl_ack_events(tp, 0xffffffff);
/* PCI commit */
RTL_R8(tp, ChipCmd);
}
@@ -5854,9 +5859,10 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
{
struct rtl8169_private *tp = dev_instance;
- u16 status = RTL_R16(tp, IntrStatus);
+ u32 status = rtl_get_events(tp);
- if (!tp->irq_enabled || status == 0xffff || !(status & tp->irq_mask))
+ if (!tp->irq_enabled || (status & 0xffff) == 0xffff ||
+ !(status & tp->irq_mask))
return IRQ_NONE;
if (unlikely(status & SYSErr)) {
--
2.23.0
^ permalink raw reply related
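The reason the all-ones test in the patch above becomes (status & 0xffff) == 0xffff can be shown in isolation. The sketch below assumes that an all-ones read is treated as "no usable status" (commonly what a read from an absent PCI device returns); irq_should_handle() is a hypothetical helper and not a function in r8169.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether an interrupt status value should be processed.
 * Comparing only the low 16 bits keeps the all-ones check valid both
 * for a 16-bit status read (0x0000ffff after widening to 32 bits) and
 * for a 32-bit status read (0xffffffff).
 */
static bool irq_should_handle(bool irq_enabled, uint32_t status,
			      uint32_t irq_mask)
{
	if (!irq_enabled)
		return false;
	if ((status & 0xffff) == 0xffff)
		return false;

	return (status & irq_mask) != 0;
}
```

With a plain status == 0xffff comparison, a 32-bit read returning 0xffffffff would slip through the check once the status variable is widened to u32.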
* [PATCH net-next v2 2/9] r8169: restrict rtl_is_8168evl_up to RTL8168 chip versions
From: Heiner Kallweit @ 2019-08-28 20:24 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
Extend helper rtl_is_8168evl_up to properly work once we add
mac version numbers >51 for RTL8125.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index bf00c3d8f..e9d900c11 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -730,7 +730,8 @@ static void rtl_tx_performance_tweak(struct rtl8169_private *tp, u16 force)
static bool rtl_is_8168evl_up(struct rtl8169_private *tp)
{
return tp->mac_version >= RTL_GIGA_MAC_VER_34 &&
- tp->mac_version != RTL_GIGA_MAC_VER_39;
+ tp->mac_version != RTL_GIGA_MAC_VER_39 &&
+ tp->mac_version <= RTL_GIGA_MAC_VER_51;
}
static bool rtl_supports_eee(struct rtl8169_private *tp)
--
2.23.0
^ permalink raw reply related
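The effect of the added upper bound in the patch above is easiest to see with concrete values. In this sketch the enum is a hypothetical subset whose numeric values are illustrative; only the ordering matters, and both helper names are made up for the comparison.

```c
#include <stdbool.h>

/* Hypothetical subset of chip versions; only the ordering is meaningful. */
enum toy_mac_version {
	TOY_MAC_VER_33 = 33,
	TOY_MAC_VER_34 = 34,
	TOY_MAC_VER_39 = 39,
	TOY_MAC_VER_51 = 51,
	TOY_MAC_VER_60 = 60, /* stands in for an RTL8125 version */
};

/* Check before the patch: open-ended upper range. */
static bool toy_is_8168evl_up_old(enum toy_mac_version v)
{
	return v >= TOY_MAC_VER_34 && v != TOY_MAC_VER_39;
}

/* Check after the patch: range capped at the last RTL8168 version. */
static bool toy_is_8168evl_up(enum toy_mac_version v)
{
	return v >= TOY_MAC_VER_34 && v != TOY_MAC_VER_39 &&
	       v <= TOY_MAC_VER_51;
}
```

Without the cap, a newly added version above 51 would be classified as "8168evl or later" and take RTL8168-specific code paths.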
* [PATCH net-next v2 5/9] r8169: read common register for PCI commit
From: Heiner Kallweit @ 2019-08-28 20:26 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
RTL8125 uses a different register number for IntrMask.
To not have side effects by reading a random register let's
use a register that is the same on all supported chip families.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index dc799528f..652bacf62 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5135,7 +5135,7 @@ static void rtl_hw_start(struct rtl8169_private *tp)
rtl_lock_config_regs(tp);
/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
- RTL_R8(tp, IntrMask);
+ RTL_R16(tp, CPlusCmd);
RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
rtl_init_rxcfg(tp);
rtl_set_tx_config_registers(tp);
--
2.23.0
^ permalink raw reply related
* [PATCH net-next v2 6/9] r8169: don't use bit LastFrag in tx descriptor after send
From: Heiner Kallweit @ 2019-08-28 20:27 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
On RTL8125 this bit is always cleared after send. Therefore check for
tx_skb->skb being set, which is functionally equivalent.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 652bacf62..4489cd9f2 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5713,7 +5713,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
rtl8169_unmap_tx_skb(tp_to_dev(tp), tx_skb,
tp->TxDescArray + entry);
- if (status & LastFrag) {
+ if (tx_skb->skb) {
pkts_compl++;
bytes_compl += tx_skb->skb->len;
napi_consume_skb(tx_skb->skb, budget);
--
2.23.0
^ permalink raw reply related
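The equivalence claimed in the patch above rests on the assumption that only the ring entry belonging to a packet's last fragment carries the skb pointer. A toy model of the cleanup loop under that assumption; all toy_* names are hypothetical and the real descriptor handling is omitted:

```c
#include <stddef.h>

/* Toy stand-ins for sk_buff and the per-descriptor bookkeeping entry. */
struct toy_skb {
	unsigned int len;
};

struct toy_ring_info {
	struct toy_skb *skb; /* assumed set only on a packet's last fragment */
};

/*
 * Walk n completed ring entries and count finished packets and bytes.
 * A non-NULL skb marks the end of a packet, so no descriptor status
 * bit is needed to detect it.
 */
static void toy_tx_clean(struct toy_ring_info *ring, size_t n,
			 unsigned int *pkts, unsigned int *bytes)
{
	size_t i;

	*pkts = 0;
	*bytes = 0;
	for (i = 0; i < n; i++) {
		if (ring[i].skb) {
			(*pkts)++;
			*bytes += ring[i].skb->len;
			ring[i].skb = NULL; /* entry is free again */
		}
	}
}
```

In this model a three-fragment packet occupies three entries but is counted once, at the entry where the skb pointer was stored.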
* [PATCH net-next v2 7/9] r8169: add support for RTL8125
From: Heiner Kallweit @ 2019-08-28 20:28 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
This adds support for the 2.5Gbps chip RTL8125. It's partially based on the
r8125 vendor driver. Tested with a Delock 89531 PCIe card against a
Netgear GS110MX Multi-Gig switch. Firmware isn't strictly needed,
but on some systems there may be compatibility issues w/o firmware.
Firmware has been submitted to linux-firmware.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/Kconfig | 9 +-
drivers/net/ethernet/realtek/r8169_main.c | 274 ++++++++++++++++++++--
2 files changed, 265 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/realtek/Kconfig b/drivers/net/ethernet/realtek/Kconfig
index b18e7a91d..5e0b9d2f1 100644
--- a/drivers/net/ethernet/realtek/Kconfig
+++ b/drivers/net/ethernet/realtek/Kconfig
@@ -96,14 +96,19 @@ config 8139_OLD_RX_RESET
old RX-reset behavior. If unsure, say N.
config R8169
- tristate "Realtek 8169 gigabit ethernet support"
+ tristate "Realtek 8169/8168/8101/8125 ethernet support"
depends on PCI
select FW_LOADER
select CRC32
select PHYLIB
select REALTEK_PHY
---help---
- Say Y here if you have a Realtek 8169 PCI Gigabit Ethernet adapter.
+ Say Y here if you have a Realtek Ethernet adapter belonging to
+ the following families:
+ RTL8169 Gigabit Ethernet
+ RTL8168 Gigabit Ethernet
+ RTL8101 Fast Ethernet
+ RTL8125 2.5GBit Ethernet
To compile this driver as a module, choose M here: the module
will be called r8169. This is recommended.
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 4489cd9f2..4d1779e39 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -135,6 +135,8 @@ enum mac_version {
RTL_GIGA_MAC_VER_49,
RTL_GIGA_MAC_VER_50,
RTL_GIGA_MAC_VER_51,
+ RTL_GIGA_MAC_VER_60,
+ RTL_GIGA_MAC_VER_61,
RTL_GIGA_MAC_NONE
};
@@ -200,6 +202,8 @@ static const struct {
[RTL_GIGA_MAC_VER_49] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_50] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_51] = {"RTL8168ep/8111ep" },
+ [RTL_GIGA_MAC_VER_60] = {"RTL8125" },
+ [RTL_GIGA_MAC_VER_61] = {"RTL8125" },
};
static const struct pci_device_id rtl8169_pci_tbl[] = {
@@ -220,6 +224,8 @@ static const struct pci_device_id rtl8169_pci_tbl[] = {
{ PCI_VDEVICE(USR, 0x0116) },
{ PCI_VENDOR_ID_LINKSYS, 0x1032, PCI_ANY_ID, 0x0024 },
{ 0x0001, 0x8168, PCI_ANY_ID, 0x2410 },
+ { PCI_VDEVICE(REALTEK, 0x8125) },
+ { PCI_VDEVICE(REALTEK, 0x3000) },
{}
};
@@ -384,6 +390,19 @@ enum rtl8168_registers {
#define EARLY_TALLY_EN (1 << 16)
};
+enum rtl8125_registers {
+ IntrMask_8125 = 0x38,
+ IntrStatus_8125 = 0x3c,
+ TxPoll_8125 = 0x90,
+ MAC0_BKP = 0x19e0,
+};
+
+#define RX_VLAN_INNER_8125 BIT(22)
+#define RX_VLAN_OUTER_8125 BIT(23)
+#define RX_VLAN_8125 (RX_VLAN_INNER_8125 | RX_VLAN_OUTER_8125)
+
+#define RX_FETCH_DFLT_8125 (8 << 27)
+
enum rtl_register_content {
/* InterruptStatusBits */
SYSErr = 0x8000,
@@ -727,6 +746,11 @@ static void rtl_tx_performance_tweak(struct rtl8169_private *tp, u16 force)
PCI_EXP_DEVCTL_READRQ, force);
}
+static bool rtl_is_8125(struct rtl8169_private *tp)
+{
+ return tp->mac_version >= RTL_GIGA_MAC_VER_60;
+}
+
static bool rtl_is_8168evl_up(struct rtl8169_private *tp)
{
return tp->mac_version >= RTL_GIGA_MAC_VER_34 &&
@@ -1023,7 +1047,7 @@ static void rtl_writephy(struct rtl8169_private *tp, int location, int val)
case RTL_GIGA_MAC_VER_31:
r8168dp_2_mdio_write(tp, location, val);
break;
- case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_61:
r8168g_mdio_write(tp, location, val);
break;
default:
@@ -1040,7 +1064,7 @@ static int rtl_readphy(struct rtl8169_private *tp, int location)
case RTL_GIGA_MAC_VER_28:
case RTL_GIGA_MAC_VER_31:
return r8168dp_2_mdio_read(tp, location);
- case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_61:
return r8168g_mdio_read(tp, location);
default:
return r8169_mdio_read(tp, location);
@@ -1324,17 +1348,26 @@ static u8 rtl8168d_efuse_read(struct rtl8169_private *tp, int reg_addr)
static u32 rtl_get_events(struct rtl8169_private *tp)
{
- return RTL_R16(tp, IntrStatus);
+ if (rtl_is_8125(tp))
+ return RTL_R32(tp, IntrStatus_8125);
+ else
+ return RTL_R16(tp, IntrStatus);
}
static void rtl_ack_events(struct rtl8169_private *tp, u32 bits)
{
- RTL_W16(tp, IntrStatus, bits);
+ if (rtl_is_8125(tp))
+ RTL_W32(tp, IntrStatus_8125, bits);
+ else
+ RTL_W16(tp, IntrStatus, bits);
}
static void rtl_irq_disable(struct rtl8169_private *tp)
{
- RTL_W16(tp, IntrMask, 0);
+ if (rtl_is_8125(tp))
+ RTL_W32(tp, IntrMask_8125, 0);
+ else
+ RTL_W16(tp, IntrMask, 0);
tp->irq_enabled = 0;
}
@@ -1345,7 +1378,10 @@ static void rtl_irq_disable(struct rtl8169_private *tp)
static void rtl_irq_enable(struct rtl8169_private *tp)
{
tp->irq_enabled = 1;
- RTL_W16(tp, IntrMask, tp->irq_mask);
+ if (rtl_is_8125(tp))
+ RTL_W32(tp, IntrMask_8125, tp->irq_mask);
+ else
+ RTL_W16(tp, IntrMask, tp->irq_mask);
}
static void rtl8169_irq_mask_and_ack(struct rtl8169_private *tp)
@@ -1410,7 +1446,6 @@ static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
{
- unsigned int i, tmp;
static const struct {
u32 opt;
u16 reg;
@@ -1423,20 +1458,25 @@ static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
{ WAKE_ANY, Config5, LanWake },
{ WAKE_MAGIC, Config3, MagicPacket }
};
+ unsigned int i, tmp = ARRAY_SIZE(cfg);
u8 options;
rtl_unlock_config_regs(tp);
if (rtl_is_8168evl_up(tp)) {
- tmp = ARRAY_SIZE(cfg) - 1;
+ tmp--;
if (wolopts & WAKE_MAGIC)
rtl_eri_set_bits(tp, 0x0dc, ERIAR_MASK_0100,
MagicPacket_v2);
else
rtl_eri_clear_bits(tp, 0x0dc, ERIAR_MASK_0100,
MagicPacket_v2);
- } else {
- tmp = ARRAY_SIZE(cfg);
+ } else if (rtl_is_8125(tp)) {
+ tmp--;
+ if (wolopts & WAKE_MAGIC)
+ r8168_mac_ocp_modify(tp, 0xc0b6, 0, BIT(0));
+ else
+ r8168_mac_ocp_modify(tp, 0xc0b6, BIT(0), 0);
}
for (i = 0; i < tmp; i++) {
@@ -1542,6 +1582,13 @@ static int rtl8169_set_features(struct net_device *dev,
else
rx_config &= ~(AcceptErr | AcceptRunt);
+ if (rtl_is_8125(tp)) {
+ if (features & NETIF_F_HW_VLAN_CTAG_RX)
+ rx_config |= RX_VLAN_8125;
+ else
+ rx_config &= ~RX_VLAN_8125;
+ }
+
RTL_W32(tp, RxConfig, rx_config);
if (features & NETIF_F_RXCSUM)
@@ -1549,10 +1596,12 @@ static int rtl8169_set_features(struct net_device *dev,
else
tp->cp_cmd &= ~RxChkSum;
- if (features & NETIF_F_HW_VLAN_CTAG_RX)
- tp->cp_cmd |= RxVlan;
- else
- tp->cp_cmd &= ~RxVlan;
+ if (!rtl_is_8125(tp)) {
+ if (features & NETIF_F_HW_VLAN_CTAG_RX)
+ tp->cp_cmd |= RxVlan;
+ else
+ tp->cp_cmd &= ~RxVlan;
+ }
RTL_W16(tp, CPlusCmd, tp->cp_cmd);
RTL_R16(tp, CPlusCmd);
@@ -1851,6 +1900,9 @@ static int rtl_get_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
int i;
u16 w;
+ if (rtl_is_8125(tp))
+ return -EOPNOTSUPP;
+
memset(ec, 0, sizeof(*ec));
/* get rx/tx scale corresponding to current speed and CPlusCmd[0:1] */
@@ -1919,6 +1971,9 @@ static int rtl_set_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
u16 w = 0, cp01;
int i;
+ if (rtl_is_8125(tp))
+ return -EOPNOTSUPP;
+
scale = rtl_coalesce_choose_scale(dev,
max(p[0].usecs, p[1].usecs) * 1000, &cp01);
if (IS_ERR(scale))
@@ -2065,6 +2120,10 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp)
u16 val;
u16 mac_version;
} mac_info[] = {
+ /* 8125 family. */
+ { 0x7cf, 0x608, RTL_GIGA_MAC_VER_60 },
+ { 0x7c8, 0x608, RTL_GIGA_MAC_VER_61 },
+
/* 8168EP family. */
{ 0x7cf, 0x502, RTL_GIGA_MAC_VER_51 },
{ 0x7cf, 0x501, RTL_GIGA_MAC_VER_50 },
@@ -3615,6 +3674,8 @@ static void rtl_hw_phy_config(struct net_device *dev)
[RTL_GIGA_MAC_VER_49] = rtl8168ep_1_hw_phy_config,
[RTL_GIGA_MAC_VER_50] = rtl8168ep_2_hw_phy_config,
[RTL_GIGA_MAC_VER_51] = rtl8168ep_2_hw_phy_config,
+ [RTL_GIGA_MAC_VER_60] = NULL,
+ [RTL_GIGA_MAC_VER_61] = NULL,
};
struct rtl8169_private *tp = netdev_priv(dev);
@@ -3742,6 +3803,8 @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_48:
case RTL_GIGA_MAC_VER_50:
case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_60:
+ case RTL_GIGA_MAC_VER_61:
RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) & ~0x80);
break;
case RTL_GIGA_MAC_VER_40:
@@ -3771,6 +3834,8 @@ static void rtl_pll_power_up(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_48:
case RTL_GIGA_MAC_VER_50:
case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_60:
+ case RTL_GIGA_MAC_VER_61:
RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) | 0xc0);
break;
case RTL_GIGA_MAC_VER_40:
@@ -3803,6 +3868,10 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF);
break;
+ case RTL_GIGA_MAC_VER_60 ... RTL_GIGA_MAC_VER_61:
+ RTL_W32(tp, RxConfig, RX_FETCH_DFLT_8125 | RX_VLAN_8125 |
+ RX_DMA_BURST);
+ break;
default:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
break;
@@ -5020,6 +5089,126 @@ static void rtl_hw_start_8106(struct rtl8169_private *tp)
rtl_hw_aspm_clkreq_enable(tp, true);
}
+DECLARE_RTL_COND(rtl_mac_ocp_e00e_cond)
+{
+ return r8168_mac_ocp_read(tp, 0xe00e) & BIT(13);
+}
+
+static void rtl_hw_start_8125_common(struct rtl8169_private *tp)
+{
+ rtl_pcie_state_l2l3_disable(tp);
+
+ RTL_W16(tp, 0x382, 0x221b);
+ RTL_W8(tp, 0x4500, 0);
+ RTL_W16(tp, 0x4800, 0);
+
+ /* disable UPS */
+ r8168_mac_ocp_modify(tp, 0xd40a, 0x0010, 0x0000);
+
+ RTL_W8(tp, Config1, RTL_R8(tp, Config1) & ~0x10);
+
+ r8168_mac_ocp_write(tp, 0xc140, 0xffff);
+ r8168_mac_ocp_write(tp, 0xc142, 0xffff);
+
+ r8168_mac_ocp_modify(tp, 0xd3e2, 0x0fff, 0x03a9);
+ r8168_mac_ocp_modify(tp, 0xd3e4, 0x00ff, 0x0000);
+ r8168_mac_ocp_modify(tp, 0xe860, 0x0000, 0x0080);
+
+ /* disable new tx descriptor format */
+ r8168_mac_ocp_modify(tp, 0xeb58, 0x0001, 0x0000);
+
+ r8168_mac_ocp_modify(tp, 0xe614, 0x0700, 0x0400);
+ r8168_mac_ocp_modify(tp, 0xe63e, 0x0c30, 0x0020);
+ r8168_mac_ocp_modify(tp, 0xc0b4, 0x0000, 0x000c);
+ r8168_mac_ocp_modify(tp, 0xeb6a, 0x00ff, 0x0033);
+ r8168_mac_ocp_modify(tp, 0xeb50, 0x03e0, 0x0040);
+ r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0030);
+ r8168_mac_ocp_modify(tp, 0xe040, 0x1000, 0x0000);
+ r8168_mac_ocp_modify(tp, 0xe0c0, 0x4f0f, 0x4403);
+ r8168_mac_ocp_modify(tp, 0xe052, 0x0080, 0x0067);
+ r8168_mac_ocp_modify(tp, 0xc0ac, 0x0080, 0x1f00);
+ r8168_mac_ocp_modify(tp, 0xd430, 0x0fff, 0x047f);
+ r8168_mac_ocp_modify(tp, 0xe84c, 0x0000, 0x00c0);
+ r8168_mac_ocp_modify(tp, 0xea1c, 0x0004, 0x0000);
+ r8168_mac_ocp_modify(tp, 0xeb54, 0x0000, 0x0001);
+ udelay(1);
+ r8168_mac_ocp_modify(tp, 0xeb54, 0x0001, 0x0000);
+ RTL_W16(tp, 0x1880, RTL_R16(tp, 0x1880) & ~0x0030);
+
+ r8168_mac_ocp_write(tp, 0xe098, 0xc302);
+
+ rtl_udelay_loop_wait_low(tp, &rtl_mac_ocp_e00e_cond, 1000, 10);
+
+ RTL_W32(tp, MISC, RTL_R32(tp, MISC) & ~RXDV_GATED_EN);
+ udelay(10);
+}
+
+static void rtl_hw_start_8125_1(struct rtl8169_private *tp)
+{
+ static const struct ephy_info e_info_8125_1[] = {
+ { 0x01, 0xffff, 0xa812 },
+ { 0x09, 0xffff, 0x520c },
+ { 0x04, 0xffff, 0xd000 },
+ { 0x0d, 0xffff, 0xf702 },
+ { 0x0a, 0xffff, 0x8653 },
+ { 0x06, 0xffff, 0x001e },
+ { 0x08, 0xffff, 0x3595 },
+ { 0x20, 0xffff, 0x9455 },
+ { 0x21, 0xffff, 0x99ff },
+ { 0x02, 0xffff, 0x6046 },
+ { 0x29, 0xffff, 0xfe00 },
+ { 0x23, 0xffff, 0xab62 },
+
+ { 0x41, 0xffff, 0xa80c },
+ { 0x49, 0xffff, 0x520c },
+ { 0x44, 0xffff, 0xd000 },
+ { 0x4d, 0xffff, 0xf702 },
+ { 0x4a, 0xffff, 0x8653 },
+ { 0x46, 0xffff, 0x001e },
+ { 0x48, 0xffff, 0x3595 },
+ { 0x60, 0xffff, 0x9455 },
+ { 0x61, 0xffff, 0x99ff },
+ { 0x42, 0xffff, 0x6046 },
+ { 0x69, 0xffff, 0xfe00 },
+ { 0x63, 0xffff, 0xab62 },
+ };
+
+ rtl_set_def_aspm_entry_latency(tp);
+
+ /* disable aspm and clock request before access ephy */
+ rtl_hw_aspm_clkreq_enable(tp, false);
+ rtl_ephy_init(tp, e_info_8125_1);
+
+ rtl_hw_start_8125_common(tp);
+}
+
+static void rtl_hw_start_8125_2(struct rtl8169_private *tp)
+{
+ static const struct ephy_info e_info_8125_2[] = {
+ { 0x04, 0xffff, 0xd000 },
+ { 0x0a, 0xffff, 0x8653 },
+ { 0x23, 0xffff, 0xab66 },
+ { 0x20, 0xffff, 0x9455 },
+ { 0x21, 0xffff, 0x99ff },
+ { 0x29, 0xffff, 0xfe04 },
+
+ { 0x44, 0xffff, 0xd000 },
+ { 0x4a, 0xffff, 0x8653 },
+ { 0x63, 0xffff, 0xab66 },
+ { 0x60, 0xffff, 0x9455 },
+ { 0x61, 0xffff, 0x99ff },
+ { 0x69, 0xffff, 0xfe04 },
+ };
+
+ rtl_set_def_aspm_entry_latency(tp);
+
+ /* disable aspm and clock request before access ephy */
+ rtl_hw_aspm_clkreq_enable(tp, false);
+ rtl_ephy_init(tp, e_info_8125_2);
+
+ rtl_hw_start_8125_common(tp);
+}
+
static void rtl_hw_config(struct rtl8169_private *tp)
{
static const rtl_generic_fct hw_configs[] = {
@@ -5068,12 +5257,25 @@ static void rtl_hw_config(struct rtl8169_private *tp)
[RTL_GIGA_MAC_VER_49] = rtl_hw_start_8168ep_1,
[RTL_GIGA_MAC_VER_50] = rtl_hw_start_8168ep_2,
[RTL_GIGA_MAC_VER_51] = rtl_hw_start_8168ep_3,
+ [RTL_GIGA_MAC_VER_60] = rtl_hw_start_8125_1,
+ [RTL_GIGA_MAC_VER_61] = rtl_hw_start_8125_2,
};
if (hw_configs[tp->mac_version])
hw_configs[tp->mac_version](tp);
}
+static void rtl_hw_start_8125(struct rtl8169_private *tp)
+{
+ int i;
+
+ /* disable interrupt coalescing */
+ for (i = 0xa00; i < 0xb00; i += 4)
+ RTL_W32(tp, i, 0);
+
+ rtl_hw_config(tp);
+}
+
static void rtl_hw_start_8168(struct rtl8169_private *tp)
{
if (tp->mac_version == RTL_GIGA_MAC_VER_13 ||
@@ -5127,6 +5329,8 @@ static void rtl_hw_start(struct rtl8169_private *tp)
if (tp->mac_version <= RTL_GIGA_MAC_VER_06)
rtl_hw_start_8169(tp);
+ else if (rtl_is_8125(tp))
+ rtl_hw_start_8125(tp);
else
rtl_hw_start_8168(tp);
@@ -5510,6 +5714,14 @@ static bool rtl_chip_supports_csum_v2(struct rtl8169_private *tp)
}
}
+static void rtl8169_doorbell(struct rtl8169_private *tp)
+{
+ if (rtl_is_8125(tp))
+ RTL_W16(tp, TxPoll_8125, BIT(0));
+ else
+ RTL_W8(tp, TxPoll, NPQ);
+}
+
static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
@@ -5589,7 +5801,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
}
if (door_bell)
- RTL_W8(tp, TxPoll, NPQ);
+ rtl8169_doorbell(tp);
if (unlikely(stop_queue)) {
/* Sync with rtl_tx:
@@ -5751,7 +5963,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
* it is slow enough). -- FR
*/
if (tp->cur_tx != dirty_tx)
- RTL_W8(tp, TxPoll, NPQ);
+ rtl8169_doorbell(tp);
}
}
@@ -6473,6 +6685,8 @@ static void rtl_read_mac_address(struct rtl8169_private *tp,
value = rtl_eri_read(tp, 0xe4);
mac_addr[4] = (value >> 0) & 0xff;
mac_addr[5] = (value >> 8) & 0xff;
+ } else if (rtl_is_8125(tp)) {
+ rtl_read_mac_from_reg(tp, mac_addr, MAC0_BKP);
}
}
@@ -6570,6 +6784,31 @@ static void rtl_hw_init_8168g(struct rtl8169_private *tp)
rtl_udelay_loop_wait_high(tp, &rtl_link_list_ready_cond, 100, 42);
}
+static void rtl_hw_init_8125(struct rtl8169_private *tp)
+{
+ tp->ocp_base = OCP_STD_PHY_BASE;
+
+ RTL_W32(tp, MISC, RTL_R32(tp, MISC) | RXDV_GATED_EN);
+
+ if (!rtl_udelay_loop_wait_high(tp, &rtl_rxtx_empty_cond, 100, 42))
+ return;
+
+ RTL_W8(tp, ChipCmd, RTL_R8(tp, ChipCmd) & ~(CmdTxEnb | CmdRxEnb));
+ msleep(1);
+ RTL_W8(tp, MCU, RTL_R8(tp, MCU) & ~NOW_IS_OOB);
+
+ r8168_mac_ocp_modify(tp, 0xe8de, BIT(14), 0);
+
+ if (!rtl_udelay_loop_wait_high(tp, &rtl_link_list_ready_cond, 100, 42))
+ return;
+
+ r8168_mac_ocp_write(tp, 0xc0aa, 0x07d0);
+ r8168_mac_ocp_write(tp, 0xc0a6, 0x0150);
+ r8168_mac_ocp_write(tp, 0xc01e, 0x5555);
+
+ rtl_udelay_loop_wait_high(tp, &rtl_link_list_ready_cond, 100, 42);
+}
+
static void rtl_hw_initialize(struct rtl8169_private *tp)
{
switch (tp->mac_version) {
@@ -6579,6 +6818,9 @@ static void rtl_hw_initialize(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_48:
rtl_hw_init_8168g(tp);
break;
+ case RTL_GIGA_MAC_VER_60 ... RTL_GIGA_MAC_VER_61:
+ rtl_hw_init_8125(tp);
+ break;
default:
break;
}
--
2.23.0
^ permalink raw reply related
* [PATCH net-next v2 8/9] r8169: add RTL8125 PHY initialization
From: Heiner Kallweit @ 2019-08-28 20:28 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
This patch adds PHY initialization magic copied from the r8125 vendor
driver. In addition it supports loading the firmware for chip version
RTL_GIGA_MAC_VER_61.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 130 +++++++++++++++++++++-
1 file changed, 127 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 4d1779e39..99176a9a8 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -55,6 +55,7 @@
#define FIRMWARE_8168H_2 "rtl_nic/rtl8168h-2.fw"
#define FIRMWARE_8107E_1 "rtl_nic/rtl8107e-1.fw"
#define FIRMWARE_8107E_2 "rtl_nic/rtl8107e-2.fw"
+#define FIRMWARE_8125A_3 "rtl_nic/rtl8125a-3.fw"
#define R8169_MSG_DEFAULT \
(NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
@@ -203,7 +204,7 @@ static const struct {
[RTL_GIGA_MAC_VER_50] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_51] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_60] = {"RTL8125" },
- [RTL_GIGA_MAC_VER_61] = {"RTL8125" },
+ [RTL_GIGA_MAC_VER_61] = {"RTL8125", FIRMWARE_8125A_3},
};
static const struct pci_device_id rtl8169_pci_tbl[] = {
@@ -714,6 +715,7 @@ MODULE_FIRMWARE(FIRMWARE_8168H_1);
MODULE_FIRMWARE(FIRMWARE_8168H_2);
MODULE_FIRMWARE(FIRMWARE_8107E_1);
MODULE_FIRMWARE(FIRMWARE_8107E_2);
+MODULE_FIRMWARE(FIRMWARE_8125A_3);
static inline struct device *tp_to_dev(struct rtl8169_private *tp)
{
@@ -3619,6 +3621,128 @@ static void rtl8106e_hw_phy_config(struct rtl8169_private *tp)
rtl_eri_write(tp, 0x1d0, ERIAR_MASK_0011, 0x0000);
}
+static void rtl8125_1_hw_phy_config(struct rtl8169_private *tp)
+{
+ struct phy_device *phydev = tp->phydev;
+
+ phy_modify_paged(phydev, 0xad4, 0x10, 0x03ff, 0x0084);
+ phy_modify_paged(phydev, 0xad4, 0x17, 0x0000, 0x0010);
+ phy_modify_paged(phydev, 0xad1, 0x13, 0x03ff, 0x0006);
+ phy_modify_paged(phydev, 0xad3, 0x11, 0x003f, 0x0006);
+ phy_modify_paged(phydev, 0xac0, 0x14, 0x0000, 0x1100);
+ phy_modify_paged(phydev, 0xac8, 0x15, 0xf000, 0x7000);
+ phy_modify_paged(phydev, 0xad1, 0x14, 0x0000, 0x0400);
+ phy_modify_paged(phydev, 0xad1, 0x15, 0x0000, 0x03ff);
+ phy_modify_paged(phydev, 0xad1, 0x16, 0x0000, 0x03ff);
+
+ phy_write(phydev, 0x1f, 0x0a43);
+ phy_write(phydev, 0x13, 0x80ea);
+ phy_modify(phydev, 0x14, 0xff00, 0xc400);
+ phy_write(phydev, 0x13, 0x80eb);
+ phy_modify(phydev, 0x14, 0x0700, 0x0300);
+ phy_write(phydev, 0x13, 0x80f8);
+ phy_modify(phydev, 0x14, 0xff00, 0x1c00);
+ phy_write(phydev, 0x13, 0x80f1);
+ phy_modify(phydev, 0x14, 0xff00, 0x3000);
+ phy_write(phydev, 0x13, 0x80fe);
+ phy_modify(phydev, 0x14, 0xff00, 0xa500);
+ phy_write(phydev, 0x13, 0x8102);
+ phy_modify(phydev, 0x14, 0xff00, 0x5000);
+ phy_write(phydev, 0x13, 0x8105);
+ phy_modify(phydev, 0x14, 0xff00, 0x3300);
+ phy_write(phydev, 0x13, 0x8100);
+ phy_modify(phydev, 0x14, 0xff00, 0x7000);
+ phy_write(phydev, 0x13, 0x8104);
+ phy_modify(phydev, 0x14, 0xff00, 0xf000);
+ phy_write(phydev, 0x13, 0x8106);
+ phy_modify(phydev, 0x14, 0xff00, 0x6500);
+ phy_write(phydev, 0x13, 0x80dc);
+ phy_modify(phydev, 0x14, 0xff00, 0xed00);
+ phy_write(phydev, 0x13, 0x80df);
+ phy_set_bits(phydev, 0x14, BIT(8));
+ phy_write(phydev, 0x13, 0x80e1);
+ phy_clear_bits(phydev, 0x14, BIT(8));
+ phy_write(phydev, 0x1f, 0x0000);
+
+ phy_modify_paged(phydev, 0xbf0, 0x13, 0x003f, 0x0038);
+ phy_write_paged(phydev, 0xa43, 0x13, 0x819f);
+ phy_write_paged(phydev, 0xa43, 0x14, 0xd0b6);
+
+ phy_write_paged(phydev, 0xbc3, 0x12, 0x5555);
+ phy_modify_paged(phydev, 0xbf0, 0x15, 0x0e00, 0x0a00);
+ phy_modify_paged(phydev, 0xa5c, 0x10, 0x0400, 0x0000);
+ phy_modify_paged(phydev, 0xa44, 0x11, 0x0000, 0x0800);
+}
+
+static void rtl8125_2_hw_phy_config(struct rtl8169_private *tp)
+{
+ struct phy_device *phydev = tp->phydev;
+ int i;
+
+ phy_modify_paged(phydev, 0xad4, 0x17, 0x0000, 0x0010);
+ phy_modify_paged(phydev, 0xad1, 0x13, 0x03ff, 0x03ff);
+ phy_modify_paged(phydev, 0xad3, 0x11, 0x003f, 0x0006);
+ phy_modify_paged(phydev, 0xac0, 0x14, 0x1100, 0x0000);
+ phy_modify_paged(phydev, 0xacc, 0x10, 0x0003, 0x0002);
+ phy_modify_paged(phydev, 0xad4, 0x10, 0x00e7, 0x0044);
+ phy_modify_paged(phydev, 0xac1, 0x12, 0x0080, 0x0000);
+ phy_modify_paged(phydev, 0xac8, 0x10, 0x0300, 0x0000);
+ phy_modify_paged(phydev, 0xac5, 0x17, 0x0007, 0x0002);
+ phy_write_paged(phydev, 0xad4, 0x16, 0x00a8);
+ phy_write_paged(phydev, 0xac5, 0x16, 0x01ff);
+ phy_modify_paged(phydev, 0xac8, 0x15, 0x00f0, 0x0030);
+
+ phy_write(phydev, 0x1f, 0x0b87);
+ phy_write(phydev, 0x16, 0x80a2);
+ phy_write(phydev, 0x17, 0x0153);
+ phy_write(phydev, 0x16, 0x809c);
+ phy_write(phydev, 0x17, 0x0153);
+ phy_write(phydev, 0x1f, 0x0000);
+
+ phy_write(phydev, 0x1f, 0x0a43);
+ phy_write(phydev, 0x13, 0x81B3);
+ phy_write(phydev, 0x14, 0x0043);
+ phy_write(phydev, 0x14, 0x00A7);
+ phy_write(phydev, 0x14, 0x00D6);
+ phy_write(phydev, 0x14, 0x00EC);
+ phy_write(phydev, 0x14, 0x00F6);
+ phy_write(phydev, 0x14, 0x00FB);
+ phy_write(phydev, 0x14, 0x00FD);
+ phy_write(phydev, 0x14, 0x00FF);
+ phy_write(phydev, 0x14, 0x00BB);
+ phy_write(phydev, 0x14, 0x0058);
+ phy_write(phydev, 0x14, 0x0029);
+ phy_write(phydev, 0x14, 0x0013);
+ phy_write(phydev, 0x14, 0x0009);
+ phy_write(phydev, 0x14, 0x0004);
+ phy_write(phydev, 0x14, 0x0002);
+ for (i = 0; i < 25; i++)
+ phy_write(phydev, 0x14, 0x0000);
+
+ phy_write(phydev, 0x13, 0x8257);
+ phy_write(phydev, 0x14, 0x020F);
+
+ phy_write(phydev, 0x13, 0x80EA);
+ phy_write(phydev, 0x14, 0x7843);
+ phy_write(phydev, 0x1f, 0x0000);
+
+ rtl_apply_firmware(tp);
+
+ phy_modify_paged(phydev, 0xd06, 0x14, 0x0000, 0x2000);
+
+ phy_write(phydev, 0x1f, 0x0a43);
+ phy_write(phydev, 0x13, 0x81a2);
+ phy_set_bits(phydev, 0x14, BIT(8));
+ phy_write(phydev, 0x1f, 0x0000);
+
+ phy_modify_paged(phydev, 0xb54, 0x16, 0xff00, 0xdb00);
+ phy_modify_paged(phydev, 0xa45, 0x12, 0x0001, 0x0000);
+ phy_modify_paged(phydev, 0xa5d, 0x12, 0x0000, 0x0020);
+ phy_modify_paged(phydev, 0xad4, 0x17, 0x0010, 0x0000);
+ phy_modify_paged(phydev, 0xa86, 0x15, 0x0001, 0x0000);
+ phy_modify_paged(phydev, 0xa44, 0x11, 0x0000, 0x0800);
+}
+
static void rtl_hw_phy_config(struct net_device *dev)
{
static const rtl_generic_fct phy_configs[] = {
@@ -3674,8 +3798,8 @@ static void rtl_hw_phy_config(struct net_device *dev)
[RTL_GIGA_MAC_VER_49] = rtl8168ep_1_hw_phy_config,
[RTL_GIGA_MAC_VER_50] = rtl8168ep_2_hw_phy_config,
[RTL_GIGA_MAC_VER_51] = rtl8168ep_2_hw_phy_config,
- [RTL_GIGA_MAC_VER_60] = NULL,
- [RTL_GIGA_MAC_VER_61] = NULL,
+ [RTL_GIGA_MAC_VER_60] = rtl8125_1_hw_phy_config,
+ [RTL_GIGA_MAC_VER_61] = rtl8125_2_hw_phy_config,
};
struct rtl8169_private *tp = netdev_priv(dev);
--
2.23.0
^ permalink raw reply related
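Most of the phy_modify_paged() and r8168_mac_ocp_modify() calls in the r8169 patches above take a (mask, set) pair. As a rough illustration of that read-modify-write pattern, here is a minimal user-space sketch. The regs array and the reg_modify() name are stand-ins invented for this example and are not driver code; the sketch assumes the usual convention of clearing the mask bits first and then ORing in the set bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit register file standing in for PHY/OCP registers. */
#define NUM_REGS 32
static uint16_t regs[NUM_REGS];

/*
 * Read-modify-write: clear the bits in 'mask', then OR in 'set'.
 * Assumed to mirror the (mask, set) convention used by the calls above.
 */
static uint16_t reg_modify(unsigned int reg, uint16_t mask, uint16_t set)
{
	assert(reg < NUM_REGS);
	regs[reg] = (uint16_t)((regs[reg] & ~mask) | set);
	return regs[reg];
}
```

Under this convention a pair such as (0x03ff, 0x0084) replaces the low ten bits with 0x084, a mask of 0x0000 only sets bits, and a set value of 0x0000 only clears them.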
* [PATCH net-next v2 9/9] r8169: add support for EEE on RTL8125
From: Heiner Kallweit @ 2019-08-28 20:29 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
This adds EEE support for RTL8125 based on the vendor driver.
EEE is supported for 100Mbps and 1Gbps. Realtek recommended not enabling
EEE for 2.5Gbps yet due to potential compatibility issues. Also,
ethtool doesn't yet support controlling EEE for 2.5Gbps and 5Gbps.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 24 +++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 99176a9a8..f337f81e4 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2271,6 +2271,12 @@ static void rtl8168_config_eee_mac(struct rtl8169_private *tp)
rtl_eri_set_bits(tp, 0x1b0, ERIAR_MASK_1111, 0x0003);
}
+static void rtl8125_config_eee_mac(struct rtl8169_private *tp)
+{
+ r8168_mac_ocp_modify(tp, 0xe040, 0, BIT(1) | BIT(0));
+ r8168_mac_ocp_modify(tp, 0xeb62, 0, BIT(2) | BIT(1));
+}
+
static void rtl8168f_config_eee_phy(struct rtl8169_private *tp)
{
struct phy_device *phydev = tp->phydev;
@@ -2301,6 +2307,16 @@ static void rtl8168h_config_eee_phy(struct rtl8169_private *tp)
phy_modify_paged(phydev, 0xa42, 0x14, 0x0000, 0x0080);
}
+static void rtl8125_config_eee_phy(struct rtl8169_private *tp)
+{
+ struct phy_device *phydev = tp->phydev;
+
+ rtl8168h_config_eee_phy(tp);
+
+ phy_modify_paged(phydev, 0xa6d, 0x12, 0x0001, 0x0000);
+ phy_modify_paged(phydev, 0xa6d, 0x14, 0x0010, 0x0000);
+}
+
static void rtl8169s_hw_phy_config(struct rtl8169_private *tp)
{
static const struct phy_reg phy_reg_init[] = {
@@ -3672,6 +3688,9 @@ static void rtl8125_1_hw_phy_config(struct rtl8169_private *tp)
phy_modify_paged(phydev, 0xbf0, 0x15, 0x0e00, 0x0a00);
phy_modify_paged(phydev, 0xa5c, 0x10, 0x0400, 0x0000);
phy_modify_paged(phydev, 0xa44, 0x11, 0x0000, 0x0800);
+
+ rtl8125_config_eee_phy(tp);
+ rtl_enable_eee(tp);
}
static void rtl8125_2_hw_phy_config(struct rtl8169_private *tp)
@@ -3741,6 +3760,9 @@ static void rtl8125_2_hw_phy_config(struct rtl8169_private *tp)
phy_modify_paged(phydev, 0xad4, 0x17, 0x0010, 0x0000);
phy_modify_paged(phydev, 0xa86, 0x15, 0x0001, 0x0000);
phy_modify_paged(phydev, 0xa44, 0x11, 0x0000, 0x0800);
+
+ rtl8125_config_eee_phy(tp);
+ rtl_enable_eee(tp);
}
static void rtl_hw_phy_config(struct net_device *dev)
@@ -5263,6 +5285,8 @@ static void rtl_hw_start_8125_common(struct rtl8169_private *tp)
rtl_udelay_loop_wait_low(tp, &rtl_mac_ocp_e00e_cond, 1000, 10);
+ rtl8125_config_eee_mac(tp);
+
RTL_W32(tp, MISC, RTL_R32(tp, MISC) & ~RXDV_GATED_EN);
udelay(10);
}
--
2.23.0
^ permalink raw reply related
* [PATCH net-next v2 3/9] r8169: factor out reading MAC address from registers
From: Heiner Kallweit @ 2019-08-28 20:25 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
For RTL8125 we will also have to read the MAC address from another
register range, therefore create a small helper.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index e9d900c11..7d89826cb 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -741,6 +741,14 @@ static bool rtl_supports_eee(struct rtl8169_private *tp)
tp->mac_version != RTL_GIGA_MAC_VER_39;
}
+static void rtl_read_mac_from_reg(struct rtl8169_private *tp, u8 *mac, int reg)
+{
+ int i;
+
+ for (i = 0; i < ETH_ALEN; i++)
+ mac[i] = RTL_R8(tp, reg + i);
+}
+
struct rtl_cond {
bool (*check)(struct rtl8169_private *);
const char *msg;
@@ -6630,7 +6638,7 @@ static void rtl_init_mac_address(struct rtl8169_private *tp)
{
struct net_device *dev = tp->dev;
u8 *mac_addr = dev->dev_addr;
- int rc, i;
+ int rc;
rc = eth_platform_get_mac_address(tp_to_dev(tp), mac_addr);
if (!rc)
@@ -6640,8 +6648,7 @@ static void rtl_init_mac_address(struct rtl8169_private *tp)
if (is_valid_ether_addr(mac_addr))
goto done;
- for (i = 0; i < ETH_ALEN; i++)
- mac_addr[i] = RTL_R8(tp, MAC0 + i);
+ rtl_read_mac_from_reg(tp, mac_addr, MAC0);
if (is_valid_ether_addr(mac_addr))
goto done;
--
2.23.0
^ permalink raw reply related
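The helper factored out in patch 3/9 above simply copies ETH_ALEN consecutive bytes starting at a given register offset. A minimal user-space sketch of the same idea follows. The mmio array and mock_r8() are hypothetical stand-ins for the device register space and RTL_R8(), and any offset passed in is chosen by the caller for illustration; none of this is driver code.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Hypothetical byte-addressable register space standing in for MMIO. */
static uint8_t mmio[256];

/* Stand-in for RTL_R8(): read one byte at the given register offset. */
static uint8_t mock_r8(int reg)
{
	assert(reg >= 0 && reg < (int)sizeof(mmio));
	return mmio[reg];
}

/* Same shape as the helper in the patch: copy ETH_ALEN bytes from 'reg'. */
static void read_mac_from_reg(uint8_t *mac, int reg)
{
	int i;

	for (i = 0; i < ETH_ALEN; i++)
		mac[i] = mock_r8(reg + i);
}
```

Because the base offset is a parameter, the same loop can serve both the primary MAC register range and a second range such as the one the RTL8125 needs.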
* [PATCH net-next v2 4/9] r8169: move disabling interrupt coalescing to RTL8169/RTL8168 init
From: Heiner Kallweit @ 2019-08-28 20:26 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Chun-Hao Lin
In-Reply-To: <8181244b-24ac-73e2-bac7-d01f644ebb3f@gmail.com>
RTL8125 doesn't support the same coalescing registers, therefore move
this initialization to the 8168/8169-specific init.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 7d89826cb..dc799528f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5087,6 +5087,9 @@ static void rtl_hw_start_8168(struct rtl8169_private *tp)
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
rtl_hw_config(tp);
+
+ /* disable interrupt coalescing */
+ RTL_W16(tp, IntrMitigate, 0x0000);
}
static void rtl_hw_start_8169(struct rtl8169_private *tp)
@@ -5110,6 +5113,9 @@ static void rtl_hw_start_8169(struct rtl8169_private *tp)
rtl8169_set_magic_reg(tp, tp->mac_version);
RTL_W32(tp, RxMissed, 0);
+
+ /* disable interrupt coalescing */
+ RTL_W16(tp, IntrMitigate, 0x0000);
}
static void rtl_hw_start(struct rtl8169_private *tp)
@@ -5128,8 +5134,6 @@ static void rtl_hw_start(struct rtl8169_private *tp)
rtl_set_rx_tx_desc_registers(tp);
rtl_lock_config_regs(tp);
- /* disable interrupt coalescing */
- RTL_W16(tp, IntrMitigate, 0x0000);
/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
RTL_R8(tp, IntrMask);
RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
--
2.23.0
^ permalink raw reply related
* Re: [PATCH v1 net-next] net: stmmac: Add support for MDIO interrupts
From: David Miller @ 2019-08-28 20:33 UTC (permalink / raw)
To: weifeng.voon
Cc: mcoquelin.stm32, netdev, linux-kernel, joabreu, peppe.cavallaro,
andrew, alexandre.torgue, boon.leong.ong
In-Reply-To: <1566870320-9825-1-git-send-email-weifeng.voon@intel.com>
From: Voon Weifeng <weifeng.voon@intel.com>
Date: Tue, 27 Aug 2019 09:45:20 +0800
> From: "Chuah, Kim Tatt" <kim.tatt.chuah@intel.com>
>
> DW EQoS v5.xx controllers added capability for interrupt generation
> when MDIO interface is done (GMII Busy bit is cleared).
> This patch adds support for this interrupt on supported HW to avoid
> polling on GMII Busy bit.
>
> stmmac_mdio_read() & stmmac_mdio_write() will sleep until wake_up() is
> called by the interrupt handler.
>
> Reviewed-by: Voon Weifeng <weifeng.voon@intel.com>
> Reviewed-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> Reviewed-by: Ong Boon Leong <boon.leong.ong@intel.com>
> Signed-off-by: Chuah, Kim Tatt <kim.tatt.chuah@intel.com>
> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
I know there are some design changes that will occur with this patch but
coding style wise:
> @@ -276,6 +284,10 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
> mac->mode = mac->mode ? : entry->mode;
> mac->tc = mac->tc ? : entry->tc;
> mac->mmc = mac->mmc ? : entry->mmc;
> + mac->mdio_intr_en = mac->mdio_intr_en ? : entry->mdio_intr_en;
> +
> + if (mac->mdio_intr_en)
> + init_waitqueue_head(&mac->mdio_busy_wait);
I'd say always unconditionally initialize wait queues, mutexes, etc.
> +static bool stmmac_mdio_intr_done(struct mii_bus *bus)
> +{
> + struct net_device *ndev = bus->priv;
> + struct stmmac_priv *priv = netdev_priv(ndev);
> + unsigned int mii_address = priv->hw->mii.addr;
Reverse christmas tree here, please.
^ permalink raw reply
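"Reverse christmas tree" in the review above refers to the netdev convention of ordering local variable declarations from the longest line to the shortest. When the initializers depend on one another, as in the quoted function, one common way to get there is to separate declaration from assignment. The sketch below shows that shape with heavily simplified mock structures; every mock_* name is hypothetical and only loosely modelled on the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mock_hw { unsigned int mii_addr; };
struct mock_priv { struct mock_hw *hw; };
struct mock_ndev { struct mock_priv *priv; };
struct mock_bus { struct mock_ndev *priv; };

static unsigned int mock_mdio_addr(struct mock_bus *bus)
{
	/* Declarations ordered longest line first, shortest last. */
	struct mock_ndev *ndev = bus->priv;
	unsigned int mii_address;
	struct mock_priv *priv;

	assert(ndev != NULL && ndev->priv != NULL);

	/* Dependent values are assigned after the declaration block. */
	priv = ndev->priv;
	mii_address = priv->hw->mii_addr;

	return mii_address;
}
```

Moving the dependent initializers into the body keeps the declaration block in the requested order without changing what the function computes.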
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Carlos Antonio Neira Bustos @ 2019-08-28 20:39 UTC (permalink / raw)
To: Yonghong Song
Cc: netdev, Eric Biederman, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <CACiB22jyN9=0ATWWE+x=BoWD6u+8KO+MvBfsFQmcNfkmANb2_w@mail.gmail.com>
Yonghong,
Thanks for the pointer. I fixed this bug, but I found another one that is now
triggered by the test program I included in tools/testing/selftests/bpf/test_pidns.
It seems that fname was not set up correctly when passing it to filename_lookup.
This is fixed now and I'm doing some more testing.
I think I'll remove the tests in samples/bpf, as they mostly end in -EPERM, as
the fix intended.
Is it OK to remove them and just focus on finishing the selftests code?
Bests
On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
> Thank you very much!
>
> Bests
>
> On Wed, Aug 14, 2019 at 00:50, Yonghong Song <yhs@fb.com> wrote:
>
> >
> >
> > On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
> > > On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> > >>
> > >>
> > >> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > >>> From: Carlos <cneirabustos@gmail.com>
> > >>>
> > >>> New bpf helper bpf_get_current_pidns_info.
> > >>> This helper obtains the active namespace from current and returns
> > >>> pid, tgid, device and namespace id as seen from that namespace,
> > >>> allowing to instrument a process inside a container.
> > >>>
> > >>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > >>> ---
> > >>> fs/internal.h | 2 --
> > >>> fs/namei.c | 1 -
> > >>> include/linux/bpf.h | 1 +
> > >>> include/linux/namei.h | 4 +++
> > >>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> > >>> kernel/bpf/core.c | 1 +
> > >>> kernel/bpf/helpers.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >>> kernel/trace/bpf_trace.c | 2 ++
> > >>> 8 files changed, 102 insertions(+), 4 deletions(-)
> > >>>
> > [...]
> > >>>
> > >>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > >>> + size)
> > >>> +{
> > >>> + const char *pidns_path = "/proc/self/ns/pid";
> > >>> + struct pid_namespace *pidns = NULL;
> > >>> + struct filename *tmp = NULL;
> > >>> + struct inode *inode;
> > >>> + struct path kp;
> > >>> + pid_t tgid = 0;
> > >>> + pid_t pid = 0;
> > >>> + int ret;
> > >>> + int len;
> > >>
> > >
> > > Thank you very much for catching this!.
> > > Could you share how to replicate this bug?.
> >
> > The config is attached. just run trace_ns_info and you
> > can reproduce the issue.
> >
> > >
> > >> I am running your sample program and get the following kernel bug:
> > >>
> > >> ...
> > >> [ 26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
> > >> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> > >> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W 5.3.0-rc1+ #280
> > >> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
> > >> [ 26.419393] Call Trace:
> > >> [ 26.419697] <IRQ>
> > >> [ 26.419960] dump_stack+0x46/0x5b
> > >> [ 26.420434] ___might_sleep+0xe4/0x110
> > >> [ 26.420894] dput+0x2a/0x200
> > >> [ 26.421265] walk_component+0x10c/0x280
> > >> [ 26.421773] link_path_walk+0x327/0x560
> > >> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
> > >> [ 26.422848] ? path_init+0x232/0x330
> > >> [ 26.423364] path_lookupat+0x88/0x200
> > >> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
> > >> [ 26.424521] filename_lookup+0xaf/0x190
> > >> [ 26.425031] ? simple_attr_release+0x20/0x20
> > >> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
> > >> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
> > >> [ 26.426779] trace_call_bpf+0xb5/0x160
> > >> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
> > >> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
> > >> [ 26.428496] kprobe_perf_func+0x4d/0x280
> > >> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
> > >> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
> > >> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> > >> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
> > >> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
> > >> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
> > >> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
> > >> [ 26.433060] 0xffffffffc03180bf
> > >> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
> > >> ...
> > >>
> > >> To prevent running in an arbitrary task (e.g., idle task)
> > >> context, which may introduce sleeping issues, the following
> > >> is probably appropriate:
> > >>
> > >> if (in_nmi() || in_softirq())
> > >> return -EPERM;
> > >>
> > >> Anyway, if in nmi or softirq, the namespace and pid/tgid
> > >> we get may be just accidentally associated with the bpf running
> > >> context, but it could be in a different context. So such info
> > >> is not reliable any way.
> > >>
> > >>> +
> > >>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > >>> + return -EINVAL;
> > >>> + pidns = task_active_pid_ns(current);
> > [...]
> >
^ permalink raw reply
* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
From: Andrii Nakryiko @ 2019-08-28 20:40 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
Jesper Dangaard Brouer, Networking, bpf
In-Reply-To: <87tva8m85t.fsf@toke.dk>
On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> [ ... snip ...]
>
> > E.g., today's API is essentially three steps:
> >
> > 1. open and parse ELF: collect relos, programs, map definitions
> > 2. load: create maps from collected defs, do program/global data/CO-RE
> > relocs, load and verify BPF programs
> > 3. attach programs one by one.
> >
> > Between step 1 and 2 user has flexibility to create more maps, set up
> > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> > tail call maps, etc. That's already pretty flexible. But we can tune
> > and break apart those steps even further, if necessary.
>
> Today, steps 1 and 2 can be collapsed into a single call to
> bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> generally want to do all the fancy rewriting stuff, we just want a
> simple way to load a program and get reusable pinning of maps.
I agree. See my response to Jesper's message. Note also my view on the
existence of bpf_prog_load_xattr().
> Preferably in a way that is compatible with the iproute2 loader.
>
> So I really think we need two things:
>
> (1) a flexible API that splits up all the various steps in a way that
> allows programs to inject their own map definitions before
> relocations and loading
>
> (2) a simple convenience wrapper that loads an object file, does
> something sensible with pinning and map-in-map definitions, and loads
> everything into the kernel.
I agree. I think this wrapper is bpf_object__open + bpf_object__load
(bpf_prog_load_xattr will do as well, if you don't need to do anything
between open and load). I think pinning is simple to add in minimal
form and is pretty non-controversial (there is some ambiguity as to
how to handle merging of prog array maps, or maybe not just prog array
maps, but that can be controlled later through extra flags/attributes,
so I'd start with something sensible as a default behavior).
>
> I'd go so far as to say that (2) should even support system-wide
> configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
> /etc/libbpf/pinning.conf file that sets the default pinning directory,
> and makes it possible to set up pin-value-to-subdir mappings like what
> iproute2 does today.
This I'm a bit hesitant about. It feels like it's not a library's job to
read system-wide configs that modify its behavior. We have all
those _xattr methods, which allow overriding sensible defaults; I'd
try to go as far as possible with just that before doing
libbpf-specific /etc configs.
>
> Having (2) makes it more likely that all the different custom loaders
> will be compatible with each other, while still allowing people to do
> their own custom thing with (1). And of course, (2) could be implemented
> in terms of (1) internally in libbpf.
>
> In my ideal world, (2) would just use the definition format already in
> iproute2 (this is basically what I implemented already), but if you guys
> don't want to put this into libbpf, I can probably live with the default
I want to avoid having legacy-at-the-time-it-was-added code in libbpf
that we'd need to support for a long time and that solves only iproute2
cases, which is why I'm pushing back. With BTF we can support the same
functionality in a better form, which is what I want to prioritize and
which will be beneficial to the whole BPF ecosystem.
But I also want to make libbpf useful to iproute2 and other custom
loaders that have to support existing formats, hence my proposal to
have libbpf provide granular enough APIs to augment the default format in
a non-intrusive way. Whether this should be callback-based or not is a
secondary concern, though an important one for API design.
> format being BTF-based instead. Which would mean that iproute2 I would
> end up with a flow like this:
>
> - When given an elf file, try to run it through the "standard loader"
> (2). If this works, great, proceed to program attach.
>
> - If using (2) fails because it doesn't understand the map definition,
> fall back to a compatibility loader that parses the legacy iproute2
> map definition format and uses (1) to load that.
>
>
> Does the above make sense? :)
It does, yes. Also, with BTF enabled it should be easy to distinguish
between those two (e.g., was bpf_elf_map type used? if yes, then it's
a compatibility format) and not do extra work.
>
> -Toke
^ permalink raw reply
* [PATCH 1/3] samples: pktgen: make variable consistent with option
From: Daniel T. Lee @ 2019-08-28 20:42 UTC (permalink / raw)
To: Jesper Dangaard Brouer, David S . Miller; +Cc: netdev
This commit changes variable names that can cause confusion.
For example, the variable DST_MIN is quite confusing since both the
keyword 'udp_dst_min' and the keyword 'dst_min' are used with pg_ctrl.
In the following commit, 'dst_min' will be used to set the destination IP,
so the existing variable name DST_MIN should be changed.
Variable names are matched to the exact keyword used with pg_ctrl.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
.../pktgen_bench_xmit_mode_netif_receive.sh | 8 ++++----
.../pktgen/pktgen_bench_xmit_mode_queue_xmit.sh | 8 ++++----
samples/pktgen/pktgen_sample01_simple.sh | 16 ++++++++--------
samples/pktgen/pktgen_sample02_multiqueue.sh | 16 ++++++++--------
.../pktgen/pktgen_sample03_burst_single_flow.sh | 8 ++++----
samples/pktgen/pktgen_sample04_many_flows.sh | 8 ++++----
.../pktgen/pktgen_sample05_flow_per_thread.sh | 8 ++++----
...en_sample06_numa_awared_queue_irq_affinity.sh | 16 ++++++++--------
8 files changed, 44 insertions(+), 44 deletions(-)
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
index e14b1a9144d9..9b74502c58f7 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
@@ -42,8 +42,8 @@ fi
[ -z "$BURST" ] && BURST=1024
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -76,8 +76,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Inject packet into RX path of stack
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
index 82c3e504e056..0f332555b40d 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
@@ -25,8 +25,8 @@ if [[ -n "$BURST" ]]; then
fi
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -59,8 +59,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Inject packet into TX qdisc egress path of stack
diff --git a/samples/pktgen/pktgen_sample01_simple.sh b/samples/pktgen/pktgen_sample01_simple.sh
index d1702fdde8f3..063ec0998906 100755
--- a/samples/pktgen/pktgen_sample01_simple.sh
+++ b/samples/pktgen/pktgen_sample01_simple.sh
@@ -23,16 +23,16 @@ fi
[ -z "$DST_MAC" ] && usage && err 2 "Must specify -m dst_mac"
[ -z "$COUNT" ] && COUNT="100000" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
DELAY="0" # Zero means max speed
# Flow variation random source port between min and max
-UDP_MIN=9
-UDP_MAX=109
+UDP_SRC_MIN=9
+UDP_SRC_MAX=109
# General cleanup everything since last run
# (especially important if other threads were configured by other scripts)
@@ -66,14 +66,14 @@ pg_set $DEV "dst$IP6 $DEST_IP"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $DEV "flag UDPDST_RND"
- pg_set $DEV "udp_dst_min $DST_MIN"
- pg_set $DEV "udp_dst_max $DST_MAX"
+ pg_set $DEV "udp_dst_min $UDP_DST_MIN"
+ pg_set $DEV "udp_dst_max $UDP_DST_MAX"
fi
# Setup random UDP port src range
pg_set $DEV "flag UDPSRC_RND"
-pg_set $DEV "udp_src_min $UDP_MIN"
-pg_set $DEV "udp_src_max $UDP_MAX"
+pg_set $DEV "udp_src_min $UDP_SRC_MIN"
+pg_set $DEV "udp_src_max $UDP_SRC_MAX"
# start_run
echo "Running... ctrl^C to stop" >&2
diff --git a/samples/pktgen/pktgen_sample02_multiqueue.sh b/samples/pktgen/pktgen_sample02_multiqueue.sh
index 7f7a9a27548f..a4726fb50197 100755
--- a/samples/pktgen/pktgen_sample02_multiqueue.sh
+++ b/samples/pktgen/pktgen_sample02_multiqueue.sh
@@ -21,8 +21,8 @@ DELAY="0" # Zero means max speed
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
# Flow variation random source port between min and max
-UDP_MIN=9
-UDP_MAX=109
+UDP_SRC_MIN=9
+UDP_SRC_MAX=109
# (example of setting default params in your script)
if [ -z "$DEST_IP" ]; then
@@ -30,8 +30,8 @@ if [ -z "$DEST_IP" ]; then
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# General cleanup everything since last run
@@ -67,14 +67,14 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup random UDP port src range
pg_set $dev "flag UDPSRC_RND"
- pg_set $dev "udp_src_min $UDP_MIN"
- pg_set $dev "udp_src_max $UDP_MAX"
+ pg_set $dev "udp_src_min $UDP_SRC_MIN"
+ pg_set $dev "udp_src_max $UDP_SRC_MAX"
done
# start_run
diff --git a/samples/pktgen/pktgen_sample03_burst_single_flow.sh b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
index b520637817ce..dfea91a09ccc 100755
--- a/samples/pktgen/pktgen_sample03_burst_single_flow.sh
+++ b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
@@ -34,8 +34,8 @@ fi
[ -z "$CLONE_SKB" ] && CLONE_SKB="0" # No need for clones when bursting
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -67,8 +67,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup burst, for easy testing -b 0 disable bursting
diff --git a/samples/pktgen/pktgen_sample04_many_flows.sh b/samples/pktgen/pktgen_sample04_many_flows.sh
index 5b6e9d9cb5b5..7ea9b4a3acf6 100755
--- a/samples/pktgen/pktgen_sample04_many_flows.sh
+++ b/samples/pktgen/pktgen_sample04_many_flows.sh
@@ -18,8 +18,8 @@ source ${basedir}/parameters.sh
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# NOTICE: Script specific settings
@@ -63,8 +63,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Randomize source IP-addresses
diff --git a/samples/pktgen/pktgen_sample05_flow_per_thread.sh b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
index 0c06e63fbe97..fbfafe029e11 100755
--- a/samples/pktgen/pktgen_sample05_flow_per_thread.sh
+++ b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
@@ -23,8 +23,8 @@ source ${basedir}/parameters.sh
[ -z "$BURST" ] && BURST=32
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -56,8 +56,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup source IP-addresses based on thread number
diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
index 97f0266c0356..755e662183f1 100755
--- a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -20,8 +20,8 @@ DELAY="0" # Zero means max speed
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
# Flow variation random source port between min and max
-UDP_MIN=9
-UDP_MAX=109
+UDP_SRC_MIN=9
+UDP_SRC_MAX=109
node=`get_iface_node $DEV`
irq_array=(`get_iface_irqs $DEV`)
@@ -36,8 +36,8 @@ if [ -z "$DEST_IP" ]; then
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# General cleanup everything since last run
@@ -84,14 +84,14 @@ for ((i = 0; i < $THREADS; i++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup random UDP port src range
pg_set $dev "flag UDPSRC_RND"
- pg_set $dev "udp_src_min $UDP_MIN"
- pg_set $dev "udp_src_max $UDP_MAX"
+ pg_set $dev "udp_src_min $UDP_SRC_MIN"
+ pg_set $dev "udp_src_max $UDP_SRC_MAX"
done
# start_run
--
2.20.1
^ permalink raw reply related
* [PATCH 2/3] samples: pktgen: add helper functions for IP(v4/v6) CIDR parsing
From: Daniel T. Lee @ 2019-08-28 20:42 UTC (permalink / raw)
To: Jesper Dangaard Brouer, David S . Miller; +Cc: netdev
In-Reply-To: <20190828204243.16666-1-danieltimlee@gmail.com>
This commit adds CIDR parsing and IP validation helper functions that parse
a single IP address or a range of addresses given in CIDR notation
(e.g. 198.18.0.0/15).
The helpers will be used to compute the target address range before it is
set in the samples/pktgen scripts.
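For readers who want to check the expected min/max pair by hand, the
CIDR-to-range step can be sketched for IPv4 as below. This is only an
illustration under stated assumptions: cidr4_range is a hypothetical name
that does not exist in functions.sh, it handles IPv4 only, and it uses
integer arithmetic where the patch builds bit strings.

```shell
#!/bin/bash
# Illustrative only: cidr4_range is a hypothetical helper, not part of the patch.
# Prints the lowest and highest IPv4 address covered by ADDR[/PREFIX].
cidr4_range() {
    local net=${1%/*} prefix=32
    [[ $1 == */* ]] && prefix=${1#*/}
    local a b c d
    IFS=. read -r a b c d <<< "$net"
    # Pack the four octets into one 32-bit integer.
    local ip=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Network mask: the top $prefix bits set, the rest cleared.
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local min=$(( ip & mask ))
    local max=$(( min | (~mask & 0xFFFFFFFF) ))
    printf '%d.%d.%d.%d %d.%d.%d.%d\n' \
        $(( min >> 24 & 255 )) $(( min >> 16 & 255 )) \
        $(( min >> 8 & 255 )) $(( min & 255 )) \
        $(( max >> 24 & 255 )) $(( max >> 16 & 255 )) \
        $(( max >> 8 & 255 )) $(( max & 255 ))
}
```

Under these assumptions, "cidr4_range 198.18.0.0/15" should print
"198.18.0.0 198.19.255.255", and an address without a prefix should be
returned as both minimum and maximum.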
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
samples/pktgen/functions.sh | 134 ++++++++++++++++++++++++++++++++++++
1 file changed, 134 insertions(+)
diff --git a/samples/pktgen/functions.sh b/samples/pktgen/functions.sh
index 4af4046d71be..eb1c52e25018 100644
--- a/samples/pktgen/functions.sh
+++ b/samples/pktgen/functions.sh
@@ -163,6 +163,140 @@ function get_node_cpus()
echo $node_cpu_list
}
+# Extend shrunken IPv6 address.
+# fe80::42:bcff:fe84:e10a => fe80:0:0:0:42:bcff:fe84:e10a
+function extend_addr6()
+{
+ local addr=$1
+ local sep=:
+ local sep2=::
+ local sep_cnt=$(tr -cd $sep <<< $1 | wc -c)
+ local shrink
+
+ # separator count : should be between 2, 7.
+ if [[ $sep_cnt -lt 2 || $sep_cnt -gt 7 ]]; then
+ err 5 "Invalid IP6 address sep: $1"
+ fi
+
+ # if shrink '::' occurs multiple, it's malformed.
+ shrink=( $(egrep -o "$sep{2,}" <<< $addr) )
+ if [[ ${#shrink[@]} -ne 0 ]]; then
+ if [[ ${#shrink[@]} -gt 1 || ( ${shrink[0]} != $sep2 ) ]]; then
+ err 5 "Invalid IP$IP6 address shr: $1"
+ fi
+ fi
+
+ # add 0 at begin & end, and extend addr by adding :0
+ [[ ${addr:0:1} == $sep ]] && addr=0${addr}
+ [[ ${addr: -1} == $sep ]] && addr=${addr}0
+ echo "${addr/$sep2/$(printf ':0%.s' $(seq $[8-sep_cnt])):}"
+}
+
+
+# Given a single IP(v4/v6) address, whether it is valid.
+function validate_addr()
+{
+ # check function is called with (funcname)6
+ [[ ${FUNCNAME[1]: -1} == 6 ]] && local IP6=6
+ local len=$[ IP6 ? 8 : 4 ]
+ local max=$[ 2**(len*2)-1 ]
+ local addr
+ local sep
+
+ # set separator for each IP(v4/v6)
+ [[ $IP6 ]] && sep=: || sep=.
+ IFS=$sep read -a addr <<< $1
+
+ # array length
+ if [[ ${#addr[@]} != $len ]]; then
+ err 5 "Invalid IP$IP6 address: $1"
+ fi
+
+ # check each digit between 0, $max
+ for digit in "${addr[@]}"; do
+ [[ $IP6 ]] && digit=$[ 16#$digit ]
+ if [[ $digit -lt 0 || $digit -gt $max ]]; then
+ err 5 "Invalid IP$IP6 address: $1"
+ fi
+ done
+
+ return 0
+}
+
+function validate_addr6() { validate_addr $@ ; }
+
+# Given a single IP(v4/v6) or CIDR, return minimum and maximum IP addr.
+function parse_addr()
+{
+ # check function is called with (funcname)6
+ [[ ${FUNCNAME[1]: -1} == 6 ]] && local IP6=6
+ local bitlen=$[ IP6 ? 128 : 32 ]
+
+ local addr=$1
+ local net
+ local prefix
+ local min_ip
+ local max_ip
+
+ IFS='/' read net prefix <<< $addr
+ [[ $IP6 ]] && net=$(extend_addr6 $net)
+ validate_addr$IP6 $net
+
+ if [[ $prefix -gt $bitlen ]]; then
+ err 5 "Invalid prefix: $prefix"
+ elif [[ -z $prefix ]]; then
+ min_ip=$net
+ max_ip=$net
+ else
+ # defining array for converting Decimal 2 Binary
+ # 00000000 00000001 00000010 00000011 00000100 ...
+ local d2b='{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}'
+ [[ $IP6 ]] && d2b+=$d2b
+ eval local D2B=($d2b)
+
+ local shift=$[ bitlen-prefix ]
+ local ip_bit
+ local ip
+ local sep
+
+ # set separator for each IP(v4/v6)
+ [[ $IP6 ]] && sep=: || sep=.
+ IFS=$sep read -ra ip <<< $net
+
+ # build full size bit
+ for digit in "${ip[@]}"; do
+ [[ $IP6 ]] && digit=$[ 16#$digit ]
+ ip_bit+=${D2B[$digit]}
+ done
+
+ # fill 0 or 1 by $shift
+ base_bit=${ip_bit::$prefix}
+ min_bit="$base_bit$(printf '0%.s' $(seq $shift))"
+ max_bit="$base_bit$(printf '1%.s' $(seq $shift))"
+
+ bit2addr() {
+ local step=$[ IP6 ? 16 : 8 ]
+ local max=$[ bitlen-step ]
+ local result
+ local fmt
+ [[ $IP6 ]] && fmt='%X' || fmt='%d'
+
+ for i in $(seq 0 $step $max); do
+ result+=$(printf $fmt $[ 2#${1:$i:$step} ])
+ [[ $i != $max ]] && result+=$sep
+ done
+ echo $result
+ }
+
+ min_ip=$(bit2addr $min_bit)
+ max_ip=$(bit2addr $max_bit)
+ fi
+
+ echo $min_ip $max_ip
+}
+
+function parse_addr6() { parse_addr $@ ; }
+
# Given a single or range of port(s), return minimum and maximum port number.
function parse_ports()
{
--
2.20.1
^ permalink raw reply related
* [PATCH 3/3] samples: pktgen: allow to specify destination IP range (CIDR)
From: Daniel T. Lee @ 2019-08-28 20:42 UTC (permalink / raw)
To: Jesper Dangaard Brouer, David S . Miller; +Cc: netdev
In-Reply-To: <20190828204243.16666-1-danieltimlee@gmail.com>
Currently, kernel pktgen can send packets to a range of destination
addresses (e.g. pgset "dst_min/dst_max"), but none of the sample scripts
offers an option to use it.
This commit adds the ability to specify a destination address range in
CIDR notation.
-d : ($DEST_IP) destination IP. CIDR (e.g. 198.18.0.0/15) is also allowed
# ./pktgen_sample01_simple.sh -6 -d fe80::20/126 -p 3000 -n 4
# tcpdump ip6 and udp
05:14:18.082285 IP6 fe80::99.71 > fe80::23.3000: UDP, length 16
05:14:18.082564 IP6 fe80::99.43 > fe80::23.3000: UDP, length 16
05:14:18.083366 IP6 fe80::99.107 > fe80::22.3000: UDP, length 16
05:14:18.083585 IP6 fe80::99.97 > fe80::21.3000: UDP, length 16
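As a sanity check on the capture above: a /126 leaves two host bits, so
fe80::20/126 should span fe80::20 to fe80::23, which matches the observed
destinations. The arithmetic can be sketched as below. last_group_range is a
hypothetical helper, not part of this series, and it assumes a prefix length
of 112 to 128 so that only the last 16-bit group varies.

```shell
#!/bin/bash
# Illustrative only: last_group_range is a hypothetical helper, not part of the patch.
# $1 = last 16-bit group of the address in hex, $2 = prefix length (112..128).
# Prints the lowest and highest value of that group within the prefix, in hex.
last_group_range() {
    local group=$(( 16#$1 )) prefix=$2
    local host_bits=$(( 128 - prefix ))
    # Mask covering the host bits, e.g. 0x3 for a /126.
    local mask=$(( (1 << host_bits) - 1 ))
    printf '%x %x\n' $(( group & ~mask )) $(( group | mask ))
}

last_group_range 20 126   # prints "20 23"
```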
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
samples/pktgen/README.rst | 2 +-
samples/pktgen/parameters.sh | 2 +-
.../pktgen/pktgen_bench_xmit_mode_netif_receive.sh | 4 +++-
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh | 4 +++-
samples/pktgen/pktgen_sample01_simple.sh | 4 +++-
samples/pktgen/pktgen_sample02_multiqueue.sh | 4 +++-
samples/pktgen/pktgen_sample03_burst_single_flow.sh | 4 +++-
samples/pktgen/pktgen_sample04_many_flows.sh | 11 ++++++++---
samples/pktgen/pktgen_sample05_flow_per_thread.sh | 4 +++-
.../pktgen_sample06_numa_awared_queue_irq_affinity.sh | 4 +++-
10 files changed, 31 insertions(+), 12 deletions(-)
diff --git a/samples/pktgen/README.rst b/samples/pktgen/README.rst
index fd39215db508..3f6483e8b2df 100644
--- a/samples/pktgen/README.rst
+++ b/samples/pktgen/README.rst
@@ -18,7 +18,7 @@ across the sample scripts. Usage example is printed on errors::
Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
-i : ($DEV) output interface/device (required)
-s : ($PKT_SIZE) packet size
- -d : ($DEST_IP) destination IP
+ -d : ($DEST_IP) destination IP. CIDR (e.g. 198.18.0.0/15) is also allowed
-m : ($DST_MAC) destination MAC-addr
-p : ($DST_PORT) destination PORT range (e.g. 433-444) is also allowed
-t : ($THREADS) threads to start
diff --git a/samples/pktgen/parameters.sh b/samples/pktgen/parameters.sh
index a06b00a0c7b6..ff0ed474fee9 100644
--- a/samples/pktgen/parameters.sh
+++ b/samples/pktgen/parameters.sh
@@ -8,7 +8,7 @@ function usage() {
echo "Usage: $0 [-vx] -i ethX"
echo " -i : (\$DEV) output interface/device (required)"
echo " -s : (\$PKT_SIZE) packet size"
- echo " -d : (\$DEST_IP) destination IP"
+ echo " -d : (\$DEST_IP) destination IP. CIDR (e.g. 198.18.0.0/15) is also allowed"
echo " -m : (\$DST_MAC) destination MAC-addr"
echo " -p : (\$DST_PORT) destination PORT range (e.g. 433-444) is also allowed"
echo " -t : (\$THREADS) threads to start"
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
index 9b74502c58f7..da6cb711b7f4 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
@@ -41,6 +41,7 @@ fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
[ -z "$BURST" ] && BURST=1024
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -71,7 +72,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
index 0f332555b40d..355937787364 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
@@ -24,6 +24,7 @@ if [[ -n "$BURST" ]]; then
err 1 "Bursting not supported for this mode"
fi
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -54,7 +55,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample01_simple.sh b/samples/pktgen/pktgen_sample01_simple.sh
index 063ec0998906..08995fa70025 100755
--- a/samples/pktgen/pktgen_sample01_simple.sh
+++ b/samples/pktgen/pktgen_sample01_simple.sh
@@ -22,6 +22,7 @@ fi
# Example enforce param "-m" for dst_mac
[ -z "$DST_MAC" ] && usage && err 2 "Must specify -m dst_mac"
[ -z "$COUNT" ] && COUNT="100000" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -61,7 +62,8 @@ pg_set $DEV "flag NO_TIMESTAMP"
# Destination
pg_set $DEV "dst_mac $DST_MAC"
-pg_set $DEV "dst$IP6 $DEST_IP"
+pg_set $DEV "dst${IP6}_min $DST_MIN"
+pg_set $DEV "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample02_multiqueue.sh b/samples/pktgen/pktgen_sample02_multiqueue.sh
index a4726fb50197..9b806e41c23a 100755
--- a/samples/pktgen/pktgen_sample02_multiqueue.sh
+++ b/samples/pktgen/pktgen_sample02_multiqueue.sh
@@ -29,6 +29,7 @@ if [ -z "$DEST_IP" ]; then
[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -62,7 +63,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample03_burst_single_flow.sh b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
index dfea91a09ccc..cb067788ceb3 100755
--- a/samples/pktgen/pktgen_sample03_burst_single_flow.sh
+++ b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
@@ -33,6 +33,7 @@ fi
[ -z "$BURST" ] && BURST=32
[ -z "$CLONE_SKB" ] && CLONE_SKB="0" # No need for clones when bursting
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -62,7 +63,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample04_many_flows.sh b/samples/pktgen/pktgen_sample04_many_flows.sh
index 7ea9b4a3acf6..626e33016869 100755
--- a/samples/pktgen/pktgen_sample04_many_flows.sh
+++ b/samples/pktgen/pktgen_sample04_many_flows.sh
@@ -17,6 +17,7 @@ source ${basedir}/parameters.sh
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -37,6 +38,9 @@ if [[ -n "$BURST" ]]; then
err 1 "Bursting not supported for this mode"
fi
+# 198.18.0.0 / 198.19.255.255
+read -r SRC_MIN SRC_MAX <<< $(parse_addr 198.18.0.0/15)
+
# General cleanup everything since last run
pg_ctrl "reset"
@@ -58,7 +62,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Single destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst $DEST_IP"
+ pg_set $dev "dst_min $DST_MIN"
+ pg_set $dev "dst_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
@@ -69,8 +74,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Randomize source IP-addresses
pg_set $dev "flag IPSRC_RND"
- pg_set $dev "src_min 198.18.0.0"
- pg_set $dev "src_max 198.19.255.255"
+ pg_set $dev "src_min $SRC_MIN"
+ pg_set $dev "src_max $SRC_MAX"
# Limit number of flows (max 65535)
pg_set $dev "flows $FLOWS"
diff --git a/samples/pktgen/pktgen_sample05_flow_per_thread.sh b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
index fbfafe029e11..cb79de073e9d 100755
--- a/samples/pktgen/pktgen_sample05_flow_per_thread.sh
+++ b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
@@ -22,6 +22,7 @@ source ${basedir}/parameters.sh
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
[ -z "$BURST" ] && BURST=32
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -51,7 +52,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Single destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst $DEST_IP"
+ pg_set $dev "dst_min $DST_MIN"
+ pg_set $dev "dst_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
index 755e662183f1..739adcda5b5f 100755
--- a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -35,6 +35,7 @@ if [ -z "$DEST_IP" ]; then
[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -79,7 +80,8 @@ for ((i = 0; i < $THREADS; i++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
--
2.20.1
^ permalink raw reply related
* Re: [PATCH net-next 01/15] MIPS: SGI-IP27: remove ioc3 ethernet init
From: Paul Burton @ 2019-08-28 20:45 UTC (permalink / raw)
To: Thomas Bogendoerfer
Cc: Ralf Baechle, James Hogan, David S. Miller,
linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
In-Reply-To: <20190828140315.17048-2-tbogendoerfer@suse.de>
Hi Thomas,
On Wed, Aug 28, 2019 at 04:03:00PM +0200, Thomas Bogendoerfer wrote:
> Remove the unneeded disabling of ethernet interrupts in IP27 platform code.
>
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Thanks,
Paul
> ---
> arch/mips/sgi-ip27/ip27-init.c | 13 -------------
> 1 file changed, 13 deletions(-)
>
> diff --git a/arch/mips/sgi-ip27/ip27-init.c b/arch/mips/sgi-ip27/ip27-init.c
> index 066b33f50bcc..59d5375c9021 100644
> --- a/arch/mips/sgi-ip27/ip27-init.c
> +++ b/arch/mips/sgi-ip27/ip27-init.c
> @@ -130,17 +130,6 @@ cnodeid_t get_compact_nodeid(void)
> return NASID_TO_COMPACT_NODEID(get_nasid());
> }
>
> -static inline void ioc3_eth_init(void)
> -{
> - struct ioc3 *ioc3;
> - nasid_t nid;
> -
> - nid = get_nasid();
> - ioc3 = (struct ioc3 *) KL_CONFIG_CH_CONS_INFO(nid)->memory_base;
> -
> - ioc3->eier = 0;
> -}
> -
> extern void ip27_reboot_setup(void);
>
> void __init plat_mem_setup(void)
> @@ -182,8 +171,6 @@ void __init plat_mem_setup(void)
> panic("Kernel compiled for N mode.");
> #endif
>
> - ioc3_eth_init();
> -
> ioport_resource.start = 0;
> ioport_resource.end = ~0UL;
> set_io_port_base(IO_BASE);
> --
> 2.13.7
>
^ permalink raw reply
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Yonghong Song @ 2019-08-28 20:53 UTC (permalink / raw)
To: Carlos Antonio Neira Bustos
Cc: netdev@vger.kernel.org, Eric Biederman, brouer@redhat.com,
bpf@vger.kernel.org
In-Reply-To: <20190828203951.qo4kaloahcnvp7nw@ebpf-metal>
On 8/28/19 1:39 PM, Carlos Antonio Neira Bustos wrote:
> Yonghong,
>
> Thanks for the pointer. I fixed this bug, but I found another one that is now
> triggered by the test program I included in tools/testing/selftests/bpf/test_pidns.
> It seems that fname was not correctly set up when passing it to filename_lookup.
> This is fixed now and I'm doing some more testing.
> I think I'll remove the tests in samples/bpf, as they mostly end with -EPERM, as
> the fix intended.
> Is it OK to remove them and just focus on finishing the selftests code?
Yes, the samples/bpf test case can be removed.
Could you create a selftest with tracepoint net/netif_receive_skb, which
also uses the proposed helper? net/netif_receive_skb will happen in
interrupt context and it should catch the issue as well if
filename_lookup still gets called in interrupt context.
>
> Bests
>
> On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
>> Thank you very much!
>>
>> Bests
>>
>> El mié., 14 de ago. de 2019 00:50, Yonghong Song <yhs@fb.com> escribió:
>>
>>>
>>>
>>> On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
>>>> On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 8/13/19 11:47 AM, Carlos Neira wrote:
>>>>>> From: Carlos <cneirabustos@gmail.com>
>>>>>>
>>>>>> New bpf helper bpf_get_current_pidns_info.
>>>>>> This helper obtains the active namespace from current and returns
>>>>>> pid, tgid, device and namespace id as seen from that namespace,
>>>>>> allowing to instrument a process inside a container.
>>>>>>
>>>>>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
>>>>>> ---
>>>>>> fs/internal.h | 2 --
>>>>>> fs/namei.c | 1 -
>>>>>> include/linux/bpf.h | 1 +
>>>>>> include/linux/namei.h | 4 +++
>>>>>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>>>>>> kernel/bpf/core.c | 1 +
>>>>>> kernel/bpf/helpers.c | 64
>>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> kernel/trace/bpf_trace.c | 2 ++
>>>>>> 8 files changed, 102 insertions(+), 4 deletions(-)
>>>>>>
>>> [...]
>>>>>>
>>>>>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *,
>>> pidns_info, u32,
>>>>>> + size)
>>>>>> +{
>>>>>> + const char *pidns_path = "/proc/self/ns/pid";
>>>>>> + struct pid_namespace *pidns = NULL;
>>>>>> + struct filename *tmp = NULL;
>>>>>> + struct inode *inode;
>>>>>> + struct path kp;
>>>>>> + pid_t tgid = 0;
>>>>>> + pid_t pid = 0;
>>>>>> + int ret;
>>>>>> + int len;
>>>>>
>>>>
>>>> Thank you very much for catching this!.
>>>> Could you share how to replicate this bug?.
>>>
>>> The config is attached. just run trace_ns_info and you
>>> can reproduce the issue.
>>>
>>>>
>>>>> I am running your sample program and get the following kernel bug:
>>>>>
>>>>> ...
>>>>> [ 26.414825] BUG: sleeping function called from invalid context at
>>>>> /data/users/yhs/work/net-next/fs
>>>>> /dcache.c:843
>>>>> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
>>>>> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
>>>>> 5.3.0-rc1+ #280
>>>>> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>> BIOS 1.9.3-1.el7.centos 04/01/2
>>>>> 014
>>>>> [ 26.419393] Call Trace:
>>>>> [ 26.419697] <IRQ>
>>>>> [ 26.419960] dump_stack+0x46/0x5b
>>>>> [ 26.420434] ___might_sleep+0xe4/0x110
>>>>> [ 26.420894] dput+0x2a/0x200
>>>>> [ 26.421265] walk_component+0x10c/0x280
>>>>> [ 26.421773] link_path_walk+0x327/0x560
>>>>> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
>>>>> [ 26.422848] ? path_init+0x232/0x330
>>>>> [ 26.423364] path_lookupat+0x88/0x200
>>>>> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
>>>>> [ 26.424521] filename_lookup+0xaf/0x190
>>>>> [ 26.425031] ? simple_attr_release+0x20/0x20
>>>>> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
>>>>> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
>>>>> [ 26.426779] trace_call_bpf+0xb5/0x160
>>>>> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> [ 26.428496] kprobe_perf_func+0x4d/0x280
>>>>> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
>>>>> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
>>>>> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
>>>>> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
>>>>> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
>>>>> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
>>>>> [ 26.433060] 0xffffffffc03180bf
>>>>> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> ...
>>>>>
>>>>> To avoid running in an arbitrary task (e.g., idle task)
>>>>> context, which may introduce sleeping issues, the following
>>>>> is probably appropriate:
>>>>>
>>>>> if (in_nmi() || in_softirq())
>>>>> return -EPERM;
>>>>>
>>>>> Anyway, if in nmi or softirq, the namespace and pid/tgid
>>>>> we get may be just accidentally associated with the bpf running
>>>>> context, but it could be in a different context. So such info
>>>>> is not reliable any way.
>>>>>
>>>>>> +
>>>>>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
>>>>>> + return -EINVAL;
>>>>>> + pidns = task_active_pid_ns(current);
>>> [...]
>>>
^ permalink raw reply
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Carlos Antonio Neira Bustos @ 2019-08-28 21:03 UTC (permalink / raw)
To: Yonghong Song
Cc: netdev@vger.kernel.org, Eric Biederman, brouer@redhat.com,
bpf@vger.kernel.org
In-Reply-To: <4faeb577-387a-7186-e060-f0ca76395823@fb.com>
Thanks, I'll work on the net/netif_receive_skb selftest using this helper.
I hope to complete this work this week.
Bests.
On Wed, Aug 28, 2019 at 08:53:25PM +0000, Yonghong Song wrote:
>
>
> On 8/28/19 1:39 PM, Carlos Antonio Neira Bustos wrote:
> > Yonghong,
> >
> > Thanks for the pointer. I fixed this bug, but I found another one that is now
> > triggered by the test program I included in tools/testing/selftests/bpf/test_pidns.
> > It seems that fname was not correctly set up when passing it to filename_lookup.
> > This is fixed now and I'm doing some more testing.
> > I think I'll remove the tests in samples/bpf, as they mostly end with -EPERM, as
> > the fix intended.
> > Is it OK to remove them and just focus on finishing the selftests code?
>
> Yes, the samples/bpf test case can be removed.
> Could you create a selftest with tracepoint net/netif_receive_skb, which
> also uses the proposed helper? net/netif_receive_skb will happen in
> interrupt context and it should catch the issue as well if
> filename_lookup still gets called in interrupt context.
>
> >
> > Bests
> >
> > On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
> >> Thank you very much!
> >>
> >> Bests
> >>
> >> El mié., 14 de ago. de 2019 00:50, Yonghong Song <yhs@fb.com> escribió:
> >>
> >>>
> >>>
> >>> On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
> >>>> On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> >>>>>
> >>>>>
> >>>>> On 8/13/19 11:47 AM, Carlos Neira wrote:
> >>>>>> From: Carlos <cneirabustos@gmail.com>
> >>>>>>
> >>>>>> New bpf helper bpf_get_current_pidns_info.
> >>>>>> This helper obtains the active namespace from current and returns
> >>>>>> pid, tgid, device and namespace id as seen from that namespace,
> >>>>>> allowing to instrument a process inside a container.
> >>>>>>
> >>>>>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> >>>>>> ---
> >>>>>> fs/internal.h | 2 --
> >>>>>> fs/namei.c | 1 -
> >>>>>> include/linux/bpf.h | 1 +
> >>>>>> include/linux/namei.h | 4 +++
> >>>>>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> >>>>>> kernel/bpf/core.c | 1 +
> >>>>>> kernel/bpf/helpers.c | 64
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>> kernel/trace/bpf_trace.c | 2 ++
> >>>>>> 8 files changed, 102 insertions(+), 4 deletions(-)
> >>>>>>
> >>> [...]
> >>>>>>
> >>>>>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *,
> >>> pidns_info, u32,
> >>>>>> + size)
> >>>>>> +{
> >>>>>> + const char *pidns_path = "/proc/self/ns/pid";
> >>>>>> + struct pid_namespace *pidns = NULL;
> >>>>>> + struct filename *tmp = NULL;
> >>>>>> + struct inode *inode;
> >>>>>> + struct path kp;
> >>>>>> + pid_t tgid = 0;
> >>>>>> + pid_t pid = 0;
> >>>>>> + int ret;
> >>>>>> + int len;
> >>>>>
> >>>>
> >>>> Thank you very much for catching this!.
> >>>> Could you share how to replicate this bug?.
> >>>
> >>> The config is attached. just run trace_ns_info and you
> >>> can reproduce the issue.
> >>>
> >>>>
> >>>>> I am running your sample program and get the following kernel bug:
> >>>>>
> >>>>> ...
> >>>>> [ 26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
> >>>>> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> >>>>> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W 5.3.0-rc1+ #280
> >>>>> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
> >>>>> [ 26.419393] Call Trace:
> >>>>> [ 26.419697] <IRQ>
> >>>>> [ 26.419960] dump_stack+0x46/0x5b
> >>>>> [ 26.420434] ___might_sleep+0xe4/0x110
> >>>>> [ 26.420894] dput+0x2a/0x200
> >>>>> [ 26.421265] walk_component+0x10c/0x280
> >>>>> [ 26.421773] link_path_walk+0x327/0x560
> >>>>> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
> >>>>> [ 26.422848] ? path_init+0x232/0x330
> >>>>> [ 26.423364] path_lookupat+0x88/0x200
> >>>>> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
> >>>>> [ 26.424521] filename_lookup+0xaf/0x190
> >>>>> [ 26.425031] ? simple_attr_release+0x20/0x20
> >>>>> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
> >>>>> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
> >>>>> [ 26.426779] trace_call_bpf+0xb5/0x160
> >>>>> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> [ 26.428496] kprobe_perf_func+0x4d/0x280
> >>>>> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
> >>>>> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
> >>>>> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> >>>>> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
> >>>>> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
> >>>>> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
> >>>>> [ 26.433060] 0xffffffffc03180bf
> >>>>> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> ...
> >>>>>
> >>>>> To avoid running in an arbitrary task context (e.g., the idle
> >>>>> task), which may introduce sleeping issues, the following check
> >>>>> is probably appropriate:
> >>>>>
> >>>>>     if (in_nmi() || in_softirq())
> >>>>>         return -EPERM;
> >>>>>
> >>>>> Anyway, in nmi or softirq context, the namespace and pid/tgid
> >>>>> we get may be only accidentally associated with the context the
> >>>>> bpf program is running in, and could belong to a different
> >>>>> context. So such info is not reliable anyway.
> >>>>>
> >>>>>> +
> >>>>>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> >>>>>> + return -EINVAL;
> >>>>>> + pidns = task_active_pid_ns(current);
> >>> [...]
> >>>
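
For readers following the quoted suggestion, here is a minimal userspace
sketch of where the context check would sit relative to the existing size
check. It is not the kernel code: in the kernel, in_nmi() and in_softirq()
take no arguments and read the preempt count, so the struct exec_ctx flags,
struct pidns_info_sketch, and pidns_helper_precheck() below are all
hypothetical stand-ins, and the field layout is assumed for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the execution context. The kernel's
 * in_nmi()/in_softirq() inspect the preempt count instead. */
struct exec_ctx {
	bool in_nmi;
	bool in_softirq;
};

/* Hypothetical mirror of the helper's output struct; the real layout
 * is whatever the patch defines in include/uapi/linux/bpf.h. */
struct pidns_info_sketch {
	unsigned int dev;
	unsigned int nsid;
	unsigned int tgid;
	unsigned int pid;
};

/* Reject contexts where "current" is not meaningful before doing any
 * other work, then validate the caller-supplied size as the patch does. */
static int pidns_helper_precheck(const struct exec_ctx *ctx, size_t size)
{
	if (ctx->in_nmi || ctx->in_softirq)
		return -EPERM;
	if (size != sizeof(struct pidns_info_sketch))
		return -EINVAL;
	return 0;
}

/* Small usage example covering the three outcomes. */
void precheck_examples(void)
{
	struct exec_ctx task = { false, false };
	struct exec_ctx softirq = { false, true };

	assert(pidns_helper_precheck(&task, sizeof(struct pidns_info_sketch)) == 0);
	assert(pidns_helper_precheck(&softirq, sizeof(struct pidns_info_sketch)) == -EPERM);
	assert(pidns_helper_precheck(&task, 1) == -EINVAL);
}
```

Putting the context check first means the helper returns before touching
current at all when it is invoked from the receive path shown in the trace.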
^ permalink raw reply