Netdev List
 help / color / mirror / Atom feed
* RE: [RFC] vhost: introduce mdev based hardware vhost backend
From: Liang, Cunming @ 2018-04-20  3:50 UTC (permalink / raw)
  To: Bie, Tiwei, Michael S. Tsirkin
  Cc: Jason Wang, alex.williamson@redhat.com, ddutile@redhat.com,
	Duyck, Alexander H, virtio-dev@lists.oasis-open.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	Daly, Dan, Wang, Zhihong, Tan, Jianfeng, Wang, Xiao W,
	Tian, Kevin
In-Reply-To: <20180420032806.i3jy7xb7emgil6eu@debian>



> -----Original Message-----
> From: Bie, Tiwei
> Sent: Friday, April 20, 2018 11:28 AM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>; alex.williamson@redhat.com;
> ddutile@redhat.com; Duyck, Alexander H <alexander.h.duyck@intel.com>;
> virtio-dev@lists.oasis-open.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org; virtualization@lists.linux-foundation.org;
> netdev@vger.kernel.org; Daly, Dan <dan.daly@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Tan,
> Jianfeng <jianfeng.tan@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>;
> Tian, Kevin <kevin.tian@intel.com>
> Subject: Re: [RFC] vhost: introduce mdev based hardware vhost backend
> 
> On Thu, Apr 19, 2018 at 09:40:23PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
> > > > > > One problem is that, different virtio ring compatible devices
> > > > > > may have different device interfaces. That is to say, we will
> > > > > > need different drivers in QEMU. It could be troublesome. And
> > > > > > that's what this patch trying to fix. The idea behind this
> > > > > > patch is very simple: mdev is a standard way to emulate device
> > > > > > in kernel.
> > > > > So you just move the abstraction layer from qemu to kernel, and
> > > > > you still need different drivers in kernel for different device
> > > > > interfaces of accelerators. This looks even more complex than
> > > > > leaving it in qemu. As you said, another idea is to implement
> > > > > userspace vhost backend for accelerators which seems easier and
> > > > > could co-work with other parts of qemu without inventing new type of
> messages.
> > > > I'm not quite sure. Do you think it's acceptable to add various
> > > > vendor specific hardware drivers in QEMU?
> > > >
> > >
> > > I don't object but we need to figure out the advantages of doing it
> > > in qemu too.
> > >
> > > Thanks
> >
> > To be frank kernel is exactly where device drivers belong.  DPDK did
> > move them to userspace but that's merely a requirement for data path.
> > *If* you can have them in kernel that is best:
> > - update kernel and there's no need to rebuild userspace
> > - apps can be written in any language no need to maintain multiple
> >   libraries or add wrappers
> > - security concerns are much smaller (ok people are trying to
> >   raise the bar with IOMMUs and such, but it's already pretty
> >   good even without)
> >
> > The biggest issue is that you let userspace poke at the device which
> > is also allowed by the IOMMU to poke at kernel memory (needed for
> > kernel driver to work).
> 
> I think the device won't and shouldn't be allowed to poke at kernel memory. Its
> kernel driver needs some kernel memory to work. But the device doesn't have
> the access to them. Instead, the device only has the access to:
> 
> (1) the entire memory of the VM (if vIOMMU isn't used) or
> (2) the memory belongs to the guest virtio device (if
>     vIOMMU is being used).
> 
> Below is the reason:
> 
> For the first case, we should program the IOMMU for the hardware device based
> on the info in the memory table which is the entire memory of the VM.
> 
> For the second case, we should program the IOMMU for the hardware device
> based on the info in the shadow page table of the vIOMMU.
> 
> So the memory can be accessed by the device is limited, it should be safe
> especially for the second case.
> 
> My concern is that, in this RFC, we don't program the IOMMU for the mdev
> device in the userspace via the VFIO API directly. Instead, we pass the memory
> table to the kernel driver via the mdev device (BAR0) and ask the driver to do the
> IOMMU programming. Someone may don't like it. The main reason why we don't
> program IOMMU via VFIO API in userspace directly is that, currently IOMMU
> drivers don't support mdev bus.
> 
> >
> > Yes, maybe if device is not buggy it's all fine, but it's better if we
> > do not have to trust the device otherwise the security picture becomes
> > more murky.
> >
> > I suggested attaching a PASID to (some) queues - see my old post
> > "using PASIDs to enable a safe variant of direct ring access".
> 
Ideally we can have a device binding with normal driver in host, meanwhile support to allocate a few queues attaching with PASID on-demand. By vhost mdev transport channel, the data path ability of queues(as a device) can expose to qemu vhost adaptor as a vDPA instance. Then we can avoid VF number limitation, providing vhost data path acceleration in a small granularity.

> It's pretty cool. We also have some similar ideas.
> Cunming will talk more about this.
> 
> Best regards,
> Tiwei Bie
> 
> >
> > Then using IOMMU with VFIO to limit access through queue to corrent
> > ranges of memory.
> >
> >
> > --
> > MST

^ permalink raw reply

* Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Jason Wang @ 2018-04-20  3:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, kvm, netdev, linux-kernel,
	virtualization, xiao.w.wang, ddutile, jianfeng.tan, zhihong.wang
In-Reply-To: <20180419212911-mutt-send-email-mst@kernel.org>



On 2018年04月20日 02:40, Michael S. Tsirkin wrote:
> On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
>>>>> One problem is that, different virtio ring compatible devices
>>>>> may have different device interfaces. That is to say, we will
>>>>> need different drivers in QEMU. It could be troublesome. And
>>>>> that's what this patch trying to fix. The idea behind this
>>>>> patch is very simple: mdev is a standard way to emulate device
>>>>> in kernel.
>>>> So you just move the abstraction layer from qemu to kernel, and you still
>>>> need different drivers in kernel for different device interfaces of
>>>> accelerators. This looks even more complex than leaving it in qemu. As you
>>>> said, another idea is to implement userspace vhost backend for accelerators
>>>> which seems easier and could co-work with other parts of qemu without
>>>> inventing new type of messages.
>>> I'm not quite sure. Do you think it's acceptable to
>>> add various vendor specific hardware drivers in QEMU?
>>>
>> I don't object but we need to figure out the advantages of doing it in qemu
>> too.
>>
>> Thanks
> To be frank kernel is exactly where device drivers belong.  DPDK did
> move them to userspace but that's merely a requirement for data path.
> *If* you can have them in kernel that is best:
> - update kernel and there's no need to rebuild userspace

Well, you still need to rebuild userspace since a new vhost backend is 
required which relies vhost protocol through mdev API. And I believe 
upgrading userspace package is considered to be more lightweight than 
upgrading kernel. With mdev, we're likely to repeat the story of vhost 
API, dealing with features/versions and inventing new API endless for 
new features. And you will still need to rebuild the userspace.

> - apps can be written in any language no need to maintain multiple
>    libraries or add wrappers

This is not a big issue consider It's not a generic network driver but a 
mdev driver, the only possible user is VM.

> - security concerns are much smaller (ok people are trying to
>    raise the bar with IOMMUs and such, but it's already pretty
>    good even without)

Well, I think not, kernel bugs are much more serious than userspace 
ones. And I beg the kernel driver itself won't be small.

>
> The biggest issue is that you let userspace poke at the
> device which is also allowed by the IOMMU to poke at
> kernel memory (needed for kernel driver to work).

I don't quite get. The userspace driver could be built on top of VFIO 
for sure. So kernel memory were perfectly isolated in this case.

>
> Yes, maybe if device is not buggy it's all fine, but
> it's better if we do not have to trust the device
> otherwise the security picture becomes more murky.
>
> I suggested attaching a PASID to (some) queues - see my old post "using
> PASIDs to enable a safe variant of direct ring access".
>
> Then using IOMMU with VFIO to limit access through queue to corrent
> ranges of memory.

Well userspace driver could benefit from this too. And we can even go 
further by using nested IO page tables to share IOVA address space 
between devices and a VM.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* [Patch net] llc: fix NULL pointer deref for SOCK_ZAPPED
From: Cong Wang @ 2018-04-20  4:54 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang

For SOCK_ZAPPED socket, we don't need to care about llc->sap,
so we should just skip these refcount functions in this case.

Fixes: f7e43672683b ("llc: hold llc_sap before release_sock()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/llc/af_llc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 6d29b2b94e84..cb80ebb38311 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -189,7 +189,6 @@ static int llc_ui_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct llc_sock *llc;
-	struct llc_sap *sap;
 
 	if (unlikely(sk == NULL))
 		goto out;
@@ -200,15 +199,19 @@ static int llc_ui_release(struct socket *sock)
 		llc->laddr.lsap, llc->daddr.lsap);
 	if (!llc_send_disc(sk))
 		llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo);
-	sap = llc->sap;
-	/* Hold this for release_sock(), so that llc_backlog_rcv() could still
-	 * use it.
-	 */
-	llc_sap_hold(sap);
-	if (!sock_flag(sk, SOCK_ZAPPED))
+	if (!sock_flag(sk, SOCK_ZAPPED)) {
+		struct llc_sap *sap = llc->sap;
+
+		/* Hold this for release_sock(), so that llc_backlog_rcv()
+		 * could still use it.
+		 */
+		llc_sap_hold(sap);
 		llc_sap_remove_socket(llc->sap, sk);
-	release_sock(sk);
-	llc_sap_put(sap);
+		release_sock(sk);
+		llc_sap_put(sap);
+	} else {
+		release_sock(sk);
+	}
 	if (llc->dev)
 		dev_put(llc->dev);
 	sock_put(sk);
-- 
2.13.0

^ permalink raw reply related

* Re: [PATCH 1/1] IB/rxe: avoid double kfree_skb
From: Yanjun Zhu @ 2018-04-20  5:46 UTC (permalink / raw)
  To: Doug Ledford, monis, jgg, linux-rdma; +Cc: netdev
In-Reply-To: <1524190794.11756.18.camel@redhat.com>



On 2018/4/20 10:19, Doug Ledford wrote:
> On Thu, 2018-04-19 at 10:01 -0400, Zhu Yanjun wrote:
>> When skb is dropped by iptables rules, the skb is freed at the same time
>> -EPERM is returned. So in softroce, it is not necessary to free skb again.
>> Or else, crash will occur.
>>
>> The steps to reproduce:
>>
>>       server                       client
>>      ---------                    ---------
>>      |1.1.1.1|<----rxe-channel--->|1.1.1.2|
>>      ---------                    ---------
>>
>> On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
>> On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512
>>
>> The kernel configs CONFIG_DEBUG_KMEMLEAK and
>> CONFIG_DEBUG_OBJECTS are enabled on both server and client.
>>
>> When rping runs, run the following command in server:
>>
>> iptables -I OUTPUT -p udp  --dport 4791 -j DROP
>>
>> Without this patch, crash will occur.
>>
>> CC: Srinivas Eeda <srinivas.eeda@oracle.com>
>> CC: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> I have no reason to doubt your analysis, but if there are a bunch of
> error paths for net_xmit and they all return with your skb still being
> valid and holding a reference, and then one oddball that returns with
> your skb already gone, that just sounds like a mistake waiting to happen
> (not to mention a bajillion special cases sprinkled everywhere to deal
> with this apparent inconsistency).
>
> Can we get a netdev@ confirmation on this being the right solution?
Yes. I agree with you.
After iptables rule "iptables -I OUTPUT -p udp  --dport 4791 -j DROP", 
the skb is freed in this function

/* Returns 1 if okfn() needs to be executed by the caller,
  * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock. */
int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
                  const struct nf_hook_entries *e, unsigned int s)
{
         unsigned int verdict;
         int ret;

         for (; s < e->num_hook_entries; s++) {
                 verdict = nf_hook_entry_hookfn(&e->hooks[s], skb, state);
                 switch (verdict & NF_VERDICT_MASK) {
                 case NF_ACCEPT:
                         break;
                 case NF_DROP:
kfree_skb(skb);                               <----here, skb is freed
                         ret = NF_DROP_GETERR(verdict);
                         if (ret == 0)
                                 ret = -EPERM;
                         return ret;
                 case NF_QUEUE:
                         ret = nf_queue(skb, state, e, s, verdict);
                         if (ret == 1)
                                 continue;
                         return ret;
                 default:
                         /* Implicit handling for NF_STOLEN, as well as 
any other
                          * non conventional verdicts.
                          */
                         return 0;
                 }
         }

         return 1;
}
EXPORT_SYMBOL(nf_hook_slow);

If I am wrong, please correct me.

And my test environment is still there, any solution can be verified in it.

Zhu Yanjun
>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_net.c  | 3 +++
>>   drivers/infiniband/sw/rxe/rxe_req.c  | 5 +++--
>>   drivers/infiniband/sw/rxe/rxe_resp.c | 9 ++++++---
>>   3 files changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>> index 9da6e37..2094434 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -511,6 +511,9 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
>>   
>>          if (unlikely(net_xmit_eval(err))) {
>>                  pr_debug("error sending packet: %d\n", err);
>> +               /* -EPERM means the skb is dropped and freed. */
>> +               if (err == -EPERM)
>> +                       return -EPERM;
>>                  return -EAGAIN;
>>          }
>>   
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index 7bdaf71..9d2efec 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -727,8 +727,9 @@ int rxe_requester(void *arg)
>>   
>>                  rollback_state(wqe, qp, &rollback_wqe, rollback_psn);
>>   
>> -               if (ret == -EAGAIN) {
>> -                       kfree_skb(skb);
>> +               if ((ret == -EAGAIN) || (ret == -EPERM)) {
>> +                       if (ret == -EAGAIN)
>> +                               kfree_skb(skb);
>>                          rxe_run_task(&qp->req.task, 1);
>>                          goto exit;
>>                  }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index a65c996..6bdf9b2 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -742,7 +742,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>          err = rxe_xmit_packet(rxe, qp, &ack_pkt, skb);
>>          if (err) {
>>                  pr_err("Failed sending RDMA reply.\n");
>> -               kfree_skb(skb);
>> +               if (err != -EPERM)
>> +                       kfree_skb(skb);
>>                  return RESPST_ERR_RNR;
>>          }
>>   
>> @@ -956,7 +957,8 @@ static int send_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>>          err = rxe_xmit_packet(rxe, qp, &ack_pkt, skb);
>>          if (err) {
>>                  pr_err_ratelimited("Failed sending ack\n");
>> -               kfree_skb(skb);
>> +               if (err != -EPERM)
>> +                       kfree_skb(skb);
>>          }
>>   
>>   err1:
>> @@ -1141,7 +1143,8 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
>>                          if (rc) {
>>                                  pr_err("Failed resending result. This flow is not handled - skb ignored\n");
>>                                  rxe_drop_ref(qp);
>> -                               kfree_skb(skb_copy);
>> +                               if (rc != -EPERM)
>> +                                       kfree_skb(skb_copy);
>>                                  rc = RESPST_CLEANUP;
>>                                  goto out;
>>                          }
>> -- 
>> 2.7.4
>>

^ permalink raw reply

* [PATCH v1 net-next] lan78xx: Add support to dump lan78xx registers
From: Raghuram Chary J @ 2018-04-20  6:13 UTC (permalink / raw)
  To: davem; +Cc: netdev, unglinuxdriver, woojung.huh, raghuramchary.jallipalli

In order to dump lan78xx family registers using ethtool, add
support at lan78xx driver level.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
---
v0->v1:
   * Remove one variable in the for loop.
---
 drivers/net/usb/lan78xx.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 0867f7275852..207a3e18c08f 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -278,6 +278,30 @@ struct lan78xx_statstage64 {
 	u64 eee_tx_lpi_time;
 };
 
+static u32 lan78xx_regs[] = {
+	ID_REV,
+	INT_STS,
+	HW_CFG,
+	PMT_CTL,
+	E2P_CMD,
+	E2P_DATA,
+	USB_STATUS,
+	VLAN_TYPE,
+	MAC_CR,
+	MAC_RX,
+	MAC_TX,
+	FLOW,
+	ERR_STS,
+	MII_ACC,
+	MII_DATA,
+	EEE_TX_LPI_REQ_DLY,
+	EEE_TW_TX_SYS,
+	EEE_TX_LPI_REM_DLY,
+	WUCSR
+};
+
+#define PHY_REG_SIZE (32 * sizeof(u32))
+
 struct lan78xx_net;
 
 struct lan78xx_priv {
@@ -1605,6 +1629,34 @@ static int lan78xx_set_pause(struct net_device *net,
 	return ret;
 }
 
+static int lan78xx_get_regs_len(struct net_device *netdev)
+{
+	if (!netdev->phydev)
+		return (sizeof(lan78xx_regs));
+	else
+		return (sizeof(lan78xx_regs) + PHY_REG_SIZE);
+}
+
+static void
+lan78xx_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
+		 void *buf)
+{
+	u32 *data = buf;
+	int i, j;
+	struct lan78xx_net *dev = netdev_priv(netdev);
+
+	/* Read Device/MAC registers */
+	for (i = 0; i < (sizeof(lan78xx_regs) / sizeof(u32)); i++)
+		lan78xx_read_reg(dev, lan78xx_regs[i], &data[i]);
+
+	if (!netdev->phydev)
+		return;
+
+	/* Read PHY registers */
+	for (j = 0; j < 32; i++, j++)
+		data[i] = phy_read(netdev->phydev, j);
+}
+
 static const struct ethtool_ops lan78xx_ethtool_ops = {
 	.get_link	= lan78xx_get_link,
 	.nway_reset	= phy_ethtool_nway_reset,
@@ -1625,6 +1677,8 @@ static const struct ethtool_ops lan78xx_ethtool_ops = {
 	.set_pauseparam	= lan78xx_set_pause,
 	.get_link_ksettings = lan78xx_get_link_ksettings,
 	.set_link_ksettings = lan78xx_set_link_ksettings,
+	.get_regs_len	= lan78xx_get_regs_len,
+	.get_regs	= lan78xx_get_regs,
 };
 
 static int lan78xx_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
-- 
2.16.2

^ permalink raw reply related

* Re: [PATCH] net: phy: TLK10X initial driver submission
From: Måns Andersson @ 2018-04-20  6:58 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rob Herring, Mark Rutland, Florian Fainelli, netdev, devicetree,
	linux-kernel
In-Reply-To: <20180419120902.GB17888@lunn.ch>

On Thu, Apr 19, 2018 at 02:09:02PM +0200, Andrew Lunn wrote:
> On Thu, Apr 19, 2018 at 10:28:16AM +0200, Måns Andersson wrote:
> > From: Mans Andersson <mans.andersson@nibe.se>
> > 
> > Add suport for the TI TLK105 and TLK106 10/100Mbit ethernet phys.
> > 
> > In addition the TLK10X needs to be removed from DP83848 driver as the
> > power back off support is added here for this device.
> > 
> > Datasheet:
> > http://www.ti.com/lit/gpn/tlk106
> > ---
> >  .../devicetree/bindings/net/ti,tlk10x.txt          |  27 +++
> >  drivers/net/phy/Kconfig                            |   5 +
> >  drivers/net/phy/Makefile                           |   1 +
> >  drivers/net/phy/dp83848.c                          |   3 -
> >  drivers/net/phy/tlk10x.c                           | 209 +++++++++++++++++++++
> >  5 files changed, 242 insertions(+), 3 deletions(-)
> >  create mode 100644 Documentation/devicetree/bindings/net/ti,tlk10x.txt
> >  create mode 100644 drivers/net/phy/tlk10x.c
> > 
> > diff --git a/Documentation/devicetree/bindings/net/ti,tlk10x.txt b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
> > new file mode 100644
> > index 0000000..371d0d7
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
> > @@ -0,0 +1,27 @@
> > +* Texas Instruments - TLK105 / TLK106 ethernet PHYs
> > +
> > +Required properties:
> > +	- reg - The ID number for the phy, usually a small integer
> > +
> > +Optional properties:
> > +	- ti,power-back-off - Power Back Off Level
> > +		Please refer to data sheet chapter 8.6 and TI Application
> > +		Note SLLA3228
> > +		0 - Normal Operation
> > +		1 - Level 1 (up to 140m cable between TLK link partners)
> > +		2 - Level 2 (up to 100m cable between TLK link partners)
> > +		3 - Level 3 (up to 80m cable between TLK link partners)
> 
> Hi Måns
> 
> Device tree is all about board properties. In most cases, power back
> off is not a board properties, since it depends on the cable length
> and the peer board. If however, your board has two PHYs back to back,
> say to connect to an Ethernet switch, that would be a valid board
> property.
> 
> How are you using this?
> 
> I know of others who would like such a configuration. Marvell PHYs can
> do something similar. I've always suggested adding a PHY tunable. Pass
> the cable length in meters and let the PHY driver pick the nearest it
> can do, rounding up. The Marvell PHYs also support measuring the cable
> length as part of the cable diagnostics. So it would be good to
> reserve a configuration value to mean 'auto' - measure the cable and
> then pick the best power back off. Quickly scanning the data sheet, i
> see that this PHY also has the ability to measure the cable length.
>

Hi Andrew,

Thanks for your comments, highly appreciated!

I'm using this to lock down the PHY to the IEEE 802.3 100m standard cable
length, as opposed to the extended 150m which the PHY supports in its
default operation (see pg. 2 of the data sheet). The reason why I need this
is that the board has too high EMC emissions when running with the default
operation. For me the setting is therefore used as a board property, i.e.
it's not something that will be changed during operation.
 
> > +static int tlk10x_read(struct phy_device *phydev, int reg)
> > +{
> > +	if (reg & ~0x1f) {
> > +		/* Extended register */
> > +		phy_write(phydev, TLK10X_REGCR, 0x001F);
> > +		phy_write(phydev, TLK10X_ADDAR, reg);
> > +		phy_write(phydev, TLK10X_REGCR, 0x401F);
> > +		reg = TLK10X_ADDAR;
> > +	}
> > +
> > +	return phy_read(phydev, reg);
> > +}
> > +
> > +static int tlk10x_write(struct phy_device *phydev, int reg, int val)
> > +{
> > +	if (reg & ~0x1f) {
> > +		/* Extended register */
> > +		phy_write(phydev, TLK10X_REGCR, 0x001F);
> > +		phy_write(phydev, TLK10X_ADDAR, reg);
> > +		phy_write(phydev, TLK10X_REGCR, 0x401F);
> > +		reg = TLK10X_ADDAR;
> > +	}
> > +
> > +	return phy_write(phydev, reg, val);
> > +}
> 
> This looks to be phy_read_mmd() and phy_write_mmd(). If so, please use
> them, they get the locking correct.
>

Yes, that's correct, will fix that.

> 
> > +#ifdef CONFIG_OF_MDIO
> > +static int tlk10x_of_init(struct phy_device *phydev)
> > +{
> > +	struct tlk10x_private *tlk10x = phydev->priv;
> > +	struct device *dev = &phydev->mdio.dev;
> > +	struct device_node *of_node = dev->of_node;
> > +	int ret;
> > +
> > +	if (!of_node)
> > +		return 0;
> > +
> > +	ret = of_property_read_u32(of_node, "ti,power-back-off",
> > +				   &tlk10x->pwrbo_level);
> > +	if (ret) {
> > +		dev_err(dev, "missing ti,power-back-off property");
> > +		tlk10x->pwrbo_level = 0;
> > +	}
> 
> If we decide to accept this, you should do range checking, and return
> -EINVAL if the value is out of range.

Ok, will fix that.

> 
> > +static int tlk10x_config_init(struct phy_device *phydev)
> > +{
> > +	int ret, reg;
> > +	struct tlk10x_private *tlk10x;
> > +
> > +	ret = genphy_config_init(phydev);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (!phydev->priv) {
> > +		tlk10x = devm_kzalloc(&phydev->mdio.dev, sizeof(*tlk10x),
> > +				      GFP_KERNEL);
> > +		if (!tlk10x)
> > +			return -ENOMEM;
> > +
> > +		phydev->priv = tlk10x;
> > +		ret = tlk10x_of_init(phydev);
> > +		if (ret)
> > +			return ret;
> > +	} else {
> > +		tlk10x = (struct tlk10x_private *)phydev->priv;
> > +	}
> 
> This allocation should be done in .probe

Ok, will fix that.

> 
> > +
> > +	// Power back off
> > +	if (tlk10x->pwrbo_level < 0 || tlk10x->pwrbo_level > 3)
> > +		tlk10x->pwrbo_level = 0;
> > +	reg = tlk10x_read(phydev, TLK10X_PWRBOCR);
> > +	reg = ((reg & ~TLK10X_PWRBOCR_MASK)
> > +		| (tlk10x->pwrbo_level << 6));
> > +	ret = tlk10x_write(phydev, TLK10X_PWRBOCR, reg);
> > +	if (ret < 0) {
> > +		dev_err(&phydev->mdio.dev,
> > +			"unable to set power back-off (err=%d)\n", ret);
> > +		return ret;
> > +	}
> > +	dev_info(&phydev->mdio.dev, "power back-off set to level %d\n",
> > +		 tlk10x->pwrbo_level);
> > +
> > +	return 0;
> > +}
> 
>   Andrew

// Måns

^ permalink raw reply

* socketoption IPV6_MULTICAST_LOOP Bug?
From: Andre Naujoks @ 2018-04-20  7:02 UTC (permalink / raw)
  To: netdev

Hi all.

It seems the documentation (i.e. 'man ipv6') and the actual
behavior of the kernel diverge somehow in regard to what
IPV6_MULTICAST_LOOP does.

The manpage says:
---
IPV6_MULTICAST_LOOP
       Control whether the socket sees multicast packets that it has send itself.  Argument is a pointer to boolean.
---

However, the actual behavior on my system is, that the whole system
no longer receives multicast packets sent from that socket. As opposed
to only the socket itself no longer seeing its own packets.

Is this a bug, or is the documentation wrong? I sincerely hope this is a bug,
since everything else does not make sense at all in a multiuser system.

Regards
Andre

^ permalink raw reply

* Re: [PATCH] net: phy: TLK10X initial driver submission
From: Måns Andersson @ 2018-04-20  7:07 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Rob Herring, Mark Rutland, Andrew Lunn, Florian Fainelli,
	Network Development, devicetree, linux-kernel
In-Reply-To: <CANiq72=oPqsCkQ_aaO2cQgMxa_VLzL1yPC7APri6FL5jpxU15g@mail.gmail.com>

On Thu, Apr 19, 2018 at 02:24:27PM +0200, Miguel Ojeda wrote:
> On Thu, Apr 19, 2018 at 10:28 AM, Måns Andersson <mans.andersson@nibe.se> wrote:
> > From: Mans Andersson <mans.andersson@nibe.se>
> >
> > Add suport for the TI TLK105 and TLK106 10/100Mbit ethernet phys.
> >
> 
> Hi Mans,
> 
> Some quick notes.
> 
> > In addition the TLK10X needs to be removed from DP83848 driver as the
> > power back off support is added here for this device.
> >
> > Datasheet:
> > http://www.ti.com/lit/gpn/tlk106
> 
> Missing signature.

Hi Miguel,

My bad, will fix.

> 
> > ---
> >  .../devicetree/bindings/net/ti,tlk10x.txt          |  27 +++
> >  drivers/net/phy/Kconfig                            |   5 +
> >  drivers/net/phy/Makefile                           |   1 +
> >  drivers/net/phy/dp83848.c                          |   3 -
> >  drivers/net/phy/tlk10x.c                           | 209 +++++++++++++++++++++
> >  5 files changed, 242 insertions(+), 3 deletions(-)
> >  create mode 100644 Documentation/devicetree/bindings/net/ti,tlk10x.txt
> >  create mode 100644 drivers/net/phy/tlk10x.c
> >
> > diff --git a/Documentation/devicetree/bindings/net/ti,tlk10x.txt b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
> > new file mode 100644
> > index 0000000..371d0d7
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/ti,tlk10x.txt
> > @@ -0,0 +1,27 @@
> > +* Texas Instruments - TLK105 / TLK106 ethernet PHYs
> > +
> > +Required properties:
> > +       - reg - The ID number for the phy, usually a small integer
> > +
> > +Optional properties:
> > +       - ti,power-back-off - Power Back Off Level
> > +               Please refer to data sheet chapter 8.6 and TI Application
> > +               Note SLLA3228
> > +               0 - Normal Operation
> > +               1 - Level 1 (up to 140m cable between TLK link partners)
> > +               2 - Level 2 (up to 100m cable between TLK link partners)
> > +               3 - Level 3 (up to 80m cable between TLK link partners)
> > +
> > +Default child nodes are standard Ethernet PHY device
> > +nodes as described in Documentation/devicetree/bindings/net/phy.txt
> > +
> > +Example:
> > +
> > +       ethernet-phy@0 {
> > +               reg = <0>;
> > +               ti,power-back-off = <2>;
> > +       };
> > +
> > +Datasheets and documentation can be found at:
> > +http://www.ti.com/lit/gpn/tlk106
> > +http://www.ti.com/lit/an/slla328/slla328.pdf
> > diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> > index bdfbabb..c980240 100644
> > --- a/drivers/net/phy/Kconfig
> > +++ b/drivers/net/phy/Kconfig
> > @@ -295,6 +295,11 @@ config DP83867_PHY
> >         ---help---
> >           Currently supports the DP83867 PHY.
> >
> > +config TLK10X_PHY
> > +       tristate "Texas Instruments TLK10x PHY"
> > +       ---help---
> > +         Supports the TLK105 and TLK106 PHYs.
> > +
> >  config FIXED_PHY
> >         tristate "MDIO Bus/PHY emulation with fixed speed/link PHYs"
> >         depends on PHYLIB
> > diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> > index 01acbcb..37e4e02 100644
> > --- a/drivers/net/phy/Makefile
> > +++ b/drivers/net/phy/Makefile
> > @@ -79,5 +79,6 @@ obj-$(CONFIG_ROCKCHIP_PHY)    += rockchip.o
> >  obj-$(CONFIG_SMSC_PHY)         += smsc.o
> >  obj-$(CONFIG_STE10XP)          += ste10Xp.o
> >  obj-$(CONFIG_TERANETICS_PHY)   += teranetics.o
> > +obj-$(CONFIG_TLK10X_PHY)       += tlk10x.o
> >  obj-$(CONFIG_VITESSE_PHY)      += vitesse.o
> >  obj-$(CONFIG_XILINX_GMII2RGMII) += xilinx_gmii2rgmii.o
> > diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
> > index cd09c3a..435f401 100644
> > --- a/drivers/net/phy/dp83848.c
> > +++ b/drivers/net/phy/dp83848.c
> > @@ -19,7 +19,6 @@
> >  #define TI_DP83848C_PHY_ID             0x20005ca0
> >  #define TI_DP83620_PHY_ID              0x20005ce0
> >  #define NS_DP83848C_PHY_ID             0x20005c90
> > -#define TLK10X_PHY_ID                  0x2000a210
> >
> >  /* Registers */
> >  #define DP83848_MICR                   0x11 /* MII Interrupt Control Register */
> > @@ -78,7 +77,6 @@ static struct mdio_device_id __maybe_unused dp83848_tbl[] = {
> >         { TI_DP83848C_PHY_ID, 0xfffffff0 },
> >         { NS_DP83848C_PHY_ID, 0xfffffff0 },
> >         { TI_DP83620_PHY_ID, 0xfffffff0 },
> > -       { TLK10X_PHY_ID, 0xfffffff0 },
> >         { }
> >  };
> >  MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
> > @@ -105,7 +103,6 @@ static struct phy_driver dp83848_driver[] = {
> >         DP83848_PHY_DRIVER(TI_DP83848C_PHY_ID, "TI DP83848C 10/100 Mbps PHY"),
> >         DP83848_PHY_DRIVER(NS_DP83848C_PHY_ID, "NS DP83848C 10/100 Mbps PHY"),
> >         DP83848_PHY_DRIVER(TI_DP83620_PHY_ID, "TI DP83620 10/100 Mbps PHY"),
> > -       DP83848_PHY_DRIVER(TLK10X_PHY_ID, "TI TLK10X 10/100 Mbps PHY"),
> >  };
> >  module_phy_driver(dp83848_driver);
> >
> > diff --git a/drivers/net/phy/tlk10x.c b/drivers/net/phy/tlk10x.c
> > new file mode 100644
> > index 0000000..1efc81e
> > --- /dev/null
> > +++ b/drivers/net/phy/tlk10x.c
> > @@ -0,0 +1,209 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/**
> > + * Driver for the Texas Instruments TLK105 / TLK106
> > + *
> > + * Copyright (C) 2018 NIBE Industrier AB - http://www.nibe.com
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> 
> Since you are using the SPDX id, please remove the license text (which
> is actually wrong: it seems you have cut the v2+ version and then
> removed the last sentence of the first paragraph? :-).
>

Ok.
 
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/phy.h>
> > +#include <linux/of.h>
> > +
> > +#define TLK10X_PHY_ID                  0x2000a210
> > +
> > +/* Registers */
> > +#define TLK10X_MICR                    0x11 /* MII Interrupt Control Reg */
> > +#define TLK10X_MISR                    0x12 /* MII Interrupt Status Reg */
> > +#define TLK10X_REGCR                   0x0d /* Register Control Register */
> > +#define TLK10X_ADDAR                   0x0e /* Data Register */
> > +#define TLK10X_PWRBOCR                 0xae /* Power Backoff Register */
> > +
> > +/* MICR Register Fields */
> > +#define TLK10X_MICR_INT_OE             BIT(0) /* Interrupt Output Enable */
> > +#define TLK10X_MICR_INTEN              BIT(1) /* Interrupt Enable */
> > +
> > +/* MISR Register Fields */
> > +#define TLK10X_MISR_RHF_INT_EN         BIT(0) /* Receive Error Counter */
> > +#define TLK10X_MISR_FHF_INT_EN         BIT(1) /* False Carrier Counter */
> > +#define TLK10X_MISR_ANC_INT_EN         BIT(2) /* Auto-negotiation complete */
> > +#define TLK10X_MISR_DUP_INT_EN         BIT(3) /* Duplex Status */
> > +#define TLK10X_MISR_SPD_INT_EN         BIT(4) /* Speed status */
> > +#define TLK10X_MISR_LINK_INT_EN                BIT(5) /* Link status */
> > +#define TLK10X_MISR_ED_INT_EN          BIT(6) /* Energy detect */
> > +#define TLK10X_MISR_LQM_INT_EN         BIT(7) /* Link Quality Monitor */
> > +
> > +/* PWRBOCR Register Fields */
> > +#define TLK10X_PWRBOCR_MASK            0xe0 /* Power Backoff Mask */
> > +
> > +#define TLK10X_INT_EN_MASK             \
> > +       (TLK10X_MISR_ANC_INT_EN |       \
> > +        TLK10X_MISR_DUP_INT_EN |       \
> > +        TLK10X_MISR_SPD_INT_EN |       \
> > +        TLK10X_MISR_LINK_INT_EN)
> > +
> > +struct tlk10x_private {
> > +       int pwrbo_level;
> > +};
> > +
> > +static int tlk10x_read(struct phy_device *phydev, int reg)
> > +{
> > +       if (reg & ~0x1f) {
> 
> 0x1f or ~0x1f should probably have a #define name.
> 

That's true, will fix that.

> > +               /* Extended register */
> > +               phy_write(phydev, TLK10X_REGCR, 0x001F);
> > +               phy_write(phydev, TLK10X_ADDAR, reg);
> > +               phy_write(phydev, TLK10X_REGCR, 0x401F);
> > +               reg = TLK10X_ADDAR;
> > +       }
> > +
> > +       return phy_read(phydev, reg);
> > +}
> > +
> > +static int tlk10x_write(struct phy_device *phydev, int reg, int val)
> > +{
> > +       if (reg & ~0x1f) {
> 
> Ditto.
> 

Ok.

> > +               /* Extended register */
> > +               phy_write(phydev, TLK10X_REGCR, 0x001F);
> > +               phy_write(phydev, TLK10X_ADDAR, reg);
> > +               phy_write(phydev, TLK10X_REGCR, 0x401F);
> > +               reg = TLK10X_ADDAR;
> > +       }
> > +
> > +       return phy_write(phydev, reg, val);
> > +}
> > +
> > +#ifdef CONFIG_OF_MDIO
> 
> Maybe you want the #ifdef inside.
> 

What's the preferred way by the kernel maintainers? I've seen both solutions
used but figured this was the most common.

> > +static int tlk10x_of_init(struct phy_device *phydev)
> > +{
> > +       struct tlk10x_private *tlk10x = phydev->priv;
> > +       struct device *dev = &phydev->mdio.dev;
> > +       struct device_node *of_node = dev->of_node;
> > +       int ret;
> > +
> > +       if (!of_node)
> > +               return 0;
> > +
> > +       ret = of_property_read_u32(of_node, "ti,power-back-off",
> > +                                  &tlk10x->pwrbo_level);
> > +       if (ret) {
> > +               dev_err(dev, "missing ti,power-back-off property");
> > +               tlk10x->pwrbo_level = 0;
> > +       }
> > +
> > +       return 0;
> > +}
> > +#else
> > +static int tlk10x_of_init(struct phy_device *phydev)
> > +{
> > +       return 0;
> > +}
> > +#endif /* CONFIG_OF_MDIO */
> > +
> > +static int tlk10x_config_init(struct phy_device *phydev)
> > +{
> > +       int ret, reg;
> > +       struct tlk10x_private *tlk10x;
> > +
> > +       ret = genphy_config_init(phydev);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       if (!phydev->priv) {
> > +               tlk10x = devm_kzalloc(&phydev->mdio.dev, sizeof(*tlk10x),
> > +                                     GFP_KERNEL);
> > +               if (!tlk10x)
> > +                       return -ENOMEM;
> > +
> > +               phydev->priv = tlk10x;
> > +               ret = tlk10x_of_init(phydev);
> > +               if (ret)
> > +                       return ret;
> > +       } else {
> > +               tlk10x = (struct tlk10x_private *)phydev->priv;
> > +       }
> > +
> > +       // Power back off
> > +       if (tlk10x->pwrbo_level < 0 || tlk10x->pwrbo_level > 3)
> > +               tlk10x->pwrbo_level = 0;
> > +       reg = tlk10x_read(phydev, TLK10X_PWRBOCR);
> > +       reg = ((reg & ~TLK10X_PWRBOCR_MASK)
> > +               | (tlk10x->pwrbo_level << 6));
> 
> Maybe the 6 should have a name, or maybe a bigger macro for this would clarify.
> 

Will look into this.

> > +       ret = tlk10x_write(phydev, TLK10X_PWRBOCR, reg);
> > +       if (ret < 0) {
> > +               dev_err(&phydev->mdio.dev,
> > +                       "unable to set power back-off (err=%d)\n", ret);
> > +               return ret;
> > +       }
> > +       dev_info(&phydev->mdio.dev, "power back-off set to level %d\n",
> > +                tlk10x->pwrbo_level);
> > +
> > +       return 0;
> > +}
> > +
> > +static int tlk10x_ack_interrupt(struct phy_device *phydev)
> > +{
> > +       int err = tlk10x_read(phydev, TLK10X_MISR);
> 
> Following the style of the rest of the file, shouldn't this be:
> 
>     if (err < 0)
>         return err;
> 
>     return 0;
> 
> ?
> 

True, will fix that.

> > +
> > +       return err < 0 ? err : 0;
> > +}
> > +
> > +static int tlk10x_config_intr(struct phy_device *phydev)
> > +{
> > +       int control, ret;
> > +
> > +       control = tlk10x_read(phydev, TLK10X_MICR);
> > +       if (control < 0)
> > +               return control;
> > +
> > +       if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
> > +               control |= TLK10X_MICR_INT_OE;
> > +               control |= TLK10X_MICR_INTEN;
> > +
> > +               ret = tlk10x_write(phydev, TLK10X_MISR, TLK10X_INT_EN_MASK);
> > +               if (ret < 0)
> > +                       return ret;
> > +       } else {
> > +               control &= ~TLK10X_MICR_INTEN;
> > +       }
> > +
> > +       return tlk10x_write(phydev, TLK10X_MICR, control);
> > +}
> > +
> > +static struct phy_driver tlk10x_driver[] = {
> > +       {
> > +               .phy_id         = TLK10X_PHY_ID,
> > +               .phy_id_mask    = 0xfffffff0,
> > +               .name           = "TI TLK10X 10/100 Mbps PHY",
> > +               .features       = PHY_BASIC_FEATURES,
> > +               .flags          = PHY_HAS_INTERRUPT,
> > +
> > +               .config_init    = tlk10x_config_init,
> > +               .soft_reset     = genphy_soft_reset,
> > +
> > +               /* IRQ related */
> > +               .ack_interrupt  = tlk10x_ack_interrupt,
> > +               .config_intr    = tlk10x_config_intr,
> > +
> > +               .suspend        = genphy_suspend,
> > +               .resume         = genphy_resume,
> > +       },
> > +};
> > +module_phy_driver(tlk10x_driver);
> > +
> > +static struct mdio_device_id __maybe_unused tlk10x_tbl[] = {
> > +       { TLK10X_PHY_ID, 0xfffffff0 },
> > +       { }
> > +};
> > +MODULE_DEVICE_TABLE(mdio, tlk10x_tbl);
> > +
> > +MODULE_DESCRIPTION("Texas Instruments TLK105 / TLK106 PHY driver");
> > +MODULE_AUTHOR("Mans Andersson <mans.andersson@nibe.se>");
> > +MODULE_LICENSE("GPL");
> 
> Should be "GPL v2".
> 
> Cheers,
> Miguel

Good catch, will update that as well. Thanks for all the input! I will
get back in a couple of days with an updated patch.
// Måns

^ permalink raw reply

* [PATCH bpf-next] libbpf: fixed build error for samples/bpf/
From: Björn Töpel @ 2018-04-20  8:05 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: Björn Töpel, Martin KaFai Lau

From: Björn Töpel <bjorn.topel@intel.com>

Commit 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") did not
include stdbool.h, so GCC complained when building samples/bpf/.

In file included from /home/btopel/src/ext/linux/samples/bpf/libbpf.h:6:0,
                 from /home/btopel/src/ext/linux/samples/bpf/test_lru_dist.c:24:
/home/btopel/src/ext/linux/tools/lib/bpf/bpf.h:105:4: error: unknown type name ‘bool’; did you mean ‘_Bool’?
    bool do_log);
    ^~~~
    _Bool

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 tools/lib/bpf/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 01bda076310f..553b11ad52b3 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -24,6 +24,7 @@
 #define __BPF_BPF_H
 
 #include <linux/bpf.h>
+#include <stdbool.h>
 #include <stddef.h>
 
 struct bpf_create_map_attr {
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH bpf-next] libbpf: fixed build error for samples/bpf/
From: Daniel Borkmann @ 2018-04-20  8:13 UTC (permalink / raw)
  To: Björn Töpel, ast, netdev
  Cc: Björn Töpel, Martin KaFai Lau
In-Reply-To: <20180420080516.16683-1-bjorn.topel@gmail.com>

On 04/20/2018 10:05 AM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Commit 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") did not
> include stdbool.h, so GCC complained when building samples/bpf/.
> 
> In file included from /home/btopel/src/ext/linux/samples/bpf/libbpf.h:6:0,
>                  from /home/btopel/src/ext/linux/samples/bpf/test_lru_dist.c:24:
> /home/btopel/src/ext/linux/tools/lib/bpf/bpf.h:105:4: error: unknown type name ‘bool’; did you mean ‘_Bool’?
>     bool do_log);
>     ^~~~
>     _Bool
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Applied to bpf-next, thanks Björn!

^ permalink raw reply

* [PATCH iproute2-next v2 1/2] man: ip link: document GRE tunnels
From: Sabrina Dubroca @ 2018-04-20  8:31 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Sabrina Dubroca

GRE tunnels are currently only documented together with IPIP and SIT
tunnels, but they actually have very different configuration
options. Let's separate them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 man/man8/ip-link.8.in | 152 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 148 insertions(+), 4 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 5dee9fcd627a..77ab8a3b9723 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -693,13 +693,13 @@ tunnel.
 .in -8
 
 .TP
-GRE, IPIP, SIT Type Support
-For a link of types
-.I GRE/IPIP/SIT
+IPIP, SIT Type Support
+For a link of type
+.IR IPIP or SIT
 the following additional arguments are supported:
 
 .BI "ip link add " DEVICE
-.BR type " { " gre " | " ipip " | " sit " }"
+.BR type " { " ipip " | " sit " }"
 .BI " remote " ADDR " local " ADDR
 [
 .BR encap " { " fou " | " gue " | " none " }"
@@ -764,6 +764,150 @@ IPv6-Over-IPv4 is not supported for IPIP.
 - make this tunnel externally controlled
 .RB "(e.g. " "ip route encap" ).
 
+.in -8
+.TP
+GRE Type Support
+For a link of type
+.IR GRE " or " GRETAP
+the following additional arguments are supported:
+
+.BI "ip link add " DEVICE
+.BR type " { " gre " | " gretap " }"
+.BI " remote " ADDR " local " ADDR
+[
+.RB [ i | o ] seq
+] [
+.RB [ i | o ] key
+.I KEY
+] [
+.RB [ i | o ] csum
+] [
+.BI ttl " TTL "
+] [
+.BI tos " TOS "
+] [
+.RB [ no ] pmtudisc
+] [
+.RB [ no ] ignore-df
+] [
+.BI dev " PHYS_DEV "
+] [
+.BR encap " { " fou " | " gue " | " none " }"
+] [
+.BR encap-sport " { " \fIPORT " | " auto " }"
+] [
+.BI "encap-dport " PORT
+] [
+.RB [ no ] encap-csum
+] [
+.RB [ no ] encap-remcsum
+] [
+.BR external
+]
+
+.in +8
+.sp
+.BI  remote " ADDR "
+- specifies the remote address of the tunnel.
+
+.sp
+.BI  local " ADDR "
+- specifies the fixed local address for tunneled packets.
+It must be an address on another interface on this host.
+
+.sp
+.RB [ i | o ] seq
+- serialize packets.
+The
+.B oseq
+flag enables sequencing of outgoing packets.
+The
+.B iseq
+flag requires that all input packets are serialized.
+
+.sp
+.RB [ i | o ] key
+.I KEY
+- use keyed GRE with key
+.IR KEY ". "KEY
+is either a number or an IPv4 address-like dotted quad.
+The
+.B key
+parameter specifies the same key to use in both directions.
+The
+.BR ikey " and " okey
+parameters specify different keys for input and output.
+
+.sp
+.RB  [ i | o ] csum
+- generate/require checksums for tunneled packets.
+The
+.B ocsum
+flag calculates checksums for outgoing packets.
+The
+.B icsum
+flag requires that all input packets have the correct
+checksum. The
+.B csum
+flag is equivalent to the combination
+.B "icsum ocsum" .
+
+.sp
+.BI ttl " TTL"
+- specifies the TTL value to use in outgoing packets.
+
+.sp
+.BI tos " TOS"
+- specifies the TOS value to use in outgoing packets.
+
+.sp
+.RB [ no ] pmtudisc
+- enables/disables Path MTU Discovery on this tunnel.
+It is enabled by default. Note that a fixed ttl is incompatible
+with this option: tunneling with a fixed ttl always makes pmtu
+discovery.
+
+.sp
+.RB [ no ] ignore-df
+- enables/disables IPv4 DF suppression on this tunnel.
+Normally datagrams that exceed the MTU will be fragmented; the presence
+of the DF flag inhibits this, resulting instead in an ICMP Unreachable
+(Fragmentation Required) message.  Enabling this attribute casues the
+DF flag to be ignored.
+
+.sp
+.BI dev " PHYS_DEV"
+- specifies the physical device to use for tunnel endpoint communication.
+
+.sp
+.BR encap " { " fou " | " gue " | " none " }"
+- specifies type of secondary UDP encapsulation. "fou" indicates
+Foo-Over-UDP, "gue" indicates Generic UDP Encapsulation.
+
+.sp
+.BR encap-sport " { " \fIPORT " | " auto " }"
+- specifies the source port in UDP encapsulation.
+.IR PORT
+indicates the port by number, "auto"
+indicates that the port number should be chosen automatically
+(the kernel picks a flow based on the flow hash of the
+encapsulated packet).
+
+.sp
+.RB [ no ] encap-csum
+- specifies if UDP checksums are enabled in the secondary
+encapsulation.
+
+.sp
+.RB [ no ] encap-remcsum
+- specifies if Remote Checksum Offload is enabled. This is only
+applicable for Generic UDP Encapsulation.
+
+.sp
+.BR external
+- make this tunnel externally controlled
+.RB "(e.g. " "ip route encap" ).
+
 .in -8
 
 .TP
-- 
2.17.0

^ permalink raw reply related

* [PATCH iproute2-next v2 2/2] gre/gre6: allow clearing {,i,o}{key,seq,csum} flags
From: Sabrina Dubroca @ 2018-04-20  8:32 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Sabrina Dubroca
In-Reply-To: <0906394e032f7a11b92c11bf74d0a7b31ffd9c30.1524173201.git.sd@queasysnail.net>

Currently, iproute allows setting those flags, but it's impossible to
clear them, since their current value is fetched from the kernel and
then we OR in the additional flags passed on the command line.

Add no* variants to allow clearing them.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
v2: fixed up okey flag clearing
    also reset the actual value of the key, not just the flag

 ip/link_gre.c         | 30 +++++++++++++++++++++++++++---
 ip/link_gre6.c        | 30 +++++++++++++++++++++++++++---
 man/man8/ip-link.8.in | 27 ++++++++++++++++++---------
 3 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index bc1cee8fbca2..ede761b23a8c 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -31,9 +31,9 @@ static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 	);
 	fprintf(f,
 		"                     [ local ADDR ]\n"
-		"                     [ [i|o]seq ]\n"
-		"                     [ [i|o]key KEY ]\n"
-		"                     [ [i|o]csum ]\n"
+		"                     [ [no][i|o]seq ]\n"
+		"                     [ [i|o]key KEY | no[i|o]key ]\n"
+		"                     [ [no][i|o]csum ]\n"
 		"                     [ ttl TTL ]\n"
 		"                     [ tos TOS ]\n"
 		"                     [ [no]pmtudisc ]\n"
@@ -210,28 +210,52 @@ get_failed:
 			iflags |= GRE_KEY;
 			oflags |= GRE_KEY;
 			ikey = okey = tnl_parse_key("key", *argv);
+		} else if (!matches(*argv, "nokey")) {
+			iflags &= ~GRE_KEY;
+			oflags &= ~GRE_KEY;
+			ikey = okey = 0;
 		} else if (!matches(*argv, "ikey")) {
 			NEXT_ARG();
 			iflags |= GRE_KEY;
 			ikey = tnl_parse_key("ikey", *argv);
+		} else if (!matches(*argv, "noikey")) {
+			iflags &= ~GRE_KEY;
+			ikey = 0;
 		} else if (!matches(*argv, "okey")) {
 			NEXT_ARG();
 			oflags |= GRE_KEY;
 			okey = tnl_parse_key("okey", *argv);
+		} else if (!matches(*argv, "nookey")) {
+			oflags &= ~GRE_KEY;
+			okey = 0;
 		} else if (!matches(*argv, "seq")) {
 			iflags |= GRE_SEQ;
 			oflags |= GRE_SEQ;
+		} else if (!matches(*argv, "noseq")) {
+			iflags &= ~GRE_SEQ;
+			oflags &= ~GRE_SEQ;
 		} else if (!matches(*argv, "iseq")) {
 			iflags |= GRE_SEQ;
+		} else if (!matches(*argv, "noiseq")) {
+			iflags &= ~GRE_SEQ;
 		} else if (!matches(*argv, "oseq")) {
 			oflags |= GRE_SEQ;
+		} else if (!matches(*argv, "nooseq")) {
+			oflags &= ~GRE_SEQ;
 		} else if (!matches(*argv, "csum")) {
 			iflags |= GRE_CSUM;
 			oflags |= GRE_CSUM;
+		} else if (!matches(*argv, "nocsum")) {
+			iflags &= ~GRE_CSUM;
+			oflags &= ~GRE_CSUM;
 		} else if (!matches(*argv, "icsum")) {
 			iflags |= GRE_CSUM;
+		} else if (!matches(*argv, "noicsum")) {
+			iflags &= ~GRE_CSUM;
 		} else if (!matches(*argv, "ocsum")) {
 			oflags |= GRE_CSUM;
+		} else if (!matches(*argv, "noocsum")) {
+			oflags &= ~GRE_CSUM;
 		} else if (!matches(*argv, "nopmtudisc")) {
 			pmtudisc = 0;
 		} else if (!matches(*argv, "pmtudisc")) {
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index a6fe0b73d235..181b2eae808b 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -38,9 +38,9 @@ static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 	);
 	fprintf(f,
 		"                     [ local ADDR ]\n"
-		"                     [ [i|o]seq ]\n"
-		"                     [ [i|o]key KEY ]\n"
-		"                     [ [i|o]csum ]\n"
+		"                     [ [no][i|o]seq ]\n"
+		"                     [ [i|o]key KEY | no[i|o]key ]\n"
+		"                     [ [no][i|o]csum ]\n"
 		"                     [ hoplimit TTL ]\n"
 		"                     [ encaplimit ELIM ]\n"
 		"                     [ tclass TCLASS ]\n"
@@ -220,28 +220,52 @@ get_failed:
 			iflags |= GRE_KEY;
 			oflags |= GRE_KEY;
 			ikey = okey = tnl_parse_key("key", *argv);
+		} else if (!matches(*argv, "nokey")) {
+			iflags &= ~GRE_KEY;
+			oflags &= ~GRE_KEY;
+			ikey = okey = 0;
 		} else if (!matches(*argv, "ikey")) {
 			NEXT_ARG();
 			iflags |= GRE_KEY;
 			ikey = tnl_parse_key("ikey", *argv);
+		} else if (!matches(*argv, "noikey")) {
+			iflags &= ~GRE_KEY;
+			ikey = 0;
 		} else if (!matches(*argv, "okey")) {
 			NEXT_ARG();
 			oflags |= GRE_KEY;
 			okey = tnl_parse_key("okey", *argv);
+		} else if (!matches(*argv, "nookey")) {
+			oflags &= ~GRE_KEY;
+			okey = 0;
 		} else if (!matches(*argv, "seq")) {
 			iflags |= GRE_SEQ;
 			oflags |= GRE_SEQ;
+		} else if (!matches(*argv, "noseq")) {
+			iflags &= ~GRE_SEQ;
+			oflags &= ~GRE_SEQ;
 		} else if (!matches(*argv, "iseq")) {
 			iflags |= GRE_SEQ;
+		} else if (!matches(*argv, "noiseq")) {
+			iflags &= ~GRE_SEQ;
 		} else if (!matches(*argv, "oseq")) {
 			oflags |= GRE_SEQ;
+		} else if (!matches(*argv, "nooseq")) {
+			oflags &= ~GRE_SEQ;
 		} else if (!matches(*argv, "csum")) {
 			iflags |= GRE_CSUM;
 			oflags |= GRE_CSUM;
+		} else if (!matches(*argv, "nocsum")) {
+			iflags &= ~GRE_CSUM;
+			oflags &= ~GRE_CSUM;
 		} else if (!matches(*argv, "icsum")) {
 			iflags |= GRE_CSUM;
+		} else if (!matches(*argv, "noicsum")) {
+			iflags &= ~GRE_CSUM;
 		} else if (!matches(*argv, "ocsum")) {
 			oflags |= GRE_CSUM;
+		} else if (!matches(*argv, "noocsum")) {
+			oflags &= ~GRE_CSUM;
 		} else if (!matches(*argv, "remote")) {
 			NEXT_ARG();
 			get_addr(&daddr, *argv, AF_INET6);
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 77ab8a3b9723..83ef3cae54b9 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -775,12 +775,14 @@ the following additional arguments are supported:
 .BR type " { " gre " | " gretap " }"
 .BI " remote " ADDR " local " ADDR
 [
-.RB [ i | o ] seq
+.RB [ no ] "" [ i | o ] seq
 ] [
 .RB [ i | o ] key
 .I KEY
+|
+.BR no [ i | o ] key
 ] [
-.RB [ i | o ] csum
+.RB [ no ] "" [ i | o ] csum
 ] [
 .BI ttl " TTL "
 ] [
@@ -816,7 +818,7 @@ the following additional arguments are supported:
 It must be an address on another interface on this host.
 
 .sp
-.RB [ i | o ] seq
+.RB  [ no ] "" [ i | o ] seq
 - serialize packets.
 The
 .B oseq
@@ -828,6 +830,8 @@ flag requires that all input packets are serialized.
 .sp
 .RB [ i | o ] key
 .I KEY
+|
+.BR no [ i | o ] key
 - use keyed GRE with key
 .IR KEY ". "KEY
 is either a number or an IPv4 address-like dotted quad.
@@ -839,7 +843,7 @@ The
 parameters specify different keys for input and output.
 
 .sp
-.RB  [ i | o ] csum
+.RB  [ no ] "" [ i | o ] csum
 - generate/require checksums for tunneled packets.
 The
 .B ocsum
@@ -920,12 +924,14 @@ the following additional arguments are supported:
 .BR type " { " ip6gre " | " ip6gretap " }"
 .BI remote " ADDR " local " ADDR"
 [
-.RB [ i | o ] seq
+.RB [ no ] "" [ i | o ] seq
 ] [
 .RB [ i | o ] key
 .I KEY
+|
+.BR no [ i | o ] key
 ] [
-.RB [ i | o ] csum
+.RB [ no ] "" [ i | o ] csum
 ] [
 .BI hoplimit " TTL "
 ] [
@@ -955,7 +961,7 @@ the following additional arguments are supported:
 It must be an address on another interface on this host.
 
 .sp
-.RB  [ i | o ] seq
+.RB  [ no ] "" [ i | o ] seq
 - serialize packets.
 The
 .B oseq
@@ -965,7 +971,10 @@ The
 flag requires that all input packets are serialized.
 
 .sp
-.RB  [ i | o ] key " \fIKEY"
+.RB [ i | o ] key
+.I KEY
+|
+.BR no [ i | o ] key
 - use keyed GRE with key
 .IR KEY ". "KEY
 is either a number or an IPv4 address-like dotted quad.
@@ -977,7 +986,7 @@ The
 parameters specify different keys for input and output.
 
 .sp
-.RB  [ i | o ] csum
+.RB  [ no ] "" [ i | o ] csum
 - generate/require checksums for tunneled packets.
 The
 .B ocsum
-- 
2.17.0

^ permalink raw reply related

* Q: force netif ON even when there is no real link ?
From: Ran Shalit @ 2018-04-20  8:44 UTC (permalink / raw)
  To: netdev

Hello,

We configure external switch in u-boot.
The configuration is through mdio (cpu is mac and switch is phy).

But in Linux we rather not implement any communication in mdio to
switch, but it means that we then don't have the information of link
state.

Is it possible to force in Linux (by default in startup) Ethernet
connectivity (netif_carrier_on, netif_wake_queue) even if there is no
information of real link state ?

Thank you,
ranran

^ permalink raw reply

* Re: [bpf-next PATCH 3/3] bpf: add sample program to trace map events
From: Jesper Dangaard Brouer @ 2018-04-20  9:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Sebastiano Miano, netdev, ast, daniel, mingo, rostedt,
	fulvio.risso, David S. Miller, brouer
In-Reply-To: <20180420002735.ytmnhzs73rwkwewm@ast-mbp>

On Thu, 19 Apr 2018 17:27:37 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Wed, Apr 18, 2018 at 05:30:59PM +0200, Sebastiano Miano wrote:
> > This patch adds a sample program, called trace_map_events,
> > that shows how to capture map events and filter them based on
> > the map id.  
> ...
> > +struct bpf_map_keyval_ctx {
> > +	u64 pad;		// First 8 bytes are not accessible by bpf code
> > +	u32 type;		// offset:8;	size:4;	signed:0;
> > +	u32 key_len;		// offset:12;	size:4;	signed:0;
> > +	u32 key;		// offset:16;	size:4;	signed:0;
> > +	bool key_trunc;		// offset:20;	size:1;	signed:0;
> > +	u32 val_len;		// offset:24;	size:4;	signed:0;
> > +	u32 val;		// offset:28;	size:4;	signed:0;
> > +	bool val_trunc;		// offset:32;	size:1;	signed:0;
> > +	int ufd;		// offset:36;	size:4;	signed:1;
> > +	u32 id;			// offset:40;	size:4;	signed:0;
> > +};
> > +
> > +SEC("tracepoint/bpf/bpf_map_lookup_elem")
> > +int trace_bpf_map_lookup_elem(struct bpf_map_keyval_ctx *ctx)
> > +{
> > +	struct map_event_data data;
> > +	int cpu = bpf_get_smp_processor_id();
> > +	bool *filter;
> > +	u32 key = 0, map_id = ctx->id;
> > +
> > +	filter = bpf_map_lookup_elem(&filter_events, &key);
> > +	if (!filter)
> > +		return 1;
> > +
> > +	if (!*filter)
> > +		goto send_event;
> > +
> > +	/*
> > +	 * If the map_id is not in the list of filtered
> > +	 * ids we immediately return
> > +	 */
> > +	if (!bpf_map_lookup_elem(&filtered_ids, &map_id))
> > +		return 0;
> > +
> > +send_event:
> > +	data.map_id = map_id;
> > +	data.evnt_type = MAP_LOOKUP;
> > +	data.map_type = ctx->type;
> > +
> > +	bpf_perf_event_output(ctx, &map_event_trace, cpu, &data, sizeof(data));
> > +	return 0;
> > +}  
> 
> looks like the purpose of the series is to create map notify mechanism
> so some external user space daemon can snoop all bpf map operations
> that all other processes and bpf programs are doing.
> I think it would be way better to create a proper mechanism for that
> with permissions.
> 
> tracepoints in the bpf core were introduced as introspection mechanism
> for debugging. Right now we have better ways to do introspection
> with ids, queries, etc that bpftool is using, so original purpose of
> those tracepoints is gone and they actually rot.
> Let's not repurpose them into this map notify logic.
> I don't want tracepoints in the bpf core to become a stable interface
> it will stiffen the development.

Well, I need it exactly for introspection and debugging, and just need
the missing ID (as it was introduced later).

Can we just drop the sample program then? I would likely not use the
sample program, because it is missing the PID+comm-name.  

For my use, I can simply use perf record to debug what programs are
changing the map ID:

  perf record -e bpf:bpf_map_* -a --filter 'id == 2' 

This should be a fairly common troubleshooting step.  I want to
figure-out if anybody else (another userspace program) is changing my
map. This can easily be caused by two userspace control programs
running at the same time. Sysadm/scripts made a mistake and started two
instances. Without the map ID, I cannot know what map perf is talking
about...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH RFC iptables] iptables: Per-net ns lock
From: Kirill Tkhai @ 2018-04-20 10:41 UTC (permalink / raw)
  To: netdev, pablo, rstoyanov1, fw, ptikhomirov, avagin
In-Reply-To: <152329277200.18428.18388703172485475447.stgit@localhost.localdomain>

Pablo, Florian, could you please provide comments on this?

On 09.04.2018 19:55, Kirill Tkhai wrote:
> In CRIU and LXC-restore we met the situation,
> when iptables in container can't be restored
> because of permission denied:
> 
> https://github.com/checkpoint-restore/criu/issues/469
> 
> Containers want to restore their own net ns,
> while they may have no their own mnt ns.
> This case they share host's /run/xtables.lock
> file, but they may not have permission to open
> it.
> 
> Patch makes /run/xtables.lock to be per-namespace,
> i.e., to refer to the caller task's net ns.
> 
> What you think?
> 
> Thanks,
> Kirill
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  iptables/xshared.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/iptables/xshared.c b/iptables/xshared.c
> index 06db72d4..b6dbe4e7 100644
> --- a/iptables/xshared.c
> +++ b/iptables/xshared.c
> @@ -254,7 +254,12 @@ static int xtables_lock(int wait, struct timeval *wait_interval)
>  	time_left.tv_sec = wait;
>  	time_left.tv_usec = 0;
>  
> -	fd = open(XT_LOCK_NAME, O_CREAT, 0600);
> +	if (symlink("/proc/self/ns/net", XT_LOCK_NAME) != 0 &&
> +	    errno != EEXIST) {
> +		fprintf(stderr, "Fatal: can't create lock file\n");
> +		return XT_LOCK_FAILED;
> +	}
> +	fd = open(XT_LOCK_NAME, O_RDONLY);
>  	if (fd < 0) {
>  		fprintf(stderr, "Fatal: can't open lock file %s: %s\n",
>  			XT_LOCK_NAME, strerror(errno));
> 

^ permalink raw reply

* [PATCH 0/3] Introduce Xen fault injection facility
From: Stanislav Kinsburskii @ 2018-04-20 10:47 UTC (permalink / raw)
  Cc: jakub.kicinski, hpa, mcroce, staskins, tglx, ggarcia, daniel, x86,
	mingo, xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, jgross, linux-block, wei.liu2,
	netdev, linux-kernel, davem, dwmw, roger.pau

This series adds a facility, which can be used to instrument Xen code with
fault injections.
It is based "Fault injection capabilities infrastructure" described here:
- Documentation/fault-injection/fault-injection.txt

First patch adds a generic facility to use anywhere in Xen.
When using it all the fault injection user land control directories (if
any) will appear here:
- /sys/kernel/debug/xen/fault_inject/

To distinguish with generic (or core) Xen fault injections, next two
patches add additional directories to the root path above for blkback and
netback drivers respectively:
- /sys/kernel/debug/xen/fault_inject/xen-blkback/
- /sys/kernel/debug/xen/fault_inject/xen-netback/

---

Stanislav Kinsburskii (3):
      xen: add generic fault injection facility
      xen netback: add fault injection facility
      xen blkback: add fault injection facility


 arch/x86/xen/Kconfig                   |    7 ++
 arch/x86/xen/Makefile                  |    1 
 arch/x86/xen/fault_inject.c            |  109 +++++++++++++++++++++++++++++
 drivers/block/Kconfig                  |    7 ++
 drivers/block/xen-blkback/Makefile     |    1 
 drivers/block/xen-blkback/blkback.c    |    9 ++
 drivers/block/xen-blkback/blkback_fi.c |  116 +++++++++++++++++++++++++++++++
 drivers/block/xen-blkback/blkback_fi.h |   37 ++++++++++
 drivers/block/xen-blkback/common.h     |    3 +
 drivers/block/xen-blkback/xenbus.c     |    5 +
 drivers/net/Kconfig                    |    8 ++
 drivers/net/xen-netback/Makefile       |    1 
 drivers/net/xen-netback/common.h       |    3 +
 drivers/net/xen-netback/netback.c      |    3 +
 drivers/net/xen-netback/netback_fi.c   |  119 ++++++++++++++++++++++++++++++++
 drivers/net/xen-netback/netback_fi.h   |   35 +++++++++
 drivers/net/xen-netback/xenbus.c       |    6 ++
 include/xen/fault_inject.h             |   45 ++++++++++++
 18 files changed, 515 insertions(+)
 create mode 100644 arch/x86/xen/fault_inject.c
 create mode 100644 drivers/block/xen-blkback/blkback_fi.c
 create mode 100644 drivers/block/xen-blkback/blkback_fi.h
 create mode 100644 drivers/net/xen-netback/netback_fi.c
 create mode 100644 drivers/net/xen-netback/netback_fi.h
 create mode 100644 include/xen/fault_inject.h
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply

* [PATCH 2/3] xen netback: add fault injection facility
From: Stanislav Kinsburskii @ 2018-04-20 10:47 UTC (permalink / raw)
  Cc: jakub.kicinski, hpa, mcroce, staskins, tglx, ggarcia, daniel, x86,
	mingo, xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, jgross, linux-block, wei.liu2,
	netdev, linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <20180420104603.17823.31095.stgit@dev-dsk-staskins-1a-ca5afbf2.eu-west-1.amazon.com>

This patch adds wrapper helpers around generic Xen fault inject facility.
The major reason is to keep all the module fault injection directories
in a dedicated subdirectory instead of Xen fault inject root.

IOW, when using these helpers, per-device and named by device name fault
injection control directories will appear under the following directory:
- /sys/kernel/debug/xen/fault_inject/xen-netback/
instead of:
- /sys/kernel/debug/xen/fault_inject/

Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Matteo Croce <mcroce@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Gerard Garcia <ggarcia@deic.uab.cat>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Juergen Gross <jgross@suse.com>
CC: Amir Levy <amir.jer.levy@intel.com>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: xen-devel@lists.xenproject.org
CC: Stanislav Kinsburskii <staskins@amazon.com>
CC: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/net/Kconfig                  |    8 ++
 drivers/net/xen-netback/Makefile     |    1 
 drivers/net/xen-netback/common.h     |    3 +
 drivers/net/xen-netback/netback.c    |    3 +
 drivers/net/xen-netback/netback_fi.c |  119 ++++++++++++++++++++++++++++++++++
 drivers/net/xen-netback/netback_fi.h |   35 ++++++++++
 drivers/net/xen-netback/xenbus.c     |    6 ++
 7 files changed, 175 insertions(+)
 create mode 100644 drivers/net/xen-netback/netback_fi.c
 create mode 100644 drivers/net/xen-netback/netback_fi.h

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8918466..5cc9acd 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -465,6 +465,14 @@ config XEN_NETDEV_BACKEND
 	  compile this driver as a module, chose M here: the module
 	  will be called xen-netback.
 
+config XEN_NETDEV_BACKEND_FAULT_INJECTION
+	  bool "Xen net-device backend driver fault injection"
+	  depends on XEN_NETDEV_BACKEND
+	  depends on XEN_FAULT_INJECTION
+	  default n
+	  help
+	    Allow to inject errors to Xen backend network driver
+
 config VMXNET3
 	tristate "VMware VMXNET3 ethernet driver"
 	depends on PCI && INET
diff --git a/drivers/net/xen-netback/Makefile b/drivers/net/xen-netback/Makefile
index d49798a..28abcdc 100644
--- a/drivers/net/xen-netback/Makefile
+++ b/drivers/net/xen-netback/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
 
 xen-netback-y := netback.o xenbus.o interface.o hash.o rx.o
+xen-netback-$(CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION) += netback_fi.o
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index a46a1e9..30d676d 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -286,6 +286,9 @@ struct xenvif {
 
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *xenvif_dbg_root;
+#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
+	void *fi_info;
+#endif
 #endif
 
 	struct xen_netif_ctrl_back_ring ctrl;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a27daa2..ecc416e 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -33,6 +33,7 @@
  */
 
 #include "common.h"
+#include "netback_fi.h"
 
 #include <linux/kthread.h>
 #include <linux/if_vlan.h>
@@ -1649,6 +1650,7 @@ static int __init netback_init(void)
 			PTR_ERR(xen_netback_dbg_root));
 #endif /* CONFIG_DEBUG_FS */
 
+	(void) xen_netbk_fi_init();
 	return 0;
 
 failed_init:
@@ -1659,6 +1661,7 @@ module_init(netback_init);
 
 static void __exit netback_fini(void)
 {
+	xen_netbk_fi_fini();
 #ifdef CONFIG_DEBUG_FS
 	if (!IS_ERR_OR_NULL(xen_netback_dbg_root))
 		debugfs_remove_recursive(xen_netback_dbg_root);
diff --git a/drivers/net/xen-netback/netback_fi.c b/drivers/net/xen-netback/netback_fi.c
new file mode 100644
index 0000000..47541d0
--- /dev/null
+++ b/drivers/net/xen-netback/netback_fi.c
@@ -0,0 +1,119 @@
+/*
+ * Fault injection interface for Xen backend network driver
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "common.h"
+
+#include <linux/debugfs.h>
+
+#include <xen/fault_inject.h>
+#include "netback_fi.h"
+
+static struct dentry *vif_fi_dir;
+
+static const char *xenvif_fi_names[] = {
+};
+
+struct xenvif_fi {
+	struct dentry *dir;
+	struct xen_fi *faults[XENVIF_FI_MAX];
+};
+
+int xen_netbk_fi_init(void)
+{
+	vif_fi_dir = xen_fi_dir_create("xen-netback");
+	if (!vif_fi_dir)
+		return -ENOMEM;
+	return 0;
+}
+
+void xen_netbk_fi_fini(void)
+{
+	debugfs_remove_recursive(vif_fi_dir);
+}
+
+void xenvif_fi_fini(struct xenvif *vif)
+{
+	struct xenvif_fi *vfi = vif->fi_info;
+	int fi;
+
+	if (!vif->fi_info)
+		return;
+
+	vif->fi_info = NULL;
+
+	for (fi = 0; fi < XENVIF_FI_MAX; fi++)
+		xen_fi_del(vfi->faults[fi]);
+	debugfs_remove_recursive(vfi->dir);
+	kfree(vfi);
+}
+
+int xenvif_fi_init(struct xenvif *vif)
+{
+	struct dentry *parent;
+	struct xenvif_fi *vfi;
+	int fi, err = -ENOMEM;
+
+	parent = vif_fi_dir;
+	if (!parent)
+		return -ENOMEM;
+
+	vfi = kmalloc(sizeof(*vfi), GFP_KERNEL);
+	if (!vfi)
+		return -ENOMEM;
+
+	vfi->dir = debugfs_create_dir(vif->dev->name, parent);
+	if (!vfi->dir)
+		goto err_dir;
+
+	for (fi = 0; fi < XENVIF_FI_MAX; fi++) {
+		vfi->faults[fi] = xen_fi_dir_add(vfi->dir,
+				xenvif_fi_names[fi]);
+		if (!vfi->faults[fi])
+			goto err_fault;
+	}
+
+	vif->fi_info = vfi;
+	return 0;
+
+err_fault:
+	for (; fi > 0; fi--)
+		xen_fi_del(vfi->faults[fi]);
+	debugfs_remove_recursive(vfi->dir);
+err_dir:
+	kfree(vfi);
+	return err;
+}
+
+bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type)
+{
+	struct xenvif_fi *vfi = vif->fi_info;
+
+	return xen_should_fail(vfi->faults[type]);
+}
diff --git a/drivers/net/xen-netback/netback_fi.h b/drivers/net/xen-netback/netback_fi.h
new file mode 100644
index 0000000..895c6a6
--- /dev/null
+++ b/drivers/net/xen-netback/netback_fi.h
@@ -0,0 +1,35 @@
+#ifndef _XEN_NETBACK_FI_H
+#define _XEN_NETBACK_FI_H
+
+struct xen_fi;
+
+typedef enum {
+	XENVIF_FI_MAX
+} xenvif_fi_t;
+
+#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
+
+int xen_netbk_fi_init(void);
+void xen_netbk_fi_fini(void);
+
+void xenvif_fi_fini(struct xenvif *vif);
+int xenvif_fi_init(struct xenvif *vif);
+
+bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type);
+
+#else
+
+static inline int xen_netbk_fi_init(void) { return 0; }
+static inline void xen_netbk_fi_fini(void) { }
+
+static inline void xenvif_fi_fini(struct xenvif *vif) { }
+static inline int xenvif_fi_init(struct xenvif *vif) { return 0; }
+
+static inline bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type)
+{
+	return false;
+}
+
+#endif /* CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION */
+
+#endif /* _XEN_NETBACK_FI_H */
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index e1aef25..c775ee0 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -21,6 +21,7 @@
 #include "common.h"
 #include <linux/vmalloc.h>
 #include <linux/rtnetlink.h>
+#include "netback_fi.h"
 
 struct backend_info {
 	struct xenbus_device *dev;
@@ -502,6 +503,7 @@ static void backend_disconnect(struct backend_info *be)
 #ifdef CONFIG_DEBUG_FS
 		xenvif_debugfs_delif(vif);
 #endif /* CONFIG_DEBUG_FS */
+		xenvif_fi_fini(vif);
 		xenvif_disconnect_data(vif);
 
 		/* At this point some of the handlers may still be active
@@ -1024,6 +1026,10 @@ static void connect(struct backend_info *be)
 		}
 	}
 
+	err = xenvif_fi_init(be->vif);
+	if (err)
+		goto err;
+
 #ifdef CONFIG_DEBUG_FS
 	xenvif_debugfs_addif(be->vif);
 #endif /* CONFIG_DEBUG_FS */

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply related

* [PATCH 1/3] xen: add generic fault injection facility
From: Stanislav Kinsburskii @ 2018-04-20 10:47 UTC (permalink / raw)
  Cc: jakub.kicinski, hpa, mcroce, staskins, tglx, ggarcia, daniel, x86,
	mingo, xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, jgross, linux-block, wei.liu2,
	netdev, linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <20180420104603.17823.31095.stgit@dev-dsk-staskins-1a-ca5afbf2.eu-west-1.amazon.com>

The overall idea of this patch is to add a generic fault injection facility
to Xen, which later can be used in various places by different Xen parts.

Core implementation ideas:

- The facility build is controlled by boolean config
  CONFIG_XEN_FAULT_INJECTION option ("N" by default).

- All fault injection logic is located in an optionally compiled separated
  file.

- Fault injection attribute and control directory creation and destruction
  are wrapped with helpers, producing and accepting a pointer to an opaque
  object thus making all the rest of code independent on fault injection
  engine.

When enabled Xen root fault injection directory appears:

- /sys/kernel/debug/xen/fault_inject/

The falicity provides the following helpers (exported to be accessible in
modules):

- xen_fi_add(name) - adds fault injection control directory "name" to Xen
  root fault injection directory

- xen_fi_dir_create(name) - allows to create a subdirectory "name" in Xen
  root fault injection directory.

- xen_fi_dir_add(dir, name) - adds fault injection control directory "name"
  to directory "dir"

- xen_should_fail(fi) - check whether fi hav to fail.

Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Juergen Gross <jgross@suse.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
CC: xen-devel@lists.xenproject.org
CC: linux-kernel@vger.kernel.org
CC: Stanislav Kinsburskii <staskins@amazon.com>
CC: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/xen/Kconfig        |    7 +++
 arch/x86/xen/Makefile       |    1 
 arch/x86/xen/fault_inject.c |  109 +++++++++++++++++++++++++++++++++++++++++++
 include/xen/fault_inject.h  |   45 ++++++++++++++++++
 4 files changed, 162 insertions(+)
 create mode 100644 arch/x86/xen/fault_inject.c
 create mode 100644 include/xen/fault_inject.h

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index c1f98f3..483fc16 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -77,3 +77,10 @@ config XEN_PVH
 	bool "Support for running as a PVH guest"
 	depends on XEN && XEN_PVHVM && ACPI
 	def_bool n
+
+config XEN_FAULT_INJECTION
+	bool "Enable Xen fault injection"
+	depends on FAULT_INJECTION_DEBUG_FS
+	default n
+	help
+	  Enable Xen fault injection facility
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index d83cb54..3158fe1 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_XEN_DOM0)		+= vga.o
 obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
 obj-$(CONFIG_XEN_EFI)		+= efi.o
 obj-$(CONFIG_XEN_PVH)	 	+= xen-pvh.o
+obj-$(CONFIG_XEN_FAULT_INJECTION)	+= fault_inject.o
diff --git a/arch/x86/xen/fault_inject.c b/arch/x86/xen/fault_inject.c
new file mode 100644
index 0000000..ecf0f7c
--- /dev/null
+++ b/arch/x86/xen/fault_inject.c
@@ -0,0 +1,109 @@
+/*
+ * Fauit injection interface for Xen virtual block devices
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/slab.h>
+#include <linux/fault-inject.h>
+
+#include <xen/fault_inject.h>
+
+#include "debugfs.h"
+
+static struct dentry *d_fi_debug;
+
+static DECLARE_FAULT_ATTR(template_attr);
+
+struct xen_fi {
+	struct dentry		*dir;
+	struct fault_attr	attr;
+};
+
+struct xen_fi *xen_fi_dir_add(struct dentry *parent, const char *name)
+{
+	struct xen_fi *fi;
+	struct dentry *dir;
+
+	fi = kzalloc(sizeof(*fi), GFP_KERNEL);
+	if (!fi)
+		return ERR_PTR(-ENOMEM);
+
+	memcpy(&fi->attr, &template_attr, sizeof(template_attr));
+
+	dir = fault_create_debugfs_attr(name, parent, &fi->attr);
+	if (IS_ERR(dir)) {
+		kfree(fi);
+		return (struct xen_fi *)dir;
+	}
+
+	fi->dir = dir;
+
+	return fi;
+}
+EXPORT_SYMBOL(xen_fi_dir_add);
+
+struct xen_fi *xen_fi_add(const char *name)
+{
+	return xen_fi_dir_add(d_fi_debug, name);
+}
+
+void xen_fi_del(struct xen_fi *fi)
+{
+	debugfs_remove_recursive(fi->dir);
+	kfree(fi);
+}
+EXPORT_SYMBOL(xen_fi_del);
+
+bool xen_should_fail(struct xen_fi *fi)
+{
+	return should_fail(&fi->attr, 0);
+}
+EXPORT_SYMBOL(xen_should_fail);
+
+struct dentry *xen_fi_dir_create(const char *name)
+{
+	return debugfs_create_dir(name, d_fi_debug);
+}
+EXPORT_SYMBOL(xen_fi_dir_create);
+
+static int __init xen_fi_init(void)
+{
+	struct dentry *d_xen;
+
+	d_xen = xen_init_debugfs();
+	if (!d_xen)
+		return -ENOMEM;
+
+	d_fi_debug = debugfs_create_dir("fault_inject", d_xen);
+	if (d_fi_debug == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+fs_initcall(xen_fi_init);
diff --git a/include/xen/fault_inject.h b/include/xen/fault_inject.h
new file mode 100644
index 0000000..e2bf14a
--- /dev/null
+++ b/include/xen/fault_inject.h
@@ -0,0 +1,45 @@
+#ifndef _XEN_FAULT_INJECT_H
+#define _XEN_FAULT_INJECT_H
+
+struct dentry;
+struct xen_fi;
+
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+
+struct dentry *xen_fi_dir_create(const char *name);
+
+struct xen_fi *xen_fi_dir_add(struct dentry *parent, const char *name);
+struct xen_fi *xen_fi_add(const char *name);
+void xen_fi_del(struct xen_fi *fi);
+
+bool xen_should_fail(struct xen_fi *fi);
+
+#else
+
+static inline struct dentry *xen_fi_dir_create(const char *name)
+{
+	return NULL;
+}
+
+static inline struct xen_fi *xen_fi_dir_add(struct dentry *parent, const char *name)
+{
+	return NULL;
+}
+
+static inline struct xen_fi *xen_fi_add(const char *name)
+{
+	return NULL;
+}
+
+static inline void xen_fi_del(struct xen_fi *fi) { }
+
+static inline bool xen_should_fail(struct xen_fi *fi)
+{
+	return false;
+}
+
+#endif /* CONFIG_FAULT_INJECTION_DEBUG_FS */
+
+#endif /* _XEN_FAULT_INJECT_H */
+
+

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply related

* [PATCH 3/3] xen blkback: add fault injection facility
From: Stanislav Kinsburskii @ 2018-04-20 10:47 UTC (permalink / raw)
  Cc: jakub.kicinski, hpa, mcroce, staskins, tglx, ggarcia, daniel, x86,
	mingo, xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, jgross, linux-block, wei.liu2,
	netdev, linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <20180420104603.17823.31095.stgit@dev-dsk-staskins-1a-ca5afbf2.eu-west-1.amazon.com>

This patch adds wrapper helpers around generic Xen fault inject
facility.
The major reason is to keep all the module fault injection directories
in a dedicated subdirectory instead of Xen fault inject root.

IOW, when using these helpers, per-device and named by device name
fault injection control directories will appear under the following
directory:
- /sys/kernel/debug/xen/fault_inject/xen-blkback/
instead of:
- /sys/kernel/debug/xen/fault_inject/

Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: linux-kernel@vger.kernel.org
CC: linux-block@vger.kernel.org
CC: xen-devel@lists.xenproject.org
CC: Stanislav Kinsburskii <staskins@amazon.com>
CC: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/block/Kconfig                  |    7 ++
 drivers/block/xen-blkback/Makefile     |    1 
 drivers/block/xen-blkback/blkback.c    |    9 ++
 drivers/block/xen-blkback/blkback_fi.c |  116 ++++++++++++++++++++++++++++++++
 drivers/block/xen-blkback/blkback_fi.h |   37 ++++++++++
 drivers/block/xen-blkback/common.h     |    3 +
 drivers/block/xen-blkback/xenbus.c     |    5 +
 7 files changed, 178 insertions(+)
 create mode 100644 drivers/block/xen-blkback/blkback_fi.c
 create mode 100644 drivers/block/xen-blkback/blkback_fi.h

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ad9b687..0ce05fe 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -436,6 +436,13 @@ config XEN_BLKDEV_BACKEND
 	  compile this driver as a module, chose M here: the module
 	  will be called xen-blkback.
 
+config XEN_BLKDEV_BACKEND_FAULT_INJECTION
+	  bool "Xen block-device backend driver fault injection"
+	  depends on XEN_BLKDEV_BACKEND
+	  depends on XEN_FAULT_INJECTION
+	  default n
+	  help
+	    Allow to inject errors to Xen backend block driver
 
 config VIRTIO_BLK
 	tristate "Virtio block driver"
diff --git a/drivers/block/xen-blkback/Makefile b/drivers/block/xen-blkback/Makefile
index e491c1b..035d134 100644
--- a/drivers/block/xen-blkback/Makefile
+++ b/drivers/block/xen-blkback/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_XEN_BLKDEV_BACKEND) := xen-blkback.o
 
 xen-blkback-y	:= blkback.o xenbus.o
+xen-blkback-$(CONFIG_XEN_BLKDEV_BACKEND_FAULT_INJECTION) += blkback_fi.o
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 987d665..2933bec 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -51,6 +51,7 @@
 #include <xen/balloon.h>
 #include <xen/grant_table.h>
 #include "common.h"
+#include "blkback_fi.h"
 
 /*
  * Maximum number of unused free pages to keep in the internal buffer.
@@ -1491,11 +1492,19 @@ static int __init xen_blkif_init(void)
 	if (rc)
 		goto failed_init;
 
+	(void) xen_blkbk_fi_init();
  failed_init:
 	return rc;
 }
 
 module_init(xen_blkif_init);
 
+static void __exit xen_blkif_fini(void)
+{
+	xen_blkbk_fi_fini();
+}
+
+module_exit(xen_blkif_fini);
+
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("xen-backend:vbd");
diff --git a/drivers/block/xen-blkback/blkback_fi.c b/drivers/block/xen-blkback/blkback_fi.c
new file mode 100644
index 0000000..b7abaea
--- /dev/null
+++ b/drivers/block/xen-blkback/blkback_fi.c
@@ -0,0 +1,116 @@
+/*
+ * Fault injection interface for Xen virtual block devices
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/slab.h>
+#include <linux/debugfs.h>
+
+#include <xen/fault_inject.h>
+
+#include "blkback_fi.h"
+#include "common.h"
+
+static struct dentry *blkif_fi_dir;
+
+static const char *xen_blkif_fi_names[XENBLKIF_FI_MAX] = {
+};
+
+struct xen_blkif_fi {
+	struct dentry *dir;
+	struct xen_fi *faults[XENBLKIF_FI_MAX];
+};
+
+int xen_blkbk_fi_init(void)
+{
+	blkif_fi_dir = xen_fi_dir_create("xen-blkback");
+	if (!blkif_fi_dir)
+		return -ENOMEM;
+	return 0;
+}
+
+void xen_blkbk_fi_fini(void)
+{
+	debugfs_remove_recursive(blkif_fi_dir);
+}
+
+void xen_blkif_fi_fini(struct xen_blkif *blkif)
+{
+	struct xen_blkif_fi *bfi = blkif->fi_info;
+	int fi;
+
+	if (!blkif->fi_info)
+		return;
+
+	blkif->fi_info = NULL;
+
+	for (fi = 0; fi < XENBLKIF_FI_MAX; fi++)
+		xen_fi_del(bfi->faults[fi]);
+	debugfs_remove_recursive(bfi->dir);
+	kfree(bfi);
+}
+
+int xen_blkif_fi_init(struct xen_blkif *blkif)
+{
+	struct xen_blkif_fi *bfi;
+	int fi, err = -ENOMEM;
+
+	bfi = kmalloc(sizeof(*bfi), GFP_KERNEL);
+	if (!bfi)
+		return -ENOMEM;
+
+	bfi->dir = debugfs_create_dir(dev_name(&blkif->be->dev->dev),
+				      blkif_fi_dir);
+	if (!bfi->dir)
+		goto err_dir;
+
+	for (fi = 0; fi < XENBLKIF_FI_MAX; fi++) {
+		bfi->faults[fi] = xen_fi_dir_add(bfi->dir,
+						 xen_blkif_fi_names[fi]);
+		if (!bfi->faults[fi])
+			goto err_fault;
+	}
+
+	blkif->fi_info = bfi;
+	return 0;
+
+err_fault:
+	for (; fi > 0; fi--)
+		xen_fi_del(bfi->faults[fi]);
+	debugfs_remove_recursive(bfi->dir);
+err_dir:
+	kfree(bfi);
+	return err;
+}
+
+bool xenblkif_should_fail(struct xen_blkif *blkif, xen_blkbk_fi_t type)
+{
+	struct xen_blkif_fi *bfi = blkif->fi_info;
+
+	return xen_should_fail(bfi->faults[type]);
+}
diff --git a/drivers/block/xen-blkback/blkback_fi.h b/drivers/block/xen-blkback/blkback_fi.h
new file mode 100644
index 0000000..4e29850
--- /dev/null
+++ b/drivers/block/xen-blkback/blkback_fi.h
@@ -0,0 +1,37 @@
+#ifndef _XEN_BLKBACK_FI_H
+#define _XEN_BLKBACK_FI_H
+
+struct xen_fi;
+struct xen_blkif;
+
+typedef enum {
+	XENBLKIF_FI_MAX
+} xen_blkbk_fi_t;
+
+#ifdef CONFIG_XEN_BLKDEV_BACKEND_FAULT_INJECTION
+
+void xen_blkbk_fi_fini(void);
+int xen_blkbk_fi_init(void);
+
+void xen_blkif_fi_fini(struct xen_blkif *blkif);
+int xen_blkif_fi_init(struct xen_blkif *blkif);
+
+bool xenblkif_should_fail(struct xen_blkif *blkif, xen_blkbk_fi_t type);
+
+#else
+
+static inline void xen_blkbk_fi_fini(void) { }
+static inline int xen_blkbk_fi_init(void) { return 0; }
+
+static inline void xen_blkif_fi_fini(struct xen_blkif *blkif) { }
+static inline int xen_blkif_fi_init(struct xen_blkif *blkif) { return 0; }
+
+static inline bool xenblkif_should_fail(struct xen_blkif *blkif, xen_blkbk_fi_t type)
+{
+	return false;
+}
+
+#endif /* CONFIG_XEN_BLKDEV_BACKEND_FAULT_INJECTION */
+
+#endif /* _XEN_BLKBACK_FI_H */
+
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index ecb35fe..6d12d4d 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -329,6 +329,9 @@ struct xen_blkif {
 	/* All rings for this device. */
 	struct xen_blkif_ring	*rings;
 	unsigned int		nr_rings;
+#ifdef CONFIG_XEN_BLKDEV_BACKEND_FAULT_INJECTION
+	void			*fi_info;
+#endif
 };
 
 struct seg_buf {
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 21c1be1..9931fc8 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -22,6 +22,7 @@
 #include <xen/events.h>
 #include <xen/grant_table.h>
 #include "common.h"
+#include "blkback_fi.h"
 
 /* On the XenBus the max length of 'ring-ref%u'. */
 #define RINGREF_NAME_LEN (20)
@@ -246,6 +247,8 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 	unsigned int j, r;
 	bool busy = false;
 
+	xen_blkif_fi_fini(blkif);
+
 	for (r = 0; r < blkif->nr_rings; r++) {
 		struct xen_blkif_ring *ring = &blkif->rings[r];
 		unsigned int i = 0;
@@ -998,6 +1001,8 @@ static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
 		return err;
 	}
 
+	(void) xen_blkif_fi_init(blkif);
+
 	return 0;
 
 fail:

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply related

* Re: [PATCH RFC iptables] iptables: Per-net ns lock
From: Florian Westphal @ 2018-04-20 10:50 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: netdev, pablo, rstoyanov1, fw, ptikhomirov, avagin
In-Reply-To: <5708be4f-d8a1-fb94-67a7-3b447036be80@virtuozzo.com>

Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> Pablo, Florian, could you please provide comments on this?
> 
> On 09.04.2018 19:55, Kirill Tkhai wrote:
> > In CRIU and LXC-restore we met the situation,
> > when iptables in container can't be restored
> > because of permission denied:
> > 
> > https://github.com/checkpoint-restore/criu/issues/469
> > 
> > Containers want to restore their own net ns,
> > while they may have no their own mnt ns.
> > This case they share host's /run/xtables.lock
> > file, but they may not have permission to open
> > it.
> > 
> > Patch makes /run/xtables.lock to be per-namespace,
> > i.e., to refer to the caller task's net ns.

It looks ok to me but then again the entire userspace
lock thing is a ugly band aid :-/

^ permalink raw reply

* Re: [PATCH 1/3] xen: add generic fault injection facility
From: Juergen Gross @ 2018-04-20 10:59 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <20180420104726.17823.40147.stgit@dev-dsk-staskins-1a-ca5afbf2.eu-west-1.amazon.com>

On 20/04/18 12:47, Stanislav Kinsburskii wrote:
> The overall idea of this patch is to add a generic fault injection facility
> to Xen, which later can be used in various places by different Xen parts.
> 
> Core implementation ideas:
> 
> - The facility build is controlled by boolean config
>   CONFIG_XEN_FAULT_INJECTION option ("N" by default).
> 
> - All fault injection logic is located in an optionally compiled separated
>   file.
> 
> - Fault injection attribute and control directory creation and destruction
>   are wrapped with helpers, producing and accepting a pointer to an opaque
>   object thus making all the rest of code independent on fault injection
>   engine.
> 
> When enabled Xen root fault injection directory appears:
> 
> - /sys/kernel/debug/xen/fault_inject/
> 
> The falicity provides the following helpers (exported to be accessible in
> modules):
> 
> - xen_fi_add(name) - adds fault injection control directory "name" to Xen
>   root fault injection directory
> 
> - xen_fi_dir_create(name) - allows to create a subdirectory "name" in Xen
>   root fault injection directory.
> 
> - xen_fi_dir_add(dir, name) - adds fault injection control directory "name"
>   to directory "dir"
> 
> - xen_should_fail(fi) - check whether fi hav to fail.
> 
> Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Juergen Gross <jgross@suse.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: x86@kernel.org
> CC: xen-devel@lists.xenproject.org
> CC: linux-kernel@vger.kernel.org
> CC: Stanislav Kinsburskii <staskins@amazon.com>
> CC: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/xen/Kconfig        |    7 +++
>  arch/x86/xen/Makefile       |    1 
>  arch/x86/xen/fault_inject.c |  109 +++++++++++++++++++++++++++++++++++++++++++
>  include/xen/fault_inject.h  |   45 ++++++++++++++++++
>  4 files changed, 162 insertions(+)
>  create mode 100644 arch/x86/xen/fault_inject.c
>  create mode 100644 include/xen/fault_inject.h
> 
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index c1f98f3..483fc16 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -77,3 +77,10 @@ config XEN_PVH
>  	bool "Support for running as a PVH guest"
>  	depends on XEN && XEN_PVHVM && ACPI
>  	def_bool n
> +
> +config XEN_FAULT_INJECTION
> +	bool "Enable Xen fault injection"
> +	depends on FAULT_INJECTION_DEBUG_FS
> +	default n
> +	help
> +	  Enable Xen fault injection facility

Why for x86 only? I'd rather add this under drivers/xen

> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index d83cb54..3158fe1 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -34,3 +34,4 @@ obj-$(CONFIG_XEN_DOM0)		+= vga.o
>  obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
>  obj-$(CONFIG_XEN_EFI)		+= efi.o
>  obj-$(CONFIG_XEN_PVH)	 	+= xen-pvh.o
> +obj-$(CONFIG_XEN_FAULT_INJECTION)	+= fault_inject.o
> diff --git a/arch/x86/xen/fault_inject.c b/arch/x86/xen/fault_inject.c
> new file mode 100644
> index 0000000..ecf0f7c
> --- /dev/null
> +++ b/arch/x86/xen/fault_inject.c
> @@ -0,0 +1,109 @@
> +/*
> + * Fauit injection interface for Xen virtual block devices
> + *
> + * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:

Please use the appropriate SPDX header instead of the full GPL2
boilerplate.


Juergen

^ permalink raw reply

* [PATCH net-next] tun: do not compute the rxhash, if not needed
From: Paolo Abeni @ 2018-04-20 11:18 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Jason Wang

Currently, the tun driver, in absence of an eBPF steering program,
always compute the rxhash in its rx path, even when such value
is later unused due to additional checks (

This changeset moves the all the related checks just before the
__skb_get_hash_symmetric(), so that the latter is no more computed
when unneeded.

Also replace an unneeded RCU section with rcu_access_pointer().

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 drivers/net/tun.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1e58be152d5c..091ace726763 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -525,11 +525,6 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
 
 	rcu_read_lock();
 
-	/* We may get a very small possibility of OOO during switching, not
-	 * worth to optimize.*/
-	if (tun->numqueues == 1 || tfile->detached)
-		goto unlock;
-
 	e = tun_flow_find(head, rxhash);
 	if (likely(e)) {
 		/* TODO: keep queueing to old queue until it's empty? */
@@ -548,7 +543,6 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
 		spin_unlock_bh(&tun->lock);
 	}
 
-unlock:
 	rcu_read_unlock();
 }
 
@@ -1937,10 +1931,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		rcu_read_unlock();
 	}
 
-	rcu_read_lock();
-	if (!rcu_dereference(tun->steering_prog))
+	/* Compute the costly rx hash only if needed for flow updates.
+	 * We may get a very small possibility of OOO during switching, not
+	 * worth to optimize.
+	 */
+	if (!rcu_access_pointer(tun->steering_prog) && tun->numqueues > 1 &&
+	    !tfile->detached)
 		rxhash = __skb_get_hash_symmetric(skb);
-	rcu_read_unlock();
 
 	if (frags) {
 		/* Exercise flow dissector code path. */
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH RFC iptables] iptables: Per-net ns lock
From: Kirill Tkhai @ 2018-04-20 11:24 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, pablo, rstoyanov1, ptikhomirov, avagin
In-Reply-To: <20180420105034.7qwwr3lrh7w7xlh4@breakpoint.cc>

Hi, Florian,

On 20.04.2018 13:50, Florian Westphal wrote:
> Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>> Pablo, Florian, could you please provide comments on this?
>>
>> On 09.04.2018 19:55, Kirill Tkhai wrote:
>>> In CRIU and LXC-restore we met the situation,
>>> when iptables in container can't be restored
>>> because of permission denied:
>>>
>>> https://github.com/checkpoint-restore/criu/issues/469
>>>
>>> Containers want to restore their own net ns,
>>> while they may have no their own mnt ns.
>>> This case they share host's /run/xtables.lock
>>> file, but they may not have permission to open
>>> it.
>>>
>>> Patch makes /run/xtables.lock to be per-namespace,
>>> i.e., to refer to the caller task's net ns.
> 
> It looks ok to me but then again the entire userspace
> lock thing is a ugly band aid :-/

I'm agree, but I'm not sure there is a possibility
to go away from it in classic iptables...

Kirill

^ permalink raw reply

* Re: [PATCH 2/3] xen netback: add fault injection facility
From: Juergen Gross @ 2018-04-20 11:25 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <20180420104731.17823.97617.stgit@dev-dsk-staskins-1a-ca5afbf2.eu-west-1.amazon.com>

On 20/04/18 12:47, Stanislav Kinsburskii wrote:
> This patch adds wrapper helpers around generic Xen fault inject facility.
> The major reason is to keep all the module fault injection directories
> in a dedicated subdirectory instead of Xen fault inject root.
> 
> IOW, when using these helpers, per-device and named by device name fault
> injection control directories will appear under the following directory:
> - /sys/kernel/debug/xen/fault_inject/xen-netback/
> instead of:
> - /sys/kernel/debug/xen/fault_inject/
> 
> Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Paul Durrant <paul.durrant@citrix.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: Matteo Croce <mcroce@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> CC: Gerard Garcia <ggarcia@deic.uab.cat>
> CC: David Ahern <dsa@cumulusnetworks.com>
> CC: Juergen Gross <jgross@suse.com>
> CC: Amir Levy <amir.jer.levy@intel.com>
> CC: Jakub Kicinski <jakub.kicinski@netronome.com>
> CC: linux-kernel@vger.kernel.org
> CC: netdev@vger.kernel.org
> CC: xen-devel@lists.xenproject.org
> CC: Stanislav Kinsburskii <staskins@amazon.com>
> CC: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  drivers/net/Kconfig                  |    8 ++
>  drivers/net/xen-netback/Makefile     |    1 
>  drivers/net/xen-netback/common.h     |    3 +
>  drivers/net/xen-netback/netback.c    |    3 +
>  drivers/net/xen-netback/netback_fi.c |  119 ++++++++++++++++++++++++++++++++++
>  drivers/net/xen-netback/netback_fi.h |   35 ++++++++++
>  drivers/net/xen-netback/xenbus.c     |    6 ++
>  7 files changed, 175 insertions(+)
>  create mode 100644 drivers/net/xen-netback/netback_fi.c
>  create mode 100644 drivers/net/xen-netback/netback_fi.h
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 8918466..5cc9acd 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -465,6 +465,14 @@ config XEN_NETDEV_BACKEND
>  	  compile this driver as a module, chose M here: the module
>  	  will be called xen-netback.
>  
> +config XEN_NETDEV_BACKEND_FAULT_INJECTION
> +	  bool "Xen net-device backend driver fault injection"
> +	  depends on XEN_NETDEV_BACKEND
> +	  depends on XEN_FAULT_INJECTION
> +	  default n
> +	  help
> +	    Allow to inject errors to Xen backend network driver
> +
>  config VMXNET3
>  	tristate "VMware VMXNET3 ethernet driver"
>  	depends on PCI && INET
> diff --git a/drivers/net/xen-netback/Makefile b/drivers/net/xen-netback/Makefile
> index d49798a..28abcdc 100644
> --- a/drivers/net/xen-netback/Makefile
> +++ b/drivers/net/xen-netback/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
>  
>  xen-netback-y := netback.o xenbus.o interface.o hash.o rx.o
> +xen-netback-$(CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION) += netback_fi.o
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index a46a1e9..30d676d 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -286,6 +286,9 @@ struct xenvif {
>  
>  #ifdef CONFIG_DEBUG_FS
>  	struct dentry *xenvif_dbg_root;
> +#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
> +	void *fi_info;
> +#endif
>  #endif
>  
>  	struct xen_netif_ctrl_back_ring ctrl;
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index a27daa2..ecc416e 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -33,6 +33,7 @@
>   */
>  
>  #include "common.h"
> +#include "netback_fi.h"
>  
>  #include <linux/kthread.h>
>  #include <linux/if_vlan.h>
> @@ -1649,6 +1650,7 @@ static int __init netback_init(void)
>  			PTR_ERR(xen_netback_dbg_root));
>  #endif /* CONFIG_DEBUG_FS */
>  
> +	(void) xen_netbk_fi_init();

This is the only usage of xen_netbk_fi_init(). Why don't you make it
return void from the beginning?

>  	return 0;
>  
>  failed_init:
> @@ -1659,6 +1661,7 @@ module_init(netback_init);
>  
>  static void __exit netback_fini(void)
>  {
> +	xen_netbk_fi_fini();
>  #ifdef CONFIG_DEBUG_FS
>  	if (!IS_ERR_OR_NULL(xen_netback_dbg_root))
>  		debugfs_remove_recursive(xen_netback_dbg_root);
> diff --git a/drivers/net/xen-netback/netback_fi.c b/drivers/net/xen-netback/netback_fi.c
> new file mode 100644
> index 0000000..47541d0
> --- /dev/null
> +++ b/drivers/net/xen-netback/netback_fi.c
> @@ -0,0 +1,119 @@
> +/*
> + * Fault injection interface for Xen backend network driver
> + *
> + * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:

SPDX again.

> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this source file (the "Software"), to deal in the Software without
> + * restriction, including without limitation the rights to use, copy, modify,
> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
> + * and to permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "common.h"
> +
> +#include <linux/debugfs.h>
> +
> +#include <xen/fault_inject.h>
> +#include "netback_fi.h"
> +
> +static struct dentry *vif_fi_dir;
> +
> +static const char *xenvif_fi_names[] = {
> +};
> +
> +struct xenvif_fi {
> +	struct dentry *dir;
> +	struct xen_fi *faults[XENVIF_FI_MAX];
> +};
> +
> +int xen_netbk_fi_init(void)
> +{
> +	vif_fi_dir = xen_fi_dir_create("xen-netback");
> +	if (!vif_fi_dir)
> +		return -ENOMEM;
> +	return 0;
> +}
> +
> +void xen_netbk_fi_fini(void)
> +{
> +	debugfs_remove_recursive(vif_fi_dir);
> +}
> +
> +void xenvif_fi_fini(struct xenvif *vif)
> +{
> +	struct xenvif_fi *vfi = vif->fi_info;
> +	int fi;
> +
> +	if (!vif->fi_info)
> +		return;
> +
> +	vif->fi_info = NULL;
> +
> +	for (fi = 0; fi < XENVIF_FI_MAX; fi++)
> +		xen_fi_del(vfi->faults[fi]);
> +	debugfs_remove_recursive(vfi->dir);
> +	kfree(vfi);
> +}
> +
> +int xenvif_fi_init(struct xenvif *vif)
> +{
> +	struct dentry *parent;
> +	struct xenvif_fi *vfi;
> +	int fi, err = -ENOMEM;
> +
> +	parent = vif_fi_dir;
> +	if (!parent)
> +		return -ENOMEM;
> +
> +	vfi = kmalloc(sizeof(*vfi), GFP_KERNEL);
> +	if (!vfi)
> +		return -ENOMEM;
> +
> +	vfi->dir = debugfs_create_dir(vif->dev->name, parent);
> +	if (!vfi->dir)
> +		goto err_dir;
> +
> +	for (fi = 0; fi < XENVIF_FI_MAX; fi++) {
> +		vfi->faults[fi] = xen_fi_dir_add(vfi->dir,
> +				xenvif_fi_names[fi]);

How does this work? xenvif_fi_names[] is an empty array and this is the
only reference to it. Who is allocating the memory for that array?

> +		if (!vfi->faults[fi])
> +			goto err_fault;
> +	}
> +
> +	vif->fi_info = vfi;
> +	return 0;
> +
> +err_fault:
> +	for (; fi > 0; fi--)
> +		xen_fi_del(vfi->faults[fi]);

What about vfi->faults[0] ?

> +	debugfs_remove_recursive(vfi->dir);
> +err_dir:
> +	kfree(vfi);
> +	return err;
> +}
> +
> +bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type)
> +{
> +	struct xenvif_fi *vfi = vif->fi_info;
> +
> +	return xen_should_fail(vfi->faults[type]);
> +}
> diff --git a/drivers/net/xen-netback/netback_fi.h b/drivers/net/xen-netback/netback_fi.h
> new file mode 100644
> index 0000000..895c6a6
> --- /dev/null
> +++ b/drivers/net/xen-netback/netback_fi.h
> @@ -0,0 +1,35 @@
> +#ifndef _XEN_NETBACK_FI_H
> +#define _XEN_NETBACK_FI_H
> +
> +struct xen_fi;

Why?

> +
> +typedef enum {
> +	XENVIF_FI_MAX
> +} xenvif_fi_t;

It would have helped if you had added some users of the stuff you are
adding here. This enum just looks weird this way.

> +
> +#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
> +
> +int xen_netbk_fi_init(void);
> +void xen_netbk_fi_fini(void);
> +
> +void xenvif_fi_fini(struct xenvif *vif);
> +int xenvif_fi_init(struct xenvif *vif);
> +
> +bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type);
> +
> +#else
> +
> +static inline int xen_netbk_fi_init(void) { return 0; }
> +static inline void xen_netbk_fi_fini(void) { }
> +
> +static inline void xenvif_fi_fini(struct xenvif *vif) { }
> +static inline int xenvif_fi_init(struct xenvif *vif) { return 0; }
> +
> +static inline bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type)
> +{
> +	return false;
> +}
> +
> +#endif /* CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION */
> +
> +#endif /* _XEN_NETBACK_FI_H */
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index e1aef25..c775ee0 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -21,6 +21,7 @@
>  #include "common.h"
>  #include <linux/vmalloc.h>
>  #include <linux/rtnetlink.h>
> +#include "netback_fi.h"
>  
>  struct backend_info {
>  	struct xenbus_device *dev;
> @@ -502,6 +503,7 @@ static void backend_disconnect(struct backend_info *be)
>  #ifdef CONFIG_DEBUG_FS
>  		xenvif_debugfs_delif(vif);
>  #endif /* CONFIG_DEBUG_FS */
> +		xenvif_fi_fini(vif);
>  		xenvif_disconnect_data(vif);
>  
>  		/* At this point some of the handlers may still be active
> @@ -1024,6 +1026,10 @@ static void connect(struct backend_info *be)
>  		}
>  	}
>  
> +	err = xenvif_fi_init(be->vif);
> +	if (err)
> +		goto err;
> +
>  #ifdef CONFIG_DEBUG_FS
>  	xenvif_debugfs_addif(be->vif);
>  #endif /* CONFIG_DEBUG_FS */
>

Without any user of that infrastructure I really can't say whether I
want this.


Juergen

^ permalink raw reply

* Re: [PATCH 3/3] xen blkback: add fault injection facility
From: Juergen Gross @ 2018-04-20 11:28 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <20180420104736.17823.42983.stgit@dev-dsk-staskins-1a-ca5afbf2.eu-west-1.amazon.com>

On 20/04/18 12:47, Stanislav Kinsburskii wrote:
> This patch adds wrapper helpers around generic Xen fault inject
> facility.
> The major reason is to keep all the module fault injection directories
> in a dedicated subdirectory instead of Xen fault inject root.
> 
> IOW, when using these helpers, per-device and named by device name
> fault injection control directories will appear under the following
> directory:
> - /sys/kernel/debug/xen/fault_inject/xen-blkback/
> instead of:
> - /sys/kernel/debug/xen/fault_inject/
> 
> Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: "Roger Pau Monné" <roger.pau@citrix.com>
> CC: linux-kernel@vger.kernel.org
> CC: linux-block@vger.kernel.org
> CC: xen-devel@lists.xenproject.org
> CC: Stanislav Kinsburskii <staskins@amazon.com>
> CC: David Woodhouse <dwmw@amazon.co.uk>

This is an exact copy of the netback patch apart from the names.

I don't like adding multiple copies of the same coding to the tree.


Juergen

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox