Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [CFT][PATCH] userns: Avoid problems with negative groups
From: Eric W. Biederman @ 2014-11-28 16:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man,
	linux-kernel@vger.kernel.org, LSM, Casey Schaufler,
	Serge E. Hallyn, Richard Weinberger
In-Reply-To: <CALCETrX2s-7iaLMEKLQsExTEp3JyoAPQG44p0v5wkeED3-6dQA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Thu, Nov 27, 2014 at 9:21 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>>
>>>> This change should break userspace by the minimal amount needed
>>>> to fix this issue.
>>>>
>>>> This should fix CVE-2014-8989.
>>>
>>> I think this is both unnecessarily restrictive and that it doesn't fix
>>> the bug.
>>
>> You are going to have to work very hard to convince me this is
>> unnecessarily restrictive.
>>
>>>For example, I can exploit CVE-2014-8989 without ever
>>> writing a uid map or a gid map.
>>
>> Yes.  I realized just after I sent the patch that setgroups(0, NULL)
>> would still work without a mapping set.  That is a first glass grade a
>> oversight that resulted in a bug.  None of the other uid or gid changing
>> syscalls without a mapping set, and setgroups was just overlooked
>> because it was different.  Oops.
>>
>> I will send an updated patch that stops setgroups from working without
>> a mapping set shortly.
>>
>>> IIUC, the only real issue is that user namespaces allow groups to be
>>> dropped using setgroups that wouldn't otherwise be dropped.  Can we
>>> get away with adding a per-user-ns flag that determines whether
>>> setgroups can be used?
>>
>> Being able to call setgroups is fundamental to login programs, and login
>> programs are one of the things user namespaces need to support.  So
>> adding an extra flag and an extra place where privilege is required
>> is just noise, and will wind up breaking every user of user namespaces.
>>
>> Further being able to setup uid and gid mappings without privilege is
>> primarily a nice to have.  The original design did not have unprivileged
>> setting of uid and gid maps and if it proves insecure I goofed and the
>> feature isn't safe so it needs to be removed.
>
> Being able to set a single-user uid map and gid map is very useful for
> sandboxing.  This lets unprivileged users drop filesystem and network
> access and still run most normal programs.  A surprising number of
> normal unprivileged programs fail if run without a mapping.

You can still set a single uid map unprivileged.  That should be enough
to keep your capabilities inside the namespace as long as you need them.

I am sad that in practice you can't set a single gid map, as everyone
calls setgroups.

Although I sort of think it might be worth scouring userspace for
something that will call setgroups and drop all of your groups.  If we
can find something preexisting that will solve this entire mess in a
much more elegant way.

>> This does mean that running a system with negative groups and users
>> delegated subordinate gids in /etc/subuid is a bad idea and system
>> administrators shouldn't do that as those negative groups won't prove
>> effective in stopping their users.  But this is all under system
>> administrator control so shrug.  There isn't a way to avoid that
>> fundamental conflict.
>>
>>> setgroups would be unusable until the gid_map has been written and
>>> then it would become usable if and only if the parent userns could use
>>> setgroups and the opener of gid_map was privileged.
>>
>> That proposal sounds a lot more restrictive and a lot more of a pain
>> to use than what I have implemented in my patch.
>>
>>> If we wanted to allow finer-grained control, we could allow writing
>>> control lines like:
>>>
>>> options +setgroups
>>>
>>> or
>>>
>>> options -setgroups
>>>
>>> in gid_map, or we could add user_ns_flags that can only be written
>>> once and only before either uid_map or gid_map is written.
>>
>> Definitely more complicated and I can't imagine a case where I need
>> a gid map without needing to call setgroups.
>
> I do it all the time.  Unshare, set mappings (with no inner uid 0 at
> all), set no_new_privs, drop caps, and go.
>
> Can we try the intermediate approach?  If you set gid_map without
> privilege and you have supplementary groups, then let the write to
> gid_map succeed but prevent setgroups from ever working?  That should
> only be a couple of lines of code longer than your patch, and it will
> avoid breaking sandbox use cases.

I am torn.  Send me an incremental patch and I will be happy to evaluate
it and if all is good fold the change in.  I hate breaking userspace but
if I break it I would rather it be a clean break that is easy to spot
and work around rather than something that almost works, and causes
people a lot of difficulty debugging.

My expectation is that systems that are serious will have /etc/subuid
and /etc/subgid and newuidmap and newgidmap setup.  Which is the other
way to allow you to map your own gid.

> If we want to get really fancy in the future, we could have a concept
> of pinned groups.  That is, if you're in a userns and you're a member
> of an unmapped group, then you can't drop that group.  (Actually, that
> all by itself might be enough to fix this issue.)

Not allowing you to drop groups really isn't enough.  One of the first
things applications like ssh do is call setgroups(0, NULL) to drop the
privileges granted by supplementary groups.  Further login program
somewhere call setgroups(N, ....) and give you every group mapped
by /etc/group.

I don't think I want to run containers with every supplementary group I
might want to drop mapped, and more than that, it would require changing
a lot more userspace than just the userspace that just does unpriv
containers with a single uid, and a single gid mapped.

But please test and see if you really need to map your group, and send
me an incremental patch if you see a way to do better.  Breaking
userspace sucks.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [CFT][PATCH] userns: Avoid problems with negative groups
From: Andy Lutomirski @ 2014-11-28 15:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger
In-Reply-To: <87wq6frjcw.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Thu, Nov 27, 2014 at 9:21 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
>>> This change should break userspace by the minimal amount needed
>>> to fix this issue.
>>>
>>> This should fix CVE-2014-8989.
>>
>> I think this is both unnecessarily restrictive and that it doesn't fix
>> the bug.
>
> You are going to have to work very hard to convince me this is
> unnecessarily restrictive.
>
>>For example, I can exploit CVE-2014-8989 without ever
>> writing a uid map or a gid map.
>
> Yes.  I realized just after I sent the patch that setgroups(0, NULL)
> would still work without a mapping set.  That is a first glass grade a
> oversight that resulted in a bug.  None of the other uid or gid changing
> syscalls without a mapping set, and setgroups was just overlooked
> because it was different.  Oops.
>
> I will send an updated patch that stops setgroups from working without
> a mapping set shortly.
>
>> IIUC, the only real issue is that user namespaces allow groups to be
>> dropped using setgroups that wouldn't otherwise be dropped.  Can we
>> get away with adding a per-user-ns flag that determines whether
>> setgroups can be used?
>
> Being able to call setgroups is fundamental to login programs, and login
> programs are one of the things user namespaces need to support.  So
> adding an extra flag and an extra place where privilege is required
> is just noise, and will wind up breaking every user of user namespaces.
>
> Further being able to setup uid and gid mappings without privilege is
> primarily a nice to have.  The original design did not have unprivileged
> setting of uid and gid maps and if it proves insecure I goofed and the
> feature isn't safe so it needs to be removed.

Being able to set a single-user uid map and gid map is very useful for
sandboxing.  This lets unprivileged users drop filesystem and network
access and still run most normal programs.  A surprising number of
normal unprivileged programs fail if run without a mapping.

>
> This does mean that running a system with negative groups and users
> delegated subordinate gids in /etc/subuid is a bad idea and system
> administrators shouldn't do that as those negative groups won't prove
> effective in stopping their users.  But this is all under system
> administrator control so shrug.  There isn't a way to avoid that
> fundamental conflict.
>
>> setgroups would be unusable until the gid_map has been written and
>> then it would become usable if and only if the parent userns could use
>> setgroups and the opener of gid_map was privileged.
>
> That proposal sounds a lot more restrictive and a lot more of a pain
> to use than what I have implemented in my patch.
>
>> If we wanted to allow finer-grained control, we could allow writing
>> control lines like:
>>
>> options +setgroups
>>
>> or
>>
>> options -setgroups
>>
>> in gid_map, or we could add user_ns_flags that can only be written
>> once and only before either uid_map or gid_map is written.
>
> Definitely more complicated and I can't imagine a case where I need
> a gid map without needing to call setgroups.

I do it all the time.  Unshare, set mappings (with no inner uid 0 at
all), set no_new_privs, drop caps, and go.

Can we try the intermediate approach?  If you set gid_map without
privilege and you have supplementary groups, then let the write to
gid_map succeed but prevent setgroups from ever working?  That should
only be a couple of lines of code longer than your patch, and it will
avoid breaking sandbox use cases.

If we want to get really fancy in the future, we could have a concept
of pinned groups.  That is, if you're in a userns and you're a member
of an unmapped group, then you can't drop that group.  (Actually, that
all by itself might be enough to fix this issue.)

--Andy

>
> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Will Deacon @ 2014-11-28 15:09 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, andrew@lunn.ch, heiko@sntech.de,
	gnomes@lxorguk.ukuu.org.uk, Chunyan Zhang, jslaby@suse.cz,
	jason@lakedaemon.net, lanqing.liu@spreadtrum.com, Pawel Moll,
	corbet@lwn.net, zhang.lyra@gmail.com,
	zhizhou.zhang@spreadtrum.com, geng.ren@spreadtrum.com,
	m-karicheri2@ti.com, shawn.guo@freescale.com,
	linux-serial@vger.kernel.org, grant.likely@linaro.org,
	orsonzhai@gmail.com,
	"florian.vaussard@epfl.ch" <flori>
In-Reply-To: <20141128150325.GC24370@e104818-lin.cambridge.arm.com>

On Fri, Nov 28, 2014 at 03:03:26PM +0000, Catalin Marinas wrote:
> On Fri, Nov 28, 2014 at 02:44:12PM +0000, Will Deacon wrote:
> > On Fri, Nov 28, 2014 at 02:35:32PM +0000, Mark Rutland wrote:
> > > On Fri, Nov 28, 2014 at 02:29:13PM +0000, Catalin Marinas wrote:
> > > > On Thu, Nov 27, 2014 at 01:43:09PM +0000, Mark Rutland wrote:
> > > > > On Thu, Nov 27, 2014 at 12:12:15PM +0000, Catalin Marinas wrote:
> > > > > > On Thu, Nov 27, 2014 at 11:50:43AM +0000, Mark Rutland wrote:
> > > > > > > On Tue, Nov 25, 2014 at 12:16:56PM +0000, Chunyan Zhang wrote:
> > > > > > > > +
> > > > > > > > +       timer {
> > > > > > > > +               compatible = "arm,armv8-timer";
> > > > > > > > +               interrupts = <1 13 0xff01>,
> > > > > > > > +                            <1 14 0xff01>,
> > > > > > > > +                            <1 11 0xff01>,
> > > > > > > > +                            <1 10 0xff01>;
> > > > > > > > +               clock-frequency = <26000000>;
> > > > > > > 
> > > > > > > Please remove the clock-frequency property. Your FW should initialise
> > > > > > > CNTFRQ_EL0 on all CPUs (certainly PSCI 0.2 requires that you do this).
> > > > > > 
> > > > > > Since this comes up regularly, I think we need a dev_warn() in the arch
> > > > > > timer driver when CONFIG_ARM64.
> > > > > 
> > > > > I'll ack such a patch ;)
> > > > 
> > > > How rude would this be?
> > > > 
> > > > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > > > index 2133f9d59d06..aaaf3433ccb9 100644
> > > > --- a/drivers/clocksource/arm_arch_timer.c
> > > > +++ b/drivers/clocksource/arm_arch_timer.c
> > > > @@ -371,7 +371,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
> > > > 		return;
> > > >  
> > > > 	/* Try to determine the frequency from the device tree or CNTFRQ */
> > > > -	if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > > +	if (IS_ENABLED(CONFIG_ARM64) ||
> > > > +	    of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > > 		if (cntbase)
> > > > 			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
> > > > 		else
> > > >
> > > 
> > > Probably too rude, given it doesn't WARN() the user.
> > 
> > We override broken hardware ID registers all the time in device-tree without
> > dumping stack. Why is this any different?
> 
> I'm not for dumping the stack, it's not relevant (just more noise).
> 
> > > We should be extremely loud if we see the clock-frequency property on an
> > > arm64 system. Whether or not we should ignore the property is another
> > > matter.
> > 
> > I don't really see the point in ignoring it. We will see broken hardware
> > [1] and this is just preventing ourselves from working around it. I'd much
> > rather have arch-timers with a "clock-frequence" property than have some
> > other timer instead because the kernel driver is being stubborn.
> 
> I agree that sooner or later we'll need a workaround (we already did for
> Juno). My point is that many consider such overriding behaviour to be
> the default - i.e. don't bother writing any sane value in CNTFRQ in
> firmware at boot because Linux can cope without. It gets worse when
> companies develop their firmware long before starting to upstream kernel
> patches, so too late to fix it.
> 
> > [1] A previous version of the Juno firmware, for example.
> 
> What about CONFIG_BROKEN_FIRMWARE, default off?

I'd rather have a `firmware test' module, which could be as noisy as it
likes when it finds issues like this. It could also do things like fuzz the
PSCI interface.

> In the meantime I think we can be more tolerant:
> 
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 2133f9d59d06..87f67a93fcc7 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -376,6 +376,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
>  			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
>  		else
>  			arch_timer_rate = arch_timer_get_cntfrq();
> +	} else if (IS_ENABLED(CONFIG_ARM64)) {
> +		pr_warn("Architected timer frequency overridden by DT (broken firmware?)\n");
>  	}

That looks sensible. It would be interesting to print the value of CNTFRQ
too.

Will

^ permalink raw reply

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Catalin Marinas @ 2014-11-28 15:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, andrew-g2DYL2Zd6BY@public.gmane.org,
	heiko-4mtYJXux2i+zQB+pC5nmwQ@public.gmane.org,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
	Chunyan Zhang, jslaby-AlSwsSmVLrQ@public.gmane.org,
	jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
	lanqing.liu-lxIno14LUO0EEoCn2XhGlw@public.gmane.org, Pawel Moll,
	corbet-T1hC0tSOHrs@public.gmane.org,
	zhang.lyra-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	zhizhou.zhang-lxIno14LUO0EEoCn2XhGlw@public.gmane.org,
	geng.ren-lxIno14LUO0EEoCn2XhGlw@public.gmane.org,
	m-karicheri2-l0cyMroinI0@public.gmane.org,
	shawn.guo-KZfg59tc24xl57MIdRCFDg@public.gmane.org,
	linux-serial-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
	orsonzhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	"florian.vaussard-p8DiymsW2f8@public.gmane.org" <flori>
In-Reply-To: <20141128144412.GG7144-5wv7dgnIgG8@public.gmane.org>

On Fri, Nov 28, 2014 at 02:44:12PM +0000, Will Deacon wrote:
> On Fri, Nov 28, 2014 at 02:35:32PM +0000, Mark Rutland wrote:
> > On Fri, Nov 28, 2014 at 02:29:13PM +0000, Catalin Marinas wrote:
> > > On Thu, Nov 27, 2014 at 01:43:09PM +0000, Mark Rutland wrote:
> > > > On Thu, Nov 27, 2014 at 12:12:15PM +0000, Catalin Marinas wrote:
> > > > > On Thu, Nov 27, 2014 at 11:50:43AM +0000, Mark Rutland wrote:
> > > > > > On Tue, Nov 25, 2014 at 12:16:56PM +0000, Chunyan Zhang wrote:
> > > > > > > +
> > > > > > > +       timer {
> > > > > > > +               compatible = "arm,armv8-timer";
> > > > > > > +               interrupts = <1 13 0xff01>,
> > > > > > > +                            <1 14 0xff01>,
> > > > > > > +                            <1 11 0xff01>,
> > > > > > > +                            <1 10 0xff01>;
> > > > > > > +               clock-frequency = <26000000>;
> > > > > > 
> > > > > > Please remove the clock-frequency property. Your FW should initialise
> > > > > > CNTFRQ_EL0 on all CPUs (certainly PSCI 0.2 requires that you do this).
> > > > > 
> > > > > Since this comes up regularly, I think we need a dev_warn() in the arch
> > > > > timer driver when CONFIG_ARM64.
> > > > 
> > > > I'll ack such a patch ;)
> > > 
> > > How rude would this be?
> > > 
> > > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > > index 2133f9d59d06..aaaf3433ccb9 100644
> > > --- a/drivers/clocksource/arm_arch_timer.c
> > > +++ b/drivers/clocksource/arm_arch_timer.c
> > > @@ -371,7 +371,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
> > > 		return;
> > >  
> > > 	/* Try to determine the frequency from the device tree or CNTFRQ */
> > > -	if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > +	if (IS_ENABLED(CONFIG_ARM64) ||
> > > +	    of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > 		if (cntbase)
> > > 			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
> > > 		else
> > >
> > 
> > Probably too rude, given it doesn't WARN() the user.
> 
> We override broken hardware ID registers all the time in device-tree without
> dumping stack. Why is this any different?

I'm not for dumping the stack, it's not relevant (just more noise).

> > We should be extremely loud if we see the clock-frequency property on an
> > arm64 system. Whether or not we should ignore the property is another
> > matter.
> 
> I don't really see the point in ignoring it. We will see broken hardware
> [1] and this is just preventing ourselves from working around it. I'd much
> rather have arch-timers with a "clock-frequence" property than have some
> other timer instead because the kernel driver is being stubborn.

I agree that sooner or later we'll need a workaround (we already did for
Juno). My point is that many consider such overriding behaviour to be
the default - i.e. don't bother writing any sane value in CNTFRQ in
firmware at boot because Linux can cope without. It gets worse when
companies develop their firmware long before starting to upstream kernel
patches, so too late to fix it.

> [1] A previous version of the Juno firmware, for example.

What about CONFIG_BROKEN_FIRMWARE, default off?

In the meantime I think we can be more tolerant:

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 2133f9d59d06..87f67a93fcc7 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -376,6 +376,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
 			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
 		else
 			arch_timer_rate = arch_timer_get_cntfrq();
+	} else if (IS_ENABLED(CONFIG_ARM64)) {
+		pr_warn("Architected timer frequency overridden by DT (broken firmware?)\n");
 	}
 
 	/* Check the timer frequency. */

^ permalink raw reply related

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Will Deacon @ 2014-11-28 14:59 UTC (permalink / raw)
  To: Mark Rutland
  Cc: gnomes@lxorguk.ukuu.org.uk, heiko@sntech.de, andrew@lunn.ch,
	Catalin Marinas, Chunyan Zhang, jason@lakedaemon.net,
	lanqing.liu@spreadtrum.com, Pawel Moll, corbet@lwn.net,
	zhang.lyra@gmail.com, zhizhou.zhang@spreadtrum.com,
	geng.ren@spreadtrum.com, m-karicheri2@ti.com,
	linux-arm-kernel@lists.infradead.org,
	linux-serial@vger.kernel.org, grant.likely@linaro.org,
	orsonzhai@gmail.com,
	"florian.vaussard@epfl.ch" <florian>
In-Reply-To: <20141128144650.GK25883@leverpostej>

On Fri, Nov 28, 2014 at 02:46:51PM +0000, Mark Rutland wrote:
> On Fri, Nov 28, 2014 at 02:44:12PM +0000, Will Deacon wrote:
> > On Fri, Nov 28, 2014 at 02:35:32PM +0000, Mark Rutland wrote:
> > > On Fri, Nov 28, 2014 at 02:29:13PM +0000, Catalin Marinas wrote:
> > > > On Thu, Nov 27, 2014 at 01:43:09PM +0000, Mark Rutland wrote:
> > > > > On Thu, Nov 27, 2014 at 12:12:15PM +0000, Catalin Marinas wrote:
> > > > > > On Thu, Nov 27, 2014 at 11:50:43AM +0000, Mark Rutland wrote:
> > > > > > > On Tue, Nov 25, 2014 at 12:16:56PM +0000, Chunyan Zhang wrote:
> > > > > > > > +
> > > > > > > > +       timer {
> > > > > > > > +               compatible = "arm,armv8-timer";
> > > > > > > > +               interrupts = <1 13 0xff01>,
> > > > > > > > +                            <1 14 0xff01>,
> > > > > > > > +                            <1 11 0xff01>,
> > > > > > > > +                            <1 10 0xff01>;
> > > > > > > > +               clock-frequency = <26000000>;
> > > > > > > 
> > > > > > > Please remove the clock-frequency property. Your FW should initialise
> > > > > > > CNTFRQ_EL0 on all CPUs (certainly PSCI 0.2 requires that you do this).
> > > > > > 
> > > > > > Since this comes up regularly, I think we need a dev_warn() in the arch
> > > > > > timer driver when CONFIG_ARM64.
> > > > > 
> > > > > I'll ack such a patch ;)
> > > > 
> > > > How rude would this be?
> > > > 
> > > > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > > > index 2133f9d59d06..aaaf3433ccb9 100644
> > > > --- a/drivers/clocksource/arm_arch_timer.c
> > > > +++ b/drivers/clocksource/arm_arch_timer.c
> > > > @@ -371,7 +371,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
> > > > 		return;
> > > >  
> > > > 	/* Try to determine the frequency from the device tree or CNTFRQ */
> > > > -	if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > > +	if (IS_ENABLED(CONFIG_ARM64) ||
> > > > +	    of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > > 		if (cntbase)
> > > > 			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
> > > > 		else
> > > >
> > > 
> > > Probably too rude, given it doesn't WARN() the user.
> > 
> > We override broken hardware ID registers all the time in device-tree without
> > dumping stack. Why is this any different?
> 
> Exposure to guests via KVM, and the fact that it's possible to write to
> the register from EL3. This isn't so much broken HW (which cannot be
> fixed) as broken FW (which can be fixed).

I think we'll see plenty of systems where vendors are reluctant to offer
over-the-air firmware upgrades for issues that can be solved simply by
adding a property to the device-tree.

> Printing the warning gives people the chance to realise and fix the
> issue during bringup.

Print a warning, but don't dump the stack. A simple pr_warn is fine.

Will

^ permalink raw reply

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Mark Brown @ 2014-11-28 14:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Catalin Marinas, andrew@lunn.ch, heiko@sntech.de,
	gnomes@lxorguk.ukuu.org.uk, Chunyan Zhang, jslaby@suse.cz,
	jason@lakedaemon.net, lanqing.liu@spreadtrum.com, Pawel Moll,
	corbet@lwn.net, zhang.lyra@gmail.com,
	zhizhou.zhang@spreadtrum.com, geng.ren@spreadtrum.com,
	m-karicheri2@ti.com, shawn.guo@freescale.com,
	linux-serial@vger.kernel.org, grant.likely@linaro.org,
	orsonzhai@gmail.com
In-Reply-To: <20141128144412.GG7144@arm.com>

[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]

On Fri, Nov 28, 2014 at 02:44:12PM +0000, Will Deacon wrote:
> On Fri, Nov 28, 2014 at 02:35:32PM +0000, Mark Rutland wrote:

> > Probably too rude, given it doesn't WARN() the user.

> We override broken hardware ID registers all the time in device-tree without
> dumping stack. Why is this any different?

I do tend to agree that a WARN() is excessive but given the amount of
pushback on using this property on ARMv8 systems saying something would
be nice, though...

> > We should be extremely loud if we see the clock-frequency property on an
> > arm64 system. Whether or not we should ignore the property is another
> > matter.

> I don't really see the point in ignoring it. We will see broken hardware
> [1] and this is just preventing ourselves from working around it. I'd much
> rather have arch-timers with a "clock-frequence" property than have some
> other timer instead because the kernel driver is being stubborn.

...it is likely that even if there is a warning we'll end up in this
situation.  If the kernel doesn't complain at all it seems totally
reasonable for people to use the feature, but just refusing to allow it
to be used at all doesn't seem like it's actually helping things.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Mark Rutland @ 2014-11-28 14:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, andrew@lunn.ch, heiko@sntech.de,
	gnomes@lxorguk.ukuu.org.uk, Chunyan Zhang, jslaby@suse.cz,
	jason@lakedaemon.net, lanqing.liu@spreadtrum.com, Pawel Moll,
	corbet@lwn.net, zhang.lyra@gmail.com,
	zhizhou.zhang@spreadtrum.com, geng.ren@spreadtrum.com,
	m-karicheri2@ti.com, shawn.guo@freescale.com,
	linux-serial@vger.kernel.org, grant.likely@linaro.org,
	orsonzhai@gmail.com, florian.vaussard
In-Reply-To: <20141128144412.GG7144@arm.com>

On Fri, Nov 28, 2014 at 02:44:12PM +0000, Will Deacon wrote:
> On Fri, Nov 28, 2014 at 02:35:32PM +0000, Mark Rutland wrote:
> > On Fri, Nov 28, 2014 at 02:29:13PM +0000, Catalin Marinas wrote:
> > > On Thu, Nov 27, 2014 at 01:43:09PM +0000, Mark Rutland wrote:
> > > > On Thu, Nov 27, 2014 at 12:12:15PM +0000, Catalin Marinas wrote:
> > > > > On Thu, Nov 27, 2014 at 11:50:43AM +0000, Mark Rutland wrote:
> > > > > > On Tue, Nov 25, 2014 at 12:16:56PM +0000, Chunyan Zhang wrote:
> > > > > > > +
> > > > > > > +       timer {
> > > > > > > +               compatible = "arm,armv8-timer";
> > > > > > > +               interrupts = <1 13 0xff01>,
> > > > > > > +                            <1 14 0xff01>,
> > > > > > > +                            <1 11 0xff01>,
> > > > > > > +                            <1 10 0xff01>;
> > > > > > > +               clock-frequency = <26000000>;
> > > > > > 
> > > > > > Please remove the clock-frequency property. Your FW should initialise
> > > > > > CNTFRQ_EL0 on all CPUs (certainly PSCI 0.2 requires that you do this).
> > > > > 
> > > > > Since this comes up regularly, I think we need a dev_warn() in the arch
> > > > > timer driver when CONFIG_ARM64.
> > > > 
> > > > I'll ack such a patch ;)
> > > 
> > > How rude would this be?
> > > 
> > > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > > index 2133f9d59d06..aaaf3433ccb9 100644
> > > --- a/drivers/clocksource/arm_arch_timer.c
> > > +++ b/drivers/clocksource/arm_arch_timer.c
> > > @@ -371,7 +371,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
> > > 		return;
> > >  
> > > 	/* Try to determine the frequency from the device tree or CNTFRQ */
> > > -	if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > +	if (IS_ENABLED(CONFIG_ARM64) ||
> > > +	    of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > > 		if (cntbase)
> > > 			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
> > > 		else
> > >
> > 
> > Probably too rude, given it doesn't WARN() the user.
> 
> We override broken hardware ID registers all the time in device-tree without
> dumping stack. Why is this any different?

Exposure to guests via KVM, and the fact that it's possible to write to
the register from EL3. This isn't so much broken HW (which cannot be
fixed) as broken FW (which can be fixed).

Printing the warning gives people the chance to realise and fix the
issue during bringup.

Mark.

^ permalink raw reply

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Will Deacon @ 2014-11-28 14:44 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, andrew-g2DYL2Zd6BY@public.gmane.org,
	heiko-4mtYJXux2i+zQB+pC5nmwQ@public.gmane.org,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
	Chunyan Zhang, jslaby-AlSwsSmVLrQ@public.gmane.org,
	jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
	lanqing.liu-lxIno14LUO0EEoCn2XhGlw@public.gmane.org, Pawel Moll,
	corbet-T1hC0tSOHrs@public.gmane.org,
	zhang.lyra-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	zhizhou.zhang-lxIno14LUO0EEoCn2XhGlw@public.gmane.org,
	geng.ren-lxIno14LUO0EEoCn2XhGlw@public.gmane.org,
	m-karicheri2-l0cyMroinI0@public.gmane.org,
	shawn.guo-KZfg59tc24xl57MIdRCFDg@public.gmane.org,
	linux-serial-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
	orsonzhai-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	florian.vaussard-p8DiymsW2f8
In-Reply-To: <20141128143532.GI25883@leverpostej>

On Fri, Nov 28, 2014 at 02:35:32PM +0000, Mark Rutland wrote:
> On Fri, Nov 28, 2014 at 02:29:13PM +0000, Catalin Marinas wrote:
> > On Thu, Nov 27, 2014 at 01:43:09PM +0000, Mark Rutland wrote:
> > > On Thu, Nov 27, 2014 at 12:12:15PM +0000, Catalin Marinas wrote:
> > > > On Thu, Nov 27, 2014 at 11:50:43AM +0000, Mark Rutland wrote:
> > > > > On Tue, Nov 25, 2014 at 12:16:56PM +0000, Chunyan Zhang wrote:
> > > > > > +
> > > > > > +       timer {
> > > > > > +               compatible = "arm,armv8-timer";
> > > > > > +               interrupts = <1 13 0xff01>,
> > > > > > +                            <1 14 0xff01>,
> > > > > > +                            <1 11 0xff01>,
> > > > > > +                            <1 10 0xff01>;
> > > > > > +               clock-frequency = <26000000>;
> > > > > 
> > > > > Please remove the clock-frequency property. Your FW should initialise
> > > > > CNTFRQ_EL0 on all CPUs (certainly PSCI 0.2 requires that you do this).
> > > > 
> > > > Since this comes up regularly, I think we need a dev_warn() in the arch
> > > > timer driver when CONFIG_ARM64.
> > > 
> > > I'll ack such a patch ;)
> > 
> > How rude would this be?
> > 
> > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > index 2133f9d59d06..aaaf3433ccb9 100644
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -371,7 +371,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
> > 		return;
> >  
> > 	/* Try to determine the frequency from the device tree or CNTFRQ */
> > -	if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > +	if (IS_ENABLED(CONFIG_ARM64) ||
> > +	    of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> > 		if (cntbase)
> > 			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
> > 		else
> >
> 
> Probably too rude, given it doesn't WARN() the user.

We override broken hardware ID registers all the time in device-tree without
dumping stack. Why is this any different?

> We should be extremely loud if we see the clock-frequency property on an
> arm64 system. Whether or not we should ignore the property is another
> matter.

I don't really see the point in ignoring it. We will see broken hardware
[1] and this is just preventing ourselves from working around it. I'd much
rather have arch-timers with a "clock-frequence" property than have some
other timer instead because the kernel driver is being stubborn.

Will

[1] A previous version of the Juno firmware, for example.

^ permalink raw reply

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Mark Rutland @ 2014-11-28 14:35 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: andrew@lunn.ch, heiko@sntech.de, gnomes@lxorguk.ukuu.org.uk,
	Chunyan Zhang, Will Deacon, jslaby@suse.cz, jason@lakedaemon.net,
	lanqing.liu@spreadtrum.com, Pawel Moll, corbet@lwn.net,
	zhang.lyra@gmail.com, zhizhou.zhang@spreadtrum.com,
	geng.ren@spreadtrum.com, m-karicheri2@ti.com,
	shawn.guo@freescale.com, linux-serial@vger.kernel.org,
	grant.likely@linaro.org, orsonzhai@gmail.com,
	"florian.vaussard@epfl.ch" <florian.va>
In-Reply-To: <20141128142913.GB5393@localhost>

On Fri, Nov 28, 2014 at 02:29:13PM +0000, Catalin Marinas wrote:
> On Thu, Nov 27, 2014 at 01:43:09PM +0000, Mark Rutland wrote:
> > On Thu, Nov 27, 2014 at 12:12:15PM +0000, Catalin Marinas wrote:
> > > On Thu, Nov 27, 2014 at 11:50:43AM +0000, Mark Rutland wrote:
> > > > On Tue, Nov 25, 2014 at 12:16:56PM +0000, Chunyan Zhang wrote:
> > > > > +
> > > > > +       timer {
> > > > > +               compatible = "arm,armv8-timer";
> > > > > +               interrupts = <1 13 0xff01>,
> > > > > +                            <1 14 0xff01>,
> > > > > +                            <1 11 0xff01>,
> > > > > +                            <1 10 0xff01>;
> > > > > +               clock-frequency = <26000000>;
> > > > 
> > > > Please remove the clock-frequency property. Your FW should initialise
> > > > CNTFRQ_EL0 on all CPUs (certainly PSCI 0.2 requires that you do this).
> > > 
> > > Since this comes up regularly, I think we need a dev_warn() in the arch
> > > timer driver when CONFIG_ARM64.
> > 
> > I'll ack such a patch ;)
> 
> How rude would this be?
> 
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 2133f9d59d06..aaaf3433ccb9 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -371,7 +371,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
> 		return;
>  
> 	/* Try to determine the frequency from the device tree or CNTFRQ */
> -	if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> +	if (IS_ENABLED(CONFIG_ARM64) ||
> +	    of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
> 		if (cntbase)
> 			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
> 		else
>

Probably too rude, given it doesn't WARN() the user.

We should be extremely loud if we see the clock-frequency property on an
arm64 system. Whether or not we should ignore the property is another
matter.

Mark.

^ permalink raw reply

* Re: [PATCH v3 3/5] arm64: dts: Add support for Spreadtrum SC9836 SoC in dts and Makefile
From: Catalin Marinas @ 2014-11-28 14:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: gnomes@lxorguk.ukuu.org.uk, heiko@sntech.de, broonie@linaro.org,
	Will Deacon, andrew@lunn.ch, linux-api@vger.kernel.org,
	Chunyan Zhang, artagnon@gmail.com, lanqing.liu@spreadtrum.com,
	arnd@arndb.de, zhang.lyra@gmail.com, corbet@lwn.net,
	jslaby@suse.cz, zhizhou.zhang@spreadtrum.com,
	geng.ren@spreadtrum.com, m-karicheri2@ti.com,
	linux-arm-kernel@lists.infradead.org,
	linux-serial@vger.kernel.org, grant.li
In-Reply-To: <20141127134309.GJ857@leverpostej>

On Thu, Nov 27, 2014 at 01:43:09PM +0000, Mark Rutland wrote:
> On Thu, Nov 27, 2014 at 12:12:15PM +0000, Catalin Marinas wrote:
> > On Thu, Nov 27, 2014 at 11:50:43AM +0000, Mark Rutland wrote:
> > > On Tue, Nov 25, 2014 at 12:16:56PM +0000, Chunyan Zhang wrote:
> > > > +
> > > > +       timer {
> > > > +               compatible = "arm,armv8-timer";
> > > > +               interrupts = <1 13 0xff01>,
> > > > +                            <1 14 0xff01>,
> > > > +                            <1 11 0xff01>,
> > > > +                            <1 10 0xff01>;
> > > > +               clock-frequency = <26000000>;
> > > 
> > > Please remove the clock-frequency property. Your FW should initialise
> > > CNTFRQ_EL0 on all CPUs (certainly PSCI 0.2 requires that you do this).
> > 
> > Since this comes up regularly, I think we need a dev_warn() in the arch
> > timer driver when CONFIG_ARM64.
> 
> I'll ack such a patch ;)

How rude would this be?

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 2133f9d59d06..aaaf3433ccb9 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -371,7 +371,8 @@ arch_timer_detect_rate(void __iomem *cntbase, struct device_node *np)
		return;
 
	/* Try to determine the frequency from the device tree or CNTFRQ */
-	if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
+	if (IS_ENABLED(CONFIG_ARM64) ||
+	    of_property_read_u32(np, "clock-frequency", &arch_timer_rate)) {
		if (cntbase)
			arch_timer_rate = readl_relaxed(cntbase + CNTFRQ);
		else

^ permalink raw reply related

* Re: [PATCH v3 5/5] tty/serial: Add Spreadtrum sc9836-uart driver support
From: Orson Zhai @ 2014-11-28 10:13 UTC (permalink / raw)
  To: One Thousand Gnomes, Chunyan Zhang
  Cc: grant.likely, robh+dt, catalin.marinas, gregkh, ijc+devicetree,
	jslaby, galak, broonie, mark.rutland, m-karicheri2, pawel.moll,
	artagnon, rrichter, will.deacon, arnd, corbet, jason, broonie,
	heiko, shawn.guo, florian.vaussard, andrew, hytszk, geng.ren,
	zhizhou.zhang, lanqing.liu, zhang.lyra, wei.qiao, devicetree,
	linux-arm-kernel, linux-kernel, sprdlinux, linux-doc,
	linux-serial, linux-api
In-Reply-To: <20141126123313.520ac83f@lxorguk.ukuu.org.uk>

Hi, Alan,

Thanks for your comments!
Some question for them below,

On 2014年11月26日 20:33, One Thousand Gnomes wrote:
>> +		 213 = /dev/ttySPX0		SPRD serial port 0
>> +		    ...
>> +		 216 = /dev/ttySPX3		SPRD serial port 3
> Please use dynamic allocation or give a very very good reason why you
> can't
do we just leave the .major & .minor as NULL in struct uart_driver for 
dynamic allocation?
>> +config SERIAL_SPRD_NR
>> +        int "Maximum number of sprd serial ports"
>> +        depends on SERIAL_SPRD
>> +        default "4"
> If you are doing dynamic allocation this should pretty much go away
I think you mean getting the number of uarts from dt and dynamically 
allocate their port structure?
>
>> +static int sprd_startup(struct uart_port *port)
>> +{
>> +	int ret = 0;
>> +	unsigned int ien, ctrl1;
>> +	struct sprd_uart_port *sp;
>> +
>> +	serial_out(port, SPRD_CTL2, ((THLD_TX_EMPTY << 8) | THLD_RX_FULL));
>> +
>> +	/* clear rx fifo */
>> +	while (serial_in(port, SPRD_STS1) & 0x00ff)
>> +		serial_in(port, SPRD_RXD);
>> +
>> +	/* clear tx fifo */
>> +	while (serial_in(port, SPRD_STS1) & 0xff00)
>> +		;
> Missing a cpu_relax and I would have thought a timeout on both of these.
We will add in V4
>
>
>> +static void sprd_set_termios(struct uart_port *port,
>> +				    struct ktermios *termios,
>> +				    struct ktermios *old)
>> +{
>> +	unsigned int baud, quot;
>> +	/* calculate parity */
>> +	lcr &= ~SPRD_LCR_PARITY;
>> +	if (termios->c_cflag & PARENB) {
>> +		lcr |= SPRD_LCR_PARITY_EN;
>> +		if (termios->c_cflag & PARODD)
>> +			lcr |= SPRD_LCR_ODD_PAR;
>> +		else
>> +			lcr |= SPRD_LCR_EVEN_PAR;
>> +	}
> If you don't support mark/space parity then also clear CMSPAR in
> termios->c_cflag.
just simply use code like "termios->c_cflag &= ~CMSPAR" ?

> If you do then it ought to be handled above.
>
>> +
>> +	/* clock divider bit16~bit20 */
>> +	serial_out(port, SPRD_CLKD1, (quot & 0x1f0000) >> 16);
>> +	serial_out(port, SPRD_LCR, lcr);
>> +	fc |= 0x3e00 | THLD_RX_FULL;
>> +	serial_out(port, SPRD_CTL1, fc);
> Also set the resulting baud back into the termios (see the 8250 termios
> for an example)
you mean something like this part ?

/* Don't rewrite B0 */
if (tty_termios_baud_rate(termios))
tty_termios_encode_baud_rate(termios, baud, baud);


>
>
>> +static int __init sprd_console_setup(struct console *co, char *options)
>> +{
>> +	struct uart_port *port;
>> +	int baud = 115200;
>> +	int bits = 8;
>> +	int parity = 'n';
>> +	int flow = 'n';
>> +
>> +	if (unlikely(co->index >= UART_NR_MAX || co->index < 0))
>> +		co->index = 0;
>> +
>> +	port = (struct uart_port *)sprd_port[co->index];
>> +	if (port == NULL) {
>> +		pr_info("srial port %d not yet initialized\n", co->index);
> Typo
it will be fixed :)

Thanks,
Orson
> Looks basically sound to me
>
> Alan


^ permalink raw reply

* Re: [PATCH v6 37/46] tun: move internal flag defines out of uapi
From: Jason Wang @ 2014-11-28  8:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, David Miller,
	cornelia.huck-tA70FqPdS9bQT0dZR+AlfA,
	rusty-8fk3Idey6ehBDgjK7y7TUQ, nab-IzHhD5pYlfBP7FQvKIMDCQ,
	pbonzini-H+wXaHxf7aLQT0dZR+AlfA,
	thuth-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	dahi-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Zhi Yong Wu, Ben Hutchings,
	Herbert Xu, Tom Herbert, Masatake YAMATO, Xi Wang,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1417118789-18231-38-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>



On Fri, Nov 28, 2014 at 4:10 AM, Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 
wrote:
> TUN_ flags are internal and never exposed
> to userspace. Any application using it is almost
> certainly buggy.
> 
> Move them out to tun.c, we'll remove them in follow-up patches.
> 
> Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  include/uapi/linux/if_tun.h | 16 ++--------
>  drivers/net/tun.c           | 74 
> ++++++++++++++-------------------------------
>  2 files changed, 26 insertions(+), 64 deletions(-)
> 
> diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> index e9502dd..277a260 100644
> --- a/include/uapi/linux/if_tun.h
> +++ b/include/uapi/linux/if_tun.h
> @@ -22,21 +22,11 @@
>  
>  /* Read queue size */
>  #define TUN_READQ_SIZE	500
> -
> -/* TUN device flags */
> -#define TUN_TUN_DEV 	0x0001	
> -#define TUN_TAP_DEV	0x0002
> +/* TUN device type flags: deprecated. Use IFF_TUN/IFF_TAP instead. */
> +#define TUN_TUN_DEV 	IFF_TUN
> +#define TUN_TAP_DEV	IFF_TAP
>  #define TUN_TYPE_MASK   0x000f
>  
> -#define TUN_FASYNC	0x0010
> -#define TUN_NOCHECKSUM	0x0020
> -#define TUN_NO_PI	0x0040
> -/* This flag has no real effect */
> -#define TUN_ONE_QUEUE	0x0080
> -#define TUN_PERSIST 	0x0100	
> -#define TUN_VNET_HDR 	0x0200
> -#define TUN_TAP_MQ      0x0400
> -
>  /* Ioctl defines */
>  #define TUNSETNOCSUM  _IOW('T', 200, int) 
>  #define TUNSETDEBUG   _IOW('T', 201, int) 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 9dd3746..bc89d07 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -103,6 +103,21 @@ do {								\
>  } while (0)
>  #endif
>  
> +/* TUN device flags */
> +
> +/* IFF_ATTACH_QUEUE is never stored in device flags,
> + * overload it to mean fasync when stored there.
> + */
> +#define TUN_FASYNC	IFF_ATTACH_QUEUE
> +#define TUN_NO_PI	IFF_NO_PI
> +/* This flag has no real effect */
> +#define TUN_ONE_QUEUE	IFF_ONE_QUEUE
> +#define TUN_PERSIST 	IFF_PERSIST
> +#define TUN_VNET_HDR 	IFF_VNET_HDR
> +#define TUN_TAP_MQ      IFF_MULTI_QUEUE
> +
> +#define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
> +		      IFF_MULTI_QUEUE)
>  #define GOODCOPY_LEN 128
>  
>  #define FLT_EXACT_COUNT 8
> @@ -1521,32 +1536,7 @@ static struct proto tun_proto = {
>  
>  static int tun_flags(struct tun_struct *tun)
>  {
> -	int flags = 0;
> -
> -	if (tun->flags & TUN_TUN_DEV)
> -		flags |= IFF_TUN;
> -	else
> -		flags |= IFF_TAP;
> -
> -	if (tun->flags & TUN_NO_PI)
> -		flags |= IFF_NO_PI;
> -
> -	/* This flag has no real effect.  We track the value for backwards
> -	 * compatibility.
> -	 */
> -	if (tun->flags & TUN_ONE_QUEUE)
> -		flags |= IFF_ONE_QUEUE;
> -
> -	if (tun->flags & TUN_VNET_HDR)
> -		flags |= IFF_VNET_HDR;
> -
> -	if (tun->flags & TUN_TAP_MQ)
> -		flags |= IFF_MULTI_QUEUE;
> -
> -	if (tun->flags & TUN_PERSIST)
> -		flags |= IFF_PERSIST;
> -
> -	return flags;
> +	return tun->flags & (TUN_FEATURES | IFF_PERSIST | IFF_TUN | 
> IFF_TAP);
>  }
>  
>  static ssize_t tun_show_flags(struct device *dev, struct 
> device_attribute *attr,
> @@ -1706,28 +1696,8 @@ static int tun_set_iff(struct net *net, struct 
> file *file, struct ifreq *ifr)
>  
>  	tun_debug(KERN_INFO, tun, "tun_set_iff\n");
>  
> -	if (ifr->ifr_flags & IFF_NO_PI)
> -		tun->flags |= TUN_NO_PI;
> -	else
> -		tun->flags &= ~TUN_NO_PI;
> -
> -	/* This flag has no real effect.  We track the value for backwards
> -	 * compatibility.
> -	 */
> -	if (ifr->ifr_flags & IFF_ONE_QUEUE)
> -		tun->flags |= TUN_ONE_QUEUE;
> -	else
> -		tun->flags &= ~TUN_ONE_QUEUE;
> -
> -	if (ifr->ifr_flags & IFF_VNET_HDR)
> -		tun->flags |= TUN_VNET_HDR;
> -	else
> -		tun->flags &= ~TUN_VNET_HDR;
> -
> -	if (ifr->ifr_flags & IFF_MULTI_QUEUE)
> -		tun->flags |= TUN_TAP_MQ;
> -	else
> -		tun->flags &= ~TUN_TAP_MQ;
> +	tun->flags = (tun->flags & ~TUN_FEATURES) |
> +		(ifr->ifr_flags & TUN_FEATURES);
>  
>  	/* Make sure persistent devices do not get stuck in
>  	 * xoff state.
> @@ -1890,9 +1860,11 @@ static long __tun_chr_ioctl(struct file *file, 
> unsigned int cmd,
>  	if (cmd == TUNGETFEATURES) {
>  		/* Currently this just means: "what IFF flags are valid?".
>  		 * This is needed because we never checked for invalid flags on
> -		 * TUNSETIFF. */
> -		return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
> -				IFF_VNET_HDR | IFF_MULTI_QUEUE,
> +		 * TUNSETIFF.  Why do we report IFF_TUN and IFF_TAP which are
> +		 * not legal for TUNSETIFF here?  It's probably a bug, but it
> +		 * doesn't seem to be worth fixing.
> +		 */

The question is not very clear. IFF_TUN and IFF_TAP are legal for 
 ifr.ifr_flags during TUNSETIFF. No?
> 
> +		return put_user(IFF_TUN | IFF_TAP | TUN_FEATURES,
>  				(unsigned int __user*)argp);
>  	} else if (cmd == TUNSETQUEUE)
>  		return tun_set_queue(file, &ifr);
> -- 
> MST
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [CFT][PATCH v2] userns: Avoid problems with negative groups
From: Eric W. Biederman @ 2014-11-28  5:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-man, Kees Cook, Linux API, Linux Containers, Josh Triplett,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, LSM,
	Michael Kerrisk-manpages, Richard Weinberger, Casey Schaufler,
	Andrew Morton
In-Reply-To: <87wq6frjcw.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>


Classic unix permission checks have an interesting feature.  The group
permissions for a file can be set to less than the other permissions
on a file.  Occassionally this is used deliberately to give a certain
group of users fewer permissions than the default.

Overlooking negative groups has resulted in the permission checks for
setting up a group mapping in a user namespace to be too lax.  Tighten
the permission checks in new_idmap_permitted to ensure that mapping
uids and gids into user namespaces without privilege will not result
in new combinations of credentials being available to the users.

When setting mappings without privilege only the creator of the user
namespace is interesting as all other users that have CAP_SETUID over
the user namespace will also have CAP_SETUID over the user namespaces
parent.  So the scope of the unprivileged check is reduced to just
the case where cred->euid is the namespace creator.

For setting a uid mapping without privilege only euid is considered as
setresuid can set uid, suid and fsuid from euid without privielege
making any combination of uids possible with user namespaces already
possible without them.

For setting a gid mapping without privilege only egid on a credential
without supplementary groups is condsidered, as setresgid can set gid,
sgid and fsgid from egid without privilege.  The requirement for no
supplementary groups is because CAP_SETUID in a user namespace allows
supplementary groups to be cleared, which unfortunately means allowing
a credential with supplementary groups would allow new combinations
of credentials to exist, and thus would allow defeating negative
groups without permission.

setgroups is modified to fail not only when the group ids do not
map but also when there are no gid mappings at all, preventing
setgroups(0, NULL) from succeeding when gid mappings have not been
established.

This change should break userspace by the minimal amount needed
to fix this issue.

This should fix CVE-2014-8989.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/user_namespace.h | 10 ++++++++++
 kernel/groups.c                |  7 ++++++-
 kernel/uid16.c                 |  5 ++++-
 kernel/user_namespace.c        | 16 ++++++++++++----
 4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e95372654f09..26d5e8f5db97 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -46,6 +46,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
 	return ns;
 }
 
+static inline bool gid_mapping_possible(const struct user_namespace *ns)
+{
+	return ns->gid_map.nr_extents != 0;
+}
+
 extern int create_user_ns(struct cred *new);
 extern int unshare_userns(unsigned long unshare_flags, struct cred **new_cred);
 extern void free_user_ns(struct user_namespace *ns);
@@ -70,6 +75,11 @@ static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
 	return &init_user_ns;
 }
 
+static inline bool gid_mapping_possible(const struct user_namespace *ns)
+{
+	return true;
+}
+
 static inline int create_user_ns(struct cred *new)
 {
 	return -EINVAL;
diff --git a/kernel/groups.c b/kernel/groups.c
index 451698f86cfa..302aa415158f 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -6,6 +6,7 @@
 #include <linux/slab.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
@@ -220,14 +221,18 @@ out:
 
 SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int retval;
 
-	if (!ns_capable(current_user_ns(), CAP_SETGID))
+	if (!gid_mapping_possible(user_ns) ||
+	    !ns_capable(user_ns, CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
 
+	/* How do I alloc a negative gidsetsize??? */
+
 	group_info = groups_alloc(gidsetsize);
 	if (!group_info)
 		return -ENOMEM;
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 602e5bbbceff..602c7de2aa11 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -13,6 +13,7 @@
 #include <linux/highuid.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
+#include <linux/user_namespace.h>
 
 #include <asm/uaccess.h>
 
@@ -173,10 +174,12 @@ out:
 
 SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int retval;
 
-	if (!ns_capable(current_user_ns(), CAP_SETGID))
+	if (!gid_mapping_possible(user_ns) ||
+	    !ns_capable(user_ns, CAP_SETGID))
 		return -EPERM;
 	if ((unsigned)gidsetsize > NGROUPS_MAX)
 		return -EINVAL;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index aa312b0dc3ec..b338c42d04fd 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -812,16 +812,24 @@ static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *new_map)
 {
-	/* Allow mapping to your own filesystem ids */
-	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
+	const struct cred *cred = file->f_cred;
+
+	/* Allow mapping an id without CAP_SETUID and CAP_SETGID when
+	 * allowing the root of the user namespace CAP_SETUID or
+	 * CAP_SETGID restricted to just that id will not change the
+	 * set of credentials available that user.
+	 */
+	if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
+	    uid_eq(ns->owner, cred->euid)) {
 		u32 id = new_map->extent[0].lower_first;
 		if (cap_setid == CAP_SETUID) {
 			kuid_t uid = make_kuid(ns->parent, id);
-			if (uid_eq(uid, file->f_cred->fsuid))
+			if (uid_eq(uid, cred->euid))
 				return true;
 		} else if (cap_setid == CAP_SETGID) {
 			kgid_t gid = make_kgid(ns->parent, id);
-			if (gid_eq(gid, file->f_cred->fsgid))
+			if (gid_eq(gid, cred->egid) &&
+			    (cred->group_info->ngroups == 0))
 				return true;
 		}
 	}
-- 
1.9.1

^ permalink raw reply related

* Re: [CFT][PATCH] userns: Avoid problems with negative groups
From: Eric W. Biederman @ 2014-11-28  5:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man,
	linux-kernel@vger.kernel.org, LSM, Casey Schaufler,
	Serge E. Hallyn, Richard Weinberger
In-Reply-To: <CALCETrUuWDq2akKfb50AiPHeDDWzPW7ijz1QwnuNiskyZbBEfA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

>> This change should break userspace by the minimal amount needed
>> to fix this issue.
>>
>> This should fix CVE-2014-8989.
>
> I think this is both unnecessarily restrictive and that it doesn't fix
> the bug.  

You are going to have to work very hard to convince me this is
unnecessarily restrictive.  

>For example, I can exploit CVE-2014-8989 without ever
> writing a uid map or a gid map.

Yes.  I realized just after I sent the patch that setgroups(0, NULL)
would still work without a mapping set.  That is a first glass grade a
oversight that resulted in a bug.  None of the other uid or gid changing
syscalls without a mapping set, and setgroups was just overlooked
because it was different.  Oops.

I will send an updated patch that stops setgroups from working without
a mapping set shortly.

> IIUC, the only real issue is that user namespaces allow groups to be
> dropped using setgroups that wouldn't otherwise be dropped.  Can we
> get away with adding a per-user-ns flag that determines whether
> setgroups can be used?

Being able to call setgroups is fundamental to login programs, and login
programs are one of the things user namespaces need to support.  So
adding an extra flag and an extra place where privilege is required
is just noise, and will wind up breaking every user of user namespaces.

Further being able to setup uid and gid mappings without privilege is
primarily a nice to have.  The original design did not have unprivileged
setting of uid and gid maps and if it proves insecure I goofed and the
feature isn't safe so it needs to be removed.

This does mean that running a system with negative groups and users
delegated subordinate gids in /etc/subuid is a bad idea and system
administrators shouldn't do that as those negative groups won't prove
effective in stopping their users.  But this is all under system
administrator control so shrug.  There isn't a way to avoid that
fundamental conflict.

> setgroups would be unusable until the gid_map has been written and
> then it would become usable if and only if the parent userns could use
> setgroups and the opener of gid_map was privileged.

That proposal sounds a lot more restrictive and a lot more of a pain
to use than what I have implemented in my patch.

> If we wanted to allow finer-grained control, we could allow writing
> control lines like:
>
> options +setgroups
>
> or
>
> options -setgroups
>
> in gid_map, or we could add user_ns_flags that can only be written
> once and only before either uid_map or gid_map is written.

Definitely more complicated and I can't imagine a case where I need
a gid map without needing to call setgroups.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/3] kcmp: Move kcmp.h into uapi
From: Michael Ellerman @ 2014-11-28  2:18 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: shuahkh-JPH+aEBZ4P+UEJcrhfAQsw, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, gorcunov-GEFAQzZX7r8dnm+yROfE0A
In-Reply-To: <1414040834-30209-1-git-send-email-mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>

On Thu, 2014-10-23 at 16:07 +1100, Michael Ellerman wrote:
> kcmp.h appears to be part of the API, it's documented in kcmp(2), and
> the selftests/kcmp code uses it. So move it to uapi so it's actually
> exported.

Looks like this series fell through the cracks?

It still applies on rc6. Should I resend?

cheers

> Signed-off-by: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
> ---
>  include/linux/kcmp.h      | 13 +------------
>  include/uapi/linux/Kbuild |  1 +
>  include/uapi/linux/kcmp.h | 17 +++++++++++++++++
>  3 files changed, 19 insertions(+), 12 deletions(-)
>  create mode 100644 include/uapi/linux/kcmp.h
> 
> diff --git a/include/linux/kcmp.h b/include/linux/kcmp.h
> index 2dcd1b3aafc8..9dfb23e1771b 100644
> --- a/include/linux/kcmp.h
> +++ b/include/linux/kcmp.h
> @@ -1,17 +1,6 @@
>  #ifndef _LINUX_KCMP_H
>  #define _LINUX_KCMP_H
>  
> -/* Comparison type */
> -enum kcmp_type {
> -	KCMP_FILE,
> -	KCMP_VM,
> -	KCMP_FILES,
> -	KCMP_FS,
> -	KCMP_SIGHAND,
> -	KCMP_IO,
> -	KCMP_SYSVSEM,
> -
> -	KCMP_TYPES,
> -};
> +#include <uapi/linux/kcmp.h>
>  
>  #endif /* _LINUX_KCMP_H */
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index b70237e8bc37..1cf50d682dbf 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -209,6 +209,7 @@ header-y += ivtvfb.h
>  header-y += ixjuser.h
>  header-y += jffs2.h
>  header-y += joystick.h
> +header-y += kcmp.h
>  header-y += kd.h
>  header-y += kdev_t.h
>  header-y += kernel-page-flags.h
> diff --git a/include/uapi/linux/kcmp.h b/include/uapi/linux/kcmp.h
> new file mode 100644
> index 000000000000..84df14b37360
> --- /dev/null
> +++ b/include/uapi/linux/kcmp.h
> @@ -0,0 +1,17 @@
> +#ifndef _UAPI_LINUX_KCMP_H
> +#define _UAPI_LINUX_KCMP_H
> +
> +/* Comparison type */
> +enum kcmp_type {
> +	KCMP_FILE,
> +	KCMP_VM,
> +	KCMP_FILES,
> +	KCMP_FS,
> +	KCMP_SIGHAND,
> +	KCMP_IO,
> +	KCMP_SYSVSEM,
> +
> +	KCMP_TYPES,
> +};
> +
> +#endif /* _UAPI_LINUX_KCMP_H */

^ permalink raw reply

* Re: [PATCH v14 0/3] Add drm driver for Rockchip Socs
From: Heiko Stübner @ 2014-11-27 22:17 UTC (permalink / raw)
  To: Daniel Kurtz
  Cc: Mark yao, Arnd Bergmann, Dave Airlie, Joerg Roedel,
	Boris BREZILLON, Rob Clark, Daniel Vetter, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Randy Dunlap,
	Grant Likely, Greg Kroah-Hartman, John Stultz, Rom Lemarchand,
	devicetree@vger.kernel.org, linux-doc, LKML, dri-devel, linux-api,
	"open list:ARM/Rockchip SoC..." <linux-rock>
In-Reply-To: <CAGS+omDDAYtx+02ZBMGnLXygWpfBa283pCU00vP-sahwNnoyFA@mail.gmail.com>

Am Donnerstag, 27. November 2014, 14:05:00 schrieb Daniel Kurtz:
> On Thu, Nov 27, 2014 at 2:08 AM, Mark yao <mark.yao@rock-chips.com> wrote:
> > On 2014年11月27日 10:12, Dave Airlie wrote:
> >>> Hi Dave
> >>> 
> >>>      Do you mean that I need send you a branch, based on drm-next, merge
> >>>      with
> >>> 
> >>> iommu tree and rockchip drm?
> >> 
> >> Yes, grab drm-next, git pull the arm/rockchip branch from Joerg's
> >> tree, put rockchip drm
> >> patches on top, send me pull request.
> >> 
> >> I'll validate it then.
> >> 
> >> Dave.
> > 
> > Hi Dave
> > 
> >     I have send a pull request to you, with Joerg's iommu arm/rockchip
> >     branch.
> >     
> >     I got a same problem when use "make multi_v7_defconfig" as Heiko said:
> >         drivers/video/fbdev/Kconfig:5:error: recursive dependency
> >         detected!
> >         drivers/video/fbdev/Kconfig:5:    symbol FB is selected by
> >         DRM_KMS_FB_HELPER drivers/gpu/drm/Kconfig:34:    symbol
> >         DRM_KMS_FB_HELPER is selected by DRM_ROCKCHIP
> >         drivers/gpu/drm/rockchip/Kconfig:1:    symbol DRM_ROCKCHIP
> >         depends on ARM_DMA_USE_IOMMU arch/arm/Kconfig:95:    symbol
> >         ARM_DMA_USE_IOMMU is selected by VIDEO_OMAP3
> >         drivers/media/platform/Kconfig:96:    symbol VIDEO_OMAP3 depends
> >         on VIDEO_V4L2 drivers/media/v4l2-core/Kconfig:6:    symbol
> >         VIDEO_V4L2 depends on I2C drivers/i2c/Kconfig:7:    symbol I2C is
> >         selected by FB_DDC
> >         drivers/video/fbdev/Kconfig:59:    symbol FB_DDC is selected by
> >         FB_CYBER2000_DDC drivers/video/fbdev/Kconfig:374:    symbol
> >         FB_CYBER2000_DDC depends on FB_CYBER2000
> >         drivers/video/fbdev/Kconfig:362:    symbol FB_CYBER2000 depends
> >         on FB>     
> >     I was confused how to solve the recursive dependency, remove depends
> >     on ARM_DMA_USE_IOMMU & IOMMU_API?
> The "depends on ARM_DMA_USE_IOMMU & IOMMU_API" was suggested by Arnd
> Bergmann during code review (originally they were selected).
> 
> Removing them definitely fixes the dependency recursion.
> Also, since they are both already selected by ROCKCHIP_IOMMU,
> everything will build correctly.
> So, this sounds good to me, but I am no expert on Kconfig.

personally I'd think the omap v4l stuff is at fault here. It is selecting 
ARM_DMA_USE_IOMMU and OMAP_IOMMU instead of depending on them.

And OMAP_IOMMU is a regular driver option which can be selected through 
Kconfig, so I don't think it should get selected but enabled by choice.

It would be interesting to know if the Exynos drm is also affected by this, as 
it is using a similar depency-structure by depending on the iommu and 
ARM_DMA_USE_IOMMU.


Heiko

^ permalink raw reply

* Re: [PATCH v14 0/3] Add drm driver for Rockchip Socs
From: Rob Clark @ 2014-11-27 22:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Kurtz, Mark yao, Dave Airlie, Joerg Roedel,
	Heiko Stübner, Boris BREZILLON, Daniel Vetter, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Randy Dunlap,
	Grant Likely, Greg Kroah-Hartman, John Stultz, Rom Lemarchand,
	devicetree@vger.kernel.org, linux-doc, LKML, dri-devel, linux-api
In-Reply-To: <3124625.mkxyU0zW0C@wuerfel>

On Thu, Nov 27, 2014 at 4:56 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 27 November 2014 14:05:00 Daniel Kurtz wrote:
>> On Thu, Nov 27, 2014 at 2:08 AM, Mark yao <mark.yao@rock-chips.com> wrote:
>> >
>> > On 2014年11月27日 10:12, Dave Airlie wrote:
>> >>>>
>> >>>>
>> >>> Hi Dave
>> >>>      Do you mean that I need send you a branch, based on drm-next, merge with
>> >>> iommu tree and rockchip drm?
>> >>
>> >> Yes, grab drm-next, git pull the arm/rockchip branch from Joerg's
>> >> tree, put rockchip drm
>> >> patches on top, send me pull request.
>> >>
>> >> I'll validate it then.
>> >>
>> >> Dave.
>> >>
>> > Hi Dave
>> >     I have send a pull request to you, with Joerg's iommu arm/rockchip branch.
>> >
>> >     I got a same problem when use "make multi_v7_defconfig" as Heiko said:
>> >         drivers/video/fbdev/Kconfig:5:error: recursive dependency detected!
>> >         drivers/video/fbdev/Kconfig:5:    symbol FB is selected by DRM_KMS_FB_HELPER
>> >         drivers/gpu/drm/Kconfig:34:    symbol DRM_KMS_FB_HELPER is selected by DRM_ROCKCHIP
>> >         drivers/gpu/drm/rockchip/Kconfig:1:    symbol DRM_ROCKCHIP depends on ARM_DMA_USE_IOMMU
>> >         arch/arm/Kconfig:95:    symbol ARM_DMA_USE_IOMMU is selected by VIDEO_OMAP3
>> >         drivers/media/platform/Kconfig:96:    symbol VIDEO_OMAP3 depends on VIDEO_V4L2
>> >         drivers/media/v4l2-core/Kconfig:6:    symbol VIDEO_V4L2 depends on I2C
>> >         drivers/i2c/Kconfig:7:    symbol I2C is selected by FB_DDC
>> >         drivers/video/fbdev/Kconfig:59:    symbol FB_DDC is selected by FB_CYBER2000_DDC
>> >         drivers/video/fbdev/Kconfig:374:    symbol FB_CYBER2000_DDC depends on FB_CYBER2000
>> >         drivers/video/fbdev/Kconfig:362:    symbol FB_CYBER2000 depends on FB
>> >
>> >     I was confused how to solve the recursive dependency, remove depends on ARM_DMA_USE_IOMMU & IOMMU_API?
>>
>> The "depends on ARM_DMA_USE_IOMMU & IOMMU_API" was suggested by Arnd
>> Bergmann during code review (originally they were selected).
>>
>> Removing them definitely fixes the dependency recursion.
>> Also, since they are both already selected by ROCKCHIP_IOMMU,
>> everything will build correctly.
>> So, this sounds good to me, but I am no expert on Kconfig.
>>
>
> I think the problem here is VIDEO_OMAP3 (among others), it should do the
> same as rockchips and use 'depends on' instead of 'select'.
>
> Another problem is the 'select I2C' in FB_DDC, but that would be much harder
> to fix.

that 'select I2C' is the source of many loops..  I'd looked at it
briefly before, but then ran away in despair.

BR,
-R

>         Arnd

^ permalink raw reply

* Re: [PATCH v14 0/3] Add drm driver for Rockchip Socs
From: Arnd Bergmann @ 2014-11-27 21:56 UTC (permalink / raw)
  To: Daniel Kurtz
  Cc: Mark yao, Dave Airlie, Joerg Roedel, Heiko Stübner,
	Boris BREZILLON, Rob Clark, Daniel Vetter, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Randy Dunlap,
	Grant Likely, Greg Kroah-Hartman, John Stultz, Rom Lemarchand,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, LKML, dri-devel,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CAGS+omDDAYtx+02ZBMGnLXygWpfBa283pCU00vP-sahwNnoyFA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thursday 27 November 2014 14:05:00 Daniel Kurtz wrote:
> On Thu, Nov 27, 2014 at 2:08 AM, Mark yao <mark.yao-TNX95d0MmH7DzftRWevZcw@public.gmane.org> wrote:
> >
> > On 2014年11月27日 10:12, Dave Airlie wrote:
> >>>>
> >>>>
> >>> Hi Dave
> >>>      Do you mean that I need send you a branch, based on drm-next, merge with
> >>> iommu tree and rockchip drm?
> >>
> >> Yes, grab drm-next, git pull the arm/rockchip branch from Joerg's
> >> tree, put rockchip drm
> >> patches on top, send me pull request.
> >>
> >> I'll validate it then.
> >>
> >> Dave.
> >>
> > Hi Dave
> >     I have send a pull request to you, with Joerg's iommu arm/rockchip branch.
> >
> >     I got a same problem when use "make multi_v7_defconfig" as Heiko said:
> >         drivers/video/fbdev/Kconfig:5:error: recursive dependency detected!
> >         drivers/video/fbdev/Kconfig:5:    symbol FB is selected by DRM_KMS_FB_HELPER
> >         drivers/gpu/drm/Kconfig:34:    symbol DRM_KMS_FB_HELPER is selected by DRM_ROCKCHIP
> >         drivers/gpu/drm/rockchip/Kconfig:1:    symbol DRM_ROCKCHIP depends on ARM_DMA_USE_IOMMU
> >         arch/arm/Kconfig:95:    symbol ARM_DMA_USE_IOMMU is selected by VIDEO_OMAP3
> >         drivers/media/platform/Kconfig:96:    symbol VIDEO_OMAP3 depends on VIDEO_V4L2
> >         drivers/media/v4l2-core/Kconfig:6:    symbol VIDEO_V4L2 depends on I2C
> >         drivers/i2c/Kconfig:7:    symbol I2C is selected by FB_DDC
> >         drivers/video/fbdev/Kconfig:59:    symbol FB_DDC is selected by FB_CYBER2000_DDC
> >         drivers/video/fbdev/Kconfig:374:    symbol FB_CYBER2000_DDC depends on FB_CYBER2000
> >         drivers/video/fbdev/Kconfig:362:    symbol FB_CYBER2000 depends on FB
> >
> >     I was confused how to solve the recursive dependency, remove depends on ARM_DMA_USE_IOMMU & IOMMU_API?
> 
> The "depends on ARM_DMA_USE_IOMMU & IOMMU_API" was suggested by Arnd
> Bergmann during code review (originally they were selected).
> 
> Removing them definitely fixes the dependency recursion.
> Also, since they are both already selected by ROCKCHIP_IOMMU,
> everything will build correctly.
> So, this sounds good to me, but I am no expert on Kconfig.
> 

I think the problem here is VIDEO_OMAP3 (among others), it should do the
same as rockchips and use 'depends on' instead of 'select'.

Another problem is the 'select I2C' in FB_DDC, but that would be much harder
to fix.

	Arnd

^ permalink raw reply

* Re: [CFT][PATCH] userns: Avoid problems with negative groups
From: Andy Lutomirski @ 2014-11-27 20:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Josh Triplett, Andrew Morton, Kees Cook,
	Michael Kerrisk-manpages, Linux API, linux-man,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, LSM,
	Casey Schaufler, Serge E. Hallyn, Richard Weinberger
In-Reply-To: <87lhmwwpey.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Thu, Nov 27, 2014 at 8:59 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Classic unix permission checks have an interesting feature.  The group
> permissions for a file can be set to less than the other permissions
> on a file.  Occassionally this is used deliberately to give a certain
> group of users fewer permissions than the default.
>
> Overlooking negative groups has resulted in the permission checks for
> setting up a group mapping in a user namespace to be too lax.  Tighten
> the permission checks in new_idmap_permitted to ensure that mapping
> uids and gids into user namespaces without privilege will not result
> in new combinations of credentials being available to the users.
>
> When setting mappings without privilege only the creator of the user
> namespace is interesting as all other users that have CAP_SETUID over
> the user namespace will also have CAP_SETUID over the user namespaces
> parent.  So the scope of the unprivileged check is reduced to just
> the case where cred->euid is the namespace creator.
>
> For setting a uid mapping without privilege only euid is considered as
> setresuid can set uid, suid and fsuid from euid without privielege
> making any combination of uids possible with user namespaces already
> possible without them.
>
> For setting a gid mapping without privilege only egid on a credential
> without supplementary groups is condsidered, as setresgid can set gid,
> sgid and fsgid from egid without privilege.  The requirement for no
> supplementary groups is because CAP_SETUID in a user namespace allows
> supplementary groups to be cleared, which unfortunately means allowing
> a credential with supplementary groups would allow new combinations
> of credentials to exist, and thus would allow defeating negative
> groups without permission.
>
> This change should break userspace by the minimal amount needed
> to fix this issue.
>
> This should fix CVE-2014-8989.

I think this is both unnecessarily restrictive and that it doesn't fix
the bug.  For example, I can exploit CVE-2014-8989 without ever
writing a uid map or a gid map.

IIUC, the only real issue is that user namespaces allow groups to be
dropped using setgroups that wouldn't otherwise be dropped.  Can we
get away with adding a per-user-ns flag that determines whether
setgroups can be used?

setgroups would be unusable until the gid_map has been written and
then it would become usable if and only if the parent userns could use
setgroups and the opener of gid_map was privileged.

If we wanted to allow finer-grained control, we could allow writing
control lines like:

options +setgroups

or

options -setgroups

in gid_map, or we could add user_ns_flags that can only be written
once and only before either uid_map or gid_map is written.

--Andy

>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>
> If people can test and review this change and verify  this and verify it
> doesn't break anything I can't help breaking I will appreciate it.
>
>  kernel/user_namespace.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index aa312b0dc3ec..b338c42d04fd 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -812,16 +812,24 @@ static bool new_idmap_permitted(const struct file *file,
>                                 struct user_namespace *ns, int cap_setid,
>                                 struct uid_gid_map *new_map)
>  {
> -       /* Allow mapping to your own filesystem ids */
> -       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) {
> +       const struct cred *cred = file->f_cred;
> +
> +       /* Allow mapping an id without CAP_SETUID and CAP_SETGID when
> +        * allowing the root of the user namespace CAP_SETUID or
> +        * CAP_SETGID restricted to just that id will not change the
> +        * set of credentials available that user.
> +        */
> +       if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) &&
> +           uid_eq(ns->owner, cred->euid)) {
>                 u32 id = new_map->extent[0].lower_first;
>                 if (cap_setid == CAP_SETUID) {
>                         kuid_t uid = make_kuid(ns->parent, id);
> -                       if (uid_eq(uid, file->f_cred->fsuid))
> +                       if (uid_eq(uid, cred->euid))
>                                 return true;
>                 } else if (cap_setid == CAP_SETGID) {
>                         kgid_t gid = make_kgid(ns->parent, id);
> -                       if (gid_eq(gid, file->f_cred->fsgid))
> +                       if (gid_eq(gid, cred->egid) &&
> +                           (cred->group_info->ngroups == 0))
>                                 return true;
>                 }
>         }
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* [PATCH v6 44/46] virtio_scsi: export to userspace
From: Michael S. Tsirkin @ 2014-11-27 20:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
	Rusty Russell, Alexei Starovoitov, Greg Kroah-Hartman,
	Daniel Borkmann, Andy Grover, Sakari Ailus, David Drysdale,
	stephen hemminger, =?UTF-8?q?Piotr=20Kr=C3=B3l?=,
	Bjarke Istrup Pedersen, linux-api, virtualization
In-Reply-To: <1417118789-18231-1-git-send-email-mst@redhat.com>

Replace uXX by __uXX and _packed by __attribute((packed))
as seems to be the norm for userspace headers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/uapi/linux/virtio_scsi.h | 74 ++++++++++++++++++++--------------------
 include/uapi/linux/Kbuild        |  1 +
 2 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/include/uapi/linux/virtio_scsi.h b/include/uapi/linux/virtio_scsi.h
index af44864..42b9370 100644
--- a/include/uapi/linux/virtio_scsi.h
+++ b/include/uapi/linux/virtio_scsi.h
@@ -34,78 +34,78 @@
 
 /* SCSI command request, followed by data-out */
 struct virtio_scsi_cmd_req {
-	u8 lun[8];		/* Logical Unit Number */
+	__u8 lun[8];		/* Logical Unit Number */
 	__virtio64 tag;		/* Command identifier */
-	u8 task_attr;		/* Task attribute */
-	u8 prio;		/* SAM command priority field */
-	u8 crn;
-	u8 cdb[VIRTIO_SCSI_CDB_SIZE];
-} __packed;
+	__u8 task_attr;		/* Task attribute */
+	__u8 prio;		/* SAM command priority field */
+	__u8 crn;
+	__u8 cdb[VIRTIO_SCSI_CDB_SIZE];
+} __attribute__((packed));
 
 /* SCSI command request, followed by protection information */
 struct virtio_scsi_cmd_req_pi {
-	u8 lun[8];		/* Logical Unit Number */
+	__u8 lun[8];		/* Logical Unit Number */
 	__virtio64 tag;		/* Command identifier */
-	u8 task_attr;		/* Task attribute */
-	u8 prio;		/* SAM command priority field */
-	u8 crn;
+	__u8 task_attr;		/* Task attribute */
+	__u8 prio;		/* SAM command priority field */
+	__u8 crn;
 	__virtio32 pi_bytesout;	/* DataOUT PI Number of bytes */
 	__virtio32 pi_bytesin;		/* DataIN PI Number of bytes */
-	u8 cdb[VIRTIO_SCSI_CDB_SIZE];
-} __packed;
+	__u8 cdb[VIRTIO_SCSI_CDB_SIZE];
+} __attribute__((packed));
 
 /* Response, followed by sense data and data-in */
 struct virtio_scsi_cmd_resp {
 	__virtio32 sense_len;		/* Sense data length */
 	__virtio32 resid;		/* Residual bytes in data buffer */
 	__virtio16 status_qualifier;	/* Status qualifier */
-	u8 status;		/* Command completion status */
-	u8 response;		/* Response values */
-	u8 sense[VIRTIO_SCSI_SENSE_SIZE];
-} __packed;
+	__u8 status;		/* Command completion status */
+	__u8 response;		/* Response values */
+	__u8 sense[VIRTIO_SCSI_SENSE_SIZE];
+} __attribute__((packed));
 
 /* Task Management Request */
 struct virtio_scsi_ctrl_tmf_req {
 	__virtio32 type;
 	__virtio32 subtype;
-	u8 lun[8];
+	__u8 lun[8];
 	__virtio64 tag;
-} __packed;
+} __attribute__((packed));
 
 struct virtio_scsi_ctrl_tmf_resp {
-	u8 response;
-} __packed;
+	__u8 response;
+} __attribute__((packed));
 
 /* Asynchronous notification query/subscription */
 struct virtio_scsi_ctrl_an_req {
 	__virtio32 type;
-	u8 lun[8];
+	__u8 lun[8];
 	__virtio32 event_requested;
-} __packed;
+} __attribute__((packed));
 
 struct virtio_scsi_ctrl_an_resp {
 	__virtio32 event_actual;
-	u8 response;
-} __packed;
+	__u8 response;
+} __attribute__((packed));
 
 struct virtio_scsi_event {
 	__virtio32 event;
-	u8 lun[8];
+	__u8 lun[8];
 	__virtio32 reason;
-} __packed;
+} __attribute__((packed));
 
 struct virtio_scsi_config {
-	u32 num_queues;
-	u32 seg_max;
-	u32 max_sectors;
-	u32 cmd_per_lun;
-	u32 event_info_size;
-	u32 sense_size;
-	u32 cdb_size;
-	u16 max_channel;
-	u16 max_target;
-	u32 max_lun;
-} __packed;
+	__u32 num_queues;
+	__u32 seg_max;
+	__u32 max_sectors;
+	__u32 cmd_per_lun;
+	__u32 event_info_size;
+	__u32 sense_size;
+	__u32 cdb_size;
+	__u16 max_channel;
+	__u16 max_target;
+	__u32 max_lun;
+} __attribute__((packed));
 
 /* Feature Bits */
 #define VIRTIO_SCSI_F_INOUT                    0
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 44a5581..e61947c 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -428,6 +428,7 @@ header-y += virtio_net.h
 header-y += virtio_pci.h
 header-y += virtio_ring.h
 header-y += virtio_rng.h
+header-y += virtio_scsi.h
 header=y += vm_sockets.h
 header-y += vt.h
 header-y += wait.h
-- 
MST

^ permalink raw reply related

* [PATCH v6 39/46] tun: add VNET_LE flag
From: Michael S. Tsirkin @ 2014-11-27 20:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
	netdev, linux-api
In-Reply-To: <1417118789-18231-1-git-send-email-mst@redhat.com>

virtio 1.0 modified virtio net header format,
making all fields little endian.

Users can tweak header format before submitting it to tun,
but this means more data copies where none were necessary.
And if the iovec is in RO memory, this means we might
need to split iovec also means we might in theory overflow
iovec max size.

This patch adds a simpler way for applications to handle this,
using new "little endian" flag in tun.
As a result, tun simply byte-swaps header fields as appropriate.
This is a NOP on LE architectures.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/if_tun.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 277a260..18b2403 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -57,6 +57,7 @@
 #define IFF_ONE_QUEUE	0x2000
 #define IFF_VNET_HDR	0x4000
 #define IFF_TUN_EXCL	0x8000
+#define IFF_VNET_LE	0x10000
 #define IFF_MULTI_QUEUE 0x0100
 #define IFF_ATTACH_QUEUE 0x0200
 #define IFF_DETACH_QUEUE 0x0400
-- 
MST

^ permalink raw reply related

* [PATCH v6 37/46] tun: move internal flag defines out of uapi
From: Michael S. Tsirkin @ 2014-11-27 20:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
	Jason Wang, Zhi Yong Wu, Ben Hutchings, Herbert Xu, Tom Herbert,
	Masatake YAMATO, Xi Wang, netdev, linux-api
In-Reply-To: <1417118789-18231-1-git-send-email-mst@redhat.com>

TUN_ flags are internal and never exposed
to userspace. Any application using it is almost
certainly buggy.

Move them out to tun.c, we'll remove them in follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/if_tun.h | 16 ++--------
 drivers/net/tun.c           | 74 ++++++++++++++-------------------------------
 2 files changed, 26 insertions(+), 64 deletions(-)

diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index e9502dd..277a260 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -22,21 +22,11 @@
 
 /* Read queue size */
 #define TUN_READQ_SIZE	500
-
-/* TUN device flags */
-#define TUN_TUN_DEV 	0x0001	
-#define TUN_TAP_DEV	0x0002
+/* TUN device type flags: deprecated. Use IFF_TUN/IFF_TAP instead. */
+#define TUN_TUN_DEV 	IFF_TUN
+#define TUN_TAP_DEV	IFF_TAP
 #define TUN_TYPE_MASK   0x000f
 
-#define TUN_FASYNC	0x0010
-#define TUN_NOCHECKSUM	0x0020
-#define TUN_NO_PI	0x0040
-/* This flag has no real effect */
-#define TUN_ONE_QUEUE	0x0080
-#define TUN_PERSIST 	0x0100	
-#define TUN_VNET_HDR 	0x0200
-#define TUN_TAP_MQ      0x0400
-
 /* Ioctl defines */
 #define TUNSETNOCSUM  _IOW('T', 200, int) 
 #define TUNSETDEBUG   _IOW('T', 201, int) 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9dd3746..bc89d07 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -103,6 +103,21 @@ do {								\
 } while (0)
 #endif
 
+/* TUN device flags */
+
+/* IFF_ATTACH_QUEUE is never stored in device flags,
+ * overload it to mean fasync when stored there.
+ */
+#define TUN_FASYNC	IFF_ATTACH_QUEUE
+#define TUN_NO_PI	IFF_NO_PI
+/* This flag has no real effect */
+#define TUN_ONE_QUEUE	IFF_ONE_QUEUE
+#define TUN_PERSIST 	IFF_PERSIST
+#define TUN_VNET_HDR 	IFF_VNET_HDR
+#define TUN_TAP_MQ      IFF_MULTI_QUEUE
+
+#define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
+		      IFF_MULTI_QUEUE)
 #define GOODCOPY_LEN 128
 
 #define FLT_EXACT_COUNT 8
@@ -1521,32 +1536,7 @@ static struct proto tun_proto = {
 
 static int tun_flags(struct tun_struct *tun)
 {
-	int flags = 0;
-
-	if (tun->flags & TUN_TUN_DEV)
-		flags |= IFF_TUN;
-	else
-		flags |= IFF_TAP;
-
-	if (tun->flags & TUN_NO_PI)
-		flags |= IFF_NO_PI;
-
-	/* This flag has no real effect.  We track the value for backwards
-	 * compatibility.
-	 */
-	if (tun->flags & TUN_ONE_QUEUE)
-		flags |= IFF_ONE_QUEUE;
-
-	if (tun->flags & TUN_VNET_HDR)
-		flags |= IFF_VNET_HDR;
-
-	if (tun->flags & TUN_TAP_MQ)
-		flags |= IFF_MULTI_QUEUE;
-
-	if (tun->flags & TUN_PERSIST)
-		flags |= IFF_PERSIST;
-
-	return flags;
+	return tun->flags & (TUN_FEATURES | IFF_PERSIST | IFF_TUN | IFF_TAP);
 }
 
 static ssize_t tun_show_flags(struct device *dev, struct device_attribute *attr,
@@ -1706,28 +1696,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 	tun_debug(KERN_INFO, tun, "tun_set_iff\n");
 
-	if (ifr->ifr_flags & IFF_NO_PI)
-		tun->flags |= TUN_NO_PI;
-	else
-		tun->flags &= ~TUN_NO_PI;
-
-	/* This flag has no real effect.  We track the value for backwards
-	 * compatibility.
-	 */
-	if (ifr->ifr_flags & IFF_ONE_QUEUE)
-		tun->flags |= TUN_ONE_QUEUE;
-	else
-		tun->flags &= ~TUN_ONE_QUEUE;
-
-	if (ifr->ifr_flags & IFF_VNET_HDR)
-		tun->flags |= TUN_VNET_HDR;
-	else
-		tun->flags &= ~TUN_VNET_HDR;
-
-	if (ifr->ifr_flags & IFF_MULTI_QUEUE)
-		tun->flags |= TUN_TAP_MQ;
-	else
-		tun->flags &= ~TUN_TAP_MQ;
+	tun->flags = (tun->flags & ~TUN_FEATURES) |
+		(ifr->ifr_flags & TUN_FEATURES);
 
 	/* Make sure persistent devices do not get stuck in
 	 * xoff state.
@@ -1890,9 +1860,11 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	if (cmd == TUNGETFEATURES) {
 		/* Currently this just means: "what IFF flags are valid?".
 		 * This is needed because we never checked for invalid flags on
-		 * TUNSETIFF. */
-		return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
-				IFF_VNET_HDR | IFF_MULTI_QUEUE,
+		 * TUNSETIFF.  Why do we report IFF_TUN and IFF_TAP which are
+		 * not legal for TUNSETIFF here?  It's probably a bug, but it
+		 * doesn't seem to be worth fixing.
+		 */
+		return put_user(IFF_TUN | IFF_TAP | TUN_FEATURES,
 				(unsigned int __user*)argp);
 	} else if (cmd == TUNSETQUEUE)
 		return tun_set_queue(file, &ifr);
-- 
MST

^ permalink raw reply related

* [PATCH v6 16/46] virtio_blk: v1.0 support
From: Michael S. Tsirkin @ 2014-11-27 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
	Rusty Russell, virtualization, linux-api
In-Reply-To: <1417118789-18231-1-git-send-email-mst@redhat.com>

Based on patch by Cornelia Huck.

Note: for consistency, and to avoid sparse errors,
      convert all fields, even those no longer in use
      for virtio v1.0.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/virtio_blk.h | 15 ++++-----
 drivers/block/virtio_blk.c      | 70 ++++++++++++++++++++++++-----------------
 2 files changed, 49 insertions(+), 36 deletions(-)

diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 9ad67b2..247c8ba 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -28,6 +28,7 @@
 #include <linux/types.h>
 #include <linux/virtio_ids.h>
 #include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
 
 /* Feature bits */
 #define VIRTIO_BLK_F_BARRIER	0	/* Does host support barriers? */
@@ -114,18 +115,18 @@ struct virtio_blk_config {
 /* This is the first element of the read scatter-gather list. */
 struct virtio_blk_outhdr {
 	/* VIRTIO_BLK_T* */
-	__u32 type;
+	__virtio32 type;
 	/* io priority. */
-	__u32 ioprio;
+	__virtio32 ioprio;
 	/* Sector (ie. 512 byte offset) */
-	__u64 sector;
+	__virtio64 sector;
 };
 
 struct virtio_scsi_inhdr {
-	__u32 errors;
-	__u32 data_len;
-	__u32 sense_len;
-	__u32 residual;
+	__virtio32 errors;
+	__virtio32 data_len;
+	__virtio32 sense_len;
+	__virtio32 residual;
 };
 
 /* And this is the final byte of the write scatter-gather list. */
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c6a27d5..f601f16 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -80,7 +80,7 @@ static int __virtblk_add_req(struct virtqueue *vq,
 {
 	struct scatterlist hdr, status, cmd, sense, inhdr, *sgs[6];
 	unsigned int num_out = 0, num_in = 0;
-	int type = vbr->out_hdr.type & ~VIRTIO_BLK_T_OUT;
+	__virtio32 type = vbr->out_hdr.type & ~cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT);
 
 	sg_init_one(&hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
 	sgs[num_out++] = &hdr;
@@ -91,19 +91,19 @@ static int __virtblk_add_req(struct virtqueue *vq,
 	 * block, and before the normal inhdr we put the sense data and the
 	 * inhdr with additional status information.
 	 */
-	if (type == VIRTIO_BLK_T_SCSI_CMD) {
+	if (type == cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_SCSI_CMD)) {
 		sg_init_one(&cmd, vbr->req->cmd, vbr->req->cmd_len);
 		sgs[num_out++] = &cmd;
 	}
 
 	if (have_data) {
-		if (vbr->out_hdr.type & VIRTIO_BLK_T_OUT)
+		if (vbr->out_hdr.type & cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT))
 			sgs[num_out++] = data_sg;
 		else
 			sgs[num_out + num_in++] = data_sg;
 	}
 
-	if (type == VIRTIO_BLK_T_SCSI_CMD) {
+	if (type == cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_SCSI_CMD)) {
 		sg_init_one(&sense, vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
 		sgs[num_out + num_in++] = &sense;
 		sg_init_one(&inhdr, &vbr->in_hdr, sizeof(vbr->in_hdr));
@@ -119,12 +119,13 @@ static int __virtblk_add_req(struct virtqueue *vq,
 static inline void virtblk_request_done(struct request *req)
 {
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
+	struct virtio_blk *vblk = req->q->queuedata;
 	int error = virtblk_result(vbr);
 
 	if (req->cmd_type == REQ_TYPE_BLOCK_PC) {
-		req->resid_len = vbr->in_hdr.residual;
-		req->sense_len = vbr->in_hdr.sense_len;
-		req->errors = vbr->in_hdr.errors;
+		req->resid_len = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.residual);
+		req->sense_len = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.sense_len);
+		req->errors = virtio32_to_cpu(vblk->vdev, vbr->in_hdr.errors);
 	} else if (req->cmd_type == REQ_TYPE_SPECIAL) {
 		req->errors = (error != 0);
 	}
@@ -173,25 +174,25 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req,
 
 	vbr->req = req;
 	if (req->cmd_flags & REQ_FLUSH) {
-		vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
+		vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_FLUSH);
 		vbr->out_hdr.sector = 0;
-		vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+		vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 	} else {
 		switch (req->cmd_type) {
 		case REQ_TYPE_FS:
 			vbr->out_hdr.type = 0;
-			vbr->out_hdr.sector = blk_rq_pos(vbr->req);
-			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			vbr->out_hdr.sector = cpu_to_virtio64(vblk->vdev, blk_rq_pos(vbr->req));
+			vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 			break;
 		case REQ_TYPE_BLOCK_PC:
-			vbr->out_hdr.type = VIRTIO_BLK_T_SCSI_CMD;
+			vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_SCSI_CMD);
 			vbr->out_hdr.sector = 0;
-			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 			break;
 		case REQ_TYPE_SPECIAL:
-			vbr->out_hdr.type = VIRTIO_BLK_T_GET_ID;
+			vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_GET_ID);
 			vbr->out_hdr.sector = 0;
-			vbr->out_hdr.ioprio = req_get_ioprio(vbr->req);
+			vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(vbr->req));
 			break;
 		default:
 			/* We don't put anything else in the queue. */
@@ -204,9 +205,9 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req,
 	num = blk_rq_map_sg(hctx->queue, vbr->req, vbr->sg);
 	if (num) {
 		if (rq_data_dir(vbr->req) == WRITE)
-			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
+			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_OUT);
 		else
-			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
+			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_IN);
 	}
 
 	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
@@ -476,7 +477,8 @@ static int virtblk_get_cache_mode(struct virtio_device *vdev)
 				   struct virtio_blk_config, wce,
 				   &writeback);
 	if (err)
-		writeback = virtio_has_feature(vdev, VIRTIO_BLK_F_WCE);
+		writeback = virtio_has_feature(vdev, VIRTIO_BLK_F_WCE) ||
+		            virtio_has_feature(vdev, VIRTIO_F_VERSION_1);
 
 	return writeback;
 }
@@ -821,25 +823,35 @@ static const struct virtio_device_id id_table[] = {
 	{ 0 },
 };
 
-static unsigned int features[] = {
+static unsigned int features_legacy[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
 	VIRTIO_BLK_F_WCE, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
 	VIRTIO_BLK_F_MQ,
+}
+;
+static unsigned int features[] = {
+	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
+	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
+	VIRTIO_BLK_F_TOPOLOGY,
+	VIRTIO_BLK_F_MQ,
+	VIRTIO_F_VERSION_1,
 };
 
 static struct virtio_driver virtio_blk = {
-	.feature_table		= features,
-	.feature_table_size	= ARRAY_SIZE(features),
-	.driver.name		= KBUILD_MODNAME,
-	.driver.owner		= THIS_MODULE,
-	.id_table		= id_table,
-	.probe			= virtblk_probe,
-	.remove			= virtblk_remove,
-	.config_changed		= virtblk_config_changed,
+	.feature_table			= features,
+	.feature_table_size		= ARRAY_SIZE(features),
+	.feature_table_legacy		= features_legacy,
+	.feature_table_size_legacy	= ARRAY_SIZE(features_legacy),
+	.driver.name			= KBUILD_MODNAME,
+	.driver.owner			= THIS_MODULE,
+	.id_table			= id_table,
+	.probe				= virtblk_probe,
+	.remove				= virtblk_remove,
+	.config_changed			= virtblk_config_changed,
 #ifdef CONFIG_PM_SLEEP
-	.freeze			= virtblk_freeze,
-	.restore		= virtblk_restore,
+	.freeze				= virtblk_freeze,
+	.restore			= virtblk_restore,
 #endif
 };
 
-- 
MST

^ permalink raw reply related

* [PATCH v6 15/46] virtio_net: v1.0 endianness
From: Michael S. Tsirkin @ 2014-11-27 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
	Rusty Russell, virtualization, netdev, linux-api
In-Reply-To: <1417118789-18231-1-git-send-email-mst@redhat.com>

Based on patches by Rusty Russell, Cornelia Huck.
Note: more code changes are needed for 1.0 support
(due to different header size).
So we don't advertize support for 1.0 yet.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/virtio_net.h | 15 ++++++++-------
 drivers/net/virtio_net.c        | 33 ++++++++++++++++++++-------------
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 172a7f0..b5f1677 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -28,6 +28,7 @@
 #include <linux/types.h>
 #include <linux/virtio_ids.h>
 #include <linux/virtio_config.h>
+#include <linux/virtio_types.h>
 #include <linux/if_ether.h>
 
 /* The feature bitmap for virtio net */
@@ -84,17 +85,17 @@ struct virtio_net_hdr {
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	// GSO frame, IPv6 TCP
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	// TCP has ECN set
 	__u8 gso_type;
-	__u16 hdr_len;		/* Ethernet + IP + tcp/udp hdrs */
-	__u16 gso_size;		/* Bytes to append to hdr_len per frame */
-	__u16 csum_start;	/* Position to start checksumming from */
-	__u16 csum_offset;	/* Offset after that to place checksum */
+	__virtio16 hdr_len;		/* Ethernet + IP + tcp/udp hdrs */
+	__virtio16 gso_size;		/* Bytes to append to hdr_len per frame */
+	__virtio16 csum_start;	/* Position to start checksumming from */
+	__virtio16 csum_offset;	/* Offset after that to place checksum */
 };
 
 /* This is the version of the header to use when the MRG_RXBUF
  * feature has been negotiated. */
 struct virtio_net_hdr_mrg_rxbuf {
 	struct virtio_net_hdr hdr;
-	__u16 num_buffers;	/* Number of merged rx buffers */
+	__virtio16 num_buffers;	/* Number of merged rx buffers */
 };
 
 /*
@@ -149,7 +150,7 @@ typedef __u8 virtio_net_ctrl_ack;
  * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
  */
 struct virtio_net_ctrl_mac {
-	__u32 entries;
+	__virtio32 entries;
 	__u8 macs[][ETH_ALEN];
 } __attribute__((packed));
 
@@ -193,7 +194,7 @@ struct virtio_net_ctrl_mac {
  * specified.
  */
 struct virtio_net_ctrl_mq {
-	__u16 virtqueue_pairs;
+	__virtio16 virtqueue_pairs;
 };
 
 #define VIRTIO_NET_CTRL_MQ   4
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b0bc8ea..c07e030 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -347,13 +347,14 @@ err:
 }
 
 static struct sk_buff *receive_mergeable(struct net_device *dev,
+					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
 					 unsigned long ctx,
 					 unsigned int len)
 {
 	void *buf = mergeable_ctx_to_buf_address(ctx);
 	struct skb_vnet_hdr *hdr = buf;
-	int num_buf = hdr->mhdr.num_buffers;
+	u16 num_buf = virtio16_to_cpu(rq->vq->vdev, hdr->mhdr.num_buffers);
 	struct page *page = virt_to_head_page(buf);
 	int offset = buf - page_address(page);
 	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
@@ -369,7 +370,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &len);
 		if (unlikely(!ctx)) {
 			pr_debug("%s: rx error: %d buffers out of %d missing\n",
-				 dev->name, num_buf, hdr->mhdr.num_buffers);
+				 dev->name, num_buf,
+				 virtio16_to_cpu(rq->vq->vdev,
+						 hdr->mhdr.num_buffers));
 			dev->stats.rx_length_errors++;
 			goto err_buf;
 		}
@@ -454,7 +457,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	}
 
 	if (vi->mergeable_rx_bufs)
-		skb = receive_mergeable(dev, rq, (unsigned long)buf, len);
+		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
 	else if (vi->big_packets)
 		skb = receive_big(dev, rq, buf, len);
 	else
@@ -473,8 +476,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		pr_debug("Needs csum!\n");
 		if (!skb_partial_csum_set(skb,
-					  hdr->hdr.csum_start,
-					  hdr->hdr.csum_offset))
+			  virtio16_to_cpu(vi->vdev, hdr->hdr.csum_start),
+			  virtio16_to_cpu(vi->vdev, hdr->hdr.csum_offset)))
 			goto frame_err;
 	} else if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID) {
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -514,7 +517,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 		if (hdr->hdr.gso_type & VIRTIO_NET_HDR_GSO_ECN)
 			skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
 
-		skb_shinfo(skb)->gso_size = hdr->hdr.gso_size;
+		skb_shinfo(skb)->gso_size = virtio16_to_cpu(vi->vdev,
+							    hdr->hdr.gso_size);
 		if (skb_shinfo(skb)->gso_size == 0) {
 			net_warn_ratelimited("%s: zero gso size.\n", dev->name);
 			goto frame_err;
@@ -876,16 +880,19 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		hdr->hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-		hdr->hdr.csum_start = skb_checksum_start_offset(skb);
-		hdr->hdr.csum_offset = skb->csum_offset;
+		hdr->hdr.csum_start = cpu_to_virtio16(vi->vdev,
+						skb_checksum_start_offset(skb));
+		hdr->hdr.csum_offset = cpu_to_virtio16(vi->vdev,
+							 skb->csum_offset);
 	} else {
 		hdr->hdr.flags = 0;
 		hdr->hdr.csum_offset = hdr->hdr.csum_start = 0;
 	}
 
 	if (skb_is_gso(skb)) {
-		hdr->hdr.hdr_len = skb_headlen(skb);
-		hdr->hdr.gso_size = skb_shinfo(skb)->gso_size;
+		hdr->hdr.hdr_len = cpu_to_virtio16(vi->vdev, skb_headlen(skb));
+		hdr->hdr.gso_size = cpu_to_virtio16(vi->vdev,
+						    skb_shinfo(skb)->gso_size);
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
 			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
@@ -1112,7 +1119,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
 		return 0;
 
-	s.virtqueue_pairs = queue_pairs;
+	s.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
 	sg_init_one(&sg, &s, sizeof(s));
 
 	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
@@ -1189,7 +1196,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	sg_init_table(sg, 2);
 
 	/* Store the unicast list and count in the front of the buffer */
-	mac_data->entries = uc_count;
+	mac_data->entries = cpu_to_virtio32(vi->vdev, uc_count);
 	i = 0;
 	netdev_for_each_uc_addr(ha, dev)
 		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
@@ -1200,7 +1207,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	/* multicast list and count fill the end */
 	mac_data = (void *)&mac_data->macs[uc_count][0];
 
-	mac_data->entries = mc_count;
+	mac_data->entries = cpu_to_virtio32(vi->vdev, mc_count);
 	i = 0;
 	netdev_for_each_mc_addr(ha, dev)
 		memcpy(&mac_data->macs[i++][0], ha->addr, ETH_ALEN);
-- 
MST

^ permalink raw reply related

* [PATCH v6 12/46] virtio: set FEATURES_OK
From: Michael S. Tsirkin @ 2014-11-27 20:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, cornelia.huck, rusty, nab, pbonzini, thuth, dahi,
	Rusty Russell, virtualization, linux-api
In-Reply-To: <1417118789-18231-1-git-send-email-mst@redhat.com>

set FEATURES_OK as per virtio 1.0 spec

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/virtio_config.h |  2 ++
 drivers/virtio/virtio.c            | 29 ++++++++++++++++++++++-------
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 80e7381..4d05671 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -38,6 +38,8 @@
 #define VIRTIO_CONFIG_S_DRIVER		2
 /* Driver has used its parts of the config, and is happy */
 #define VIRTIO_CONFIG_S_DRIVER_OK	4
+/* Driver has finished configuring features */
+#define VIRTIO_CONFIG_S_FEATURES_OK	8
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED		0x80
 
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 746d350..9248125 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -160,6 +160,7 @@ static int virtio_dev_probe(struct device *_d)
 	struct virtio_device *dev = dev_to_virtio(_d);
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
 	u64 device_features;
+	unsigned status;
 
 	/* We have a driver! */
 	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
@@ -183,18 +184,32 @@ static int virtio_dev_probe(struct device *_d)
 
 	dev->config->finalize_features(dev);
 
+	if (virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
+		add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+		status = dev->config->get_status(dev);
+		if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
+			dev_err(_d, "virtio: device refuses features: %x\n",
+			       status);
+			err = -ENODEV;
+			goto err;
+		}
+	}
+
 	err = drv->probe(dev);
 	if (err)
-		add_status(dev, VIRTIO_CONFIG_S_FAILED);
-	else {
-		add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
-		if (drv->scan)
-			drv->scan(dev);
+		goto err;
 
-		virtio_config_enable(dev);
-	}
+	add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+	if (drv->scan)
+		drv->scan(dev);
+
+	virtio_config_enable(dev);
 
+	return 0;
+err:
+	add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
+
 }
 
 static int virtio_dev_remove(struct device *_d)
-- 
MST

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox