* imx-drm: Add HDMI support
[not found] <52602EEA.6060402@prisktech.co.nz>
@ 2013-10-17 19:39 ` Russell King - ARM Linux
2013-10-17 19:52 ` Alexander Shiyan
2013-10-19 15:55 ` Russell King - ARM Linux
1 sibling, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-10-17 19:39 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Oct 18, 2013 at 07:39:38AM +1300, Tony Prisk wrote:
> Hi Fabio,
>
> Little bit late to the party, but here is the patch for the latest
> version of the HDMI driver I was working on. Having had a quick flick
> through the mailing list I see Russell has noticed a few of the problems
> I had earlier on, which are resolved in my later work. Thought this
> might be useful to help you tidy things up.
>
> In particular, it fixes the VIC=6 forced issue, and the pink-line on the
> edge of the screen issue.
Where to start. Well, last night's testing produced the following results
(from my G+ update last night):
1280x720 (720p) - works fine.
1920x1080 (1080p) - blank screen/no detected signal.
1280x1024 - blank screen/no detected signal.
1366x768 - works.
1024x768 - works.
800x600 - blank screen/no detected signal.
720x576 (576p) - speckles, the occasional line, and blanks.
848x480 - blank screen/no detected signal.
720x480 (480p) - speckles, the occasional line, and blanks.
640x480 - blank screen/no detected signal/occasional picture with speckles.
So there's a whole bunch of problems here - quite a lot of resolutions
simply don't work, and for many reasons.
One of them is the settings for two registers on the HDMI Phy.
You and Fabio have these settings in your patch(es) - you do it in
different ways but the resulting settings are the same:
+ if (mode->clock <= 148500) {
+ imx_hdmi_phy_i2c_write(hdmi, 0x0210, HDMI_PHY_VLEVCTRL);
+ imx_hdmi_phy_i2c_write(hdmi, 0x8009, HDMI_PHY_CKSYMTXCTRL);
+ } else {
+ imx_hdmi_phy_i2c_write(hdmi, 0x0129, HDMI_PHY_VLEVCTRL);
+ imx_hdmi_phy_i2c_write(hdmi, 0x800b, HDMI_PHY_CKSYMTXCTRL);
+ }
This doesn't work reliably for me, and differs from the FSL 4.1.0 BSP,
which does work for me:
+ hdmi_phy_i2c_write(hdmi, 0x800d, 0x09); /* CKSYMTXCTRL */
+ hdmi_phy_i2c_write(hdmi, 0x01ad, 0x0E); /* VLEVCTRL */
This stops the 576p and 480p modes from blanking, although the speckles
are still there. So, using the Freescale settings results in a marginal
improvement - and actually means that when I put a 'scope on the lines,
the behaviour doesn't change. That tells me the settings both you and
Fabio are suggesting seem to be borderline.
Now, what did 'scoping the TMDS clock tell me? Well, I have a photograph
of the TMDS clock at the non-working 800x600 resolution:
http://www.home.arm.linux.org.uk/~rmk/imx/IMG_1545-adj.JPG
and they say a picture paints a thousand words. Essentially, the TMDS
clock is being frequency modulated.
If I load up a LVDS output on the same IPU output as the HDMI, and give
it the same EDID as on the HDMI, and then do this to X (from working
720p to non-working 800x600):
# xrandr --output LVDS1 --off --output HDMI1 --mode 800x600
That results in the TV blanking and the modulation on the TMDS clock
line. Incidentally, the TMDS clock is a little over 40.01MHz. Follow
that with:
# xrandr --output HDMI1 --off --output LVDS1 --mode 800x600
and the TV unblanks, says it has 800x600 and the picture looks fine.
The TMDS clock looks clean, and it's running at 39.997MHz. Go back
to the HDMI1 output:
# xrandr --output LVDS1 --off --output HDMI1 --mode 800x600
and the picture goes again and the TMDS clock is again unstable, and
at the slightly higher frequency.
Note: This is only possible because of other bugs in the HDMI output
driver - it doesn't _actually_ shut down the output when told to by
DRM, but leaves it mostly programmed as it was, only clearing two
bits which are to do with low power mode (PDZ and ENTMDS in the
HDMI_PHY_CONF0 register). These don't actually shut down the output
at all! Put a scope on the output to prove it to yourself!
I've dumped out all the HDMI registers. Very little changes between
these two (only the PHY_CONF0 and a couple of registers in the Phy
change to do with reading back the impedance of the lines.) So it's
not a HDMI or a HDMI phy problem.
What's more interesting is the DI0 clocking mode changes between
these two: with HDMI1 selected, the IPU is used as the clock for
the DI. With LVDS, the DI0 "external" clock is used instead.
Great, or so I thought. Let's try using the same DI0 external clock
for HDMI. Now, what happens is slightly strange. With just
IPU_DI_CLKMODE_EXT set, our initial mode works fine, but others
don't because the clock doesn't get set - unless we go via selecting
LVDS1 output mode first.
With IPU_DI_CLKMODE_EXT and IPU_DI_CLKMODE_SYNC set, things become
loads worse - switching resolutions loses display and returning back
to the working one also fails until the next reboot.
Now, I notice that the FSL 4.1.0 stuff appears to try and decide
which of the two clock inputs to use for the DI module, something
which isn't done in the driver in staging. Also there's the commented
out stuff:
#ifdef WTF_IS_THIS
/*
* Freescale has this in their Kernel. It is neither clear what
* it does nor why it does it
*/
if (div & 0x10)
div &= ~0x7;
else {
/* Round up divider if it gets us closer to desired pix clk */
if ((div & 0xC) == 0xC) {
div += 0x10;
div &= ~0xF;
}
}
#endif
Quite simply, the "div" value is as follows:
bits 11:4 - integer divider
bits 3:0 - fractional part
So, what this is saying is that for any _odd_ integer divider, we want
to either divide by the odd divider, or odd.5 - but no other fractional
values. Otherwise, if the divider is an even integer divider, and the
fractional part is greater than .75, we want to round that up. (It may
also be that the final div & ~0xf should be outside that second if
statement.)
It could be that the fractional divider is broken and doesn't work
correctly with the values that this is trying to avoid.
That would explain the apparent modulation on the TMDS clock (normally
such fractional dividers work by skipping clock pulses, which would
then explain why the MPLL slews back and forth in a fairly regular
manner.)
If you're doing scan-out, you *really* don't want a pulse-skipping
fractional divider anywhere in the chain.
Well, for the time being, I'm giving up with HDMI on the IMX - it's
far too broken for it to be sensibly used, and the IPU stuff in the
manual just lacks far too much information to be able to debug this
stuff properly. I fear that this hardware is a victim of its own
immense complexity.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: imx-drm: Add HDMI support
2013-10-17 19:39 ` imx-drm: Add HDMI support Russell King - ARM Linux
@ 2013-10-17 19:52 ` Alexander Shiyan
2013-10-17 20:35 ` Russell King - ARM Linux
2013-10-30 18:01 ` Matt Sealey
0 siblings, 2 replies; 18+ messages in thread
From: Alexander Shiyan @ 2013-10-17 19:52 UTC (permalink / raw)
To: linux-arm-kernel
> > Little bit late to the party, but here is the patch for the latest
> > version of the HDMI driver I was working on. Having had a quick flick
> > through the mailing list I see Russell has noticed a few of the problems
> > I had earlier on, which are resolved in my later work. Thought this
> > might be useful to help you tidy things up.
...
> Now, I notice that the FSL 4.1.0 stuff appears to try and decide
> which of the two clock inputs to use for the DI module, something
> which isn't done in the driver in staging. Also there's the commented
> out stuff:
>
> #ifdef WTF_IS_THIS
> /*
> * Freescale has this in their Kernel. It is neither clear what
> * it does nor why it does it
> */
> if (div & 0x10)
> div &= ~0x7;
> else {
> /* Round up divider if it gets us closer to desired pix clk */
> if ((div & 0xC) == 0xC) {
> div += 0x10;
> div &= ~0xF;
> }
> }
> #endif
>
> Quite simply, the "div" value is as follows:
>
> bits 11:4 - integer divider
> bits 3:0 - fractional part
>
> So, what this is saying is that for any _odd_ integer divider, we want
> to either divide by the odd divider, or odd.5 - but no other fractional
> values. Otherwise, if the divider is an even integer divider, and the
> fractional part is greater than .75, we want to round that up. (It may
> also be that the final div & ~0xf should be outside that second if
> statement.)
>
> It could be that the fractional divider is broken and doesn't work
> correctly with the values that this is trying to avoid.
This problem was described by me earlier, but a solution was left hanging in the air:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/176424.html
---
* imx-drm: Add HDMI support
2013-10-17 19:52 ` Alexander Shiyan
@ 2013-10-17 20:35 ` Russell King - ARM Linux
2013-10-19 12:12 ` Russell King - ARM Linux
2013-10-30 18:01 ` Matt Sealey
1 sibling, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-10-17 20:35 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 17, 2013 at 11:52:15PM +0400, Alexander Shiyan wrote:
> > > Little bit late to the party, but here is the patch for the latest
> > > version of the HDMI driver I was working on. Having had a quick flick
> > > through the mailing list I see Russell has noticed a few of the problems
> > > I had earlier on, which are resolved in my later work. Thought this
> > > might be useful to help you tidy things up.
> ...
> > Now, I notice that the FSL 4.1.0 stuff appears to try and decide
> > which of the two clock inputs to use for the DI module, something
> > which isn't done in the driver in staging. Also there's the commented
> > out stuff:
> >
> > #ifdef WTF_IS_THIS
> > /*
> > * Freescale has this in their Kernel. It is neither clear what
> > * it does nor why it does it
> > */
> > if (div & 0x10)
> > div &= ~0x7;
> > else {
> > /* Round up divider if it gets us closer to desired pix clk */
> > if ((div & 0xC) == 0xC) {
> > div += 0x10;
> > div &= ~0xF;
> > }
> > }
> > #endif
> >
> > Quite simply, the "div" value is as follows:
> >
> > bits 11:4 - integer divider
> > bits 3:0 - fractional part
> >
> > So, what this is saying is that for any _odd_ integer divider, we want
> > to either divide by the odd divider, or odd.5 - but no other fractional
> > values. Otherwise, if the divider is an even integer divider, and the
> > fractional part is greater than .75, we want to round that up. (It may
> > also be that the final div & ~0xf should be outside that second if
> > statement.)
> >
> > It could be that the fractional divider is broken and doesn't work
> > correctly with the values that this is trying to avoid.
>
> This problem was described by me earlier, but a solution was left hanging in the air:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/176424.html
There's a whinge I've got to have here about the clk API code in here,
and the use of the clk API. It's broken.
Let's start at the beginning. What does "clk_set_rate()" and
"clk_round_rate()" do? Well, clk_set_rate() sets the rate on the clock
to the value requested or a value close to that value appropriate to the
clock. So, what does clk_round_rate() do? clk_round_rate() tells you
what clock rate you _would_ get if you requested that rate via
clk_set_rate().
In other words:
rounded_rate = clk_round_rate(clk, rate);
and
clk_set_rate(clk, rate);
rounded_rate = clk_get_rate(clk);
are exactly the same _except_ that the former doesn't have the side
effect of changing the actual clock itself. So:
rounded_rate = clk_round_rate(clk, rate);
clk_set_rate(clk, rounded_rate);
is quite insane and wasteful.
--- end of whinge ---
Using your adjustments (which are basically):
if (div & 0xf)
div += 0x10;
div &= ~15;
gives me working 1024x768, 800x600, and 640x480, (all @60, other
refreshes not checked) but nothing else.
I then thought about "round to nearest", which would be:
if (div & 8)
div += 8;
div &= ~7;
but this gives me no difference from your version. The plus point is
that with both of these, the result is a nice and clean display, with
no sign of any speckles or any other artifacts.
So, what happens if I re-enable the WTF_IS_THIS code? Well:
1280x720p @ 60 works.
1280x720p @ 50 works.
1366x768 @ 59.8 works.
1024x768 @ 60 works.
800x600 @ 60.3 works.
1920x1080p @ 50 doesn't.
1920x1080p @ 60 doesn't.
1280x1024 @ 60 doesn't.
800x600 @ 56.2 doesn't.
720x576 @ 50 doesn't.
848x480 @ 60 doesn't.
720x480 @ 60 doesn't.
640x480 @ 60 doesn't.
What else does the FSL 4.1.0 code do? Well, there's this:
if (clk_get(NULL, "tve_clk") == di_parent ||
clk_get(NULL, "ldb_di0_clk") == di_parent ||
clk_get(NULL, "ldb_di1_clk") == di_parent) {
/* if di clk parent is tve/ldb, then keep it;*/
dev_dbg(ipu->dev, "use special clk parent\n");
clk_set_parent(&ipu->pixel_clk[disp], ipu->di_clk[disp]);
} else {
/* try ipu clk first*/
dev_dbg(ipu->dev, "try ipu internal clk\n");
clk_set_parent(&ipu->pixel_clk[disp], ipu->ipu_clk);
rounded_pixel_clk = clk_round_rate(&ipu->pixel_clk[disp], pixel_clk);
/*
* we will only use 1/2 fraction for ipu clk,
* so if the clk rate is not fit, try ext clk.
*/
if (!sig.int_clk &&
((rounded_pixel_clk >= pixel_clk + pixel_clk/200) ||
(rounded_pixel_clk <= pixel_clk - pixel_clk/200))) {
dev_dbg(ipu->dev, "try ipu ext di clk\n");
rounded_pixel_clk = pixel_clk * 2;
rounded_parent_clk = clk_round_rate(di_parent,
rounded_pixel_clk);
while (rounded_pixel_clk < rounded_parent_clk) {
/* the max divider from parent to di is 8 */
if (rounded_parent_clk / pixel_clk < 8)
rounded_pixel_clk += pixel_clk * 2;
else
rounded_pixel_clk *= 2;
}
clk_set_rate(di_parent, rounded_pixel_clk);
rounded_pixel_clk =
clk_round_rate(ipu->di_clk[disp], pixel_clk);
clk_set_rate(ipu->di_clk[disp], rounded_pixel_clk);
clk_set_parent(&ipu->pixel_clk[disp], ipu->di_clk[disp]);
}
}
rounded_pixel_clk = clk_round_rate(&ipu->pixel_clk[disp], pixel_clk);
clk_set_rate(&ipu->pixel_clk[disp], rounded_pixel_clk);
which appears to be about selecting either the IPU clock or the DI
external clock. (The comments are misleading btw.) This is
unimplemented in imx-drm, and it just selects the IPU clock or the DI
clock depending on which "encoder" is being used. For HDMI, that's IPU
clock only. For LVDS and TVE, that's DI external clock only.
* imx-drm: Add HDMI support
2013-10-17 20:35 ` Russell King - ARM Linux
@ 2013-10-19 12:12 ` Russell King - ARM Linux
2013-10-21 9:13 ` Shawn Guo
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-10-19 12:12 UTC (permalink / raw)
To: linux-arm-kernel
Okay, more on this.
If I switch to using the DI clock, sourced from PLL5, and then "fix"
the clock tree such that I can change PLL5's rate, then I get most of
the display modes working.
However, it will occasionally not work, and when that happens I need
to reboot to get any kind of output working again. I haven't delved
into it with the scope to find out what the HDMI clock is doing yet.
Having solved it, what I think is going on is as follows:
When one of the PLLs is prepared and then subsequently enabled, the
following sequence is used:
- Clear bypass
- Power up the PLL
- Wait for the PLL to indicate lock
- Enable output
This causes problems for me. If I change this to:
- Power up the PLL
- Wait for PLL to indicate lock
- Clear bypass
- Enable output
then I don't see any problems. My theory is that the bypass mux and/or
enable gate settings are synchronised with the mux output, and if the
mux output glitches while the PLL is gaining lock, it knocks that out
preventing the output from ever being enabled until the chip is hit with
a reset.
The other issue is that the set_rate() functions don't wait for the PLL
to lock before returning, and don't set bypass mode. It looks like this
code relies on drivers unpreparing the clocks before calling clk_set_rate().
I don't see the clk API enforcing this in any way, so I'd suggest that
the set_rate() function also does the bypass -> set rate -> wait lock
-> clear bypass transition too. Maybe this should be applied to the
other PLLs too.
Here's a diff which updates the av PLL to do this:
arch/arm/mach-imx/clk-pllv3.c | 77 ++++++++++++++++++++++++++++------------
1 files changed, 54 insertions(+), 23 deletions(-)
diff --git a/arch/arm/mach-imx/clk-pllv3.c b/arch/arm/mach-imx/clk-pllv3.c
index f6640b6..11a7048 100644
--- a/arch/arm/mach-imx/clk-pllv3.c
+++ b/arch/arm/mach-imx/clk-pllv3.c
@@ -45,21 +45,10 @@ struct clk_pllv3 {
#define to_clk_pllv3(_hw) container_of(_hw, struct clk_pllv3, hw)
-static int clk_pllv3_prepare(struct clk_hw *hw)
+static int clk_pllv3_wait_lock(struct clk_pllv3 *pll)
{
- struct clk_pllv3 *pll = to_clk_pllv3(hw);
- unsigned long timeout;
- u32 val;
-
- val = readl_relaxed(pll->base);
- val &= ~BM_PLL_BYPASS;
- if (pll->powerup_set)
- val |= BM_PLL_POWER;
- else
- val &= ~BM_PLL_POWER;
- writel_relaxed(val, pll->base);
+ unsigned long timeout = jiffies + msecs_to_jiffies(100);
- timeout = jiffies + msecs_to_jiffies(10);
/* Wait for PLL to lock */
do {
if (readl_relaxed(pll->base) & BM_PLL_LOCK)
@@ -68,10 +57,31 @@ static int clk_pllv3_prepare(struct clk_hw *hw)
break;
} while (1);
- if (readl_relaxed(pll->base) & BM_PLL_LOCK)
- return 0;
+ return readl_relaxed(pll->base) & BM_PLL_LOCK ? 0 : -ETIMEDOUT;
+}
+
+static int clk_pllv3_prepare(struct clk_hw *hw)
+{
+ struct clk_pllv3 *pll = to_clk_pllv3(hw);
+ u32 val, newval;
+ int ret;
+
+ val = readl_relaxed(pll->base);
+ if (pll->powerup_set)
+ newval = val | BM_PLL_POWER;
else
- return -ETIMEDOUT;
+ newval = val & ~BM_PLL_POWER;
+ writel_relaxed(newval, pll->base);
+
+ ret = clk_pllv3_wait_lock(pll);
+ if (ret == 0) {
+ newval &= ~BM_PLL_BYPASS;
+ writel_relaxed(newval, pll->base);
+ } else {
+ writel_relaxed(val, pll->base);
+ }
+
+ return ret;
}
static void clk_pllv3_unprepare(struct clk_hw *hw)
@@ -80,6 +90,7 @@ static void clk_pllv3_unprepare(struct clk_hw *hw)
u32 val;
val = readl_relaxed(pll->base);
+ pr_info("%s: 0x%08x\n", __func__, val);
val |= BM_PLL_BYPASS;
if (pll->powerup_set)
val &= ~BM_PLL_POWER;
@@ -256,9 +267,10 @@ static int clk_pllv3_av_set_rate(struct clk_hw *hw, unsigned long rate,
struct clk_pllv3 *pll = to_clk_pllv3(hw);
unsigned long min_rate = parent_rate * 27;
unsigned long max_rate = parent_rate * 54;
- u32 val, div;
+ u32 val, newval, div;
u32 mfn, mfd = 1000000;
s64 temp64;
+ int ret;
if (rate < min_rate || rate > max_rate)
return -EINVAL;
@@ -270,13 +282,32 @@ static int clk_pllv3_av_set_rate(struct clk_hw *hw, unsigned long rate,
mfn = temp64;
val = readl_relaxed(pll->base);
- val &= ~pll->div_mask;
- val |= div;
- writel_relaxed(val, pll->base);
+
+ /* set the PLL into bypass mode */
+ newval = val | BM_PLL_BYPASS;
+ writel_relaxed(newval, pll->base);
+
+ /* configure the new frequency */
+ newval &= ~pll->div_mask;
+ newval |= div;
+ writel_relaxed(newval, pll->base);
writel_relaxed(mfn, pll->base + PLL_NUM_OFFSET);
- writel_relaxed(mfd, pll->base + PLL_DENOM_OFFSET);
+ writel(mfd, pll->base + PLL_DENOM_OFFSET);
+
+ if (val & BM_PLL_POWER) {
+ /* PLL is powered down */
+ ret = 0;
+ } else {
+ ret = clk_pllv3_wait_lock(pll);
+ if (ret == 0) {
+ /* only if it locked can we switch back to the PLL */
+ newval &= ~BM_PLL_BYPASS;
+ newval |= val & BM_PLL_BYPASS;
+ writel(newval, pll->base);
+ }
+ }
- return 0;
+ return ret;
}
static const struct clk_ops clk_pllv3_av_ops = {
* imx-drm: Add HDMI support
[not found] <52602EEA.6060402@prisktech.co.nz>
2013-10-17 19:39 ` imx-drm: Add HDMI support Russell King - ARM Linux
@ 2013-10-19 15:55 ` Russell King - ARM Linux
1 sibling, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-10-19 15:55 UTC (permalink / raw)
To: linux-arm-kernel
Some comments from trying to compile and test this.
Firstly, the patch is whitespace-damaged.
On Fri, Oct 18, 2013 at 07:39:38AM +1300, Tony Prisk wrote:
> diff --git a/drivers/staging/imx-drm/imx-hdmi.c
> b/drivers/staging/imx-drm/imx-hdmi.c
> new file mode 100644
> index 0000000..a4b2934
> --- /dev/null
> +++ b/drivers/staging/imx-drm/imx-hdmi.c
> @@ -0,0 +1,1710 @@
...
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/of_i2c.h>
Doesn't exist in v3.12-rc.
> + /* HSYNC active edge delay width (pixel clocks) */
> + margin = mode->htotal - mode->hsync_end;
> + hdmi_writeb(hdmi, margin >> 8, HDMI_FC_HSYNCINDELAY1);
> + hdmi_writeb(hdmi, margin, HDMI_FC_HSYNCINDELAY0);
> +
...
> + /* VSYNC active edge delay width (pixel clocks) */
> + margin = mode->vtotal - mode->vsync_end;
> + hdmi_writeb(hdmi, margin, HDMI_FC_VSYNCINDELAY);
Both of these are wrong. The delay is from the end of the active region
to the start of sync pulse. That's:
margin = mode->hsync_start - mode->hdisplay;
Also, the commentary is wrong: vertical units are in lines, not pixel clocks.
> +static void imx_hdmi_update_csc_coeffs(struct imx_hdmi *hdmi)
> +{
> + const u16 (*csc_coeff)[3][4] = &csc_coeff_default;
...
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][0] & 0xff), HDMI_CSC_COEF_A1_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][0] >> 8), HDMI_CSC_COEF_A1_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][1] & 0xff), HDMI_CSC_COEF_A2_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][1] >> 8), HDMI_CSC_COEF_A2_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][2] & 0xff), HDMI_CSC_COEF_A3_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][2] >> 8), HDMI_CSC_COEF_A3_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][3] & 0xff), HDMI_CSC_COEF_A4_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[0][4] >> 8), HDMI_CSC_COEF_A4_MSB);
Array overrun. Should be [0][3] not [0][4].
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][0] & 0xff), HDMI_CSC_COEF_B1_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][0] >> 8), HDMI_CSC_COEF_B1_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][1] & 0xff), HDMI_CSC_COEF_B2_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][1] >> 8), HDMI_CSC_COEF_B2_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][2] & 0xff), HDMI_CSC_COEF_B3_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][2] >> 8), HDMI_CSC_COEF_B3_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][3] & 0xff), HDMI_CSC_COEF_B4_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[1][4] >> 8), HDMI_CSC_COEF_B4_MSB);
Array overrun. Should be [1][3] not [1][4].
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][0] & 0xff), HDMI_CSC_COEF_C1_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][0] >> 8), HDMI_CSC_COEF_C1_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][1] & 0xff), HDMI_CSC_COEF_C2_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][1] >> 8), HDMI_CSC_COEF_C2_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][2] & 0xff), HDMI_CSC_COEF_C3_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][2] >> 8), HDMI_CSC_COEF_C3_MSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][3] & 0xff), HDMI_CSC_COEF_C4_LSB);
> + hdmi_writeb(hdmi, ((*csc_coeff)[2][4] >> 8), HDMI_CSC_COEF_C4_MSB);
Array overrun. Should be [2][3] not [2][4].
> +static int imx_hdmi_register_drm(struct imx_hdmi *hdmi)
> +{
> + int ret;
> +
> + drm_mode_connector_attach_encoder(&hdmi->connector, &hdmi->encoder);
This must be done after the encoder has been added, so that
hdmi->encoder.base.id has been initialised. Other IMX DRM bridges
do this right at the end. Failure to do this causes the connector
to have no encoders.
> +
> + hdmi->connector.funcs = &imx_hdmi_connector_funcs;
> + hdmi->encoder.funcs = &imx_hdmi_encoder_funcs;
> +
> + hdmi->encoder.encoder_type = DRM_MODE_ENCODER_TMDS;
> + hdmi->connector.connector_type = DRM_MODE_CONNECTOR_HDMIA;
> +
> + hdmi->connector.polled = DRM_CONNECTOR_POLL_HPD;
> +
> + drm_encoder_helper_add(&hdmi->encoder, &imx_hdmi_encoder_helper_funcs);
> + ret = imx_drm_add_encoder(&hdmi->encoder, &hdmi->imx_drm_encoder,
> + THIS_MODULE);
> + if (ret) {
> + dev_err(hdmi->dev, "adding encoder failed with %d\n", ret);
> + return ret;
> + }
> +
> + drm_connector_helper_add(&hdmi->connector,
> + &imx_hdmi_connector_helper_funcs);
> +
> + ret = imx_drm_add_connector(&hdmi->connector,
> + &hdmi->imx_drm_connector, THIS_MODULE);
> + if (ret) {
> + imx_drm_remove_encoder(hdmi->imx_drm_encoder);
> + dev_err(hdmi->dev, "adding connector failed with %d\n", ret);
> + return ret;
> + }
> +
> + hdmi->connector.encoder = &hdmi->encoder;
drm_mode_connector_attach_encoder() should be here.
On testing, the display flashes on and off, and there's a definite skew
in the output on the last few lines of a 720p display. This is due to
the wrong sync margins I point out above.
It also suffers from the magenta line down the left hand side of the
display. This code doesn't seem to solve the issue on imx6solo:
hdmi_writeb(hdmi, ~HDMI_MC_SWRSTZ_TMDSSWRST_REQ, HDMI_MC_SWRSTZ);
val = hdmi_readb(hdmi, HDMI_FC_INVIDCONF);
for (i = 0; i < 5; i++)
hdmi_writeb(hdmi, val, HDMI_FC_INVIDCONF);
hdmi_writeb(hdmi, ~HDMI_MC_SWRSTZ_TMDSSWRST_REQ, HDMI_MC_SWRSTZ);
instead, it seems to require this:
/* TMDS software reset */
hdmi_writeb(hdmi, (u8)~HDMI_MC_SWRSTZ_TMDSSWRST_REQ, HDMI_MC_SWRSTZ);
val = hdmi_readb(hdmi, HDMI_FC_INVIDCONF);
if (hdmi->mx6dl_workaround) {
hdmi_writeb(hdmi, val, HDMI_FC_INVIDCONF);
return;
}
for (count = 0; count < 5; count++)
hdmi_writeb(hdmi, val, HDMI_FC_INVIDCONF);
where mx6dl_workaround is true for the 6solo/duallite, but false for
6quad. In other words for mx6dl, one reset, followed by a _single_
read + write of HDMI_FC_INVIDCONF.
As I've written previously, I think we need to use two compatible
strings to identify the different SoCs so we can activate the
appropriate workaround according to which SoC we're running on.
* imx-drm: Add HDMI support
2013-10-19 12:12 ` Russell King - ARM Linux
@ 2013-10-21 9:13 ` Shawn Guo
2013-10-21 9:34 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Shawn Guo @ 2013-10-21 9:13 UTC (permalink / raw)
To: linux-arm-kernel
On Sat, Oct 19, 2013 at 01:12:18PM +0100, Russell King - ARM Linux wrote:
> Okay, more on this.
>
> If I switch to using the DI clock, sourced from PLL5, and then "fix"
> the clock tree such that I can change PLL5's rate, then I get most of
> the display modes working.
>
> However, it will occasionally not work, and when that happens I need
> to reboot to get any kind of output working again. I haven't delved
> into it with the scope to find out what the HDMI clock is doing yet.
>
> Having solved it, what I think is going on is as follows:
>
> When one of the PLLs is prepared and then subsequently enabled, the
> following sequence is used:
>
> - Clear bypass
> - Power up the PLL
> - Wait for the PLL to indicate lock
> - Enable output
>
> This causes problems for me. If I change this to:
>
> - Power up the PLL
> - Wait for PLL to indicate lock
> - Clear bypass
> - Enable output
>
> then I don't see any problems. My theory is that the bypass mux and/or
> enable gate settings are synchronised with the mux output, and if the
> mux output glitches while the PLL is gaining lock, it knocks that out
> preventing the output from ever being enabled until the chip is hit with
> a reset.
You're right. I just checked the FSL 3.0.35 kernel and found that the
BYPASS bit is written after the POWER bit.
> The other issue is that the set_rate() functions don't wait for the PLL
> to lock before returning, and don't set bypass mode. It looks like this
> code relies on drivers unpreparing the clocks before calling clk_set_rate().
> I don't see the clk API enforcing this in any way, so I'd suggest that
> the set_rate() function also does the bypass -> set rate -> wait lock
> -> clear bypass transition too. Maybe this should be applied to the
> other PLLs too.
Yes. We experienced this problem when upgrading the FSL kernel tree to 3.10.
The .set_rate() should wait for the LOCK bit, but does not need to touch
BYPASS.
> Here's a diff which updates the av PLL to do this:
Some other PLLs need updating too. I will cook up patches to fix these
problems.
Shawn
> arch/arm/mach-imx/clk-pllv3.c | 77 ++++++++++++++++++++++++++++------------
> 1 files changed, 54 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm/mach-imx/clk-pllv3.c b/arch/arm/mach-imx/clk-pllv3.c
> index f6640b6..11a7048 100644
> --- a/arch/arm/mach-imx/clk-pllv3.c
> +++ b/arch/arm/mach-imx/clk-pllv3.c
> @@ -45,21 +45,10 @@ struct clk_pllv3 {
>
> #define to_clk_pllv3(_hw) container_of(_hw, struct clk_pllv3, hw)
>
> -static int clk_pllv3_prepare(struct clk_hw *hw)
> +static int clk_pllv3_wait_lock(struct clk_pllv3 *pll)
> {
> - struct clk_pllv3 *pll = to_clk_pllv3(hw);
> - unsigned long timeout;
> - u32 val;
> -
> - val = readl_relaxed(pll->base);
> - val &= ~BM_PLL_BYPASS;
> - if (pll->powerup_set)
> - val |= BM_PLL_POWER;
> - else
> - val &= ~BM_PLL_POWER;
> - writel_relaxed(val, pll->base);
> + unsigned long timeout = jiffies + msecs_to_jiffies(100);
>
> - timeout = jiffies + msecs_to_jiffies(10);
> /* Wait for PLL to lock */
> do {
> if (readl_relaxed(pll->base) & BM_PLL_LOCK)
> @@ -68,10 +57,31 @@ static int clk_pllv3_prepare(struct clk_hw *hw)
> break;
> } while (1);
>
> - if (readl_relaxed(pll->base) & BM_PLL_LOCK)
> - return 0;
> + return readl_relaxed(pll->base) & BM_PLL_LOCK ? 0 : -ETIMEDOUT;
> +}
> +
> +static int clk_pllv3_prepare(struct clk_hw *hw)
> +{
> + struct clk_pllv3 *pll = to_clk_pllv3(hw);
> + u32 val, newval;
> + int ret;
> +
> + val = readl_relaxed(pll->base);
> + if (pll->powerup_set)
> + newval = val | BM_PLL_POWER;
> else
> - return -ETIMEDOUT;
> + newval = val & ~BM_PLL_POWER;
> + writel_relaxed(newval, pll->base);
> +
> + ret = clk_pllv3_wait_lock(pll);
> + if (ret == 0) {
> + newval &= ~BM_PLL_BYPASS;
> + writel_relaxed(newval, pll->base);
> + } else {
> + writel_relaxed(val, pll->base);
> + }
> +
> + return ret;
> }
>
> static void clk_pllv3_unprepare(struct clk_hw *hw)
> @@ -80,6 +90,7 @@ static void clk_pllv3_unprepare(struct clk_hw *hw)
> u32 val;
>
> val = readl_relaxed(pll->base);
> + pr_info("%s: 0x%08x\n", __func__, val);
> val |= BM_PLL_BYPASS;
> if (pll->powerup_set)
> val &= ~BM_PLL_POWER;
> @@ -256,9 +267,10 @@ static int clk_pllv3_av_set_rate(struct clk_hw *hw, unsigned long rate,
> struct clk_pllv3 *pll = to_clk_pllv3(hw);
> unsigned long min_rate = parent_rate * 27;
> unsigned long max_rate = parent_rate * 54;
> - u32 val, div;
> + u32 val, newval, div;
> u32 mfn, mfd = 1000000;
> s64 temp64;
> + int ret;
>
> if (rate < min_rate || rate > max_rate)
> return -EINVAL;
> @@ -270,13 +282,32 @@ static int clk_pllv3_av_set_rate(struct clk_hw *hw, unsigned long rate,
> mfn = temp64;
>
> val = readl_relaxed(pll->base);
> - val &= ~pll->div_mask;
> - val |= div;
> - writel_relaxed(val, pll->base);
> +
> + /* set the PLL into bypass mode */
> + newval = val | BM_PLL_BYPASS;
> + writel_relaxed(newval, pll->base);
> +
> + /* configure the new frequency */
> + newval &= ~pll->div_mask;
> + newval |= div;
> + writel_relaxed(newval, pll->base);
> writel_relaxed(mfn, pll->base + PLL_NUM_OFFSET);
> - writel_relaxed(mfd, pll->base + PLL_DENOM_OFFSET);
> + writel(mfd, pll->base + PLL_DENOM_OFFSET);
> +
> + if (val & BM_PLL_POWER) {
> + /* PLL is powered down */
> + ret = 0;
> + } else {
> + ret = clk_pllv3_wait_lock(pll);
> + if (ret == 0) {
> + /* only if it locked can we switch back to the PLL */
> + newval &= ~BM_PLL_BYPASS;
> + newval |= val & BM_PLL_BYPASS;
> + writel(newval, pll->base);
> + }
> + }
>
> - return 0;
> + return ret;
> }
>
> static const struct clk_ops clk_pllv3_av_ops = {
>
* imx-drm: Add HDMI support
2013-10-21 9:13 ` Shawn Guo
@ 2013-10-21 9:34 ` Russell King - ARM Linux
2013-10-21 12:36 ` Shawn Guo
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-10-21 9:34 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Oct 21, 2013 at 05:13:03PM +0800, Shawn Guo wrote:
> On Sat, Oct 19, 2013 at 01:12:18PM +0100, Russell King - ARM Linux wrote:
> > The other issue is that the set_rate() functions don't wait for the PLL
> > to lock before returning, and don't set bypass mode. It looks like this
> > code relies on drivers unpreparing the clocks before calling clk_set_rate().
> > I don't see the clk API enforcing this in any way, so I'd suggest that
> > the set_rate() function also does the bypass -> set rate -> wait lock
> > -> clear bypass transition too. Maybe this should be applied to the
> > other PLLs too.
>
> Yes. We experienced the problem when upgrading FSL kernel tree to 3.10.
> The .set_rate() should wait for LOCK bit, but does not need to touch
> BYPASS.
However, the IPU is more stable with switching to bypass while it relocks
than not.
> > Here's a diff which updates the av PLL to do this:
>
> Some other PLLs need updating too. I will cook up patches to fix these
> problems.
Yes, I only changed the set_rate on the AV codecs as I haven't tested
that on the others.
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-10-21 9:34 ` Russell King - ARM Linux
@ 2013-10-21 12:36 ` Shawn Guo
2013-10-21 13:17 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Shawn Guo @ 2013-10-21 12:36 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Oct 21, 2013 at 10:34:48AM +0100, Russell King - ARM Linux wrote:
> However, the IPU is more stable with switching to bypass while it relocks
> than not.
As far as I know, the FSL kernel has always been doing the relock without
switching to bypass, and I haven't heard of unstable IPU issues from their
testing. Do you have a case that I can set up here to see the issue?
Shawn
> > > Here's a diff which updates the av PLL to do this:
> >
> > Some other PLLs need updating too. I will cook up patches to fix these
> > problems.
>
> Yes, I only changed the set_rate on the AV codecs as I haven't tested
> that on the others.
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-10-21 12:36 ` Shawn Guo
@ 2013-10-21 13:17 ` Russell King - ARM Linux
2013-10-30 8:34 ` Shawn Guo
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-10-21 13:17 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Oct 21, 2013 at 08:36:55PM +0800, Shawn Guo wrote:
> On Mon, Oct 21, 2013 at 10:34:48AM +0100, Russell King - ARM Linux wrote:
> > However, the IPU is more stable with switching to bypass while it relocks
> > than not.
>
> As far as I know, FSL kernel has always been doing relock without
> switching to bypass, and I haven't heard unstable IPU issue from their
> testing. Do you have a case that I can set up here to see the issue?
export DISPLAY=:0
while :; do
xrandr -s 1280x720 -r 50
sleep 5
xrandr -s 1280x720 -r 60
sleep 5
done
is one way. Another way is to switch between all possible resolutions
supported by the connected device, keeping each resolution for 30 seconds
at a time. Normally after one or two run-throughs, the IPU will be deader
than the parrot in Monty Python's parrot sketch, requiring a reboot to
get it working again.
As I've said previously, when it gets into this state, all the status
registers I can find report that all FIFOs in the IPU are sitting in
their empty state. The TMDS clock is correct, and the frame composer
in the HDMI block is working, but it's not producing any sync signals or
pixel data.
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-10-21 13:17 ` Russell King - ARM Linux
@ 2013-10-30 8:34 ` Shawn Guo
0 siblings, 0 replies; 18+ messages in thread
From: Shawn Guo @ 2013-10-30 8:34 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Oct 21, 2013 at 02:17:58PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 21, 2013 at 08:36:55PM +0800, Shawn Guo wrote:
> > On Mon, Oct 21, 2013 at 10:34:48AM +0100, Russell King - ARM Linux wrote:
> > > However, the IPU is more stable with switching to bypass while it relocks
> > > than not.
> >
> > As far as I know, FSL kernel has always been doing relock without
> > switching to bypass, and I haven't heard unstable IPU issue from their
> > testing. Do you have a case that I can set up here to see the issue?
>
> export DISPLAY=:0
> while :; do
> xrandr -s 1280x720 -r 50
> sleep 5
> xrandr -s 1280x720 -r 60
> sleep 5
> done
Sorry, it took me a long time to set up this test on xubuntu 13.10.
Building an image using your cubox-i branch, I can still see the above
test plays my HDMI device to death quite easily. That said, I'm not
sure if the BYPASS handling in .set_rate() does help the unstable issue
we're seeing.
BTW, the xubuntu session can crash quite easily here if I play the
desktop a little bit, dragging a window and moving it around, popping
up a dialog, or something.
Shawn
>
> is one way. Another way is to switch between all possible resolutions
> supported by the connected device, keeping each resolution for 30 seconds
> at a time. Normally after one or two run-throughs, the IPU will be deader
> than the parrot in Monty Python's parrot sketch, requiring a reboot to
> get it working again.
>
> As I've said previously, when it gets into this state, all the status
> registers I can find report that all FIFOs in the IPU are sitting in
> their empty state. The TMDS clock is correct, and the frame composer
> in the HDMI block is working, but it's not producing any sync signals or
> pixel data.
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-10-17 19:52 ` Alexander Shiyan
2013-10-17 20:35 ` Russell King - ARM Linux
@ 2013-10-30 18:01 ` Matt Sealey
2013-10-30 19:01 ` Russell King - ARM Linux
1 sibling, 1 reply; 18+ messages in thread
From: Matt Sealey @ 2013-10-30 18:01 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Oct 17, 2013 at 2:52 PM, Alexander Shiyan <shc_work@mail.ru> wrote:
>> > Little bit late to the party, but here is the patch for the latest
>> > version of the HDMI driver I was working on. Having had a quick flick
>> > through the mailing list I see Russell has noticed a few of the problems
>> > I had earlier on, which are resolved in my later work. Thought this
>> > might be useful to help you tidy things up.
> ...
>> Now, I notice that the FSL 4.1.0 stuff appears to try and decide
>> which of the two clock inputs to use for the DI module, something
>> which isn't done in the driver in staging. Also there's the commented
>> out stuff:
>>
>> #ifdef WTF_IS_THIS
>> /*
>> * Freescale has this in their Kernel. It is neither clear what
>> * it does nor why it does it
>> */
>> if (div & 0x10)
>> div &= ~0x7;
>> else {
>> /* Round up divider if it gets us closer to desired pix clk */
>> if ((div & 0xC) == 0xC) {
>> div += 0x10;
>> div &= ~0xF;
>> }
>> }
>> #endif
There's another twiddle it does in another function:
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_3.0.35_4.1.0#n76
>> Quite simply, the "div" value is as follows:
>>
>> bits 11:4 - integer divider
>> bits 3:0 - fractional part
>>
>> So, what this is saying is that for any _odd_ integer divider, we want
>> to either divide by the odd divider, or odd.5 - but no other fractional
>> values. Otherwise, if the divider is an even integer divider, and the
>> fractional part is greater than .75, we want to round that up. (It may
>> also be that the final div & ~0xf should be outside that second if
>> statement.)
>>
>> It could be that the fractional divider is broken and doesn't work
>> correctly with the values that this is trying to avoid.
It's flakey but it works.
I had the same conversations with Sascha about a year ago, and then
again about 9 months ago; "works for me" was the result.
Russell hit the nail on the head; this is the code Freescale's driver
does that the staging driver doesn't.
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_3.0.35_4.1.0#n1258
Essentially the fractional divider works, but not on the IPU HSC
clock, because it's not fast enough to generate the correct rate
without using the fractions. A 1/16th variance from the requested
frequency is WAY outside the VESA specification for monitors. Some older
monitors are even a tiny bit more strict on certain modes where you're
hitting the very limits of their capability (at or past 1080p). You
should be tending to provide the exact clock, except for extremely low
resolutions (PAL, NTSC, QVGA panels kind of low) where the rate is so
low usually the hardware is able to sort out the mess. I seem to
remember it needs to be within 2% - one bit extra on the fraction and
you're at three times that, outside the range of acceptable clocks
monitors MUST accept by specification.
The solution on i.MX51 is use PLL3 as a dedicated video PLL, but you
can't use this as the DI external clock for both DIs unless they share
a mode (or have a common denominator for the frequency so you can set
the dividers right and PLL3 can stay at an acceptable rate). There are
some clever hacks we could do around it, but few i.MX devices have
more than one active video output at once. Unfortunately it means
reparenting everything that defaults to PLL3, as the clock rate will
change on hotplugging HDMI - which could make USB stop working so well,
and a bunch of other stuff.
On i.MX51, too, the IPU HSC can't go above 133MHz without making the
system unstable, so any mode that needs 148MHz (most high HDMI modes)
won't work.
The fix was on i.MX50 and i.MX53 - they have a dedicated PLL4, put
there almost purely for video ;) and the IPU clock goes up to 150MHz.
And that continued with the i.MX6 (which can do 200MHz and has more
PLLs to parent from).
There's no good reason to use the DI internal clock from IPU HSC,
unless you can guarantee an EXACT rate coming out (i.e. if you need
64MHz, then you can get that with 128MHz/2.0 - which is fine.) *or*
you need to output to TVE or LDB which require that you use their
clocks as DI parent for synchronization (which is usually fine as it's
able to get the accuracy you need, or it just doesn't matter in these
cases). For any other case you *MUST* use the external clock or you
violate VESA specs.
--
Matt Sealey <neko@bakuhatsu.net>
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-10-30 18:01 ` Matt Sealey
@ 2013-10-30 19:01 ` Russell King - ARM Linux
2013-11-05 22:39 ` Matt Sealey
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-10-30 19:01 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Oct 30, 2013 at 01:01:15PM -0500, Matt Sealey wrote:
> I seem to
> remember it needs to be within 2% - 1 bit extra on the fraction and
> you're 3 times that over the range of acceptable clocks monitors MUST
> accept by specification.
I think 2% may be too generous - that's what I originally tried when
fiddling around with this and it wasn't good enough in some
circumstances for my TV to produce a picture.
> On i.MX51, too, the IPU HSC can't go above 133MHz without making the
> system unstable, so any mode that needs 148MHz (most high HDMI modes)
> won't work.
That's where validating the modes you're prepared to support becomes
important, and there's hooks in DRM for doing that. If you can't
support those with 148.5MHz dotclocks, then you have to return
MODE_CLOCK_HIGH from the connector's mode_valid callback. That will
result in DRM pruning the modes reported by the connector to those
which the display hardware can support.
> There's no good reason to use the DI internal clock from IPU HSC,
> unless you can guarantee an EXACT rate coming out (i.e. if you need
> 64MHz, then you can get that with 128MHz/2.0 - which is fine.)
This is exactly what I'm doing on my Carrier-1 board. If the output
can use the internal clock (iow, if IPU_DI_CLKMODE_EXT is clear)
and the IPU clock can generate the dotclock within 1% of the requested
rate, then we set the DI to use the IPU clock and set the divisor
appropriately.
If the IPU clock can't satisfy the dot clock within 1%, I try to set
the external DI clock to the required rate, read it back and calculate
the required DI integer divisor. Yes, I could allow some fractional
divisors here, but at the moment it's easier not to.
Each of the four cases I handle separately - and I've gotten rid of
the horrid CCF stuff in ipu-di.c: there's really no need for that
additional complication here, we're not sharing this mux and divisor
which are both internal to the DI with anything else.
Consequently, my ipu-di.c (relative to v3.12-rc7) looks like this,
and works pretty well with HDMI, providing a stable picture with
almost all mode configurations reported by my TV.
It's also fewer lines too. :)
drivers/staging/imx-drm/ipu-v3/ipu-di.c | 328 ++++++++++---------------------
1 files changed, 105 insertions(+), 223 deletions(-)
diff --git a/drivers/staging/imx-drm/ipu-v3/ipu-di.c b/drivers/staging/imx-drm/ipu-v3/ipu-di.c
index d766e18bfca0..82a9ebad697c 100644
--- a/drivers/staging/imx-drm/ipu-v3/ipu-di.c
+++ b/drivers/staging/imx-drm/ipu-v3/ipu-di.c
@@ -19,9 +19,6 @@
#include <linux/io.h>
#include <linux/err.h>
#include <linux/platform_device.h>
-#include <linux/clk.h>
-#include <linux/clk-provider.h>
-#include <linux/clkdev.h>
#include "imx-ipu-v3.h"
#include "ipu-prv.h"
@@ -33,10 +30,7 @@ struct ipu_di {
struct clk *clk_di; /* display input clock */
struct clk *clk_ipu; /* IPU bus clock */
struct clk *clk_di_pixel; /* resulting pixel clock */
- struct clk_hw clk_hw_out;
- char *clk_name;
bool inuse;
- unsigned long clkflags;
struct ipu_soc *ipu;
};
@@ -141,130 +135,6 @@ static inline void ipu_di_write(struct ipu_di *di, u32 value, unsigned offset)
writel(value, di->base + offset);
}
-static int ipu_di_clk_calc_div(unsigned long inrate, unsigned long outrate)
-{
- u64 tmp = inrate;
- int div;
-
- tmp *= 16;
-
- do_div(tmp, outrate);
-
- div = tmp;
-
- if (div < 0x10)
- div = 0x10;
-
-#ifdef WTF_IS_THIS
- /*
- * Freescale has this in their Kernel. It is neither clear what
- * it does nor why it does it
- */
- if (div & 0x10)
- div &= ~0x7;
- else {
- /* Round up divider if it gets us closer to desired pix clk */
- if ((div & 0xC) == 0xC) {
- div += 0x10;
- div &= ~0xF;
- }
- }
-#endif
- return div;
-}
-
-static unsigned long clk_di_recalc_rate(struct clk_hw *hw,
- unsigned long parent_rate)
-{
- struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
- unsigned long outrate;
- u32 div = ipu_di_read(di, DI_BS_CLKGEN0);
-
- if (div < 0x10)
- div = 0x10;
-
- outrate = (parent_rate / div) * 16;
-
- return outrate;
-}
-
-static long clk_di_round_rate(struct clk_hw *hw, unsigned long rate,
- unsigned long *prate)
-{
- struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
- unsigned long outrate;
- int div;
- u32 val;
-
- div = ipu_di_clk_calc_div(*prate, rate);
-
- outrate = (*prate / div) * 16;
-
- val = ipu_di_read(di, DI_GENERAL);
-
- if (!(val & DI_GEN_DI_CLK_EXT) && outrate > *prate / 2)
- outrate = *prate / 2;
-
- dev_dbg(di->ipu->dev,
- "%s: inrate: %ld div: 0x%08x outrate: %ld wanted: %ld\n",
- __func__, *prate, div, outrate, rate);
-
- return outrate;
-}
-
-static int clk_di_set_rate(struct clk_hw *hw, unsigned long rate,
- unsigned long parent_rate)
-{
- struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
- int div;
- u32 clkgen0;
-
- clkgen0 = ipu_di_read(di, DI_BS_CLKGEN0) & ~0xfff;
-
- div = ipu_di_clk_calc_div(parent_rate, rate);
-
- ipu_di_write(di, clkgen0 | div, DI_BS_CLKGEN0);
-
- dev_dbg(di->ipu->dev, "%s: inrate: %ld desired: %ld div: 0x%08x\n",
- __func__, parent_rate, rate, div);
- return 0;
-}
-
-static u8 clk_di_get_parent(struct clk_hw *hw)
-{
- struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
- u32 val;
-
- val = ipu_di_read(di, DI_GENERAL);
-
- return val & DI_GEN_DI_CLK_EXT ? 1 : 0;
-}
-
-static int clk_di_set_parent(struct clk_hw *hw, u8 index)
-{
- struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
- u32 val;
-
- val = ipu_di_read(di, DI_GENERAL);
-
- if (index)
- val |= DI_GEN_DI_CLK_EXT;
- else
- val &= ~DI_GEN_DI_CLK_EXT;
-
- ipu_di_write(di, val, DI_GENERAL);
-
- return 0;
-}
-
-static struct clk_ops clk_di_ops = {
- .round_rate = clk_di_round_rate,
- .set_rate = clk_di_set_rate,
- .recalc_rate = clk_di_recalc_rate,
- .set_parent = clk_di_set_parent,
- .get_parent = clk_di_get_parent,
-};
-
static void ipu_di_data_wave_config(struct ipu_di *di,
int wave_gen,
int access_size, int component_size)
@@ -528,42 +398,58 @@ static void ipu_di_sync_config_noninterlaced(struct ipu_di *di,
ipu_di_sync_config(di, cfg_vga, 0, ARRAY_SIZE(cfg_vga));
}
-int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
+static void ipu_di_config_clock(struct ipu_di *di,
+ const struct ipu_di_signal_cfg *sig)
{
- u32 reg;
- u32 di_gen, vsync_cnt;
- u32 div;
- u32 h_total, v_total;
- int ret;
- unsigned long round;
- struct clk *parent;
+ struct clk *clk;
+ unsigned clkgen0;
+ uint32_t val;
- dev_dbg(di->ipu->dev, "disp %d: panel size = %d x %d\n",
- di->id, sig->width, sig->height);
+ if (sig->clkflags & IPU_DI_CLKMODE_EXT) {
+ /*
+ * CLKMODE_EXT means we must use the DI clock: this is
+ * needed for things like LVDS which needs to feed the
+ * DI and LDB with the same pixel clock.
+ */
+ clk = di->clk_di;
+
+ if (sig->clkflags & IPU_DI_CLKMODE_SYNC) {
+ /*
+ * CLKMODE_SYNC means that we want the DI to be
+ * clocked at the same rate as the parent clock.
+ * This is needed (eg) for LDB which needs to be
+ * fed with the same pixel clock. We assume that
+ * the LDB clock has already been set correctly.
+ */
+ clkgen0 = 1 << 4;
+ } else {
+ /*
+ * We can use the divider. We should really have
+ * a flag here indicating whether the bridge can
+ * cope with a fractional divider or not. For the
+ * time being, let's go for simplicity and
+ * reliability.
+ */
+ unsigned long in_rate;
+ unsigned div;
- if ((sig->v_sync_width == 0) || (sig->h_sync_width == 0))
- return -EINVAL;
+ clk_set_rate(clk, sig->pixelclock);
- dev_dbg(di->ipu->dev, "Clocks: IPU %luHz DI %luHz Needed %luHz\n",
- clk_get_rate(di->clk_ipu),
- clk_get_rate(di->clk_di),
- sig->pixelclock);
+ in_rate = clk_get_rate(clk);
+ div = (in_rate + sig->pixelclock / 2) / sig->pixelclock;
+ if (div == 0)
+ div = 1;
- /*
- * CLKMODE_EXT means we must use the DI clock: this is needed
- * for things like LVDS which needs to feed the DI and LDB with
- * the same pixel clock.
- *
- * For other interfaces, we can arbitarily select between the DI
- * specific clock and the internal IPU clock. See DI_GENERAL
- * bit 20. We select the IPU clock if it can give us a clock
- * rate within 1% of the requested frequency, otherwise we use
- * the DI clock.
- */
- round = sig->pixelclock;
- if (sig->clkflags & IPU_DI_CLKMODE_EXT) {
- parent = di->clk_di;
+ clkgen0 = div << 4;
+ }
} else {
+ /*
+ * For other interfaces, we can arbitrarily select between
+ * the DI specific clock and the internal IPU clock. See
+ * DI_GENERAL bit 20. We select the IPU clock if it can
+ * give us a clock rate within 1% of the requested frequency,
+ * otherwise we use the DI clock.
+ */
unsigned long rate, clkrate;
unsigned div, error;
@@ -578,54 +464,80 @@ int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
/* Allow a 1% error */
if (error < 1010 && error >= 990) {
- parent = di->clk_ipu;
+ clk = di->clk_ipu;
+
+ clkgen0 = div << 4;
} else {
- parent = di->clk_di;
+ unsigned long in_rate;
+ unsigned div;
+
+ clk = di->clk_di;
- ret = clk_set_rate(parent, sig->pixelclock);
- if (ret)
- dev_err(di->ipu->dev, "Setting of DI clock failed: %d\n", ret);
+ clk_set_rate(clk, sig->pixelclock);
- /* Use the integer divisor rate - avoid fractional dividers */
- round = rate;
+ in_rate = clk_get_rate(clk);
+ div = (in_rate + sig->pixelclock / 2) / sig->pixelclock;
+ if (div == 0)
+ div = 1;
+
+ clkgen0 = div << 4;
}
}
- ret = clk_set_parent(di->clk_di_pixel, parent);
- if (ret) {
- dev_err(di->ipu->dev,
- "setting pixel clock to parent %s failed with %d\n",
- __clk_get_name(parent), ret);
- return ret;
- }
+ di->clk_di_pixel = clk;
+
+ /* Set the divider */
+ ipu_di_write(di, clkgen0, DI_BS_CLKGEN0);
/*
- * CLKMODE_SYNC means that we want the DI to be clocked at the
- * same rate as the parent clock. This is needed (eg) for LDB
- * which needs to be fed with the same pixel clock.
- *
- * Note: clk_set_rate(clk, clk_round_rate(clk, rate)) is the
- * same as clk_set_rate(clk, rate);
+ * Set the high/low periods. Bits 24:16 give us the falling edge,
+ * and bits 8:0 give the rising edge. LSB is fraction, and is
+ * based on the divider above. We want a 50% duty cycle, so set
+ * the falling edge to be half the divider.
*/
- if (sig->clkflags & IPU_DI_CLKMODE_SYNC)
- round = clk_get_rate(parent);
+ ipu_di_write(di, (clkgen0 >> 4) << 16, DI_BS_CLKGEN1);
- ret = clk_set_rate(di->clk_di_pixel, round);
+ /* Finally select the input clock */
+ val = ipu_di_read(di, DI_GENERAL) & ~DI_GEN_DI_CLK_EXT;
+ if (clk == di->clk_di)
+ val |= DI_GEN_DI_CLK_EXT;
+ ipu_di_write(di, val, DI_GENERAL);
- dev_dbg(di->ipu->dev, "Want %luHz IPU %luHz DI %luHz using %s, got %luHz\n",
+ dev_dbg(di->ipu->dev, "Want %luHz IPU %luHz DI %luHz using %s, %luHz\n",
sig->pixelclock,
clk_get_rate(di->clk_ipu),
clk_get_rate(di->clk_di),
- parent == di->clk_di ? "DI" : "IPU",
- clk_get_rate(di->clk_di_pixel));
+ clk == di->clk_di ? "DI" : "IPU",
+ clk_get_rate(di->clk_di_pixel) / (clkgen0 >> 4));
+}
+
+int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
+{
+ u32 reg;
+ u32 di_gen, vsync_cnt;
+ u32 div;
+ u32 h_total, v_total;
+
+ dev_dbg(di->ipu->dev, "disp %d: panel size = %d x %d\n",
+ di->id, sig->width, sig->height);
+
+ if ((sig->v_sync_width == 0) || (sig->h_sync_width == 0))
+ return -EINVAL;
h_total = sig->width + sig->h_sync_width + sig->h_start_width +
sig->h_end_width;
v_total = sig->height + sig->v_sync_width + sig->v_start_width +
sig->v_end_width;
+ dev_dbg(di->ipu->dev, "Clocks: IPU %luHz DI %luHz Needed %luHz\n",
+ clk_get_rate(di->clk_ipu),
+ clk_get_rate(di->clk_di),
+ sig->pixelclock);
+
mutex_lock(&di_mutex);
+ ipu_di_config_clock(di, sig);
+
div = ipu_di_read(di, DI_BS_CLKGEN0) & 0xfff;
div = div / 16; /* Now divider is integer portion */
@@ -709,7 +621,11 @@ EXPORT_SYMBOL_GPL(ipu_di_init_sync_panel);
int ipu_di_enable(struct ipu_di *di)
{
- int ret = clk_prepare_enable(di->clk_di_pixel);
+ int ret;
+
+ WARN_ON(IS_ERR(di->clk_di_pixel));
+
+ ret = clk_prepare_enable(di->clk_di_pixel);
if (ret)
return ret;
@@ -721,6 +637,8 @@ EXPORT_SYMBOL_GPL(ipu_di_enable);
int ipu_di_disable(struct ipu_di *di)
{
+ WARN_ON(IS_ERR(di->clk_di_pixel));
+
ipu_module_disable(di->ipu, di->module);
clk_disable_unprepare(di->clk_di_pixel);
@@ -776,13 +694,6 @@ int ipu_di_init(struct ipu_soc *ipu, struct device *dev, int id,
u32 module, struct clk *clk_ipu)
{
struct ipu_di *di;
- int ret;
- const char *di_parent[2];
- struct clk_init_data init = {
- .ops = &clk_di_ops,
- .num_parents = 2,
- .flags = 0,
- };
if (id > 1)
return -ENODEV;
@@ -804,45 +715,16 @@ int ipu_di_init(struct ipu_soc *ipu, struct device *dev, int id,
if (!di->base)
return -ENOMEM;
- di_parent[0] = __clk_get_name(di->clk_ipu);
- di_parent[1] = __clk_get_name(di->clk_di);
-
ipu_di_write(di, 0x10, DI_BS_CLKGEN0);
- init.parent_names = (const char **)&di_parent;
- di->clk_name = kasprintf(GFP_KERNEL, "%s_di%d_pixel",
- dev_name(dev), id);
- if (!di->clk_name)
- return -ENOMEM;
-
- init.name = di->clk_name;
-
- di->clk_hw_out.init = &init;
- di->clk_di_pixel = clk_register(dev, &di->clk_hw_out);
-
- if (IS_ERR(di->clk_di_pixel)) {
- ret = PTR_ERR(di->clk_di_pixel);
- goto failed_clk_register;
- }
-
dev_dbg(dev, "DI%d base: 0x%08lx remapped to %p\n",
id, base, di->base);
di->inuse = false;
di->ipu = ipu;
return 0;
-
-failed_clk_register:
-
- kfree(di->clk_name);
-
- return ret;
}
void ipu_di_exit(struct ipu_soc *ipu, int id)
{
- struct ipu_di *di = ipu->di_priv[id];
-
- clk_unregister(di->clk_di_pixel);
- kfree(di->clk_name);
}
^ permalink raw reply related [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-10-30 19:01 ` Russell King - ARM Linux
@ 2013-11-05 22:39 ` Matt Sealey
2013-11-07 17:29 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Matt Sealey @ 2013-11-05 22:39 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Oct 30, 2013 at 01:01:15PM -0500, Matt Sealey wrote:
>> I seem to
>> remember it needs to be within 2% - 1 bit extra on the fraction and
>> you're 3 times that over the range of acceptable clocks monitors MUST
>> accept by specification.
>
> I think 2% may be too generous - that's what I originally tried when
> fiddling around with this and it wasn't good enough in some
> circumstances for my TV to produce a picture.
One day I *will* find the VESA spec where I read that. It is mentioned
somewhere in the HDMI spec as a reference, but the VESA doc isn't
public. There was a time I had access to it - so I have a copy. Will
figure it out...
>> On i.MX51, too, the IPU HSC can't go above 133MHz without making the
>> system unstable, so any mode that needs 148MHz (most high HDMI modes)
>> won't work.
>
> That's where validating the modes you're prepared to support becomes
> important, and there's hooks in DRM for doing that.
Nope! Not in the right place.
> If you can't support those with 148.5MHz dotclocks, then you have to return
> MODE_CLOCK_HIGH from the connector's mode_valid callback.
Here's the problem; connectors and encoders are initialized under DRM
before the display driver. For the connector to even KNOW what the
highest valid clock is, it would need to ask the lower level display
driver, since the connector and even encoder has no clue of this
limitation and shouldn't be hardcoded, or even know what the input
pixel clock capabilities are.
If you specify a mode that the SII9022 can't display at all, or the
monitor connected (via EDID information) says it can't support that
from the maximum TMDS clock field in the EDID, sure, it can send back
MODE_CLOCK_HIGH, but otherwise what needs to happen is the sii9022
driver needs to know FAR too much.
> result in DRM pruning the modes reported by the connector to those
> which the display hardware can support.
Except that the connector cannot know what the display hardware can
support, as above.
I already pointed the above out on the DRM list maybe 18 months ago,
it's a big flaw in the DRM design, and the only way I can figure
around it is to have the connector driver look for its display
controller in the device tree and read out the clocks from there, or
just a property (max-tmds-clock?) which would hack around it,
overriding a higher value from EDID.
http://thread.gmane.org/gmane.comp.video.dri.devel/65394
>> There's no good reason to use the DI internal clock from IPU HSC,
>> unless you can guarantee an EXACT rate coming out (i.e. if you need
>> 64MHz, then you can get that with 128MHz/2.0 - which is fine.)
>
> This is exactly what I'm doing on my Carrier-1 board. If the output
> can use the internal clock (iow, if IPU_DI_CLKMODE_EXT is clear)
> and the IPU clock can generate the dotclock within 1% of the requested
> rate, then we set the DI to use the IPU clock and set the divisor
> appropriately.
In theory there are two things. One is that old VESA spec which says
"we expect some crappy clocks, so anything within a certain range will
work" and the CEA spec which says any mode which is specified as X MHz
as the clock rate would also be required to be displayed at
X*(1000/1001) (which covers NTSC weirdness). So in fact it may only be
0.01% variance that really matters - 1% could be too high. It depends
if they're following the CEA spec to the letter or the VESA specs to
the letter.
One thing that always got me, and I know this is an aside, is how many
"HDMI" systems people have where they're setting the default video
mode to 1024x768. You'd think nobody ever read the CEA-861 spec, ever
- it specifically states that the ONLY mode guaranteed is 640x480. In
fact, 1024x768 is not in the list for either primary OR secondary
defined CEA modes, so there's a high chance your TV won't do it.
Monitors - sure, they may do it, they may do it because they have to
for Windows Logo certification, but TVs don't get certified by
Microsoft.
> Each of the four cases I handle separately - and I've gotten rid of
> the horrid CCF stuff in ipu-di.c: there's really no need for that
> additional complication here, we're not sharing this mux and divisor
> which are both internal to the DI with anything else.
>
> Consequently, my ipu-di.c (relative to v3.12-rc7) looks like this,
> and works pretty well with HDMI, providing a stable picture with
> almost all mode configurations reported by my TV.
>
> It's also fewer lines too. :)
True, and it looks a lot like the Freescale drivers now, but I do
think creating a clockdev clock is probably the "correct" way to do
it, especially since in debug situations you'd be able to see this -
and its value, and its current parent - in sysfs along with the
other clocks.
It might also help to encourage people not to just play with internal
clocks, but to create clock devices and use them inside their drivers
- tons of audio stuff has manual clock management right now, and all
this information gets lost through (IMO) needless abstraction in the
audio subsystem. I might be convinced that it's not needless, but I
would still say there's no good reason NOT to implement any clock
which has a parent which is a registered clkdev clock as a clkdev
clock itself, and expose it with the rest of the clock debug
infrastructure.
I don't think I like that fractional dividers aren't used - at least
until someone can come up with a great reason why not (though I think
odd fractional dividers are not stable, so just masking off the bottom
bit of the fraction may be the key) and someone tests it on more than
one monitor...
I can break out my monitor testing regime, once I figure out the
i.MX51 stuff. I have a few HDMI monitors I bought over the years which
were notoriously flakey, and a few DVI ones that had real trouble
being connected to a real HDMI source at some point.
Thanks,
Matt Sealey <neko@bakuhatsu.net>
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-11-05 22:39 ` Matt Sealey
@ 2013-11-07 17:29 ` Russell King - ARM Linux
2013-11-09 0:23 ` Matt Sealey
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-11-07 17:29 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Nov 05, 2013 at 04:39:40PM -0600, Matt Sealey wrote:
> On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > That's where validating the modes you're prepared to support becomes
> > important, and there's hooks in DRM for doing that.
>
> Nope! Not in the right place.
Sure it's the right place. It's the connector which gets to read the
modes, and the connector is part of the DRM system.
> > If you can't support those with 148.5MHz dotclocks, then you have to return
> > MODE_CLOCK_HIGH from the connector's mode_valid callback.
>
> Here's the problem; connectors and encoders are initialized under DRM
> before the display driver. For the connector to even KNOW what the
> highest valid clock is, it would need to ask the lower level display
> driver, since the connector and even encoder has no clue of this
> limitation and shouldn't be hardcoded, or even know what the input
> pixel clock capabilities are.
That's because the way imx-drm is set up with its multitude of individual
drivers is utter madness; there's no way that it will ever move out of
drivers/staging as long as it messes around violating the internals of
DRM by playing those kinds of games. This was discussed at some
considerable length at kernel summit, with me as the main motivator for
it - including some private discussions with various people involved.
The DRM model is a card model, just like ALSA. In this model, the
subsystem gets initialised once, and at that point all components of
the "card" are expected to be known. The reverse is true when tearing
down a card: the whole card goes away in one go, individual components
do not.
What this means for DRM is that all CRTCs, connectors and encoders are
registered with DRM entirely within the card-level ->load callback,
and all of those are torn down at the ->unload callback. Between those
two points in time, nothing in the configuration of the "card" ever
changes.
The same is true of ALSA: ALSA too does not support hotplugging
individual components. This is why we have ASoC - ASoC handles the
grouping up of individual components, works out when all components
are available, and only then does it register the "card" with ALSA.
When any of those components go away, ASoC pulls the whole card from
ALSA.
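The register-when-complete, tear-down-as-a-whole behaviour described above can be sketched in a toy standalone model - struct card and the function names here are illustrative stand-ins, not ASoC's actual API:

```c
#include <assert.h>

/* Toy model of the card behaviour described above: the card is only
 * registered once every expected component has turned up, and removing
 * any component pulls the whole card. */
struct card {
	int expected;	/* components the board description names */
	int present;	/* components that have probed so far */
	int registered;	/* handed to the subsystem yet? */
};

static int card_add_component(struct card *c)
{
	if (++c->present == c->expected)
		c->registered = 1;	/* complete: register the card */
	return c->registered;
}

static void card_del_component(struct card *c)
{
	c->present--;
	c->registered = 0;	/* any loss tears down the whole card */
}
```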
As I say, this was discussed at kernel summit. One thing was made
abundantly clear: DRM is _not_ going to support hotplugging individual
components of a card. Moreover, what has been made perfectly clear
is that imx-drm is not going to move out of drivers/staging as long
as it abuses DRM's card model.
> I already pointed the above out on the DRM list maybe 18 months ago,
> it's a big flaw in the DRM design, and the only way I can figure
> around it is to have the connector driver look for it's display
> controller in the device tree and read out the clocks from there, or
> just a property (max-tmds-clock?) which would hack around it,
> overriding a higher value from EDID.
No, not at all. If you build up the set of components first, before
involving DRM, then you have the full information. I'm doing that
today: the DRM "card" can only finish initialisation when all known
connectors can find their CRTCs within the DRM card. If any CRTC
(== IPU DIs) specified in the DT file for a "connector" (eg, HDMI
interface, lvds bridge, etc) are not present, the DRM card doesn't
initialise.
Once you have that in place, you have a way to find out the properties
of the CRTC, and you then have a way to validate the modes properly
against all parts of the display system.
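Put concretely, the connector-level filtering boils down to something like this minimal standalone sketch of a ->mode_valid style check - MAX_PIXCLK_KHZ and the function name are illustrative stand-ins, not the actual driver's, with 148500 kHz matching the 148.5MHz figure from the mail:

```c
#include <assert.h>

/* Minimal sketch of connector mode filtering as described above. */
enum mode_status { MODE_OK, MODE_CLOCK_HIGH };

#define MAX_PIXCLK_KHZ 148500	/* stand-in for the CRTC's real limit */

static enum mode_status hdmi_mode_valid(int clock_khz)
{
	/* Modes rejected here are never reported to userspace, so the
	 * user only ever sees settable resolutions. */
	if (clock_khz > MAX_PIXCLK_KHZ)
		return MODE_CLOCK_HIGH;
	return MODE_OK;
}
```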
> True, and it looks a lot like the Freescale drivers now,
Not at all. The Freescale drivers use the clk API as this did, and just
as badly. I'm afraid that by making that comment, you've just shown that
you much prefer writing long verbose emails rather than actually reading
and understanding the subject to which you're responding.
I refer you to this:
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_3.0.35_4.1.0#n155
which clearly shows Freescale's 4.1.0 BSP using the clk API to control
the DI clock mux.
When you look at the mess that using the clk API creates in the imx-drm
driver in staging:
static int clk_di_set_rate(struct clk_hw *hw, unsigned long rate,
unsigned long parent_rate)
{
struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
int div;
u32 clkgen0;
clkgen0 = ipu_di_read(di, DI_BS_CLKGEN0) & ~0xfff;
div = ipu_di_clk_calc_div(parent_rate, rate);
ipu_di_write(di, clkgen0 | div, DI_BS_CLKGEN0);
return 0;
}
int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
{
..
ret = clk_set_rate(di->clk_di_pixel, round);
...
div = ipu_di_read(di, DI_BS_CLKGEN0) & 0xfff;
div = div / 16; /* Now divider is integer portion */
/* Setup pixel clock timing */
/* Down time is half of period */
ipu_di_write(di, (div << 16), DI_BS_CLKGEN1);
So, we call clk_set_rate(), that calls down to clk_di_set_rate() which
sets up the DI_BS_CLKGEN0 register. We then have to read that register
later in ipu_di_init_sync_panel() to get the divider to program the
duty cycle which we want generated. Now, compare that to this, where
we choose which clock locally according to some rules set down by the
requirements of the attached display bridge, calculate the divider
required, and then:
/* Set the divider */
ipu_di_write(di, clkgen0, DI_BS_CLKGEN0);
/*
* Set the high/low periods. Bits 24:16 give us the falling edge,
* and bits 8:0 give the rising edge. LSB is fraction, and is
* based on the divider above. We want a 50% duty cycle, so set
* the falling edge to be half the divider.
*/
ipu_di_write(di, (clkgen0 >> 4) << 16, DI_BS_CLKGEN1);
That's quite a bit simpler, and doesn't violate abstraction levels at
all. And getting rid of this clk API crap where it's not required (a)
ends up with a better driver which actually _works_, and (b) results
in fewer lines of code:
drivers/staging/imx-drm/ipu-v3/ipu-di.c | 328 ++++++++++---------------------
1 files changed, 105 insertions(+), 223 deletions(-)
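For reference, the divider math behind those two register writes can be sketched standalone - assuming, per the discussion, a 4.4 fixed-point divider in DI_BS_CLKGEN0 bits 11:0 and a falling-edge field at DI_BS_CLKGEN1 bits 24:16 whose LSB weighs one half; the function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t di_clkgen0(unsigned long parent, unsigned long rate)
{
	/* divider = parent/rate in 4.4 fixed point, rounded to nearest */
	return (uint32_t)(((uint64_t)parent * 16 + rate / 2) / rate);
}

static uint32_t di_clkgen1(uint32_t clkgen0)
{
	/* falling edge at half the divider: drop the fraction and put
	 * the integer part at bit 16, which itself weighs 0.5 */
	return (clkgen0 >> 4) << 16;
}
```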
> but I do
> think creating a clockdev clock is probably the "correct" way to do
> it, especially since in debug situations you'd be able to see this -
> and it's value, and it's current parent - in sysfs along with the
> other clocks.
The clk API just gets in the way of making the correct decisions.
Really, it does. We need to calculate the divider to work out what
is the right clock to use, or whether we can use a clock. Having
that hidden behind the clk API does not give us the ability to make
that decision.
> It might also help to encourage people not to just play with internal
> clocks, but to create clock devices and use them inside their drivers
Drivers playing with their own _internal_ clocks is entirely reasonable,
and it's entirely reasonable that they need not be exposed to userspace.
Take, for instance, your average PC: it has lots of clocks, especially
on the video card, none of them exposed. They've gotten away with that
for decades. Why are we any different?
"Oh, we're embedded, we're special!" I hear you cry. No we're not
special at all.
> I don't think I like that fractional dividers aren't used; until
> someone can come up with a great reason why not (except, I think odd
> fractions dividers are not stable, so it could be just masking off the
> bottom bit of the fraction is the key) and someone tests it on more
> than one monitor..
Why fractional dividers are bad news - this is the TMDS clock that
resulted from the DI being configured with a fractional divider:
http://www.home.arm.linux.org.uk/~rmk/cubox/IMG_1545-adj.JPG
No, that's not blur. That's the result of a fractional divider before
the HDMI phy PLL. The result is that the PLL is frequency modulated,
and the connected TV basically tells the IMX to "piss off".
It's not "odd fractions" that are a problem - I believe Freescale's
code is correct, the problem is that no one has _thought_ about what
this is doing:
#ifdef WTF_IS_THIS
/*
* Freescale has this in their Kernel. It is neither clear what
* it does nor why it does it
*/
if (div & 0x10)
div &= ~0x7;
else {
/* Round up divider if it gets us closer to desired pix clk */
if ((div & 0xC) == 0xC) {
div += 0x10;
div &= ~0xF;
}
}
#endif
I know what that's doing, it's not rocket science. But if you have to
resort to using fractional dividers when you have another clock available
to generate a precise clock without the inherent instability caused by
fractional dividers, you might as well make use of it.
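For the record, here is that Freescale rounding reproduced standalone, so its effect on a 4.4 fixed-point divider can be checked directly - the function name is mine, the body is the quoted logic:

```c
#include <assert.h>
#include <stdint.h>

/* The "WTF_IS_THIS" rounding quoted above: an odd integer part keeps
 * at most a 0.5 fraction; an even integer part with fraction >= 0.75
 * gets bumped to the next (odd) integer divider with no fraction. */
static uint32_t fsl_round_div(uint32_t div)
{
	if (div & 0x10) {
		/* odd integer part: keep at most the 0.5 fraction bit */
		div &= ~0x7;
	} else if ((div & 0xC) == 0xC) {
		/* even integer part, fraction >= 0.75: round up to the
		 * next, odd-integer, unfractioned divider */
		div += 0x10;
		div &= ~0xF;
	}
	return div;
}
```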
Moreover, I'm not the only one to run into _real_ problems with fractional
dividers - other people have too with the iMX stuff, and have also had to
disable these fractional dividers too.
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-11-07 17:29 ` Russell King - ARM Linux
@ 2013-11-09 0:23 ` Matt Sealey
2013-11-09 0:53 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Matt Sealey @ 2013-11-09 0:23 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 7, 2013 at 11:29 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Nov 05, 2013 at 04:39:40PM -0600, Matt Sealey wrote:
>> On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > That's where validating the modes you're prepared to support becomes
>> > important, and there's hooks in DRM for doing that.
>>
>> Nope! Not in the right place.
>
> Sure it's the right place. It's the connector which gets to read the
> modes, and the connector is part of the DRM system.
It's really not the right place.
>> > If you can't support those with 148.5MHz dotclocks, then you have to return
>> > MODE_CLOCK_HIGH from the connector's mode_valid callback.
>>
>> Here's the problem; connectors and encoders are initialized under DRM
>> before the display driver. For the connector to even KNOW what the
>> highest valid clock is, it would need to ask the lower level display
>> driver, since the connector and even encoder has no clue of this
>> limitation and shouldn't be hardcoded, or even know what the input
>> pixel clock capabilities are.
>
> That's because the way imx-drm is setup with its multitude of individual
> drivers is utter madness;
This is how all DRM cards are written, even the good ones. For the
connector to be able to ask the crtc "is this mode even supportable?"
it would need a mode_valid callback to call, down to the encoder (and
maybe the encoder would call crtc) and down to the crtc. No such
callback exists for this exact reason. At the point the connector
pulls EDID modes and uses mode_valid, it's entirely possible and
nearly always true that it doesn't even HAVE an encoder or CRTC.
> there's no way that it will ever move out of
> drivers/staging as long as it messes around violating the internals of
> DRM by playing those kinds of games. This was discussed at some
> considerable length at kernel summit, with me as the main motivator for
> it - including some private discussions with various people involved.
What it lacks, in this case, is not any functional code in those
components, but a workable model to glue those components together in
the correct order, on a per-board basis.
The appropriate card model isn't implemented - except in crazy
functions in imx-drm-core.c - the rest of it is absolutely fine.
What that doesn't fix is the underlying, inescapable fact above. At
the time the encoder or connector actually goes off and pulls EDID,
parses modes and does its thing, at least until about 6 months ago,
it was long, long before it was attached to a CRTC.
The ONLY thing that can happen is a call to the crtc mode_fixup()
which exists only to say "go fuck yourself" to a mode_set call.
Nothing ever culls these mode lists after the connector generates
them, because it owns that object and ditching items from it is
a semi-dangerous layering violation. So here is the user experience with
the current model:
* User clicks a button in GNOME to set the fanciest resolution they
can find, which is listed in their dialog
* User is told "that mode can't be set, sorry"
* User waits for you to go to sleep and then suffocates you with a pillow
And what it should be is:
* User clicks a button in GNOME to set the fanciest resolution
actually supported with this combination, because the drivers knew
enough to pre-filter invalid modes
* User always gets a valid mode set because it only ever reports valid modes
That second, consumer-oriented, model where the usable mode list is
predicated on results from the *entire* card and not just what the
monitor said, simply wasn't - and I assume still isn't - possible. Why
not? Because when a Radeon is plugged into a monitor it bloody well
supports it, and that's the end of it. People don't make displays that
modern graphics cards can't use. By the time 1080p TVs in common
circulation rolled around for consumer devices, WQXGA monitors already
existed, so desktop PC graphics cards followed suit pretty quickly.
However, some embedded devices have restrictions. I have a couple
devices at home that have a maximum resolution of 1280x720 - because
the SoC doesn't provide anything that can do more than a 75MHz pixel
clock or so. So, that sucks, but it's a real limitation of the SoC
that is essentially insurmountable.
In the case of the i.MX51, it was just never designed to be a
1080p-capable device. However, there is NOTHING in the architecture of
the chip except the maximum clock and some bus bandwidth foibles that
says it's impossible. I can run 1080p at 30 and it operates great in 2D
and even does respectable 3D. The video decoders still work - if you
have a low enough bitrate/complexity movie, it *will* decode 1080p at
30fps. So artificially restricting the displays to "720p maximum" is
overly restrictive to customers, considering that 1366x768 and 1440x900
are well within the performance of the units. 1600x900, 1920x1080 (low
enough refresh rate) are still usable.
The problem is how this works under DRM:
> The DRM model is a card model, just like ALSA. In this model, the
> subsystem gets initialised once, and at that point all components of
> the "card" are expected to be known. The reverse is true when tearing
> down a card: the whole card goes away in one go, individual components
> do not.
>
> What this means for DRM is that all CRTCs, connectors and encoders are
> registered with DRM entirely within the card-level ->load callback,
> and all of those are torn down at the ->unload callback. Between those
> two points in time, nothing in the configuration of the "card" ever
> changes.
Right, but there is no card driver, and all the current "embedded" DRM
drivers get around this by making some crass assumptions about the
connections - most of them just have an LCD controller and HDMI
controller built-in so they haven't trampled over the case where the
device tree allowed an i2c device to probe and initialize and somehow
needs discovery to glue into the framework as an encoder. The "encoder
slave" system right now is totally futzed; while it would be a GREAT
place to implement every encoder-connector combination that card
drivers need to pick up, it doesn't have the right functionality.
The reason there's no card driver.. discussed below..
> The same is true of ALSA: ALSA too does not support hotplugging
> individual components. This is why we have ASoC - ASoC handles the
> grouping up of individual components, works out when all components
> are available, and only then does it register the "card" with ALSA.
> When any of those components go away, ASoC pulls the whole card from
> ALSA.
There's a distinct lack of actual procedure and process standardized
for collecting card data from device tree, and in the case of a single
SoC, the current ALSA/DRM model will be one driver for every board,
and every board needs its own compatible property, and every board
does *the exact same thing*. You can see this with the current
imx{51|53|6x}-board-{codec} driver model. Every single driver is doing
almost identical stuff, where they share codecs they are doing
ABSOLUTELY identical stuff. There is stuff in the card level driver
that doesn't even do anything at the card level (and the card level
always calls down for this functionality to the codec level).
If device trees were adopted with the intent to allow removal of the
"board setup code written in C and glommed in a directory with the SoC
support code", implementing "card support" for every board is just
moving that crappy platform code from one directory to another with
absolutely zero benefit or flexibility or ability to expose a common
configuration.
What happened with ALSA is the same shit we're going to get with DRM.
> No, not at all. If you build up the set of components first, before
> involving DRM, then you have the full information. I'm doing that
> today: the DRM "card" can only finish initialisation when all known
> connectors can find their CRTCs within the DRM card. If any CRTC
> (== IPU DIs) specified in the DT file for a "connector" (eg, HDMI
> interface, lvds bridge, etc) are not present, the DRM card doesn't
> initialise.
And this is what's missing. So, here's the question: how do you
implement a card-level driver model that marshalls this information
together and submits it to the separate components without, in the
process, creating a whole new driver framework? The need for having
the Connector ask the CRTC what a valid mode is doesn't make sense
here, in that DRM doesn't have such a framework to do such a thing
(reasons above). In the i.MX51 case, the maximum display clock is
*absolutely not* a function of the connector, but it must know this
information in the same way that it would also need to know the
maximum TMDS output clock for the encoder (bzzt - sorry, sir, it
doesn't) so that it can validate the modes, and in the way it knows to
gather the maximum supported TMDS clock from the monitor EDID (if
present).
> Once you have that in place, you have a way to find out the properties
> of the CRTC, and you then have a way to validate the modes properly
> against all parts of the display system.
Right, and your card driver "isn't DRM" in the same way as DRM has a
card model - it operates on the level of a PCIe device driver that
probes and magically sets up *all* its subordinate buses, encoders,
connectors.
You can do that with i.MX6 HDMI because the HDMI is built into the SoC
- if you added a node for it and added a cross-reference to it, then
it's fine.
What happens in the case where you defined an HDMI transmitter as an
external I2C device - it's connected on this bus, Linux already probed
the thing. Your card model would have to go out and find this i2c
device somehow, and there's not a great way of doing it. What if the
EDID can't be obtained by straight I2C and you have to set some weird
bits inside the transmitter to access the DDC bus? That requires an
API that doesn't exist, and any topology-finding drivers that are
built off the single use case of "we just attach a panel direct to the
SoC by some parallel or differential link" rather than the
intermediate-transmitters case are going to miss the finer points
of the intermediate-transmitters case.
>> True, and it looks a lot like the Freescale drivers now,
>
> Not at all. The Freescale drivers use the clk API as this did, and just
> as badly. I'm afraid that by making that comment, you've just shown that
> you much prefer writing long verbose emails rather than actually reading
> and understanding the subject for which you're responding.
>
> I refer you to this:
>
> http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_3.0.35_4.1.0#n155
>
> which clearly shows Freescale's 4.1.0 BSP using the clk API to control
> the DI clock mux.
I understand THAT driver like the back of my hand..
>> think creating a clockdev clock is probably the "correct" way to do
>> it, especially since in debug situations you'd be able to see this -
>> and it's value, and it's current parent - in sysfs along with the
>> other clocks.
>
> The clk API just gets in the way of making the correct decisions.
> Really, it does. We need to calculate the divider to work out what
> is the right clock to use, or whether we can use a clock. Having
> that hidden behind the clk API does not give us the ability to make
> that decision.
It shouldn't ever have been hidden anyway, someone completely
over-abstracted the clock API at some point by making it a
query/get/set interface for a generic clock object, where the only
reasonable values are the rate and, in a limited fashion, the parent.
Why not add clk_get_divider() which essentially returns 0 for anything
that's not actually a real divider clock?
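A minimal sketch of what such a (purely hypothetical) clk_get_divider() could look like - struct clk here is a stand-in, not the real kernel structure:

```c
#include <assert.h>

/* Hypothetical clk_get_divider(): report the divider for clocks that
 * really are dividers, 0 for everything else. */
struct clk {
	int is_divider;
	unsigned int div;
};

static unsigned int clk_get_divider(const struct clk *clk)
{
	return clk->is_divider ? clk->div : 0;
}
```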
>> It might also help to encourage people not to just play with internal
>> clocks, but to create clock devices and use them inside their drivers
>
> Drivers playing with their own _internal_ clocks is entirely reasonable,
> and it's entirely reasonable that they need not be exposed to userspace.
> Take for instance your average PC, it has lots of clocks, especially
> on the video card, none of them exposed. They've gotten away with that
> for decades. Why are we any different.
>
> "Oh, we're embedded, we're special!" I hear you cry. No we're not
> special at all.
No, we're not special in that there's no functional difference, but
the real situation is that as a developer it's REALLY difficult to get
the information you want from some internal clock somewhere compared
to a core SoC clock inside the clkdev framework, and every internal
clock needs its own magic, custom implementation - which just makes
maintaining it more work since 'everyone' then has to learn not only
how to derive operation from a generic clkdev device, but how the
actual implementation of the internal clock works as well underneath
it.
Which is what clkdev was supposed to get rid of, right? What it does
is make simple cases much easier.
I will agree with you that in this case, this is totally screwy, but I
don't want to see it discourage people from still doing it... that
said........
>> I don't think I like that fractional dividers aren't used; until
>> someone can come up with a great reason why not (except, I think odd
>> fractions dividers are not stable, so it could be just masking off the
>> bottom bit of the fraction is the key) and someone tests it on more
>> than one monitor..
>
> Why fractional dividers are bad news - this is the TMDS clock that
> resulted from the DI being configured with a fractional divider:
>
> http://www.home.arm.linux.org.uk/~rmk/cubox/IMG_1545-adj.JPG
>
> No, that's not blur. That's the result of a fractional divider before
> the HDMI phy PLL. The result is that the PLL is frequency modulated,
> and the connected TV basically tells the IMX to "piss off".
Understood.
> It's not "odd fractions" that are a problem
Maybe I should have made myself a bit clearer.. the further the
fractional part gets from 0.0 or 0.5, the less it works, and above
0.5 it is less reliable regardless.
There are some odd (as in oddball, not the opposite to even) fractions
that don't make any sense and I found I got more monitors to work if I
just masked off the bottom bit of the divider or bottom two, but that
broke a couple monitors more than it fixed.
The effect for me was that if I quit using the fractional divider
completely, it would in many cases get caught by the "clock within a
reasonable distance of the requested rate" check, and then we ignore
it and use an external clock. The amount of testing someone can do
with only 11 monitors, limited spare units, and not a lot of time to
spend is limited. Which is what you propose later.. who gives a
shit if we can't use fractional rates? Just set the clock right!
There's a funky reason why dropping the divider actually works, or why
masking off certain parts of it for certain modes probably works..
> - I believe Freescale's code is correct
I disagree with that and I'll explain why :)
> the problem is that no one has _thought_ about what this is doing:
I disagree with that since I was thinking about it in 2009.. I just
can't figure out EXACTLY what the real fix is, I never had enough
hardware, time, or patience.
I totally agree with the assessment Alexander had, and I trust your
scopes. A screwy clock gets generated. I don't think it's because of
the "fractional part" though.
They've been doing odd tricks, since the earliest BSP I could find,
and I started working on it at this BSP version;
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_2.6.28#n865
Which was modified at this commit, going further back:
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/commit/drivers/mxc/ipu3/ipu_disp.c?h=imx_2.6.28&id=1c930f0b2f1a67c7eb46a045c7517376e8ecaa32
I only fiddled with this thing for the shortest time before moving to
2.6.31 as fast as possible; the code reappears in later BSPs just
ported in with no git history... but here, note that set_rate sets
DI_BS_CLKGEN1, too. There's not really a good reason why it WOULDN'T.
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_common.c?h=imx_2.6.31_10.08.01#n151
And we get to more modern times, and DI_BS_CLKGEN1 is not in the clk API code.
The rounding essentially says you can only have a fraction of 1/2 if
your clock divider integer part is odd. If your clock divider integer
part is even, and the fractional part has a particular set of values
(0.75, 0.8125, 0.875, 0.9375), it'll bump it to the next
(odd-integer-part), unfractioned divider. I never understood how that
even worked, but in my opinion it has somewhat to do with this gem in
setting DI_BS_CLKGEN1:
div = ipu_di_read(di, DI_BS_CLKGEN0) & 0xfff;
div = div / 16; /* Now divider is integer portion */
/* Setup pixel clock timing */
/* Down time is half of period */
ipu_di_write(di, (div << 16), DI_BS_CLKGEN1); <-- right here
Recall that the original code (1c930f0b2f1a) also set the "up" portion here.
If you will notice, it writes the integer part of the divider (so in
Alexanders case, 0x3) to the "down period" component of the second
clock generator register by shifting the entire thing up 16 bits.
According to the docs, DI_BS_CLKGEN1[16] is ANOTHER FRACTION BIT (as
is DI_BS_CLKGEN1[0] for the "up period").
My theory, which I could never confirm, is that DI_BS_CLKGEN1 has always
been misprogrammed because it's off-by-one on the actual integer part
of the divider, and there's probably a case for setting the delay
period (top of DI_BS_CLKGEN0) and the "up period" of DI_BS_CLKGEN1
too. I just never had the amount of hardware I'd need to do a
functional test with real devices that gave us those clock dividers.
In Alex's case with his 3.5 divider, the down period is being set to
1.5 (i.e. (0x38 >> 4) << 16) in that bit position. With a 3.0 divider,
dropping 1.5 in the down period is sorta correct. Clock works. Yay?
It also *ALWAYS* works in the external clock case, since the external
clock (as per Freescale) is always used as parent, generating the DI
clock with an exact divider of 2, 4, or 8.
Does that maybe go so far as to explain why those fractional dividers
really don't work properly?
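That reading of the CLKGEN1 write can be modelled standalone - a sketch under the stated assumption that bit 16 of the down-period field weighs half a period; the helper name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* The 4.4 divider's integer part lands at bit 16, but bit 16 itself
 * is a fraction (half) bit, so the programmed down-period works out
 * to integer_part / 2 timebase periods.  Returned in half-period
 * units so the result is exact. */
static unsigned int down_period_halves(uint32_t clkgen0_div)
{
	uint32_t gen1 = (clkgen0_div >> 4) << 16;	/* the driver's write */

	return gen1 >> 16;	/* field value; LSB weighs half a period */
}
```

For the 3.5 divider (0x38) in Alex's case the field reads 3, i.e. a 1.5-period down time, matching the example above.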
There's a REALLY good explanation in the i.MX6QDRM manual around
"34.7.10.3 Timing generator" which gives exactly how this clock is set
up. What you're expecting as a straight fractional division of a clock
isn't true - every pin output on the DI is derived from a fundamental
timebase which is defined by that fractional divider in DI_BS_CLKGEN0.
What actually goes to the display is THEN derived from that
fundamental timebase modified by the valueS (plural!) in
DI_BS_CLKGEN1. There's a significant amount of math to figure out
whether the exact display timing is going to be valid, not being done
here, but it turns out through pure luck, it tends to work out for a
couple cases..
I never sat down and thought of the actual math required and since it
impacts ALL the waveform generation too, it seemed like a lot of work
I wasn't prepared to do at the time. I don't think I was paid enough
for it, when I was being paid to do it. I don't get paid to do it at
all right now (thank Glob) and nobody's giving me any free hardware
that makes me want to do it... so it's in your court.
> I know what that's doing, it's not rocket science. But if you have to
> resort to using fractional dividers when you have another clock available
> to generate a precise clock without the inherent instability caused by
> fractional dividers
If DI_BS_CLKGEN1 can be proven mis-programmed and fixing that fixes
the instability, then awesome. I'd like to see that investigated
though.
If not, sure, I agree with you 100%. There's no reason to use them.
> Moreover, I'm not the only one to run into _real_ problems with fractional
> dividers - other people have too with the iMX stuff, and have also had to
> disable these fractional dividers too.
Well I've been dealing with them since the i.MX51 taped out and
honestly.. it worked a lot better with the 2.6.26 kernel with the
original hacks, but at that time I don't recall we had any support for
an external clock. As soon as 2.6.28 rolled round we found some weird
problems, and Freescale added code to use external clocks and we
stopped caring because it pretty much always worked, with some
veeerrrrrry strange caveats - most of which were masked by API
deficiencies in the framebuffer subsystem and a distinct inability to
ask the IPU driver what clock it actually set from the code in our
transmitter driver.
Which is, back to an earlier point, *why I want real clkdevs internal
to drivers* since it makes it a lot easier to pass the information
around in a standardized way. The transmitter may need to know
*precisely* what pixel clock was set, and at the encoder level what I
can get back is clk_round_rate rounded by the crtc to the exact clock
it's outputting, rather than recalculating the clock from the mode
information (syncs+totals+blah, you patched this out in your fixes)
and not being able to take into account the rounding EXCEPT if I add
my own API.. to fetch the information. It's more reasonable for the
encoder to just request the crtc's clk handle and do what it needs
than to invent a special API for passing "mode clock information"
around.
What we've not ascertained in reality is not "does it generate a weird
clock" but why Freescale limited only those very specific fractions
(why would "if ((div & 0xc) == 0xc) { div += 0x10; div &= ~0xf; }" be
"working" for them?), why they haven't set the "up period" in their
driver since 2009, and what happens if we really, really properly
configure the clock generation?
Matt Sealey <neko@bakuhatsu.net>
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-11-09 0:23 ` Matt Sealey
@ 2013-11-09 0:53 ` Russell King - ARM Linux
2013-11-09 6:11 ` Matt Sealey
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-11-09 0:53 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Nov 08, 2013 at 06:23:37PM -0600, Matt Sealey wrote:
> On Thu, Nov 7, 2013 at 11:29 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Tue, Nov 05, 2013 at 04:39:40PM -0600, Matt Sealey wrote:
> >> On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
> >> <linux@arm.linux.org.uk> wrote:
> >> > That's where validating the modes you're prepared to support becomes
> >> > important, and there's hooks in DRM for doing that.
> >>
> >> Nope! Not in the right place.
> >
> > Sure it's the right place. It's the connector which gets to read the
> > modes, and the connector is part of the DRM system.
>
> It's really not the right place.
>
> >> > If you can't support those with 148.5MHz dotclocks, then you have to return
> >> > MODE_CLOCK_HIGH from the connector's mode_valid callback.
> >>
> >> Here's the problem; connectors and encoders are initialized under DRM
> >> before the display driver. For the connector to even KNOW what the
> >> highest valid clock is, it would need to ask the lower level display
> >> driver, since the connector and even encoder has no clue of this
> >> limitation and shouldn't be hardcoded, or even know what the input
> >> pixel clock capabilities are.
> >
> > That's because the way imx-drm is setup with its multitude of individual
> > drivers is utter madness;
>
> This is how all DRM cards are written, even the good ones. For the
> connector to be able to ask the crtc "is this mode even supportable?"
> it would need a mode_valid callback to call, down to the encoder (and
> maybe the encoder would call crtc) and down down to the crtc. No such
> callback exists for this exact reason. At the point the connector
> pulls EDID modes and uses mode_valid, it's entirely possible and
> nearly always true that it doesn't even HAVE an encoder or CRTC.
Now look at the bigger picture. What makes the decision about whether
to use a mode with a particular CRTC? Is it:
(a) DRM
(b) Userspace
The answer is not (a) - DRM just does what userspace instructs. What
this means is that you can't do per-CRTC validation of mode settings
without big changes to the API, which aren't going to happen.
So the best you can do is to limit the displayed modes according to
what the hardware in general is able to do.
Luckily, with imx-drm, in the case of single IPUs, both "crtcs" have
exactly the same capability as far as clocks go - which is common with
every other DRM system so far. So really, there's no need to know which
CRTC is going to be associated with the connector - we already know the
capabilities there.
> What it lacks, in this case, is not any functional code in those
> components, but a workable model to glue those components together in
> the correct order, on a per-board basis.
>
> The appropriate card model isn't implemented - except in crazy
> functions in imx-drm-core.c - the rest of it is absolutely fine.
Well, I've actually fixed the problem *right* *now*.
> The ONLY thing that can happen is a call to the crtc mode_fixup()
> which exists only to say "go fuck yourself" to a mode_set call.
> Nothing ever culls these mode lists after the connector generates
> them, because it owns that object and ditching items from it is
> semi-dangerous layering violation. So here is the user experience with
> the current model:
>
> * User clicks a button in GNOME to set the fanciest resolution they
> can find, which is listed in their dialog
> * User is told "that mode can't be set, sorry"
> * User waits for you to go to sleep and then suffocates you with a pillow
>
> And what it should be is:
>
> * User clicks a button in GNOME to set the fanciest resolution
> actually supported with this combination, because the drivers knew
> enough to pre-filter invalid modes
> * User always gets a valid mode set because it only ever reports valid modes
>
> That second, consumer-oriented, model where the usable mode list is
> predicated on results from the *entire* card and not just what the
> monitor said, simply wasn't - and I assume still isn't - possible. Why
> not? Because when a Radeon is plugged into a monitor it bloody well
> supports it, and that's the end of it. People don't make displays that
> modern graphics cards can't use. By the time 1080p TVs in common
> circulation rolled around for consumer devices, WQXGA monitors already
> existed, so desktop PC graphics cards followed suit pretty quickly.
Sorry, no, I'm not buying your arguments here - you may be right but
your apparent bias towards "oh, DRM was written for Radeon" is so
wrong. Maybe you should consider that the DRM maintainer works for
Intel, which is a competitor of AMD not only in the CPU market but
also in the video market.
However, even so. Getting back to the hardware we have, which is
imx-drm, there is no difference between the capabilities of the two
"CRTCs" in a single IPU as far as mode pixel rates are concerned.
There _may_ be a difference between the two IPUs on IMX6Q, but not
within a single IPU.
> However, some embedded devices have restrictions. I have a couple
> devices at home that have a maximum resolution of 1280x720 - because
> the SoC doesn't provide anything that can do more than a 75MHz pixel
> clock or so. So, that sucks, but it's a real limitation of the SoC
> that is essentially insurmountable.
Right, so turning everything into micro-components is a bad idea.
That's already been said (I think I've already said that about imx-drm.)
> In the case on the i.MX51, it was just never designed to be a
> 1080p-capable device. However, there is NOTHING in the architecture of
> the chip except the maximum clock and some bus bandwidth foibles that
> says it's impossible. I can run 1080p at 30 and it operates great in 2D
> and even does respectable 3D. The video decoders still work - if you
> have a low enough bitrate/complexity movie, it *will* decode 1080p at
> 30fps. So artificially restricting the displays to "720p maximum" is
> overly restrictive to customers, considering that 1366x768,1440x900
> are well within the performance of the units. 1600x900, 1920x1080 (low
> enough refresh rate) are still usable.
>
> The problem is under DRM
Err no. The reason 1080p at 30 works is because it uses the same pixel
rate as 1280x720. 74.25MHz.
So the problem is with your thinking. What you're thinking is "it won't
do 1080p so we have to deny modes based on the resolution". There's
absolutely no problem whatsoever with restricting the set of available
modes based on _pixel_ _clock_ _rate_ in DRM. DRM fully supports this.
There is no problem here.
So you _can_ have 1080p at 30Hz if you have a bandwidth limitation.
All it takes is for the _correct_ limitations to be imposed rather than
the artificial ones you seem to be making up.
And at this point I give up reading your diatribe. You've been told
many times in the past not to write huge long essays as emails, because
frankly people won't read them. It's 1am, and I'm not going to spend
another hour or more reading whatever you've written below this point,
because it's another 300 lines I just don't have time to read.
If you'd like to rephrase in a more concise manner then I may read
your message.
^ permalink raw reply [flat|nested] 18+ messages in thread
* imx-drm: Add HDMI support
2013-11-09 0:53 ` Russell King - ARM Linux
@ 2013-11-09 6:11 ` Matt Sealey
2013-11-09 10:28 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Matt Sealey @ 2013-11-09 6:11 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Nov 8, 2013 at 6:53 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Nov 08, 2013 at 06:23:37PM -0600, Matt Sealey wrote:
>> On Thu, Nov 7, 2013 at 11:29 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>
> Now look at the bigger picture. What makes the decision about whether
> to use a mode with a particular CRTC? Is it:
>
> (a) DRM
> (b) Userspace
>
> The answer is not (a) - DRM just does what userspace instructs. What
> this means is that you can't do per-CRTC validation of mode settings
> without big changes to the API, which aren't going to happen.
That's exactly what I was saying in the first place, and what Alex
Deucher said to me a year ago, after I spent 2 days straight crashing
an i.MX51 trying to figure out a way to get the connector to ask the
CRTC for the right information.
So I accept that we can't do validation of modes at a CRTC level, but
what's the solution? I don't accept that it would be to shove mode
culling policy to userspace.
The xorg-video-modesetting driver is a gold standard - if all users
want is a linear framebuffer with a particular mode (and we should
all be happy for this on ARM since it means KMS is working, hooray!),
it should not have to know which modes will actually work, nor be
modified to be platform-specific.
In the case of "card drivers" for DRM, on the kernel side, having a
"card" driver per board to micromanage settings will get as unwieldy
as having multiple platform-specific xorg-video-modesetting patches,
when there are 10 or 20 boards based on a particular chip in mainline,
and 20 or 30 SoCs supported in active use. ALSA is getting away with
it right now because barely anyone has gotten to the point of having
working audio with device tree except the i.MX6 and i.MX5 cases. It
won't scale as it goes forward.
I am betting the one you "have *right* *now*" is the above, you wrote
a driver which, given a very specific set of device tree entries with
specific compatible properties in combination, specifically
initializes several i.MX6-specific components together in the right
order. This also won't scale, going forward.
> Luckily, with imx-drm, in the case of single IPUs, both "crtcs" have
> exactly the same capability as far as clocks go - which is common with
> every other DRM system so far. So really, there's no need to know which
> CRTC is going to be associated with the connector - we already know the
> capabilities there.
That's not the point I was trying to get across.
> If you'd like to rephrase in a more concise manner then I may read
> your message.
Maybe I am just not explaining it well enough. Here it is in the
shortest way I can do it.
* i.MX6Q - two discrete IPUs, each with two DIs. The IPUs have
independent clocks, up to 200MHz for the HSC.
* i.MX53/50 - one discrete IPU with two DIs. IPU clock up to 150MHz for the HSC.
* i.MX51 - one discrete IPU with two DIs. IPU clock up to 133MHz for the HSC.
Same driver.
The DI can divide the HSC clock, or accept an external clock, to
generate - using a fractional divider - any clock rate it wants. This
clock becomes the "fundamental timebase" for DISP_CLK, which is
generated as expected, based on "up" and "down" period values against
that fundamental timebase.
Whether HSC clock or external clock, the maximum DISP_CLK which goes
out to the display is bounded by the current HSC clock rate.
Two problems.
* i.MX51 maximum clock rate (133MHz) is lower than standard pixel
clock rate for 1080p60 and other high-resolution modes with
high-refresh rates
* DI_BS_CLKGEN0 and 1 need to be programmed correctly to generate a
correct DISP_CLK out.
Problem without solution (i.e. DRM filters modes back-asswards):
* i.MX51 connector is probably not part of the SoC, but an independent
and totally divorced component from the IPU. One example is an IPU
parallel display connected to an active HDMI encoder, which may be
provided by companies such as Texas Instruments or Silicon Image.
These things exist in the wild. It is not reasonable for a Silicon
Image transmitter driver as a generic i2c device (which it really is)
to have intimate knowledge of the i.MX51. It is not reasonable for the
Silicon Image transmitter to be predicated upon a specific SoC.
DRM offers no way for an encoder/connector to find out this
information from the correct place - the CRTC driver - at the time it
fetches EDID from the monitor, or gains mode information some other
way. DRM only offers a way to use the maximum TMDS clock from the EDID
to cull modes the *monitor* can't handle in the CVT/GTF mode timing
case (or disable usage of CVT/GTF completely).
Problem with solution (i.e. don't just strip fractions please):
The current programming methodology for DI_BS_CLKGEN1 only handles the
exact case where the DI_BS_CLKGEN0 divider is an integer, stuffed into
the CLKGEN1 "down period" shifted one bit right - which is why the
"strip the fractions" approach works at all.
CLKGEN1 only has a Q8.1 fractional divider whereas CLKGEN0 has a Q12.4
fractional divider.
Any use of anything but the integer part of the CLKGEN0 divider
probably cannot be represented as an exact value in CLKGEN1 without
the original parent clock being multiplied by a suitable value or
CLKGEN0 divider (Q12.4) being further adjusted to allow valid CLKGEN1
values (Q8.1).
Using CLKGEN1 better and doing more comprehensive 'upstream' clock
management may give better results. In the configurations Alex and
yourself tried, it is not possible to derive the correct DISP_CLK from
the fundamental timebase created by CLKGEN0 using the values CLKGEN1
was programmed with. As in my previous mail: a GEN0 divider of 3.5
ends up as a GEN1 down period of 1.5, and a divider of 3.0 also ends
up as a GEN1 down period of 1.5.
In this case the solution is to correct the fundamental timebase such
that it is possible to generate that value. Assume an input clock of
133MHz and an expected rate of 38MHz; the simplest-case divider is
133/38 = 3.5. With the current code, the "down period" will be 1.5
(i.e. 3/2), but it should be 1.75, which we cannot represent because
of the single fraction bit.
Possible fix? Set divider to 1.75 instead. DI clock generator will
create a 76MHz clock as fundamental timebase. This *shouldn't* go
directly out on any pins but it's used to synchronize everything else
on the waveform generators. What I could never work out is what to do
next: set the GEN1 down period to 3.5? Or 2.0? The diagrams don't
explain very clearly whether the period relates the timebase to the
parent clock or the timebase to DISP_CLK, but it seems 2.0 would be
the correct one in this instance (it makes more sense given the
current working divider/down-period settings).
Here's my notes from sometime in 2010:
"When the divider is calculated for the pixel clock, what it should be
trying to generate is a Q8.1 number for the "down" and/or "up" periods
in GEN1 (you could just set one of them, dependent on clock polarity
setting, assuming you want your data clocked at the edge of the input
clock.. all stuff the driver kinda glosses over) rather than
concentrating on the fractional divider in GEN0 first. The Q12.4
divider is there so you have more flexibility when you find a perfect
Q8.1 for the up/down clock periods depending on the input clock."
Apart from that contrived case above (I used Alex's case), I couldn't
sit down and write that algorithm even if I had all the coffee in
Colombia (and some other stuff too) - I've had 3 years to try and just
didn't. I'm dealing with getting the platform that exhibits the cases
I have up and running in the first place, dealing with *all* the same
problems you are, with the added problem of sometimes ARM not working.
I'm not rushing to get it upstream by a deadline - sometimes my wife
complains that I spend too much time in my office after work - so when
she lets me be, and it actually boots reliably, and I figure out a
couple of U-Boot problems, maybe we'll get in sync on the issue and be
fixing the same things in collaboration. But until then... I'm
suggesting you might want to do it this way, as it might cause fewer
weird results later down the road. Of course, if you TL;DR it, then
you can live with the broken display driver for as long as you like.
Concise? Thanks :)
Matt Sealey <neko@bakuhatsu.net>
* imx-drm: Add HDMI support
2013-11-09 6:11 ` Matt Sealey
@ 2013-11-09 10:28 ` Russell King - ARM Linux
0 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-11-09 10:28 UTC (permalink / raw)
To: linux-arm-kernel
On Sat, Nov 09, 2013 at 12:11:51AM -0600, Matt Sealey wrote:
> That's exactly what I was saying in the first place, and what Alex
> Deucher said to me a year ago after I spent 2 days straight crashing
> an i.MX51 trying to figure out a way to get the connector to ask the
> CRTC for the right information.
I still completely disagree with you on that. I've yet to see a
situation where you have different restrictions on what can be done
depending on which "CRTC" in the SoC you connect an interface to.
> In the case of "card drivers" for DRM, on the kernel side, having a
> "card" driver per board to micromanage settings will get as unwieldy
> as having multiple platform-specific xorg-video-modesetting patches,
*sigh* completely the wrong end of the stick, and I bet most of the rest
of this email is pounded out based on that.
> I am betting the one you "have *right* *now*" is the above, you wrote
> a driver which, given a very specific set of device tree entries with
> specific compatible properties in combination, specifically
> initializes several i.MX6-specific components together in the right
> order. This also won't scale, going forward.
What utter rubbish. I have nothing further to add other than you are
wrong. I'm not going to give you any further information for you to
pound your poor keyboard with to compose another verbose reply which
has the wrong end of the stick.
> > If you'd like to rephrase in a more concise manner then I may read
> > your message.
>
> Maybe I am just not explaining it well enough. Here it is in the
> shortest way I can do it.
>
> * i.MX6Q - two discrete IPUs, each with two DIs. The IPUs have
> independent clocks, up to 200MHz for the HSC.
> * i.MX53/50 - one discrete IPU with two DIs. IPU clock up to 150MHz for the HSC.
> * i.MX51 - one discrete IPU with two DIs. IPU clock up to 133MHz for the HSC.
>
> Same driver.
Right, so what you've just told me in a concise way is that the
restrictions are per-SoC, not per-CRTC as you've been claiming in
your ridiculously verbose emails (the rest of your email cut without
reading.)
So what we do is we arrange for the imx-drm connector layers to call
into the imx-drm-core layer, to ask the imx-drm CRTC layer what the
maximum pixel clock is.
We then have the connector compare the maximum pixel clock rate with
the clock rate required for the mode, and if the mode has a pixel
clock rate greater than the CRTC can provide, we return MODE_CLOCK_HIGH.
If done correctly, this doesn't require any code in the connector
layer: you arrange for imx-drm-core to provide a _generic_ cross-
connector function which provides this information, and you point
the connector's .mode_valid at that generic function.
There is no problem here.
end of thread, other threads:[~2013-11-09 10:28 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <52602EEA.6060402@prisktech.co.nz>
2013-10-17 19:39 ` imx-drm: Add HDMI support Russell King - ARM Linux
2013-10-17 19:52 ` Alexander Shiyan
2013-10-17 20:35 ` Russell King - ARM Linux
2013-10-19 12:12 ` Russell King - ARM Linux
2013-10-21 9:13 ` Shawn Guo
2013-10-21 9:34 ` Russell King - ARM Linux
2013-10-21 12:36 ` Shawn Guo
2013-10-21 13:17 ` Russell King - ARM Linux
2013-10-30 8:34 ` Shawn Guo
2013-10-30 18:01 ` Matt Sealey
2013-10-30 19:01 ` Russell King - ARM Linux
2013-11-05 22:39 ` Matt Sealey
2013-11-07 17:29 ` Russell King - ARM Linux
2013-11-09 0:23 ` Matt Sealey
2013-11-09 0:53 ` Russell King - ARM Linux
2013-11-09 6:11 ` Matt Sealey
2013-11-09 10:28 ` Russell King - ARM Linux
2013-10-19 15:55 ` Russell King - ARM Linux