Bug 107585

Summary: Wrong max TMDS clock rate on ThinkPad W541 mini-DisplayPort output
Product: DRI
Reporter: Simon <Simon80>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WONTFIX
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: enhancement
Priority: medium
CC: gary.c.wang, intel-gfx-bugs, jani.nikula
Version: unspecified
Keywords: bisected
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard: Triaged, ReadyForDev
i915 platform: HSW
i915 features: display/DP
Attachments (description, flags): dmesg-4.18.0-7de4aa318074, none

Description Simon 2018-08-15 20:16:26 UTC
I have a ThinkPad W541 (released in Q1 2015) with an Intel Core i7-4910MQ processor in it. According to the processor specs[1], it is theoretically capable of driving 3840x2160@60Hz, but only supports HDMI 1.4. The laptop itself has a mini-DisplayPort 1.2 port.

[1] https://ark.intel.com/products/78939/Intel-Core-i7-4910MQ-Processor-8M-Cache-up-to-3_90-GHz

The laptop's own specifications[2] claim it's capable of outputting 4K at 60Hz, but there's no specific confirmation of what's possible on the Dual-Mode HDMI output.

[2] http://psref.lenovo.com/syspool/Sys/PDF/ThinkPad/ThinkPad%20W541/ThinkPad_W541_Platform_Specifications.pdf

Older kernels (4.6 and below, it turns out) are able to output 4K at 30Hz via a passive mini-DisplayPort to HDMI adapter. I tried to use `cvt` to calculate a mode I could add via xrandr to manually force 4K output, but it turns out that cvt outputs a mode that requires a clock rate above 300 MHz, which the kernel rejected:
> $ cvt --verbose 3840 2160 30.0
> Warning: Refresh Rate is not CVT standard (50, 60, 75 or 85Hz).
> # 3840x2160 29.98 Hz (CVT) hsync: 65.96 kHz; pclk: 338.75 MHz
> Modeline "3840x2160_30.00"  338.75  3840 4080 4488 5136  2160 2163 2168 2200 -hsync +vsync

I later also tried the --reduced option, which cvt refuses to process unless the refresh rate is a multiple of 60 Hz. Eventually, I got this working without using the numbers from cvt, see below.
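
The pixel clock in a modeline is the total number of pixels per frame (including blanking) multiplied by the refresh rate, so the cvt result above can be checked by hand. A minimal sketch, assuming the 300 MHz ceiling mentioned above is the relevant limit; the helper name is made up for illustration:

```python
def refresh_hz(pclk_mhz, htotal, vtotal):
    """Refresh rate implied by a modeline: pixel clock / (htotal * vtotal)."""
    return pclk_mhz * 1_000_000 / (htotal * vtotal)

# Numbers copied from the cvt modeline quoted above.
CVT_PCLK_MHZ = 338.75
CVT_HTOTAL, CVT_VTOTAL = 5136, 2200

# Assumed ceiling, taken from the 300 MHz figure in the text.
LIMIT_MHZ = 300.0

cvt_refresh = refresh_hz(CVT_PCLK_MHZ, CVT_HTOTAL, CVT_VTOTAL)
cvt_fits = CVT_PCLK_MHZ <= LIMIT_MHZ

print(f"refresh: {cvt_refresh:.2f} Hz, fits under limit: {cvt_fits}")
```

The generous CVT blanking (5136x2200 total for 3840x2160 active) is what pushes this mode past the limit, not the refresh rate itself.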

At this point, I bisected the kernel to figure out why it stopped working, and ended up on commit c578d15226c99f0566d5d022f81af6b7d69928db:
> drm/i915: Respect DP++ adaptor TMDS clock limit
> Try to detect the max TMDS clock limit for the DP++ adaptor (if any)
> and take it into account when checking the port clock.
> Note that as with the sink (HDMI vs. DVI) TMDS clock limit we'll ignore
> the adaptor TMDS clock limit in the modeset path, in case users are
> already "overclocking" their TMDS links. One subtle change here is that
> we'll have to respect the adaptor TMDS clock limit when we decide whether
> to do 12bpc or 8bpc, otherwise we might end up picking 12bpc and
> accidentally driving the TMDS link out of spec even when the user chose
> a mode that fits wihting the limits at 8bpc. This means you can't
> "overclock" your DP++ dongle at 12bpc anymore, but you can continue to
> do so at 8bpc.
> Note that for simplicity we'll use the I2C access method for all dual
> mode adaptors including type 2. Otherwise we'd have to start mixing
> DP AUX and HDMI together. In the future we may need to do that if we
> come across any board designs that don't hook up the DDC pins to the
> DP++ connectors. Such boards would obviously only work with type 2
> dual mode adaptors, and not type 1.
> v2: Store adaptor type under indel_hdmi->dp_dual_mode
>     Pass adaptor type to drm_dp_dual_mode_max_tmds_clock(),
>     and use it for type1 adaptors as well
> Cc: stable@vger.kernel.org
> Reported-by: Tore Anderson <tore@fud.no>
> Fixes: 7a0baa623446 ("Revert "drm/i915: Disable 12bpc hdmi for now"")
> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Cc: Shashank Sharma <shashank.sharma@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Link: http://patchwork.freedesktop.org/patch/msgid/1462216105-20881-3-git-send-email-ville.syrjala@linux.intel.com
> Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
> (cherry picked from commit b1ba124d8e95cca48d33502a4a76b1ed09d213ce)
> Signed-off-by: Jani Nikula <jani.nikula@intel.com>

I then built a 4.18 kernel and enabled KMS debug output in the drm module, which confirmed the problem:
> [   13.052359] [drm:drm_dp_dual_mode_detect [drm_kms_helper]] DP dual mode HDMI ID: DP-HDMI ADAPTOR\004 (err 0)
> [   13.053111] [drm:drm_dp_dual_mode_detect [drm_kms_helper]] DP dual mode adaptor ID: ff (err 0)
> [   13.053144] [drm:intel_hdmi_set_edid [i915]] DP dual mode adaptor (type 1 HDMI) detected (max TMDS clock: 165000 kHz)

I modified the detection logic to consider adaptor ID 0xff as a type 2 adaptor, which leads the kernel to try to query the max TMDS clock rate dynamically. Such a query fails on this hardware, so I guess it might be fair to describe it as a type 1 adapter that supports a max TMDS clock rate of 300 MHz. Modifying the hardcoded return value to be 300 MHz, instead of 165 MHz, results in a kernel that works perfectly on this hardware, but I guess that may not be a safe solution for some other type 1 adapters.
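
The limit selection described here can be modelled roughly as follows. This is a simplified sketch, not the kernel's code: the function names and the `query` callback are hypothetical, and the fallback to 165 MHz on a failed query is an assumption about the driver's cautious default.

```python
from typing import Callable, Optional

TYPE1_LIMIT_KHZ = 165_000  # fixed type 1 limit, as reported in the log above

def max_tmds_clock_khz(adaptor_type: int,
                       query: Callable[[], Optional[int]]) -> int:
    """Hypothetical model of the limit selection for a DP++ adaptor.

    adaptor_type: 1 or 2.
    query: returns the adaptor's advertised limit in kHz, or None on failure.
    """
    if adaptor_type == 1:
        return TYPE1_LIMIT_KHZ      # type 1 adaptors are never queried
    advertised = query()
    if advertised is None:
        return TYPE1_LIMIT_KHZ      # assumed fallback when the query fails
    return advertised

def mode_allowed(mode_clock_khz: int, limit_khz: int) -> bool:
    return mode_clock_khz <= limit_khz

def failed_query() -> Optional[int]:
    return None

# 4K at 30 Hz needs roughly 297 MHz.
as_type1 = max_tmds_clock_khz(1, failed_query)
as_type2 = max_tmds_clock_khz(2, failed_query)
print(mode_allowed(297_000, as_type1), mode_allowed(297_000, as_type2))
```

Under these assumptions, reclassifying the adaptor as type 2 does not help on its own when the query fails; only changing the fallback value lets the 4K mode through.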

After a bit of email correspondence with Jani Nikula, I noticed that `xrandr --verbose` spits out full modelines that I could then provide back to it on broken kernels to re-add the missing display modes. I tried this out and it provides a reasonable workaround, assuming one has access to a kernel that can provide the correct numbers:
> xrandr --newmode 3840x2160_30.00 297 3840 4016 4104 4400  2160 2168 2178 2250 +hsync +vsync
> xrandr --addmode HDMI1 3840x2160_30.00
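
For anyone scripting this workaround, the two commands can be assembled from the modeline fields. A minimal sketch, assuming the output is named HDMI1 as above; the helper name is hypothetical, and only the `--newmode` and `--addmode` options already shown are used:

```python
import shlex

def xrandr_commands(output, name, pclk_mhz, timings, flags):
    """Build the argument lists for the two xrandr calls shown above."""
    newmode = ["xrandr", "--newmode", name, str(pclk_mhz),
               *map(str, timings), *flags]
    addmode = ["xrandr", "--addmode", output, name]
    return newmode, addmode

# Values copied from the working modeline quoted above.
TIMINGS = (3840, 4016, 4104, 4400, 2160, 2168, 2178, 2250)
newmode, addmode = xrandr_commands(
    "HDMI1", "3840x2160_30.00", 297, TIMINGS, ("+hsync", "+vsync"))

# Sanity check: htotal * vtotal * 30 Hz should match the 297 MHz clock.
implied_clock_mhz = TIMINGS[3] * TIMINGS[7] * 30 / 1_000_000

print(shlex.join(newmode))
print(shlex.join(addmode))
print(implied_clock_mhz)
```

The argument lists can be passed to `subprocess.run` directly; `shlex.join` is only used here to print them in copy-pasteable form.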

However, it would be much better to get this working by default. I took a look at the VBT on this system with intel_bios_reader, and I can see this line:
> 		HDMI max data rate: 0x01
A snippet from the kernel seems to imply that this value translates to 300 MHz[3].

Is there any reason why the i915 module couldn't look at the VBT and relax the clock rate limit to 300 MHz when that's indicated in the VBT? That would solve this problem without black screen issues on 165 MHz type 1 adapters.

[3] https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/i915/intel_vbt_defs.h#L309-L311
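
To make the proposal concrete, here is a sketch of how the VBT field might be decoded. The encoding below (0 = platform default, 1 = 297 MHz, 2 = 165 MHz) is an assumption based on a reading of the header linked as [3] and should be verified against the kernel source; the function name is hypothetical.

```python
# Assumed encoding of the VBT "HDMI max data rate" field; verify against [3].
HDMI_MAX_DATA_RATE_PLATFORM = 0  # no VBT-specific limit, use platform default
HDMI_MAX_DATA_RATE_297 = 1       # nominally "300 MHz" class
HDMI_MAX_DATA_RATE_165 = 2

def vbt_max_tmds_clock_khz(field):
    """Return the VBT limit in kHz, or None if the VBT gives no specific limit."""
    table = {
        HDMI_MAX_DATA_RATE_297: 297_000,
        HDMI_MAX_DATA_RATE_165: 165_000,
    }
    return table.get(field)

from_bios_reader = vbt_max_tmds_clock_khz(0x01)  # value printed by intel_bios_reader
print(from_bios_reader)
```

A driver following this idea would relax the type 1 limit only when the decoded value is not None and is higher than 165 MHz.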
Comment 1 Simon 2018-08-16 02:27:22 UTC
Wow! I made patches to implement the behaviour I described, but only after doing so did I notice that the kernel sees a value of 0x0 in the HDMI max data rate field. After digging a little deeper, I found this commit[1], which probably explains the discrepancy, and closes the door on a fix involving the VBT, at least for hardware equivalent to my system's configuration.

As a last-ditch attempt to be friendlier to users, I'd propose at least making an educated guess about the maximum TMDS clockrate based on which generation of Intel Graphics chipset is driving the port.

[1] https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/commit/?id=5a17ee2c8f9013f5db852d27564b837f9f2c5a9f
Comment 2 Jani Saarinen 2018-08-16 06:48:41 UTC
Please try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot.

Jani, any comments or need Ville here?
Comment 3 Jani Nikula 2018-08-16 07:14:33 UTC
(In reply to Simon from comment #1)
> As a last-ditch attempt to be friendlier to users, I'd propose at least
> making an educated guess about the maximum TMDS clockrate based on which
> generation of Intel Graphics chipset is driving the port.

It's not about the GPU, it's about the adapters. The user could plug in anything, and if we are unable to query it, the safe bet is to default to 165 MHz.

As I wrote in my email, for the benefit of others:
> With some adapters, we are unable to detect the adapter/sink max
> rate. If we incorrectly assume 300 MHz and it's really 165 MHz, the
> result is a black screen. If we incorrectly assume 165 MHz and it's
> really 300 MHz, the result is a picture on screen albeit with a worse
> resolution than the user wants.

From my POV, it's generally much more peaceful to deal with people whose bug is a degraded screen than with people whose bug is a black screen. ;)
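
The asymmetry between the two wrong guesses can be spelled out as a small worst-case model. This is an illustrative sketch only; the function name and the three outcome labels are hypothetical, and a real adaptor driven past its limit may still happen to work.

```python
def outcome(assumed_limit_khz, real_limit_khz, mode_clock_khz):
    """Worst-case result of driving a mode under an assumed adaptor limit."""
    if mode_clock_khz > assumed_limit_khz:
        # The driver prunes the mode, so the user ends up on a lower one.
        return "degraded picture"
    if mode_clock_khz > real_limit_khz:
        # The mode is driven past what the adaptor can carry.
        return "black screen"
    return "picture"

MODE_4K30_KHZ = 297_000  # 4K at 30 Hz needs roughly 297 MHz

too_optimistic = outcome(300_000, 165_000, MODE_4K30_KHZ)
too_cautious = outcome(165_000, 300_000, MODE_4K30_KHZ)
print(too_optimistic, "/", too_cautious)
```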
Comment 4 Simon 2018-08-16 11:25:57 UTC
Yeah, maybe in general the user can plug in random adaptors, but in this case the adaptor is internal, part of the laptop's mini-DisplayPort.

Sure, a black screen is suboptimal, but if someone is trying to use an old adapter to drive a 4K screen, they're going to need to replace the adaptor regardless of whether the result is a black screen or 1080p output. If they get a black screen, they can work around it by switching to a lower output mode, as long as the 4K output isn't the only display attached to the system. So for any laptop user the black screen isn't so severe, because the user can use the internal screen to pick a different mode. I'd guess that desktop users generally would have an HDMI port they can use instead of a passive connection via DisplayPort.

Meanwhile, in the case of TMDS underclocking, there's no direct clue that something is wrong, and almost no workaround. There were no clues in the kernel log without debugging enabled, and I couldn't figure out a workaround without running a working kernel and copying the modeline out. For many people, the workaround would be to go back to Windows or OS X and just conclude that Linux has inferior hardware support. Is there any way we can figure out how the Windows driver is managing to do the right thing in this case?
Comment 5 Jani Nikula 2018-08-16 11:45:33 UTC
(In reply to Simon from comment #4)
> Yeah, maybe in general the user can plug in random adaptors, but in this
> case the adaptor is internal, part of the laptop's mini-DisplayPort.

If you stick the cable to a Dual Mode (DP++) port, the logic is in the cable. Or in a dongle you stick between the DP++ port and a regular HDMI cable.
Comment 7 Jani Nikula 2018-08-16 12:11:47 UTC
Please keep the communication on the bug, thanks.

On Thu, 16 Aug 2018, Simon Ruggier <simon80@gmail.com> wrote:
> Should I bother booting with drm-tip? I never attached a dmesg from
> 4.18, so it's missing either way, but from your responses, I'm
> guessing there's nothing in drm-tip that fixes this.

I don't recall any fixes between v4.18 and drm-tip that would address
this. It might prove useful to attach the dmesg. But based on the
snippets you've included so far it looks like you have a type 1 dual
mode adaptor, and we limit it to 165 MHz.

IIUC whether clocks higher than 165 MHz work on type 1 adapters is pure luck, and depends on the electrical characteristics of the adapter and the cable. Since they were designed with max 165 MHz in mind, all bets are off.
Comment 8 Simon 2018-08-16 20:27:06 UTC
Ah, OK, I had assumed that the cable was fully passive, but if these register values are coming from the cable, then yes, my position is becoming more and more tenuous. What is the Windows driver doing, then? I guess it's just taking risks?

It would be nice if there was an override for this situation that was easier than manually adding a mode via xrandr, but admittedly, it's a tradeoff there for some users, and I can no longer argue that it's a bug at this point, more of a feature request.

If there's anything you're willing to change to improve the situation (in a "patches welcome" sort of way), please reopen the bug. If you're resolved to stick with the status quo, then we might as well resolve this as not a bug.
Comment 9 Jani Nikula 2018-08-17 06:38:12 UTC
Please post your full dmesg with drm.debug=14 in case we've overlooked something.
Comment 10 Simon 2018-08-25 01:34:32 UTC
Created attachment 141275

This dmesg output is from a kernel built from commit 437b1c598624454e36690c1c56ce1a27e2ed7893.
Comment 11 Lakshmi 2018-08-30 07:39:20 UTC
Jani, do we need any other info from Simon for this issue?
Comment 12 Jani Nikula 2018-10-24 09:19:25 UTC
I think our conclusion is wontfix. Thanks for the report.
