Bug 76319

Summary: [NVE6] MMIO FAULT, black screen on QHD+ K2100M
Product: xorg Reporter: D. Moens <d-bugzilla>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: blocker    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments:
Description                                            Flags
# grep -n -i nouveau messages.out > messages.filtered  none
# journalctl -b -k > dmesg.out                         none
# journalctl -b -p err > errors.out                    none
# journalctl -b /usr/bin/X > X.out                     none
# lspci -vv (limited to graphics adapter)              none
errors2.out                                            none

Description D. Moens 2014-03-18 13:39:05 UTC
Created attachment 95999 [details]
# grep -n -i nouveau messages.out > messages.filtered

Experiencing a black screen when starting X on a Dell Precision M4800 QHD+ laptop (NVE6).
The internal screen resolution is 3200x1800; the DP-attached Dell UP2414Q runs at 3840x2160.

The system is fully functional (ssh-enabled) and responds to VT switches (CTRL-ALT-Fx), but all screens (X & VT) are black.

Sometimes (after a couple of minutes), I do notice a change in screen brightness (a power saving function?).


- Fedora rawhide :
kernel-3.14.0-0.rc7.git0.1.fc21.x86_64
xorg-x11-server-Xorg-1.15.0-5.fc21.x86_64
xorg-x11-drv-nouveau-1.0.10-1.fc21.x86_64

- lspci :
01:00.0 VGA compatible controller: NVIDIA Corporation GK106GLM [Quadro K2100M] (rev a1)

- Kernel command line:
BOOT_IMAGE=/vmlinuz-rawhide-nouveau root=/dev/mapper/vg01-rootfs4 ro rd.lvm.lv=vg01/rootfs4 vconsole.font=latarcyrheb-sun16 LANG=en_US.UTF-8 nouveau.debug=PDISP=debug,VBIOS=trace drm.debug=0xe

- Recompiled latest http://cgit.freedesktop.org/~darktama/nouveau/ as of 2014-03-18.



In the attached messages.filtered, X is started at line 1170.

As I have been trying for several weeks now to get this system into a usable shape (Dell disabled Optimus in the BIOS, and there are VT issues with the nVidia binary blob), I am fully willing to recompile & test whatever you can possibly throw at me.
Comment 1 Ilia Mirkin 2014-03-18 14:40:57 UTC
I think you may be the proud owner of multiple issues, as evidenced by the gpu lockup you get down the line.

First off, there's no way that load had those nouveau.debug settings. You would have seen a LOT more messages. I'm guessing you looked at some log file, and that log file ignores debug-level messages. Try checking in dmesg directly.
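
A minimal sketch of one way to confirm that the debug settings at least reached the running kernel: parse the kernel command line and list the per-subsystem levels. The helper name nouveau_debug_levels is hypothetical, and only the NAME=level form used in this report is handled.

```python
def nouveau_debug_levels(cmdline):
    """Return {subsystem: level} from a nouveau.debug=... kernel argument.

    Hypothetical helper: only NAME=level items are recognised; anything
    else inside the argument is ignored.
    """
    levels = {}
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key != "nouveau.debug" or not sep:
            continue
        for item in value.split(","):
            name, item_sep, level = item.partition("=")
            if item_sep and name and level:
                levels[name] = level
    return levels


# The command line quoted in the description of this report.
REPORTED_CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-rawhide-nouveau root=/dev/mapper/vg01-rootfs4 ro "
    "rd.lvm.lv=vg01/rootfs4 vconsole.font=latarcyrheb-sun16 LANG=en_US.UTF-8 "
    "nouveau.debug=PDISP=debug,VBIOS=trace drm.debug=0xe"
)
```

On a live system the string can be read from /proc/cmdline instead. This only shows what was requested at boot; whether the resulting messages are visible still depends on where the log is read from.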

Secondly, I'm pretty sure there's no way to achieve those resolutions without the 540MHz clock frequency provided for by DP1.2 (at least not with 24bit color). In core/engine/disp/dport.c:nouveau_dp_train, prepend a 540000 value in the first position of the bw_list array.

Try to get a full debug log (which should include the various DP debug messages, as well as others), with that change, and the nouveau.debug settings you were already using (but apparently were either not being picked up, or something else).
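
The bandwidth argument above can be checked with some rough arithmetic. This is a sketch under stated assumptions, not the driver's own calculation: the bw_list values are taken to be the link symbol clock in kHz carrying one payload byte per lane per clock (8b/10b coding), four lanes are assumed, the refresh rate is assumed to be 60 Hz (the report does not state it), and blanking is ignored, so the result is only a lower bound.

```python
# Link rates from the patched bw_list, assumed to be symbol clocks in kHz.
BW_LIST_KHZ = (540000, 270000, 162000)


def link_payload_bps(link_khz, lanes=4):
    """Payload bits per second: one byte per lane per link clock (assumed)."""
    return link_khz * 1000 * 8 * lanes


def active_pixel_bps(width, height, refresh_hz=60, bpp=24):
    """Bits per second for the visible pixels only (blanking ignored)."""
    return width * height * refresh_hz * bpp


def min_link_rate(width, height, refresh_hz=60, bpp=24, lanes=4):
    """Lowest rate in BW_LIST_KHZ that carries the visible pixels, or None."""
    need = active_pixel_bps(width, height, refresh_hz, bpp)
    fits = [khz for khz in BW_LIST_KHZ if link_payload_bps(khz, lanes) >= need]
    return min(fits) if fits else None
```

Under these assumptions the 3840x2160 monitor needs about 11.9 Gbit/s, above the 8.64 Gbit/s that four lanes at 270000 can carry, so only the 540000 entry fits. The 3200x1800 panel needs about 8.29 Gbit/s for visible pixels alone, which is already roughly 96% of the 270000 capacity before any blanking overhead is added.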
Comment 2 D. Moens 2014-03-19 13:07:19 UTC
Ilia, thank you for your instantaneous reply.

(In reply to comment #1)
> I think you may be the proud owner of multiple issues, as evidenced by the
> gpu lockup you get down the line.

Thanks, I feel honoured.

> First off, there's no way that load had those nouveau.debug settings. You
> would have seen a LOT more messages. I'm guessing you looked at some log
> file, and that log file ignores debug-level messages. Try checking in dmesg
> directly.

Full dmesg (with patch applied) attached.


> 
> Secondly, I'm pretty sure there's no way to achieve those resolutions
> without the 540MHz clock frequency provided for by DP1.2 (at least not with
> 24bit color). In core/engine/disp/dport.c:nouveau_dp_train, prepend a 540000
> value in the first position of the bw_list array.

Patched :

$ diff nouveau/nvkm/engine/disp/dport.c.orig nouveau/nvkm/engine/disp/dport.c
276c276
< 	const u32 bw_list[] = { 270000, 162000, 0 };
---
> 	const u32 bw_list[] = { 540000, 270000, 162000, 0 };


> 
> Try to get a full debug log (which should include the various DP debug
> messages, as well as others), with that change, and the nouveau.debug
> settings you were already using (but apparently were either not being picked
> up, or something else).

Oh joy, I now do get a graphical screen (and high-res VT).
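
A minimal sketch of why prepending the entry matters. The rest of nouveau_dp_train is not shown in this report, so this assumes the routine walks bw_list in order, stops at the 0 terminator, skips rates above what the sink reports, and keeps the first rate that trains. The names pick_link_rate, sink_max_khz and train are hypothetical.

```python
def pick_link_rate(bw_list, sink_max_khz, train):
    """Return the first rate in bw_list that the sink supports and that trains.

    bw_list is ordered from highest to lowest and terminated by 0, as in
    the patch; train is a callable returning True when link training
    succeeds at the given rate.
    """
    for khz in bw_list:
        if khz == 0:
            break  # terminator reached, nothing left to try
        if khz > sink_max_khz:
            continue  # the sink cannot run this fast
        if train(khz):
            return khz
    return None


ORIGINAL_BW_LIST = [270000, 162000, 0]
PATCHED_BW_LIST = [540000, 270000, 162000, 0]
```

With the original list a sink that supports 540000 can never be driven faster than 270000, because the higher rate is simply never tried. With the patched list it is tried first, and the lower rates remain as fallbacks.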

Apparently, I am also bitten by bug #70354 (witness errors.out)?

Please find all logs in attachment (with external monitor detached, for sake of simplicity).


Again, thank you for your involvement.
Didier
Comment 3 D. Moens 2014-03-19 13:08:41 UTC
Created attachment 96042 [details]
# journalctl -b -k > dmesg.out
Comment 4 D. Moens 2014-03-19 13:09:18 UTC
Created attachment 96043 [details]
# journalctl -b -p err > errors.out
Comment 5 D. Moens 2014-03-19 13:09:55 UTC
Created attachment 96044 [details]
# journalctl -b /usr/bin/X > X.out
Comment 6 D. Moens 2014-03-19 13:10:48 UTC
Created attachment 96045 [details]
# lspci -vv (limited to graphics adapter)
Comment 7 Ilia Mirkin 2014-03-19 13:16:20 UTC
(In reply to comment #2)
> Patched :
> 
> $ diff nouveau/nvkm/engine/disp/dport.c.orig nouveau/nvkm/engine/disp/dport.c
> 276c276
> < 	const u32 bw_list[] = { 270000, 162000, 0 };
> ---
> > 	const u32 bw_list[] = { 540000, 270000, 162000, 0 };
> 
> 
> Oh joy, I now do get a graphical screen (and high-res VT).

Great, so the display part of it is working? Image/resolution is all fine? Could you also check with your external monitor? No one involved with nouveau had such a screen, so we weren't sure it would work. I guess we can start throwing that in when the HW supports it (hmmm... would have to figure out which HW supports it...)

> 
> Apparently, I am also bitten by bug #70354 (witness errors.out)?

Hmmm, looks like it -- but if you're running darktama's latest repo, it should have a fix for that (http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=39b5507a36f01dc920fd9d21a70aed62fe4c0fdf)

You could also try using blob pgraph fw, although that is not known to help that issue.
Comment 8 D. Moens 2014-03-20 11:30:24 UTC
Report of current status ...


1. Code :


I've updated to darktama's latest git :
$ git log | head -n 1
commit e7cdcda87d2c2158ef34d88c9632bf7541c07c3a


2. Environment :


I'm now running on stock Fedora 20 (as it is not feasible to run Rawhide on a production system until the F21 release in October/November).

2a. With current kernel 3.13.6-200.fc20.x86_64, compilation halts at :
nouveau/drm/core/subdev/mxm/base.c: In function ‘mxm_shadow_dsm’:
nouveau/drm/core/subdev/mxm/base.c:109:2: error: implicit declaration of function ‘acpi_evaluate_dsm’ [-Werror=implicit-function-declaration]
  obj = acpi_evaluate_dsm(handle, muid, rev, 0x00000010, &argv4);

2b. For now, upgrading to 3.14.0-0.rc7.git0.1.fc20.x86_64 (backported FC20 rpms are provided at http://copr-be.cloud.fedoraproject.org/results/besser82/kernel-backport/fedora-20-x86_64/kernel-3.14.0-0.rc7.git0.1.fc21/). Current git compiles fine with 3.14rc7.
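
The two build results above suggest a simple pre-build check on the kernel release string. This is a sketch only: the (3, 14) threshold is inferred from the failure on 3.13.6 and the success on 3.14rc7 reported here, and the helper names are hypothetical.

```python
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as `uname -r` prints."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def can_build_current_git(release, minimum=(3, 14)):
    """True if the kernel is at least the assumed minimum version."""
    return kernel_version(release) >= minimum
```

For the two kernels mentioned above, the check rejects 3.13.6-200.fc20.x86_64 and accepts 3.14.0-0.rc7.git0.1.fc20.x86_64.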


3. Issues :


3a. Sometimes (30-50% of the cases? Not conclusive, not enough iterations) the kernel halts completely when loading the nouveau module (the Caps Lock LED responds). A cold restart is needed;

3b. The 540MHz patch in 'core/engine/disp/dport.c:nouveau_dp_train' is apparently still required;

3c. MMIO FAULT / HUB_INIT timeouts still occur, cf. attachment errors2.out;

3d. Power saving (DPMS?): when returning from DPMS, only the mouse pointer is shown on an otherwise empty, dark grey (but lit) screen;

3e. Suspend: after resume, the display is not restored; only a black screen is shown.


4. Preliminary conclusion :

Otherwise, everything appears to work very well (X & VT), in full resolution and with the external monitor attached (single/cloned/extended displays).
I can live with the troublesome suspend, but failing DPMS is a deal-breaker for production use.

Not tested :
- thermal power management (fans?) ;
- performance metrics.


5. Further testing :

Due to the missing Optimus & other nVidia issues, I'm currently still running the M4800 next to my aging production system (M6600), but planning to migrate in the near future.
If you'd like me to test new code, performance, ... on an NVE6 with QHD+/DP over the next few days, this would be the time to go ahead.
Comment 9 D. Moens 2014-03-20 11:31:28 UTC
Created attachment 96094 [details]
errors2.out
Comment 10 D. Moens 2014-04-03 10:01:38 UTC
As the crux of this bug (black screen) has been resolved, I'm closing this bug.
Comment 11 Andrew G. Dunn 2014-05-24 12:47:01 UTC
Did other bugs get created from this report? Comment 1 indicated there were many issues.

Are you now using F20 with the rawhide kernel?
Comment 12 D. Moens 2014-05-26 12:36:21 UTC
I also filed bugs 76585 and 76990.

As there was no reply (no hard feelings) to my invitation for further testing/developing (comment #8, paragraph 5), I have currently fallen back to the nvidia binary blob.


Binary blob advantages (tested) :
- suspend/resume works

BB possible advantages (not tested) :
- performance
- stability

BB disadvantages :
- closed source
- no display in VT's
- switching between VT's required to restore screen display after screen power off
- closed source
