Description
Elmar Stellnberger
2016-01-27 16:00:53 UTC
Created attachment 121326 [details]
clean journal under runlevel 1 with s2ram at last (without hdmimhz)
Here is the journal from when I tested this issue some time ago.
What happens if you unplug/plug the monitor? Does it start working? Or is HDMI permanently hosed?
Plugging and unplugging does not help; so it permanently stays without a signal after resume from s2ram. Besides this I forgot to say that the proprietary nvidia driver for this card has a working resume for the HDMI output; i.e. the monitor comes up again.
Can you boot with "nouveau.debug=debug,bios=trace drm.debug=0x1e" in your kernel cmdline and include dmesg, including doing a unplug/plug after s2ram?
Created attachment 121348 [details]
debug boot log including plug and unplug event after wakeup
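Before collecting such a debug log it can help to confirm that the requested parameters actually reached the running kernel. A minimal sketch follows; the helper name `has_param` is hypothetical and not part of any tool mentioned in this report, and it assumes parameters are separated by single spaces and contain no quoted spaces.

```shell
# has_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; passing another file allows a dry run.
has_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `has_param drm.debug=0x1e && has_param nouveau.debug=debug,bios=trace` should succeed after booting with the command line requested above.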
Ilia; is there anything in the logs that could be helpful? Would you like to have a similar log with the proprietary driver?
Now, I have compiled and installed suspend-utils and tested s2ram with the following parameters:
s2ram --no_kms no_kms --nofbsuspend --force --pci_save --acpi_sleep=1
s2ram --no_kms no_kms --nofbsuspend --force
... and the result is still the same as with a pure systemctl suspend: the external HDMI monitor stays black while the integrated LVDS comes up well. Possibly these tests could point you in the right direction when searching for the reason why the external monitor does not come up again. Likely not a simple KMS issue; perhaps some extra code would be needed to post the HDMI port or so.
Anyone here who could tell me the relevance of these test results?
Latest test result: s2disk behaves in exactly the same way: LVDS comes up well while the external monitor connected via HDMI stays black.
Could you please stop assigning this bug report to anyone? If someone wants to have a look at the bug, **he** will assign himself, not you. Now, really few do assign themselves to a bug using the "assigned to" field. And it has the consequence of removing the mailing list from the loop, so you end up annoying one developer and making sure no one else will receive updates about your bug.
I'm having an issue with getting an image on external monitors using a 9400M + 9600M GT setup, but I need to get access to an external monitor to tackle it. Maybe your bug could benefit from potential patches I'll submit.
Created attachment 122037 [details]
Celsius H265: Xorg.0.log of s2ram (DVI monitor gets disabled)
exactly the same issue can be observed with a Celsius H265 notebook and an Nvidia G96GLM [Quadro FX 770M].
Created attachment 122038 [details]
Celsius H265: journalctl -xb of s2ram (DVI monitor gets disabled)
Created attachment 122039 [details]
Celsius H265: Xorg.0.log of s2ram (crashes)
Unfortunately, with the Celsius H265 and nouveau there seems to be another issue as well: most often the notebook does not wake up from s2ram any more. More specifically, it wakes up at first and the screen content is shown again, but immediately afterwards the screen turns black; Num Lock still works, but nothing else does.
Created attachment 122040 [details]
Celsius H265: journalctl of the day (many crashes)
Many journals of the day (first journals begin with the Amilo Xi3650, but then all s2ram journals are from the Celsius H265 with all but one s2ram leading to a crash).
Created attachment 122041 [details]
Celsius H265: journalctl of the day (many crashes)
now; that is the right file.
Created attachment 122042 [details]
Celsius H265: s2ram - journalctl with KDE (crash)
It is strange but almost any desktop environment will crash after s2ram on the Celsius H265 (be it Xfce or KDE).
Confirmed with 4.6.0-rc2-ARCH-00004-g603539d on an Amilo Xi 3650; after resume from s2ram it still recognizes the plug and unplug events (screencfg popup shown) but the external monitor stays black across plugs and unplugs.
Created attachment 123007 [details]
journal: delayed wakeup with Celsius on 4.6.0-rc2
Now I have also tried it with the Celsius; s2ram has significantly improved since my last test as the machine never crashed. On the first boot three s2ram cycles worked fine; on the second boot the same problem as before seemed to appear: the original screen content flashed up and the monitor turned black again immediately afterwards. However, this time Num Lock was still working and I could make the machine come up again by pressing Ctrl-Alt-Del on the integrated keyboard (which I did for both s2ram tests on the second boot, after a minute or so). The external monitor stays black after s2ram here as well, with the Celsius and kernel 4.6.0-rc2.
Created attachment 123319 [details]
journal including 2x s2ram with 4.6.0-rc5-ARCH-00005-g0b20a43
4.6.0-rc5-ARCH-00005-g0b20a43: Now the machine comes up fine (tested two times on FTS Celsius H265) but both displays (integrated+external) stay black after s2ram. The old screen content does not even flash up for a short time. Num-Lock + SysRq Keys work fine after s2ram.
P.S.: there was an error trying to capture the logs:
journalctl -x --since "2016-04-28" >journal
Failed to get journal fields: Ungültige Nachricht
(it says 'invalid message').
Created attachment 123436 [details]
journalctl -xb with 4.6.0-rc6-ARCH-00006-g7d92f59
Today I re-tested s2ram with the Celsius H265; the machine came up again; the screen content was shown briefly but the screen went black immediately afterwards. The machine was working fine after s2ram; i.e. the shell correctly executed the command following systemctl suspend (which was journalctl -xb >journal-rc6). The screen did not come up again no matter which keys I pressed, so I finally exited with the SysRq keys.
Hmm, there is no mention of Nouveau suspending in the last two logs you linked. Nor is there any mention of the system powering up again (the last message is from systemd: "-- The system has now entered the suspend sleep state."). Since the computer is coming up again when you resume it, could you blindly get the logs after it resumes? (I assume you ran something like `s2ram; journalctl -xb > journal-rc6`, with the journalctl being run before resuming (if that is even possible)?)
Created attachment 123438 [details]
journalctl -x --since "2016-03-05 14:00:00" (4.6.0-rc6-ARCH-00006-g7d92f59)
> journalctl -x --since "2016-03-05 14:00:00" >journal-rc6.2
Failed to get journal fields: Ungültige Nachricht
(invalid message)
Pierre; now it should also have captured the resume. However I am still wondering about the error message that journalctl spits out. I hope that everything which is needed is in the log; otherwise I could retry it with "systemctl suspend; sleep 60; dmesg >rc6-suspend.dmesg; shutdown now;".
You were right; I had forgotten a sleep in between on my first approach.
Hmm; as I can see here there is again no resume activity; only a reboot log entry; strange!
Created attachment 123439 [details]
dmesg with 4.6.0-rc6-ARCH-00006-g7d92f59
Yep; the dmesg looks better.
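The blind capture discussed above (suspend, wait long enough for the resume, then dump the kernel log) can be sketched as a small shell function. This is only an illustration: the function name and the three override variables are hypothetical, and the defaults mirror the "systemctl suspend; sleep 60; dmesg" sequence quoted in this thread.

```shell
# Defaults follow the command sequence from this thread; all three may be
# overridden, e.g. for a dry run that does not actually suspend the machine.
SUSPEND_CMD=${SUSPEND_CMD:-"systemctl suspend"}
LOG_CMD=${LOG_CMD:-"dmesg"}
WAIT_SECS=${WAIT_SECS:-60}

# capture_after_resume LOGFILE
# Suspends, waits WAIT_SECS seconds so that the log is written only after the
# machine has resumed, then stores the output of LOG_CMD in LOGFILE.
capture_after_resume() {
    logfile=$1
    # Intentionally unquoted: the commands may carry arguments.
    $SUSPEND_CMD || return 1
    sleep "$WAIT_SECS"
    $LOG_CMD > "$logfile"
}
```

The wait is the important part: without it the log command may run before the machine has suspended at all, which would match the logs that ended at "entered the suspend sleep state".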
I was going to ask for a dmesg, but you were quicker. Apparently Nouveau gets unhappy even before being suspended. Any idea what you were doing when those errors occurred:
>
> [ 236.095542] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002
> [ 389.046834] nouveau 0000:01:00.0: Xorg[337]: fail ttm_validate
> [ 389.046841] nouveau 0000:01:00.0: Xorg[337]: validating bo list
> [ 389.046846] nouveau 0000:01:00.0: Xorg[337]: validate: -12
> [ 477.074185] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002
> [ 1084.518190] nouveau 0000:01:00.0: Direct firmware load for nouveau/nv84_xuc00f failed with error -2
> [ 1084.518200] nouveau 0000:01:00.0: vp: unable to load firmware nouveau/nv84_xuc00f
> [ 1084.518204] nouveau 0000:01:00.0: vp: init failed, -2
> [ 1084.518226] nouveau 0000:01:00.0: Direct firmware load for nouveau/nv84_xuc103 failed with error -2
> [ 1084.518230] nouveau 0000:01:00.0: bsp: unable to load firmware nouveau/nv84_xuc103
> [ 1084.518233] nouveau 0000:01:00.0: bsp: init failed, -2
> [ 1085.910774] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00001000 [RT_LINEAR_MISMATCH] - Address 0000000000
> [ 1085.910780] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00400000, e20: 00001800, e24: 00220000
> [ 1085.910798] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00001000 [RT_LINEAR_MISMATCH] - Address 0000000000
> [ 1085.910801] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 00400010, e20: 00001800, e24: 00220000
> [ 1085.910806] nouveau 0000:01:00.0: gr: 00200000 [] ch 7 [000f598000 xine[1360]] subc 3 class 8297 mthd 1b0c data 1000f010
> [ 1197.735066] perf: interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
> [ 1230.906924] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000040 [RT_FAULT] - Address 0048bd0000
> [ 1230.906932] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00800080, e20: 00001800, e24: 00030000
> [ 1230.906949] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00000040 [RT_FAULT] - Address 0048bd1000
> [ 1230.906953] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 00800090, e20: 00001800, e24: 00030000
> [ 1230.906969] nouveau 0000:01:00.0: gr: 00200000 [] ch 7 [000f598000 xine[1360]] subc 3 class 8297 mthd 1558 data 00000001
> [ 1230.906994] nouveau 0000:01:00.0: fb: trapped write at 0048bd0000 on channel 7 [0f598000 xine[1360]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]
> [ 1233.825803] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00001000 [RT_LINEAR_MISMATCH] - Address 0000000000
> [ 1233.825810] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00400000, e20: 00001800, e24: 00220000
> [ 1233.825820] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00001000 [RT_LINEAR_MISMATCH] - Address 0000000000
> [ 1233.825823] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 00400010, e20: 00001800, e24: 00220000
> [ 1233.825828] nouveau 0000:01:00.0: gr: 00200000 [] ch 7 [000f598000 xine[1360]] subc 3 class 8297 mthd 1b0c data 1000f010
> [ 1233.825840] nouveau 0000:01:00.0: fb: trapped write at 0020318000 on channel 7 [0f598000 xine[1360]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]
> [ 1465.186818] nouveau 0000:01:00.0: Xorg[337]: fail ttm_validate
> [ 1465.186826] nouveau 0000:01:00.0: Xorg[337]: validating bo list
> [ 1465.186836] nouveau 0000:01:00.0: Xorg[337]: validate: -12
> [ 1578.523088] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002
Strange; the firmware file nouveau/nv84_xuc103 is not on my computer though I have installed linux-firmware and the latest mesa (Arch); perhaps a packaging error since there is a nouveau/nv50/nv84_video.c in the original source .tar.xz from www.mesa3d.org. Will see how to compile from sources ...
What I did before suspend? - running KDE and a konsole but nothing more.
Created attachment 123440 [details]
dmesg with 4.6.0-rc6-ARCH-00006-g7d92f59 and nv84_xuc00f
nouveau/nv84_xuc00f is now installed but it does not make any real difference for suspending. The screen still falls black and nouveau still reports some MMIO read/write faults.
A few questions:
* What happens if you hot plug the external screen after you logged in your session? Do you get an image, does it receive a signal or does it stay black?
* Do you have a smaller screen to test with, just to rule out that it’s not a corner case due to using a 4k screen? (It most likely won’t change, but… just to be sure.)
I’ll have a try with my G96s, see if I have one with an HDMI output.
As it seems we now have a kernel issue with the H265: https://bugzilla.kernel.org/show_bug.cgi?id=117581 Unfortunately I can not remember whether I had initially done tests without the nouveau module also for the H265 or just for the Xi3650. Pierre; any further help is still highly appreciated!
Created attachment 123442 [details]
dmesg with 4.6.0-rc6-ARCH-00006-g7d92f59 and no_console_suspend
The key point about it is (at least to my mind) that it works with no_console_suspend without nouveau. Then it ought to work with nouveau and no_console_suspend as well. This time I had a FullHD external monitor and no hdmimhz, and suspend still did not work:
BOOT_IMAGE=/boot/vmlinuz-custom root=/dev/disk/by-label/arch ro resume=/dev/disk/by-label/swap no_console_suspend log_buf_len=1M debug ignore_loglevel
... well, hotplugging the external monitor after s2ram was something we had already tested before (though not in FullHD and without hdmimhz=225). It had not made any difference. Tell me if you should really need that.
(In reply to Elmar Stellnberger from comment #30)
> ... well, hotplugging the external monitor after s2ram was something we had
> already tested before (though not in FullHD and without hdmimhz=225). It had
> not made any difference. Tell me if you should really need that.
Right, I was thinking before any kind of suspend: power the computer without the external screen, log in, plug the external screen and see if you get an image. And with VGA, resuming is working, right? Only HDMI which doesn’t want to.
In the very beginning VGA monitors and the integrated display came up well, at least on the Xi3650. In the very beginning there was also another suspend bug with the H265 which is already resolved (the H265 formerly crashed in 95% of all cases; in the remaining 5% it behaved like the Xi3650, as far as I can tell; i.e. bringing the integrated display up but not the HDMI display). Things have changed now and the integrated display no longer comes up after suspend. However the crashes for the H265 I am currently testing on were resolved (i.e. there is only a way forward). As none of the displays currently comes up (neither the integrated LVDS nor the external HDMI) I would not expect a VGA display to come up. The BIOS enables only the integrated display on boot and no external monitor, so the integrated display no longer coming up after suspend may even be a 'good sign' that the kernel no longer relies on the BIOS/ACPI for this but takes things into its own hands, as the proprietary Nvidia driver does, which can re-enable HDMI monitors after suspend well.
.- testing is never way easy - and so is reading and entending bug reports :/
So, I am unable to plug in my discrete G96 to check, but testing on my laptop’s integrated G96, I do not get an image at all on the external screen, be it by booting with the screen plugged in or not. I’ll try to grab a MMIOtrace of the blob. If you could do the same on yours, and also get the VBIOS, it might help.
Created attachment 123448 [details]
suspend - mmiotrace for nouveau
Here comes the mmiotrace for nouveau.
Created attachment 123449 [details]
suspend - mmiotrace for the proprietary nvidia driver
Unfortunately it appears not to have captured the full trace for the proprietary driver (resource busy). Moreover the graphics mode could not be restored correctly though both displays came up well on resume from suspend. That may be due to the G96GLM, Quadro FX 770M which is different from the Xi3650 G96 or because it is a newer driver version.
How to extract the VBIOS, Pierre?
To extract the VBIOS, look at https://nouveau.freedesktop.org/wiki/DumpingVideoBios/, trying either of the last two options (nvagetbios or cat vbios.rom).
Regarding the MMIOtrace, I should have been more specific, sorry. I am looking for a MMIOtrace of the blob being loaded with no external screen:
1. start the trace
2. modprobe nvidia
3. start X
4. `echo 'Plugging screen' > /sys/kernel/debug/tracing/trace_marker`
5. plug the screen and set it up to get an image
6. `echo 'Screen is ready' > /sys/kernel/debug/tracing/trace_marker`
7. stop the trace
and a similar one where the screen is already plugged in during the boot:
1. start the trace
2. modprobe nvidia
3. start X
#if is_needed
4. `echo 'Setting up screen' > /sys/kernel/debug/tracing/trace_marker`
5. set it up to get an image
6. `echo 'Screen is ready' > /sys/kernel/debug/tracing/trace_marker`
#endif
7. stop the trace
Created attachment 123458 [details]
vbios.rom as gathered from /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/rom
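The "start the trace", marker and "stop the trace" steps of the procedure requested above can be wrapped in three small helpers. This is a sketch only: the function names are hypothetical, it assumes root privileges and a debugfs mounted at /sys/kernel/debug, and it follows the current_tracer / trace_pipe / trace_marker files of the kernel's mmiotrace tracer rather than anything specific to this report.

```shell
# Location of the tracing files; overridable for a dry run against a scratch dir.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# start_trace OUTFILE: enable mmiotrace and stream the trace into OUTFILE.
# Must run before the driver under test is loaded (before "modprobe nvidia").
start_trace() {
    echo mmiotrace > "$TRACING/current_tracer" || return 1
    cat "$TRACING/trace_pipe" > "$1" &
    TRACE_PID=$!
}

# mark TEXT: drop a marker line into the trace, e.g. mark 'Plugging screen'.
mark() {
    echo "$1" > "$TRACING/trace_marker"
}

# stop_trace: switch back to the nop tracer; the background reader started by
# start_trace normally exits on its own once tracing is disabled.
stop_trace() {
    echo nop > "$TRACING/current_tracer"
}
```

A run for the unplugged case would then look like: `start_trace unplugged.mmiotrace`, load the driver and start X, `mark 'Plugging screen'`, plug and configure the monitor, `mark 'Screen is ready'`, `stop_trace`.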
Created attachment 123460 [details]
booting unplugged (part1) - without xorg.conf
Created attachment 123461 [details]
booting unplugged (part2) - without xorg.conf
interestingly there were no mmio signals while enabling the external monitor via xrandr.
Created attachment 123462 [details]
xorg.conf - for booting plugged experiments
Interestingly it was not possible to enable my external monitor at all without the right xorg.conf when booting plugged. The xorg.conf disables the wrong output and thus can make the real output work.
Created attachment 123463 [details]
booting plugged - with minimal xorg.conf
all of the tests were made without the nouveau.modeset parameter (i.e. in FullHD).
Thanks! I’ll try to have a look at it today.
No apparent changes with kernel 4.6.0-ARCH-00466-ge80ac9b. When may I expect your fixes to get incorporated into mainline or shall I wait for the post of an individual patch which I can try out here, Pierre?
Well, before the fixes get incorporated to mainline, I first have to find out what the fixes are, and that could take really any amount of time. In any case, I’ll post patches here for you to test if I find anything. I had a quick look last time, and got some MMIOtraces from my laptop’s G96 to compare, but that’s about it. I won’t have much free time until July, but I’ll give it a try from time to time.
Created attachment 125445 [details]
nvapeeks for nouveau and nvidia340
nvapeek 0x610000 0x20000
* after bootup but before plugging the monitor
* after plugging and configuring the HDMI-monitor
* after s2ram (nouveau: external HDMI monitor stays black without a signal but LVDS works; proprietary nvidia340: HDMI gets restored but not the xinerama configuration; before and after s2ram: the 3840x2160 is displayed as 1920x2160 resulting in horizontal distortions)
Created attachment 125809 [details]
vbios+nvapeek-0x101000-Celsius-H270.tar.bz2
Hmm, while the pramin-method produces a smaller output the prom-method reports an invalid signature (see for the *.msg in the .tar.bz2). Anyway both VROMs are different from the corrupted image that should have been fetched on my H265 (both having used the latest envytools-git 1:0.r4326.6e27990-1 from Arch-AUR). Having applied nvapeek 0x101000 0x20000 in UHD-Xinerama mode with 2 screens (as usual). Just let me know if you should need something else; I will provide it as soon as possible.
That looks way better, thanks! I’ll have a look at it this evening.
The VBIOS didn’t seem to help, sadly. Could you please dump the content of PDISPLAY (using `nvapeek 0x610000 0x20000`), before NVIDIA/Nouveau gets loaded? And after a resume as well (still without letting drivers interact with the GPU)?
Created attachment 125885 [details]
nvapeek 0x610000 0x20000 - without nouveau module loaded
Created attachment 125886 [details]
nvapeek 0x610000 0x20000 - after plain blackscreening s2ram - without nouveau module loaded
I just forgot about https://bugzilla.kernel.org/show_bug.cgi?id=117581. It appears to have stayed the same (blackscreening unless no_console_suspend is given). Perhaps that parameter would also have saved me from changing my H265/H270 with nouveau; gonna re-test with no_console_suspend now ...
Created attachment 125890 [details]
nvapeek 0x610000 0x20000 - with nouveau module loaded, LVDS came up well (kernel 4.7.0)
Thank you for the data. When was the last one taken, after the resume? I thought I had found something: 0x61caa4 and 0x61d2a4 are set when the screen comes up, and remain at 0 after the resume for you. Unfortunately, those are RO regs, so trying to modify them won’t really help… :-/ However, my G96 can now display an image on the external screen (miniDP -> HDMI, through an adapter; but with another adapter it fails), and it does survive suspending. So I’ll compare your data with mine.
nvapeek 0x610000 0x20000 - without nouveau module loaded (63.03 KB, text/plain)
2016-08-18 19:01 UTC,: taken directly before s2ram
nvapeek 0x610000 0x20000 - after plain blackscreening s2ram - without nouveau module loaded (67.57 KB, text/plain)
2016-08-18 19:05 UTC,: the same bootup, directly after s2ram; taken blindly with systemctl suspend; sleep 1; nvapeek ...
nvapeek 0x610000 0x20000 - with nouveau module loaded, LVDS came up well (kernel 4.7.0) (66.11 KB, text/plain)
2016-08-18 20:04 UTC,: another bootup (all three were tested with the same kernel) into Xorg + graphics mode with nouveau; snapshot taken after s2ram, integrated LVDS was o.k.; external HDMI stayed black as always
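Comparing two such PDISPLAY dumps largely comes down to listing the lines that differ between them. The sketch below does that with plain diff; the function name `dump_changes` is hypothetical, and since the exact nvapeek output format is not shown in this report it only assumes that both dumps are line-oriented text covering the same address range.

```shell
# dump_changes REFERENCE OTHER
# Prints every line of OTHER that has no identical counterpart in REFERENCE,
# i.e. the dump lines whose contents changed between the two snapshots.
dump_changes() {
    diff "$1" "$2" | sed -n 's/^> //p'
}
```

Called with the dump taken before s2ram as REFERENCE and the one taken after s2ram as OTHER, it lists the regions that changed across the suspend cycle.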
> However, my G96 can now display an image on the external screen (miniDP -> HDMI through an adapter)
Nice to hear; I hope that will help you forth!; - concerning adapters I only have one for the other way round: UHD HDMI output -> DP input of one of my monitors. The first adapter I ordered had blackscreened but after returning that item I got a working one.
I have just seen the following in my dmesg:
> dmesg | grep -i -e warn -e error
[ 8.350192] ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000400-0x000000000000047F (\PMIO) (20160422/utad
[ 8.356598] ACPI Warning: SystemIO range 0x00000000000011B0-0x00000000000011BF conflicts with OpRegion 0x0000000000001180-0x00000000000011BB (\GPIO) (20160422/utad
[ 8.363088] ACPI Warning: SystemIO range 0x0000000000001180-0x00000000000011AF conflicts with OpRegion 0x0000000000001180-0x00000000000011BB (\GPIO) (20160422/utad
Could it be that some of the s2ram problems are related to these ACPI-GPIO conflicts?
Pierre and everyone else who has contributed: today we have reason to celebrate! This HDMI s2ram issue, as well as kernel bugs 153371, 153361, 117581 and numerous other issues (as Mark Asselstine has told me), has been resolved with commit 65ea11ec6a82b1d44aba62b59e9eb20247e57c6e (Ville Syrjälä - x86/hweight: Don't clobber %rdi).