Bug 56202

Summary: [regression][kms] Clock rework causes kms boot regression for NV4E (v3.7-rc1, 3.6+nouveau)
Product: xorg
Reporter: Ronald <ronald645>
Component: Driver/nouveau
Assignee: Ben Skeggs <skeggsb>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments:
Description | Flags
v3.6.0+ @ the commit that caused the regression | none
@ the commit before the one that caused the regression | none
Difference between 3.6.0+ good and bad kernel dmesg output | none
NV4E BIOS ROM, retrieved thru debugfs kernel v3.6.2 | none
NV4E BIOS ROM, retrieved thru debugfs kernel v3.6.2 | none
possible fix for the issue | none
video ROM BIOS | none

Description Ronald 2012-10-19 20:21:18 UTC
Created attachment 68815 [details]
v3.6.0+ @ the commit that caused the regression

The following commit causes KMS initialisation to result in a scrambled screen with mostly blue elements. The screen responds to changes based on actual content (e.g. system booting, lightdm starting).

drm/nouveau/clock: pull in the implementation from all over the place

I have tested and verified this regression on a 3.6 tree and the recent v3.7 tree pulled from git.kernel.org.

Below you will find some more data about my VGA card.

Attached you will find two dmesg output logs (I don't know if they differ).

Both are from the v3.7-rc1 tree (the logs report it as 3.6.0+):
dmesg-good.txt @ the commit before the one that caused the regression
dmesg-bad.txt @ the commit that caused the regression

00:05.0 0300: 10de:0244 (rev a2) (prog-if 00 [VGA controller])
	Subsystem: 103c:30b7
00:05.0 VGA compatible controller: NVIDIA Corporation C51 [GeForce Go 6150] (rev a2) (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company Presario V6133CL
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at b2000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at b1000000 (64-bit, non-prefetchable) [size=16M]
	[virtual] Expansion ROM at 40000000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Kernel driver in use: nouveau
Comment 1 Ronald 2012-10-19 20:22:36 UTC
Created attachment 68816 [details]
@ the commit before the one that caused the regression
Comment 2 Ronald 2012-10-20 07:51:23 UTC
I guess this just became pretty obvious: I took another look at the dmesg output of both kernels.

Good kernel says:
+[drm] nouveau 0000:00:05.0: c:

Bad kernel says:
-[drm] nouveau 0000:00:05.0: c: core 24MHz

That is not a typo!
Comment 3 Ronald 2012-10-20 07:52:04 UTC
Created attachment 68829 [details]
Difference between 3.6.0+ good and bad kernel dmesg output
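
For anyone wanting to reproduce a comparison like the attached one, here is a minimal sketch in Python. It assumes each dmesg line starts with the usual `[   12.345678]` timestamp, which is stripped first so that otherwise identical messages do not show up as differences. The function names are made up for this example, and the sample lines are illustrative (modelled on the messages quoted in comment 2, with invented timestamps), not copied from the attached logs.

```python
import difflib
import re

# Matches the "[   12.345678] " timestamp prefix that dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(lines):
    """Drop leading dmesg timestamps so identical messages compare equal."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def dmesg_diff(good_lines, bad_lines):
    """Return unified-diff lines between two timestamp-stripped dmesg logs."""
    return list(
        difflib.unified_diff(
            strip_timestamps(good_lines),
            strip_timestamps(bad_lines),
            fromfile="dmesg-good.txt",
            tofile="dmesg-bad.txt",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Illustrative input only; real use would read the two log files instead.
    good = ["[   10.000001] [drm] nouveau 0000:00:05.0: c:\n"]
    bad = ["[   10.200002] [drm] nouveau 0000:00:05.0: c: core 24MHz\n"]
    print("\n".join(dmesg_diff(good, bad)))
```

The direction of the `+`/`-` markers depends only on the argument order: lines unique to the first log are prefixed with `-` and lines unique to the second with `+`.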
Comment 4 Ronald 2012-10-20 08:32:08 UTC
Created attachment 68832 [details]
NV4E BIOS ROM, retrieved thru debugfs kernel v3.6.2

As recommended on the homepage, here is the VBIOS. This is all I can do for now. Hope it helps...
Comment 5 Ronald 2012-10-20 09:29:06 UTC
Created attachment 68835 [details]
NV4E BIOS ROM, retrieved thru debugfs kernel v3.6.2

This time not as text/plain...
Comment 6 Ben Skeggs 2012-10-22 04:16:03 UTC
Created attachment 68897 [details] [review]
possible fix for the issue

Are you able to try the attached patch please?
Comment 7 Ben Skeggs 2012-10-22 04:19:27 UTC
(In reply to comment #2)
> I guess this just became pretty obvious: I took another look at the dmesg
> output of both kernels.
> 
> Good kernel says:
> +[drm] nouveau 0000:00:05.0: c:
> 
> Bad kernel says:
> -[drm] nouveau 0000:00:05.0: c: core 24MHz
> 
> That is not a typo!

I haven't yet tracked down why the "good" kernel doesn't report any value for the core clock. However, I do know why 24MHz is being reported instead of 350MHz.

Both the good and bad kernels suffer from that issue, so you can safely ignore this part for the moment.
Comment 8 Salah Coronya 2012-10-22 05:41:50 UTC
Created attachment 68901 [details]
video ROM BIOS

For comparison and contrast - I have the same chipset, but I don't have this problem, even with the latest nouveau git.

00:05.0 VGA compatible controller: NVIDIA Corporation C51 [GeForce 6150 LE] (rev a2) (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company Device 2a34
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
	[virtual] Expansion ROM at c0000000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Kernel driver in use: nouveau
	Kernel modules: nouveau

Gentoo, Mesa 8.04, xf86-video-nouveau-0.0.16_pre20120322. The kernel command line includes "video=VGA-1:1280x1024 video=TV-1:d video=DVI-D-1:d" to work around bug 31961, but even without it I do not have the symptoms the reporter has.
Comment 9 Ronald 2012-10-22 06:29:13 UTC
I can verify that your provided patch fixes the issue (I pulled it from git, since you had already committed it). Thanks!

@Salah Coronya: I can see that I'm using the 'GeForce Go 6150' and you are using the 'GeForce 6150 LE'. I guess that even the degree of the device being crippled varies across marketing names? =)
Comment 10 Ilia Mirkin 2013-08-21 00:06:07 UTC
The patch that fixed it is upstream:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5e5a195ecc8cc0280d169d6da33c959df6336e9f

Marking as fixed.
