Created attachment 131838 [details]
dmesg output from the earlier 4.10 kernel showing correct functioning
Kernel 4.11.3-1 on a Lenovo Thinkpad T510 requires nomodeset to boot. Kernel 4.10.13-1 was fine. The T510 uses an NVIDIA Quadro NVS 3100M with 512 MB. The OS is openSUSE Tumbleweed (a semi-tested rolling release for the brave). A dmesg from the working 4.10 kernel is attached.
Downstream bug report is bugzilla.opensuse.org #1043280.
Any particular reason you're blaming nouveau?
From the 4.10 log, it appears that i915 is the primary drm driver.
What does "won't boot" mean?
What happens if you specify 'nouveau.modeset=0' (which has the effect of disabling nouveau entirely but leaving i915 as it was)?
Thanks for the prompt response.
nouveau.modeset=0 allows it to successfully boot all the way into the GUI.
In successful 4.10 kernel boots, nouveau console messages are present. In unsuccessful 4.11 boots with console logging enabled, the console messages stopped and the screen showed bad scrolling (redraw problems) at unpredictable points in the boot sequence, without nouveau ever being mentioned. I know the boot did not simply complete with a dead screen, because the ssh daemon never came up, dashing my hopes for better logging.
OK, so after booting with nouveau.modeset=0, rmmod nouveau, and reinsert it without that option. As your system should be fully up, you should get the relevant info of what fails.
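In case it helps, here is a minimal sketch of that sequence. It assumes a root shell on a text console or over ssh, and that the module's dependencies are still loaded from boot. The function name reload_nouveau and the default log path are made up for illustration. It uses insmod rather than modprobe because, as far as I know, modprobe may pick up nouveau.modeset=0 from the kernel command line again.

```shell
#!/bin/sh
# Sketch only: reload nouveau on a system booted with nouveau.modeset=0,
# saving kernel messages to a file. Run as root.
# reload_nouveau and the default log path are hypothetical names.
reload_nouveau() {
    log_file="${1:-/var/tmp/nouveau-reload.log}"

    # Resolve the module file before unloading anything.
    module_path="$(modinfo -n nouveau)" || return 1

    # Stream kernel messages into the log in the background.
    dmesg --follow > "$log_file" &
    follower_pid=$!

    rmmod nouveau
    insmod "$module_path"
    status=$?

    # Give the driver a moment to log, then flush the file to disk.
    sleep 5
    sync
    kill "$follower_pid" 2>/dev/null || true
    return "$status"
}
```

Call it as `reload_nouveau /var/tmp/nouveau-reload.log`. If the machine locks up during insmod the function may never return, so the log may be incomplete; reading dmesg from a second ssh session is a reasonable fallback.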
Created attachment 131881 [details]
dmesg after rmmod and insmod of nouveau
Tried the commands:
sudo rmmod -v nouveau
sudo insmod /lib/modules/4.11.3-1-default/kernel/drivers/gpu/drm/nouveau/nouveau.ko
X was not running and I was on console. In each case stdout and stderr were redirected, but results were zero-length, so omitted for clarity.
Results of insmod:
The four dmesg lines from 1763.415095 to 1763.417364 appeared on screen, then the display froze and the fan turned on, blowing very warm air. SSH was very slow, taking around a minute to get a prompt. Each command typed could take a similar amount of time. A shutdown command was never able to complete. Dmesg output during the ssh session after insmod is attached. Hope this helps.
Can you see if this patch helps any?
If not, please boot with nouveau.debug=trace as well.
Created attachment 131935 [details]
dmesg with debug=trace
Thanks for your efforts. Boot with patched driver unsuccessful. Dmesg from rmmod and insmod with debug trace attached. Thanks also to Takashi Iwai for the build of the patched kernel.
Interesting. Dies in PMU preinit somewhere, which in turn calls nvkm_pmu_reset. I don't see an obvious reason for that to die except ... if PTIMER is somehow off?
The only difference is from 1e2115d8c0c0da62405400316f5499d909e479bc which makes it so that nvkm_falcon_v1_new is now being called, although I can't imagine what would go wrong there.
This will require someone who knows what they're doing to figure out ... i.e. not me. Hopefully Ben can take a look.
I investigated an issue a while back that I believe is likely the same as what you're experiencing. I'm, unfortunately, not able to reproduce properly (long story) on any hardware I own.
I've identified a few issues that could result in what you're seeing, however, I have no idea why they've suddenly become a problem in 4.11. There doesn't seem to be an obvious commit that's the culprit here, so I can't be sure any of the issues I found will actually resolve the problem.
If you would be able to bisect between 4.10 and 4.11 and determine the exact commit where this starts happening, that'd be a big help. It's potentially not even a nouveau commit that's triggered this.
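For anyone who hasn't bisected a kernel before, a rough sketch of the procedure follows. It assumes a clone of the mainline tree that has the v4.10 and v4.11 tags; the function names and the default "linux" path are placeholders, not anything from this report.

```shell
#!/bin/sh
# Sketch only: bisect mainline between v4.10 (good) and v4.11 (bad).
# The function names and the default "linux" path are hypothetical.
start_kernel_bisect() {
    tree="${1:-linux}"
    # The first revision is the bad one, the second the known-good one.
    # No path filter is given, since the culprit may lie outside nouveau.
    git -C "$tree" bisect start v4.11 v4.10
}

# After building and booting the kernel that git checked out, record the result.
record_kernel_result() {
    tree="$1"
    verdict="$2"
    case "$verdict" in
        good|bad|skip) git -C "$tree" bisect "$verdict" ;;
        *) echo "verdict must be good, bad or skip" >&2; return 2 ;;
    esac
}
```

Each round is: build and install the checked-out kernel, boot it without nomodeset, then run `record_kernel_result linux good` or `record_kernel_result linux bad`. git prints an estimate of the remaining steps after each round, and `git bisect reset` returns the tree to where it started.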
I have the same issue on an Asus PL80Jt, which has the same NVIDIA chip built in. Using modprobe.blacklist=nouveau lets me boot just fine (except that PRIME won't be enabled and the card will not be disabled by vgaswitcheroo). Otherwise I experience what has been described here.
I compiled kernel 4.12-rc6 this morning and it boots fine again, so this should be fixed in 4.12. My machine is too slow to do a bisect (a kernel compile takes several hours), so I cannot tell you what broke it originally.
Thank you Mr. Coenen. I agree. Kernel 4.12.0-rc6 does not exhibit the problem on a Lenovo T510 either.
Between a fix already in the pipeline and a modeset workaround in the meantime, I'm happy. Does anyone still need the bisection completed?
(In reply to Ben Steel from comment #10)
> Thank you Mr. Coenen. I agree. Kernel 4.12.0-rc6 does not exhibit the
> problem on a Lenovo T510 either.
> Between a fix already in the pipeline and a modeset workaround in the
> meantime, I'm happy. Does anyone still need the bisection completed?
I, personally, wouldn't mind knowing what the cause was exactly, and what fixed it. It could be important down the line if it ever reappears.
Yikes. I only waited about 28 hours before completely restoring that machine from backup and putting it back in service. Food for thought: my greatest fear is that the problem may have been illuminated by the change to GCC 7, which doesn't like compiling 4.10 kernels.