Three changes (one in _glm.py and two in _glm_cv.py) to enable more detailed analysis of CV performance #935
Open
alanchalk wants to merge 2 commits into Quantco:main
…indices in CV

- Make convert_from_pandas public in _glm.py to allow external predictions
- Scale test weights to 1 in _glm_cv.py to ensure correct test deviance
- Store train indices from each fold as self.train_indices_ in _glm_cv.py
Author
Hi, I notice you have been doing some testing. The tests seem to have been written with knowledge of the original bug. For example, one of the tests is:
At the moment:
- The deviances in deviance_path_ are too low. This is because w_test (the test weights) is not rescaled to sum to 1.
- The predict method of GeneralizedLinearRegressorCV does not work. This seems to be because it inherits from the GLM class, whose linear_predictor uses X @ self.coef_path_[alpha_index]. That indexing is not correct when coef_path_ comes from CV, since the first dimension is then the number of folds.
- The CV method provides the deviance_path_ for (average) validation performance but not for train performance. Knowing how train and validation performance compare as penalization is reduced is useful in practice.
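To make the first point concrete, here is a minimal sketch in plain Python. It assumes the deviance is a plain weighted sum of unit deviances and that the weights were normalized to sum to 1 over the full data set before the fold split. The function names and toy data are hypothetical and are not taken from glum.

```python
def weighted_deviance(y, mu, w):
    """Weighted sum of Gaussian unit deviances (y - mu)^2."""
    return sum(wi * (yi - mi) ** 2 for yi, mi, wi in zip(y, mu, w))


def rescale(w):
    """Rescale weights so that they sum to 1."""
    total = sum(w)
    return [wi / total for wi in w]


# Ten rows with equal weights that sum to 1 over the full data set.
n = 10
w_full = [1.0 / n] * n
y = [float(i) for i in range(n)]
mu = [yi + 1.0 for yi in y]  # every prediction is off by exactly 1

# Hypothetical test fold: the last two rows.
test_idx = [8, 9]
y_test = [y[i] for i in test_idx]
mu_test = [mu[i] for i in test_idx]
w_test = [w_full[i] for i in test_idx]  # sums to 0.2, not 1

# Without rescaling, the fold deviance shrinks with the fold's weight share.
raw = weighted_deviance(y_test, mu_test, w_test)

# With rescaling, it is the weighted mean unit deviance on the fold.
fixed = weighted_deviance(y_test, mu_test, rescale(w_test))

print(raw, fixed)
```

Every unit deviance in this toy example equals 1, so the rescaled value is the one that matches the per-observation error, while the unscaled value is smaller by the fold's share of the total weight.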
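To address some of the above, I have made, compiled and tested three changes: making convert_from_pandas public in _glm.py, scaling the test weights to 1 in _glm_cv.py, and storing the train indices from each fold as self.train_indices_ in _glm_cv.py.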
These 3 changes allow the user to create any predictions needed and to create both train and validation curves on the data used for CV. I have a notebook which does this using a version of glum which I have built.
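As a rough illustration of how stored per-fold train indices could be used to build both curves, here is a sketch in plain Python. It is not the code from the notebook or the gist. The helper names, the toy data and the layout of `preds` (one list of predictions per fold and per alpha) are assumptions made for the example, and Gaussian deviance (squared error) stands in for whatever family is actually fitted.

```python
def mean_deviance(y, mu, idx):
    """Mean Gaussian unit deviance over the rows listed in idx."""
    return sum((y[i] - mu[i]) ** 2 for i in idx) / len(idx)


def train_validation_curves(y, preds, train_indices):
    """Average train and validation deviance over folds, for each alpha.

    preds[fold][alpha] holds predictions for ALL rows from the model
    fitted on that fold's training rows at that alpha (hypothetical layout).
    """
    n_folds = len(train_indices)
    n_alphas = len(preds[0])
    all_rows = set(range(len(y)))
    train_curve = [0.0] * n_alphas
    valid_curve = [0.0] * n_alphas
    for fold, train_idx in enumerate(train_indices):
        # Validation rows are the rows not used for training in this fold.
        valid_idx = sorted(all_rows - set(train_idx))
        for a in range(n_alphas):
            mu = preds[fold][a]
            train_curve[a] += mean_deviance(y, mu, train_idx) / n_folds
            valid_curve[a] += mean_deviance(y, mu, valid_idx) / n_folds
    return train_curve, valid_curve


# Toy data: 4 rows, 2 folds, 2 alphas (strong penalty first, then weak).
y = [0.0, 1.0, 2.0, 3.0]
train_indices = [[0, 1], [2, 3]]  # stands in for the stored train indices
preds = [
    [[0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 1.0, 1.0]],  # models from fold 0
    [[2.5, 2.5, 2.5, 2.5], [2.0, 2.0, 2.0, 3.0]],  # models from fold 1
]

train_curve, valid_curve = train_validation_curves(y, preds, train_indices)
print(train_curve)  # [0.25, 0.0]
print(valid_curve)  # [4.25, 2.5]
```

In this toy case the train deviance falls to zero as the penalty is reduced while the validation deviance stays well above it, which is the kind of gap the two curves are meant to expose.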
I have not tried to fix the predict method for CV.
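The indexing problem can be shown with a small stand-in for coef_path_. This sketch uses nested Python lists and assumes a layout of folds, then alphas, then features. The sizes and values are made up, and it only illustrates the symptom rather than proposing a fix.

```python
n_folds, n_alphas, n_features = 3, 4, 2

# Stand-in for a CV coefficient path: coef_path_cv[fold][alpha] is a
# coefficient vector. Each vector is [fold, alpha] so it is easy to see
# which entry an index actually selects.
coef_path_cv = [
    [[float(fold), float(alpha)] for alpha in range(n_alphas)]
    for fold in range(n_folds)
]

alpha_index = 1

# Indexing the first dimension with alpha_index selects FOLD 1, not alpha 1:
# the result is a block of n_alphas vectors, not a single coefficient vector.
picked_wrong = coef_path_cv[alpha_index]

# Selecting alpha 1 means indexing the second dimension, once per fold.
per_fold = [coef_path_cv[fold][alpha_index] for fold in range(n_folds)]

print(picked_wrong)  # [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(per_fold)      # [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
```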
A gist that runs on the new build and demonstrates the use of the changes is at:
https://gist.github.com/alanchalk/cbb68ff9741ec89504d6f21b4b1ff344
(The gist mentions that if you are on a Mac you can pip install the revised build from TestPyPI. If you need Windows or Linux builds, I can try to add them.)