Cross-validators with a high number of folds don't return predicted training data with all folds equally represented.
The issue is best explained with an example. Given the following cross-validator setup (Tr = training, Te = testing, Va = validation):
slice #:| 1. | 2. | 3. | 4. | 5. |
fold 0: | Tr | Tr | Tr | Te | Va |
fold 1: | Tr | Tr | Te | Va | Tr |
fold 2: | Tr | Te | Va | Tr | Tr |
fold 3: | Te | Va | Tr | Tr | Tr |
fold 4: | Va | Tr | Tr | Tr | Te |
HepNet.predict(cv='train') will return predicted train data in which slice 1 of the dataset was passed through the network trained in fold 2, slices 2-4 through the network from fold 4, and slice 5 through the network from fold 3, because the predictions are overwritten while looping over the folds. As a result, the networks trained in fold 0 and fold 1 are not represented at all. They are represented when predicting the validation or testing set, however, which prevents a proper comparison between the network outputs on the train/test/val sets over the full dataset.
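A minimal sketch of the overwrite behaviour (not HepNet's actual code; the slice/fold layout below just mirrors the table above, with 0-indexed slices):

```python
import numpy as np

# Hypothetical 5-slice dataset; train_slices[fold] lists which slices
# belong to the training set in that fold (from the table above).
n_slices = 5
train_slices = {
    0: [0, 1, 2],
    1: [0, 1, 4],
    2: [0, 3, 4],
    3: [2, 3, 4],
    4: [1, 2, 3],
}

# One prediction per slice, shared across all folds.
predictions = np.full(n_slices, np.nan)
source_fold = np.full(n_slices, -1)

for fold, slices in train_slices.items():
    for s in slices:
        # Every fold writes into the same array, so later folds
        # overwrite whatever earlier folds predicted for that slice.
        predictions[s] = fold  # stand-in for the fold's network output
        source_fold[s] = fold

# source_fold ends up as [2, 4, 4, 4, 3]:
# the networks from folds 0 and 1 never survive the loop.
print(source_fold)
```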