Abstract
The accuracy and usefulness of data-driven prognostics and health management (PHM) models are closely tied to the availability and representativeness of the training data. Two problems in particular stand out. First, how can the performance of learning algorithms be improved when data are underrepresented and class distributions are severely skewed? This is often the case in PHM applications, where faulty data can be hard (even dangerous) to gather and may be sparsely distributed depending on the operating loads and failure modes. Second, how can unlabelled data be handled? In many PHM problems, health states and the transitions between them are not well defined, which raises issues of imprecision and uncertainty. Accordingly, this paper addresses the problem of "learning PHM models when data are imbalanced and/or unlabelled" by proposing two types of learning schemes. Imbalanced and unlabelled data are first defined and illustrated, and a taxonomy of PHM problems is proposed; its aim is to rank the difficulty of developing PHM models with respect to the representativeness of the data. Two strategies are then proposed as partial solutions for coping with imbalanced and unlabelled data. The first relies on very fast and/or evolving algorithms. Such a training scheme allows the learning phase to be repeated in order to manage state discovery as new data become available, notably when data are imbalanced. The second strategy deals with incomplete and uncertain labels by taking advantage of partially supervised training approaches, which make it possible to incorporate a priori knowledge and to manage label noise. Both strategies are intended to improve the robustness and reliability of the estimates.