Learning to Predict Drug Target Interaction From Missing Not at Random Labels

The prediction of Drug-Target Interactions (DTIs) is an important research direction in bioinformatics because it greatly shortens the development cycle of new drugs. State-of-the-art computational methods for DTI prediction adopt a binary classification framework. The supervision is incomplete, i.e. only a small number of DTIs are known and treated as positive instances, while the rest are unknown and treated as negative. Two severe problems occur in such a framework: (1) the number of negative samples is overwhelming and (2) a negative label cannot rule out the possibility of a positive drug-target interaction. In this paper, we address the problem of learning from incomplete labels in DTI prediction. The key assumption here is that labels are missing not at random. For example, negative DTI labels are more likely to be missing because biomedical researchers prioritize studying DTIs that are more likely to be positive. We introduce a novel probabilistic model, factorization with non-random missing labels (FNML). It models the generative process for the DTI labels (i.e. whether a label is positive or negative) and the responses (i.e. whether a label is observed or missing). In particular, the probability that a label is observed or missing depends on the sign of the label. To further reduce prediction variance and improve prediction accuracy on highly imbalanced DTI datasets, we present FNML-EN, an ensemble scheme designed specifically for the FNML model. We conduct comprehen...
Source: IEEE Transactions on NanoBioscience - Category: Nanotechnology Source Type: research
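
The missing-not-at-random assumption described in the abstract can be illustrated with a toy simulation. The sketch below is not the FNML model itself (the abstract does not give its factorization or parameters); it only shows the two-stage process the abstract describes, in which a latent label is drawn first and the chance of observing it then depends on its sign. All names and numeric rates here (`simulate_mnar`, `p_positive`, `p_obs_given_pos`, `p_obs_given_neg`) are hypothetical choices made for illustration.

```python
import random


def simulate_mnar(n_pairs, p_positive, p_obs_given_pos, p_obs_given_neg, seed=0):
    """Draw (label, observed) pairs under a toy missing-not-at-random process.

    Hypothetical illustration only: each drug-target pair gets a latent label,
    and the probability that the label is observed depends on its sign.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_pairs):
        label = 1 if rng.random() < p_positive else 0
        p_obs = p_obs_given_pos if label == 1 else p_obs_given_neg
        observed = rng.random() < p_obs
        records.append((label, observed))
    return records


def posterior_positive_given_missing(p_positive, p_obs_given_pos, p_obs_given_neg):
    """Bayes' rule: P(label = 1 | label is missing)."""
    miss_pos = p_positive * (1.0 - p_obs_given_pos)
    miss_neg = (1.0 - p_positive) * (1.0 - p_obs_given_neg)
    return miss_pos / (miss_pos + miss_neg)


# Assumed rates: positives are rare, and negatives are far less likely
# to be observed than positives (the situation the abstract describes).
P_POSITIVE, P_OBS_POS, P_OBS_NEG = 0.05, 0.6, 0.02

records = simulate_mnar(100_000, P_POSITIVE, P_OBS_POS, P_OBS_NEG)
missing_labels = [label for label, observed in records if not observed]

# Fraction of missing entries that are in fact positive interactions.
empirical = sum(missing_labels) / len(missing_labels)
analytic = posterior_positive_given_missing(P_POSITIVE, P_OBS_POS, P_OBS_NEG)

print(f"empirical P(positive | missing) = {empirical:.4f}")
print(f"analytic  P(positive | missing) = {analytic:.4f}")
```

Under these assumed rates, a nonzero share of the missing entries are true positives, which is why treating every unobserved pair as a negative instance mislabels some interactions.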