ANNE: A weight engineering approach that extracts encoded information in an artificial neural network model via manipulating neuronal weights learned from data


Summary

Artificial Neural Network Encoder (ANNE) is a novel weight engineering method developed in Hu Li's lab. It harnesses the power of autoencoders and demonstrates that it is possible to decode meaningful information encoded in ANN models trained for specific tasks. We applied ANNE to breast cancer gene expression data with known clinical properties as case studies. Our work illustrates that trained autoencoder models are indeed information encoders from which meaningful gene-gene associations, supported by numerous lines of evidence, can be retrieved. This work therefore opens a new avenue in machine intelligence, in which ANN models are no longer perceived only as tools to perform recognition tasks but also as powerful tools to extract meaningful information embedded within the sea of high-dimensional data.

ANNE workflow

Overview of the Artificial Neural Network Encoder (ANNE) algorithm, using breast cancer prognosis as an illustrative example.

(a) Model building and training. Gene expression profiles of breast cancer patients with known prognostic outcomes were assigned to good and poor prognosis groups according to disease relapse-free survival (DRFS), which indicates how long a patient survives without cancer relapse. A separate model was built for each group. The models were trained with the autoencoder algorithm: the input vector represents all genes present in the transcriptomics data, and the expression value of each gene is fed to its own node in the input layer. The output layer has the same dimensionality (i.e. number of nodes) as the input layer. The aim of autoencoder training is to reconstruct the input-layer values at the output layer. The output vector is compared with the input vector to compute the reconstruction error. Training is repeated, with the backpropagation algorithm updating the weights that connect nodes (or neurons) from the input layer to the hidden layer and from the hidden layer to the output layer. Training halts once the reconstruction error shows no further improvement.

(b) Network decoding via the weight engineering approach. The weights connecting all nodes (neurons) from the input layer to the output layer of a trained autoencoder model were used, through an association scoring scheme, to decode the meaningful gene-gene associations encoded in the model. The association scores computed for all gene pairs form an association score matrix, from which the 200 gene pairs with the highest absolute scores were selected. Genes that occur multiple times among these top 200 gene pairs serve as "anchors" that agglomerate the gene pairs into a network.
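The two stages of the workflow can be illustrated with a small NumPy sketch. This is not the ANNE code itself, and several parts are assumptions made only for illustration: the single tanh hidden layer, the learning rate and patience values, the toy data, the placeholder gene names, and above all the scoring rule. This page does not specify the association scoring scheme, so the sketch substitutes a simple stand-in: the sum of weight products along every input-to-hidden-to-output path between two genes, symmetrised over gene order. The function names `train_autoencoder`, `association_scores`, and `top_pairs_and_anchors` are hypothetical.

```python
import numpy as np
from collections import Counter


def train_autoencoder(X, n_hidden=3, lr=0.2, max_epochs=2000, patience=20, seed=0):
    """Train a one-hidden-layer autoencoder on X (n_samples x n_genes).

    Training stops once the reconstruction error has not improved for
    `patience` consecutive epochs. Hyperparameters are illustrative only.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))  # input -> hidden
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_genes))  # hidden -> output
    best_err, best, stale = np.inf, (W_enc.copy(), W_dec.copy()), 0
    for _ in range(max_epochs):
        H = np.tanh(X @ W_enc)        # hidden-layer activations
        diff = H @ W_dec - X          # output vector minus input vector
        err = float(np.mean(diff ** 2))  # reconstruction error
        if err < best_err - 1e-8:
            best_err, best, stale = err, (W_enc.copy(), W_dec.copy()), 0
        else:
            stale += 1
            if stale >= patience:
                break
        # Backpropagation of the mean squared reconstruction error.
        grad_out = 2.0 * diff / diff.size
        grad_dec = H.T @ grad_out
        grad_enc = X.T @ ((grad_out @ W_dec.T) * (1.0 - H ** 2))
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return best[0], best[1], best_err


def association_scores(W_enc, W_dec):
    """ASSUMED scoring rule (not ANNE's): sum of weight products along all
    input -> hidden -> output paths, averaged over both gene orders."""
    S = W_enc @ W_dec
    S = 0.5 * (S + S.T)
    np.fill_diagonal(S, 0.0)          # ignore self-associations
    return S


def top_pairs_and_anchors(S, genes, top_k=200):
    """Select the top_k gene pairs by absolute score and report the genes
    that occur in more than one selected pair ("anchors")."""
    i, j = np.triu_indices_from(S, k=1)   # each unordered gene pair once
    order = np.argsort(-np.abs(S[i, j]))[:top_k]
    pairs = [(genes[i[t]], genes[j[t]], float(S[i[t], j[t]])) for t in order]
    counts = Counter(g for a, b, _ in pairs for g in (a, b))
    anchors = sorted(g for g, c in counts.items() if c > 1)
    return pairs, anchors


# Toy data standing in for one phenotype group: two blocks of co-varying genes.
rng = np.random.default_rng(1)
genes = [f"gene{n}" for n in range(6)]    # placeholder gene names
latent = rng.normal(size=(60, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
X = latent @ loadings + 0.1 * rng.normal(size=(60, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each gene

W_enc, W_dec, err = train_autoencoder(X)
S = association_scores(W_enc, W_dec)
pairs, anchors = top_pairs_and_anchors(S, genes, top_k=5)
print(f"reconstruction error: {err:.4f}")
print("top pairs:", pairs)
print("anchor genes:", anchors)
```

In the workflow described above, one such model would be trained per prognosis group and `top_k` would be 200; the toy run uses `top_k=5` only because six genes yield just fifteen pairs.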

Results

Explore the gene association networks derived from the autoencoder models for each phenotype group here.

Download

Download the scripts and sample dataset from GitHub to run ANNE on your local Linux system. Extract the scripts and sample dataset into the same folder and read the README file to get started.

Support

For support or questions about ANNE, please post to our Google group.

Citation

Manuscript in preparation.

© 2016 H Li • All Rights Reserved