Sarthak Garkhel
Vol. 7, Jan-Jun 2019
Abstract:
A web application was developed for classifying bioinformatics datasets. The application provides a simple and intuitive visual interface that is useful to both specialist and non-specialist users. It is mainly intended for classifying bioinformatics datasets, especially large multi-class datasets, using sequential and parallel classification algorithms, and we hope it will gain widespread acceptance and adoption in both academia and industry. Biological datasets are classified using both serial and parallel support vector machines (SVMs). The proposed application was developed entirely from scratch and introduces a general framework for data pre-processing, classification, and prediction. These three main tasks are applied to datasets of different sizes: Leukemia, Colon-cancer, Breast-cancer, DNA, and Protein. In the pre-processing phase, data pre-processing techniques such as data cleaning, data transformation, data reduction, and data discretization are used to resolve incompleteness and inconsistency in the raw data. Then, in the classification phase, a classifier is trained on the pre-processed training data using one of several algorithms (serial SVM, parallel SVM, clustering, decision trees, genetic programming, or Bayesian networks) to produce a trained model. Finally, in the prediction phase, the trained model is used to predict the class value of a new instance in a given dataset. To obtain an efficient and effective prediction model, we required that the model satisfy the following criteria: accuracy, speed, robustness, and scalability. The proposed application shows considerable promise owing to its robust classification capabilities, producing prediction models with accuracy ranging from 70.32% to 97.33%.
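The three phases described above (pre-processing, classification, prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-class problem with labels -1 and +1, uses only mean imputation and min-max scaling for pre-processing, and trains a simple serial linear SVM by sub-gradient descent on the hinge loss. All function names, hyperparameters, and the toy data are hypothetical and chosen only for illustration.

```python
def fit_preprocessor(rows):
    """Learn per-column mean, minimum, and range from the training rows."""
    params = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        mean = sum(present) / len(present)
        filled = [mean if v is None else v for v in col]
        lo = min(filled)
        span = (max(filled) - lo) or 1.0  # avoid division by zero
        params.append((mean, lo, span))
    return params


def transform(rows, params):
    """Data cleaning (mean imputation) and transformation (min-max scaling)."""
    return [
        [((mean if v is None else v) - lo) / span
         for v, (mean, lo, span) in zip(row, params)]
        for row in rows
    ]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Serial linear SVM: sub-gradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample violates the margin: regularisation step plus hinge step.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularisation term contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(model, params, rows):
    """Prediction phase: assign class -1 or +1 to each new instance."""
    w, b = model
    return [
        1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
        for x in transform(rows, params)
    ]


# Toy data (hypothetical, not from any of the datasets named above).
# None marks a missing value to be filled in during pre-processing.
train_rows = [[1.0, 2.0], [2.0, 1.0], [1.5, None],
              [8.0, 9.0], [9.0, 8.0], [None, 8.5]]
train_labels = [-1, -1, -1, 1, 1, 1]

params = fit_preprocessor(train_rows)
model = train_linear_svm(transform(train_rows, params), train_labels)
print(predict(model, params, [[1.0, 1.5], [9.0, 8.5]]))
```

Note that the scaling parameters are learned from the training rows only and then reused for new instances, so the prediction phase sees data in the same range as the trained model. A multi-class or parallel variant, as discussed in the abstract, would need additional machinery that this sketch leaves out.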