We consider the supervised pattern classification in the high-dimensional setting, in which the number of features is much larger than the number of observations. We present a novel approach to the sparse linear discriminant analysis (LDA) using the zero-norm. The resulting optimization problem is non-convex, discontinuous and very hard to solve. We overcome the discontinuity by using an appropriate continuous approximation to zero-norm such that the resulting problem can be formulated as a DC (Difference of Convex functions) program to which DC programming and DC Algorithms (DCA) can be investigated. The computational results show the efficiency and the superiority of our approach versus the l 1 regularization model on both feature selection and classification.