How to perform feature selection for text classification

1. Transform the text into a bag of words, including bigrams and trigrams.
2. Stem the words and remove punctuation.
3. Determine how informative each feature is by measuring how it is distributed across the classes.
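
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not a definitive implementation: the post does not name a scoring measure, so the chi-squared statistic is used here as one common way to quantify how unevenly a feature is spread across classes. The `crude_stem` helper is a hypothetical stand-in for a real stemmer such as Porter's, and the sketch stems tokens before building n-grams so that bigrams and trigrams are made of stems. The function names and the toy documents are invented for the example.

```python
import re
from collections import Counter, defaultdict


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer: strips a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(text, max_n=3):
    # Lowercase, drop punctuation, stem, then collect unigrams up to max_n-grams.
    tokens = [crude_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]
    features = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.add(" ".join(tokens[i:i + n]))
    return features


def chi2_scores(docs, labels, max_n=3):
    # Score each feature by the chi-squared statistic of its presence/absence
    # against the class label (assumed measure; higher means more class-specific).
    n_docs = len(docs)
    class_totals = Counter(labels)
    present = defaultdict(Counter)  # feature -> class -> docs containing it
    for text, label in zip(docs, labels):
        for feature in extract_features(text, max_n):
            present[feature][label] += 1

    scores = {}
    for feature, by_class in present.items():
        n_with = sum(by_class.values())
        score = 0.0
        for cls, n_cls in class_totals.items():
            cells = (
                (by_class[cls], n_with),                    # docs with the feature
                (n_cls - by_class[cls], n_docs - n_with),   # docs without it
            )
            for observed, column_total in cells:
                expected = n_cls * column_total / n_docs
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[feature] = score
    return scores


def top_features(scores, k):
    # Keep the k highest-scoring features.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, invented for illustration.
docs = [
    "Great movie, loved the acting!",
    "Loved it, great fun.",
    "Terrible movie, hated the acting.",
    "Hated it, terrible plot.",
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi2_scores(docs, labels)
selected = top_features(scores, k=5)
```

On this toy data, a feature such as "great" appears only in one class and receives a high score, while "movie" appears equally often in both classes and scores zero, so it would be dropped. With so few documents the scores are not statistically meaningful; in practice a library implementation and a real stemmer would be the more usual choice.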
