Census Information Exploration

Sidra Anam, Saurabh Gupta


Data mining extends traditional data analysis and statistical approaches by incorporating analytical techniques drawn from a range of disciplines, including numerical analysis, pattern matching, and areas of artificial intelligence such as machine learning, neural networks, and genetic algorithms [1]. In this paper we perform a numerical analysis of the “Census Income” dataset [2]. It is real data from a particular area and records, for each person, the age, workclass, education, education number, marital status, occupation, relation, sex, capital gain, capital loss, hours per week, and salary. For this study we gathered different samples from a particular area of the United States and also inserted some records to make the data useful. The data was so dirty that cleaning tools such as ETL tools could not be applied to it directly. To overcome this, we cleaned it manually and brought it into a form to which tools can be applied. The paper concentrates on the analysis of this census data and on predicting whether a person's income exceeds $50K/yr.
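
As an illustration of the kind of cleaning and target definition described above, the following is a minimal Python sketch and not the procedure used in this paper. It assumes a comma-separated layout with the attributes listed above, treats "?" as the marker for a missing value, and accepts a trailing period on the salary label; the sample rows, the column names, and the helper `clean_rows` are hypothetical.

```python
import csv
import io

# Hypothetical sample rows; the column names follow the attributes listed in the abstract.
RAW = """age,workclass,education,education_num,marital_status,occupation,relation,sex,capital_gain,capital_loss,hours_per_week,salary
39, State-gov, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, Male, 2174, 0, 40, <=50K
52, ?, HS-grad, 9, Married-civ-spouse, ?, Husband, Male, 0, 0, 45, >50K
31, Private, Masters, 14, Never-married, Prof-specialty, Not-in-family, Female, 14084, 0, 50, >50K.
"""

NUMERIC = ("age", "education_num", "capital_gain", "capital_loss", "hours_per_week")


def clean_rows(text):
    """Drop records with missing values and add a boolean 'over_50k' target."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    cleaned = []
    for row in reader:
        values = {name: value.strip() for name, value in row.items()}
        # Assumption: "?" (or an empty field) marks a missing value.
        if any(value in ("", "?") for value in values.values()):
            continue
        for name in NUMERIC:
            values[name] = int(values[name])
        # Assumption: the label is "<=50K" or ">50K", optionally followed by a period.
        values["over_50k"] = values.pop("salary").rstrip(".") == ">50K"
        cleaned.append(values)
    return cleaned


if __name__ == "__main__":
    for record in clean_rows(RAW):
        print(record["age"], record["education"], record["over_50k"])
```

Dropping incomplete records is only one possible policy; imputing the missing values would keep more of the sample at the cost of extra assumptions.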


Keywords: Census Income dataset, ETL tool, PSW Modeler.

DOI: https://doi.org/10.26483/ijarcs.v5i3.2064


