Conclusions

The goal of this analysis was to identify which features affect immigrants’ success, and to compare immigrants with native-borns and with nationals of their origin countries. We found that some features are important while others matter much less, and that immigrants and native-borns differ in terms of success.

Methods

EDA

There were few differences between immigrants and native-borns in features such as age, educational attainment, marital status, and employment status. English proficiency and racial category are the only features that differ between the two groups, which is to be expected. When these features are broken down by wage, however, clear differences appear. Male immigrants make more money than any other gender group. Immigrants make more money than native-borns regardless of educational attainment. Yet, when it comes to racial categories, White and Asian native-borns make more money than immigrants. As assumed in the introduction, immigrants from the developed countries showed no improvement in wage, despite slight improvements in educational attainment and employment rate.

The N-400 application form addresses marital status and place of birth well, containing many words related to these features. Similarly, the Migration Policy Institute immigration reports reflect age, English proficiency, and educational attainment well.

Naive Bayes

For predicting the success of immigrants, the ENG, MAR, RAC1P, ESR, and AGEP variables were selected, reaching 69% accuracy. For native-borns, the DECADE and SEX variables were selected as well. This suggests that decade of entry and gender are not important indicators of immigrants’ success, whereas every feature matters for native-borns’ success. It is interesting that SEX was not selected for immigrants during feature selection, even though being male made a large gap in wage in the EDA.
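To make the Naive Bayes comparison concrete, here is a minimal sketch of a categorical Naive Bayes with Laplace smoothing in plain Python. It assumes every feature (including AGEP) has already been binned into categories and that the target is a binary success label (1 = successful). The function names `fit_nb`, `predict_nb`, and `accuracy`, the dictionary row format, and the toy records are all hypothetical; they are not the project's code or data, and the project may have used a library implementation instead. The `features` argument restricts the model to a chosen subset, which is how a selected subset (with or without SEX, for example) would be evaluated.

```python
from collections import Counter
import math


def fit_nb(rows, labels, features, alpha=1.0):
    """Fit a categorical Naive Bayes model on a chosen feature subset.

    rows: list of dicts mapping feature name -> category (hypothetical format)
    labels: list of class labels, e.g. 1 = successful, 0 = not successful
    alpha: Laplace smoothing constant
    """
    class_counts = Counter(labels)
    # counts[c][f][v] = number of rows with label c whose feature f equals v
    counts = {c: {f: Counter() for f in features} for c in class_counts}
    values = {f: set() for f in features}
    for row, c in zip(rows, labels):
        for f in features:
            counts[c][f][row[f]] += 1
            values[f].add(row[f])
    return {"classes": class_counts, "counts": counts, "values": values,
            "features": features, "alpha": alpha, "n": len(labels)}


def predict_nb(model, row):
    """Return the class with the highest log posterior for one row."""
    best_class, best_score = None, -math.inf
    for c, n_c in model["classes"].items():
        score = math.log(n_c / model["n"])  # log prior P(c)
        for f in model["features"]:
            k = len(model["values"][f])  # number of categories seen for f
            count = model["counts"][c][f][row[f]]  # Counter gives 0 if unseen
            # smoothed log likelihood P(f = value | c)
            score += math.log((count + model["alpha"]) / (n_c + model["alpha"] * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(model, rows, labels):
    """Share of rows whose predicted class matches the label."""
    hits = sum(predict_nb(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Toy, made-up records for illustration only
rows = [
    {"ENG": "very well", "MAR": "married", "ESR": "employed"},
    {"ENG": "very well", "MAR": "married", "ESR": "employed"},
    {"ENG": "not well", "MAR": "never married", "ESR": "unemployed"},
    {"ENG": "not well", "MAR": "married", "ESR": "unemployed"},
]
labels = [1, 1, 0, 0]
model = fit_nb(rows, labels, ["ENG", "MAR", "ESR"])
print(predict_nb(model, {"ENG": "very well", "MAR": "married", "ESR": "employed"}))  # → 1
```

Accuracy here is measured on the training rows only to keep the sketch short; comparing feature subsets fairly would need a held-out split.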

Decision Trees

Classification decision trees gave results similar to Naive Bayes. In both the immigrant and native-born data, ENG, AGEP, MAR, and RAC1P were the most important features, with around 70% accuracy. In regression trees, which predict wage, the important features were slightly different. For immigrants, ENG and RAC1P were selected as important features by both the base decision trees and the random forests. For native-borns, AGEP, ESR, and MAR were the top three features.

Clustering

On the data aggregated by POBP (place of birth), none of the three methods formed clear clusters. This suggests that origin countries do not play a large role in immigrants’ success, which contradicts the EDA finding that the four countries differed in wage, educational attainment, and employment rate.

Dimensionality Reduction

Likewise, neither PCA nor t-SNE produced components that distinguish countries based on the aggregated success rate.

Association Rule Mining

ENG, MAR, and ESR remained prominent in ARM on the immigrants’ data: being married, employed, and fluent in English are associated with being successful as an immigrant. This makes sense, since fluency in English can lead to employment, and getting married is a big part of integration into society. In the native-borns’ data, on the other hand, ARM produced a result similar to Naive Bayes: almost all features are interconnected.
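Association rules are usually ranked by support, confidence, and lift. As a minimal sketch of how a rule such as {ENG, MAR, ESR} ⇒ {success} is scored, assuming each person is encoded as a set of `FEATURE=value` items, the three metrics can be computed directly in Python. The item strings, the `SUCCESS=yes` label, and the toy transactions are hypothetical. This sketch only scores a given rule; it does not search for frequent itemsets the way Apriori does.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    s_both = support(transactions, set(antecedent) | set(consequent))
    s_ante = support(transactions, antecedent)
    s_cons = support(transactions, consequent)
    # confidence = P(consequent | antecedent)
    confidence = s_both / s_ante if s_ante else 0.0
    # lift > 1 means the two sides co-occur more often than under independence
    lift = confidence / s_cons if s_cons else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


# Toy, made-up transactions: one set of FEATURE=value items per person
transactions = [
    {"ENG=very well", "MAR=married", "ESR=employed", "SUCCESS=yes"},
    {"ENG=very well", "MAR=married", "ESR=employed", "SUCCESS=yes"},
    {"ENG=very well", "MAR=married", "ESR=employed", "SUCCESS=no"},
    {"ENG=not well", "MAR=never married", "ESR=unemployed", "SUCCESS=no"},
]
antecedent = {"ENG=very well", "MAR=married", "ESR=employed"}
consequent = {"SUCCESS=yes"}
print(rule_metrics(transactions, antecedent, consequent))
```

On these toy rows the rule has support 0.5, confidence of about 0.667, and lift of about 1.333, so being fluent, married, and employed co-occurs with success more often than independence would predict.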

Overall, the most important features determining immigrants’ success were English proficiency (ENG) and marital status (MAR), followed by employment status (ESR) and racial category (RAC1P). This is a significant difference from native-borns, for whom all features are relevant to success.

Limitations and Future Work

Success is subjective and can be measured either quantitatively or qualitatively. The dataset used in this project is limited: it has only 7 features, and the target label is derived from two other features. Having more features, and numerical features in particular, would improve the accuracy of predicting immigrants’ success. There are also several areas for further analysis. Considering the number of immigrants from each country would add another dimension to the analysis. In addition, analyzing how these features change over the years would allow a more in-depth look at immigrants’ success in light of the socioeconomic trends of those years.
