Approach
  - normalize within each column
  - similarity metric
      discussion: Pearson's correlation coefficient
  - threshold value for marking as similar
      discussion: finding critical value
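A minimal sketch of the three steps in this outline, under several assumptions that the outline itself does not confirm: normalization is taken to mean a z-score per column, the "critical value" is read as the usual significance cutoff for Pearson's r derived from a t-statistic, and the t value is supplied by the caller (for example from a t-table). The function names `zscore_columns`, `pearson`, `critical_r`, and `is_similar` are hypothetical.

```python
import math


def zscore_columns(rows):
    """Rescale each column to mean 0 and population standard deviation 1.

    Assumption: "normalize within each column" means a z-score.
    """
    normed_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        # A constant column has no spread; map it to zeros.
        normed_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*normed_cols)]


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for a constant vector; treated as "no correlation"
    return cov / math.sqrt(vx * vy)


def critical_r(t_crit, n):
    """Convert a critical t value (n - 2 degrees of freedom) to a critical r.

    Follows from t = r * sqrt(n - 2) / sqrt(1 - r^2), solved for r.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


def is_similar(x, y, threshold):
    """Mark two vectors as similar when r reaches the threshold."""
    return pearson(x, y) >= threshold
```

For example, with 10 observations and a two-tailed 5% level, a t-table gives roughly 2.306 for 8 degrees of freedom, and `critical_r(2.306, 10)` comes out near 0.632. Whether rows or columns are the vectors being compared is not stated in the outline, so `pearson` simply takes any two equal-length sequences.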