We study the problem of estimating selectivity of approximate substring queries. 
Its importance in databases is ever increasing as more and more data are input by 
users and are integrated with many typographical errors and different spelling 
conventions. To begin with, we consider edit distance for the similarity between a 
pair of strings. Based on information stored in an extended N-gram table, we 
propose two estimation algorithms, MOF and LBS for the task. The latter extends the 
former with ideas from set hashing signatures. The experimental results show that 
MOF is a light-weight algorithm that gives fairly accurate estimations. However, if 
more space is available, LBS can give better accuracy than MOF and other baseline 
methods. Next, we extend the proposed solution to other similarity predicates, 
SQL LIKE operator and Jaccard similarity.