With the explosion of the Internet, LDAP directories and XML, there is an ever greater 
need to evaluate queries involving (sub)string matching. In many cases, matches need to be 
on multiple attributes/dimensions, with correlations between the dimensions. Effective 
query optimization in this context requires good selectivity estimates. In this paper, we 
use multi-dimensional suffix trees as the basic framework for substring selectivity 
estimation. Given the enormous size of these trees for large databases, we develop a space 
and time efficient probabilistic algorithm to construct multi-dimensional pruned suffix 
trees directly. We then present two techniques to obtain good estimates for a given 
multi-dimensional substring matching query, using a pruned suffix tree. The first one, 
called GNO (for Greedy Non-Overlap), generalizes the greedy parsing suggested by 
Krishnan et al., 1996 for one-dimensional substring selectivity estimation. The second one, 
called MO (for Maximal Overlap), uses all maximal multi-dimensional substrings of the query 
for estimation; these multi-dimensional substrings help to capture the correlation that 
may exist between strings in the multiple dimensions. We demonstrate experimentally, using 
real data sets, that MO is substantially superior to GNO in the quality of the estimate.