Doug Cooper

Center for Research in Computational Linguistics

<doug.cooper.thailand@gmail.com>

Who Says What Why When Where and How?

Mapping and Managing Comparative Data

The Mon-Khmer Languages Project (MKLP) addresses wide-ranging questions regarding the development and divergence of languages in Southeast Asia and beyond. The project's primary resources (developed with the support of the NEH) include a database of lexicographic reference materials, a set of comparative and etymological dictionaries that tie cited data to proposed reconstructions, and an extended collection of on-line tools for survey and analysis. All materials are freely available via the project website, http://sealang.net/monkhmer.

At present, the MKLP has substantial resources (more than 1,000 citations each) for about fifty languages and dialects, and sizable samples (more than 500 citations each) for dozens more. As these resources have grown, we have developed new approaches for tagging and storing data (Cooper 2007) and for phonetic approximation in searching (Cooper 2008).

Here we discuss the combination of tabular and map-based displays we provide to help users understand the resources at their disposal: to inspect underlying data sets; to restrict the scope of queries to particular branches, sub-branches, and geographical regions; and to visualize, manage, and make sense of the results that are returned. Rather than focusing solely on the Mon-Khmer family’s common origins, these tools also help reveal the innovations and diversity of its many branches.

References:

Cooper, Doug: "Data Sharing in the Mon-Khmer Languages Project." 3rd International Conference of Austroasiatic Linguistics, 26-28 November 2007.

Cooper, Doug: "Sound[s|ed] like ...? Approximate Phonetic Search in the Mon-Khmer Languages Project." 18th Annual Conference of the Southeast Asian Linguistics Society, Malaysia, 21-22 May 2008.