CDLI :: news

CDLI SEARCH ENGINE IMPROVED

Robert Casties and Jacob Dahl of the Max Planck Institute for the History of Science in Berlin have rewritten the cdli search engine for greater speed and stability. The new search engine has now replaced the project's FileMaker-based search page and is available at the new URL http://cdli.ucla.edu/search/search.pt .

The new cdli search has been tested successfully on Firefox and Safari; we ask users to report any difficulties they encounter with other browsers.

Transliterations are now fed to the result pages live from the cdli transliteration server, to which project associates upload their contributions and collations directly. Images are pre-processed to JPEG format at thumbnail and full web resolution (usually 75 and 300 ppi); once integrated into UCLA's Digital Library Program, this feed will be taken directly from archival TIFF images.

It is again possible to search cdli transliterations for either a word or a part of a word (i.e. a single grapheme or a string of graphemes). Whereas the results of a regular catalogue search can be displayed as a list with or without images, or in single-page browse mode, the results of a transliteration search are displayed by default as a concise list without images. Once a list of results has been loaded, however, the view can be changed to "full transliterations with images".

The results of both catalogue and transliteration searches can be sorted according to the standard criteria already available in cdli's previous search engine, although sorting a transliteration search takes longer than sorting a catalogue search alone. Sorting can be set either before starting the search, by clicking on the pertinent sorting button, or after the search results have been displayed, by clicking on the pertinent header.

Sorting in catalogue and transliterations can be combined.

When searching for a word, the search engine automatically ignores a number of non-grapheme characters, such as "#" (which follows a sign reading and denotes a damaged sign), "?", "!", and so forth. Word search is a "match entire word" search: a search for lu2-kal-la will return lu2-kal-la, lu2#-kal-la, lu2-[kal]-la, and so forth, but not lu2-kal-la-ta.

We are working on an experimental "expert" search and hope to post documentation of it in the near future. A normalized search is also planned, but not yet implemented. Some regular expressions work with the current cdli transliteration search, and we encourage users to experiment with the search function. Searching for two words separated by a blank space currently finds all occurrences of the two words in the same text (searching for lu2-kal-la a-kal-la will find all texts mentioning these two Ur III administrators together anywhere in the text); we expect to implement an in-line search in the near future.

The "part of word (grapheme)" search, on the other hand, allows the user to perform a broader word search that includes instances with post-positions and so on. For instance, lu2-kal will return lu2-kal-la and lu2-kal-la-ta. This search can also be used to search for two graphemes in the same text, and will be expanded with an in-line feature as well.
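The matching rules described above can be illustrated with a short Python sketch. It is not the code of the cdli search engine: the function names are hypothetical, the set of ignored characters is limited to those shown in the examples above (the real list is probably longer), and the grapheme-aligned matching in the "part of word" function is our reading of "a single grapheme or a string of graphemes".

```python
import re

# Assumption: only the characters shown in the announcement are ignored
# ("#", "?", "!" and the square brackets around broken signs).
IGNORED = re.compile(r"[#?!\[\]]")


def normalize(token):
    """Strip flag characters so that lu2#-kal-la compares equal to lu2-kal-la."""
    return IGNORED.sub("", token)


def words(text):
    """Split a transliteration on whitespace and normalize each word."""
    return [normalize(token) for token in text.split()]


def matches_word(text, query):
    """'Match entire word': the query must equal one whole normalized word."""
    return normalize(query) in words(text)


def matches_part(text, query):
    """'Part of word (grapheme)': the query must be a run of whole graphemes."""
    wanted = normalize(query).split("-")
    for word in words(text):
        graphemes = word.split("-")
        for start in range(len(graphemes) - len(wanted) + 1):
            if graphemes[start:start + len(wanted)] == wanted:
                return True
    return False


def matches_all_words(text, query):
    """Two words separated by a blank: both must occur somewhere in the text."""
    return all(matches_word(text, part) for part in query.split())
```

Under these assumptions, matches_word("ki lu2#-kal-la", "lu2-kal-la") is true, while matches_word("ki lu2-kal-la-ta", "lu2-kal-la") is false; matches_part finds lu2-kal in both texts.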

In the "full transcription" mode, the found words or parts of words are highlighted in red; in the "lines with words" mode, only the lines containing the queried word are displayed.
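The difference between the two display modes can be sketched in Python as follows. This is only an illustration under our own assumptions: the function names are hypothetical, asterisks stand in for the red highlighting, and hits are detected with a simplified whole-word comparison that ignores "#", "?", "!" and square brackets.

```python
import re

# Assumption: the same small set of ignored flag characters as in the
# examples of this announcement.
IGNORED = re.compile(r"[#?!\[\]]")


def is_hit(token, query):
    """A token is a hit if it equals the query once flag characters are removed."""
    return IGNORED.sub("", token) == query


def lines_with_word(lines, query):
    """'Lines with words' mode: keep only the lines that contain a hit."""
    return [line for line in lines
            if any(is_hit(token, query) for token in line.split())]


def highlight_full(lines, query, mark="*{}*"):
    """Full mode: keep every line and wrap each hit in a marker."""
    return [" ".join(mark.format(token) if is_hit(token, query) else token
                     for token in line.split())
            for line in lines]
```

Given a three-line text in which only the last line contains lu2#-kal-la as a whole word, lines_with_word returns that single line, whereas highlight_full returns all three lines with the hit marked.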

Wildcards can be used, but the display of results and the highlighting of found words will not work correctly.