With the help of Dr. B, we used regular expressions in search-and-replace operations over the Casablanca script to establish the main structure of our XML. That method let us isolate most of the scene, camera, sp, speaker, and diag elements. Our next step was splitting the XML into three portions, one for each member to finish documenting. We split it by approximate line count, breaking at the closest scene boundary. Once it was split, we each used search and replace to add "who=" attributes to the speaker elements. We had to manually move and tag instances of physical acting cues (dir elements) and tag emotional acting cues (descr elements).
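As an illustration of the kind of search and replace described above, here is a minimal Python sketch of adding "who=" attributes to speaker elements. It is not the exact expression we ran in our editor: the function name add_who, the lowercase-with-underscores form of the attribute value, and the sample line are all assumptions made for the example.

```python
import re

# Matches only speaker elements that have no attributes yet,
# so elements that were already tagged are left alone.
SPEAKER = re.compile(r"<speaker>\s*([^<]+?)\s*</speaker>")


def add_who(xml_text: str) -> str:
    """Add a who attribute to every untagged speaker element (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        # Assumed convention: lowercase the name, join words with underscores.
        who = re.sub(r"\W+", "_", name.strip().lower()).strip("_")
        return f'<speaker who="{who}">{name}</speaker>'

    return SPEAKER.sub(repl, xml_text)


# Hypothetical sample line using the element names from our markup.
sample = "<sp><speaker>RICK</speaker><diag>Here's looking at you, kid.</diag></sp>"
tagged = add_who(sample)
print(tagged)
```

Because the pattern only matches a bare speaker start tag, running the function a second time over the same text changes nothing, which makes it safe to rerun after manual fixes.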
There were some inconsistencies between the files that we had to quality-check and adjust. For example, the middle section of the script contained montages that did not appear in the other files.
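One way to spot this kind of inconsistency is to compare which element names each file uses. The sketch below is only an illustration under assumptions, not the check we actually performed: the root element name script, the montage element name, the part labels, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET


def element_names(xml_text: str) -> set:
    """Return the set of element names used anywhere in one XML document."""
    root = ET.fromstring(xml_text)
    return {el.tag for el in root.iter()}


def report_differences(parts: dict) -> dict:
    """For each part, list the element names that are not shared by every part."""
    names = {label: element_names(text) for label, text in parts.items()}
    shared = set.intersection(*names.values())
    return {label: sorted(tags - shared) for label, tags in names.items()}


# Hypothetical miniature stand-ins for the three portions of the script.
parts = {
    "part1": "<script><scene><sp><speaker>RICK</speaker></sp></scene></script>",
    "part2": "<script><scene><montage/><sp><speaker>ILSA</speaker></sp></scene></script>",
    "part3": "<script><scene><sp><speaker>SAM</speaker></sp></scene></script>",
}
differences = report_differences(parts)
print(differences)
```

Any name listed for a single part is a candidate for review: either the other parts are missing markup, or that part contains something the others genuinely do not.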
After we finished documenting the script, we added the metadata, with one section for the movie and one for the script. The movie metadata covers aspects such as the people and locations involved, and the script metadata covers the script scenes and publishers.
Our next step was constructing the schema and associating it with each of the XML files. We validated every file against it and adjusted some files further accordingly.