CUS4020 Digital Humanities
Week 3: Data Presentation and XML
Chaak-ming LAU
Department of Linguistics and Modern Language Studies
19 Sep 2023
Week 3 – Intended Learning Outcomes (ILOs)
By the end of this class, students will be able to:
1. Understand the distinction between data and presentation
2. Encode humanities data in basic XML
3. Use an XSLT stylesheet to display hierarchical data
Spreadsheets
Migration sheet
CUS4020.2c Migration
● Pay attention to
○ Consistency
○ Blank rows/columns
● Possible Classification
● Controlled Vocabulary
1. Create separate sheets for data/metadata and metadata specification
2. Set up a controlled vocabulary
3. Configure validation rules to reject or highlight incorrect values
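These steps are normally configured in the spreadsheet itself. As a rough illustration of what a validation rule does, here is a minimal Python sketch. The column name `region` and the vocabulary values are hypothetical and are not taken from the Migration sheet.

```python
# Hypothetical controlled vocabulary for a "region" column (illustrative values only).
CONTROLLED_VOCABULARY = {"Hong Kong", "Asia Ex-HK", "Europe"}


def find_invalid_rows(rows, column="region", vocabulary=CONTROLLED_VOCABULARY):
    """Return (row_number, value) pairs whose value is not in the vocabulary."""
    invalid = []
    # Row 1 of a sheet is the header, so data rows start at 2.
    for row_number, row in enumerate(rows, start=2):
        value = row.get(column, "").strip()
        if value not in vocabulary:
            invalid.append((row_number, value))
    return invalid


rows = [
    {"name": "A", "region": "Hong Kong"},
    {"name": "B", "region": "asia ex-hk"},  # inconsistent capitalisation
    {"name": "C", "region": ""},            # blank cell
]
print(find_invalid_rows(rows))
```

The check only reports rows to highlight; whether to reject or correct them is left to the person maintaining the sheet.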
Data Collection
● Now, add your own rows to the CUS4020.2c Migration sheet
Presentation 1
● Simple Charts
Proportion: Pictogram
Proportion: Stacked Bar Chart
Presentation 2
● Download the annotated sheet as CSV
● Upload the file to RawGraphs
● Generate an alluvial diagram
https://app.rawgraphs.io/
Data ←→ Presentation
● The same set of data can be processed, interpreted, and presented in different ways.
● Data and presentation should be kept as two separate layers.
“Asia Ex-HK” may be difficult to understand. How should we fix it?
1. Go to the data table, and use a different label for this, e.g. Asia, excluding Hong Kong
2. Add a footnote to the charts
Tabular Data vs Hierarchical Data
https://teibyexample.org/tutorials/TBED03v00.htm?target=structure
Limitation of tabular data
● Information about a particular item can be presented in a tabular format
○ Each column represents one element
■ Field name header (first row)
■ Values (subsequent rows)
● But what if we need to …
○ add more than one creator?
○ specify the surname and given name of each creator?
● XML (eXtensible Markup Language) is a markup language for encoding documents in a way that is both human-readable and machine-readable.
● It is used to represent hierarchical data.
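As a minimal sketch of how a hierarchy handles the two problems above (more than one creator, and a surname and given name for each), the record below nests repeated `creator` elements inside one item. The element names and the sample names are hypothetical; Python's standard library is used only to read the XML back.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names (item, title, creator, surname, givenName)
# and the sample values are illustrative only.
RECORD_XML = """
<item>
  <title>Example Title</title>
  <creator>
    <surname>Chan</surname>
    <givenName>Tai Man</givenName>
  </creator>
  <creator>
    <surname>Wong</surname>
    <givenName>Siu Ming</givenName>
  </creator>
</item>
"""

record = ET.fromstring(RECORD_XML)

# Each <creator> is a child of <item>, so any number of creators can be listed.
creators = [
    (creator.findtext("surname"), creator.findtext("givenName"))
    for creator in record.findall("creator")
]
print(creators)
```

In a table, the same record would need either extra columns (creator1, creator2, ...) or several values squeezed into one cell.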
“The term markup language comes from the publishing term to mark up, meaning to annotate sections of a text for proper typesetting.”
http://www.ultraslavonic.info/intro-to-xml/
L** K** T***
● The same record can be represented in different structures
● Tabular data is neat but does not work well for text-encoding tasks
● Hierarchical representation
● User-defined tags
● Well-defined syntax
● Excellent software support
Let’s look at more examples.
http://www.ibiblio.org/xml/examples/shakespeare/r_and_j.xml
Romeo and Juliet in XML
● Pay attention to the structure of this file.
A song catalogue in XML
Example from W3Schools
Let’s return to our earlier dataset
L** K** T***
● Consider expanding this table to include more information.
● What additional metadata can be added to this set of data to investigate our preference?
Adding more columns to the table?
● We can add …
○ Location
■ Name of the restaurant
○ Food items
■ ← Multiple values
● When the table is designed, there is no way to know how many food items each entry will have.
○ Let’s go hierarchical! ← XML
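A minimal sketch of why the hierarchical form copes with an unknown number of food items: each item becomes one more child element, so no column count has to be fixed in advance. The element names `meal`, `location` and `item` and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_meal(restaurant, food_items):
    """Build a <meal> element with one <item> child per food item."""
    meal = ET.Element("meal")
    ET.SubElement(meal, "location").text = restaurant
    for food in food_items:
        ET.SubElement(meal, "item").text = food
    return meal


# The same structure works for one item or for many.
short_meal = build_meal("Canteen A", ["noodles"])
long_meal = build_meal("Canteen B", ["rice", "soup", "tea"])

print(ET.tostring(long_meal, encoding="unicode"))
```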
Opening XML files
● Most browsers can display XML files in a hierarchical fashion.
● You can use a text editor to edit XML files
■ vscode.dev is highly recommended for this purpose
■ OxygenXML (paid)
● Let’s have a closer look at how XML files are structured.
○ We should be able to handcraft some XML files
XML structure
Let’s look at an XML element
– <head> is the opening tag (or start tag)
– </head> is the closing tag (or end tag)
– head is the name of the tag
The opening tag, the closing tag, and everything in between are the three components of an element.
Tags must be nested properly.
That means this is not allowed:
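A minimal sketch of the nesting rule, using Python's standard XML parser as the checker. The `meal` and `item` element names are illustrative. In the second string the inner element is closed after the outer one, which the parser rejects.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# <item> opens inside <meal> and closes inside <meal>: allowed.
properly_nested = "<meal><item>toast</item></meal>"

# <item> opens inside <meal> but closes after </meal>: not allowed.
badly_nested = "<meal><item>toast</meal></item>"

print(is_well_formed(properly_nested))
print(is_well_formed(badly_nested))
```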
Self-closing tags for empty elements
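As a small illustration, an empty element can be written either as a self-closing tag or as an opening tag immediately followed by its closing tag. The sketch below (element names are illustrative) parses both spellings and shows that the parser sees the same empty element.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same empty <item> element.
self_closing = ET.fromstring("<meal><item/></meal>")
explicit_pair = ET.fromstring("<meal><item></item></meal>")

for meal in (self_closing, explicit_pair):
    item = meal.find("item")
    # An empty element has no text and no child elements.
    print(item.tag, item.text, len(item))
```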
Declaration
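The first line of an XML file, e.g. <?xml version="1.0" encoding="UTF-8"?>, is called the declaration. It tells the processor which XML version and character encoding the file uses.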
Attributes
name is an attribute of meal, which can take the value breakfast.
type is an attribute of item, and it can take the values food or drink.
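A minimal sketch of these attributes in use. The `meal`/`name` and `item`/`type` names come from the slide; the item contents ("toast", "milk tea") are made up for illustration, and Python's standard library is used only to read the values back.

```python
import xml.etree.ElementTree as ET

# The declaration comes first, then one <meal> with two <item> children.
MEAL_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<meal name="breakfast">
  <item type="food">toast</item>
  <item type="drink">milk tea</item>
</meal>
"""

meal = ET.fromstring(MEAL_XML)

# Attributes are read by name; element text is the content between the tags.
print(meal.get("name"))
items = [(item.get("type"), item.text) for item in meal.findall("item")]
print(items)
```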
Example 2
Note that …
1. XML is case sensitive.
2. All content should be placed within a tag. Feel free to add whitespace, tabs or newlines between different tags for clarity.
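Both notes can be checked with a short sketch (element names are illustrative). A tag opened as `Meal` cannot be closed as `meal`, while extra newlines and indentation between tags leave the element content unchanged.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def item_texts(xml_text):
    """Return the text of every <item> element, in document order."""
    return [item.text for item in ET.fromstring(xml_text).iter("item")]


compact = "<meal><item>toast</item><item>tea</item></meal>"
indented = "<meal>\n  <item>toast</item>\n  <item>tea</item>\n</meal>"
mismatched_case = "<Meal><item>toast</item></meal>"  # Meal is not the same name as meal

print(parses(mismatched_case))
print(item_texts(compact) == item_texts(indented))
```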
You will be provided with an XML template:
This is how it will be displayed:
How would you modify the XML file to create your own content?
https://docs.google.com/document/d/1nIYtAKyRUwfeHNo8dJSDxCrvMjzOFQvB7dqJyk7cVTk/edit?usp=sharing
XML + XSLT
Another example of XML transformation
← XML file (=Data)
DH Exercise #3 (5%, take-home assignment)
Modify your XML fragment by adding some fields and following some additional constraints.
1. Your fragment should follow the XML format.
2. In addition to existing fields, we need to add at least one food or drink item.
3. A meal type can be named either breakfast, lunch or dinner.
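One way to self-check a fragment against these three constraints is sketched below. It assumes the fragment uses a `meal` element with a `name` attribute and `item` elements with a `type` attribute, as in the attributes example; the actual template may differ, so treat the element and attribute names as assumptions.

```python
import xml.etree.ElementTree as ET

ALLOWED_MEAL_NAMES = {"breakfast", "lunch", "dinner"}
ALLOWED_ITEM_TYPES = {"food", "drink"}


def check_fragment(xml_text):
    """Return a list of problems; an empty list means all checks pass."""
    # Constraint 1: the fragment must be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]

    problems = []
    meals = list(root.iter("meal"))  # includes the root itself if it is <meal>
    if not meals:
        problems.append("no <meal> element found")
    for meal in meals:
        # Constraint 3: the meal name must be one of the allowed values.
        name = meal.get("name")
        if name not in ALLOWED_MEAL_NAMES:
            problems.append(f"unexpected meal name: {name!r}")
        # Constraint 2: at least one food or drink item per meal.
        typed_items = [
            item for item in meal.iter("item")
            if item.get("type") in ALLOWED_ITEM_TYPES
        ]
        if not typed_items:
            problems.append("meal has no food or drink item")
    return problems


print(check_fragment('<meal name="lunch"><item type="food">rice</item></meal>'))
print(check_fragment('<meal name="brunch"><item type="drink">tea</item></meal>'))
```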
● XML files are not designed to be read directly.
○ XSLT (Extensible Stylesheet Language Transformations) can be applied to an XML file to transform it into a desired format.
○ Omeka does this for you, but with a general template.
● XML contains the data
● XSLT describes the presentation of the data
http://xsltransform.net/
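An XSLT stylesheet is itself an XML document. The sketch below holds a small stylesheet in a Python string and uses the standard library to confirm that it is well-formed and to list the paths it selects. The stylesheet is a hypothetical one for a song catalogue (`catalog/song` with `title` and `artist` children) and is not the stylesheet provided for the exercise. Python's standard library cannot apply XSLT, so the transformation itself still has to be run in an XSLT processor such as http://xsltransform.net/.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet: turns <catalog><song>...</song></catalog> into an HTML table.
STYLESHEET = """<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>My Favorite Songs</h2>
        <table border="1">
          <tr><th>Title</th><th>Artist</th></tr>
          <xsl:for-each select="catalog/song">
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="artist"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
"""

# Parsing succeeds only if the stylesheet is well-formed XML.
stylesheet = ET.fromstring(STYLESHEET.encode("utf-8"))

# Which elements does the stylesheet loop over, and which values does it print?
loops = [e.get("select") for e in stylesheet.iter(f"{{{XSL_NS}}}for-each")]
values = [e.get("select") for e in stylesheet.iter(f"{{{XSL_NS}}}value-of")]
headers = [th.text for th in stylesheet.iter("th")]
print(loops, values, headers)
```

The plain HTML tags (`html`, `table`, `th`) describe the presentation, while the `xsl:` instructions say where data from the XML file is inserted.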
[Rendered output: a table headed “My Favorite Songs” with the columns Title and Artist]
Edit Your XML fragments here.
← XSL Stylesheet
← Resulting HTML
Create your own HTML content on methods of incorporating digital humanities into your major field of study.
Step 1: Prepare XML file
➢ Recall the previous exercise – what should be modified?
Step 2: Prepare XSL Stylesheet
➢ You will be provided with the XSL Stylesheet for exercise 1.
➢ In order to match your XML file for exercise 2, what should be modified?
Try using http://xsltransform.net/ to check if your XML and XSL files work!
Step 3: Submission
➢ Submit your
(1) XML file
(2) XSL file
(3) a screenshot of the HTML generated with your XML and XSL files
to https://forms.gle/3vDTasK1cVtZGidYA