22 February 2021

Viewing trademark XML data

OK—you have downloaded one of CIPO’s global or weekly .zip files and unzipped it into a separate folder.  That folder will typically contain ten to twenty thousand files.  Most of them will be .xml files, but there will also be quite a few .png files (containing design mark images) and possibly some .mp3 or .mp4 files (containing sounds & moving images).  We’ll ignore the .png, .mp3 & .mp4 files—but now what?

Like HTML, XML (eXtensible Markup Language) employs tags to encapsulate information.  Unlike HTML tags, XML tags impart no display characteristics (e.g. fonts) to the tagged information.  Also unlike HTML tags, XML tags are user-definable.  This means that they can be—and usually are—self-describing.  In the case of CIPO's trademark XML data, “user” means WIPO and CIPO.  That is, WIPO defines some of the tags in accordance with its ST.66 and ST.96 standards, and CIPO defines other Canada-specific tags as set forth in its Trademarks Data Dictionary.  I’ll discuss examples of both tag types in future posts.

XML files contain nothing but plain text.  The information of interest is arranged hierarchically by encapsulating it within user-defined plain text tags.  

The tags are enclosed in angle brackets < >.  Typically there’s an opening tag and a closing tag.  The closing tag repeats the name in the opening tag, with a ‘/’ character inserted immediately after the opening angle bracket.
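To make the tag syntax concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module.  The element name Colour is made up purely for illustration and does not come from CIPO's data.  The sketch parses a matched tag pair, then an empty element written as a single self-closing tag, and finally shows that a mismatched closing tag is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# "Colour" is a made-up element name, used only to illustrate tag syntax.
element = ET.fromstring("<Colour>red</Colour>")
print(element.tag, element.text)

# An element with no content may be written as one self-closing tag.
empty = ET.fromstring("<Colour/>")
print(empty.text)

# A closing tag whose name does not match the opening tag is an error.
try:
    ET.fromstring("<Colour>red</Color>")
    mismatch_detected = False
except ET.ParseError:
    mismatch_detected = True
print(mismatch_detected)
```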

.xml seen in Notepad
Since XML files contain nothing but plain text, you can inspect them with a simple text editor such as the Microsoft Windows Notepad utility. Here is a small portion of CIPO's .xml file for application serial no. 2084791 DISPERSA as viewed in Notepad. Note the opening tag <tmk:MarkVerbalElementText> and the closing tag </tmk:MarkVerbalElementText> encapsulating the information content DISPERSA.  In accordance with Annex A of WIPO's ST.66 standard, the tmk:MarkVerbalElementText element encapsulates the verbal elements of a Word Mark.

Also note the tags’ hierarchical arrangement.  For example, the aforementioned tag pair is further encapsulated within the <tmk:WordMarkSpecification></tmk:WordMarkSpecification> tag pair, along with two other tag pairs, each of which encapsulates more information.  Notepad does not reformat the file, so the indentation it shows is already present in the file itself; that indentation helps us identify the hierarchical structure.  (In accordance with Annex A of WIPO's ST.66 standard, the tmk:WordMarkSpecification element encapsulates various elements concerning a Word Mark, including those seen in the above image.)
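The same hierarchy can be walked programmatically.  The following is a rough Python sketch, not CIPO's or WIPO's own tooling.  It uses a simplified fragment containing only the two elements named above (the two sibling tag pairs are omitted), and the namespace URI bound to the tmk prefix is a placeholder I have assumed for illustration; the real file declares its own.  The outline helper is likewise a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for illustration only; the real file
# declares its own URI for the tmk prefix.
TMK_NS = "urn:example:tmk"

# Simplified fragment: only the two elements named in the text are shown.
fragment = f"""
<tmk:WordMarkSpecification xmlns:tmk="{TMK_NS}">
  <tmk:MarkVerbalElementText>DISPERSA</tmk:MarkVerbalElementText>
</tmk:WordMarkSpecification>
""".strip()

def outline(element, depth=0):
    """Return (depth, local tag name, text) for an element and its descendants."""
    local_name = element.tag.split("}")[-1]  # drop the "{namespace}" prefix
    text = (element.text or "").strip()
    rows = [(depth, local_name, text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(fragment)
rows = outline(root)
for depth, name, text in rows:
    print("  " * depth + name + (f": {text}" if text else ""))

# A single element can also be looked up directly by its prefixed name.
word_mark = root.find("tmk:MarkVerbalElementText", {"tmk": TMK_NS}).text
```

Each row's depth plays the same role as the indentation in the screenshot: the deeper the element, the further it sits inside the enclosing tag pairs.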

The depicted extract is just a small part of CIPO’s .xml file for application serial no. 2084791.  Anyone familiar with trademark information could probably read the plain text .xml file and discern its meaning fairly readily.  However, XML documents are not normally intended for human reading.  Their primary purpose is to preserve information in organized, structured, computer-readable form.

You can also use a word processor such as Microsoft Word to inspect an XML file, but don’t bother.  In my experience Word isn’t well suited to working with XML files.

.xml seen in Internet Explorer
A web browser does a somewhat better job.  Here is the same small portion of the same XML file, as viewed in Microsoft Internet Explorer. Notice that the tags are color-highlighted and their hierarchical structure is made more apparent by indenting different levels of the hierarchy. The encapsulated information content is also bolded.  This makes it somewhat easier to inspect the contents of a single XML file, if that is all you want to do.

.xml seen in Notepad++
If you’ll be doing a lot of work with XML files, you’ll probably want to use a more capable utility, such as Microsoft's dedicated XML Notepad, or the open-source Notepad++ text editor.  These utilities have a number of useful features particularly adapted to working with XML files.  Here we see the same small portion of the same XML file, as viewed in Notepad++.  Note the vertical bars which help identify different portions of the tags’ hierarchy.

It’s very useful to be able to inspect individual XML files.  Maybe you are only interested in a particular XML file’s content and don’t mind manually inspecting and deciphering the file’s tagged information as outlined above. More generally, however, what we want to do is examine the information content of a number (preferably a very large number) of trademark data XML files. Future posts will delve into that topic.
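As a foretaste only, here is a minimal Python sketch of what examining many files at once might look like.  It assumes the unzipped folder holds the .xml files directly (not in subfolders), and it matches on the element's local name MarkVerbalElementText so that no namespace URI has to be assumed.  The function name word_marks is hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def word_marks(folder):
    """Map each .xml file name in folder to the word mark texts found in it."""
    results = {}
    # Only .xml files are considered; .png, .mp3 and .mp4 files are ignored.
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        results[path.name] = [
            el.text.strip()
            for el in root.iter()
            # Compare local names so the namespace URI need not be known.
            if el.tag.split("}")[-1] == "MarkVerbalElementText" and el.text
        ]
    return results
```

Calling word_marks on the folder you unzipped returns a dictionary keyed by file name; a file with no word mark element maps to an empty list.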