OK—you have downloaded one of CIPO’s global or weekly .zip files and unzipped it into a separate folder. That folder will typically contain ten to twenty thousand files. Most of them will be .xml files, but there will also be quite a few .png files (containing design mark images) and possibly some .mp3 or .mp4 files (containing sounds & moving images). We’ll ignore the .png, .mp3 & .mp4 files—but now what?
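If you would like to confirm what the unzipped folder actually contains before going further, a few lines of Python can tally the files by extension and pick out just the .xml files. This is only a minimal sketch: the function names tally_extensions and xml_files are my own, and the folder path you pass in is whatever location you unzipped the CIPO file into.

```python
from collections import Counter
from pathlib import Path


def tally_extensions(folder):
    """Count the files directly inside `folder` by lowercase extension."""
    return Counter(
        p.suffix.lower() for p in Path(folder).iterdir() if p.is_file()
    )


def xml_files(folder):
    """Return the .xml files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".xml"
    )


# Example use (the path is hypothetical; substitute your own folder):
# print(tally_extensions("C:/cipo/weekly"))
# print(len(xml_files("C:/cipo/weekly")))
```

The .png, .mp3 and .mp4 files are simply left out of the list returned by xml_files, which matches the decision to ignore them here.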
Like HTML, XML (eXtensible Markup Language) employs tags to encapsulate information. Unlike HTML tags, XML tags impart no display characteristics (e.g. fonts) to the tagged information. Also unlike HTML tags, XML tags are user-definable. This means that they can be—and usually are—self-describing. In the case of CIPO's trademark XML data, “user” means WIPO and CIPO. That is, WIPO defines some of the tags in accordance with its ST.66 and ST.96 standards, and CIPO defines other Canada-specific tags as set forth in its Trademarks Data Dictionary. I’ll discuss examples of both tag types in future posts.
XML files contain nothing but plain text. The information of interest is arranged hierarchically by encapsulating it within user-defined plain text tags.
The tags are enclosed in angle brackets < >. Typically there’s an opening tag and a closing tag. The closing tag repeats the name used in the opening tag, except that a ‘/’ character is placed immediately after the closing tag's opening angle bracket.
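To make the opening/closing tag rule concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The tag names note and word are made up purely for illustration and are not CIPO or WIPO tags. The helper is_well_formed is also my own; it simply reports whether the parser accepts the text.

```python
import xml.etree.ElementTree as ET

# One outer element containing one inner element; each opening tag has a
# matching closing tag that starts with "</".
well_formed = "<note><word>DISPERSA</word></note>"

root = ET.fromstring(well_formed)
child = root[0]  # first (and only) child of the outer element


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Closing tags in the wrong order, or missing entirely, are rejected.
crossed = "<note><word>DISPERSA</note></word>"
unclosed = "<word>DISPERSA"
```

Here root.tag is "note", child.tag is "word" and child.text is "DISPERSA", while is_well_formed returns False for both the crossed and unclosed strings.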
Since XML files contain nothing but plain text, you can inspect them with a simple text editor such as the Microsoft Windows Notepad utility. Here is a small portion of CIPO's .xml file for application serial no. 2084791 DISPERSA as viewed in Notepad. Note the opening tag <tmk:MarkVerbalElementText> and the closing tag </tmk:MarkVerbalElementText> encapsulating the information content DISPERSA. In accordance with Annex A of WIPO's ST.66 standard, the tmk:MarkVerbalElementText element encapsulates the verbal elements of a Word Mark.
A web browser does a somewhat better job. Here is the same small portion of the same XML file, as viewed in Microsoft Internet Explorer. Notice that the tags are color highlighted and their hierarchical structure is made more apparent by indenting different levels of the hierarchy. The encapsulated information content is also bolded. This makes it somewhat easier to inspect the contents of a single XML file, if that is all you want to do.
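The same kind of inspection can be sketched in Python. The fragment below is a hand-made stand-in, not a copy of CIPO's file: only the tmk:MarkVerbalElementText element and its content DISPERSA come from the file discussed above, while the wrapper element tmk:ExampleRecord and the namespace URI urn:example:tmk are placeholders I invented so that the fragment parses. The helpers match elements by local name, so they do not depend on the real namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: wrapper element and namespace URI are placeholders.
SAMPLE = """\
<tmk:ExampleRecord xmlns:tmk="urn:example:tmk">
  <tmk:MarkVerbalElementText>DISPERSA</tmk:MarkVerbalElementText>
</tmk:ExampleRecord>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def outline(elem, depth=0):
    """Return indented lines showing the element hierarchy and any text."""
    text = (elem.text or "").strip()
    line = "  " * depth + local_name(elem.tag)
    if text:
        line += ": " + text
    lines = [line]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


def find_text(root, name):
    """Return the text of the first element whose local name is `name`."""
    for elem in root.iter():
        if local_name(elem.tag) == name:
            return (elem.text or "").strip()
    return None


root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
print(find_text(root, "MarkVerbalElementText"))
```

To try this on a real file, ET.parse(path).getroot() could take the place of ET.fromstring(SAMPLE). The indented outline is a plain-text approximation of the browser view.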
It’s very useful to be able to inspect individual XML files. Maybe you are only interested in a particular XML file’s content and don’t mind manually inspecting and deciphering the file’s tagged information as outlined above. More generally, however, what we want to do is examine the information content of many (preferably very many) trademark data XML files. Future posts will delve into that topic.
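As a rough preview of what examining many files might look like, the sketch below loops over every .xml file in a folder and collects the text of any tmk:MarkVerbalElementText elements it finds. It is not the approach the future posts will necessarily take. The function name collect_verbal_elements is my own, elements are matched by local name only, and files that fail to parse are recorded and skipped instead of stopping the run.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

TARGET = "MarkVerbalElementText"  # element named earlier in this post


def collect_verbal_elements(folder):
    """Map each parseable .xml file name to its verbal element texts.

    Returns (results, skipped), where `skipped` lists the names of files
    that could not be parsed as XML.
    """
    results = {}
    skipped = []
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            skipped.append(path.name)
            continue
        results[path.name] = [
            (elem.text or "").strip()
            for elem in root.iter()
            # Compare local names so the namespace URI does not matter.
            if elem.tag.rsplit("}", 1)[-1] == TARGET
        ]
    return results, skipped
```

A file with no matching element ends up with an empty list, which is one simple way to spot records that are not word marks.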