21 February 2021

The Data

Approximately twice a year (at least before the pandemic), CIPO publishes a “global” release of almost 2 million XML files: one file for each and every Canadian trademark, excluding certain marks that were cancelled or expunged before 1979 and applications that were abandoned or refused before 1980.

The XML file for each mark contains a lot of detail: for example, a dated record of the prosecution, opposition, post-registration and cancellation events that have occurred between the filing of the application for that mark and the date of publication of the XML file.  (The event record for marks filed before about 1960 isn't as detailed as the record for later-filed marks.)  Such event details, and many others, will be explored in future posts on this blog.

In publishing a new global release, CIPO basically stops the clock and generates a complete new set of XML files to represent all Canadian trademarks as of the time of that global release.

Additionally, each week CIPO releases roughly 10 to 20 thousand new XML files.  If a new trademark application is filed, CIPO publishes a new XML file for that application.  If any step is taken by or on behalf of the trademark owner, or by CIPO, or by an interested party (e.g. an opponent, a section 45 requester, etc.) in relation to an existing trademark application or registration, CIPO republishes the XML file for that application or registration to reflect that step (i.e. event).

In order to work with CIPO’s trademark XML data and have “everything”, one must somehow deal with all of the XML files in the most recent global release, plus all of the XML files in every weekly update CIPO releases subsequent to that global.
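The “global plus weekly updates” bookkeeping can be sketched in a few lines of Python.  This is only an illustration, not a description of CIPO’s tooling or of the process used for this blog: the release labels and file names below are made up, and the sketch assumes that the XML file for a given mark keeps the same file name each time it is republished.

```python
from typing import Dict, Iterable, List, Tuple


def latest_versions(releases: Iterable[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map each XML file name to the label of the newest release containing it.

    `releases` must be ordered oldest first: the global release, followed by
    each weekly update in publication order.
    """
    latest: Dict[str, str] = {}
    for label, xml_names in releases:
        for name in xml_names:
            # A later release simply overrides any earlier copy of the file.
            latest[name] = label
    return latest


# Hypothetical release labels and file names, for illustration only.
releases = [
    ("global-2019-10", ["1000001.xml", "1000002.xml"]),
    ("weekly-01", ["1000002.xml", "1000003.xml"]),
]
current = latest_versions(releases)
print(len(current), "distinct marks")
```

Here the weekly update republishes one existing file and adds one new file, so three distinct marks remain, and the republished file is taken from the weekly update and not from the global.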

"All" marks search result
How many XML files are there?  You can easily check by searching CIPO’s online trademark database: select ‘Application number’ as the search field, type an asterisk (*) in the ‘Enter search criteria’ box and click the Search button.  The search result, shown in the adjacent image, reveals that 1,765,956 trademarks were represented in the database as of 24-Feb-2021.  That number should equal the number of distinct XML files (counting each republished file once) in CIPO’s most recent global release plus all of the weekly updates published up to the time of the search, since the XML files and the online database are populated from the same source.  However, as of this writing, the two counts differ by one: there is no XML file for one of the trademarks represented in the online database.  That’s because CIPO’s data team realized that a particular trademark record had been created in error and asked users of the XML data to delete the corresponding XML file, which I did.  However, the record for that trademark hasn’t been purged from the online database as of this writing.  I have asked CIPO to correct this discrepancy.  [Update: CIPO has corrected the discrepancy as of its 10-Mar-2021 weekly update.]

As of the date of this post, CIPO’s most recent global release was published in October 2019.  You’ll find it on the ‘Historical files’ tab in the Trademarks data section of CIPO’s IP Horizons web site.

Download selection tabs
The October 2019 global consists of 106 .zip files.  As of the date of this post, CIPO had published an additional 74 weekly update .zip files subsequent to publication of the October 2019 global.  The .zip files vary in size, but 270MB per .zip file is typical of the October 2019 global.  The 74 weekly updates are typically about 350MB per .zip file, but some have been almost 900MB as you can see in this partial screenshot:

Weekly download screenshot
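Taking the typical sizes quoted above at face value gives a rough idea of the total download.  The inputs are only the “typical” per-file figures from this post (and some weekly files are much larger), so treat the result as an order-of-magnitude estimate, not a measurement.

```python
# Counts and typical sizes as quoted in this post (sizes in MB).
GLOBAL_ZIP_COUNT = 106
GLOBAL_ZIP_TYPICAL_MB = 270
WEEKLY_ZIP_COUNT = 74
WEEKLY_ZIP_TYPICAL_MB = 350

global_mb = GLOBAL_ZIP_COUNT * GLOBAL_ZIP_TYPICAL_MB   # 28,620 MB
weekly_mb = WEEKLY_ZIP_COUNT * WEEKLY_ZIP_TYPICAL_MB   # 25,900 MB
total_mb = global_mb + weekly_mb                       # 54,520 MB

# Using 1 GB = 1000 MB for a rough figure.
print(f"Roughly {total_mb / 1000:.1f} GB of .zip files")  # Roughly 54.5 GB of .zip files
```

In other words, on these assumptions the weekly updates published since October 2019 add up to nearly as much data as the global release itself.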

Besides including one .xml file per mark, CIPO's global release and weekly update .zip files also include trademark design images (in separate .png files), sound mark sounds (in separate .mp3 files) and moving images (in separate .mp4 files).  In this blog I’ll deal only with the .xml files.
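Picking out only the .xml members of a .zip file is straightforward with Python’s standard zipfile module.  The sketch below is an illustration only, not the process used for this blog: it builds a tiny in-memory archive with made-up member names to stand in for a CIPO .zip file, and it assumes that the XML files can be recognized by their .xml extension.

```python
import io
import zipfile
from typing import List


def xml_members(zip_source) -> List[str]:
    """Return the names of the .xml members of a .zip file.

    `zip_source` may be a path or a file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir() and info.filename.lower().endswith(".xml")
        ]


# Build a small stand-in archive in memory (hypothetical member names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("1000001.xml", "<trademark/>")
    demo.writestr("1000001.png", b"")
    demo.writestr("1000002.xml", "<trademark/>")
buffer.seek(0)

xml_names = xml_members(buffer)
print(xml_names)
```

With a real archive, the resulting list can be passed as the `members` argument of `ZipFile.extractall` so that the image, sound and video files are never written to disk.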

I’ve captured details (but not every detail) from those XML files in a data warehouse.  In future posts I’ll explain what a data warehouse is and how I transfer data from CIPO’s XML files into the data warehouse.