Reader comments will be accepted for a week or two. Please leave them on this blog post.
This report investigates in detail various aspects of the first five Word documents (1.doc, 2.doc, … 5.doc) that Guccifer 2 published on his WordPress.com blog site. It was widely reported that the first document, 1.doc, displayed “Russian fingerprints” (Russian error messages written in Cyrillic letters). In this report we describe how those “Russian fingerprints” became embedded inside 1.doc.
The sequence of circumstances that created these “Russian fingerprints” is sufficiently complex and unusual to raise the question: Did Guccifer 2 plant those “Russian fingerprints” intentionally?
There are many detailed observations and conclusions in this analysis. A few are highlighted below.
- Guccifer 2’s first five Word documents (and a template file used to create the first three documents) can be located in email attachments in the Wikileaks Podesta email collection. Guccifer 2’s five Word documents were published about four months ahead of the Wikileaks disclosure of the Podesta emails.
- The fact that precursors of Guccifer 2’s first five Word documents are found in the Wikileaks Podesta email collection does not definitively indicate that those emails were the actual source; it simply indicates that such a conclusion cannot be ruled out.
- The source of the widely reported “Warren Flood” (author) and “GSA” (company) metadata which appeared in Guccifer 2’s first three Word documents was found in the Podesta email collection. A document named, Slate_-_Domestic_-_USDA_-_2008-12-20.doc was used as a template when creating 1.doc, 2.doc, and 3.doc. This template injected “Warren Flood” as the author value and “GSA” as the company value in those first three Word documents. This template also injected the title, the watermark and header/footer fields found in the final documents (with slight modifications).
- The appearance of the widely reported “Russian fingerprints” (Russian error messages) that appear in PDF’s created from 1.doc is the result of a long chain of contributing factors and explicit actions. We think that the progression for creating 1.doc probably went along these lines:
- A particular source document, the “Trump opposition report” was chosen. Out of over 2000 Word (.docx) documents in the Podesta email collection, only four (4) contain problematic URL’s that will cause Word 2007 to mis-handle them and to incorrectly diagnose them as invalid. The other three (3) potential source documents have no particular significance to the Trump 2016 election campaign.
- Thus, only the “Trump opposition report” has some significance and has the necessary characteristics which after several unusual steps leads to the final 1.doc document that has embedded Russian error messages in it. Guccifer 2 leaked this document to the media before publishing it on his blog site later that same day.
- Word 2007 was used to create the document. Based on our testing, only Word 2007 has a bug that is triggered by problematic URL’s in the “Trump opposition report”.
- Russian language settings were enabled, both in Word and at the system level. This ensured that the embedded error messages are displayed in Russian (using the Cyrillic alphabet).
- Some of the hyperlink addresses in the “Trump opposition report” reference URL’s with ‘%20’ (URL-encoded space) characters in them. These URL’s triggered a bug in Word 2007.
- When the source document (the original “Trump opposition report”) is first opened, Word 2007 will issue a warning that there are problems with the document’s content. The user will have to confirm twice that he acknowledges the presence of those errors and wants Word 2007 to attempt a recovery.
- The attempted recovery will be only partially successful; the problematic URL’s will be converted into empty URL’s. When this new file is saved in RTF format and then subsequently opened again, Word 2007 will diagnose those empty URL’s as invalid and display a locale-specific error message.
- The choice of the RTF file format is unusual and surprising. RTF has not been in wide use for over 25 years. In fact, Word 2010 deprecated support for RTF. RTF is the only format that will retain the embedded Russian error messages that are found inside the final 1.doc document.
- A copy/paste from the initially saved RTF file to another (empty template) document, followed by another “Save as (RTF)” operation is a necessary additional step, needed to embed the Russian error messages into 1.doc.
- This copy/paste operation followed by a second “Save as (RTF)” operation will embed the Russian error messages into the URL’s text display value of the faulty hyperlink fields inside 1.doc. These embedded Russian error messages will become known as the “Russian fingerprints”. They appeared in the PDF files (derived from 1.doc) that were analyzed by various journalists/researchers.
The following timeline summarizes some key events and developments as they relate to the analysis of Guccifer 2’s early Word document disclosures. For a much more detailed timeline, consult Adam Carter’s Guccifer 2 timeline.
- [2013-07-13] As noted by Thomas Rid (@RidT), the original Guccifer (Marcel Lazăr Lehel) disclosed a similar version of Guccifer 2’s 4.doc in the summer of 2013. Additional metadata analysis indicates that the source document dates back to the time of the Obama administration (2008).
- [2016-06-14] Via the Washington Post [archive] the DNC announced it has been hacked. The WaPo article mentions (in its headline and in the body of the article) that DNC fears that a Trump opposition research document (now known as 1.doc from Guccifer 2) may have been stolen by the alleged hackers.
- [2016-06-15] The security firm, Crowdstrike, who was hired by the DNC, published a blog [archive] article which attributed the alleged DNC hack to Russian state actors.
- [2016-06-15] Guccifer 2 arrived on the scene that same day. Guccifer 2 quickly published ten (10) Office documents on his WordPress.com blog [archive]. Five (5) of those are Word documents; the first one, 1.doc, is the main subject of this analysis. Guccifer 2 initially posed as a Romanian (lone wolf) hacker, but as time went on his story began to deteriorate. Some pundits quickly assigned Russian attribution to Guccifer 2, partly due to Cyrillic artifacts in his first five Word documents. Also, in an online chat, it was observed that Guccifer 2 had weak fluency in Romanian.
- [2016-06-15] That same day, two media outlets published stories, covering 1.doc (the DNC sourced “Trump opposition report”), which was apparently pre-disclosed to them by Guccifer 2. Those media outlets were The Smoking Gun [archive] (TSG) and Gawker [archive]. Their early coverage focused on the content of the Trump opposition report, but the journalists did not notice the presence of error messages written in Russian, nor other Russian indications in the document’s metadata properties.
- [2016-06-15] Matt Tait (@pwnallthings), a security blogger/journalist, began following Guccifer 2. Matt started a Twitter mega-thread here. Matt’s involvement with Guccifer 2 would later lead to him being interviewed by Mueller, as part of the Mueller investigation of Michael Flynn [archive], in October 2017.
- [2016-06-16] One day later, a well known online media outlet, Ars Technica [archive] (which covers technology topics), reviewed the PDF [archive] posted by Gawker; this PDF is derived from 1.doc. Ars Technica noticed the presence of error messages located in the last few pages of the 200+ page PDF. Those messages were written in Russian (using the Cyrillic alphabet).
- [2016-06-16] That same day, a much less well-known online media outlet, Independent Voter Network (@IVN), raised various questions and floated various theories on the initial reporting done by TSG and Gawker. In this report, we challenge various claims [archive] made by IVN.
- [2016-10-07] Wikileaks released the first batch of Podesta emails. Per our analysis, all five of Guccifer 2’s first five Word documents (and an additional document used as a template) can be matched with source documents that were included as attachments to Podesta’s emails. We do not conclude that Podesta’s emails were the actual source of Guccifer 2’s first five Word documents, but note that this conclusion cannot be ruled out.
- [2017-05-28] Llama (@jimmysllama) made a concerted effort to find sources for the first 200 or so documents published by Guccifer 2, and published her findings [archive] on her blog site. Llama found 4 of Guccifer 2’s first five Word documents in the Wikileaks Podesta email collection; only 2.doc remained AWOL.
To date, there has been a collective effort to gain a better understanding of Guccifer 2’s methods and motivations. This article will hopefully add another brick in our wall of knowledge related to Guccifer 2. Some of the analysis described here repeats aspects of prior work; it is included so that this report is self-contained.
Editorial note: In this report, the pronoun, “we” is used as short-hand for “the Forensicator”.
Locating the Source Documents in the Podesta Email Collection
Based on our review of prior research, only the source document for 2.doc remained missing. Guccifer 2 titled the document as “2016 GOP presidential candidates”. We plugged a search string along the lines of “2016 GOP” into the Wikileaks attachment search box and it came up with a likely match. The full correspondence table is shown below.
A tab-separated file with the results listed above is here. (The “.xls” extension is necessary to work around a WordPress.com restriction. Interested readers can save it as a text file or simply let Excel open the file.)
The detailed metadata for the source documents found in the Podesta emails is shown below.
A tab-separated file with the results listed above is here.
In passing, note that the source documents for 1.doc, 2.doc, and 3.doc are “.docx” (newer Word format) files and the source files for 4.doc and 5.doc are legacy Word (“.doc”) documents.
A Quick Look at Guccifer 2’s Document Metadata
Some relevant metadata for Guccifer 2’s five documents are shown below.
A tab-separated file with the results listed above is here.
The fields highlighted in blue have values that are different from their matching source document.
Note: The “last modified by” value of “user” in 4.doc differs from the source document, where it is spelled “User”.
The yellow highlighted fields (based on our analysis) were inherited from a file used as a template.
The “Save As (RTF)” operation in Word will reset the version number to “2”; both the Created and Last Modified dates will be identical; the Last Printed date will be inherited from the original. Thus, 4.doc and 5.doc appear to be the result of a “Save As (RTF)” operation with no subsequent edit operations.
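The Save As (RTF) pattern described above (version 2, identical created and modified times) can be checked directly in the RTF \info group. The sketch below is a minimal illustration, not the tooling used for this report; the helper name `rtf_info` is hypothetical, and it assumes the \info group follows Word’s usual layout (e.g. `{\author ...}`, `{\creatim\yr...\mo...}`, `{\version2}`). Non-ASCII values, such as a Cyrillic “last saved by” name, are returned still hex-escaped.

```python
import re

# Text-valued \info fields (Word may prefix \company with the \* marker).
TEXT_FIELDS = ("title", "author", "operator", "company")
# Timestamp fields: \creatim (created), \revtim (last modified), \printim (last printed).
TIME_FIELDS = ("creatim", "revtim", "printim")

def rtf_info(rtf: str) -> dict:
    """Extract a few document-property fields from the RTF \\info group."""
    info = {}
    for name in TEXT_FIELDS:
        m = re.search(r"\{(?:\\\*)?\\" + name + r" ([^}]*)\}", rtf)
        if m:
            info[name] = m.group(1).strip()
    m = re.search(r"\{\\version(\d+)\}", rtf)
    if m:
        info["version"] = int(m.group(1))
    for name in TIME_FIELDS:
        m = re.search(r"\{\\" + name + r"((?:\\[a-z]+\d+)+)\}", rtf)
        if m:
            # e.g. \yr2016\mo6\dy15\hr13\min38 -> {"yr": "2016", "mo": "6", ...}
            parts = dict(re.findall(r"\\([a-z]+)(\d+)", m.group(1)))
            info[name] = "{:04d}-{:02d}-{:02d} {:02d}:{:02d}".format(
                *(int(parts.get(k, 0)) for k in ("yr", "mo", "dy", "hr", "min")))
    return info
```

For a file produced by a bare Save As (RTF), one would expect `version == 2` and `creatim == revtim`, with `printim` carried over from the original.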
Document Comparison Demonstrates that Source Documents Match
We used Word to compare each source document (all of which were found in the Podesta email archive) with Guccifer 2’s five Word documents. This comparison confirms that there are no significant textual differences between the source documents and Guccifer 2’s five published Word documents.
Why Were the First Five Documents named [1-5].doc?
The first thought that comes to mind is that Guccifer 2 intended to highlight the Trump opposition report (1.doc), which carries the Russian “fingerprints”, as the first document to open. The rest just followed suit. Note that the other documents in Guccifer 2’s first data drop retained their original names (with slight changes, such as replacing spaces with hyphens and converting all letters to lower case). The naming convention for the first five documents is unique and seems deliberate.
Some researchers have suggested that production of these five documents was outsourced. In this scenario, they were given generic names so as not to draw attention to them, or to highlight the political content that they contained. We do not necessarily support that theory, but mention it here for consideration.
All Five Documents are Encoded as RTF Files with Cyrillic Settings
This has been noted elsewhere; we will demonstrate it here again for completeness.
An RTF file is simply a text file that contains markup language commands defined by the RTF specification.
Typically, RTF files are saved with an “.rtf” suffix, yet Guccifer 2’s documents end in “.doc”. We think that it was necessary for Guccifer 2 to rename the RTF files to work around a file name restriction imposed by WordPress.com. We use a similar technique; we save tab-separated text files with an “.xls” extension so that WordPress.com will accept them; Excel can open these files, though formatting will need to be applied by the user and ultimately the user may wish to save the file in “.xlsx” format.
The metadata associated with the PDF document that TSG published indicates that its source document was named 1.doc, not the expected 1.rtf form that Word would have chosen (by default) when 1.doc was saved as an RTF file. This suggests that Guccifer 2 had already prepared his documents for upload to his WordPress.com website before sharing them with journalists.
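Because a file’s extension says nothing about its content, a quick way to confirm that a “.doc” file is really RTF is to inspect its first bytes. A minimal sketch; the function names are our own, while the signatures are the standard ones: RTF files begin with `{\rtf`, legacy Word .doc files begin with the OLE2 compound-file header D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP archives beginning with `PK\x03\x04`.

```python
# Leading "magic" bytes of the three formats discussed in this report.
SIGNATURES = {
    b"{\\rtf": "RTF",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "legacy Word .doc (OLE2 compound file)",
    b"PK\x03\x04": "ZIP container (e.g. .docx)",
}

def sniff_format(data: bytes) -> str:
    """Classify a file by its first bytes, ignoring its extension."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"

def sniff_file(path: str) -> str:
    # Eight bytes are enough to cover the longest signature above.
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Note that the ZIP signature matches any ZIP archive, not only .docx files; it is only a first-pass check.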
Since RTF files are specially formatted text files, they can be easily viewed in a text editor.
Above, we see the string “ansicpg1251”; this sets the default “code page” to 1251. If we save the original source document, “12192015Trump Report – for dist.docx”, as an RTF file (with default US language settings) and view the resulting RTF file, we see that it uses code page 1252 instead.
Note: this default code page appears to be derived from the system (computer) language setting. Thus, simply enabling the use of the Russian language in Word is insufficient to generate this particular RTF control word setting.
A code page is defined as follows.
The 1251 code page encodes Cyrillic characters.
The 1252 code page encodes Western Latin (ANSI) characters. It is the encoding used when default English language settings are in force.
If we search Guccifer 2’s RTF documents and the source Word documents pulled from Podesta’s email attachments (saved in RTF format), we see the following.
Above, all five of Guccifer 2’s documents use the default Cyrillic character encoding (1251), whereas the Podesta email attachment documents (saved as RTF with default English language settings) use the expected Western Latin encoding (1252).
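The comparison above amounts to reading the `\ansicpgN` control word from each file’s RTF header. A minimal sketch, assuming the control word appears within the first few kilobytes (as it does in Word’s output); `default_code_page` and `code_page_report` are hypothetical helper names.

```python
import re

# Display names for the two ANSI code pages discussed in this report.
CODE_PAGE_NAMES = {1251: "Cyrillic", 1252: "Western Latin"}

def default_code_page(rtf_text: str):
    """Return N from the first \\ansicpgN control word, or None if absent."""
    match = re.search(r"\\ansicpg(\d+)", rtf_text)
    return int(match.group(1)) if match else None

def code_page_report(path: str) -> str:
    # \ansicpg sits in the RTF header, so the first few kilobytes suffice.
    # RTF is 7-bit text, so decoding as latin-1 cannot fail.
    with open(path, "rb") as f:
        header = f.read(4096).decode("latin-1")
    cp = default_code_page(header)
    return f"{path}: {cp} ({CODE_PAGE_NAMES.get(cp, 'other/unknown')})"
```

Running `code_page_report` over Guccifer 2’s five files and over the re-saved Podesta attachments would be expected to reproduce the 1251 versus 1252 split described above.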
Why did Guccifer 2 Save Those First 5 Documents in RTF Format?
Guccifer 2 may have saved his first batch of Word documents in RTF format to make it obvious that Cyrillic (Russian) character encodings were in force. When researchers initially reviewed these RTF files, they noticed the indications that we discussed above. They were able to quickly assert that Guccifer 2 used a Russian keyboard and to conclude that Guccifer 2 was likely Russian.
As we discuss below, 1.doc may have been saved as an RTF file simply to generate (and embed) the Russian malformed-hyperlink error messages, which later became known as the “Russian fingerprints”. We think that the conversion to RTF caused Word to re-encode each hyperlink into RTF syntax, and this conversion hit a bug in Word 2007; this bug placed empty hyperlink addresses into the final RTF document. When that RTF document was subsequently opened, Word 2007 issued locale-specific error messages when it encountered the empty URL’s.
Guccifer 2 may have saved the other four documents as RTF files simply to follow suit and make the action of saving 1.doc in RTF format not seem overt.
Guccifer 2 Used a Template for the First Three Documents
Guccifer 2’s first three (3) Word documents have an Author value of “Warren Flood”, a Company value of “GSA”, and a Title value of “_TITLE”. We searched the Podesta email attachments looking for files with matching field values. Two files were found, shown below.
A tab-separated file with the results listed above is here.
Based on our analysis, we conclude that the file highlighted in blue is the file that was used as a template. That file is named Slate_-_Domestic_-_USDA_-_2008-12-20.doc. Note that the “last printed” times for both files shown above precede the file create date. This may indicate that a master file was modified and then saved with a unique name (with the date of save encoded in it) as new versions of the file were created. To shorten the description that follows, we will call the “USDA” document the “template”.
Both the template and Guccifer 2’s first three documents have a watermark and a footer. All of Guccifer 2’s first three documents have the same metadata values (mentioned above) as those in the template document. However, in the image below we see that the watermark and footer in the template document differ from that of the final documents.
We observe that the original template document had the word DRAFT in the watermark and a date field in the footer. The template document uses “Century” font for both its main font and its watermark font. However, if we look at 3.doc, we see it uses “Calibri” as its main font – yet its watermark font is “Century” [h/t Adam Carter].
The fact that 3.doc’s watermark font matches the template’s watermark font (“Century”), and yet it is different from its own main font (“Calibri”), supports our conclusion that this template was used to create Guccifer 2’s first three Word documents.
Looking for Flood in All the Right Places
We have two possible template documents: the “ED” and the “USDA” documents mentioned above. Both share the same watermark and footer values, along with their Author, Company, and Title values. We found the most likely template document by analyzing their internal “insert RSID” values and matching them against those in the final documents [h/t Adam Carter]. To view the relevant RSID’s in the two candidate template documents, we first open them in Word and then save them as “RTF” documents; RTF documents are text files which contain RTF markup language commands. We look for RTF commands that define various “insert RSID’s”.
We look for an RSID in the neighborhood of the (“CONFIDENTIAL DRAFT”) watermark in the USDA document.
The closest RSID is 6842998. Next, we search for that RSID in all of Guccifer 2’s first five documents, the two candidate template documents, and also throw in the other five source documents (all saved as RTF).
That is the result we expected: a match on the USDA document and the three Guccifer 2 documents. We try again, but use the relevant insert RSID from the ED document.
This ED document RSID matches only on itself. Therefore, the USDA document (or a version of it) was likely used as a template to create Guccifer 2’s first three Word documents.
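The RSID search above is essentially a text search over the RTF files. A minimal sketch; the helper names are hypothetical, and matching any RSID-bearing control word (`\rsid`, `\insrsid`, `\charrsid`, `\pararsid`, ...) rather than only `\insrsid` is our own assumption. RSIDs appear as decimal numbers in RTF, and the negative lookahead stops, for example, 6842998 from matching 68429981.

```python
import re

def files_with_rsid(texts: dict, rsid: int) -> list:
    """Names of RTF documents in `texts` (name -> RTF text) that mention `rsid`.

    Matches any control word ending in "rsid" (\\rsid, \\insrsid, \\charrsid,
    \\pararsid, ...) whose decimal value is exactly `rsid`.
    """
    pattern = re.compile(r"\\[a-z]*rsid%d(?![0-9])" % rsid)
    return sorted(name for name, text in texts.items() if pattern.search(text))

def load_rtf(paths) -> dict:
    """Read each RTF file as text; latin-1 maps every byte, so nothing fails."""
    texts = {}
    for path in paths:
        with open(path, "rb") as f:
            texts[path] = f.read().decode("latin-1")
    return texts
```

A call such as `files_with_rsid(load_rtf(paths), 6842998)` would list the documents sharing the template’s watermark RSID.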
We have separately confirmed, by pasting content into the template file and then saving it as an RTF file, that the template’s watermark RSID is preserved. This is true even after changing the watermark (removing DRAFT). We were also able to change the footer and remove the page number without altering the watermark RSID. At present, we do not know whether the version of the template used by Guccifer 2 had the DRAFT string and page number removed, or this change was the result of an explicit step taken by Guccifer 2 as he created the three Word documents. Based on other observations, we think that Guccifer 2 explicitly removed the word DRAFT and the date/time field in the footer.
Why was the Template Document used to Create [1-3].doc?
The presence of the “Warren Flood” and “GSA” attributions present in [1-3].doc initially received a lot of attention when the [1-3].doc files were published by Guccifer 2. We have shown that the USDA document was likely used as a template and that this template transferred those metadata items into the final [1-3].doc files.
We see another possible, important, reason for using a mostly empty legacy Word (.doc) file as a template: A copy/paste operation from an RTF document generated by saving the original Trump opposition report as an RTF file (using Word 2007 with Russian language settings) was a necessary step to embed the Russian error messages into the final RTF file (1.doc).
Thus, the use of an intermediate template file justified the copy/paste of the intermediate RTF document into the template document, followed by a final “Save as (RTF)” operation. If we accept that working theory, then the 2.doc and 3.doc files followed suit to draw attention away from the critical additional copy/paste operation that was needed to embed the Russian error messages (the “Russian fingerprints”) into the final 1.doc document.
Why was the Date Field Dropped from the Page Footer?
In the previous discussion, we observed that Guccifer 2 removed the “date field” from the template’s page footer. Below, we offer a possible explanation as to why Guccifer 2 may have decided to delete that date field.
Here is some information about “fields” as they are used in Microsoft Word documents.
When a user inserts a (current) date field into his/her footer line, Word will insert a “date field”. At a detailed level, the format of the date field can be controlled by the user, as described here. That site is a bit dated, but most of the information still applies. Here is an excerpt.
We can find that DATE field in an RTF document as shown below. This RTF file is the template file (USDA) used to create Guccifer 2’s first three documents; the original .doc file was saved as RTF.
In blue, we see the field names (DATE and TIME). In green, we see the localized date format; this format can be changed by the user by editing the field value. The values shown are for a default US installation. The date/time values in red are enclosed as arguments of a “\fldrslt” control word. The RTF specification tells us the following.
Based on some experimentation, the values in red appear to be the local date and time when that file was first saved (as RTF, in this case). Small edits to the text and subsequent saves to a different file name seemed to leave those date/time values unaffected. This makes sense, because \fldrslt is there just to provide a default value.
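The pairing of a field instruction with its cached \fldrslt value can be inspected with a small script. A rough sketch, assuming reasonably well-formed RTF in which each \fldinst group is followed by exactly one \fldrslt group (typical of Word output, but an assumption; nested fields would break the pairing); the helper names are hypothetical, and hex escapes such as \'e8 are left undecoded.

```python
import re

# One RTF token: escaped backslash/brace, hex escape, control word
# (plus its optional delimiting space), \* marker, brace, or plain text.
TOKEN = re.compile(
    r"\\([\\{}])|(\\'[0-9a-fA-F]{2})|\\[a-zA-Z]+-?\d* ?|\\\*|[{}]|([^\\{}]+)"
)

def group_text(rtf: str, pos: int) -> str:
    """Raw RTF from `pos` up to the '}' that closes the group containing `pos`."""
    depth, i = 0, pos
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf) and rtf[i + 1] in "\\{}":
            i += 2  # skip an escaped backslash or brace
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return rtf[pos:i]
            depth -= 1
        i += 1
    return rtf[pos:]

def plain_text(raw: str) -> str:
    """Drop control words and braces; keep text, escaped chars and hex escapes."""
    parts = []
    for m in TOKEN.finditer(raw):
        literal = m.group(1) or m.group(2) or m.group(3)
        if literal:
            parts.append(literal)
    return "".join(parts).strip()

def fields(rtf: str) -> list:
    """Pair each field instruction (\\fldinst) with its cached result (\\fldrslt)."""
    insts = [plain_text(group_text(rtf, m.end())) for m in re.finditer(r"\\fldinst", rtf)]
    results = [plain_text(group_text(rtf, m.end())) for m in re.finditer(r"\\fldrslt", rtf)]
    return list(zip(insts, results))
```

On a DATE field, the first element of each pair holds the (localized) format picture and the second holds the date recorded when the file was saved.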
The syntax of the default (localized) date/time format depends on where and how Word was installed; for example, date and time formats vary by region [Wikipedia].
From this analysis of the RTF DATE/TIME fields, we learned:
- The default DATE/TIME fields will encode the local date/time format. This is localized based on where/how the Word application was installed.
- The RTF documents of interest ([1-3].doc) would have recorded the date and time when the files were initially saved in RTF format, if those fields had been retained in the final versions of those documents.
- The fact that Guccifer 2 removed the DATE/TIME fields from the final documents’ footer suggests that Guccifer 2 may have been aware that those fields in the RTF-formatted documents would leak critical metadata. It is also possible that Guccifer 2 did not want the date/time to appear in PDF files, should he communicate them to others, or should they in turn print the original .doc (RTF) files.
What is this Grizzly Doing in my Document?
As we saw above in the metadata tabulation for Guccifer 2’s Word documents, one of the documents (4.doc) had its Company name set to “Grizli777”, and its “last saved by” user as “user”; yet, in the source document the Company field was blank and the “last saved by” user was spelled as “User” (the “U” is capitalized). One researcher [@_fl01] was quick to notice this.
Mr. Wagner is right: Grizli777 shows up in bootleg copies of Microsoft Office [h/t Adam Carter].
We note in passing that any computer forensics expert who came up through the ranks, starting as a hacker in their misspent teen years, would have quickly noticed Grizli777 as an indication that the document may have been generated on a system where cracked software was installed. Although Wagner suggests that this cracked software is popular with Russians and Romanians, it is more accurate to say that cracked software is popular with hackers (and many others) worldwide. Nevertheless, a forensics expert might view this cracked software as an indication that the system where 4.doc was generated was used by a hacker, as Florian did.
Does Grizli777 Also Hack Elections?
Did Grizli777 give up cracking software and then take up hacking elections? Perhaps instead, this unlucky author added his “Company Name” to the cover page? Is he Russian or Romanian? It doesn’t seem so.
Guccifer 2’s 4.doc is an Outlier of Sorts
As we can see from the metadata, 4.doc is a bit of an outlier.
- It was created an hour earlier than the other four documents.
- The “last saved by” field was not changed to “Феликс Эдмундович” as it was for the other four documents. Rather, it was changed from “User” to “user”, and the Company name was set to “Grizli777”.
- The source document for 4.doc relates back to a document created during the Obama administration (2008). The “last printed” date from the original source document was preserved and appears in the final document. This helps confirm that some version of this particular document was in fact the same source document.
- Guccifer 1 disclosed (via TSG) the 4.doc source document (as a PDF with an enhanced color scheme and unusual font) back in 2013.
- This string, “CONFIDENTIAL DRAFT FOR REVIEW — 9/4/08” was removed from the source document page header; the word “SECRET” was added. See the comparison below.
- The original Guccifer 1 disclosure (2013) left the “CONFIDENTIAL DRAFT …” line intact and did not add “SECRET”.
- If we assume that the insertion of the “Grizli777” company name was deliberate, then the addition of “SECRET” to the page header (with no other changes) served as justification for an additional save operation. This save operation set the Company name field to “Grizli777”, leaving behind an indication that a cracked version of Word 2007 was used to generate 4.doc.
Follow the Russian Fingerprints – Error Messages in Cyrillic
Ars Technica noticed early on that there were Russian language error messages in the PDF file posted by Gawker. This file was generated when the document (1.doc) was printed to PDF. Here is an excerpt from Ars Technica (emphasis added to highlight the error messages).
Various researchers took a closer look at those error messages, trying to understand how they might have been generated. IVN reported on this, and had the following to say.
We can confirm that the URL in question is quite long by consulting the original source document and then floating the cursor over the offending hyperlink.
However, as we detail below, we do not think the bug in Word is triggered by long URL’s, and we could not reproduce this error using Word 2010 and long URL’s.
IVN’s claim above, that “the original leaked Trump dossier contains the same error message but written in English!” is incorrect. IVN is in fact referring to Guccifer 2’s doctored version (1.doc) of the original source document (which can be found in the Podesta email collection). The original source document does not contain this error (because the error is generated [we think] when the original source document is saved in RTF format, using Microsoft Word 2007). On this point, IVN is both confused and wrong.
If 1.doc is opened in Microsoft Word with default English language settings, IVN is correct that the error message will appear in English; the same is true if the document is then printed to PDF. However, if we enable Russian language settings in Word, we see the error message appears in its Russian equivalent.
Ars Technica suggests that the Russian error messages only appear in the generated PDF file.
Based upon our analysis, we think that Ars Technica saw the error messages in English when they opened 1.doc in Word, but they also saw the error messages in Cyrillic (Russian) when they viewed Gawker’s PDF. They concluded that the appearance of these Russian error messages had something to do with the process of printing to PDF. In a separate report, we will explain how/why Ars Technica reached this (incorrect) conclusion.
The Not So Common Hyperlink Errors
We return to something that IVN said, below.
We tried to find information on the Internet that might confirm IVN’s specific claims above. We spent a substantial amount of time trying to replicate the error using the scenario described above. Our efforts were unsuccessful. In our experiments, we tried opening the original source document (the 200-page Trump opposition report) [Wikileaks] in Word 2010 (unpatched) and then saving that file as an RTF file. We tried the same thing with Russian settings in force. We tried several other experiments. In no permutation that we tried were we able to generate the error messages.
That is obscure. Let’s take another look at the problematic URL. Bingo.
Enter the Grizzly
Word 2007? That is really old (and unsupported). It sounds familiar though.
Recall that 4.doc had metadata that referenced Grizli777. From this observation, we might conclude that Guccifer 2 used Office 2007.
One fly in the ointment is that 4.doc has the cracked Word 2007 indications, but 1.doc is the file that has embedded Russian error messages, which were generated when its source document hit a bug in Word 2007. The altered metadata on [1-3,5].doc would obscure any indications of cracked software that might otherwise have been present. In spite of this divergence, our experiments confirm that Office 2007 was used to create 1.doc in its final form.
Is the Trump Opposition Report Special?
Did it not seem a little unusual for Guccifer 2 to choose the Trump opposition report as his first disclosure? Was it chosen for its URL’s with spaces in them and its ability to trigger the invalid hyperlink error when the source document was saved as an RTF file? Will we find other Word documents in the Podesta email attachments that contain these problematic URL’s (with %20’s in them)?
We searched the (2000+) Word (.docx) documents in the Podesta email collection and found only four documents with the problematic URL’s. They were all attached to a single email [Wikileaks].
Out of this set of four documents, only the Trump opposition report would have been noteworthy as a disclosure.
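A search like the one described above can be scripted against the .docx attachments. A minimal sketch, not necessarily the procedure we used: it assumes that external hyperlinks are stored as relationships in word/_rels/document.xml.rels (the standard Office Open XML location) and that the percent-encoded spaces appear verbatim in each relationship’s Target; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

RELS_PART = "word/_rels/document.xml.rels"
HYPERLINK_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/hyperlink")

def hyperlink_targets(docx) -> list:
    """External hyperlink targets stored in a .docx (path or binary file object)."""
    with zipfile.ZipFile(docx) as zf:
        if RELS_PART not in zf.namelist():
            return []
        root = ET.fromstring(zf.read(RELS_PART))
    # Each child is a <Relationship Id=... Type=... Target=...> element.
    return [rel.get("Target") for rel in root if rel.get("Type") == HYPERLINK_TYPE]

def has_encoded_spaces(docx) -> bool:
    """True if any hyperlink target contains a percent-encoded space (%20)."""
    return any("%20" in (target or "") for target in hyperlink_targets(docx))
```

Looping `has_encoded_spaces` over a directory of downloaded attachments (for example with `glob.glob`) would list the candidate documents.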
Desperately Seeking Ivan Inside an RTF File
We know that Guccifer 2’s first five Word documents are encoded as RTF files, yet he named them with .doc suffixes (the legacy Word document suffix). As we will show below, if we view 1.doc in a text editor we can spot the Russian error message text. This is a surprising result – most researchers (this one included) would not think it likely/possible that a .doc file would have Word error messages embedded inside it.
With a keen eye and steady nerves, we open up 1.doc in a text editor (RTF files are specially formatted text files) and hunt down the Russian error message.
Above, in blue, we see the HYPERLINK field name. Normally a quoted hyperlink address would follow it; here, only the “\h” switch follows, so the address is empty and the field is malformed. In red, we see langnp1049. Today, an Internet search on that string will turn up many pages saying that it is a “Russian fingerprint” that Guccifer 2 unwittingly left behind. The RTF specification tells us that langnp<number> indicates that the following text (here, encoded as hex bytes) should be interpreted according to the language associated with <number>. In this case, language 1049 is Russian.
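Both observations, an address-less HYPERLINK instruction and a `\langnp1049` tag, can be located with simple pattern matching. A rough sketch; the helper names are hypothetical, and it assumes each HYPERLINK instruction sits in a single brace group rather than being split across several (Word does not guarantee this). The language identifiers are standard Windows LCIDs.

```python
import re
from collections import Counter

# Windows language identifiers (LCIDs) relevant to this report.
LCID_NAMES = {1033: "English (United States)", 1049: "Russian"}

def hyperlink_arguments(rtf: str) -> list:
    """Text following each HYPERLINK field name, up to the next brace."""
    return re.findall(r"HYPERLINK([^{}]*)", rtf)

def is_empty_hyperlink(arguments: str) -> bool:
    # A well-formed HYPERLINK instruction carries a quoted address;
    # an empty one is left with switches such as \h only.
    return '"' not in arguments

def language_ids(rtf: str) -> Counter:
    """Count \\langnpN control words, keyed by language name where known."""
    return Counter(LCID_NAMES.get(int(n), f"LCID {n}")
                   for n in re.findall(r"\\langnp(\d+)", rtf))
```

Applied to 1.doc, one would expect the malformed hyperlinks to show up as empty instructions and the embedded error text to be tagged with LCID 1049.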
The encoded text follows a \fldrslt control word, which was mentioned earlier in the section that discussed the DATE/TIME field. Its definition from the RTF specification is below.
We can extract the RTF encoded hex bytes and then convert them into a normal string of hex bytes (changing the codes for special characters into their ASCII hex encoding); then (using “xxd -r -p”) we convert that hex string into a series of binary bytes, which we then translate from the Russian “code page” (1251, as discussed earlier) into UTF-8 characters, as shown below.
That is our now familiar Cyrillic error message. Google confirms this, below.
Armed with our new knowledge that these Russian error messages are encoded in RTF syntax inside Guccifer 2’s 1.doc document, we can search for that error message in the other four documents disclosed by Guccifer 2 (and throw in the converted source documents).
As we suspected, only 1.doc has this embedded Cyrillic error message. Only the RTF file format will embed Russian error messages inside the file in this way. If the document had been in regular .doc format, there would be no Russian error message inside of it and no Russian language error message indications would have surfaced.
Based on the discussion above, we ask: Was 1.doc deliberately saved in RTF file format with the intent to disclose the “Russian fingerprints”?
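The extraction and search steps described in this section can be sketched in Python. This is not our actual pipeline (which used xxd -r -p followed by a code-page conversion); the function names are hypothetical, and the search assumes the phrase is stored as consecutive \'hh escapes with no intervening line break or control word, which Word does not guarantee.

```python
import re

# Tokens inside an RTF text run: hex escape, escaped literal (\\ \{ \}),
# control word (with its optional delimiting space), brace, or one plain char.
TOKEN = re.compile(r"\\'([0-9a-fA-F]{2})|\\([\\{}])|\\[a-zA-Z]+-?\d* ?|[{}]|([^\\{}])")

def decode_rtf_run(fragment: str, codepage: int = 1251) -> str:
    """Rebuild the bytes behind an RTF text run and decode them with `codepage`."""
    data = bytearray()
    for m in TOKEN.finditer(fragment):
        hex_byte, escaped, plain = m.groups()
        if hex_byte is not None:
            data.append(int(hex_byte, 16))          # \'hh -> one raw byte
        elif escaped is not None:
            data.extend(escaped.encode("ascii"))    # escaped backslash/brace
        elif plain is not None and plain not in "\r\n":
            data.extend(plain.encode("ascii"))      # RTF line breaks are not text
        # control words and braces contribute no text
    return data.decode(f"cp{codepage}")

def rtf_hex_escape(text: str, codepage: int = 1251) -> str:
    """Inverse direction: write non-ASCII bytes as \\'hh, as an RTF writer would."""
    return "".join(f"\\'{b:02x}" if b >= 0x80 else chr(b)
                   for b in text.encode(f"cp{codepage}"))

def files_with_phrase(texts: dict, phrase: str, codepage: int = 1251) -> list:
    """Names of RTF documents (name -> raw text) that contain the escaped `phrase`."""
    needle = rtf_hex_escape(phrase, codepage).lower()
    return sorted(name for name, text in texts.items() if needle in text.lower())
```

For example, `files_with_phrase(texts, "Ошибка")` (“Error”), with `texts` mapping each file name to its raw RTF text, would report which files contain that word in cp1251-escaped form.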
How did those Russian Error Messages end up in 1.doc?
We think that (at least) the following conditions were necessary for the Russian error messages to end up inside 1.doc.
- A Word document (the “Trump opposition report”) was chosen as the source document; it is unique in its ability to trigger the invalid hyperlink errors and to have content that is somewhat relevant to the Trump campaign.
- Word 2007 was used.
- Russian language settings were enabled in both Windows and Word.
- After opening the original “Trump opposition report”, the user had to confirm twice that he understands that the document has problems and to direct Word to attempt a recovery.
- The Word document (.docx) was initially saved as an intermediate RTF file. When this intermediate RTF file is closed and re-opened, the Russian error messages will be displayed. They appear in Russian, because the current language environment is set to Russian.
- Text from this intermediate RTF file is then copied and pasted into an empty Word document. In Guccifer 2’s case, this empty document is a template document that has had its original body text removed.
- This new document with the copied material will be saved again as an RTF file. This final RTF file will become 1.doc; it will be populated with embedded Russian error messages (“Russian fingerprints”).
We demonstrate this progression, below. First, we open the original Trump opposition (.docx) document. We see that Word 2007 warns us that something appears to be amiss.
After we say “OK”, Word asks whether we would like it to attempt a fix.
After we say “Yes”, we can go to the end of the document to see if any error messages have been generated and to query the specifics of the problematic hyperlinks (we have not saved and closed the document, yet).
When we save this file in RTF format and then re-open it, the error messages will appear (in English here, due to our default US settings). Word 2007 does not like those empty URL links.
If we create a new empty document in Word and then simply copy/paste the text from the initially saved intermediate RTF file, then save this new document as an RTF file, we find that the error messages have been embedded into the resulting RTF file; they have replaced the “VIDEO” display string. This is shown below.
Above, in green, we see that “VIDEO” has been replaced with the invalid hyperlink error message. If we had selected Russian language settings, these error messages would have been the now familiar “Russian fingerprint” messages. The specifics for 1.doc may differ slightly from the steps shown above, but this example demonstrates the basic idea.