Metadata: The Ghosts Haunting e-Documents
Metadata is “data about data.”2 Although it sounds quite modern, one form of metadata is no doubt familiar to every lawyer: The “fax band” on a document received by facsimile that shows the time and date the fax was received, the number from which it came, and the number of pages sent. A fax band is metadata since it is data about data. And even this simple form of metadata may be important. It could show that a party’s claim that she did not receive a document on a certain date is incorrect.
Metadata is not new, but it has become pervasive in the digital world in which lawyers (and their clients) live. Many programs commonly used in the office create data about data and then save that unseen information along with the visible text of the document in a single file. Put simply, “invisible fax bands” commonly accompany many of the electronic documents we create on a daily basis. This unseen information is typically transferred along with the document in which it is embedded unless removed prior to transmission. Generally, each time a file is transmitted, the invisible fax bands are also sent.
But rather than simply revealing seemingly innocuous information, such as the time and date the file was prepared, metadata often reveals much, much more. For example, many software programs permit an author to track changes to the text, to retain multiple levels of undo in case the author later decides to reverse revisions made long ago, or even to insert invisible comments into the file. Such data could reveal a wealth of information to recipients of the electronic file, potentially affecting significant negotiation positions, litigation strategies, and numerous other sensitive scenarios.
Recently, a lawyer relayed a story to one of the co-authors that demonstrates the risks of exchanging files with embedded data. The lawyer was negotiating a contract with a well-known software maker that, for purposes of this article, will be called “Mercer.” During negotiations, the lawyers for each side used a common word processing program, Microsoft Word, to edit and propose revisions to the contract, and they utilized the program’s track changes feature to allow the lawyers to see the specific changes proposed. They e-mailed the electronic draft, complete with embedded data, back and forth to each other between rounds of revisions. After receiving one such draft from Mercer’s counsel, the lawyer made a few easy mouse clicks to reveal, without using anything but Microsoft Word’s built-in functions, hidden internal comments from Mercer’s business personnel concerning terms of the contract, negotiating positions, and bottom lines. Had Mercer subsequently insisted that a noncompete clause would be needed to close the deal, the opposing lawyer would have been able to tell whether the demand was simply a negotiating ruse. Clearly, metadata is an important consideration in today’s legal environment.
This article explains how metadata is created and embedded in some popular programs and analyzes the ethical obligations to remove this embedded material from documents lawyers create on their clients’ behalf. Did Mercer’s lawyers, for example, violate duties to their client by sending embedded data along with the text of the contract to opposing counsel? This article also provides a number of useful tips on how lawyers can remove metadata from documents created in some of the more popular office programs and avoid situations similar to those suffered by Mercer in their own practice.
The final portion of this article analyzes the duties of a lawyer who receives a file containing embedded data that reveals confidential or privileged information of an opposing party. Is that lawyer bound by the same obligations that apply when documents in a misaddressed envelope are received or, conversely, is the lawyer free to use and review the embedded information?
The Purpose of Metadata
Software does not embed data into documents to cause disclosure of confidential information. While the type and amount of embedded data will vary based upon the particular program used, the primary function of metadata is utilitarian: It is designed to help users revise, organize, and access electronically-created files. Typical metadata includes, for example, information about the person who authored the document and the location (drive, folder) where the file was saved. In addition, a file can include metadata records of past revisions. As a result, one can examine changes that have been made to a file and compare them visually to any hand-written revisions to ensure that they have, in fact, been made. Thus, embedded data may serve useful and legitimate purposes.
Metadata in Microsoft Word
Microsoft Word is a “ubiquitous” software program.3 Lawyers everywhere commonly use it to create documents, and these files are regularly e-mailed in electronic form to clients, third parties, and opposing counsel. Unfortunately in some respects, embedded data is prevalent in Word, and the risk in electronically transferring sensitive metadata through Word files is substantial. The following illustrates embedded information typically found in Word documents:
• File Properties Information — Basic metadata in a pre-2007 version of Word can be seen by reviewing the different menus available.4 A key location is in the “Properties” subset menu, located within the “File” menu. The “Properties” for a particular document may reveal the author, creation dates, and other information. For example, this particular article (as of about halfway through the writing process) contained the following information under File/Properties:
The metadata on that single screen alone reveals that the file was created in August and was still being worked on in October 2005. It also reveals that the document was in its 44th revision (meaning it had been opened and closed 44 times) and had been edited for a total of 205 minutes.5 Had this document been work product for a client and had the author transmitted the file to the client in electronic form, the client would have been able to access this metadata to tell whether the lawyer had worked on the document for as long as indicated in the lawyer’s fee statement. If it had been a report prepared by an expert witness sent to opposing counsel, the attorney could have discerned how long the expert had spent drafting the report. If it had been a brief prepared by an undisclosed attorney and forwarded to opposing counsel, the author’s identity could have been revealed.6 Metadata matters.7
• Track Changes Feature — More troubling than the basic metadata found in the File/Properties screen is the other unseen data that can accompany a Word file. Foremost, “track changes” is a feature within Word that creates a record of every change made to a document. It has many uses: Lawyers who exchange drafts of contracts, as mentioned in the introduction, can turn on this feature to allow prior revisions of a proposed contract to be reviewed during negotiations; word processing personnel may enable “track changes” so that they can review and ensure that they have made each handwritten edit desired by a lawyer; and the list goes on.8
Complications may occur, however, when the author or editor of the document is unaware that the “track changes” feature is on. Such unawareness may be commonplace because, depending on the settings of the program, Word may not actually display the tracked changes on screen. In such a case, the user must enable a specific option to view those changes. For example, this paragraph was written with the track changes feature enabled.9 What you are reading now is the way the paragraph looked when we were finished editing it (i.e., even though “track changes” was turned on, Word did not reveal those tracked revisions on-screen.) Here, though, is what this paragraph looked like when the option to view tracked changes was enabled:
Someone who received the file by e-mail could easily reveal the changes and see the revisions shown above. If this document had been a contract instead of the present article, the metadata could have revealed to an opposing party the negotiator’s mental process in working through revisions previously made to key proposed terms.10 Such information could be valuable to the opposing party in formulating its strategy.
• Fast Saves Feature — Another form of embedded Word data is created by the use of “Fast Saves.” This feature enables the user to quickly save the document without having to take the time to perform a full save. However, fast saves only append the changes to the end of the document file rather than replacing the actual edited material. In other words, fast saved documents may retain information that the author believes was deleted. When fast saves are enabled, “deleted information remains hidden within the document.”11 Opposing counsel who receives a file that has been created with fast saves enabled can easily open the document and recover the previous revisions.12
• Comments Feature — Embedded data can also be found in Word documents in the form of “Comments.” The comments feature is incredibly useful for collaboration. Comments are embedded within the file and accompany it when it is e-mailed. Like track changes, the “comments” feature of Word can leave hidden data within an electronic document that may be valuable to opposing counsel.
• Versions Feature — A final example of a type of hidden metadata in Word is created by the software’s “Versions” feature. If versions is enabled, each time the file is saved, a new version is created and stored, leaving prior versions of the document intact.13 Once again, if the file is transmitted to an opposing party, he or she could review every prior version of the document to see what changes had been made.14
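The kinds of hidden data listed above can also be read without Word's own menus, which is part of why they matter. The Python sketch below illustrates this under an assumption that does not come from this article: that the file is saved in the newer .docx format introduced with Office 2007, which is an ordinary zip archive of XML parts, rather than the older binary .doc format discussed here. The function name summarize_docx_metadata is hypothetical, and the sketch reads only a handful of fields. It does not address fast saves or the Versions feature, which belong to the older format.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by the .docx package parts read below.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def _read_xml(archive, name):
    """Return the parsed XML root of a package part, or None if it is absent."""
    if name not in archive.namelist():
        return None
    return ET.fromstring(archive.read(name))


def _text(root, path):
    """Return the text of the first child matching path, or None."""
    if root is None:
        return None
    node = root.find(path, NS)
    return node.text if node is not None else None


def _count(root, local_name):
    """Count elements with the given WordprocessingML tag name."""
    if root is None:
        return 0
    return len(list(root.iter("{%s}%s" % (NS["w"], local_name))))


def summarize_docx_metadata(path):
    """Hypothetical helper: report file properties, tracked changes, comments."""
    with zipfile.ZipFile(path) as archive:
        core = _read_xml(archive, "docProps/core.xml")
        app = _read_xml(archive, "docProps/app.xml")
        body = _read_xml(archive, "word/document.xml")
        comments = _read_xml(archive, "word/comments.xml")
    return {
        "author": _text(core, "dc:creator"),
        "last_modified_by": _text(core, "cp:lastModifiedBy"),
        "revision": _text(core, "cp:revision"),
        "created": _text(core, "dcterms:created"),
        "modified": _text(core, "dcterms:modified"),
        "total_editing_minutes": _text(app, "ep:TotalTime"),
        "tracked_insertions": _count(body, "ins"),
        "tracked_deletions": _count(body, "del"),
        "comments": _count(comments, "comment"),
    }
```

Calling summarize_docx_metadata("draft.docx") on a hypothetical file returns a dictionary of these values. A nonzero count of tracked insertions, tracked deletions, or comments suggests that the file carries the kind of embedded data described above, whether or not Word is displaying it on screen.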
The Duty to Avoid Disclosing Embedded Confidential Information
All of the listed features from Word are useful to lawyers or their word processing personnel. Lawyers and staff alike need to be aware, however, that these tools embed hidden data within the file. Further, lawyers need to recognize that their word processing staff may enable certain features without the lawyer’s knowledge.15 For example, if a lawyer is unaware that her secretary had enabled “Track Changes,” and if the secretary failed to appreciate the problems created by transmitting the file with tracked changes still embedded, the file could reach opposing counsel with every prior revision intact.
Risk of unintended disclosure has, of course, always existed, just in a different form. Not too long ago, the primary risk was that a letter intended for a client would instead be mailed to opposing counsel.16 Similarly, a lawyer might have made handwritten comments on a contract proposal drafted by the other side, and, though intending to forward the document to the client for review, inadvertently mailed or faxed it to opposing counsel.
In the digital age, however, new methods for creating, editing, and transmitting documents have increased the risk of unintended disclosures. As discussed above, electronic files may now reveal more information than drafts from the past — they “can reveal a cache of information, including the names of everyone who has worked on. . . a specific document, text and comments that have been deleted, and different drafts of the document.”17 Because of the inherent dangers involved with transmitting metadata, it is important to discuss what professional duties lawyers owe their clients to safeguard this information.
To aid this discussion, we emphasize the distinction between confidential information, that which a lawyer has a professional duty to keep in confidence, and information that is privileged. The attorney-client privilege protects against forced disclosure of communications between lawyer and client.18 The privilege is a qualified one, however, because only confidential communications between the attorney and client are protected. The privilege does not apply to information learned by the lawyer from third parties or even to the lawyer’s conversations with the client conducted in the presence of others.19
While the attorney-client privilege protects against the forced disclosure of communications, lawyers themselves are restricted from disclosing “confidential information” unless authorized to do so by their client or by judicial authority. The confidential information covered by this duty is far broader than that covered by the attorney-client privilege, encompassing “all information relating to the representation, whatever its source.”20 Given this broad definition, there is a substantial risk that metadata transmitted by an attorney to a third party will contain confidential information. Accordingly, a lawyer who knows a document contains embedded information generally has a duty to remove it before transmission.
But what about a lawyer who unknowingly transmits a document with embedded confidential information? Has that lawyer violated the duty of confidentiality? Some may argue that because “everyone knows” about metadata, any lawyer who fails to remove hidden confidential information has breached his or her professional duty.21 In the authors’ experience, though, the opposite is true: The vast majority of attorneys canvassed about this issue had never heard of metadata, let alone understood how to deal with it. In further support of this less-than-scientific observation, documents that contain embedded data have routinely shown up on the Web — some were even posted by large firm lawyers who ostensibly should be the most educated about embedded data.
In any event, the existence of metadata and the dangers it presents for unintended disclosure are becoming more widely known and appreciated. Lawyers will soon, if the time has not already arrived, be unable to avoid negligence claims or defend against bar complaints by pleading ignorance of the risks that embedded information creates. Attorneys should make every effort to prevent transmission of confidential information. A few simple methods to aid in this effort are detailed in the following section.
How to Avoid Creating Embedded Data and How to Remove It
Several approaches addressing the prevention of inadvertent transmission of metadata are available. This section provides a brief summary of these methods.22
• Avoid Creating Embedded Data — Obviously, the easiest way to avoid disclosure of embedded confidential information is not to create it in the first place. But simply saving a file to a hard drive or networked drive may cause the file to retain information about the computer or network to which it is linked. To ensure that this location is not included in any file sent to a third party, attorneys should re-save each document to a floppy disk, the desktop, or a flash drive using “Save As” and send this copy to opposing counsel.
Beyond this simple tip, Microsoft and other developers have recognized the importance of maintaining the confidentiality of metadata in certain situations and have, in response, provided users with in-program options allowing them to alter the types and amount of embedded information that will be stored in their documents. The following describes simple measures lawyers can take to avoid creating or to limit the creation of embedded data when using some of the more commonly used office programs:
Microsoft Word 2003 — Under the “Tools” menu, select “Options” and click on the “Security” tab. The resulting dialog box allows the user to encrypt the file, edit privacy options, and change the level of macro security. Checking the box “remove personal information from file properties on save” prevents the personal information associated with your computer, network, or software registration from attaching to the document. Thus, this option should be selected when the lawyer works on any potentially sensitive documents in Word that may be transmitted to outside parties.
Other information, such as the author of the document, contained in the “Summary” tab under “Properties” within the “File” menu, may also be considered sensitive and inappropriate for opposing counsel to view. The lawyer can remove any of the offending information from the document by simply deleting the entries in the text boxes and clicking “OK” to save his or her revisions.23
As noted, use of the “Fast Saves” feature of Word can leave hidden data in the document. To turn off fast saves, go to the “Tools” menu, select “Options,” and click on the “Save” tab. Under the “Save” tab, ensure that the “allow fast saves” box is not selected.24
As also noted, Word allows users to save multiple versions of the same document, thus increasing the risk of unintended disclosure of information contained in earlier versions. To determine whether any older versions of a file exist, go to the “File” menu and click on “Versions.” Any old versions attached to the document will be listed by the date/time and creator of the saved version. To remove a version, simply click on the offending entry and select “Delete.”25
Microsoft PowerPoint 2003 — Similar to Word, Microsoft PowerPoint will track, via normally hidden metadata, personal information such as the identity of the author of the document. To remove this metadata from a PowerPoint file, go to the “Tools” menu and select “Options.” Under the “Security” tab, ensure that “remove personal information from file properties on save” is checked.26 To delete the user name and initials associated with the file, click on the “General” tab in this same submenu. From here, the user can simply highlight and delete the unwanted information.27
Finally, it is important to note that PowerPoint documents often contain embedded files from other programs which may, in turn, contain their own metadata. To ensure that the embedded objects are metadata free, right click the object to be embedded and select “cut.” From there, select the desired slide, go to the “Edit” menu and select “Paste Special.”28 This newly created image will be free from sensitive information concerning its source.
Microsoft Excel 2003 — Many of the same processes used to eliminate metadata from Word and PowerPoint files can also be used to eliminate personal data from Microsoft Excel. However, Excel presents several unique methods for retrieving personal data that attorneys should be aware of prior to sending workbook files to opposing counsel. For instance, in Excel, users have the ability to hide individual rows or columns of cells from view. To view these hidden cells, hit Ctrl+Shift+Space Bar to select all of the cells in the worksheet, then go to the “Format” menu and find the submenu for “Row.” Under this submenu, select “Unhide.” Repeat this process for the “Column” and “Sheet” submenus. This should make all hidden cells and sheets visible and capable of being deleted if the information contained therein is found to be confidential.29
Excel users can also link formulas between multiple workbooks. Though a useful tool, these formulas may contain metadata concerning the documents to which they are linked. To remove this potentially sensitive data, highlight the linking formula, right click, and select “Copy.” Following this, go to the “Edit” menu and click “Paste Special,” select “Values,” and click “OK.” Note that this will result in the formula being deleted from the document; however, the resulting data will remain in the workbook.30
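The hidden rows, columns, and sheets described above can also be located programmatically before a workbook is sent. The following Python sketch assumes, beyond what this article covers, that the workbook is saved in the newer .xlsx format (a zip archive of XML parts) rather than the Excel 2003 binary format. The function name find_hidden_excel_content is hypothetical, and worksheets are reported by their internal part names rather than their tab names. Rows hidden by a filter may be reported as well.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main spreadsheet XML parts in an .xlsx package.
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
HIDDEN_FLAGS = ("1", "true")


def find_hidden_excel_content(path):
    """Hypothetical helper: list hidden sheets, rows, and columns."""
    report = {"hidden_sheets": [], "hidden_rows": {}, "hidden_columns": {}}
    with zipfile.ZipFile(path) as archive:
        # Sheet visibility is recorded in the workbook part.
        workbook = ET.fromstring(archive.read("xl/workbook.xml"))
        for sheet in workbook.iter(MAIN + "sheet"):
            if sheet.get("state") in ("hidden", "veryHidden"):
                report["hidden_sheets"].append(sheet.get("name"))

        # Row and column visibility is recorded in each worksheet part.
        for name in sorted(archive.namelist()):
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            rows = [
                row.get("r")
                for row in root.iter(MAIN + "row")
                if row.get("hidden") in HIDDEN_FLAGS
            ]
            # Each <col> element covers a range of column numbers (min to max).
            columns = [
                (col.get("min"), col.get("max"))
                for col in root.iter(MAIN + "col")
                if col.get("hidden") in HIDDEN_FLAGS
            ]
            if rows:
                report["hidden_rows"][name] = rows
            if columns:
                report["hidden_columns"][name] = columns
    return report
```

An empty report does not prove that a workbook is free of sensitive data. Linked formulas, comments, and file properties are outside the scope of this sketch.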
• Removing Embedded Data Before Transmitting — While the above methods can help reduce the amount of metadata created and stored in electronic files, attorneys should also consider taking additional precautions to remove any other embedded information that has made its way into a file before transmission.31 There are a number of methods to accomplish this task. Because reasonable care is necessary to satisfy the lawyer’s duty of confidentiality, the nature of the communication at issue will indicate what steps are required for particular communications or practices.
Large software makers know about the problems that unintentional transmission of metadata can create for lawyers and have updated their programs with additional functionality to help users avoid creating or transmitting this embedded data and to find and remove it. Many lawyers still use a pre-2007 version of the Microsoft Office suite. For those versions, Microsoft created a free add-in, which can be downloaded from the company’s Web site. It is designed to eliminate most sensitive information from documents created in Microsoft Office programs, even when the document was drafted with a metadata-creating feature turned on.32 (Remember: Metadata has utility!) Installation of this add-in will create an additional option to “Remove Hidden Data” within the “File” menu in your Microsoft Office programs. After selecting this option, the user will be asked to enter a file name for what will become the clean version of the document. Once a name is provided, the user clicks “Next” to start the scan.
When the scan is complete, a text file will open that contains a summary of the scanning results.33 The end result is an effective, easy, and free solution to the problem of metadata transmission via Microsoft Office documents — you just have to remember to use it!
Saving a document in Portable Document Format (pdf) will also reduce the amount of metadata stored in the file. But this process does not eliminate metadata entirely.34 For many purposes, simply saving a document in pdf format may suffice. But pdf files often cannot be easily modified, thereby reducing the efficiency and functionality of document exchanges.
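To make the point about pdf metadata concrete, the Python sketch below searches the raw bytes of a pdf file for a few of the standard document information entries, such as the author and the producing program. It is a deliberately crude illustration under stated assumptions: the entries are stored as plain, uncompressed literal strings without nested parentheses, and the file is not encrypted. The function name find_pdf_info_entries is hypothetical. Finding nothing with this sketch does not show that a file is clean.

```python
import re

# A few standard keys of the pdf document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Creator", b"Producer",
             b"CreationDate", b"ModDate")


def find_pdf_info_entries(pdf_bytes):
    """Hypothetical helper: return info entries stored as literal strings."""
    found = {}
    for key in INFO_KEYS:
        # Match "/Key (value)", allowing backslash-escaped characters
        # inside the parentheses.
        pattern = rb"/" + key + rb"\s*\(((?:\\.|[^\\)])*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key.decode("ascii")] = match.group(1).decode("latin-1")
    return found
```

A lawyer or staff member could read a hypothetical file with open("brief.pdf", "rb") and pass its contents to the function. An author name or software name in the result confirms that converting to pdf changed the metadata rather than eliminating it.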
Additionally, several commercial software “scrubbers” are available for purchase.35 While these programs have differing degrees of functionality and integration with other software (such as Microsoft Outlook), they can all be used to scan files before they are transmitted and remove the embedded metadata.
Microsoft Office 2007 — The newest edition of the Microsoft Office suite of applications has responded to user demand for metadata removal by including its own “Document Inspector” and removal tool in Microsoft Word, Excel, and PowerPoint 2007. Similar to the 2003 add-in, this document inspection tool is a quick and easy solution to the problem of metadata removal. To remove the unwanted information, first click on the Microsoft Office button and select “Prepare” from the drop down menu.36 After highlighting “Prepare,” select “Inspect Document” from the menu that appears to the right.37
The Document Inspector window will open, at which point you may deselect any types of metadata that you do not want the inspection to search for. After making this determination, press the “Inspect” button.38
After inspection, a results window will appear, revealing which categories of metadata are present in your document and giving you the opportunity to individually remove each category.
Categories of metadata that are present within the document will appear with a red exclamation mark to the left of the category description and a “Remove All” button to the right. Simply press the “Remove All” button to delete that category of metadata from the document. It is important to note that once metadata is removed in this manner, or using the add-in tool for Microsoft Office 2003, it cannot be retrieved. A copy of the document containing any metadata that you do not wish to lose should be saved under a separate file name prior to removal and kept for your own reference.39
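The advice to keep the original and send a cleaned copy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a substitute for the Document Inspector: it assumes an Office 2007 file (.docx, .xlsx, or .pptx, each a zip archive of XML parts), and it blanks only the author and last-modified-by names in the file properties. Tracked changes, comments, and other embedded data are left untouched. The function name write_copy_without_author_names is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Fully qualified tag names of the personal fields to blank.
PERSONAL_FIELDS = ("{%s}creator" % DC, "{%s}lastModifiedBy" % CP)


def write_copy_without_author_names(source, destination):
    """Hypothetical helper: copy an Office 2007 file, blanking author names.

    The source file is never modified, so the original metadata is kept.
    """
    # Keep the conventional prefixes when the properties part is rewritten.
    ET.register_namespace("cp", CP)
    ET.register_namespace("dc", DC)
    ET.register_namespace("dcterms", DCTERMS)

    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(destination, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for field in PERSONAL_FIELDS:
                    for node in root.iter(field):
                        node.text = ""
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            # Every other part is copied unchanged.
            dst.writestr(item, data)
```

Because the function writes to a separate destination, the lawyer retains the unaltered file for reference, consistent with the caution above that removed metadata cannot be retrieved.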
• Unintended Disclosure Agreements — A final, less technical way to avoid problems with embedded data is to have an agreement in place with opposing counsel by which the parties acknowledge beforehand that any transmission of confidential embedded data is unintentional and that any documents identified as containing such information should be deleted. Obviously, the efficacy of this option depends upon the trust of counsel, and where the mere viewing of the information would “let the cat out of the bag,” such agreements may be insufficient. In the final analysis, ensuring that embedded data is not created, or ensuring that it is stripped out before a file is sent, will normally be the only effective way to address the problems of embedded data.
Ethical Obligations of the Witting and Unwitting Recipient
Given that metadata is a relatively new concern for lawyers, it is not surprising that formal ethical rules do not yet address whether it is proper for a lawyer to search an electronic file sent by opposing counsel for embedded data. However, like most states, Florida has a general catch-all rule that prohibits “professional conduct involving dishonesty, fraud, deceit or misrepresentation.”40 Florida’s Professional Ethics Committee took this rule under consideration in Ethics Opinion 06-2, further clarifying that a lawyer may not, upon receipt of an electronic document from opposing counsel, “try to obtain from metadata information relating to the representation of the sender’s client that the recipient knows or should know is not intended” to be read by him or her.41 In the event of an inadvertent disclosure of confidential information through exposure to unforeseen metadata, the committee’s opinion indicates that the attorney must “promptly notify the sender” of the disclosure.42
Conclusion
Unlike most states still trapped in the morass, Florida’s Professional Ethics Committee has taken a proactive stance in its treatment of metadata and the ethical problems it presents. Hopefully, this article has educated the reader about what metadata is and how the lawyer should treat confidential embedded information, including some easy-to-use methods to reduce or eliminate metadata from documents created in the more popular office programs.
1 This is based upon an earlier version of an article from the 13 Georgia Bar Journal (No. 5, February 2008). We gratefully acknowledge their permission.
2 Definition of metadata, http://wordnet.princeton.edu/perl/webwn?s=metadata (last visited Dec. 4, 2007).
3 Andrew Beckerman-Rodau, Ethical Risks from the Use of Technology, 31 Rutgers Computer & Tech. L.J. 1, 32 (2004); Brian D. Zall, Metadata: Hidden Information in Microsoft Word Documents and Its Ethical Implications, 33 Colo. Law. 53 (Oct. 2004) (describing legal profession’s widespread adoption of Microsoft Word).
4 Interestingly, most metadata is stored in the last blank space of a Word document. If, for example, you select all of a Word document except its last space (which will appear to be blank) and then copy and paste that material into a new Word document, most metadata will not follow along. For a more technical discussion of how metadata is embedded in a Word document, see Zall, 33 Colo. Law. at 54.
5 To be clear, the file could simply have been open on the screen for 205 minutes. Thus, the amount of time indicated does not necessarily mean that the file was being worked on for all of those 205 minutes.
6 The kind and amount of information stored in the “Properties” file can be customized. To see whether your version has been customized, click on the “Custom” tab at the top of the “Properties” dialog box.
7 There are a number of other sources of metadata. For example, other tabs in the “Properties” dialog box depicted show where the file was stored on the author’s hard drive and other information.
8 See generally, James Veach, Commutation Agreements: Drafting a Clear and Comprehensive Contract, 854 PLI/Comm 43 (2003) (noting that track changes can be used to aid in the drafting process).
9 To turn on “Track Changes,” go to the “Tools” menu and to “Track Changes.” To see whether an open document contains tracked changes, turn on track changes and then ensure that you have selected the “Final Showing Markup” on the “Review” toolbar that appears.
10 See Zall, 33 Colo. Law. at 55-56 (collecting hypotheticals on how metadata could harm clients and lawyers when transmitted to opposing counsel).
11 Toby Brown, Special Handling: How Paper and Electronic Files Differ, 21 GPSolo 22, 23 (Sept. 2004). This is done by selecting “Save As” from the “File” menu, then selecting “Tools” and then “Save Options.” One option is “Allow Fast Saves.” Fast saves are “very useful in the event of hardware failure because it reduces the chance of losing changes to a document.” Id.
12 Documents that contain sensitive information should have fast saves disabled to prevent inadvertent disclosure. If fast saves are enabled, the illogical order of the saved information may result in a failure to completely delete the confidential information despite your best efforts to do so. Opening the Word document in a text editor (such as Windows Notepad) will typically reveal the previously “deleted” text. Word Document That Is Opened in Text Editor Displays Deleted Text, http://support.microsoft.com/kb/287081/EN-US/.
13 To view the previous versions attached to a document in Microsoft Word, simply click on “File” and select “Versions” from the drop down menu.
14 See Zall, 33 Colo. Law. at 54-55.
15 Florida Rule of Professional Conduct 4-5.3(b)(2) requires lawyers with direct supervisory authority over a nonlawyer to “make reasonable efforts to ensure that the person’s conduct is compatible with the professional obligations of the lawyer[.]”
16 See generally, Am. B. Ass’n. Formal Eth. Op. 92-368 (1992) (describing such scenarios).
17 Jason Krause, Hidden Agendas, 90 Am. B. Ass’n. J. 26 (July 2004).
18 See Bryant v. State, 651 S.E.2d 718, 725 (Ga. 2007).
19 Id.
20 Comment to Florida Rule of Professional Conduct 4-1.6.
21 For example, Vincent Polley, then-chair of the ABA’s Cyberspace Law Committee, has been quoted as saying that lawyers can no longer “plead ignorance when it comes to this stuff any more.” Krause, 90 Am. B. Ass’n. J. at 26.
22 See generally, Carole Levitt & Mark Rosch, Making Metadata Control Part of a Firm’s Risk Management, 28 L.A. Law. 40 (Mar. 2005) (describing various means to remove metadata, including some of those discussed here); Storm Evans, How to Commit Malpractice With a Computer, 29 Law Pract. Mgmt. 56 (Mar. 2003) (“If you must e-mail or otherwise deliver a Word document, consider using macros or a utility program to strip away the metadata.”).
23 How to Minimize Metadata in Word 2003, http://support.microsoft.com/kb/825576/.
24 For more information on “Fast Saves,” see Frequently Asked Questions About “Allow Fast Saves,” http://support.microsoft.com/kb/291181/.
25 How to Minimize Metadata in Word 2003, http://support.microsoft.com/kb/825576/.
26 How to Minimize the Amount of Metadata in Powerpoint 2002 Presentations, http://support.microsoft.com/default.aspx?scid=kb;EN-US;314800.
27 Id.
28 Id.
29 How to Minimize Metadata in Microsoft Excel Workbooks, http://support.microsoft.com/default.aspx?scid=kb;EN-US;223789.
30 Id.
31 See Beckerman-Rodau, 31 Rutgers Computer & Tech. L.J. at 32-33 (suggesting that lawyers should consider removing metadata); Gerald J. Hoenig, Technology Property, 18 Probate & Prop. 51 (Sept. 2004) (same).
32 To download this add-in, see http://www.microsoft.com/downloads and search for “remove hidden data.”
33 Control Metadata in Your Legal Documents, http://office.microsoft.com/en-us/help/HA011400341033.aspx.
34 See Jason Krause, Guarding the Cyberfort, 39 Ark. Law. 24, 31 (2004). Suggestions that pdf files contain no metadata are incorrect. See, e.g., Hoenig, 18 Probate & Prop. 51; Ronald E. Mallen & Jeffrey M. Smith, 1 Legal Malpractice §2.26 (2005) (stating that conversion from Word to pdf “could eliminate meta-data information”). Most pdf files do contain some metadata; thus, converting a Word file to a pdf file will simply result in different metadata being transmitted.
35 Numerous “scrubbers” can be found through Google, simply searching for “metadata” and “scrubber.”
36 Remove Hidden Data and Personal Information from Office Documents, http://office.microsoft.com/en-us/help/HA100375931033.aspx.
37 Id.
38 Id.
39 Id.
40 Fla. R. Prof. Conduct 4-8.4(c).
41 Florida Prof. Eth. Comm. Op. 06-2 (Sept. 15, 2006).
42 Id.