Document metadata is information attached to a file that may not be visible on the face of the document; documents may also contain supporting elements such as graphic images, photographs, tables and charts, each of which can have its own metadata. Metadata summarises basic information about data, which can make finding and working with particular instances of data easier. Having the ability to filter through that metadata makes it much easier for someone to locate a specific document or other data asset in a variety of different ways.
With this in mind, I wanted to see what kind of information can actually be stored within a document without being visible to the user.
I came across a program called ‘FOCA’ (Fingerprinting Organizations with Collected Archives), developed by Eleven Paths, which allowed me to view, in detail, the hidden information within documents. The program itself is “free for use in any environment, including but not necessarily limited to: personal, academic, commercial, government, business, non-profit, and for-profit” and can be downloaded from the first link at the end of this post.
FOCA can trawl popular search engines such as www.google.com, www.bing.com and www.exalead.com for documents hosted on a particular domain. Once the documents are found, FOCA lets you download them and then analyses them, presenting all available metadata for the user to view. FOCA can handle the most common document types, such as .doc and .pdf, as well as less well-known file types such as .rdp and .ica.
Once the documents are downloaded, FOCA can provide a ‘Metadata’ summary of the extracted information. This can include Users, Folders, Printers, Software (with Version), Emails, Operating Systems, Passwords and Servers.
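To give a feel for where some of this information lives, here is a minimal Python sketch that reads the author-related fields from an Office Open XML file (.docx, .xlsx). These files are ZIP packages, and the core properties normally sit in a part called docProps/core.xml. This is not how FOCA itself works internally (I have not checked); the function name and the choice of fields are my own for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative selection of fields; the names on the left are my own labels.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return selected core properties from a .docx/.xlsx path or file object."""
    with zipfile.ZipFile(source) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # the package has no core properties part
    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling read_core_properties("report.docx") (a hypothetical file name) returns a dictionary containing only the fields that are actually present in the document.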
Opening FOCA in a virtual machine, I first created a project. Within the project I gave FOCA the domain whose documents I wanted it to search for. For this example I chose ee.co.uk.
Once the project was created, I navigated to ‘Metadata’ in the left-hand panel. I then specified which search engines to use (Google, Bing and Exalead) and selected the extensions I wanted the program to search for. For this demonstration I selected only the most common file types: .doc, .docx, .pdf, .xls and .xlsx. I then clicked ‘Search All’.
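The same kind of search can be tried by hand. Google and Bing both support the site: and filetype: operators, so a query per extension will list documents indexed for a domain. The sketch below only builds the query strings; whether FOCA sends exactly these queries is an assumption on my part, and build_queries is a made-up helper name.

```python
def build_queries(domain, extensions):
    """Build one 'site:... filetype:...' search query per file extension."""
    queries = []
    for ext in extensions:
        ext = ext.lstrip(".").lower()  # accept ".PDF", "pdf", ".pdf"
        queries.append(f"site:{domain} filetype:{ext}")
    return queries


if __name__ == "__main__":
    for query in build_queries("ee.co.uk", [".doc", ".docx", ".pdf", ".xls", ".xlsx"]):
        print(query)
```

Each printed line can be pasted straight into a search engine. Operator support varies between engines, so results will differ.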
The search can take anywhere between 10 and 30 minutes, depending on the connection speed and the scope of the search. Once all the relevant documents have been found, right-clicking a file and clicking “Download All” will download all found files to the local system.
Once the files are downloaded, right-clicking any file and selecting ‘Extract All Metadata’ will start the extraction of all available metadata.
Once extracted, the metadata is summarised by FOCA according to the type of information found. As can be seen from the output below, data types such as Users, Folders, Printers, Software (including Versions), Emails and Operating Systems can be viewed in this search.
Below are some screenshots of the findings from a search of ee.co.uk.
FOCA cannot crawl search engines for images, but images can be loaded directly into FOCA once downloaded onto the user's machine. Images can contain information such as GPS locations and the make and model of the device used to take the picture.
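On the GPS point: EXIF stores latitude and longitude as three rational numbers (degrees, minutes, seconds) plus a reference letter (N, S, E or W). The sketch below shows the arithmetic for turning those values into the decimal coordinates a mapping site expects. It assumes the values have already been read from the image as (numerator, denominator) pairs by some EXIF library; the function name is my own.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style degrees/minutes/seconds rationals to decimal degrees.

    dms is three (numerator, denominator) pairs; ref is "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value  # south and west are negative by convention
    return float(value)


if __name__ == "__main__":
    # 51 degrees 30 minutes 0 seconds north
    print(dms_to_decimal(((51, 1), (30, 1), (0, 1)), "N"))  # 51.5
```

Fraction is used so that the rationals are combined exactly before the single conversion to float at the end.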
Information such as this can allow an attacker to gather critical information about an organisation and launch more specific, targeted attacks. Metadata can help in social engineering attacks, as specific users can be targeted. Other attacks can exploit known vulnerabilities in applications, based on the software versions disclosed.
All documents need to be cleaned before being released outside the organisation. Microsoft Office provides built-in functionality that allows users to ‘clean’ documents before release. In Office 2010, clicking File > Info > Check for Issues > Inspect Document lets the user check the document for attached metadata and gives the option to clean the document before releasing it.
Below, the document was inspected and a number of properties were found, including Document properties and Author. Selecting ‘Remove All’ will remove this data.
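For batches of files, something similar can be scripted. The following is a rough Python sketch that writes a copy of a .docx or .xlsx with the author and last-modified-by fields blanked in docProps/core.xml. It is not a replacement for the Document Inspector and does not claim to do what Office does: it only touches those two fields and leaves comments, tracked changes, docProps/app.xml and anything else alone. The function name is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"

# Keep the usual prefixes when the XML is written back out.
ET.register_namespace("cp", CP)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")

# Only these two fields are blanked in this sketch.
PERSONAL_TAGS = {f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy"}


def blank_author_fields(source, destination):
    """Copy an Office Open XML package, blanking author fields in core.xml.

    source and destination must be different paths or file objects.
    """
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(destination, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for element in root:
                    if element.tag in PERSONAL_TAGS:
                        element.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item, data)  # every other part is copied unchanged
```

It is worth running the cleaned copy back through FOCA, or the Document Inspector, to confirm what is still present.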
https://www.elevenpaths.com/labstools/foca/index.html (Download FOCA)
https://www.youtube.com/watch?v=WiblI9fiQQQ (great video by one of the developers of FOCA at DEFCON)
https://www.youtube.com/watch?v=XVjZEijbekw (another great video by the same developer of FOCA at DEFCON)