The other day I was on location with a customer, and as they are a secure site I was not allowed to bring my laptop any further than the security locker at the front desk. Normally I don’t mind, as I would usually just leave the laptop at home, but this time it really was a bind: I needed my laptop on the desk in front of me with the customer beside it.
Rather than shout and scream about their security, or drag the customer out of his office to a new meeting location with WiFi, I decided to try to work my way around the problems I encountered using the customer’s laptop.
The biggest problem I faced was a change in font that the customer wanted for their menu. They had decided (overnight!) that they wanted it to change from Arial to Calibri, as “most people have that font”, thought the customer. While that is often true of Windows-based systems, it’s not so true of other things such as mobile phones, Mac laptops/desktops or anything that gets a prefix of i in its name. Outside of Windows, Calibri is generally not installed by default; it only arrives with additional product installs, and it just so happens that Microsoft often packages it up in free things like the PowerPoint viewer.
So, the customer needed Calibri right then and there. I knew I had the rights to it and could implement it on the site, but I did not have the TTF (TrueType font) file to make it happen, as that was on my laptop, safely locked away by security.
Of course I tried the standard searches through Google, with keywords such as “calibri font download”, only to find that no matter how far I looked, I could not get this font. There was just page after page of useless links with no possibility to download anything. The closest I came was a Russian site with Calibri bold and italic only, as separate TTF files, and no regular font, which was the one I needed. These were not the fonts I was looking for.
The thing is, Google is great. It searches every page and file it is allowed to find on the web, and nicely for us they have decided to make it a pretty open system where you can perform very advanced search queries. So I thought I would take advantage of that. Instead of searching through all of the HTML files on the web for the font file, I decided to look directly for the font file itself.
If you know anything about web servers, you will know that they can have directory browsing turned on, something which is incidentally very insecure for your files, as you are just about to see. When a website allows directory browsing, the web server uses the term “index of” in the title to describe the current location, e.g. index of /home/andys-stuff
What we will be doing with the search is looking at all of the servers that have directory browsing turned on, looking for a certain file type, in this case ttf, and lastly putting in our keyword for searching, in this case Calibri. Here is the search command that you put straight into the Google search bar.
intitle:"index.of" (ttf) CALIBRI
Search Result →
What it is saying is “in the title, look for the words index of, and return only results that mention the file type ttf and the keyword calibri”. That put the file I wanted inside the first result at the top of the list. As you can see, Google takes no notice of what case you use for your search term; upper or lower case is just fine.
After downloading, I just popped it into Font Squirrel and applied the results to the site. The customer was super happy.
Of course this type of search can be taken further. If for example I was looking for music I could search for many file types at the same time, such as MP3, WAV, AAC, FLAC in just the same way as I looked for the font.
intitle:"index.of" (mp3|wav|aac|flac) please.please.me
Search Result →
This time I am looking for The Beatles’ “Please Please Me”. Notice that I have put dots between the search keywords to make them work properly; it’s the same for all multi-word searches using this method. Also, this time I am looking for many different file types in the search, and so have to use a “|” (pipe symbol if you are Unix, or vertical line if you’re anybody else) between each file type. The full album is at the top of the list, so if you want it in mp3, you have it. Ironically, the download speeds are dramatically faster than any P2P network, if you’re into that kind of downloading.
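The query pattern described above is regular enough to generate. Here is a minimal sketch in Python; build_index_query is a made-up helper name for illustration, not part of any Google tool, and it simply follows the conventions of this post (pipes between file types, dots between keywords).

```python
def build_index_query(keywords, file_types):
    """Build an "index of" search string in the style used in this post.

    build_index_query is a hypothetical helper, for illustration only.
    """
    if not file_types:
        raise ValueError("at least one file type is required")
    # Multi-word keywords are joined with dots, e.g. "please.please.me".
    dotted = ".".join(keywords.lower().split())
    # Several file types are separated with the pipe symbol.
    types = "|".join(t.lower().lstrip(".") for t in file_types)
    return f'intitle:"index.of" ({types}) {dotted}'


print(build_index_query("Please Please Me", ["mp3", "wav", "aac", "flac"]))
# prints: intitle:"index.of" (mp3|wav|aac|flac) please.please.me
```

The keywords are lower-cased only for tidiness, since the case of the search term makes no difference to Google.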
We could take this a stage further if we wanted to, by ignoring certain content types. We could ignore things like html, php or asp as we want to focus on directory browsing. This reduces the clutter on the search results. This tends to be useful when searching for high value content such as movies or television programs to download.
intitle:"index.of" (m2ts|iso|mp4|avi|mkv|vob) cowboys.and.aliens -html -php -asp -js -css -htm
Search Result →
This time the file types looked for are m2ts, mp4, avi, iso, vob and mkv, the film wanted is “Cowboys and Aliens”, and I do not want to see html, htm, asp, php, js and css files listed in the results. OK, so on this occasion it takes a little while to find a full movie version, as it’s the third result.
Although Google gives us all this access, you have to look at your own country’s laws before downloading anything. For example, here in Holland you are within your rights to download anything off the internet, but not allowed to upload copyrighted materials that you do not own. Whereas in the States, downloading copyrighted materials that you don’t own is an offense.
The massive irony here is that P2P torrent sites such as The Pirate Bay are constantly under attack from the law for offering the same level of access to files that they do not host themselves, but I’ve never heard of Google facing such efforts for giving even more direct access to the same things with a method that is almost impossible to monitor.
If you ever want to stop Google from indexing your sites and the files in them, do make sure that directory browsing is turned off. It does not matter what web server you have, IIS, Apache or others, just make sure it is turned off. If you can’t turn off directory browsing for whatever reason, make sure that you pop an index.htm file (or other default loaded file type for your server) in the directory; the server will show the contents of that html file rather than the directory listing.
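If you want to check your own site, the same “index of” title that the search relies on is easy to test for. The following is a rough sketch in Python using only the standard library; looks_like_directory_listing is a hypothetical name, and it assumes an Apache-style listing whose title starts with “Index of”, so other servers with different titles would need a different check.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_directory_listing(html):
    """Return True if the page title starts with "index of" (any case).

    Hypothetical helper: it only recognises Apache-style listing titles.
    """
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip().lower().startswith("index of")


listing = "<html><head><title>Index of /fonts</title></head><body></body></html>"
normal = "<html><head><title>Welcome to my site</title></head><body></body></html>"
print(looks_like_directory_listing(listing), looks_like_directory_listing(normal))
# prints: True False
```

Feed it the HTML returned for each directory URL on your own server; any True means that directory is still being listed.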
Hackers can use this too!
There is *nothing* to stop a hacker using this same capability from Google. For example, they could search for php files and download them to read or edit. I did a small search for “database” and wanted php files, and got a response that included sites such as mit.edu, princeton.edu, trinity.edu . . . . .
intitle:"index.of" (php) database
Search Result →
From any of the directory listings I can download any file there, read it and possibly work out where the database server is, what it is called, the port it uses, the login name and the password. With that, pretty much any half-assed hacker can get further into the site.
Focusing on 1 site or top level domain
If you want to, you can refine the search so that it only looks in one top level domain or site, by putting the site: command at the beginning of the search terms.
This command would search all of the edu sites for a pdf about algebra.
site:.edu intitle:"index.of" (pdf) algebra
Search Result →
Or, this would only search mit.edu for PDFs about algebra.
site:mit.edu intitle:"index.of" (pdf) algebra
Search Result →
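To put the pieces from this post together (site restriction, file types, dotted keywords and excluded content types), here is a small sketch in Python. The function name build_search_url and its parameters are my own invention for illustration; the only outside assumption is the usual https://www.google.com/search?q=... address for a Google search.

```python
from urllib.parse import urlencode


def build_search_url(keywords, file_types, site=None, exclude=()):
    """Return (query, url) for an "index of" search as described in this post.

    Hypothetical helper: the names and parameters are illustrative only.
    """
    parts = []
    if site:
        # Restrict to one site or top level domain, e.g. "mit.edu" or ".edu".
        parts.append(f"site:{site}")
    parts.append('intitle:"index.of"')
    # File types are separated by pipes inside the brackets.
    parts.append("(" + "|".join(file_types) + ")")
    # Multi-word keywords are joined with dots.
    parts.append(".".join(keywords.lower().split()))
    # Content types to ignore are prefixed with a minus sign.
    parts.extend(f"-{ext}" for ext in exclude)
    query = " ".join(parts)
    # urlencode escapes the quotes, brackets and pipes for use in a URL.
    return query, "https://www.google.com/search?" + urlencode({"q": query})


query, url = build_search_url("algebra", ["pdf"], site="mit.edu")
print(query)
# prints: site:mit.edu intitle:"index.of" (pdf) algebra
```

The returned url can be pasted into a browser; the query string on its own is what you would type into the search bar.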
You can use this same tool to search for any file type with any given name for the directory or the file. Note that this search does not look inside the contents of files for the search terms; it only searches by file name.
Most importantly, this really relies on people *stupid enough* to leave directory browsing turned on. As you can see from the search results, that is countless millions of people. Just make sure you are not one of them.
If you want to extend the search to other servers or conditions, you can replace “index.of” with “contents.of”, which is more likely to return results from IIS.