LBi: online brand positioning
Can PDF, Flash and MS Office documents have PageRank?

Posted on 4th November 2009 at 4:05 pm by Ian Macfarlane

Today's question: does Google assign PageRank to non-HTML files such as PDF files, Word documents or Flash files? Here is the definitive answer.


Introduction

PageRank is just one of the many algorithms that Google uses to rank web pages. However, it is the best known and, thanks to the Google Toolbar, one of the most visible.

PageRank originally applied only to web pages, and not to other types of files such as Adobe PDF files or Microsoft Office documents. However, Google has indexed these types of files for a long time now, so it would make sense for Google to try to treat them in a similar way to web pages.

A caveat regarding the robots exclusion protocol

As with any test, it is important to ensure that there are no external factors which could affect the results. In this particular case, the Robots Exclusion Protocol is one such factor.

This quote from Matt Cutts sums the issue up nicely:

“a page that is blocked by robots.txt can still accrue PageRank. In the old days, ebay.com blocked Google in robots.txt, but we still wanted to be able to return ebay.com for the query [ebay], so uncrawled urls can accumulate PageRank and be shown in our search results.”

This means that we have to make sure that none of the files we check are blocked by robots.txt. If a URL is blocked, any PageRank it shows could have been accrued by the uncrawled URL alone, rather than by a non-HTML file that Google has actually fetched and recognised. Only files that are not blocked can show that Google really does assign PageRank to a particular type of file.

Note: Although the quote above applies to robots.txt, we have also checked that the files do not have an X-Robots-Tag HTTP header.
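The two checks described above could be scripted along the following lines. This is only a minimal sketch, not the procedure actually used for this post: the function names are hypothetical, and it assumes that the site's robots.txt text and the file's HTTP response headers have already been downloaded by some other means.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots_txt(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_x_robots_tag(headers: dict) -> bool:
    """Return True if the response headers contain an X-Robots-Tag header.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    return any(name.lower() == "x-robots-tag" for name in headers)


def is_usable_test_file(robots_txt: str, headers: dict, url: str) -> bool:
    """A file is only a fair test if neither mechanism restricts it."""
    return allowed_by_robots_txt(robots_txt, url) and not has_x_robots_tag(headers)
```

For example, with a robots.txt that contains only `Disallow: /private/`, a file at `/brochure.pdf` with no X-Robots-Tag header passes both checks, while the same file under `/private/` would be rejected.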

What types of files does Google index?

If you go to Google’s Advanced Search page, Google provides options to search for files in a number of formats:

Google Advanced Search supported file types

Google also has a list of supported file types on its file types FAQ page.

Note: We are not going to give an exhaustive list of file types in this post, but the above list is a good place to start. Also note that we have not looked at images or videos, which have their own Google search verticals.

How we looked for non-HTML files to test

To find non-HTML files which might have PageRank as quickly as possible we used Google’s filetype: operator. We used this operator on its own, rather than combining it with a search query. For example, to search for PDF files we used the query [filetype:pdf].

Note that Google's filetype: operator isn't perfect. It will also return normal web pages whose URLs end with the same extension (for example, a web page with a URL ending with .doc). Therefore, we also have to check each URL to make sure it's actually the type of file we are looking for.
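One way to make this check less manual is to look at the first few bytes of each downloaded file instead of trusting the URL's extension. The sketch below is an illustration only, not the method used for this post: `sniff_file_type` is a hypothetical helper, and it assumes the opening bytes of the file have already been fetched. It relies on the well-known file signatures for PDF, Flash, legacy Office (OLE2) and ZIP-based Office files.

```python
def sniff_file_type(head: bytes) -> str:
    """Guess a file's real type from its first bytes.

    Returns one of: "pdf", "swf", "ole2" (legacy .doc/.xls container),
    "zip" (container used by .docx), "html" or "unknown".
    """
    if head.startswith(b"%PDF-"):
        return "pdf"
    # Flash files start with FWS (uncompressed), CWS (zlib) or ZWS (LZMA).
    if head[:3] in (b"FWS", b"CWS", b"ZWS"):
        return "swf"
    # OLE2 compound file signature, shared by legacy Word and Excel files.
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"
    # ZIP local file header; Word 2007 .docx files are ZIP archives.
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    # A normal web page hiding behind a document-like extension.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    return "unknown"
```

A URL ending in .doc whose content sniffs as "html" is a web page rather than a Word document, so it would be discarded from the test. Plain text and CSV files have no signature, so they come back as "unknown" and still need checking by eye.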

Results

Note that we are not interested in how high or low the PageRank scores are - what we are looking for here is simply whether they have any PageRank or not.

Adobe Portable Document Format (.pdf)

http://www.deetonline.org/brochure.pdf

PageRank 4

Microsoft Word documents (.doc)

http://www.wvnn.com/privacy_policy.doc

PageRank 4

Flash files (.swf)

http://www.uclalive.org/ucla_live_event_news.swf

PageRank 6

Excel spreadsheets (.xls)

http://www.post.ch/pm_dp_jahresplan.xls

PageRank 3

Plain text files (.txt)

http://www.rarlab.com/themes_new.txt

PageRank 5

We also wanted to check whether Google gives PageRank to file types which aren’t on the list, so we checked a few additional file types:

Microsoft Word 2007 documents (.docx)

http://www.antor.com/EUROPEAN_TRADE_AND_CONSUMER_SHOWS_CALENDAR_2009.docx

PageRank 4

"Comma-separated values" files (.csv)

(a format used for spreadsheets and storing data)

http://www.edeltutiyama.com/hayami2008.csv

PageRank 1

Conclusion

Our research has shown that Google PageRank does not just apply to web pages – it also applies to a range of other documents.

Please note that these results only show that PageRank applies to the particular file types examined above. To be certain that it applies to a file type not listed here, you would have to check it in the same way.

   

File under: google research flash pagerank
