VOLUME 16 (2012): ISSUE 1. PAPER 3
Universitat Politècnica de València
The number of pages in a website is an indicator of its activity that is widely used in cybermetric analysis. This indicator can be disaggregated by type of content and by file type. However, there is a gap in the literature regarding the quantitative analysis of multimedia files, graphic files and blog-like content, particularly with respect to their presence and distribution in the academic environment. This paper proposes a diachronic analysis, carried out during 2010, of the counts of multimedia and graphic files and of blog-like content for all websites that make up the Spanish university space. Among the key findings, a very high percentage of blog-like content and image files is detected, in contrast with the very low figures obtained for multimedia files. In addition, several limitations are found in the image search engines used (coverage, variations between samples, instability, and discrepancies between global and per-format counts), which call for a careful interpretation of the raw results. Finally, the correlation between Bing images and Google images is higher than expected (although limited by a small set of URLs), and a sharp decrease in Bing coverage during the study period is observed.
Spain; Universities; Higher education institutions; Spanish university system; Graphic files; Multimedia files; Blog contents; Image search; Webometrics; Cybermetrics; Quantitative webometrics; World wide web; www; Bing; Google; Visibility
The main indicators used for cybermetric purposes can be broadly classified into the following categories (Aguillo, 2009):
Within the area devoted to the study of academic web-spaces (universities and other higher education institutions), the latter type of indicator is rarely used because certain indicators (e.g., web traffic) are difficult to access; for that reason, indicators related to size and visibility are the most commonly used (Aguillo et al., 2006; Orduña-Malea et al., 2010).
Focusing on size indicators, we can identify mainly the following two types (Aguillo, 2000):
The site size can be measured as a global count or as a specific count (that is, a measure according to the nature, format or file type of what a web unit publishes). Some important specific counts by file type are: office files (such as DOC, PPT or PDF), graphic files (such as JPG, PNG or BMP), web files (such as ASP, PHP or HTML), and multimedia files (such as MPG, WMV or MOV). These files can carry content in one or more formats (such as blog-like content, books, journals or papers), which can in turn serve one or several purposes (academic, teaching, informative, etc.).
Of all these indicators, the most commonly treated are the global count (Aguillo, Ortega & Fernandez, 2008), the academic specific count (Orduña-Malea et al., 2009) and, among the specific file counts, the so-called rich files. The latter are important because many of them are entire papers or other scientific documents (Aguillo, 2009), which makes them good indicators of academically published information (Kousha & Thelwall, 2008; Kousha, Thelwall & Rezaie, 2010).
However, other file formats have received little quantitative treatment, even though they are important because of the volume generated and although they do not directly express the functional activities of the university. Apart from purely web files (HTML, PHP, ASP, etc.), these include blog-type content, multimedia files, and graphic files.
The value of blog-type content as a communication tool for social institutions (including universities) is well established today (Goodfellow & Graham, 2007), and the existence of blog search engines (Thelwall & Hasler, 2007) has enabled its quantitative analysis within the discipline of cybermetrics (Thelwall, 2009), although this analysis has been primarily aimed at studying the spread of ideas (Thelwall & Price, 2006), points of view (Thelwall, 2007), the interests of certain groups (Thelwall & Prabowo, 2007), general issues (the so-called "blog issue analysis": Thelwall & Wilkinson, 2010) or even emotions (Thelwall et al., 2010).
The analysis of multimedia files from a quantitative point of view is still very uncommon (almost nonexistent in academic environments). The few identified studies focus on the analysis of user-generated content on YouTube-like platforms (Cha et al., 2009; Thelwall, South & Vis, 2012; Kousha, Thelwall & Abdoli, forthcoming). The availability of YouTube's API suggests that this type of analysis can be developed further in the future.
Studies of graphic files and images also focus largely on user-generated content (tagging) on platforms, in this case predominantly Flickr (among others, Sigurbjörnsson & Zwol, 2008; Angus, Thelwall & Stuart, 2008; Angus & Thelwall, 2010).
Moreover, the fact that users' access to graphic files is recorded in log files has opened up new lines of work, among them the studies by Chen (2001) and Choi & Rasmussen (2003), centered on the analysis of image queries.
Another interesting line of work concerns the persistence of digital objects on the Web (Koehler, 1999), in which image files are among the items discussed and the large volume of existing graphic files was already noted. In line with this topic, Ortega, Aguillo & Prieto (2006) found that graphic and multimedia files had a substantial growth rate (10.50% and 6.43% respectively, between 1997 and 2004, for 738 selected websites from all over the world), as well as a high disappearance rate (80.34% of image elements).
The web spaces of Argentina (Tolosa et al., 2007) and Chile (Baeza-Yates & Graells, 2007) have also been analyzed in depth by counting the number of links to non-HTML files, such as video, audio, and graphic files.
A further line of work is aimed at finding images on the Web; worth mentioning here are the studies carried out on Excite (Goodrum & Spink, 2001) and on specialized search engines like TinEye (Kousha, Thelwall & Rezaie, 2012). Finally, research on the comparison between text and image searches on the Web (Pu, 2005; 2008) and on the general search for digital images (Jansen, 2008) should also be noted.
However, the classic cybermetric studies that quantify universities' web performance have not paid much attention to graphic files, in large part because the motivations for creating and/or reusing these images respond to very different purposes, and they do not reflect only the core activities of these institutions (Kousha, Thelwall & Rezaie, 2012). Nevertheless, some recent studies analyze the use and impact of images as a resource in academic activities (Angus, Thelwall & Stuart, 2010; Angus & Stuart, 2010), although none of these works takes the academic website as the unit of measure.
Although the count of such files is not as direct an indicator of university activity as the count of rich files, their volume is very high, and since the global count is used in universities' web measurements, their nature and influence should be studied more precisely.
As regards the Spanish area, the works of Alonso, Figuerola & Zazo (2004) and Pinto et al. (2004) deserve special attention, as they are the only existing studies in which graphic files are quantified within the Spanish academic web-space. However, the sample used (only a part of the Spanish university system), the method of analysis (a web crawler rather than a commercial search engine), and the age of their results (obtained in 2004) confirm the need for an update.
For all these reasons, the main objective of this work is the quantification of the blog-like content, graphic and multimedia files within the Spanish university system, for which the following specific objectives are proposed:
2.1. Data gathering
The analysis is applied to the Spanish university system, formed in 2010 by 76 universities, both public and private. The list of universities and their associated URLs was obtained from the Ministry of Education and the Conference of Rectors of Spanish Universities (CRUE). In addition to the official URLs (those indicated in the official sources consulted), alias and "alternative" domains were detected at various universities.
Regarding alias domains (URLs that share the same second-level domain but have a different top-level domain), the existence of the following domains was checked manually for each university: .CAT, .COM, .EDU, .ES, .NET, .ORG.
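The enumeration of alias candidates described above can be sketched as follows. This is only an illustrative sketch: the function name `alias_candidates` is hypothetical, and the code merely lists the candidate names to be checked; it does not verify whether any of them exists or resolves.

```python
# TLDs checked manually in the study
ALIAS_TLDS = ["cat", "com", "edu", "es", "net", "org"]

def alias_candidates(official_domain):
    """Return candidate alias domains: same second-level domain, different TLD."""
    # Split e.g. 'upc.edu' into the name 'upc' and the TLD 'edu'
    name, _, tld = official_domain.lower().rpartition(".")
    return [f"{name}.{alt}" for alt in ALIAS_TLDS if alt != tld]

candidates = alias_candidates("upc.edu")
print(candidates)
```

Each candidate would still need to be checked by hand, as was done in the study, since a registered alias does not necessarily belong to the university.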
As regards the "alternative" domains (valid web domains whose first- and/or second-level domain differs from the official one, and which are not necessarily redirected to it), they were located through Yahoo! Site Explorer, as well as by consulting the institutional information on each university's website.
2.2. Measurement of the sample (I): indicators and sources
All indicators, scope definitions, sources, and commands used, with the exception of rich files, which are widely reported in the literature (Aguillo et al., 2006), are shown below (table 1), where "domain.tld" should be replaced by each URL under study.
Table 1. Indicators, scope, sources, and commands
File formats provided by new versions of the Microsoft Office suite, such as DOCX and PPTX, were not considered due to their low representativeness in the period of measurement (2010), but should be considered in future works because they are estimated to have a larger growth rate.
In all sources, through the advanced settings of the browser, the following preliminary operations were performed:
Figure 1. Example of global graphic count query on Bing
Figure 2. Example of global multimedia count query on Bing videos
Figure 3. Example of global blog count query on Google blogs
Figure 4 illustrates the query process for the graphic count on Google. As can be seen, although the query can be entered directly as shown in table 1, Google reconstructs it as a general search ("site:domain.tld"), after which the appropriate file type is selected from the sidebar menu box.
Figure 4. Example of file graphic count (PNG) query on Google image
On April 29th, 2011, Google announced that the Google videos service would stop working permanently on May 13th, 2011, due to competition with YouTube, also owned by Google. Although the service could still be queried throughout 2010, this affects the results, as discussed later.
Figure 5. Example of global multimedia count query on Google videos
2.3. Measurement of the sample (II): data capture
The date of each measurement of the data sample is as follows:
2.4. Analysis of the sample
Since a joint display of the raw data of all URLs is not adequate, because the orders of magnitude are very different (both among URLs and between search engines), the counts were normalized from 0 to 100 by means of a transformation (Rocki, 2005), with the aim of subsequently working with the so-called "mean relative representation factor in count" (Rc) (Orduña-Malea et al., 2010).
To this end, the sum of website size (in any of the indicators considered) obtained from the URLs of all Spanish universities over a whole month (accumulated count) is considered equal to 100, and the value of each URL is calculated proportionately:
After normalization, the results no longer express total quantitative changes, but rather the percentage of each URL's size with respect to the total size obtained by all the universities in a particular search engine for each monthly measurement (4 shots in this case), a concept called "relative representation".
Then, the average of the monthly νcn values is calculated, obtaining a value, also between 0 and 100, called the "mean relative representation factor in count (Rc)". This factor can be calculated for any set of selected sites and any period of time (Orduña-Malea et al., 2010).
Where M is the number of months analyzed (in this case 4 shots, for March, June, September and December 2010).
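The normalization and averaging steps described in this section can be sketched as follows: in each shot, the count of every URL is expressed as a share (0 to 100) of the accumulated count, and Rc is the mean of those shares over the M shots. The function names and the example domains and counts are hypothetical, and the sketch assumes that every URL appears in every shot.

```python
def relative_representation(counts):
    """Normalize one shot: share of each URL in the accumulated count, scaled 0-100."""
    total = sum(counts.values())
    if total == 0:
        return {url: 0.0 for url in counts}
    return {url: 100.0 * c / total for url, c in counts.items()}

def mean_relative_representation(samples):
    """Rc: mean of the normalized values of each URL over the M shots."""
    m = len(samples)
    normalized = [relative_representation(s) for s in samples]
    return {url: sum(n[url] for n in normalized) / m for url in samples[0]}

# Hypothetical counts for three made-up domains over four shots
samples = [
    {"a.example": 50, "b.example": 30, "c.example": 20},
    {"a.example": 60, "b.example": 20, "c.example": 20},
    {"a.example": 40, "b.example": 40, "c.example": 20},
    {"a.example": 50, "b.example": 25, "c.example": 25},
]
rc = mean_relative_representation(samples)
print(rc)
```

Because each shot is normalized to 100 before averaging, the Rc values of all URLs also add up to 100, which is why a change in one large domain shifts the values of all the others.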
Finally, in order to calculate the growth rate of web domains along the period, the compound interest formula was used:
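A minimal sketch of a compound growth rate follows, assuming the standard form V_last = V_first (1 + r)^n. The function name, the reading of n as the number of intervals between the first and last shots, and the sample values are illustrative assumptions rather than the exact parameters used in this study.

```python
def compound_growth_rate(first, last, periods):
    """Per-period rate r such that last = first * (1 + r) ** periods."""
    if first <= 0 or periods <= 0:
        raise ValueError("first and periods must be positive")
    return (last / first) ** (1.0 / periods) - 1.0

# Hypothetical counts: 1,000 files in the first shot, 1,331 in the last,
# with three intervals between the four shots
r = compound_growth_rate(1000, 1331, 3)
print(f"{r:.2%}")
```

Since the rate is based on the ratio between the last and first values, it is undefined for domains with no results in the first shot, and it can be very large for domains that start from small counts.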
The results are divided into global results (accumulated count of each web domain in each shot), and results according to the university web domains.
3.1. Accumulated count
The global results for each search engine used (Google and Bing) are shown below:
The inability to obtain accurate global count from Google prevents a full comparison (and hence, percentages) with the specific files considered (blog content, multimedia files, and rich files). In any case, figure 6 shows the evolution during 2010 of all counts measured by Google, including rich files (summation of PDF, DOC, PPT, XLS, and PS files).
Figure 6. Comparison of count according to different Google searches
The most important files for the URLs that comprise the Spanish academic web space can be clearly identified: rich files (with a notable drop in June) and the global graphic files, followed by blog content, which demonstrates the importance of such files in calculating the total number of pages of an online academic site.
The high values obtained for both blogs and graphic files make it worthwhile to consider the percentage of this type of content with respect to the other files considered. Figure 7 displays this distribution (December shot), showing that 41% of all Google files considered are graphic files, whereas 6% are blog content.
Figure 7. Distribution of different files retrieved by Google:
As regards graphic formats, table 2 summarizes the full values obtained in each sample, and shows the widespread use of JPG files (56.10% of all files considered in the December sample), followed by GIF and PNG. The BMP format is used only marginally.
Table 2. Raw graphic file count evolution, and annual nominal interest rate (r)
All formats present a positive statistical range throughout 2010, except PNG, which suffers a very substantial fall in the June data (also detected in JPG); although the figures grow from then on, they do not reach the March levels again. This is reflected in the low growth rate (using the compound interest formula) obtained.
The row “Total” shows the sum of the 4 formats considered, whereas the “Global” row shows the global graphic count. These data reveal the first inconsistencies: the global count is lower than the total count in all samples, although the figures keep the same order of magnitude.
Table 3 compares the global count with the graphic and multimedia file counts, showing the percentage of each specific count with respect to the global one for each shot.
Table 3. Global, graphic, and multimedia file count for Bing
The data clearly show how Bing's coverage decreases during 2010, from 18,557,201 files in March to only 5,436,832 in September. The percentage of graphic files also falls (from 30.81% to 5.72%), although the high value in March is due to the web behavior of "upc.es" (this issue is identified in the analysis per academic web domain, in the next section). Moreover, the coverage of multimedia files is not representative in any shot during the period (0.02%).
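As a worked check of these percentages, the following sketch recomputes the March share of graphic files from the accumulated Bing figures reported in this paper (the graphic counts are those given in the Bing images section). The function names are hypothetical, and the definition of the range as the last value minus the first is an assumption consistent with the figures quoted.

```python
def share_of_global(specific, global_count):
    """Percentage of a specific count with respect to the global count."""
    return 100.0 * specific / global_count

def statistical_range(first, last):
    """Range R, assumed here to be the last shot minus the first shot."""
    return last - first

# Accumulated Bing figures reported in this paper (2010)
bing_global_march = 18_557_201
bing_images_march = 5_717_222
bing_images_december = 310_845

march_share = round(share_of_global(bing_images_march, bing_global_march), 2)
images_range = statistical_range(bing_images_march, bing_images_december)
print(march_share)   # 30.81
print(images_range)  # -5406377
```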
3.2. Count per academic web domain
This section is divided into 3 subsections, covering graphic, multimedia and blog content.
3.2.1 Graphic count
a) Google images
Count and evolution
Table 4 shows the raw data (total data retrieved from this search engine), the normalized results obtained per sample, and the Rc value (average of the 4 normalized values) for all URLs of the Spanish university system (full details of the 141 URLs, including range and standard deviation, are available in Annex I).
Table 4. Ranking of URLs according to Rc (top 20)
The first three universities are the only ones that surpass 100,000 files in the last sample, and together they generate more than 15% of all content. On the other hand, the low values retrieved from "unirioja.es", one of the largest Spanish academic web domains, are surprising. Some other unusual trends are also detected. One example is the drop of "uab.es" from the March to the June data, which is mirrored by an increase in "uab.cat" over the same period. This can be explained by policy decisions on the management of specific alternative TLDs.
The normalized data should not be interpreted longitudinally (raw data should be used for that purpose), because they depend on the global count of the Spanish academic web space, so the interpretation of a specific web domain is affected by the global performance of the web space. The aim of this normalization is just to show the proportion of pages of each web domain with respect to the global count at each moment, and then to present this value averaged over the analyzed period.
To further analyze data variations over time, table 5 presents the URLs with the highest and lowest range values during 2010.
Table 5. URLs with highest and lowest range (R), and annual nominal interest rate (r)
The range is generally positive (only 24 domains record a negative range). The growth rate (r) also highlights the performance of "uab.cat", due to the low results in the first sample and the value obtained at the end (while R is based on subtraction, r is based on division). For the same reason, "upc.edu" achieves a higher r value than "us.es".
Table 6 adds information about the 4 graphic formats studied (JPG, GIF, BMP, and PNG) for the top 10 URLs with the highest global graphic file count, indicating the Rc values obtained for each URL.
Table 6. Rc for graphic files (JPG, GIF, BMP, PNG)
The drop in the use of PNG files is mainly due to the range values detected for the different domains belonging to UAB: "uab.cat" (153,700) and "uab.es" (24,700). These figures again indicate a clear change in the management policy for graphic files at this university. Minimum and maximum ranges for each of the graphic formats are shown in table 7, for illustrative purposes.
Table 7. Major and minor range (R) for graphic files count
Comparing the partial results for each of the types of graphic files and the global graphic count, a number of inconsistencies are observed. In order to analyze them in more detail we have proceeded to obtain for each URL the sum of the results from the 4 types of formats (TOT), and to compare them with the global results (TG).
Contrary to expectations, there is a set of URLs for which the sum of the 4 graphic formats is higher than the global count, which reveals some methodological weaknesses of this search functionality (previously detected in table 2 for the accumulated count). The results for the URLs where this phenomenon is found are given in table 8 (data from December 2010); the errors are particularly notable in the domains "uv.es", "ua.es", "ucm.es" and "us.es" (domains with large figures), where the difference exceeds 10,000 results.
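The comparison between the sum of formats (TOT) and the global graphic count (TG) can be sketched as follows. The function name, the data layout and the example domains and counts are hypothetical; only the rule itself (flag a URL when TOT exceeds TG) comes from the text.

```python
FORMATS = ("jpg", "gif", "bmp", "png")

def find_inconsistencies(format_counts, global_counts):
    """Return (url, TOT, TG, TOT - TG) for URLs whose summed format counts exceed TG."""
    rows = []
    for url, per_format in format_counts.items():
        tot = sum(per_format.get(fmt, 0) for fmt in FORMATS)
        tg = global_counts[url]
        if tot > tg:
            rows.append((url, tot, tg, tot - tg))
    # Largest discrepancy first
    return sorted(rows, key=lambda row: row[3], reverse=True)

# Hypothetical counts for two made-up domains
format_counts = {
    "a.example": {"jpg": 700, "gif": 300, "bmp": 5, "png": 120},
    "b.example": {"jpg": 400, "gif": 100, "bmp": 0, "png": 50},
}
global_counts = {"a.example": 1000, "b.example": 600}

flagged = find_inconsistencies(format_counts, global_counts)
print(flagged)
```

A URL whose TOT is below TG is not flagged, since the global count may include graphic formats other than the 4 considered.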
Table 8. Inconsistencies between global graphic count (TG) and sum of graphic file counts (TOT)
A dispersion of count between the different alias domains is also identified. Table 9 shows some of the most important cases detected.
Table 9. Global graphic file count dispersion between alias web domains
The region of Catalonia constitutes the most complex environment; universities like UAB, UB or URV maintain aliases with similar and high results. The case of UCLM also stands out: in addition to the official domain "uclm.es", the remaining aliases maintain similar and very high result sets.
b) Bing images
The problem of dispersion between URL aliases is also detected in Bing; however, to avoid repetition, the data are omitted here and can be consulted directly in Annex II.
Table 10 shows the top 20 URLs by Rc value (full details are also included in Annex II); a sharp drop in the number of results is identified, mainly in the December sample, which confirms the loss of representation of this search engine already observed in the global count calculations.
Table 10. Mean relative representation factor in count (Rc) (Bing images)
The presence of domains belonging to polytechnic universities should be pointed out, together with the low performance (also detected with Google) of “unirioja.es”, which shows 5,710 graphic files in December (1.34% of the global count for this domain: 427,000) and hence does not appear in table 10.
Special attention must also be paid to the high value obtained by "upc.es" in March (and its gradual decline), which clearly reveals an inconsistency. This anomalous result in March (which produces a partial Rc = 78.01), and the very low value retrieved in the following shot (Rc = 3.17), cause an increase in the ratios of the remaining domains. Reading these data longitudinally makes no sense, since it would suggest that all domains except “upc.es” increase their values, which is not the case. For this reason, it is recommended to take into account only the last 3 shots, and the raw values.
For each raw value, its percentage with respect to the total number of pages of the web domain is also provided, in order to give an overview of the proportion of graphic files in each academic web domain. In any case, the sharp decrease in coverage generates several unexpected results, especially in the June shot.
In fact, the global graphic file (accumulated) count for the whole Spanish academic web space goes from 5,717,222 results in March to 310,845 in December (a negative statistical range of 5,406,377). Figure 8 illustrates this negative trend, comparing the global count distribution of the last two samples (September and December).
Figure 8. Distribution of global graphic file accumulated count in September and December 2010 (Bing images)
On the one hand, the largest URLs fall (in the December sample) within a very narrow range of values, which is reflected in the almost zero slope of the distribution on the left side of the figure. On the other hand, the URLs that lose the most weight are those with the highest global graphic file count (with certain unexplained exceptions, such as "uco.es", "unav.es", "unirioja.es" or "uhu.es").
However, there is a wide range of domains whose results increase from September to December. In all of these URLs, a sharp drop is detected in June, followed by a rise in December (although without reaching the initial values, i.e., with a negative statistical range), just when the most important domains fall.
Besides these web domains, another group of URLs with modest graphic counts and a positive range is detected. All of them are shown in table 11.
Table 11. URLs with positive range (Bing images)
Figure 9 provides a comparison between the global results in Google images, Bing images, and Bing.
Figure 9. Comparison between the distribution of global graphic count and global count both for Bing and Google (December, 2010)
A sharp drop in Bing coverage (reflected in Bing images) is detected, while the results of Google images remain more or less constant over the period studied.
Despite the inconsistencies between Bing images results throughout the measurements period, if we take as reference the most recent data (December 2010), the similarities between Google images and Bing images are higher than expected.
Figure 10 shows the comparative distribution of both sets of data, in which a positive correlation between the two sources is identified, except in two important areas. The first is located in the middle of the “x” axis (where low results imply small overall differences), and the second in the upper zone, where Bing images shows less coverage. For example, "us.es" gets 147,000 hits on Google images, but just 6,040 on Bing images. Other URLs with large differences include, among others, "uv.es" (110,000 and 6,190 respectively) and "ua.es" (100,000 and 6,000).
Figure 10. Correlation between Bing images and Google images (December 2010)
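The agreement between two sources can be quantified with a rank correlation. The text does not state which coefficient underlies the figure, so the following is only a sketch using Spearman's rank correlation (Pearson's coefficient computed on ranks), applied to the three large domains quoted above; the function names are hypothetical, and three points serve only to illustrate the mechanics, not to estimate the correlation of the full data set.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# December 2010 counts quoted in the text for "us.es", "uv.es" and "ua.es"
google_images = [147_000, 110_000, 100_000]
bing_images = [6_040, 6_190, 6_000]
rho = spearman(google_images, bing_images)
print(round(rho, 2))
```

For these three domains the coefficient is about 0.5, which is consistent with the weaker agreement observed in the upper zone of the distribution.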
On the other hand, the low performance of Bing images in the upper zone seems to show a limitation in the image retrieval process (no more than 10,000 results are displayed per site). This phenomenon is only detected in the December shot.
A comparison between the global search engine (Bing) and the specific image search engines (Bing images and Google images) reveals certain inconsistent results. Table 12 below compares the results of Bing and Google images captured in the last sample (December 2010). The difference in coverage between the two sources means that, for some URLs, the specific image query returns more results than the global query.
Table 12. Comparison between global count (Bing) and global graphic count (Google images) (Dec 2010)
Comparing the counts obtained both from Bing images and Bing, this phenomenon practically disappears, with only 2 URLs detected with this problem: "universidadcamilojosecela.es" (16 results for Bing, and 30 for Bing images), and "unica.edu" (18 results for Bing, and 50 in Bing images).
In fact, the correlation between Bing and Bing images is quite high, as reflected in figure 11; the two differ mainly in the area of high performance in both search engines (the URLs with the largest global Bing count are also the URLs with the largest global graphic count).
Figure 11. Correlation between Bing and Bing images (Dec 2010)
3.2.2. Multimedia count
Tables with complete data collected both for Google videos and Bing videos are available in the corresponding annexes III and IV.
a) Google videos
The top 20 academic web domains with higher Rc value for the Global multimedia file count indicator are shown in table 13.
Table 13. Mean relative representation factor in count (Rc) (Google videos)
The results show a high representation of the domain "upc.edu" in all samples. Other domains such as "uva.es" and "upv.es" appear at the top due to the significant increase recorded in December. In any case, the values are modest. In the last sample, a total of 83 URLs do not obtain any result, while only 31 URLs obtain more than 5 results.
The global data also show irregularities in their evolution over time. Specifically, a significant drop is detected from June (when the accumulated multimedia count amounted to 1,561 results) to September (when only 1,087 results were obtained). This drop occurs mainly in "us.es" (which goes from 113 documents to 79) and "udg.edu" (from 132 to 5).
In December results grow again, although this is not due to the recovery of the domains described above but to the significant increase recorded in "upc.edu" (from 536 to 646), "uva.es" (56 to 216), "upv.es" (39 to 125), and "uvigo.es" (9 to 146).
b) Bing videos
As with table 13, table 14 shows the top 20 web domains with the highest Rc value on Bing videos.
Table 14. Mean relative representation factor in count (Rc) (Bing videos)
As with Google videos, the largest domain is "upc.edu", although in this case the relative count obtained is lower (29.90, compared to 40.97 on Google videos). Despite this agreement, the difference between the two sources is substantial because of the low values generally obtained by the different domains, which cause large changes in the position of each URL in the respective Rc rankings.
There are domains with higher representation on Bing than on Google, like "unavarra.es" or "upco.es", and domains far better positioned on Google. The most extreme case is "uva.es", which ranks 2nd in the Google videos Rc ranking but only 51st in the Bing videos ranking.
Apart from the differences between sources, the volume of general results obtained by Bing videos is very modest. In December 2010, a total of 80 URLs do not show any results. In addition, there is a marked negative statistical range in almost all domains.
In fact, only 2 URLs with a positive range are identified ("upco.es" and "udl.cat"), and only by small amounts. This fall in results can be seen in figure 12, which compares the evolution of the accumulated global multimedia count obtained for all URLs in the two sources analyzed. In just nine months, Bing videos goes from almost 4,000 results to just over 1,000 in December, when it retrieves fewer files than Google videos for the first time in the period.
Figure 12. Accumulated global multimedia file count per source and sample
3.2.3. Blog count (Google blogs)
The blog content web space is formed (as of December 2010) by 337,845 results, which is an increase of 42,449 records from March 2010, when the first data collection was retrieved. Nevertheless, 59 URLs do not have any results, reflecting a highly skewed distribution.
Table 15 details the web domains with an Rc value greater than 1 in the period of study, as well as the raw and normalized values for these URLs. A complete table with full details is available in Annex V.
Table 15. Mean relative representation factor (Rc) (Google blogs)
The first position is clearly occupied by "us.es", with a relatively high representation value (9), which indicates that the content is widely distributed among domains. UCM and UR (the other 2 major domains in global count) lag far behind, especially "unirioja.es", with only 32 results.
In addition, the presence of 2 private universities (UOC and IE) in the top places should be mentioned, which confirms the better performance of this type of content in private institutions.
Taking into account the general upward trend, the results show anomalous behavior in some URLs:
Despite these specific dysfunctions, and taking into account the upward trend in the data, the values show a high correlation between samples, as reflected in the distribution of results shown in figure 13.
Figure 13. Global blog accumulated count distribution per sample (Google blogs)
The amount of blog-type content contrasts with the few blog platforms identified within academic websites in the Spanish system. In December 2010 only 29 universities (of 76) hosted official academic blog platforms (table 16):
Table 16. Blog platforms within Spanish academic websites (December 2010)
The main results obtained are shown below, structured by type of count analyzed: graphic, multimedia, and blog-type content.
Additionally, the following considerations should be remarked:
As a final point, we conclude that graphic, multimedia and blog-type content, counted together, represent a significant proportion of the Spanish academic websites and should therefore be included in calculations of the total number of pages, although the low accuracy of the search engines (mainly for images and video) must be borne in mind.
With respect to the first conclusion (proportion within the Spanish academic web system):
The results show that, for all three types of content analyzed, three different groups of universities achieve high performance: polytechnic universities (UPC, UPM, and UPV); long-established, large, multidisciplinary universities (such as UCM, UB and UV); and small and specialized universities (such as UA, UPF and UVIGO). This confirms that the size of the university is not the only factor influencing its graphic, multimedia, and blog performance on the Web.
Multimedia and blog content can be indirect indicators of some academic activities (such as research, teaching and transfer support), whereas graphic files can reflect, for example, a commitment to publishing digital collections or to providing complementary material for web publications.
The significant percentage of retrieved contents (especially images and blogs) indicates the need to detect their source, motivation and distribution within the academic web-space. Future work is desirable in order to identify the source of this content (digital collections, learning objects, blogs supporting lectures, news, research or other activities), to estimate their influence in the global academic web performance, and to calculate their possible correlation with other web indicators.
With respect to the second conclusion (low accuracy of image search engines):
Both Google and Bing present severe inconsistencies and irregularities in their image and video searches, which limit the usefulness of these tools for quantitative purposes.
Google shows inconsistencies between the counts for individual graphic files and the global graphic count. Moreover, its inability to calculate the global count accurately makes it impossible to calculate the proportion of graphic, multimedia and blog content with respect to site size. On the other hand, Google's coverage is greater than Bing's, and its evolution over time is also more coherent.
In the case of Bing, the period analyzed coincides with the start of its merger with Yahoo! Search, which may explain the great instability and low coverage found. For that reason, all data retrieved from Bing indirectly reflect the effects of the merger of the two search engines on the Spanish academic web space. In any case, the strength of Bing is the possibility of calculating the proportion of the graphic count with respect to site size, while its disadvantages are the impossibility of searching for specific graphic files and its lower coverage with respect to Google.
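The two quantities discussed above (the gap between the sum of graphic file counts and the global graphic count, and the proportion of the graphic count with respect to site size) can be illustrated with a minimal sketch. The function names and all figures below are hypothetical and chosen only for illustration; they are not data or procedures from this study.

```python
from typing import Mapping, Optional


def format_discrepancy(per_format: Mapping[str, int], global_count: int) -> Optional[float]:
    """Relative gap between the sum of per-format hit counts and the global count.

    Returns None when the global count is zero, since the ratio is undefined.
    """
    if global_count == 0:
        return None
    return (sum(per_format.values()) - global_count) / global_count


def graphic_share(graphic_count: int, site_size: int) -> Optional[float]:
    """Proportion of the graphic count with respect to the site size.

    Returns None when the site size is not positive.
    """
    if site_size <= 0:
        return None
    return graphic_count / site_size


# Hypothetical hit counts for one made-up academic domain (not study data).
per_format = {"jpg": 600, "gif": 300, "png": 100}
print(format_discrepancy(per_format, global_count=800))  # 0.25
print(graphic_share(graphic_count=200, site_size=1000))  # 0.2
```

A discrepancy far from zero would signal that the per-format and global counts cannot be reconciled, in which case any proportion derived from the global count should be interpreted with caution.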
Aguillo, Isidro F. (2009). Measuring the institutions’ footprint in the web. Library Hi Tech, 27(4): 540-556.
Aguillo, Isidro F.; Granadino, B.; Ortega, José L. & Prieto, J.A. (2006). Scientific research activity and communication measured with cybermetrics indicators. Journal of the American Society for Information Science and Technology, 57(10): 1296-1302.
Aguillo, Isidro F.; Ortega, J.L. & Fernandez, M. (2008). Webometric Ranking of World Universities: Introduction, methodology, and future developments. Higher Education in Europe, 33(2/3): 233-244.
Alonso Berrocal, J.L.; Figuerola, C.G. & Zazo, A.F. (2004). Cibermetría: nuevas técnicas de estudio aplicables al Web. Trea, Gijón.
Angus, E. & Thelwall, M. (2010). Motivations for image publishing and tagging on Flickr. In: Turid Hedlund and Yasar Tonta (Eds.), Proceedings of the 14th International Conference on Electronic Publishing, Hanken School of Economics, Helsinki, 189-204.
Angus, E.; Thelwall, M. & Stuart, D. (2008). General patterns of tag usage among university groups in Flickr. Online Information Review, 32(1): 89-101.
Angus, E.; Thelwall, M. & Stuart, D. (2010). Flickr’s potential as an academic image resource: an exploratory study. Journal of Librarianship and Information Science, 42(4): 268–278.
Baeza-Yates, R. & Graells, E. (2007). Características de la web chilena 2007.
Cha, M.; Kwak, H.; Rodriguez, P.; Ahn, Y-Y. & Moon, S. (2009). Analyzing the Video Popularity Characteristics of Large-Scale User Generated Content Systems. IEEE/ACM Transactions on Networking, 17(5): 1357-1370.
Chen, H. (2001). An analysis of image queries in the field of art history. Journal of the American Society for Information Science and Technology, 52(3): 260–273.
Choi, Y. & Rasmussen, E. (2003). Searching for images: The analysis of users' queries for image retrieval in American history. Journal of the American Society for Information Science and Technology, 54(6): 498–511.
Goodfellow, T. & Graham, S. (2007). The Blog as a High-Impact Institutional Communication Tool. Electronic Library, 25(4): 395-400.
Goodrum, A. & Spink, A. (2001). Image searching on the Excite search engine. Information Processing & Management, 37(2): 295–312.
Jansen, B. (2008). Searching for digital images on the web. Journal of Documentation, 64(1): 81–101.
Koehler, W. (1999). An analysis of web page and web site constancy and permanence. Journal of the American Society for Information Science, 50(2): 162–180.
Kousha, K. & Thelwall, M. (2008). Assessing the impact of research on teaching: an automatic analysis of online syllabuses in science and social sciences. Journal of the American Society for Information Science and Technology, 59(13): 2060-2069.
Kousha, K.; Thelwall, M. & Abdoli, M. (in press). The role of online videos in research communication: a content analysis of YouTube videos cited in academic publications. Journal of the American Society for Information Science and Technology.
Kousha, K.; Thelwall, M. & Rezaie, S. (in press). Can the impact of scholarly images be assessed online? An exploratory study using image identification technology. Journal of the American Society for Information Science and Technology.
Kousha, K.; Thelwall, M. & Rezaie, E. (2010). Using the Web for Research evaluation: The integrated online impact indicator. Journal of Informetrics, 4(1): 124-135.
Orduña-Malea, E.; Serrano-Cobos, J. & Lloret-Romero, N. (2009). Las universidades públicas españolas en Google Scholar: presencia y evolución de su publicación académica web. El profesional de la información, 18(5): 493-500.
Orduña-Malea, E.; Serrano-Cobos, J.; Ontalba-Ruipérez, J-A. & Lloret-Romero, N. (2010). Presencia y visibilidad web de las universidades públicas españolas. Revista española de documentación científica, 33(2): 246-278.
Ortega, José L.; Aguillo, Isidro F. & Prieto, Jose A. (2006). Longitudinal study of content and elements in the scientific web environment. Journal of Information Science, 32(4): 344-351.
Pinto-Molina, M.; Alonso-Berrocal, J. L.; Cordón-García, J. A.; Fernández-Marcial, V.; García-Figuerola, C.; García-Marco, J.; Gómez-Camarero, C.; Zazo, Á. F. & Doucet, A. V. (2004). Análisis cualitativo de la visibilidad de la investigación de las universidades españolas a través de sus páginas web. Revista española de documentación científica, 27(3): 345-370.
Pu, H. (2005). A comparative analysis of web image and textual queries. Online Information Review, 29(5): 457–467.
Pu, H. (2008). An analysis of failed queries for web image retrieval. Journal of Information Science, 34(3): 275–289.
Rocki, Marek (2005). Statistical and mathematical aspects of rankings: lessons from Poland. Higher education in Europe, 30(2): 173-181.
Sigurbjörnsson, B. & Van Zwol, R. (2008). Flickr tag recommendation based on collective knowledge, Proceedings of the 17th International Conference on World Wide Web 2008, WWW'08, 327-336.
Thelwall, M. & Hasler, L. (2007). Blog search engines. Online Information Review, 31(4): 467-479.
Thelwall, M. & Prabowo, R. (2007). Identifying and characterising public science-related concerns from RSS feeds. Journal of the American Society for Information Science & Technology, 58(3): 379–390.
Thelwall, M. & Wilkinson, D. (2010). Blog Issue Analysis: An exploratory study of issue-related blogging. In: Birger Larsen, Jesper W. Schneider, Fredrik Åström (Eds.), The Janusz Faced Scholar: A festschrift in honour of Peter Ingwersen, ISSI, 203-218.
Thelwall, M. (2007). Blog searching: The first general-purpose source of retrospective public opinion in the social sciences?. Online Information Review, 31(3): 277-289.
Thelwall, M. (2009). Introduction to Webometrics: quantitative web research for the social sciences. San Rafael, CA, Morgan & Claypool, Synthesis Lectures on Information Concepts, Retrieval, and Services, 1(1).
Thelwall, M. & Price, L. (2006). Language evolution and the spread of ideas: A procedure for identifying emergent hybrid word family members. Journal of the American Society for Information Science and Technology, 57(10): 1326–1337.
Thelwall, M.; Buckley, K.; Paltoglou, G.; Cai, D. & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12): 2544–2558.
Thelwall, M.; Sud, P. & Vis, F. (2012). Commenting on YouTube videos: From Guatemalan rock to El Big Bang. Journal of the American Society for Information Science and Technology, 63: 616–629.
Tolosa, G.; Bordignon, F.; Baeza-Yates, R. & Castillo, C. (2007). Characterization of the Argentinian web. Cybermetrics, 11(1): Paper 3. http://cybermetrics.cindoc.csic.es/articles/v11i1p3.html
2 <http://www.educacion.es/educacion/universidades/educacion-superior-universitaria/que-estudiar-donde/universidades-espanolas.html> (retrieved on 03/27/2012).