Quality and Credibility Metrics of Online Entities: Academic Review

Quality vs. Credibility of Online Entities


Credibility: the quality of being believable or worthy of trust.

Credibility challenges according to Stanford Web Credibility Research:

  • What causes people to believe (or not believe) what they find on the Web?
  • What strategies do users employ in evaluating the credibility of online sources?
  • What contextual and design factors influence these assessments and strategies?
  • How and why are credibility evaluation processes on the Web different from those made in face-to-face human interaction, or in other offline contexts?

Ph.D. Thesis: How Do People Evaluate a Web Site’s Credibility?


Quality: the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs.

  • degree of excellence or fitness for use

Is quality an indicator of credibility or vice versa?

User-Generated Content: How Good Is It? (slides)

  • How can we estimate the quality of UGC?
    • Evaluating the quality directly.
      • What elements of social media can be used to facilitate automated discovery of high-quality content?
      • What is the utility of links between items, quality ratings from community members, and other non-content information for estimating the quality of UGC?
        • How are these different factors related?
        • Is content alone enough for identifying high-quality items?
        • Can community feedback approximate the judgments of specialists?
    • In this work, the authors used a judged question/answer collection, in which good questions usually have good answers, to train a classifier that predicts good questions and good answers, obtaining an AUC (area under the precision-recall curve) of 0.76 and 0.88, respectively.
      • The drawback: the quality gap is balanced by volume, and the larger the volume of UGC, the more difficult quality evaluation becomes.
    • Obtaining indirect evidence of the quality.
      • Use UGC for a given task, then evaluate the quality of the task results.
      • Example: evaluating the quality of semantic relations extracted using the Open Directory Project (ODP), with a precision of over 60%.
    • Cross-referencing different UGC sources to infer the quality of those sources.
      • Example: using collective knowledge (the wisdom of crowds) to extend image tags, showing that almost 70% of the tags can be semantically classified using WordNet and Wikipedia.
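The AUC and precision figures above summarize how well a quality classifier ranks content against human judgments. As a minimal sketch, assuming binary judgments (high quality or not) and one classifier score per item, precision at a fixed threshold and average precision (one common estimate of the area under the precision-recall curve) can be computed as below. The function names and sample data are hypothetical and are not taken from the cited work.

```python
from typing import Sequence


def precision(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of items predicted high-quality that really are high-quality."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    pred_pos = sum(1 for p in predicted if p)
    return true_pos / pred_pos if pred_pos else 0.0


def average_precision(scores: Sequence[float], actual: Sequence[bool]) -> float:
    """Estimate the area under the precision-recall curve as average precision.

    Items are ranked by descending score; precision is recorded at the rank of
    every truly high-quality item, then averaged over all such items.
    """
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_good in ranked if is_good)
    if total_pos == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_good) in enumerate(ranked, start=1):
        if is_good:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Hypothetical classifier scores and human judgments (True = high quality).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, False, True, True, False, False]

print(precision([s >= 0.5 for s in scores], labels))  # 0.75
print(round(average_precision(scores, labels), 3))  # 0.806
```

Average precision is used here instead of trapezoidal interpolation because linear interpolation between precision-recall points tends to overestimate the area.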

The Online Entities Quality Challenge

Entities: social media platforms (Facebook, Twitter, …) or information systems, and information or content on the Internet (articles: posts, comments, …).

The advent and openness of online social media platforms often leave them highly susceptible to abuse by suspicious entities. It therefore becomes increasingly important to automatically identify these suspicious entities and mitigate or eliminate their threats.
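One very simple way to automate the identification of suspicious entities is to flag those whose activity deviates sharply from the rest of the population. The sketch below assumes a single hypothetical numeric feature per account (for example, posts per hour) and uses the median-based modified z-score with the conventional 3.5 cut-off. It is an illustrative baseline only, not the method of the thesis cited here.

```python
from statistics import median


def robust_z_scores(values: dict[str, float]) -> dict[str, float]:
    """Modified z-score of each entity's value, based on the median and the MAD."""
    med = median(values.values())
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        # At least half the entities share the median value; this simple
        # baseline then reports no deviation at all.
        return {name: 0.0 for name in values}
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data are roughly normal.
    return {name: 0.6745 * (v - med) / mad for name, v in values.items()}


def flag_suspicious(values: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Names of entities whose modified z-score exceeds the threshold in magnitude."""
    scores = robust_z_scores(values)
    return sorted(name for name, z in scores.items() if abs(z) > threshold)


# Hypothetical posts-per-hour for five accounts.
activity = {"alice": 3, "bob": 4, "carol": 5, "dave": 4, "eve": 120}
print(flag_suspicious(activity))  # ['eve']
```

The median and MAD are used instead of the mean and standard deviation because a single heavy abuser would otherwise inflate the spread and mask itself.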

Anomaly Detection on Social Data: Ph.D. Thesis

The rapid growth of the Internet and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems.

  • e.g., the inability of search engines to wade through the vast expanse of questionable content and return “quality” results for a user’s query

Developing a Framework for Assessing Information Quality on the World Wide Web

Fundamental Definitions

“Data quality” describes data that is “fit for use”: data considered appropriate for one use may not possess sufficient attributes for another use.

Common Dimensions of Information or Data Quality

  • Accuracy: extent to which data are correct, reliable and certified free of error
  • Consistency: extent to which information is presented in the same format and compatible with previous data
  • Security: extent to which access to information is restricted appropriately to maintain its security
  • Timeliness: extent to which the information is sufficiently up-to-date for the task at hand
  • Completeness: extent to which information is not missing and is of sufficient breadth and depth for the task at hand
  • Conciseness: extent to which information is compactly represented without being overwhelming (i.e., brief in presentation, yet complete and to the point)
  • Reliability: extent to which information is correct and reliable
  • Accessibility: extent to which information is available, or easily and quickly retrievable
  • Availability: extent to which information is physically accessible
  • Objectivity: extent to which information is unbiased, unprejudiced and impartial
  • Relevancy: extent to which information is applicable and helpful for the task at hand
  • Usability: extent to which information is clear and easily used
  • Understandability: extent to which data are clear without ambiguity and easily comprehended
  • Amount of data: extent to which the quantity or volume of available data is appropriate
  • Believability: extent to which information is regarded as true and credible
  • Navigation: extent to which data are easily found and linked to
  • Reputation: extent to which information is highly regarded in terms of source or content
  • Usefulness: extent to which information is applicable and helpful for the task at hand
  • Efficiency: extent to which data are able to quickly meet the information needs for the task at hand
  • Value-Added: extent to which information is beneficial, provides advantages from its use

These attributes of data quality can vary depending on the context in which the data is to be used.
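To make this context dependence concrete, here is a minimal sketch that scores two of the dimensions listed above and combines them with context-specific weights. The formulas are assumptions chosen for illustration: completeness as the share of required fields that are filled in, timeliness as a linear decay from 1 to 0 over an assumed volatility window, and an overall score as a weighted average. All names and values are hypothetical.

```python
from datetime import datetime, timedelta


def completeness(record: dict, required_fields: list[str]) -> float:
    """Share of required fields that are present and non-empty (a simple ratio)."""
    filled = sum(1 for field in required_fields if record.get(field) not in (None, ""))
    return filled / len(required_fields)


def timeliness(last_updated: datetime, now: datetime, volatility: timedelta) -> float:
    """1.0 for brand-new data, falling linearly to 0.0 once the data is `volatility` old."""
    age = now - last_updated
    return max(0.0, 1.0 - age / volatility)  # timedelta / timedelta yields a float


def weighted_quality(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of dimension scores; the weights stand in for the context of use."""
    total = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total


# Hypothetical record: one required field is empty, last updated 30 days ago.
record = {"title": "Post", "author": "", "body": "text", "date": "2024-01-01"}
scores = {
    "completeness": completeness(record, ["title", "author", "body", "date"]),
    "timeliness": timeliness(datetime(2024, 1, 1), datetime(2024, 1, 31), timedelta(days=60)),
}
# Two hypothetical contexts that weight the same dimensions differently.
news_feed = {"completeness": 1, "timeliness": 3}
archive = {"completeness": 3, "timeliness": 1}
print(weighted_quality(scores, news_feed))  # 0.5625
print(weighted_quality(scores, archive))    # 0.6875
```

The same record earns different overall scores in the two contexts: the data do not change, only the weights that express what the task at hand values.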

Defining what information quality means in the context of search engines depends greatly on whether the dimensions are being identified for the producers of information, for the systems used to store and maintain information, or for the searchers and users of information.

  • For information users, the quality dimensions of interest include relevancy and usefulness. These dimensions are enormously important but extremely difficult to gauge.
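Relevancy is hard to gauge largely because it requires human judgments, but once graded judgments exist, a standard IR measure such as normalized discounted cumulative gain (nDCG) can summarize how well a ranked result list serves a user. A minimal sketch, assuming hypothetical relevance grades from 0 (irrelevant) to 3 (highly relevant) for each returned result, in rank order:

```python
import math


def dcg(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain of the top-k results; rank 1 gets no discount."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg(relevances: list[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades.

    Simplification: assumes every relevant item for the query appears in the list.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments for the top five results of one query.
judged = [2, 3, 0, 1, 3]
print(round(ndcg(judged, k=5), 3))
```

A score of 1.0 means the engine placed the most relevant results first; lower values indicate that highly relevant results were pushed down the ranking.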

Developing a Framework for Assessing Information Quality on the World Wide Web

Quality Metrics

Metrics for Information Quality (IQ) in Information Retrieval

Metrics that can assess IQ and be deployed in search engines

Quality Datasets


