Quality and Credibility Metrics of Online Entities: Academic Review

Quality Vs Credibility of Online Entities


Credibility: the quality of being believable or worthy of trust.

Credibility challenges according to Stanford Web Credibility Research:

  • What causes people to believe (or not believe) what they find on the Web?
  • What strategies do users employ in evaluating the credibility of online sources?
  • What contextual and design factors influence these assessments and strategies?
  • How and why are credibility evaluation processes on the Web different from those made in face-to-face human interaction, or in other offline contexts?

Ph.D. Thesis: How Do People Evaluate a Web Site’s Credibility?


Quality: the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs.

  • degree of excellence or fitness for use

Is quality an indicator of credibility or vice versa?

User Generated Content: How Good is It? Slide

  • How can we estimate the quality of UGC?
    • Directly evaluate the quality.
      • What are the elements of social media that can be used to facilitate automated discovery of high-quality content?
      • What is the utility of links between items, quality rating from members of the community, and other non-content information to the task of estimating the quality of UGC?
        • How are these different factors related?
        • Is content alone enough for identifying high-quality items?
        • Can community feedback approximate judgments of specialists?
    • In this work, the authors used a judged question/answer collection, in which good questions usually have good answers, to train a classifier that predicts good questions and good answers, obtaining an AUC (area under the precision-recall curve) of 0.76 and 0.88, respectively.
      • The drawback is that the quality gap is balanced by volume: the larger the volume of UGC, the less difficult the quality evaluation.
    • Obtaining indirect evidence of the quality.
      • Use UGC for a given task and then evaluate the quality of the task results.
      • Example: evaluation of the quality of extraction of semantic relations using the Open Directory Project (ODP), with a precision of over 60%.
    • Crossing different UGC sources and inferring the quality of those sources from there.
      • Example: using collective knowledge (wisdom of crowds) to extend image tags, showing that almost 70% of the tags can be semantically classified by using WordNet and Wikipedia.
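The AUC figures quoted above summarise a precision-recall curve in a single number. As a minimal sketch of how such a number can be computed, the function below calculates average precision, a common step-wise estimate of the area under the precision-recall curve. It is not the authors' code: the function name, the judged labels, and the classifier scores are all hypothetical, and tied scores are not treated specially.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for an item judged high quality, 0 otherwise.
    scores: classifier confidence that the item is high quality.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")

    # Rank items from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_at_rank = true_positives / rank
            # Each positive item raises recall by 1 / total_positives.
            area += precision_at_rank / total_positives
    return area


# Hypothetical judged answers: 1 = judged good, 0 = judged poor.
judged = [1, 0, 1, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
ap = average_precision(judged, predicted)
print(f"average precision: {ap:.3f}")
```

A score of 1.0 would mean every good item is ranked above every poor one; lower values mean poor items are mixed into the top of the ranking.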

The Online Entities Quality Challenge

Entities: social media platforms (Facebook, Twitter, …) or information systems, and information or content on the internet (articles such as posts, comments, …).

The advent and openness of online social media platforms often leave them highly susceptible to abuse by suspicious entities. It therefore becomes increasingly important to automatically identify these suspicious entities and mitigate or eliminate their threats.

Anomaly Detection on Social Data: Ph.D. Thesis

The rapid growth of the Internet and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems.

  • inability of Search Engines to wade through the vast expanse of questionable content and return “quality” results to a user’s query

Developing a Framework for Assessing Information Quality on the World Wide Web

Fundamental Definitions

“Data Quality” is described as data that is “fit for use”: data considered appropriate for one use may not possess sufficient attributes for another use.

Common Dimensions of Information or Data Quality

  • Accuracy: extent to which data are correct, reliable and certified free of error
  • Consistency: extent to which information is presented in the same format and compatible with previous data
  • Security: extent to which access to information is restricted appropriately to maintain its security
  • Timeliness: extent to which the information is sufficiently up-to-date for the task at hand
  • Completeness: extent to which information is not missing and is of sufficient breadth and depth for the task at hand
  • Conciseness: extent to which information is compactly represented without being overwhelming (i.e. brief in presentation, yet complete and to the point)
  • Reliability: extent to which information is correct and reliable
  • Accessibility: extent to which information is available, or easily and quickly retrievable
  • Availability: extent to which information is physically accessible
  • Objectivity: extent to which information is unbiased, unprejudiced and impartial
  • Relevancy: extent to which information is applicable and helpful for the task at hand
  • Usability: extent to which information is clear and easily used
  • Understandability: extent to which data are clear without ambiguity and easily comprehended
  • Amount of data: extent to which the quantity or volume of available data is appropriate
  • Believability: extent to which information is regarded as true and credible
  • Navigation: extent to which data are easily found and linked to
  • Reputation: extent to which information is highly regarded in terms of source or content
  • Usefulness: extent to which information is applicable and helpful for the task at hand
  • Efficiency: extent to which data are able to quickly meet the information needs for the task at hand
  • Value-Added: extent to which information is beneficial, provides advantages from its use

These attributes of data quality can vary depending on the context in which the data is to be used.
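To make the "fit for use" idea concrete, here is a minimal sketch that scores two of the dimensions above, completeness and timeliness, for the same record under two different tasks. The record, field names, task requirements, and age limits are hypothetical and chosen only for illustration; the source framework does not prescribe these formulas.

```python
from datetime import datetime, timedelta
from typing import Any, Mapping, Sequence


def completeness(record: Mapping[str, Any], required: Sequence[str]) -> float:
    """Fraction of the required fields that are present and non-empty."""
    if not required:
        return 1.0
    filled = sum(1 for field in required if record.get(field) not in (None, ""))
    return filled / len(required)


def is_timely(updated_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """True if the record is recent enough for the task at hand."""
    return now - updated_at <= max_age


# Hypothetical record; the author field is empty and there is no source field.
record = {"title": "River report", "author": "", "body": "Sampling results."}
updated_at = datetime(2024, 1, 1)
now = datetime(2024, 1, 10)

# Task A (hypothetical): casual reading needs little and tolerates older data.
score_a = completeness(record, ["title", "body"])
timely_a = is_timely(updated_at, now, max_age=timedelta(days=30))

# Task B (hypothetical): citing the record needs more fields and fresher data.
score_b = completeness(record, ["title", "author", "body", "source"])
timely_b = is_timely(updated_at, now, max_age=timedelta(days=1))

print(score_a, timely_a)
print(score_b, timely_b)
```

The same record is complete and timely for the first task but only half complete and stale for the second, which is the context dependence described above.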

Defining what Information Quality means in the context of Search Engines will depend greatly on whether dimensions are being identified for the producers of information, the storage and maintenance systems used for information, or for the searchers and users of information.

  • For the information user, the quality dimensions of interest include relevancy and usefulness. These dimensions are enormously important but extremely difficult to gauge.

Developing a Framework for Assessing Information Quality on the World Wide Web

Quality Metrics

Metrics for IQ in Information Retrieval

Metrics that can assess IQ and can be deployed in Search engines

Quality Datasets


Thesis and Dissertation Hubs

– ProQuest
– DiVA
– ETDs
– Ebook
– Dart Europe
– OhioLINK
– UM Repository

Credit: Khalid Kyle

Here is the list of websites that I used to get ebooks, theses, and dissertations for free while writing my thesis:

– ProQuest
– Queens
– DiVA
– ETDs
– Ebook
– Dart Europe
– OhioLINK
– UM Repository

Ph.D. Discussions on Quora

I came across some interesting Ph.D.-related discussions on Quora that attempt to measure the quality of, and satisfaction with, a Ph.D. programme. Some of the questions that caught my attention include:

  • Q: Is there anyone out there who really enjoyed or is enjoying their time as a Ph.D. student?
    • A: Yes, very many! It helps to have a great advisor and a great research area, and to work on the right problems at the right time. Ph.D. students need to be driven to learn and try new things, to not settle, and to find their own identity as researchers.
  • Q: What are the metrics for measuring the quality of a Ph.D. program?
    • A: metrics that are focused on the process of the Ph.D.
      • Grant funding received by department
      • Number of publications by student at graduation
      • Number of 1st author papers at graduation by student
      • Awards (both faculty and student awards)
      • Fraction that gets academic jobs (Postdocs count in this number)
    • A: metrics that are focused on the Ph.D. student.
      • Did the student receive a job in the field they wanted?
      • Was the research they produced impactful (citations, not just paper count)?
      • Was the student satisfied with their experience?
      • Do students feel well prepared when they join their next job? (academic or otherwise).
  • Q: I have spent one year in my Ph.D. program and haven’t published anything. What should I do?
    • A: What is the point of writing a paper that nobody reads? It is better to keep learning new things and keep asking the most interesting questions. When you finally publish your first paper, it will be amazing and worth 20 mediocre papers. The impact is what matters, not publication count. The hard stuff takes some time to come together into a quality publication.

Measuring the Quality of Interesting Entities on the Web

Quality is defined in Wikipedia as the standard of something as measured against other things of a similar kind; the degree of excellence of something.

Today’s information and data pools on the Web focus on the quantity of information rather than its quality.

Assessing the quality of information is especially important because decisions are often based on information from multiple and sometimes unknown sources whose reliability and accuracy are questionable.

However, the Web lacks quality-dependent filter mechanisms, automatic identification of misuse patterns, and tools to establish user trust in information and authors.

Hence the need to develop mechanisms for estimating the quality of textual Web documents and to evaluate these mechanisms for their effectiveness and efficiency.


My entities of interest include websites, news feeds, social media feeds, digital adverts, and other Web articles. Other important entities include air, water, soil, and life.

Large scale machine learning is playing an increasingly important role in improving the quality and monetisation of Internet properties. A small number of techniques, such as regression, have proven to be widely applicable across Internet properties and applications.

Sibyl: A System for Large Scale Machine Learning at Google

My Research and Publication Platforms

My Journal Platforms of Interest

Computational Social Networks

Focus on common principles, algorithms and tools that govern network structures/topologies, network functionalities, security and privacy, network behaviors, information diffusions and influence, social recommendation systems which are applicable to all types of social networks and social media. Topics include (but are not limited to) the following:

  • Social network design and architecture
  • Mathematical modeling and analysis
  • Real-world complex networks
  • Information retrieval in social contexts, political analysts
  • Network structure analysis
  • Network dynamics optimization
  • Complex network robustness and vulnerability
  • Information diffusion models and analysis
  • Security and privacy
  • Searching in complex networks
  • Efficient algorithms
  • Network behaviors
  • Trust and reputation
  • Social Influence
  • Social Recommendation
  • Social media analysis
  • Big data analysis on online social networks

Journal of Big Data

The journal examines the challenges facing big data today and going forward including, but not limited to: data capture and storage; search, sharing, and analytics; big data technologies; data visualization; architectures for massively parallel processing; data mining tools and techniques; machine learning algorithms for big data; cloud computing platforms; distributed file systems and databases; and scalable storage systems.

Comparing Water and Social Data Quality


Quality is the standard of something as measured against other things of a similar kind; the degree of excellence of something.

Pollution is the presence/introduction of a substance which has harmful or poisonous effects in an environment. Pollution can loosely be defined as the deterioration of an existing state.

Water Pollution

Understanding the sources of pollution is necessary for eliminating, minimising, reusing or treating pollutants and their negative effects on the environment.


Classification of the sources of water pollution (Water Quality Control Handbook)

Water Quality

Maintaining high water quality or keeping water systems safe is a continuous real-world challenge.  

The key to a successful water quality system in today’s environment is using established parameters to measure change over time at varying locations in the network on a continuous basis. These measurement parameters include pH, conductivity, free chlorine, monochloramine, dissolved oxygen, ammonium, turbidity, fluoride, ozone, and temperature.
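As a minimal sketch of measuring change over time at several locations, the function below compares each new reading with the mean of the previous few readings for the same location and parameter, and flags readings that deviate by more than a tolerance. The site names, pH values, window size, and tolerance are hypothetical; a real system would set a separate tolerance per parameter and would probably keep flagged readings out of the baseline.

```python
from collections import defaultdict, deque
from statistics import mean
from typing import Deque, Dict, Iterable, List, Tuple

Reading = Tuple[str, str, float]  # (location, parameter, value)


def flag_changes(
    readings: Iterable[Reading], window: int = 3, tolerance: float = 0.5
) -> List[Reading]:
    """Flag readings that deviate from the recent mean at the same location."""
    history: Dict[Tuple[str, str], Deque[float]] = defaultdict(
        lambda: deque(maxlen=window)
    )
    flagged: List[Reading] = []
    for location, parameter, value in readings:
        past = history[(location, parameter)]
        # Only compare once a full window of earlier readings is available.
        if len(past) == window and abs(value - mean(past)) > tolerance:
            flagged.append((location, parameter, value))
        past.append(value)  # the deque drops the oldest reading automatically
    return flagged


# Hypothetical pH readings arriving in time order from two sites.
stream = [
    ("site_a", "pH", 7.1),
    ("site_b", "pH", 6.9),
    ("site_a", "pH", 7.2),
    ("site_a", "pH", 7.0),
    ("site_b", "pH", 7.0),
    ("site_a", "pH", 7.1),
    ("site_a", "pH", 8.4),
]
alerts = flag_changes(stream)
print(alerts)
```

Only the last site_a reading is flagged; site_b never accumulates a full window, so none of its readings are compared.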