Quality and Credibility Metrics of Online Entities: Academic Review

Quality Vs Credibility of Online Entities

Credibility

The quality of being believable or worthy of trust.

Credibility challenges according to Stanford Web Credibility Research:

What causes people to believe (or not believe) what they find on the Web?
What strategies do users employ in evaluating the credibility of online sources?
What contextual and design factors influence these assessments and strategies?
How and why are credibility evaluation processes on the Web different from those made in face-to-face human interaction, or in other offline contexts?

Ph.D. Thesis: How Do People Evaluate a Web Site’s Credibility?

Quality

The totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs.

Degree of excellence, or fitness for use.

Is quality an indicator of credibility or vice versa?

User Generated Content: How Good is It? Slide

How can we estimate the quality of UGC?
- Directly evaluate the quality.
  - What are the elements of social media that can be used to facilitate automated discovery of high-quality content?
  - What is the utility of links between items, quality rating from members of the community, and other non-content information to the task of estimating the quality of UGC?
    - How are these different factors related?
    - Is content alone enough for identifying high-quality items?
    - Can community feedback approximate judgments of specialists?
- In this work, the authors used a judged question/answer collection, in which good questions usually have good answers, to train a classifier that predicts good questions and good answers, obtaining an AUC (area under the precision-recall curve) of 0.76 and 0.88, respectively.
  - The drawback is that the quality gap is balanced by volume: the larger the volume of UGC, the less difficult the quality evaluation.
- Obtain indirect evidence of the quality.
  - Use UGC for a given task and then evaluate the quality of the task results.
  - Evaluation of the quality of semantic-relation extraction using the Open Directory Project (ODP): precision of over 60%.
- Cross different UGC sources and infer the quality of those sources from the result.
  - Collective knowledge (wisdom of crowds) was used to extend image tags, showing that almost 70% of the tags can be semantically classified using WordNet and Wikipedia.
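The AUC figures quoted above are areas under a precision-recall curve. As a minimal sketch of how such a number can be computed from a ranked list of judged items, the following uses the step-wise (average precision) approximation. The function names, the toy labels, and the toy scores are illustrative assumptions; they are not taken from the cited work, and tied scores are not treated specially.

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) pairs, one per position in the ranking.

    labels: 1 for a high-quality item, 0 otherwise (human judgments).
    scores: classifier scores, higher means "more likely high quality".
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank items from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        points.append((true_positives / total_positives, true_positives / rank))
    return points


def pr_auc(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    area = 0.0
    previous_recall = 0.0
    for recall, precision in precision_recall_points(labels, scores):
        # Only positions where recall increases contribute to the area.
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Toy data (hypothetical): six answers, three judged as good.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = pr_auc(toy_labels, toy_scores)
```

A ranking that places every good item ahead of every bad one yields an area of 1.0; the more good items are pushed down the ranking, the smaller the area becomes.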

The Online Entities Quality Challenge

Entities: social media platforms (Facebook, Twitter, …) or information systems, and information or content on the Internet (articles such as posts, comments, …).

The advent and openness of online social media platforms often leave them highly susceptible to abuse by suspicious entities. It therefore becomes increasingly important to identify these suspicious entities automatically and to mitigate or eliminate the threats they pose.

Anomaly Detection on Social Data: Ph.D. Thesis

The rapid growth of the Internet and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems.

One such problem is the inability of search engines to wade through the vast expanse of questionable content and return “quality” results for a user’s query.

Developing a Framework for Assessing Information Quality on the World Wide Web

Fundamental Definitions

“Data Quality” is described as data that is “Fit-for-use”: data considered appropriate for one use may not possess sufficient attributes for another use!

Common Dimensions of Information or Data Quality

Accuracy: extent to which data are correct, reliable and certified free of error
Consistency: extent to which information is presented in the same format and compatible with previous data
Security: extent to which access to information is restricted appropriately to maintain its security
Timeliness: extent to which the information is sufficiently up-to-date for the task at hand
Completeness: extent to which information is not missing and is of sufficient breadth and depth for the task at hand
Conciseness: extent to which information is compactly represented without being overwhelming (i.e. brief in presentation, yet complete and to the point)
Reliability: extent to which information is correct and reliable
Accessibility: extent to which information is available, or easily and quickly retrievable
Availability: extent to which information is physically accessible
Objectivity: extent to which information is unbiased, unprejudiced and impartial
Relevancy: extent to which information is applicable and helpful for the task at hand
Usability: extent to which information is clear and easily used
Understandability: extent to which data are clear without ambiguity and easily comprehended
Amount of data: extent to which the quantity or volume of available data is appropriate
Believability: extent to which information is regarded as true and credible
Navigation: extent to which data are easily found and linked to
Reputation: extent to which information is highly regarded in terms of source or content
Usefulness: extent to which information is applicable and helpful for the task at hand
Efficiency: extent to which data are able to quickly meet the information needs for the task at hand
Value-Added: extent to which information is beneficial, provides advantages from its use

These attributes of data quality can vary depending on the context in which the data is to be used.
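One way to make the "fit-for-use" idea concrete is to score the same source against the same dimensions but with context-specific weights. The sketch below is only an illustration: the weighted-average formula, the dimension scores, and the two usage contexts are all hypothetical and are not taken from the cited framework.

```python
def fitness_for_use(dimension_scores, weights):
    """Weighted average of per-dimension scores, each assumed to lie in [0, 1].

    dimension_scores: mapping of dimension name -> score for one data source.
    weights: mapping of dimension name -> importance in a given usage context.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        weight * dimension_scores.get(name, 0.0) for name, weight in weights.items()
    )
    return weighted_sum / total_weight


# Hypothetical scores for a single data source.
source = {"accuracy": 0.9, "completeness": 0.8, "timeliness": 0.3}

# Hypothetical contexts: the same data, weighted for two different uses.
archival_research = {"accuracy": 0.6, "completeness": 0.4, "timeliness": 0.0}
news_monitoring = {"accuracy": 0.3, "completeness": 0.1, "timeliness": 0.6}

archival_score = fitness_for_use(source, archival_research)
news_score = fitness_for_use(source, news_monitoring)
```

With these assumed numbers the source scores well for archival research, where timeliness carries no weight, and noticeably worse for news monitoring, where stale data is penalized.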

Defining what Information Quality means in the context of Search Engines will depend greatly on whether dimensions are being identified for the producers of information, the storage and maintenance systems used for information, or for the searchers and users of information.

For the information user, the quality dimensions of interest include relevancy and usefulness. These dimensions are enormously important but extremely difficult to gauge.

Developing a Framework for Assessing Information Quality on the World Wide Web

Quality Metrics

Metrics for IQ in Information Retrieval

Metrics that can assess IQ and can be deployed in search engines:

Six quality metrics: currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness.
Factual density measure: a simple statistical quality measure that is based on facts extracted from Web content. Evaluated on Wikipedia articles.
Seven Wikipedia IQ metrics
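Two of the metrics listed above lend themselves to a short illustration. The sketch below assumes that the information-to-noise ratio is the length of the text that survives preprocessing divided by the raw document size, and that factual density is the number of extracted facts divided by the document length in words. The tag-stripping regex, the tiny stop-word list, and the function names are hypothetical simplifications, and fact extraction itself is left out: the fact count is passed in as an input.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})

TAG_PATTERN = re.compile(r"<[^>]+>")        # crude removal of HTML tags
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")  # crude word tokenizer


def information_to_noise_ratio(raw_document):
    """Characters kept after preprocessing divided by the raw document length."""
    if not raw_document:
        return 0.0
    text = TAG_PATTERN.sub(" ", raw_document)
    tokens = [
        token for token in TOKEN_PATTERN.findall(text.lower())
        if token not in STOP_WORDS
    ]
    return sum(len(token) for token in tokens) / len(raw_document)


def factual_density(fact_count, word_count):
    """Extracted facts per word; the fact extractor is assumed to exist elsewhere."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return fact_count / word_count


# Toy inputs (hypothetical).
page = "<p>The cat sat</p>"
ratio = information_to_noise_ratio(page)
density = factual_density(fact_count=5, word_count=100)
```

Under these assumptions, a page dominated by markup and stop words scores close to 0, while plain, content-bearing text scores close to 1.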

Big Data Analytics Hub

Big data: research and practice
