Quality Vs Credibility of Online Entities
Credibility: the quality of being believable or worthy of trust.
Credibility challenges posed by the Stanford Web Credibility Research project:
- What causes people to believe (or not believe) what they find on the Web?
- What strategies do users employ in evaluating the credibility of online sources?
- What contextual and design factors influence these assessments and strategies?
- How and why are credibility evaluation processes on the Web different from those made in face-to-face human interaction, or in other offline contexts?
Quality: the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs.
- degree of excellence or fitness for use
Is quality an indicator of credibility or vice versa?
- How can we estimate the quality of UGC?
- Directly evaluate the quality.
- What are the elements of social media that can be used to facilitate automated discovery of high-quality content?
- What is the utility of links between items, quality rating from members of the community, and other non-content information to the task of estimating the quality of UGC?
- How are these different factors related?
- Is content alone enough for identifying high-quality items?
- Can community feedback approximate judgments of specialists?
- In this work, the authors used a judged question/answer collection, in which good questions usually have good answers, to train classifiers that predict good questions and good answers, obtaining an AUC (area under the precision-recall curve) of 0.76 and 0.88, respectively.
- The drawback is that the quality gap is offset by volume: the larger the volume of UGC, the more difficult the quality evaluation becomes.
- Obtaining indirect evidence of the quality.
- use UGC for a given task and then evaluate the quality of the task results.
- evaluation of the quality of semantic-relation extraction using the Open Directory Project (ODP), achieving precision of over 60%.
- crossing different UGC sources and inferring from them the quality of those sources.
- using collective knowledge (the wisdom of crowds) to extend image tags, showing that almost 70% of the tags can be semantically classified using WordNet and Wikipedia.
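The precision-recall AUC reported above can be computed with a short routine. This is a minimal sketch with toy scores and labels (not data from the paper); a real quality classifier for questions or answers would supply the scores.

```python
# Sketch: area under the precision-recall curve (PR-AUC) for a quality
# classifier, computed by step-wise summation over the ranked items.
# The scores and labels below are illustrative only.

def pr_auc(scores, labels):
    """PR-AUC: sum precision * delta-recall over items ranked by score."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(labels)
    tp = fp = 0
    area = 0.0
    prev_recall = 0.0
    for _score, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area

# toy ranking: higher score should indicate higher-quality content
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(round(pr_auc(scores, labels), 3))  # -> 0.854
```

A score of 1.0 would mean every high-quality item is ranked above every low-quality one; the 0.76/0.88 figures above sit well above a random baseline.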
The Online Entities Quality Challenge
Entities: social media platforms (Facebook, Twitter, …) or information systems, and information or content on the Internet (articles, posts, comments, …).
The advent and openness of online social media platforms often leave them highly susceptible to abuse by suspicious entities. It therefore becomes increasingly important to identify these suspicious entities automatically and mitigate or eliminate their threats.
The rapid growth of the Internet and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems.
- inability of Search Engines to wade through the vast expanse of questionable content and return “quality” results to a user’s query
“Data Quality” is described as data that is “fit for use”: data considered appropriate for one use may not possess sufficient attributes for another use.
Common Dimensions of Information or Data Quality
- Accuracy: extent to which data are correct, reliable and certified free of error
- Consistency: extent to which information is presented in the same format and compatible with previous data
- Security: extent to which access to information is restricted appropriately to maintain its security
- Timeliness: extent to which the information is sufficiently up-to-date for the task at hand
- Completeness: extent to which information is not missing and is of sufficient breadth and depth for the task at hand
- Conciseness: extent to which information is compactly represented without being overwhelming (i.e. brief in presentation, yet complete and to the point)
- Reliability: extent to which information is correct and reliable
- Accessibility: extent to which information is available, or easily and quickly retrievable
- Availability: extent to which information is physically accessible
- Objectivity: extent to which information is unbiased, unprejudiced and impartial
- Relevancy: extent to which information is applicable and helpful for the task at hand
- Usability: extent to which information is clear and easily used
- Understandability: extent to which data are clear without ambiguity and easily comprehended
- Amount of data: extent to which the quantity or volume of available data is appropriate
- Believability: extent to which information is regarded as true and credible
- Navigation: extent to which data are easily found and linked to
- Reputation: extent to which information is highly regarded in terms of source or content
- Usefulness: extent to which information is applicable and helpful for the task at hand
- Efficiency: extent to which data are able to quickly meet the information needs for the task at hand
- Value-Added: extent to which information is beneficial, provides advantages from its use
These attributes of data quality can vary depending on the context in which the data is to be used.
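As a concrete illustration, some of these dimensions can be checked programmatically. The sketch below measures completeness and timeliness over a few illustrative records; the required fields and the freshness window are assumptions that would differ per task, which is exactly the "fit-for-use" point.

```python
# Sketch: completeness and timeliness checks on records.
# REQUIRED_FIELDS and MAX_AGE_DAYS are illustrative assumptions, not
# fixed standards -- each task defines its own thresholds.
from datetime import date

REQUIRED_FIELDS = {"title", "author", "published"}
MAX_AGE_DAYS = 365  # assumed freshness window for the task at hand

def completeness(record):
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f))
    return present / len(REQUIRED_FIELDS)

def is_timely(record, today):
    """True if the record falls within the assumed freshness window."""
    published = record.get("published")
    return published is not None and (today - published).days <= MAX_AGE_DAYS

records = [
    {"title": "A", "author": "x", "published": date(2024, 1, 10)},
    {"title": "B", "author": "", "published": date(2020, 5, 1)},
]
today = date(2024, 6, 1)
print([round(completeness(r), 2) for r in records])
print([is_timely(r, today) for r in records])
```

The same pattern extends to other dimensions (e.g. consistency as format checks, accuracy against a reference source), but dimensions such as relevancy and believability resist this kind of mechanical test.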
Defining what Information Quality means in the context of Search Engines will depend greatly on whether dimensions are being identified for the producers of information, the storage and maintenance systems used for information, or for the searchers and users of information.
- For the information user, quality dimensions of interest include relevancy and usefulness. These dimensions are enormously important but extremely difficult to gauge.
Metrics for IQ in Information Retrieval
Metrics that can assess IQ and can be deployed in Search engines
- Six quality metrics: currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness.
- Factual density measure: a simple statistical quality measure based on facts extracted from Web content; evaluated on Wikipedia articles.
- Seven Wikipedia IQ metrics