Quality is defined in Wikipedia as the standard of something as measured against other things of a similar kind; the degree of excellence of something.
Today’s information and data pools on the Web focus on the quantity of information rather than its quality.
Assessing the quality of information is especially important because decisions are often based on information from multiple, sometimes unknown, sources whose reliability and accuracy are questionable.
However, the Web lacks quality-dependent filtering mechanisms, automatic identification of misuse patterns, and tools to establish user trust in information and authors.
Hence there is a need to develop mechanisms for estimating the quality of textual Web documents, and to evaluate those mechanisms for effectiveness and efficiency.
My entities of interest include Websites, news feeds, social media feeds, digital adverts, and other Web articles. Other important entities include air, water, soil, and life.
Large-scale machine learning is playing an increasingly important role in improving the quality and monetisation of Internet properties. A small number of techniques, such as regression, have proven to be widely applicable across Internet properties and applications.
Hundreds of millions of times a day, people ask Google questions, and within a fraction of a second Google needs to decide which of the billions of pages on the Web to show them, and in what order.
Users want the answer, not trillions of webpages.
Google’s stated goal is simple: to give people the most relevant answers to their queries as quickly as possible.
Websites
What are the properties that influence the overall rank, quality, and importance of a Website?
Every day, millions of useless spam pages are created. Every week, over 10 million users encounter harmful websites that deliver malware and scams. Many of these sites are compromised personal blogs or small business pages that fell victim to a weak password or outdated software. Each compromised site remains a problem that needs to be fixed.
Helping webmasters re-secure their sites
Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension
Spam sites attempt to game their way to the top of search results through techniques like repeating keywords over and over, buying links that pass PageRank or putting invisible text on the screen.
This is bad for search because relevant websites get buried, and it’s bad for legitimate website owners because their sites become harder to find.
The good news is that Google’s algorithms can detect the vast majority of spam and demote it automatically.
Google fights spam through a combination of computer algorithms and manual review.
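Keyword repetition, the first spam technique named above, can be flagged with a crude term-density check. The following is a minimal sketch and not how Google's algorithms work. The function names, the small stopword list, and the 10% threshold are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list: very common words are ignored so that
# "the" or "and" do not dominate the density calculation.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}

def keyword_stuffing_score(text: str) -> tuple[str, float]:
    """Return the most frequent non-stopword and its share of all non-stopwords."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]
    if not words:
        return "", 0.0
    term, count = Counter(words).most_common(1)[0]
    return term, count / len(words)

def looks_stuffed(text: str, threshold: float = 0.1) -> bool:
    """Flag text where a single term exceeds the (assumed) density threshold."""
    return keyword_stuffing_score(text)[1] > threshold

spam = "cheap shoes cheap shoes buy cheap shoes best cheap shoes online"
normal = ("Search engines rank pages using many signals, including links, "
          "content relevance, freshness and user engagement.")
print(looks_stuffed(spam), looks_stuffed(normal))  # True False
```

A single density signal is easy to evade, so a check like this would only ever be one feature among many in a real spam classifier.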
ECML/PKDD 2010 Discovery Challenge
High quality is not simply the opposite of Web spam (any deliberate action meant to trigger an unjustifiably favorable [ranking], considering the page’s true value); quality has various other, often subjective, aspects.
The goal of the challenge is to develop automatic site-level classifiers covering aspects such as trustworthiness, authoritativeness, and neutrality, as well as genre classification (editorial, news, commercial, educational, Web spam, and more).
The dataset is a large collection of annotated Web hosts: 23 million pages in 99,000 hosts in the .EU domain.
It comprises training labels, URLs and hyperlinks, content-based and link-based Web spam features, term frequencies, and Natural Language Processing features, all bundled in one archive: v2-all_in_one.tgz.
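A minimal sketch of one site-level classifier of the kind the challenge asks for appears below. It assumes each host has already been reduced to a numeric feature vector. The three features and the six toy hosts are hypothetical stand-ins and are not taken from v2-all_in_one.tgz. The model is plain logistic regression trained by batch gradient descent for a single binary label (spam vs. non-spam). Trustworthiness, neutrality, or genre labels would each need their own model trained the same way.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X: list[list[float]], y: list[int], lr: float = 1.0, epochs: int = 5000):
    """Fit logistic-regression weights and bias by batch gradient descent on log-loss."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this host: p(spam) - true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w: list[float], b: float, x: list[float]) -> int:
    """Return 1 (spam) if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Toy host features (hypothetical, scaled to [0, 1]):
# [top-term share, share of links to known spam hosts, share of hidden text]
X = [
    [0.05, 0.00, 0.00], [0.08, 0.05, 0.00], [0.06, 0.02, 0.01],  # normal hosts
    [0.40, 0.60, 0.30], [0.35, 0.70, 0.20], [0.50, 0.80, 0.40],  # spam hosts
]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1, 1, 1]
```

Logistic regression is used here only because it is simple and ties in with the regression techniques mentioned earlier. Any off-the-shelf classifier could replace it.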
Links
- Web Information Quality Evaluation Initiative
- Using Term Frequency Analysis to Measure Your Content Quality
- Introduction to Google Search Quality
- Finding more high-quality sites in Google search
- Hide sites to find more of what you want
- High-quality sites algorithm goes global, incorporates user feedback
- Google Quality Guidelines for Websites
- Google Search Quality Guidelines Now Reward Expertise, Authority, Trust
- Updating Our Search Quality Rating Guidelines
- Finding more mobile-friendly search results
- How Google Search Works
- Google Search Projects and Algorithms
- Fighting Spam: Google Inside Search
- Web Spam Challenge
- Discovery Challenge 2010
- People: Flavio Figueiredo, Guang-Gang Geng
Web articles
How good is Web data (Wikipedia articles, blog articles, etc.)?
Important dimensions of data quality include accuracy, completeness, freshness, and consistency. Users of Web data are most interested in the accuracy and completeness of Wikipedia articles, whereas for news articles freshness is vital in addition to accuracy and completeness.
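To make the point concrete, a document's overall quality can be sketched as a weighted average of per-dimension scores, with weights that depend on the article type. The dimension scores and both weight profiles below are hypothetical. They only encode the idea that freshness matters a lot for news and little for Wikipedia.

```python
DIMENSIONS = ("accuracy", "completeness", "freshness", "consistency")

# Hypothetical weight profiles; each profile sums to 1.0.
WEIGHTS = {
    "wikipedia": {"accuracy": 0.45, "completeness": 0.40, "freshness": 0.05, "consistency": 0.10},
    "news": {"accuracy": 0.35, "completeness": 0.20, "freshness": 0.35, "consistency": 0.10},
}

def quality_score(scores: dict[str, float], article_type: str) -> float:
    """Weighted average of per-dimension scores, each assumed to lie in [0, 1]."""
    weights = WEIGHTS[article_type]
    return sum(weights[d] * scores[d] for d in DIMENSIONS)

# An accurate and complete, but stale, article.
stale = {"accuracy": 0.9, "completeness": 0.9, "freshness": 0.1, "consistency": 0.8}
print(round(quality_score(stale, "wikipedia"), 2))  # 0.85
print(round(quality_score(stale, "news"), 2))       # 0.61
```

The same article scores noticeably lower when judged as news, because staleness is penalised more heavily there.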
Wikipedia articles
Given the daily increase in the amount of data on the Web, machine-based assessment of Information Quality (IQ) is becoming a topic of enormous interest.
The three main research lines on IQ in Wikipedia are featured-article identification, the development of quality measurement metrics, and quality-flaw detection.
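One way to bootstrap quality-flaw detection is to treat Wikipedia's cleanup templates as flaw labels. The sketch below scans raw wikitext for a few such templates. The mapping from template names to flaw labels is an illustrative assumption and far from complete. Real templates also have many aliases and redirects that a practical system would need to resolve.

```python
import re

# Illustrative mapping from lower-cased cleanup template names to flaw labels.
FLAW_TEMPLATES = {
    "citation needed": "unverified claim",
    "unreferenced": "no references",
    "more citations needed": "insufficient references",
    "refimprove": "insufficient references",
    "orphan": "orphan article",
    "original research": "original research",
    "advert": "reads like an advertisement",
}

# Captures a template name: the text after "{{" up to the first "|" or "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")

def detect_flaws(wikitext: str) -> set[str]:
    """Return the set of flaw labels whose templates appear in the wikitext."""
    flaws = set()
    for name in TEMPLATE_RE.findall(wikitext):
        label = FLAW_TEMPLATES.get(name.strip().lower())
        if label:
            flaws.add(label)
    return flaws

page = ("The town was founded in 1850.{{Citation needed|date=May 2020}}\n"
        "{{Orphan|date=January 2021}}\n{{Infobox settlement|name=Example}}")
print(sorted(detect_flaws(page)))  # ['orphan article', 'unverified claim']
```

Non-cleanup templates such as infoboxes are matched by the regex but ignored, because they have no entry in the mapping.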
Blog articles
Blogs serve multiple purposes, so there are several types of blogs that vary greatly in quality and content. There is a need for an automatic system that identifies quality blogs, to assist Web users and information specialists.
Links
- Web article quality ranking based on web community knowledge
- Probabilistic ranking of web article quality based on evolution patterns
- Ranking Wikipedia article’s data quality by learning dimension distributions
- Quality vs. Quantity: A 6-Month Analysis of the Age-Old Blogging Debate
Patents
Online Adverts (Ads)
An “ad” is the informal, shortened name for an advert: something (a short film, notice, image, etc.) presented to the public to help sell a product or make an announcement.
Mobile advertising is a rapidly growing industry that supports publishers worldwide- eMarketer tipped mobile ad spend to exceed $100 billion in 2016.
Popular ad platforms: social media, newspapers, billboards, TV, radio, mobile phones, landlines, email, and postal mail.
Adverts can be annoying: low-quality adverts can hog data and disrupt the user experience. Research Problem!
- Mobile advertising has yet to find the right balance between quality and user enjoyment- Mobile ads are going to get more annoying. “A good example of this was a notorious blunder in the children’s game Talking Tom Cat. Monetising in part through in-app purchases but also through adverts, the game hit the headlines when an advert for a payday loan company appeared in a banner at the bottom of the app.”
- Mobile advertising has struggled to raise standards in areas where TV advertising has excelled. These areas include:
- placement and format of the advert: intrusive banners and interstitials obscuring the screen and breaking up the user interface
- poorly targeted (irrelevant to the user): mobile advertisers regularly reach out to loosely defined audiences
- untrustworthy and dangerous to interact with: 34% of programmatic traffic could be fraudulent, and the practice of “click spamming” is becoming increasingly commonplace
Despite our occasional annoyance, advertising has become an integral part of media precisely because we tolerate being shown relevant adverts that cater to our interests- Why Publishers Should Care About Ad Quality.
Research on the most annoying TV ads, based on 1,600 votes, produced the following visualisation:
The most annoying adverts from the past 15 years
Annoying features: repetitive jingles, gender or nationality stereotyping, and patronising tone.
- Advertisers assume brand awareness is the key to getting consumers to buy. Research, however, suggests that advertising makes a stronger emotional and behavioural impact when consumers are paying less conscious attention to it- Dr. Haiming Hang
- If consumers are annoyed because they feel an ad does not represent them or is in poor taste, they can use the power of social media to let the brand and the world know- Dr. Natascha Radclyffe-Thomas
Existing Solutions
- Using ad-blockers (end users), e.g. Adblock Plus, to improve the mobile experience (online shopping, research, or communicating on social media). Ad-blockers:
- protect users by hiding intrusive pop-ups and banners
- force marketers to turn to higher quality, more efficient ads that can be viewed more positively by users.
- Upping ad quality (publishers): raising advertising quality is likely to improve the revenue performance and reputation of mobile advertising.
- taking inspiration from television by building better quality ad formats and placing them naturally within apps
- publishers need to work with advertising partners to enforce tighter guidelines aimed at improving advert quality, vetting adverts to ensure they meet design guidelines, and filtering out fraudsters
- it’s up to publishers to convince powerful mediation platforms to heighten creative transparency and introduce detection measures for weeding out unwanted ads- Cleaning Up Ad Quality
- publishers need to make sure they are showing relevant adverts to their users without annoying them- e.g. Google Ads options and Microsoft Ads options.
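The vetting step in the list above could be sketched as a rule check that runs on each creative before it is served. Everything in this sketch is hypothetical: the `Ad` and `AppContext` fields, the 30% screen-coverage limit, the advertiser name, and the category blocklist for children's apps. These values were chosen to mirror the Talking Tom Cat example and are not any ad network's actual policy.

```python
from dataclasses import dataclass, field

@dataclass
class Ad:
    # Hypothetical creative metadata; field names are illustrative.
    advertiser: str
    category: str            # e.g. "payday_loan", "toys"
    screen_coverage: float   # fraction of the screen the creative covers (0-1)
    is_interstitial: bool

@dataclass
class AppContext:
    audience: str                                      # e.g. "children" or "general"
    blocked_advertisers: set[str] = field(default_factory=set)

# Hypothetical guideline values.
MAX_COVERAGE = 0.3
BLOCKED_FOR_CHILDREN = {"payday_loan", "gambling", "alcohol"}

def vet_ad(ad: Ad, ctx: AppContext) -> list[str]:
    """Return the list of guideline violations (an empty list means the ad may run)."""
    violations = []
    if ad.advertiser in ctx.blocked_advertisers:
        violations.append("advertiser flagged as fraudulent")
    if ad.is_interstitial or ad.screen_coverage > MAX_COVERAGE:
        violations.append("intrusive format")
    if ctx.audience == "children" and ad.category in BLOCKED_FOR_CHILDREN:
        violations.append("category not allowed for this audience")
    return violations

kids_app = AppContext(audience="children")
payday = Ad("QuickCash", "payday_loan", screen_coverage=0.1, is_interstitial=False)
toy = Ad("ToyCo", "toys", screen_coverage=0.1, is_interstitial=False)
print(vet_ad(payday, kids_app))  # ['category not allowed for this audience']
print(vet_ad(toy, kids_app))     # []
```

The function returns every violation rather than a single boolean. This lets a publisher report to its ad partner exactly which guideline a creative broke.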
Links
- What Is Mobile Native Advertising?
- Why Publishers Should Care About Ad Quality
- Why Mobile Native Offers an Adblock Solution
- Native Meets Twitter Ads: A Match Made in Heaven
- Ad tech is killing the online experience
- Safari Content Blocker, Before, and After
- 5 Things You Need to Know about Instagram Native Ads
- How Apple’s new ad-blocker could save the media (maybe)
- The most annoying adverts from the past 15 years
- Worst Adverts
- Quality Score: Google, Yahoo and Bing on Wikipedia
- Quality Score: What Is Quality Score & How Does it Affect PPC?
- How Do You Measure The Quality Of Digital Ads?
- ad quality score
- Display Advertising Challenge: Kaggle/Criteo
- Fraud Bots Mess Up Your Big Data
- Are these Ads Safe: Detecting Hidden Attacks through the Mobile App-Web Interfaces
- Quantcast
- Dr. Augustine Fou
News and Social Media Feeds
Facebook News Feed
Can news feeds on mobile or Web platforms be shown to readers in the order they want to read them? According to Facebook, there are on average 1,500 potential stories (from friends and Pages) for people to see every time they visit News Feed, and most people don’t have enough time to see them all.
With so many stories, users would have a good chance of missing updates they want to see if News Feed were displayed as a continuous, unranked stream of information. Research Problem!
The goal of News Feed is to deliver the right content to the right people at the right time so they don’t miss the stories that are important to them- Facebook.
Existing Solutions
News Feed stories can be ordered chronologically or ranked by how users interact with them.
- Research by Facebook has shown that the number of stories people read, like, and comment on decreases when News Feed is ordered purely chronologically.
The News Feed algorithm responds to the following signals from users:
- How often the user interacts with the friends, Pages, or public figures (such as an actor or journalist) who posted
- The number of likes, shares, and comments a post receives in total, and from the user’s friends in particular
- How often the user interacted with this type of post in the past
- Whether or not the user and other users are hiding or reporting a given post
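The signals listed above can be combined into a single ranking score, which gives a ranked feed in place of a chronological one. The linear form, the log damping of engagement counts, and every weight below are illustrative assumptions and not Facebook's algorithm. The two stories are invented examples.

```python
import math
from dataclasses import dataclass

@dataclass
class Story:
    story_id: str
    posted_at: float            # Unix timestamp
    author_affinity: float      # how often the user interacts with the poster, 0-1
    total_engagement: int       # likes + shares + comments overall
    friend_engagement: int      # likes + shares + comments from the user's friends
    type_affinity: float        # past interaction with this type of post, 0-1
    hide_or_report_rate: float  # fraction of viewers hiding or reporting it, 0-1

# Hypothetical weights for each signal.
WEIGHTS = {"affinity": 3.0, "engagement": 1.0, "friends": 2.0, "type": 1.5, "negative": 5.0}

def score(s: Story) -> float:
    """Linear combination of signals; log1p damps very large engagement counts."""
    return (WEIGHTS["affinity"] * s.author_affinity
            + WEIGHTS["engagement"] * math.log1p(s.total_engagement)
            + WEIGHTS["friends"] * math.log1p(s.friend_engagement)
            + WEIGHTS["type"] * s.type_affinity
            - WEIGHTS["negative"] * s.hide_or_report_rate)

def rank_feed(stories: list[Story]) -> list[Story]:
    return sorted(stories, key=score, reverse=True)

def chronological_feed(stories: list[Story]) -> list[Story]:
    return sorted(stories, key=lambda s: s.posted_at, reverse=True)

friend_post = Story("close_friend_photo", 1000.0, 0.9, 50, 10, 0.8, 0.0)
page_post = Story("distant_page_post", 2000.0, 0.1, 5, 0, 0.2, 0.1)
print([s.story_id for s in rank_feed([page_post, friend_post])])
# ['close_friend_photo', 'distant_page_post']
print([s.story_id for s in chronological_feed([page_post, friend_post])])
# ['distant_page_post', 'close_friend_photo']
```

The older post from a close friend outranks a newer, low-engagement Page post. A purely chronological order would show them the other way round.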
Organic stories that users were not able to scroll down far enough to see can reappear near the top of News Feed if the stories are still getting lots of likes and comments.
A better way to surface older stories
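The resurfacing behaviour described above can be sketched as a rule applied when the feed is rebuilt. An organic story the user never scrolled to is moved back to the top if it has kept gathering engagement since the last visit. The `FeedItem` layout and the `min_new_engagement` threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FeedItem:
    story_id: str
    seen: bool               # did the user scroll far enough to see it?
    engagement_then: int     # likes + comments at the previous visit
    engagement_now: int      # likes + comments now
    organic: bool = True     # only organic stories (not ads) are bumped

def bump_unseen(previous_feed: list[FeedItem], fresh_stories: list[FeedItem],
                min_new_engagement: int = 20) -> list[FeedItem]:
    """Put still-popular unseen organic stories first, followed by fresh stories."""
    bumped = [
        item for item in previous_feed
        if item.organic and not item.seen
        and item.engagement_now - item.engagement_then >= min_new_engagement
    ]
    # Stories that gained the most engagement since the last visit come first.
    bumped.sort(key=lambda i: i.engagement_now - i.engagement_then, reverse=True)
    bumped_ids = {i.story_id for i in bumped}
    return bumped + [s for s in fresh_stories if s.story_id not in bumped_ids]

previous = [FeedItem("a", True, 10, 100), FeedItem("b", False, 5, 60), FeedItem("c", False, 3, 8)]
fresh = [FeedItem("d", False, 0, 0)]
print([i.story_id for i in bump_unseen(previous, fresh)])  # ['b', 'd']
```

Story "a" is not bumped because the user already saw it. Story "c" is not bumped because it gained too little engagement.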
Links
- Facebook’s News Feed
- A Window Into News Feed
- Showing More High-Quality Content
- Who Controls Your Facebook Feed
- Facebook Is Testing Topic-Based Feeds
- Facebook’s New Secret Sauce
- More Articles You Want to Spend Time Viewing
- Facebook is testing multiple news feeds on mobile
- Facebook Adverts Basics
- Facebook Learning Hub
Twitter News Feed
Links
- The Twitter Advertising Blog
- Meet the algorithm that can spot and kill Twitterbots before they ever start spamming
- Anomaly Detection on Social Data
Online Health Information
Inappropriate health information can lead people away from evidence-based healthcare. Low-quality health information on the Web can have serious consequences for public health and healthcare services.
Existing Solutions
Existing approaches to measuring information quality (IQ) include the Journal of the American Medical Association (JAMA) score and the Health On the Net (HON) criteria. Both measure IQ in terms of the presence of explicit metadata (such as authorship, ownership, and currency) or broad textual criteria such as readability, with the primary aim of assessing reliability and trustworthiness.
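The JAMA score is usually computed as the number of the four JAMA benchmarks (authorship, attribution, disclosure, and currency) that a page satisfies, which gives a value from 0 to 4. The sketch assumes the page metadata has already been extracted into a dictionary. The key names and the fields required for each benchmark are hypothetical simplifications, since in practice raters judge each benchmark by hand.

```python
# The four JAMA benchmarks, each mapped to hypothetical metadata keys that
# must all be present (and non-empty) for the benchmark to count.
JAMA_BENCHMARKS = {
    "authorship": ("authors", "affiliations", "credentials"),
    "attribution": ("references", "copyright"),
    "disclosure": ("ownership", "sponsorship"),
    "currency": ("date_posted", "date_updated"),
}

def jama_score(metadata: dict) -> tuple[int, list[str]]:
    """Return the JAMA score (0-4) and the names of the benchmarks met."""
    met = [
        name for name, keys in JAMA_BENCHMARKS.items()
        if all(metadata.get(k) for k in keys)
    ]
    return len(met), met

page = {
    "authors": ["A. Smith"], "affiliations": ["Example University"], "credentials": ["MD"],
    "references": ["ref1"], "copyright": "2020 Example Health",
    "date_posted": "2020-01-01",  # no "date_updated", so currency is not met
}
print(jama_score(page))  # (2, ['authorship', 'attribution'])
```

Because this scoring only checks for metadata, a page can score 4 while its content is inaccurate. That limitation motivates the open research direction discussed next.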
An open research problem in this area is to identify other useful dimensions of IQ through a more detailed analysis of the pages’ text content, using Natural Language Processing (NLP) techniques. Such measures might range from relatively superficial analysis of text style or sentiment to a deeper ‘understanding’ of the scientific basis of the information provided, particularly with respect to the type of interventions (therapeutic or preventative) presented to the reader- Advert.
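Readability is an example of the superficial end of that spectrum. It can be estimated with the Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), where higher scores mean easier text. The syllable counter below is a rough vowel-group heuristic rather than a dictionary lookup, so the scores are approximate.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; returns 0.0 for text with no words or sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "Wash your hands. Stay at home. Drink water."
complex_text = ("Pharmacological interventions demonstrate considerable "
                "heterogeneity regarding therapeutic effectiveness.")
print(flesch_reading_ease(simple) > flesch_reading_ease(complex_text))  # True
```

A score like this says nothing about whether the health advice is correct. It captures only one stylistic dimension that could feed into a broader IQ model.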
Links