Commons:Structured data/Media search

Background

The Structured Data team has built a new user interface for search on Commons that aims to improve search in the following ways:

  • An image-focused user interface that makes it easier to find what you're looking for and discover new things.
  • An improved set of search results that utilizes structured data and is more language-agnostic.
  • An autocomplete for search terms that leverages Wikidata's vast repository of labels & aliases.

This new search runs queries that use the haswbstatement and incaption structured data keywords, expands search terms based on Wikidata labels, and expands them further based on categories and text-based information. It displays images in a user-friendly grid view and differentiates between image, audio, and video results. Media Search is capable of supporting any language served by MediaWiki.
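As a rough illustration of the two structured data keywords named above, the sketch below builds a search request for the standard MediaWiki search API that combines a caption match with a "depicts" statement match. It is not the query that MediaSearch builds internally (that query also expands and weights terms); the function names and the example term and item are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint for Commons.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_search_params(term, depicts_qid=None):
    """Combine a caption match with an optional 'depicts' statement match.

    Hypothetical helper: it only assembles request parameters and does
    not reproduce MediaSearch's own term expansion or ranking.
    """
    clauses = [f"incaption:{term}"]
    if depicts_qid:
        # P180 is the Wikidata property "depicts".
        clauses.append(f"haswbstatement:P180={depicts_qid}")
    return {
        "action": "query",
        "list": "search",
        "srsearch": " ".join(clauses),  # space-separated clauses must all match
        "srnamespace": 6,               # File namespace
        "format": "json",
    }


def build_search_url(term, depicts_qid=None):
    """Return a full request URL for the parameters above."""
    return f"{API_ENDPOINT}?{urlencode(build_search_params(term, depicts_qid))}"
```

Calling `build_search_url("cat", "Q146")` would produce a request for files whose caption contains "cat" and that carry a "depicts: house cat" statement.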

A basic overview of how ranking works in MediaSearch is available at Help:CirrusSearch/MediaSearch.

You can use it at Special:MediaSearch.

A/B test results of MediaSearch from September 2020 are available here.

Latest updates (April 2021)

  • Special:MediaSearch is now the default search landing page for anonymous and logged-out users. The change for logged-in users will take place in May 2021. Anonymous and logged-out users will still have access to Special:Search via a link in the Media Search interface. Once the change reaches logged-in users, contributors will have both the link in the interface and a preference to keep Special:Search as their default landing page.
  • Users now have the ability to filter by namespace in the Categories and Pages tab (T273073)

Previous updates

February 2021

  • The MediaSearch backend now powers image search in VisualEditor. Contributors who use VisualEditor to find Commons images for articles now see results that are more comprehensive, accurate, and language-agnostic.
  • The "did you mean" feature is now available (T260292)
  • The total number of search results is now visible (T269383)
Still to release
  • Ability to filter by namespace in the Categories and Pages tab (T273073)
  • Display of namespace and snippet information in the Categories and Pages tab (T271174)
  • Support for phrase search (T268236)
  • Expansion of search terms using Wikidata aliases and labels (T258053)

Licensing filters (December 2020)

Licensing filters are available. These filters help users find files based on reuse and attribution permissions:

  • All licenses
  • Use with attribution
  • Use with attribution and same license
  • No restrictions
  • Other

New features are available (September 2020)

Two of the new features from the July update, Filters and Quick View, are now live in Special:MediaSearch as a result of the prototyping. The third feature, Concept Chips, will be available in early October.

Media Search also now has a tab available for "Categories and Pages" where users can find text-based search results from namespaces beyond files.

Media Search works in any language supported by MediaWiki and does not require knowledge of English.

Quick View
 

Quick View allows users to see more information about the media file without leaving the search page. File details include author, uploader, creation date, file type, and licensing, among others.

Filters
 

Filters can help sort through media based on some desired specifications. This release includes some basic filters. A more advanced filter, licensing, will be released in the near future.

  • File size (small, medium, large)
  • File type (.tiff, .png, .gif, .jpeg, .webp, .xcf, .svg)
  • Sort (relevance, recency, popularity)
Feedback for the new features

The structured data development team would like to know what you think about Special:MediaSearch with these new features. Are there potential workflow areas that were missed? Is anything particularly unclear in navigating or using the search? Are there other feature ideas that might be useful? Please visit the talk page to leave your comments.

Known issues being fixed
Still to release
  • Licensing filter (T257938)
  • Concept Chips (T256431)
  • Other tab (for listing .stl, .djvu, and .pdf file types) (T257699)
  • Link generation and embed code (T261699)
Still to document
  • How search is ranking and prioritizing results. This is still changing regularly and the final results will be published.

New features coming this month (8 September 2020)

The features mentioned below should be deployed here on Commons later this month. This space will be updated with more information as the release becomes more firm.

"Other" and "Category and Pages" tabs (24 July 2020)

Other
 

Currently in the media search prototype, we aren’t surfacing any results for file types other than images, audio files, and video files. We’d like to consider adding a tab named “Other” to surface the remaining file types (pdf, djvu, and stl files).

Phabricator task for "Other" tab: T257699
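The tab split described above can be sketched as a lookup from file extension to tab. This is only an illustration: the "Other" extensions come from the text, while the image, audio, and video extension sets and the function name are assumptions, not the list MediaSearch actually uses.

```python
import os

# "Other" extensions are taken from the text above.
OTHER_EXTENSIONS = {"pdf", "djvu", "stl"}

# Assumed, non-exhaustive extension sets for the existing tabs.
IMAGE_EXTENSIONS = {"tiff", "tif", "png", "gif", "jpeg", "jpg", "webp", "xcf", "svg"}
AUDIO_EXTENSIONS = {"oga", "flac", "wav", "mp3", "opus", "mid"}
VIDEO_EXTENSIONS = {"ogv", "webm", "mpg"}


def tab_for(filename):
    """Return the results tab a file would appear under, or None if unknown."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    if extension in IMAGE_EXTENSIONS:
        return "Images"
    if extension in AUDIO_EXTENSIONS:
        return "Audio"
    if extension in VIDEO_EXTENSIONS:
        return "Video"
    if extension in OTHER_EXTENSIONS:
        return "Other"
    return None
```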

Categories and Pages
 

There also isn't a location for pages and other text-based search results to appear. We want to expand the Categories tab into a Categories and Pages tab that includes any other text-based page results, including: Category pages, Commons pages, Help pages, Creator pages, Institution pages, and Talk pages (Talk, User talk, Commons talk, File talk, MediaWiki talk, Template talk, Help talk, Category talk, Creator talk, TimedText talk, Sequence talk, Institution talk, Campaign talk, Data talk, GWToolset talk, Module talk, Translation talk, Gadget talk and Gadget definition talk).

Phabricator task for "Categories and Pages" tab: T257700

Vue.js and the next three weeks or so (13 July 2020)

MediaSearch is being ported to a new software library, Vue.js. During the next few weeks, while the port takes place, three features will briefly be removed from MediaSearch. They will be restored when the port is complete:

  • Autocomplete: this will return as a stand-alone feature.
  • Audio/video playback: this will return as part of the Quick View feature.
  • Filters: these will return as part of the new Filters feature.

First prototype feedback results (2 July 2020)

After reviewing comments and conducting user research, the Structured Data team is exploring a handful of changes to improve your experience. We are still exploring the feasibility of these improvements, but hope to start implementing updates as we continue to receive more feedback from the community.

Filters

In the prototype the only filter provided was “Media Size.” We would like to expand upon that by adding more ways to filter, to help users narrow down their search as needed:

License: We hope to help both new and experienced users filter results by showing the following license types:

  • Use with attribution (this is everything with the CC BY license)
  • Use with attribution and same license (this is everything with the CC BY-SA license)
  • No restrictions (this is everything that’s either CC0 or in the public domain)
  • Other (everything else)
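The four buckets above can be sketched as a small classifier. This is a minimal sketch assuming licenses arrive as short names such as "CC BY-SA 4.0"; the function name and the name-parsing rules are illustrative assumptions, not how MediaSearch reads license data.

```python
def license_bucket(license_name):
    """Map a short license name onto one of the four filter buckets."""
    # Normalize "CC-BY-SA 4.0" and "CC BY-SA 4.0" to the same tokens.
    tokens = license_name.lower().replace("-", " ").split()

    if tokens[:1] == ["cc0"] or tokens[:2] == ["public", "domain"]:
        return "No restrictions"

    if tokens[:2] == ["cc", "by"]:
        # Ignore version numbers such as "4.0" when comparing.
        modifiers = [t for t in tokens[2:] if not t[0].isdigit()]
        if modifiers == []:
            return "Use with attribution"
        if modifiers == ["sa"]:
            return "Use with attribution and same license"

    # Anything not recognized above falls into the catch-all bucket.
    return "Other"
```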

File type: Based on the type of media you are searching for, we can help narrow down your search by showing the file types available for that media type:

  • For example, you might see file extensions like .svg, .jpg, .png, etc. when searching for images

Sort by: This function will help users sort results by:

  • Relevancy (The current default)
  • Most viewed
  • Most recent
  • Filter type

Phabricator task for filters: T256160.

Quick View

It became very apparent that having to load an entirely new page to see more information was slowing users down and making the search experience tedious.

We considered the obvious solution of showing more information below each file, or even on hover, but we discovered issues with performance and with keeping the UI consistent when metadata varies in quality and availability. We are thinking of trying an intermediate step that we are calling the “Quick View,” which would allow you to see more information about an image without leaving the page. It will also show a larger thumbnail of the file before you load the file in a new page. This solution keeps page load times efficient because we retrieve complete information for only one file at a time.
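The load-on-demand idea can be sketched as follows: the results grid holds only titles, and full details are requested for a single file when it is opened. The class, the `fetch_details` callable, and the cache are all hypothetical choices for this sketch (the cache in particular is an added assumption), not a description of the actual implementation.

```python
class QuickViewLoader:
    """Fetch full file details only when a result is opened."""

    def __init__(self, fetch_details):
        # fetch_details is a hypothetical callable: title -> dict of details.
        self._fetch_details = fetch_details
        self._cache = {}        # assumption: reuse details already fetched
        self.fetch_count = 0    # number of detail requests actually made

    def open(self, title):
        """Return details for one file, fetching them on first use."""
        if title not in self._cache:
            self._cache[title] = self._fetch_details(title)
            self.fetch_count += 1
        return self._cache[title]


def fake_fetch(title):
    """Stand-in for a real detail request, used here for illustration."""
    return {"title": title, "author": "unknown"}
```

With this shape, showing a grid of results costs no detail requests at all, and opening the same file twice costs one.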

Phabricator task for Quick View: T256158

Concept chips

 
Concept chips are at the top of the search results.

To improve search, the team is exploring a feature called "concept chips." Concept chips are a group of related, more specific queries that aid in discovering additional media. A challenge with search that relies on wikitext and Wikidata is that a lot of relational data is incomplete or not yet inter-linked. Concept chips may help with this challenge by using Wikidata relationships to surface related media.

Some things to be considered: Concept chips may not have images or explanatory text. Concept chips may be limited to a specific set of search term types.

Phabricator task for concept chips: T256431

Next steps

The team plans on building these into the prototype for the next round of testing and feedback. We'll let the community know when the next version is ready with these changes.

Previous updates

May 25, 2020
  • Integrating the prototype with CirrusSearch
  • This will hopefully come with improved performance and integrated basic search syntax
  • Improving Wikidata autocomplete suggestions
  • Design research is ongoing with the current prototype. Based on the outcome, we will plan future changes to the UI and functionality.

Feedback

User feedback is on the talk page. There's also been some other feedback-gathering done: