1
00:00:00,000 --> 00:00:02,651
2
00:00:02,651 --> 00:00:03,900
ASAF BARTOV: Testing, testing.
3
00:00:03,900 --> 00:00:10,036
4
00:00:10,036 --> 00:00:12,640
Can you hear me in the room?
5
00:00:12,640 --> 00:00:15,190
6
00:00:15,190 --> 00:00:15,690
Testing.
7
00:00:15,690 --> 00:00:22,620
8
00:00:22,620 --> 00:00:24,930
Hello, everyone.
9
00:00:24,930 --> 00:00:29,460
This is a gentle
introduction to Wikidata
10
00:00:29,460 --> 00:00:31,922
for beginners.
11
00:00:31,922 --> 00:00:34,130
If you're an absolute
beginner, if you've never heard
12
00:00:34,130 --> 00:00:38,210
of Wikidata, or if you've heard
of Wikidata but don't quite get
13
00:00:38,210 --> 00:00:41,360
it, don't know what it's
good for, have only used it
14
00:00:41,360 --> 00:00:43,880
for inter-wiki links--
15
00:00:43,880 --> 00:00:46,247
if you're anywhere
on this range,
16
00:00:46,247 --> 00:00:47,330
you're in the right place.
17
00:00:47,330 --> 00:00:50,990
18
00:00:50,990 --> 00:00:52,040
My name is Asaf Bartov.
19
00:00:52,040 --> 00:00:54,590
I work for the
Wikimedia Foundation,
20
00:00:54,590 --> 00:00:59,790
and I am a Wikidata enthusiast.
21
00:00:59,790 --> 00:01:05,620
So the first thing I want to
say is that you are lucky.
22
00:01:05,620 --> 00:01:10,540
You are lucky because
Wikidata is already
23
00:01:10,540 --> 00:01:15,415
and is quickly becoming even
more of an important research
24
00:01:15,415 --> 00:01:21,730
tool for anyone who's
trying to ask questions
25
00:01:21,730 --> 00:01:25,030
about large amounts
of information.
26
00:01:25,030 --> 00:01:29,770
It will become more and more
used across the humanities,
27
00:01:29,770 --> 00:01:33,460
in particular, because of the
things that it's able to do,
28
00:01:33,460 --> 00:01:37,090
some of which we will
demonstrate shortly.
29
00:01:37,090 --> 00:01:40,750
And you are lucky because you
get to find out about it now
30
00:01:40,750 --> 00:01:43,400
before most of the world.
31
00:01:43,400 --> 00:01:49,120
So by the end of this talk,
you will be a Wikidata hipster
32
00:01:49,120 --> 00:01:51,250
because you'll be
able to say, oh yeah.
33
00:01:51,250 --> 00:01:53,470
I knew about Wikidata
before it was cool.
34
00:01:53,470 --> 00:01:56,090
35
00:01:56,090 --> 00:02:00,370
So before we actually
visit Wikidata,
36
00:02:00,370 --> 00:02:08,620
I want to share two key problems
that Wikidata seeks to solve
37
00:02:08,620 --> 00:02:12,940
and which would help us
understand why it exists.
38
00:02:12,940 --> 00:02:17,640
The first problem is that
of dated data, that
39
00:02:17,640 --> 00:02:20,880
is data that is out of date.
40
00:02:20,880 --> 00:02:23,960
And this is apparent
on Wikipedia
41
00:02:23,960 --> 00:02:27,870
across our free
knowledge encyclopedias.
42
00:02:27,870 --> 00:02:32,160
Data on Wikipedia is
not always up to date.
43
00:02:32,160 --> 00:02:37,470
And the more obscure
it is, the more likely
44
00:02:37,470 --> 00:02:40,280
it is not to be up to date.
45
00:02:40,280 --> 00:02:49,360
So the Polish Wikipedia may have
an article about a small town
46
00:02:49,360 --> 00:02:55,480
in Argentina, and that article
will include information
47
00:02:55,480 --> 00:03:00,910
about that town like population
size, name of the mayor.
48
00:03:00,910 --> 00:03:04,580
And that information,
ideally, was
49
00:03:04,580 --> 00:03:08,540
correct at the time the article
was created on the Polish
50
00:03:08,540 --> 00:03:10,370
Wikipedia--
51
00:03:10,370 --> 00:03:13,760
maybe translated
from another wiki.
52
00:03:13,760 --> 00:03:17,900
But then how likely is
it to be kept up to date?
53
00:03:17,900 --> 00:03:20,960
How likely is it that the
Polish Wikipedia would give us
54
00:03:20,960 --> 00:03:25,880
the correct and latest numbers
or data about the population
55
00:03:25,880 --> 00:03:28,370
size of that town
or the mayor, right?
56
00:03:28,370 --> 00:03:31,720
So this is the kind of data
that does go out of date, right?
57
00:03:31,720 --> 00:03:34,250
Every few years--
five, 10 years--
58
00:03:34,250 --> 00:03:37,850
there is a census, and now there
are new population figures.
59
00:03:37,850 --> 00:03:42,440
Now the census in Argentina will
be made available in Argentina
60
00:03:42,440 --> 00:03:45,500
in Spanish, probably,
which brings us
61
00:03:45,500 --> 00:03:48,710
to another component of the
problem of dated data, which
62
00:03:48,710 --> 00:03:53,810
is there are no obvious
triggers for updating the data.
63
00:03:53,810 --> 00:03:58,520
So the Polish Wikipedian
is not sent an email
64
00:03:58,520 --> 00:04:00,680
by the Argentinean
government saying, hey,
65
00:04:00,680 --> 00:04:01,820
we have a new census.
66
00:04:01,820 --> 00:04:05,420
There are new population numbers
for you to update on Wikipedia.
67
00:04:05,420 --> 00:04:07,550
No such email is sent.
68
00:04:07,550 --> 00:04:10,146
So it's kind of
hard to notice when.
69
00:04:10,146 --> 00:04:12,770
And of course, multiply that by
all the different jurisdictions
70
00:04:12,770 --> 00:04:14,670
around the world.
71
00:04:14,670 --> 00:04:16,610
There's no easy
way to notice when
72
00:04:16,610 --> 00:04:17,790
your data goes out of date.
73
00:04:17,790 --> 00:04:20,620
74
00:04:20,620 --> 00:04:24,070
So that's difficult
to keep up to date.
75
00:04:24,070 --> 00:04:27,940
And even if we were to receive
some kind of indication--
76
00:04:27,940 --> 00:04:31,310
oh, there's a new
census in Argentina,
77
00:04:31,310 --> 00:04:33,100
so a whole bunch of
population figures
78
00:04:33,100 --> 00:04:34,960
have now gone out of date.
79
00:04:34,960 --> 00:04:37,240
Updating it on the
Polish Wikipedia
80
00:04:37,240 --> 00:04:40,090
and the French Wikipedia
and the Indonesian Wikipedia
81
00:04:40,090 --> 00:04:44,920
and the Arabic Wikipedia is a
whole bunch of repetitive work
82
00:04:44,920 --> 00:04:46,540
that a lot of
different volunteers
83
00:04:46,540 --> 00:04:49,900
will need to do just for
that one updated piece
84
00:04:49,900 --> 00:04:54,810
of information about Argentina.
85
00:04:54,810 --> 00:04:57,720
So I hope this is
clear and resonates
86
00:04:57,720 --> 00:05:01,920
with some of your experience
editing Wikipedia--
87
00:05:01,920 --> 00:05:04,170
data that is out of
date or that needs
88
00:05:04,170 --> 00:05:08,640
to be updated
manually, menially,
89
00:05:08,640 --> 00:05:16,190
on a fairly frequent schedule
across the different countries
90
00:05:16,190 --> 00:05:18,410
and data sources.
91
00:05:18,410 --> 00:05:22,340
The other-- and I think
maybe more interesting--
92
00:05:22,340 --> 00:05:26,210
shortcoming or problem
that I want to discuss
93
00:05:26,210 --> 00:05:30,260
is what I call the
inflexible ways
94
00:05:30,260 --> 00:05:36,020
of lateral queries, crosscutting
queries of knowledge.
95
00:05:36,020 --> 00:05:43,980
So if I want an answer to
the question, what countries
96
00:05:43,980 --> 00:05:48,740
in the world export rubber--
97
00:05:48,740 --> 00:05:52,300
98
00:05:52,300 --> 00:05:54,790
that's a reasonable
question, right?
99
00:05:54,790 --> 00:05:57,460
That information
is on Wikipedia.
100
00:05:57,460 --> 00:05:58,630
Do you agree?
101
00:05:58,630 --> 00:06:00,640
If you go to
Wikipedia and read up
102
00:06:00,640 --> 00:06:05,560
about Brazil, about Peru, about
Germany, somewhere in there--
103
00:06:05,560 --> 00:06:09,010
maybe a sub-article called
Economics of Brazil--
104
00:06:09,010 --> 00:06:13,600
you will find the main
exports of that country.
105
00:06:13,600 --> 00:06:15,400
And you can find
out whether or not
106
00:06:15,400 --> 00:06:16,930
that country exports rubber.
107
00:06:16,930 --> 00:06:19,994
But what if I don't want
to go country by country
108
00:06:19,994 --> 00:06:21,160
looking for the word rubber?
109
00:06:21,160 --> 00:06:22,090
I just want an answer.
110
00:06:22,090 --> 00:06:25,540
What are the countries
that export rubber?
111
00:06:25,540 --> 00:06:28,360
Even though that
information is in Wikipedia,
112
00:06:28,360 --> 00:06:29,680
it's hard to get at.
113
00:06:29,680 --> 00:06:31,680
It's hard to query.
114
00:06:31,680 --> 00:06:35,770
Now, you may say, well, that's
what we have categories for,
115
00:06:35,770 --> 00:06:36,270
right?
116
00:06:36,270 --> 00:06:39,820
Categories are a way to
cut across Wikipedia.
117
00:06:39,820 --> 00:06:45,110
So if someone made a
category called rubber
118
00:06:45,110 --> 00:06:48,380
exporting countries, then
you can go to that category
119
00:06:48,380 --> 00:06:51,560
and see a list of countries
that export rubber.
120
00:06:51,560 --> 00:06:53,390
And if nobody has
made it yet, well, you
121
00:06:53,390 --> 00:06:56,990
can create that category and,
with a kind of one-time effort,
122
00:06:56,990 --> 00:06:59,730
populate that category,
and you're done.
123
00:06:59,730 --> 00:07:01,970
Well, yes.
124
00:07:01,970 --> 00:07:04,250
That's still not
very convenient.
125
00:07:04,250 --> 00:07:06,980
But also, it's still
very, very limited,
126
00:07:06,980 --> 00:07:12,380
because what if I only want
countries that export rubber
127
00:07:12,380 --> 00:07:15,950
and have a democratic
system of government,
128
00:07:15,950 --> 00:07:18,770
or any other kind of
additional condition
129
00:07:18,770 --> 00:07:20,510
that I would like
to add to this?
130
00:07:20,510 --> 00:07:22,230
Or take a completely
different example.
131
00:07:22,230 --> 00:07:26,750
What if I want to know
which Flemish town had
132
00:07:26,750 --> 00:07:31,510
the most painters born in it?
133
00:07:31,510 --> 00:07:34,480
There's a ton of
Flemish painters.
134
00:07:34,480 --> 00:07:37,870
Most of them were
born somewhere.
135
00:07:37,870 --> 00:07:39,685
We could theoretically,
just you know,
136
00:07:39,685 --> 00:07:43,900
look up all the birthplaces
of all the Flemish painters
137
00:07:43,900 --> 00:07:46,900
and tally up the
numbers and figure out
138
00:07:46,900 --> 00:07:51,610
what is the place where the
most Flemish painters come from?
139
00:07:51,610 --> 00:07:53,050
I don't know the answer to that.
140
00:07:53,050 --> 00:07:55,420
It would be nice to be
able to get that answer.
141
00:07:55,420 --> 00:07:57,610
Again, the data is in Wikipedia.
142
00:07:57,610 --> 00:08:00,400
Those birthplaces are
listed in the articles
143
00:08:00,400 --> 00:08:01,636
about those painters.
144
00:08:01,636 --> 00:08:05,710
But there's no easy way
to get that information.
145
00:08:05,710 --> 00:08:13,420
What if I want to ask, who are
some painters whose father was
146
00:08:13,420 --> 00:08:14,245
also a painter?
147
00:08:14,245 --> 00:08:16,840
148
00:08:16,840 --> 00:08:18,500
That's a thing
that exists, right?
149
00:08:18,500 --> 00:08:22,630
Some painters are
sons of painters.
150
00:08:22,630 --> 00:08:26,560
You know, Bruegel comes to
mind as an obvious example.
151
00:08:26,560 --> 00:08:28,240
But there's a bunch
of others, right?
152
00:08:28,240 --> 00:08:29,380
So who are those people?
153
00:08:29,380 --> 00:08:30,930
What if I want to
ask that question?
154
00:08:30,930 --> 00:08:33,400
That's the kind of question
that not only Wikipedia
155
00:08:33,400 --> 00:08:34,600
doesn't answer today.
156
00:08:34,600 --> 00:08:41,500
If you walk to your friendly
university library reference
157
00:08:41,500 --> 00:08:45,010
desk and say,
hello, I would like
158
00:08:45,010 --> 00:08:49,290
a list of painters whose
father was also a painter,
159
00:08:49,290 --> 00:08:52,820
how would that
librarian help you?
160
00:08:52,820 --> 00:08:57,960
There's no easy way to get an
answer to a question like that.
161
00:08:57,960 --> 00:09:01,100
What if you only want
a list of painters
162
00:09:01,100 --> 00:09:05,870
who were immigrants, painters
who lived somewhere else
163
00:09:05,870 --> 00:09:08,240
than where they were born?
164
00:09:08,240 --> 00:09:09,770
There's no book.
165
00:09:09,770 --> 00:09:11,720
I guess maybe there
is, but you know,
166
00:09:11,720 --> 00:09:15,590
it's not obvious that there's a
ready resource that says, list
167
00:09:15,590 --> 00:09:17,840
of painters who are immigrants.
168
00:09:17,840 --> 00:09:19,910
And the librarian would
probably refer you
169
00:09:19,910 --> 00:09:22,760
to a book on the shelf
called, I don't know,
170
00:09:22,760 --> 00:09:24,200
The Complete
Dictionary of Flemish
171
00:09:24,200 --> 00:09:26,300
Painters and go,
look up the index,
172
00:09:26,300 --> 00:09:28,520
you know, and if you
see a similar surname,
173
00:09:28,520 --> 00:09:29,910
maybe they're father and son.
174
00:09:29,910 --> 00:09:35,000
And kind of cobble together
the answer on your own.
175
00:09:35,000 --> 00:09:37,100
The reason I'm comparing
this to a library
176
00:09:37,100 --> 00:09:42,170
is to show you that this is a
kind of question that is not
177
00:09:42,170 --> 00:09:46,760
readily satisfiable today.
178
00:09:46,760 --> 00:09:50,240
Now, these questions may
sound contrived to you.
179
00:09:50,240 --> 00:09:52,460
You may say to
yourself, well, you
180
00:09:52,460 --> 00:09:54,860
know, painters who are also
sons of painters, yeah.
181
00:09:54,860 --> 00:09:57,680
You know, that
never occurred to me
182
00:09:57,680 --> 00:09:59,610
as a question I
might care about.
183
00:09:59,610 --> 00:10:01,850
But I want to invite
you to consider
184
00:10:01,850 --> 00:10:06,380
that this kind of question,
questions like that question,
185
00:10:06,380 --> 00:10:09,260
may well be questions
you do care about.
186
00:10:09,260 --> 00:10:12,740
And I also want to suggest
that the fact it is so nearly
187
00:10:12,740 --> 00:10:16,250
impossible, the fact that
there's no obvious way
188
00:10:16,250 --> 00:10:19,250
to ask that kind
of question today,
189
00:10:19,250 --> 00:10:21,200
is partly responsible
for your not
190
00:10:21,200 --> 00:10:22,970
coming up with those
questions, right?
191
00:10:22,970 --> 00:10:25,850
We tend to be limited
by the possible.
192
00:10:25,850 --> 00:10:30,080
You know, until human
flight was made possible,
193
00:10:30,080 --> 00:10:32,840
it did not occur to anyone
to say, oh yeah, by this time
194
00:10:32,840 --> 00:10:34,430
next week I will
be in Australia,
195
00:10:34,430 --> 00:10:36,630
because that was
just impossible.
196
00:10:36,630 --> 00:10:38,587
But when flight is
possible, there's
197
00:10:38,587 --> 00:10:40,670
all kinds of things that
suddenly become possible,
198
00:10:40,670 --> 00:10:42,740
and there's all
kinds of needs that
199
00:10:42,740 --> 00:10:46,430
arise based on the
availability of resources
200
00:10:46,430 --> 00:10:48,600
to fulfill those needs.
201
00:10:48,600 --> 00:10:54,120
So many of these research
questions, compound lateral
202
00:10:54,120 --> 00:10:58,520
cross-cutting queries, are not
being asked because people have
203
00:10:58,520 --> 00:11:00,410
internalized the fact
that there is no way
204
00:11:00,410 --> 00:11:05,750
to get an answer
to questions like,
205
00:11:05,750 --> 00:11:13,270
what is the most popular first
name among British politicians?
206
00:11:13,270 --> 00:11:14,520
I just made that up, you know?
207
00:11:14,520 --> 00:11:15,340
Is it John?
208
00:11:15,340 --> 00:11:16,510
Maybe.
209
00:11:16,510 --> 00:11:19,030
Maybe it's William,
for whatever reason.
210
00:11:19,030 --> 00:11:22,030
You know, these are the kinds
of questions we don't routinely
211
00:11:22,030 --> 00:11:25,855
ask because we know that it's
like, who are you going to ask?
212
00:11:25,855 --> 00:11:28,330
How are you going to
get an answer to that?
213
00:11:28,330 --> 00:11:36,040
So this problem of not having
very flexible ways of querying
214
00:11:36,040 --> 00:11:38,220
the data that we already have--
215
00:11:38,220 --> 00:11:41,230
in Wikipedia, in
Wikisource, elsewhere--
216
00:11:41,230 --> 00:11:45,060
is a significant limitation.
217
00:11:45,060 --> 00:11:50,880
So these two key problems
have one solution.
218
00:11:50,880 --> 00:11:55,500
And that is an editable,
central storage
219
00:11:55,500 --> 00:12:00,510
for structured and
linked data on a wiki,
220
00:12:00,510 --> 00:12:05,160
under a free license, which
is a very long way of saying
221
00:12:05,160 --> 00:12:07,290
Wikidata.
222
00:12:07,290 --> 00:12:08,470
That is Wikidata.
223
00:12:08,470 --> 00:12:11,190
Wikidata is an editable,
central storage
224
00:12:11,190 --> 00:12:15,840
for structured and
linked data on a wiki,
225
00:12:15,840 --> 00:12:17,700
under a free license.
226
00:12:17,700 --> 00:12:22,590
So let's take this
apart and unpack it.
227
00:12:22,590 --> 00:12:24,820
First of all, it's
a central storage.
228
00:12:24,820 --> 00:12:27,660
This relates to the
first problem, right?
229
00:12:27,660 --> 00:12:34,370
If we had one place containing
data like population size,
230
00:12:34,370 --> 00:12:38,270
we would be able to update
that one place and then have
231
00:12:38,270 --> 00:12:42,260
all of the different Wikipedias
draw the data from that one
232
00:12:42,260 --> 00:12:45,320
place so that we wouldn't
have to manually,
233
00:12:45,320 --> 00:12:49,980
repetitively update it across
our hundreds of projects.
234
00:12:49,980 --> 00:12:53,690
So having central storage
makes, I hope, kind
235
00:12:53,690 --> 00:12:57,230
of immediate, intuitive sense.
236
00:12:57,230 --> 00:13:02,840
But what do I mean by
structured and linked data?
237
00:13:02,840 --> 00:13:10,120
So structured data means
that each datum, each piece--
238
00:13:10,120 --> 00:13:15,880
individual piece-- of data
is managed on its own,
239
00:13:15,880 --> 00:13:19,660
is identified and
defined on its own,
240
00:13:19,660 --> 00:13:21,040
as distinct from Wikipedia.
241
00:13:21,040 --> 00:13:22,990
Wikipedia has articles.
242
00:13:22,990 --> 00:13:27,190
The article about Brazil
includes a ton of data,
243
00:13:27,190 --> 00:13:31,570
all kinds of information,
and it's presented as text,
244
00:13:31,570 --> 00:13:34,270
as several paragraphs--
several pages--
245
00:13:34,270 --> 00:13:36,540
of text, right?
246
00:13:36,540 --> 00:13:41,460
Now, we do have an
approximation of structured data
247
00:13:41,460 --> 00:13:43,580
on Wikipedia.
248
00:13:43,580 --> 00:13:45,300
If you've browsed
Wikipedia a little,
249
00:13:45,300 --> 00:13:49,100
you've noticed that we often
have an info box, what we
250
00:13:49,100 --> 00:13:50,750
call an info box on Wikipedia.
251
00:13:50,750 --> 00:13:55,220
That's the table on the right
side if it's a left to right
252
00:13:55,220 --> 00:13:57,200
language, the table
on the right side
253
00:13:57,200 --> 00:14:02,270
that has information that
is easy to tabulate, right?
254
00:14:02,270 --> 00:14:08,210
So you know, birth date, birth
place, death date, death place,
255
00:14:08,210 --> 00:14:09,710
nationality--
256
00:14:09,710 --> 00:14:16,670
or if it's about a country,
area, population, anthem,
257
00:14:16,670 --> 00:14:20,090
type of government, whatever
you are likely to find.
258
00:14:20,090 --> 00:14:23,150
If it's a movie, then
you know, starring,
259
00:14:23,150 --> 00:14:27,350
genre, box office receipts,
whatever pieces of data
260
00:14:27,350 --> 00:14:29,900
are relevant to an
article about a movie.
261
00:14:29,900 --> 00:14:34,940
So we do already kind of
group pieces of information
262
00:14:34,940 --> 00:14:40,160
on Wikipedia into this
kind of structured format.
263
00:14:40,160 --> 00:14:43,630
Those of you who have
ever looked at the source,
264
00:14:43,630 --> 00:14:45,970
at what the wiki code
under that looks like,
265
00:14:45,970 --> 00:14:49,640
know that it's only
semi-structured.
266
00:14:49,640 --> 00:14:52,370
It looks neat and
organized in a table,
267
00:14:52,370 --> 00:14:55,660
but really, it's just a bunch
of text that is put there.
268
00:14:55,660 --> 00:14:57,140
It is not centralized.
269
00:14:57,140 --> 00:15:00,100
Every Wikipedia has its
own copy of that data.
270
00:15:00,100 --> 00:15:02,930
And if I go and update
the population size
271
00:15:02,930 --> 00:15:07,070
on Spanish Wikipedia of
that Argentinean town,
272
00:15:07,070 --> 00:15:10,190
it does not get
updated automagically
273
00:15:10,190 --> 00:15:13,520
on the English Wikipedia or
the Arabic Wikipedia, right?
274
00:15:13,520 --> 00:15:17,150
So the structured data that
we already have on Wikipedia
275
00:15:17,150 --> 00:15:20,939
is not managed centrally.
276
00:15:20,939 --> 00:15:22,480
The other thing
about structured data
277
00:15:22,480 --> 00:15:29,250
is, when you have a notion of an
individual piece of data, that
278
00:15:29,250 --> 00:15:33,390
is the cornerstone of
allowing the kinds of queries
279
00:15:33,390 --> 00:15:34,770
that I was talking about.
280
00:15:34,770 --> 00:15:40,440
That is what will allow
me to ask questions like,
281
00:15:40,440 --> 00:15:43,470
what is the Flemish town where
the most painters were born,
282
00:15:43,470 --> 00:15:46,650
or what are the world's
largest cities that
283
00:15:46,650 --> 00:15:49,730
have a female mayor?
284
00:15:49,730 --> 00:15:52,430
I could come up with other
examples all day long, right?
285
00:15:52,430 --> 00:15:55,280
These are all questions
that you can ask,
286
00:15:55,280 --> 00:15:59,390
once you break down your data
into individual pieces, each
287
00:15:59,390 --> 00:16:02,300
of which is--
288
00:16:02,300 --> 00:16:06,950
you're able to refer to each
of those programmatically.
289
00:16:06,950 --> 00:16:10,430
The computer can
identify, isolate,
290
00:16:10,430 --> 00:16:14,700
and calculate based on each
of those pieces of data.
291
00:16:14,700 --> 00:16:17,060
So that's why the
structure is important.
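A minimal sketch of why per-datum structure makes such questions answerable. The records are hand-written toy data, and the field names and the two "Example" entries are invented for illustration. This is not Wikidata's actual storage format or API.

```python
# Toy records keyed by name; "occupation" and "father" are hypothetical field names.
people = {
    "Pieter Bruegel the Elder": {"occupation": "painter", "father": None},
    "Pieter Brueghel the Younger": {
        "occupation": "painter",
        "father": "Pieter Bruegel the Elder",
    },
    "Jan Brueghel the Elder": {
        "occupation": "painter",
        "father": "Pieter Bruegel the Elder",
    },
    # Invented placeholders: a painter whose father was not a painter.
    "Example Merchant": {"occupation": "merchant", "father": None},
    "Example Painter": {"occupation": "painter", "father": "Example Merchant"},
}


def painters_with_painter_fathers(records):
    """Return the sorted names of painters whose recorded father is also a painter."""
    result = []
    for name, data in records.items():
        # Look up the father's own record; None if no father is recorded.
        father = records.get(data["father"])
        if (
            data["occupation"] == "painter"
            and father is not None
            and father["occupation"] == "painter"
        ):
            result.append(name)
    return sorted(result)


matches = painters_with_painter_fathers(people)
print(matches)
```

Because each fact is a separate, addressable field, adding another condition (for example, place of birth) would be one more check in the loop rather than a new manual search.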
292
00:16:17,060 --> 00:16:22,520
Now, Wikidata is also a
linked data repository.
293
00:16:22,520 --> 00:16:24,890
What does it mean that
the data is linked?
294
00:16:24,890 --> 00:16:29,700
Well, it means that a single
piece of data can point at,
295
00:16:29,700 --> 00:16:34,770
can link to another
whole bag of data.
296
00:16:34,770 --> 00:16:43,360
So if we are describing,
for example, a person,
297
00:16:43,360 --> 00:16:46,960
and we record the
single piece of data
298
00:16:46,960 --> 00:16:54,820
that this person was born
in Salem, Massachusetts,
299
00:16:54,820 --> 00:17:02,300
that single piece of data
links to the item about Salem,
300
00:17:02,300 --> 00:17:04,060
Massachusetts
because, of course,
301
00:17:04,060 --> 00:17:07,010
we know a lot of things
about that place, Salem,
302
00:17:07,010 --> 00:17:07,869
Massachusetts.
303
00:17:07,869 --> 00:17:09,245
So it's not just the text--
304
00:17:09,245 --> 00:17:13,450
S-A-L-E-M. It's not just,
that's where they were born.
305
00:17:13,450 --> 00:17:17,170
But it's a link to all
the data that we have
306
00:17:17,170 --> 00:17:19,270
about Salem, Massachusetts.
307
00:17:19,270 --> 00:17:24,940
If we say someone's
nationality is French,
308
00:17:24,940 --> 00:17:26,589
that is a link to France.
309
00:17:26,589 --> 00:17:30,700
That is a link to everything we
know about the country France.
310
00:17:30,700 --> 00:17:34,150
The fact that the data
is linked and structured
311
00:17:34,150 --> 00:17:37,630
allows not only humans,
but also computers
312
00:17:37,630 --> 00:17:41,620
to traverse information
and to bring
313
00:17:41,620 --> 00:17:44,950
us different pieces of
relevant information
314
00:17:44,950 --> 00:17:49,000
programmatically, automatically,
based on those links.
315
00:17:49,000 --> 00:17:52,000
Because it's not just
text, it's an actual link
316
00:17:52,000 --> 00:17:56,700
to another chunk of data.
317
00:17:56,700 --> 00:17:58,880
If this sounds a
little abstract,
318
00:17:58,880 --> 00:18:01,190
it will become much
clearer in just a second
319
00:18:01,190 --> 00:18:03,230
when we see it in action.
320
00:18:03,230 --> 00:18:06,200
But the other components of
this little definition are,
321
00:18:06,200 --> 00:18:09,650
of course, this central storage
of structured and linked data
322
00:18:09,650 --> 00:18:12,620
needs to be editable,
of course, because we
323
00:18:12,620 --> 00:18:14,370
need to keep it up to date.
324
00:18:14,370 --> 00:18:16,460
We need to correct mistakes.
325
00:18:16,460 --> 00:18:21,300
And we want it on a wiki
under a free license.
326
00:18:21,300 --> 00:18:23,940
The free license is, of
course, essential to enable
327
00:18:23,940 --> 00:18:30,910
reuse of that data, to enable
all kinds of reuse of the data.
328
00:18:30,910 --> 00:18:34,060
And Wikidata, unlike
Wikipedia, is released
329
00:18:34,060 --> 00:18:36,160
under a different free license.
330
00:18:36,160 --> 00:18:41,590
Wikidata is released
under the CC0 waiver.
331
00:18:41,590 --> 00:18:44,920
That means unlike
Wikipedia, where
332
00:18:44,920 --> 00:18:51,160
you have to attribute Wikipedia
when you reuse information
333
00:18:51,160 --> 00:18:55,150
from Wikipedia, you do not
need to attribute Wikidata,
334
00:18:55,150 --> 00:18:57,040
and you do not need to
share alike your work.
335
00:18:57,040 --> 00:19:02,020
It's an unencumbered license to
reuse the data in any way you
336
00:19:02,020 --> 00:19:03,267
want, including commercially.
337
00:19:03,267 --> 00:19:05,350
You don't have to say that
it comes from Wikidata.
338
00:19:05,350 --> 00:19:07,390
I mean, it could be nice,
but you don't have to.
339
00:19:07,390 --> 00:19:09,280
You're under no
obligation to do it.
340
00:19:09,280 --> 00:19:14,080
And that is important to
allow certain kinds of reuse
341
00:19:14,080 --> 00:19:17,140
where, for example, if you're
building some kind of device,
342
00:19:17,140 --> 00:19:20,680
you may not have a practical
way to give attribution.
343
00:19:20,680 --> 00:19:23,920
And had we required
that to use Wikidata,
344
00:19:23,920 --> 00:19:27,250
we would have made
Wikidata less reusable.
345
00:19:27,250 --> 00:19:32,940
So Wikidata is unencumbered by
the requirement of attribution.
346
00:19:32,940 --> 00:19:35,730
And of course, because
it's on a wiki,
347
00:19:35,730 --> 00:19:40,421
we get all the benefits that we
are used to expect from a wiki,
348
00:19:40,421 --> 00:19:40,920
right?
349
00:19:40,920 --> 00:19:42,810
So it's a wiki,
which means, yes.
350
00:19:42,810 --> 00:19:44,910
It has discussion pages.
351
00:19:44,910 --> 00:19:46,500
It has revision histories.
352
00:19:46,500 --> 00:19:47,620
It remembers everything.
353
00:19:47,620 --> 00:19:50,610
So if you screw it up, you
can always go a version back.
354
00:19:50,610 --> 00:19:52,380
Or if someone else
vandalized the content,
355
00:19:52,380 --> 00:19:54,610
we can always go back,
just like Wikipedia.
356
00:19:54,610 --> 00:19:56,880
So we get all the
benefits we're used to--
357
00:19:56,880 --> 00:20:01,260
user talk pages, group
discussion pages, watch lists,
358
00:20:01,260 --> 00:20:03,755
all the features that
we expect in a wiki.
359
00:20:03,755 --> 00:20:06,740
360
00:20:06,740 --> 00:20:11,170
In short, Wikidata is love.
361
00:20:11,170 --> 00:20:14,100
I hope you agree with me
by the end of this talk.
362
00:20:14,100 --> 00:20:18,580
So let's zoom in and see
what this structured data
363
00:20:18,580 --> 00:20:21,420
looks like.
364
00:20:21,420 --> 00:20:29,460
So structured data on Wikidata
is collected in statements.
365
00:20:29,460 --> 00:20:31,930
And statements have
the general form
366
00:20:31,930 --> 00:20:39,490
of this triple, this
tripartite ascription--
367
00:20:39,490 --> 00:20:43,550
items, properties, and values.
368
00:20:43,550 --> 00:20:46,930
Now an item is the
subject, is the topic
369
00:20:46,930 --> 00:20:48,820
that we are trying to describe.
370
00:20:48,820 --> 00:20:52,164
It can be any topic that
Wikipedia can cover,
371
00:20:52,164 --> 00:20:53,830
and many others that
Wikipedia wouldn't.
372
00:20:53,830 --> 00:20:57,490
So the topic, the
item can be Germany,
373
00:20:57,490 --> 00:21:00,520
or it can be Salem,
Massachusetts,
374
00:21:00,520 --> 00:21:03,340
or it can be the
concept of redemption.
375
00:21:03,340 --> 00:21:04,610
It can be anything at all.
376
00:21:04,610 --> 00:21:10,000
Anything you can imagine
describing in any way with data
377
00:21:10,000 --> 00:21:11,990
can be the item.
378
00:21:11,990 --> 00:21:15,430
So the item, consider
it like the title
379
00:21:15,430 --> 00:21:17,480
of the rest of the data.
380
00:21:17,480 --> 00:21:20,860
And then what do we say
about Salem, Massachusetts
381
00:21:20,860 --> 00:21:22,330
or about Germany?
382
00:21:22,330 --> 00:21:26,770
Well, that's a series of
properties and values,
383
00:21:26,770 --> 00:21:28,450
properties and values.
384
00:21:28,450 --> 00:21:32,680
The property is
the kind of datum,
385
00:21:32,680 --> 00:21:39,770
like birth date or language
spoken or manner of death.
386
00:21:39,770 --> 00:21:42,640
These are all real properties.
387
00:21:42,640 --> 00:21:46,030
Or national anthem, if I'm
trying to describe a country--
388
00:21:46,030 --> 00:21:47,830
these are properties.
389
00:21:47,830 --> 00:21:49,880
And then they have
values, right?
390
00:21:49,880 --> 00:21:55,740
So this person, this
imaginary person's place
391
00:21:55,740 --> 00:21:59,640
of birth, the value of the
property place of birth
392
00:21:59,640 --> 00:22:02,430
is Salem, Massachusetts.
393
00:22:02,430 --> 00:22:06,690
So you can think about it
as like a government form--
394
00:22:06,690 --> 00:22:09,540
or not government, just any
form that you're filling out--
395
00:22:09,540 --> 00:22:12,420
where there are field names,
and then empty spaces for you
396
00:22:12,420 --> 00:22:13,110
to fill out.
397
00:22:13,110 --> 00:22:14,460
That's the value, OK?
398
00:22:14,460 --> 00:22:18,150
So the field names
or the categories
399
00:22:18,150 --> 00:22:19,350
are the properties, right?
400
00:22:19,350 --> 00:22:22,960
So name, language,
occupation, date of birth--
401
00:22:22,960 --> 00:22:24,420
these are all properties.
402
00:22:24,420 --> 00:22:26,640
And the values are
the actual piece
403
00:22:26,640 --> 00:22:31,391
of data, the actual
information that we have.
404
00:22:31,391 --> 00:22:33,870
And of course,
different kinds of data
405
00:22:33,870 --> 00:22:40,170
are relevant for describing
different kinds of items.
406
00:22:40,170 --> 00:22:45,030
And the key in the value is it
can be either a literal value--
407
00:22:45,030 --> 00:22:50,370
like if we're describing
the height of a mountain,
408
00:22:50,370 --> 00:22:55,826
we might say just
the number 8,848.
409
00:22:55,826 --> 00:22:57,325
That's the height
of which mountain?
410
00:22:57,325 --> 00:23:01,990
411
00:23:01,990 --> 00:23:04,070
Not everyone at once.
412
00:23:04,070 --> 00:23:07,430
Oh, because it's meters,
the metric system.
413
00:23:07,430 --> 00:23:08,270
Yeah, Mount
414
00:23:08,270 --> 00:23:12,390
Everest is 8,848 meters.
415
00:23:12,390 --> 00:23:14,160
Yes.
416
00:23:14,160 --> 00:23:15,780
Get with it, America.
417
00:23:15,780 --> 00:23:17,630
The metric system.
418
00:23:17,630 --> 00:23:20,930
All right, so that
can be a literal value
419
00:23:20,930 --> 00:23:22,580
like an actual number.
420
00:23:22,580 --> 00:23:28,280
Or it can be a link to an
item, pointing at another item.
421
00:23:28,280 --> 00:23:30,890
But in this statement,
it is the value.
422
00:23:30,890 --> 00:23:35,150
So if I'm talking about
Germany, the item is Germany.
423
00:23:35,150 --> 00:23:39,680
And the property capital
city has the value Berlin.
424
00:23:39,680 --> 00:23:43,130
But the value is
not B-E-R-L-I-N.
425
00:23:43,130 --> 00:23:48,740
The value is a pointer to
the item Berlin, right?
426
00:23:48,740 --> 00:23:51,410
That's the link.
427
00:23:51,410 --> 00:23:56,671
So a single item is described
by a series of such statements,
428
00:23:56,671 --> 00:23:57,170
right?
429
00:23:57,170 --> 00:24:01,400
There's hundreds and hundreds of
things I can say about Germany.
430
00:24:01,400 --> 00:24:04,280
There's hundreds of things
I can say about a person.
431
00:24:04,280 --> 00:24:06,350
And these will
generally take the form
432
00:24:06,350 --> 00:24:08,330
of a property and a value.
433
00:24:08,330 --> 00:24:11,720
By the way, some properties
may have more than one value.
434
00:24:11,720 --> 00:24:15,920
Consider the property
languages spoken.
435
00:24:15,920 --> 00:24:18,050
People can speak more
than one language, right?
436
00:24:18,050 --> 00:24:20,330
So if I'm
describing myself,
437
00:24:20,330 --> 00:24:22,400
we can say languages spoken--
438
00:24:22,400 --> 00:24:26,000
English, Hebrew,
Latin, whatever.
439
00:24:26,000 --> 00:24:27,860
So a property can have
more than one value.
440
00:24:27,860 --> 00:24:30,970
441
00:24:30,970 --> 00:24:34,010
So if the item is
about a country,
442
00:24:34,010 --> 00:24:38,890
it would have statements about
properties like population,
443
00:24:38,890 --> 00:24:43,180
land area, official languages,
borders with, anthem,
444
00:24:43,180 --> 00:24:45,070
capital city.
445
00:24:45,070 --> 00:24:48,580
If I'm describing a person, I
have a whole mostly different
446
00:24:48,580 --> 00:24:51,220
set of properties that
are relevant, right?
447
00:24:51,220 --> 00:24:54,160
Date of birth, place of birth,
citizenship, occupation,
448
00:24:54,160 --> 00:24:56,950
father, mother,
religion, notable works--
449
00:24:56,950 --> 00:24:59,780
now, are all of these
relevant for all people?
450
00:24:59,780 --> 00:25:00,970
No, of course not.
451
00:25:00,970 --> 00:25:02,140
It depends.
452
00:25:02,140 --> 00:25:05,220
And different items
about different people
453
00:25:05,220 --> 00:25:08,920
will either have or not
have these fields, right?
454
00:25:08,920 --> 00:25:12,640
So we wouldn't record religion
for absolutely every person.
455
00:25:12,640 --> 00:25:14,200
Some people manage
to do without.
456
00:25:14,200 --> 00:25:17,710
And also, it's not relevant
for a lot of people, like,
457
00:25:17,710 --> 00:25:20,320
what their religion
happens to be.
458
00:25:20,320 --> 00:25:22,840
Date of birth is generally
relevant for most people
459
00:25:22,840 --> 00:25:24,060
that we're documenting.
460
00:25:24,060 --> 00:25:29,390
So some properties kind of crop
up more commonly than others.
461
00:25:29,390 --> 00:25:33,220
A person's height, for
example, is not generally
462
00:25:33,220 --> 00:25:35,596
considered of
encyclopedic value, right?
463
00:25:35,596 --> 00:25:36,970
We don't, for
example, if we have
464
00:25:36,970 --> 00:25:40,840
an article about even a
really well-documented person
465
00:25:40,840 --> 00:25:45,610
like Winston Churchill, does
Wikipedia mention his height?
466
00:25:45,610 --> 00:25:47,620
I don't think it does.
467
00:25:47,620 --> 00:25:50,320
Even though I'm sure
we could probably
468
00:25:50,320 --> 00:25:52,810
find a source somewhere
that lists his height,
469
00:25:52,810 --> 00:25:55,570
it's just not a
very relevant piece
470
00:25:55,570 --> 00:25:57,506
of information about Churchill.
471
00:25:57,506 --> 00:25:59,380
With everything else
that's written about him
472
00:25:59,380 --> 00:26:00,796
and that we know
about him that we
473
00:26:00,796 --> 00:26:03,460
want to include in the
article, a person's height
474
00:26:03,460 --> 00:26:08,180
is not really something of
great value most of the time.
475
00:26:08,180 --> 00:26:14,420
But if we are describing
Michael Jordan, it is relevant.
476
00:26:14,420 --> 00:26:15,430
I'm dating myself.
477
00:26:15,430 --> 00:26:19,230
People still know
Michael Jordan, right?
478
00:26:19,230 --> 00:26:21,600
You know, a basketball
player, that's
479
00:26:21,600 --> 00:26:24,204
when height is very
relevant, right?
480
00:26:24,204 --> 00:26:25,620
That's one of the
first things you
481
00:26:25,620 --> 00:26:28,020
say when you're describing
a basketball player,
482
00:26:28,020 --> 00:26:31,380
is list their height.
483
00:26:31,380 --> 00:26:33,690
So even within the
class of person,
484
00:26:33,690 --> 00:26:36,480
some properties may be
more or less relevant,
485
00:26:36,480 --> 00:26:38,320
depending on the context.
486
00:26:38,320 --> 00:26:40,090
So let's look at some examples.
487
00:26:40,090 --> 00:26:42,870
These are examples
of statements.
488
00:26:42,870 --> 00:26:45,400
Each line is a statement.
489
00:26:45,400 --> 00:26:47,130
So here's the first one.
490
00:26:47,130 --> 00:26:53,270
I want to state, about the
item Earth, our planet.
491
00:26:53,270 --> 00:26:55,760
And what I want
to say about Earth
492
00:26:55,760 --> 00:27:00,980
is that the property
highest point on Earth
493
00:27:00,980 --> 00:27:03,310
has the value Mt.
494
00:27:03,310 --> 00:27:04,817
Everest.
495
00:27:04,817 --> 00:27:05,900
Would you agree with that?
496
00:27:05,900 --> 00:27:09,580
That is the highest
point on Earth.
497
00:27:09,580 --> 00:27:11,100
That's a statement.
498
00:27:11,100 --> 00:27:14,020
It says something
specific, one piece
499
00:27:14,020 --> 00:27:15,517
of information about Earth.
500
00:27:15,517 --> 00:27:17,350
Now of course, there's
a lot of other things
501
00:27:17,350 --> 00:27:18,820
we want to say about Earth--
502
00:27:18,820 --> 00:27:21,165
circumference,
average temperature,
503
00:27:21,165 --> 00:27:22,540
I don't know, all
kinds of things
504
00:27:22,540 --> 00:27:26,750
we can describe the planet
with-- density, the galaxy
505
00:27:26,750 --> 00:27:28,250
it belongs to, all that.
506
00:27:28,250 --> 00:27:30,400
But here's one piece
of information,
507
00:27:30,400 --> 00:27:37,370
one very specific field in
the detailed form about Earth.
508
00:27:37,370 --> 00:27:38,990
The highest point is Mt.
509
00:27:38,990 --> 00:27:39,590
Everest.
510
00:27:39,590 --> 00:27:41,570
Now here's a second statement.
511
00:27:41,570 --> 00:27:42,920
This time Mt.
512
00:27:42,920 --> 00:27:46,690
Everest itself is the item
that I'm describing, right?
513
00:27:46,690 --> 00:27:48,590
The topic has changed.
514
00:27:48,590 --> 00:27:50,120
Now I'm saying
something about Mt.
515
00:27:50,120 --> 00:27:52,340
Everest, and what
I'm saying about Mt.
516
00:27:52,340 --> 00:27:56,860
Everest is elevation
above sea level.
517
00:27:56,860 --> 00:28:01,190
Sounds the same but it
isn't, because the highest
518
00:28:01,190 --> 00:28:04,670
point on Earth answers
the question where,
519
00:28:04,670 --> 00:28:08,090
like on the planet, what
is the highest point?
520
00:28:08,090 --> 00:28:08,720
It's Mt.
521
00:28:08,720 --> 00:28:09,630
Everest.
522
00:28:09,630 --> 00:28:12,911
But how high is that highest
point is a different piece
523
00:28:12,911 --> 00:28:13,535
of information.
524
00:28:13,535 --> 00:28:14,710
Do you agree?
525
00:28:14,710 --> 00:28:16,790
It's the actual altitude.
526
00:28:16,790 --> 00:28:19,600
It's not where on
the planet it is.
527
00:28:19,600 --> 00:28:21,680
So it may sound similar,
but these are actually
528
00:28:21,680 --> 00:28:24,030
very different pieces
of information.
529
00:28:24,030 --> 00:28:27,800
So that highest
point, how high is it?
530
00:28:27,800 --> 00:28:31,790
Well, it's 8,848 meters high.
531
00:28:31,790 --> 00:28:36,550
Now the third statement gives
another piece of information
532
00:28:36,550 --> 00:28:37,960
about the first item.
533
00:28:37,960 --> 00:28:40,870
Same item-- I could have
grouped them together.
534
00:28:40,870 --> 00:28:42,400
Another thing I
know about the Earth
535
00:28:42,400 --> 00:28:46,480
is that the deepest
point on the planet
536
00:28:46,480 --> 00:28:53,050
is the Challenger Deep, part
of the so-called Mariana
537
00:28:53,050 --> 00:28:54,760
Trench in the ocean.
538
00:28:54,760 --> 00:28:56,530
So that is the deepest point.
539
00:28:56,530 --> 00:28:58,180
And how deep is it?
540
00:28:58,180 --> 00:29:01,384
I again use the elevation
above sea level.
541
00:29:01,384 --> 00:29:03,550
That's the name of the
property even though it's not
542
00:29:03,550 --> 00:29:04,750
above sea level.
543
00:29:04,750 --> 00:29:08,260
I have a negative value because
the elevation of the Challenger
544
00:29:08,260 --> 00:29:13,700
Deep is minus 11
kilometers, more or less.
545
00:29:13,700 --> 00:29:14,200
All right?
546
00:29:14,200 --> 00:29:15,620
So these are statements.
547
00:29:15,620 --> 00:29:18,820
These are four individual
pieces of data.
548
00:29:18,820 --> 00:29:21,160
And I could also
look at it this way.
549
00:29:21,160 --> 00:29:25,210
Maybe that's closer to the
government form example
550
00:29:25,210 --> 00:29:26,620
that I was giving, right?
551
00:29:26,620 --> 00:29:29,190
So I want to say
something about Earth.
552
00:29:29,190 --> 00:29:30,760
What do I want to say?
553
00:29:30,760 --> 00:29:33,580
Two things-- highest point.
554
00:29:33,580 --> 00:29:36,760
That's the field,
that's the property,
555
00:29:36,760 --> 00:29:37,780
and this is the value.
556
00:29:37,780 --> 00:29:39,190
The highest point is Mt.
557
00:29:39,190 --> 00:29:40,240
Everest.
558
00:29:40,240 --> 00:29:42,880
The deepest point
is Challenger Deep.
559
00:29:42,880 --> 00:29:46,450
And then I have things to
say about Challenger Deep--
560
00:29:46,450 --> 00:29:49,630
the property of elevation
above sea level, the value
561
00:29:49,630 --> 00:29:52,280
is minus 11 kilometers.
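[Editor's sketch] The two views described here, a flat list of statements and a per-item "form", hold the same data. A rough Python illustration, assuming each statement is a plain (item, property, value) tuple written with words only; the helper name `group_by_item` is made up for this example.

```python
# The four statements from the talk, written with words only.
statements = [
    ("Earth", "highest point", "Mount Everest"),
    ("Mount Everest", "elevation above sea level", "8,848 m"),
    ("Earth", "deepest point", "Challenger Deep"),
    ("Challenger Deep", "elevation above sea level", "about -11 km"),
]


def group_by_item(triples):
    """Turn a flat list of statements into one 'form' per item.

    Each property maps to a list, because a property may have
    more than one value.
    """
    forms = {}
    for item, prop, value in triples:
        forms.setdefault(item, {}).setdefault(prop, []).append(value)
    return forms


forms = group_by_item(statements)
for item, fields in forms.items():
    print(item)
    for prop, values in fields.items():
        print(f"  {prop}: {', '.join(values)}")
```

Earth ends up with two filled-in fields, while Mount Everest and Challenger Deep each get one.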
562
00:29:52,280 --> 00:29:55,900
563
00:29:55,900 --> 00:30:00,600
Now here's yet another
view of the same data
564
00:30:00,600 --> 00:30:04,530
once more, with numeric IDs.
565
00:30:04,530 --> 00:30:08,150
So this is the same information,
the same four statements.
566
00:30:08,150 --> 00:30:13,020
But this time, in
addition to using words,
567
00:30:13,020 --> 00:30:21,270
I'm also including weird
numbers following either Q or P.
568
00:30:21,270 --> 00:30:25,890
So P stands for property.
569
00:30:25,890 --> 00:30:30,330
So the highest point
property is P610.
570
00:30:30,330 --> 00:30:34,216
And the deepest point
property is P1589.
571
00:30:34,216 --> 00:30:35,340
What do these numbers mean?
572
00:30:35,340 --> 00:30:36,985
They don't mean anything at all.
573
00:30:36,985 --> 00:30:37,860
They're just numbers.
574
00:30:37,860 --> 00:30:39,760
They're just sequential numbers.
575
00:30:39,760 --> 00:30:42,600
And if I create a new
Wikidata item right now,
576
00:30:42,600 --> 00:30:46,020
it'll get just the
next available number.
577
00:30:46,020 --> 00:30:47,790
So they're just numbers.
578
00:30:47,790 --> 00:30:49,080
So P stands for property.
579
00:30:49,080 --> 00:30:51,480
What does Q stand for?
580
00:30:51,480 --> 00:30:53,460
Does anyone know?
581
00:30:53,460 --> 00:30:58,500
It's a trick question
because it's hard to guess.
582
00:30:58,500 --> 00:31:01,896
But the principal
architect of Wikidata,
583
00:31:01,896 --> 00:31:07,860
a Wikipedian named Denny
[INAUDIBLE] and data scientist,
584
00:31:07,860 --> 00:31:10,950
is married to a lovely
lady named [INAUDIBLE]
585
00:31:10,950 --> 00:31:16,320
spelled with a Q. And
this is a loving tribute.
586
00:31:16,320 --> 00:31:21,780
And she's also a Wikipedian and
an admin of Uzbek Wikipedia.
587
00:31:21,780 --> 00:31:31,650
So Q2 is just the numeric
identifier of the item Earth.
588
00:31:31,650 --> 00:31:36,190
And Q513 is the
identifier of Mt.
589
00:31:36,190 --> 00:31:37,310
Everest.
590
00:31:37,310 --> 00:31:42,950
You notice that we use that ID
across the statement, right?
591
00:31:42,950 --> 00:31:48,520
So from Wikidata's
perspective, this
592
00:31:48,520 --> 00:31:53,290
is what the
database actually contains.
593
00:31:53,290 --> 00:31:55,030
What we were saying with words--
594
00:31:55,030 --> 00:31:57,650
the Earth, highest
point, whatever--
595
00:31:57,650 --> 00:31:58,540
never mind that.
596
00:31:58,540 --> 00:32:03,250
Q2 has P610 with a value Q513.
597
00:32:03,250 --> 00:32:06,190
That's what Wikidata
cares about, OK?
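[Editor's sketch] One way to picture "stored as IDs, shown as words" is a statement made only of identifiers plus a separate label table used for display. The IDs below are the ones quoted in the talk; the function name `render` and the table layout are assumptions for illustration, not how Wikidata is implemented.

```python
# Label table for display. IDs are those quoted in the talk.
LABELS_EN = {
    "Q2": "Earth",
    "Q513": "Mount Everest",
    "P610": "highest point",
}

# What is stored: identifiers only.
stored = ("Q2", "P610", "Q513")


def render(statement, labels):
    """Replace each ID with its label, falling back to the raw ID."""
    return tuple(labels.get(part, part) for part in statement)


print(stored)
print(render(stored, LABELS_EN))
```

An ID with no entry in the table is shown as-is, so a missing label never loses the underlying statement.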
598
00:32:06,190 --> 00:32:09,770
Now that, you'll agree,
is a little inaccessible.
599
00:32:09,770 --> 00:32:13,120
Just these lists of numbers,
that's a little hard.
600
00:32:13,120 --> 00:32:16,240
So Wikidata
understands and allows
601
00:32:16,240 --> 00:32:19,690
us to continue using our words.
602
00:32:19,690 --> 00:32:23,650
But actually, it gets
translated into numeric IDs.
603
00:32:23,650 --> 00:32:25,050
Now why is this a good idea?
604
00:32:25,050 --> 00:32:30,070
605
00:32:30,070 --> 00:32:33,070
Why can't we just
say Earth or Mt.
606
00:32:33,070 --> 00:32:35,120
Everest?
607
00:32:35,120 --> 00:32:36,170
Any thoughts?
608
00:32:36,170 --> 00:32:39,530
This is an open question.
609
00:32:39,530 --> 00:32:41,540
Why is this a good
idea to use numbers
610
00:32:41,540 --> 00:32:43,260
instead of the names of things?
611
00:32:43,260 --> 00:32:47,000
612
00:32:47,000 --> 00:32:51,750
Yes, because more than one
thing can have the same name.
613
00:32:51,750 --> 00:32:52,590
What do you mean?
614
00:32:52,590 --> 00:32:53,460
There's only one Mt.
615
00:32:53,460 --> 00:32:54,480
Everest.
616
00:32:54,480 --> 00:32:55,510
Well, yeah.
617
00:32:55,510 --> 00:32:58,710
But there's also a
movie called-- and probably
618
00:32:58,710 --> 00:33:00,000
more than one-- called Mt.
619
00:33:00,000 --> 00:33:04,080
Everest, or a TV documentary
literally called Mt.
620
00:33:04,080 --> 00:33:06,590
Everest.
621
00:33:06,590 --> 00:33:09,960
And of course, if I'm
describing a person named
622
00:33:09,960 --> 00:33:14,930
Frank Johnson, he's not the only
Frank Johnson on the planet,
623
00:33:14,930 --> 00:33:16,180
right?
624
00:33:16,180 --> 00:33:17,760
But wait, you say.
625
00:33:17,760 --> 00:33:20,640
On Wikipedia we deal
with that problem, right?
626
00:33:20,640 --> 00:33:23,490
How do we deal with that
problem on Wikipedia?
627
00:33:23,490 --> 00:33:26,270
Does anyone in
the audience know?
628
00:33:26,270 --> 00:33:27,969
The standard way to
deal with the fact
629
00:33:27,969 --> 00:33:30,260
that there is more than one
Frank Johnson in the world,
630
00:33:30,260 --> 00:33:35,600
on Wikipedia, is to use
parentheses after the name.
631
00:33:35,600 --> 00:33:39,200
So there is Frank
Johnson (actor)
632
00:33:39,200 --> 00:33:42,620
and Frank Johnson
(politician), for example,
633
00:33:42,620 --> 00:33:44,700
if that's the distinction
we need to make.
634
00:33:44,700 --> 00:33:48,140
So you put in parentheses
kind of the minimal amount
635
00:33:48,140 --> 00:33:51,840
of information you need to tell
apart these Frank Johnsons.
636
00:33:51,840 --> 00:33:54,530
What if there's two
politician Frank Johnsons?
637
00:33:54,530 --> 00:33:58,880
Well, then you would say Frank
Johnson (Delaware politician)
638
00:33:58,880 --> 00:34:01,960
versus Frank Johnson
(California politician), right?
639
00:34:01,960 --> 00:34:05,210
You just put in that bit of
context to tell them apart.
640
00:34:05,210 --> 00:34:07,640
So that's the solution
that Wikipedians came up
641
00:34:07,640 --> 00:34:12,469
with years and years ago
because they did need
642
00:34:12,469 --> 00:34:15,560
a unique name for the article.
643
00:34:15,560 --> 00:34:18,170
You can't have two
articles literally called
644
00:34:18,170 --> 00:34:20,790
Frank Johnson on Wikipedia.
645
00:34:20,790 --> 00:34:23,570
So that's the
solution on Wikipedia.
646
00:34:23,570 --> 00:34:28,429
But Wikidata was designed
much later, more than a decade
647
00:34:28,429 --> 00:34:31,340
after Wikipedia, and was
able to kind of learn
648
00:34:31,340 --> 00:34:34,520
from the experience
of Wikipedia, which
649
00:34:34,520 --> 00:34:39,380
has tremendous experience
with multilingualism, much
650
00:34:39,380 --> 00:34:42,870
more than most sites and
projects, as we know.
651
00:34:42,870 --> 00:34:44,659
And so the Wikidata
team understood
652
00:34:44,659 --> 00:34:47,840
from the get go that
this will be an issue,
653
00:34:47,840 --> 00:34:50,989
and it's better to use
numbers that are unequivocally
654
00:34:50,989 --> 00:34:54,800
different from each
other instead of labels,
655
00:34:54,800 --> 00:34:57,290
instead of the actual
name, the actual text,
656
00:34:57,290 --> 00:34:59,630
because names are not unique.
657
00:34:59,630 --> 00:35:03,260
Names can change, right?
658
00:35:03,260 --> 00:35:08,960
Just last year, there was a
big naming reform in Ukraine
659
00:35:08,960 --> 00:35:13,610
and a whole bunch of towns
and districts were renamed.
660
00:35:13,610 --> 00:35:17,330
Does that mean we should change
all the data that we have, like
661
00:35:17,330 --> 00:35:19,550
lose all the data that we
have about the old name?
662
00:35:19,550 --> 00:35:22,130
No, we ideally just
want to change the name
663
00:35:22,130 --> 00:35:24,020
without breaking links.
664
00:35:24,020 --> 00:35:28,550
So having the links actually
refer to the numbers
665
00:35:28,550 --> 00:35:32,090
is one way to ensure the
integrity of the data,
666
00:35:32,090 --> 00:35:35,360
of the links, when
renaming happens.
667
00:35:35,360 --> 00:35:39,230
Another reason is well, even
if the name doesn't change,
668
00:35:39,230 --> 00:35:42,230
not all humans call
everything the same, right?
669
00:35:42,230 --> 00:35:46,180
So Earth is Earth
in English, but it's
670
00:35:46,180 --> 00:35:48,210
[SPEAKING ARABIC] in Arabic.
671
00:35:48,210 --> 00:35:49,585
It's [SPEAKING HEBREW]
in Hebrew.
672
00:35:49,585 --> 00:35:53,480
673
00:35:53,480 --> 00:35:56,570
So obviously, Earth--
even that is not
674
00:35:56,570 --> 00:36:01,920
as unambiguous or unequivocal
as you might think.
675
00:36:01,920 --> 00:36:03,500
And so that is the
reason Wikidata,
676
00:36:03,500 --> 00:36:07,640
which is built to be
multilingual from the start,
677
00:36:07,640 --> 00:36:11,230
talks about numbers
rather than labels.
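[Editor's sketch] The argument for numbers over names can be shown with a small label table keyed by ID and language. "Q2" is the ID the talk gives for Earth; "X1" and "X2" are hypothetical placeholder IDs, and the names attached to them are invented. Renaming changes one label and leaves every statement untouched.

```python
# Labels per item ID and language code. "Q2" is the ID the talk gives
# for Earth; "X1" and "X2" are hypothetical placeholder IDs.
labels = {
    "Q2": {"en": "Earth", "de": "Erde", "fr": "Terre"},
    "X1": {"en": "Old Name"},
    "X2": {"en": "Some District"},
}

# Statements refer to IDs only, never to label text.
statements = [("X1", "located in", "X2")]


def label(item_id, lang, fallback="en"):
    """Return the label in `lang`, else the fallback language, else the ID."""
    names = labels.get(item_id, {})
    return names.get(lang) or names.get(fallback) or item_id


print(label("Q2", "de"))  # same item, German label
print(label("X1", "fr"))  # no French label, falls back to English

before = list(statements)
labels["X1"]["en"] = "New Name"  # a rename touches one label only
print(statements == before)      # the links are unchanged
print(label(statements[0][0], "en"))
```

The English fallback is a choice made for this sketch; a real interface might prefer the reader's own language list.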
678
00:36:11,230 --> 00:36:12,150
OK.
679
00:36:12,150 --> 00:36:15,370
Ha, I had a whole slide
about that and I forgot.
680
00:36:15,370 --> 00:36:17,830
Yes, so even London,
again, is not
681
00:36:17,830 --> 00:36:20,710
just London, England, which is
what you were thinking about.
682
00:36:20,710 --> 00:36:22,030
It's also a city in Canada.
683
00:36:22,030 --> 00:36:26,260
And it's also a family
name, like Jack London.
684
00:36:26,260 --> 00:36:27,430
It's also a movie company.
685
00:36:27,430 --> 00:36:32,230
There must be some hotel
named London somewhere.
686
00:36:32,230 --> 00:36:36,070
This is a good opportunity
to remind everyone
687
00:36:36,070 --> 00:36:41,110
that the vast
majority of humankind
688
00:36:41,110 --> 00:36:45,700
does not speak a
word of English.
689
00:36:45,700 --> 00:36:48,790
That's a statistic
worth remembering.
690
00:36:48,790 --> 00:36:55,240
The vast majority of the planet
does not speak English at all.
691
00:36:55,240 --> 00:36:57,070
That does not
contradict the datum
692
00:36:57,070 --> 00:37:00,070
that English is the most
widely spoken language.
693
00:37:00,070 --> 00:37:02,860
And yet, in aggregate,
a majority of people
694
00:37:02,860 --> 00:37:07,180
speak other languages,
and not English at all.
695
00:37:07,180 --> 00:37:13,150
So moving swiftly on, this
is a pause for questions
696
00:37:13,150 --> 00:37:15,610
about what I've covered so far.
697
00:37:15,610 --> 00:37:17,390
Any questions in the audience?
698
00:37:17,390 --> 00:37:19,450
If not, we move to IRC.
699
00:37:19,450 --> 00:37:21,042
If there are any questions--
700
00:37:21,042 --> 00:37:23,880
701
00:37:23,880 --> 00:37:26,891
Any questions?
702
00:37:26,891 --> 00:37:27,390
No?
703
00:37:27,390 --> 00:37:28,305
IRC?
704
00:37:28,305 --> 00:37:29,490
Any questions?
705
00:37:29,490 --> 00:37:33,580
706
00:37:33,580 --> 00:37:34,180
OK.
707
00:37:34,180 --> 00:37:38,170
We will have additional
pauses for questions later.
708
00:37:38,170 --> 00:37:41,470
But enough of my hand-waving.
709
00:37:41,470 --> 00:37:44,590
Let's go explore Wikidata.
710
00:37:44,590 --> 00:37:49,730
So Wikidata lives
at wikidata.org.
711
00:37:49,730 --> 00:37:59,570
And Wikidata already has
more than 25 million items.
712
00:37:59,570 --> 00:38:05,570
That is, it collects
statements about more than 25
713
00:38:05,570 --> 00:38:08,270
million topics.
714
00:38:08,270 --> 00:38:12,170
It has many, many more
than 25 million statements
715
00:38:12,170 --> 00:38:14,660
because many of these items
have dozens or hundreds
716
00:38:14,660 --> 00:38:16,370
of statements.
717
00:38:16,370 --> 00:38:20,720
So it documents 25
million things--
718
00:38:20,720 --> 00:38:23,153
people, books, rivers, whatever.
719
00:38:23,153 --> 00:38:26,010
720
00:38:26,010 --> 00:38:28,800
Just to give us a sense
of how big that number is,
721
00:38:28,800 --> 00:38:32,430
how many articles do we
have on English Wikipedia?
722
00:38:32,430 --> 00:38:35,610
More than-- yes, more
than 5 million articles.
723
00:38:35,610 --> 00:38:37,990
And that's the
largest Wikipedia.
724
00:38:37,990 --> 00:38:41,100
So Wikidata is
already describing
725
00:38:41,100 --> 00:38:45,450
more than five times, or
about five times as many items
726
00:38:45,450 --> 00:38:48,460
as even our largest Wikipedia.
727
00:38:48,460 --> 00:38:50,840
So obviously,
Wikidata contains data
728
00:38:50,840 --> 00:38:56,900
about things that have no
article on any Wikipedia.
729
00:38:56,900 --> 00:39:01,980
It is a much, much larger,
more comprehensive project.
730
00:39:01,980 --> 00:39:04,250
All right, the second
thing we might notice
731
00:39:04,250 --> 00:39:07,610
is, well, this looks kind
of like Wikipedia, right?
732
00:39:07,610 --> 00:39:11,210
If we've never visited, it
looks kind of like Wikipedia.
733
00:39:11,210 --> 00:39:13,490
It has this sidebar.
734
00:39:13,490 --> 00:39:15,290
It has these buttons at the top.
735
00:39:15,290 --> 00:39:17,810
It looks like it's
from the '90s.
736
00:39:17,810 --> 00:39:18,770
Yeah.
737
00:39:18,770 --> 00:39:20,900
So the reason it
looks like Wikipedia
738
00:39:20,900 --> 00:39:24,410
is that it is a wiki running
on MediaWiki software.
739
00:39:24,410 --> 00:39:28,430
It is running on software
very much like Wikipedia.
740
00:39:28,430 --> 00:39:32,180
But it is running on
a kind of modification
741
00:39:32,180 --> 00:39:34,010
of the standard wiki software.
742
00:39:34,010 --> 00:39:36,170
It has an additional,
very important component
743
00:39:36,170 --> 00:39:38,630
named Wikibase,
which gives it all
744
00:39:38,630 --> 00:39:42,700
of its structured and
linked data power.
745
00:39:42,700 --> 00:39:46,763
So let's start
exploring Wikidata.
746
00:39:46,763 --> 00:39:52,830
747
00:39:52,830 --> 00:39:55,770
Let's take something local--
748
00:39:55,770 --> 00:39:57,530
Harvey Milk.
749
00:39:57,530 --> 00:40:00,190
Harvey Milk.
750
00:40:00,190 --> 00:40:03,460
What does Wikidata
know about Harvey Milk?
751
00:40:03,460 --> 00:40:06,730
For those on YouTube
who may not be local,
752
00:40:06,730 --> 00:40:15,580
he's a San Francisco politician
and gay rights activist
753
00:40:15,580 --> 00:40:18,380
who was murdered in the '70s.
754
00:40:18,380 --> 00:40:21,280
He was very significant in
the history of those struggles
755
00:40:21,280 --> 00:40:22,710
in this country.
756
00:40:22,710 --> 00:40:27,220
So what does Wikidata
tell us about Harvey Milk?
757
00:40:27,220 --> 00:40:29,770
Well, the first
thing is it knows
758
00:40:29,770 --> 00:40:34,562
that Harvey Milk is Q17141.
759
00:40:34,562 --> 00:40:36,520
That's the most important
piece of information,
760
00:40:36,520 --> 00:40:38,770
is first of all, that
is the identifier.
761
00:40:38,770 --> 00:40:42,490
That is the item
number of all the data
762
00:40:42,490 --> 00:40:46,150
that we will collect
about Harvey Milk.
763
00:40:46,150 --> 00:40:50,020
The second thing you see
right under the title
764
00:40:50,020 --> 00:40:54,730
is this line, this very,
very brief summary, right?
765
00:40:54,730 --> 00:40:59,620
"American politician who became
a martyr in the gay community."
766
00:40:59,620 --> 00:41:02,080
This line is the
description line.
767
00:41:02,080 --> 00:41:04,640
So the name of the item--
768
00:41:04,640 --> 00:41:05,980
this is the label.
769
00:41:05,980 --> 00:41:07,450
We call it label on Wikidata.
770
00:41:07,450 --> 00:41:08,740
That's the label.
771
00:41:08,740 --> 00:41:10,990
And this line is
the description.
772
00:41:10,990 --> 00:41:13,480
Now why is this
description important?
773
00:41:13,480 --> 00:41:16,990
This is the description that
helps us tell this Harvey
774
00:41:16,990 --> 00:41:23,230
Milk from any other Harvey
Milk that may exist, all right?
775
00:41:23,230 --> 00:41:26,530
So again, this would
be useful if I'm
776
00:41:26,530 --> 00:41:30,190
looking up someone with a
slightly more generic name.
777
00:41:30,190 --> 00:41:33,910
That line will help me tell
apart the item about Harvey
778
00:41:33,910 --> 00:41:38,860
Milk the gay activist from
Harvey Milk the film
779
00:41:38,860 --> 00:41:41,750
actor, OK?
780
00:41:41,750 --> 00:41:43,100
And where is it coming from?
781
00:41:43,100 --> 00:41:48,690
Well, Wikidata has
this whole table,
782
00:41:48,690 --> 00:41:52,790
as you can see, with
descriptions and labels
783
00:41:52,790 --> 00:41:54,750
in other languages.
784
00:41:54,750 --> 00:41:59,600
So Wikidata is able to refer
to Harvey Milk in Arabic which,
785
00:41:59,600 --> 00:42:04,010
don't panic, is written
from right to left.
786
00:42:04,010 --> 00:42:07,730
It also knows what to
call him in Bulgarian.
787
00:42:07,730 --> 00:42:11,030
I mean, it's the same name,
but it's in a different script.
788
00:42:11,030 --> 00:42:13,640
In French, in Hebrew,
and that's it?
789
00:42:13,640 --> 00:42:17,960
Does it not know a name
for Harvey Milk in Italian?
790
00:42:17,960 --> 00:42:19,760
Of course it does.
791
00:42:19,760 --> 00:42:22,250
It actually has
labels for this person
792
00:42:22,250 --> 00:42:24,435
in many, many, many languages.
793
00:42:24,435 --> 00:42:30,080
It doesn't have descriptions in
every language, as you can see.
794
00:42:30,080 --> 00:42:30,800
OK?
795
00:42:30,800 --> 00:42:36,240
So why was Wikidata showing me
these languages and not others?
796
00:42:36,240 --> 00:42:39,260
I mean, why this somewhat
arbitrary collection--
797
00:42:39,260 --> 00:42:42,860
English, Arabic, Bulgarian,
German, French, and Hebrew?
798
00:42:42,860 --> 00:42:45,300
Because I told it to.
799
00:42:45,300 --> 00:42:50,390
So if we briefly click
over to my user page--
800
00:42:50,390 --> 00:42:52,730
again, like every wiki,
you have user accounts.
801
00:42:52,730 --> 00:42:53,960
You have user pages.
802
00:42:53,960 --> 00:42:55,380
This is my user page.
803
00:42:55,380 --> 00:42:59,750
And as you can see,
there's this little user
804
00:42:59,750 --> 00:43:03,230
information box here called
a Babel box by Wikipedians,
805
00:43:03,230 --> 00:43:06,610
where I list the
languages that I speak.
806
00:43:06,610 --> 00:43:11,000
And Wikidata uses this box
just to kind of helpfully
807
00:43:11,000 --> 00:43:12,944
show me these languages.
808
00:43:12,944 --> 00:43:14,360
Of course, all the
other languages
809
00:43:14,360 --> 00:43:19,580
are still available, as you saw,
by clicking "more languages."
810
00:43:19,580 --> 00:43:22,940
But this is just a
useful little way
811
00:43:22,940 --> 00:43:27,590
of getting the languages I
care about up there first.
812
00:43:27,590 --> 00:43:29,060
By the way, this is a lie.
813
00:43:29,060 --> 00:43:31,170
I don't actually
speak Bulgarian.
814
00:43:31,170 --> 00:43:33,740
That stayed on my user page
because I was demonstrating
815
00:43:33,740 --> 00:43:37,010
this in Bulgaria and I wanted
that label to show up there
816
00:43:37,010 --> 00:43:38,420
during the talk--
817
00:43:38,420 --> 00:43:40,250
just in case you
were going to tell me
818
00:43:40,250 --> 00:43:43,840
a really good Bulgarian joke.
819
00:43:43,840 --> 00:43:48,470
OK, so for example, Hebrew
is my mother tongue.
820
00:43:48,470 --> 00:43:51,730
And we have a Hebrew
label for Harvey Milk.
821
00:43:51,730 --> 00:43:53,810
But we don't have a description.
822
00:43:53,810 --> 00:44:00,950
So let's fix that right now by
clicking the edit button right
823
00:44:00,950 --> 00:44:01,960
here.
824
00:44:01,960 --> 00:44:05,930
I click edit, and this
table becomes editable.
825
00:44:05,930 --> 00:44:09,661
And now I can very briefly
type a description.
826
00:44:09,661 --> 00:44:22,899
827
00:44:22,899 --> 00:44:24,440
AUDIENCE: Online in
about 20 seconds.
828
00:44:24,440 --> 00:44:25,400
But can we hold it?
829
00:44:25,400 --> 00:44:26,066
ASAF BARTOV: OK.
830
00:44:26,066 --> 00:44:28,454
831
00:44:28,454 --> 00:44:30,430
That was good timing
for the screen to crash.
832
00:44:30,430 --> 00:44:53,642
833
00:44:53,642 --> 00:44:54,142
OK?
834
00:44:54,142 --> 00:44:59,082
835
00:44:59,082 --> 00:45:01,800
Are we back?
836
00:45:01,800 --> 00:45:02,850
OK.
837
00:45:02,850 --> 00:45:03,690
Sorry about that.
838
00:45:03,690 --> 00:45:07,500
So this was all about what to
call him in different languages
839
00:45:07,500 --> 00:45:09,930
and scripts and how to
tell this person apart
840
00:45:09,930 --> 00:45:13,590
from other people with
potentially the same name.
841
00:45:13,590 --> 00:45:17,930
Let's scroll down and see
what else does Wikidata
842
00:45:17,930 --> 00:45:19,680
know about this person?
843
00:45:19,680 --> 00:45:24,060
So as you can see, this is
a list of statements, right?
844
00:45:24,060 --> 00:45:25,500
This is a list of statements.
845
00:45:25,500 --> 00:45:27,900
And the properties
are on the left,
846
00:45:27,900 --> 00:45:30,340
the values are on the right.
847
00:45:30,340 --> 00:45:33,870
So the first thing Wikidata
knows about Harvey Milk
848
00:45:33,870 --> 00:45:38,520
is a very important
property called instance of.
849
00:45:38,520 --> 00:45:39,910
Instance of.
850
00:45:39,910 --> 00:45:44,690
And the property instance of
answers the very basic question
851
00:45:44,690 --> 00:45:49,460
what kind of thing is
this that I'm describing?
852
00:45:49,460 --> 00:45:50,870
Is it a book?
853
00:45:50,870 --> 00:45:51,980
Is it a poem?
854
00:45:51,980 --> 00:45:53,570
Is it a mountain?
855
00:45:53,570 --> 00:45:55,520
Is it a theological concept?
856
00:45:55,520 --> 00:45:57,800
No, it's a human.
857
00:45:57,800 --> 00:46:00,020
It's a person, OK?
858
00:46:00,020 --> 00:46:01,880
The item about Mt.
859
00:46:01,880 --> 00:46:07,070
Everest will say
instance of mountain, OK?
860
00:46:07,070 --> 00:46:10,790
This is a very
important property.
861
00:46:10,790 --> 00:46:12,500
Why is it important?
862
00:46:12,500 --> 00:46:14,630
Wouldn't anyone looking
at this know that this is
863
00:46:14,630 --> 00:46:15,550
a human being?
864
00:46:15,550 --> 00:46:16,310
Yes.
865
00:46:16,310 --> 00:46:18,720
Anyone looking at
this will know.
866
00:46:18,720 --> 00:46:23,780
But if I want a computer to
be able to pull information
867
00:46:23,780 --> 00:46:28,160
about people, I want to
be able to easily exclude
868
00:46:28,160 --> 00:46:30,680
all the mountains and
poems and other things that
869
00:46:30,680 --> 00:46:33,440
are not people from my query.
870
00:46:33,440 --> 00:46:37,400
So this single datum,
this single piece of data,
871
00:46:37,400 --> 00:46:41,720
is what tells computers and
algorithms very clearly,
872
00:46:41,720 --> 00:46:42,890
this is a human.
873
00:46:42,890 --> 00:46:47,340
Things that aren't instance
of human are other things.
874
00:46:47,340 --> 00:46:48,230
OK?
875
00:46:48,230 --> 00:46:50,145
So it may sound very
trivial, but it's not.
876
00:46:50,145 --> 00:46:51,770
It's very important
to have an instance
877
00:46:51,770 --> 00:46:54,077
of field for Wikidata items.
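[Editor's sketch] Why "instance of" matters to a program can be seen in a tiny filter. The records below are a toy stand-in, not real Wikidata output: Q17141 and Q513 are the IDs quoted in the talk, while "X-poem" and its label are hypothetical. The function name `instances_of` is also invented for this example.

```python
# Toy records. Q17141 and Q513 are IDs quoted in the talk;
# "X-poem" is a hypothetical placeholder item.
items = [
    {"id": "Q17141", "label": "Harvey Milk", "instance of": "human"},
    {"id": "Q513", "label": "Mount Everest", "instance of": "mountain"},
    {"id": "X-poem", "label": "An Example Poem", "instance of": "poem"},
]


def instances_of(records, kind):
    """Keep only the records whose 'instance of' value equals `kind`."""
    return [r["label"] for r in records if r.get("instance of") == kind]


humans = instances_of(items, "human")
print(humans)
```

Without that one field, the program would have no reliable way to separate people from mountains and poems.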
878
00:46:54,077 --> 00:46:55,410
All right, what else do we know?
879
00:46:55,410 --> 00:46:59,360
Well, Wikidata knows about
an image for Harvey Milk.
880
00:46:59,360 --> 00:47:02,982
Again, we can find a ton of
images-- or maybe not a ton,
881
00:47:02,982 --> 00:47:04,940
but we can find dozens
of images of Harvey Milk
882
00:47:04,940 --> 00:47:10,430
on Commons, on our Wikimedia
multimedia repository.
883
00:47:10,430 --> 00:47:13,430
So why should we have a
single image here on Wikidata?
884
00:47:13,430 --> 00:47:16,280
Again, this is
mostly for reusers.
885
00:47:16,280 --> 00:47:18,920
If I'm building some kind of
tool that pulls information
886
00:47:18,920 --> 00:47:21,680
from Wikidata, it's
nice if there's
887
00:47:21,680 --> 00:47:24,680
at least one representative
image to kind of use
888
00:47:24,680 --> 00:47:30,300
as the default or immediate
image for Harvey Milk
889
00:47:30,300 --> 00:47:33,120
in some other reused context.
890
00:47:33,120 --> 00:47:34,770
All right, sex or gender--
891
00:47:34,770 --> 00:47:35,670
male.
892
00:47:35,670 --> 00:47:38,790
Country of citizenship--
United States of America.
893
00:47:38,790 --> 00:47:39,910
Given name is Harvey.
894
00:47:39,910 --> 00:47:41,580
The date of birth is so and so.
895
00:47:41,580 --> 00:47:44,340
The place of birth is Woodmere.
896
00:47:44,340 --> 00:47:45,870
The place of death
is San Francisco.
897
00:47:45,870 --> 00:47:48,640
The manner of death is homicide.
898
00:47:48,640 --> 00:47:50,930
Wikidata knows that.
899
00:47:50,930 --> 00:47:55,700
Now again, every
little datum like that
900
00:47:55,700 --> 00:48:02,210
is the basis for later querying
and answering questions.
901
00:48:02,210 --> 00:48:07,390
So the fact that we record the
manner of death of people--
902
00:48:07,390 --> 00:48:09,230
or at least of some people--
903
00:48:09,230 --> 00:48:11,900
will allow us later
to go, you know,
904
00:48:11,900 --> 00:48:17,120
who are some people from
Belgium who died by homicide?
905
00:48:17,120 --> 00:48:24,650
That's a question Wikidata can
answer, thanks to this field.
906
00:48:24,650 --> 00:48:27,680
The other thing I mentioned
is that things are links.
907
00:48:27,680 --> 00:48:29,680
So the place of
birth is Woodmere.
908
00:48:29,680 --> 00:48:31,900
I don't know where
Woodmere is, but I
909
00:48:31,900 --> 00:48:34,390
can click that and find out.
910
00:48:34,390 --> 00:48:38,270
Here is the Wikidata item
about Woodmere, right?
911
00:48:38,270 --> 00:48:41,230
It was the value in the
statement about Harvey Milk,
912
00:48:41,230 --> 00:48:43,900
but now I'm looking at
the item about Woodmere.
913
00:48:43,900 --> 00:48:48,047
And it turns out it's in
Nassau County, New York, right?
914
00:48:48,047 --> 00:48:50,380
And of course, Wikidata has
a whole bunch of information
915
00:48:50,380 --> 00:48:55,450
for me about Woodmere--
916
00:48:55,450 --> 00:48:59,720
what country it's in and the
coordinates and the population
917
00:48:59,720 --> 00:49:06,230
and the area, all the things you
would expect about a place, OK?
918
00:49:06,230 --> 00:49:07,512
Let's get back to Harvey Milk.
919
00:49:07,512 --> 00:49:10,370
920
00:49:10,370 --> 00:49:13,260
So the manner of death,
the cause of death--
921
00:49:13,260 --> 00:49:16,880
now here, Wikidata gives
us excellent information.
922
00:49:16,880 --> 00:49:20,390
The actual cause of death
is ballistic trauma.
923
00:49:20,390 --> 00:49:22,160
That's a professional term.
924
00:49:22,160 --> 00:49:27,560
And this statement
has qualifiers.
925
00:49:27,560 --> 00:49:30,650
So until now, I was talking
about triples, right?
926
00:49:30,650 --> 00:49:33,260
The item has a property
with a certain value.
927
00:49:33,260 --> 00:49:35,270
Actually, each
statement can also
928
00:49:35,270 --> 00:49:38,030
have a number of
qualifiers which
929
00:49:38,030 --> 00:49:45,424
add aspects of information,
still about that one question
930
00:49:45,424 --> 00:49:46,590
that we're answering, right?
931
00:49:46,590 --> 00:49:49,904
So if this property
answers cause of death,
932
00:49:49,904 --> 00:49:51,320
it's not discussing
anything else.
933
00:49:51,320 --> 00:49:52,880
It's not discussing languages.
934
00:49:52,880 --> 00:49:54,920
It's not discussing
date of birth, right?
935
00:49:54,920 --> 00:49:56,930
It's talking about
the cause of death.
936
00:49:56,930 --> 00:49:59,300
But we're not just
saying ballistic trauma.
937
00:49:59,300 --> 00:50:04,550
We're saying ballistic trauma
with the quantity attribute
938
00:50:04,550 --> 00:50:05,660
being five.
939
00:50:05,660 --> 00:50:07,550
What does that mean?
940
00:50:07,550 --> 00:50:08,870
Five bullets, right?
941
00:50:08,870 --> 00:50:12,780
There are five
ballistic traumas.
942
00:50:12,780 --> 00:50:15,300
He was shot five times.
943
00:50:15,300 --> 00:50:18,210
And he was shot by this
person named Dan White.
944
00:50:18,210 --> 00:50:25,020
And this ballistic trauma,
like this actual shooting,
945
00:50:25,020 --> 00:50:28,420
is itself the subject
of this other thing.
946
00:50:28,420 --> 00:50:31,440
This is a link to a
whole other Wikidata
947
00:50:31,440 --> 00:50:35,510
item about the Moscone-Milk
assassinations.
948
00:50:35,510 --> 00:50:38,610
Moscone was the San
Francisco mayor at the time.
949
00:50:38,610 --> 00:50:43,540
950
00:50:43,540 --> 00:50:47,510
We'll see slightly better or
easier to understand examples
951
00:50:47,510 --> 00:50:49,460
of qualifiers in a bit.
952
00:50:49,460 --> 00:50:54,440
So if this was
confusing, hang on.
953
00:50:54,440 --> 00:50:55,970
So he was killed by Dan White.
954
00:50:55,970 --> 00:50:57,800
He spoke English.
955
00:50:57,800 --> 00:50:59,960
His occupation--
here's an example
956
00:50:59,960 --> 00:51:03,140
of a property with more
than one value, right?
957
00:51:03,140 --> 00:51:06,260
So Milk was a politician.
958
00:51:06,260 --> 00:51:09,710
But he was also a Navy
officer, at least for a while.
959
00:51:09,710 --> 00:51:12,980
That was another thing that
he did during his life.
960
00:51:12,980 --> 00:51:15,350
And he was a human
rights activist, right?
961
00:51:15,350 --> 00:51:20,600
So some people are
writers and translators.
962
00:51:20,600 --> 00:51:22,610
So people can have more
than one occupation.
963
00:51:22,610 --> 00:51:26,310
People can speak more
than one language.
964
00:51:26,310 --> 00:51:29,130
Here's a better
example of a qualifier.
965
00:51:29,130 --> 00:51:35,090
So the property award received
has the value Presidential
966
00:51:35,090 --> 00:51:37,560
Medal of Freedom.
967
00:51:37,560 --> 00:51:42,570
And that award has an
attribute called point in time,
968
00:51:42,570 --> 00:51:44,070
like when was this?
969
00:51:44,070 --> 00:51:46,580
This was in 2009.
970
00:51:46,580 --> 00:51:50,510
Do you see that
this piece of data--
971
00:51:50,510 --> 00:52:04,780
2009-- is a sub-statement
or is subordinated
972
00:52:04,780 --> 00:52:09,621
to the context of this award,
the Presidential Medal
973
00:52:09,621 --> 00:52:10,120
of Freedom?
974
00:52:10,120 --> 00:52:13,430
It can't just kind of
free float in the article.
975
00:52:13,430 --> 00:52:17,650
It's not that 2009 is itself
a meaningful thing, right?
976
00:52:17,650 --> 00:52:21,550
This medal was awarded in 2009.
977
00:52:21,550 --> 00:52:22,170
If
978
00:52:22,170 --> 00:52:24,070
Wikidata doesn't
tell us, for example,
979
00:52:24,070 --> 00:52:27,130
when he was a Navy officer, OK?
980
00:52:27,130 --> 00:52:30,100
But if we were, for example,
to look that up right now
981
00:52:30,100 --> 00:52:33,820
and find out that Milk was
a Navy officer between 1962
982
00:52:33,820 --> 00:52:39,542
and 1964, we could go back
here to the Navy officer bit
983
00:52:39,542 --> 00:52:41,010
and click edit.
984
00:52:41,010 --> 00:52:44,190
This is how I edit this
particular little piece
985
00:52:44,190 --> 00:52:45,360
of information.
986
00:52:45,360 --> 00:52:49,350
And add a qualifier like this.
987
00:52:49,350 --> 00:52:51,300
I click Add Qualifier.
988
00:52:51,300 --> 00:52:57,660
And I could pick start
time and end time, right?
989
00:52:57,660 --> 00:53:04,990
And then I could
type 1962 to 1964,
990
00:53:04,990 --> 00:53:08,000
and that would be
teaching Wikidata.
991
00:53:08,000 --> 00:53:10,660
Oh, I'm sorry, I meant to
do that for Navy officer.
992
00:53:10,660 --> 00:53:11,230
OK.
993
00:53:11,230 --> 00:53:14,800
But, you know,
that is the exact--
994
00:53:14,800 --> 00:53:18,400
the accurate time span
of that statement.
995
00:53:18,400 --> 00:53:22,850
So it's true to say about a
person, he was a Navy officer,
996
00:53:22,850 --> 00:53:25,990
even if of course he wasn't a
Navy officer his entire life.
997
00:53:25,990 --> 00:53:28,120
But it's better and
it's more accurate
998
00:53:28,120 --> 00:53:32,260
to say he was a Navy officer
between 1962 and 1964.
999
00:53:32,260 --> 00:53:35,380
Don't worry, I'm
not saving this.
1000
00:53:35,380 --> 00:53:39,150
No vandalizing of
Wikidata in this session.
1001
00:53:39,150 --> 00:53:40,450
OK.
1002
00:53:40,450 --> 00:53:41,140
Moving on.
1003
00:53:41,140 --> 00:53:42,430
What else does Wikidata know?
1004
00:53:42,430 --> 00:53:43,960
He was educated at
this university.
1005
00:53:43,960 --> 00:53:46,970
He was a member of
this political party.
1006
00:53:46,970 --> 00:53:47,470
Right?
1007
00:53:47,470 --> 00:53:49,428
That's, of course,
a relevant property
1008
00:53:49,428 --> 00:53:52,270
for a politician.
1009
00:53:52,270 --> 00:53:56,500
Religion, military branch,
what is the category on Commons
1010
00:53:56,500 --> 00:53:58,720
that discusses this
item, is something
1011
00:53:58,720 --> 00:54:00,790
that Wikidata can tell us.
1012
00:54:00,790 --> 00:54:02,200
And that's it.
1013
00:54:02,200 --> 00:54:04,570
Now, is that everything
that we could possibly
1014
00:54:04,570 --> 00:54:07,780
say in a structured
way about Harvey Milk?
1015
00:54:07,780 --> 00:54:08,680
No.
1016
00:54:08,680 --> 00:54:13,570
We could probably find at
least a few more things to say.
1017
00:54:13,570 --> 00:54:17,170
We will see how to contribute
new information to Wikidata
1018
00:54:17,170 --> 00:54:19,990
in just a minute with
a different example.
1019
00:54:19,990 --> 00:54:23,360
But this-- all this was
a set of statements.
1020
00:54:23,360 --> 00:54:23,860
Right?
1021
00:54:23,860 --> 00:54:25,927
This was under the title
Statements here.
1022
00:54:25,927 --> 00:54:28,840
1023
00:54:28,840 --> 00:54:31,160
But at the bottom of the
list of statements is
1024
00:54:31,160 --> 00:54:34,300
another section
called identifiers.
1025
00:54:34,300 --> 00:54:36,960
And I want to spend a minute
talking about what that is.
1026
00:54:36,960 --> 00:54:43,630
So identifiers is a
collection of keys.
1027
00:54:43,630 --> 00:54:47,980
A collection of
IDs, or codes, that
1028
00:54:47,980 --> 00:54:52,890
are keys to other
information sources.
1029
00:54:52,890 --> 00:54:58,560
And a lot of Wikidata items
have a whole series of keys
1030
00:54:58,560 --> 00:55:03,030
to other databases, other
sites, other repositories,
1031
00:55:03,030 --> 00:55:08,340
that help you or a computer
be able to access not just
1032
00:55:08,340 --> 00:55:12,240
some database and look for
information about Harvey Milk,
1033
00:55:12,240 --> 00:55:16,950
but access the exact record
relevant to Harvey Milk.
1034
00:55:16,950 --> 00:55:20,280
And again, if you imagine
someone named John Smith,
1035
00:55:20,280 --> 00:55:21,690
that is really valuable, right?
1036
00:55:21,690 --> 00:55:23,250
If you're not just
told, oh yeah,
1037
00:55:23,250 --> 00:55:24,875
you can look at the
Library of Congress
1038
00:55:24,875 --> 00:55:27,840
for John Smith,
good luck with that.
1039
00:55:27,840 --> 00:55:30,240
Or if I tell you, go to
the Library of Congress
1040
00:55:30,240 --> 00:55:35,810
to this record for this John
Smith, you see the difference.
1041
00:55:35,810 --> 00:55:42,080
So Wikidata tells us that on
VIAF, which is the Virtual
1042
00:55:42,080 --> 00:55:44,570
International Authority File.
1043
00:55:44,570 --> 00:55:50,140
It's an aggregated master
index built by bibliographers,
1044
00:55:50,140 --> 00:55:52,831
by librarians, of people.
1045
00:55:52,831 --> 00:55:53,330
Right?
1046
00:55:53,330 --> 00:55:56,720
It tries to kind of aggregate
information about people
1047
00:55:56,720 --> 00:55:59,270
across library
catalogs everywhere.
1048
00:55:59,270 --> 00:56:05,120
So the VIAF ID for Harvey
Milk is this number.
1049
00:56:05,120 --> 00:56:07,340
And conveniently,
if I click that,
1050
00:56:07,340 --> 00:56:10,160
I'm not taken to
some Wikidata item.
1051
00:56:10,160 --> 00:56:13,010
I'm actually taken
to the relevant site.
1052
00:56:13,010 --> 00:56:16,760
So this took me right
to viaf.org, the Virtual
1053
00:56:16,760 --> 00:56:21,770
International Authority File,
directly to their record
1054
00:56:21,770 --> 00:56:23,310
about Harvey Milk.
1055
00:56:23,310 --> 00:56:23,810
All right?
1056
00:56:23,810 --> 00:56:27,290
And that itself leads
me to national catalogs
1057
00:56:27,290 --> 00:56:29,630
of national libraries
all over the world.
1058
00:56:29,630 --> 00:56:32,360
We won't get into the
things you can do with VIAF.
1059
00:56:32,360 --> 00:56:37,220
The point is Wikidata
contained the piece of thread
1060
00:56:37,220 --> 00:56:40,820
that I could tug on
to arrive directly
1061
00:56:40,820 --> 00:56:44,840
at that information
in other databases.
1062
00:56:44,840 --> 00:56:45,680
Yes.
1063
00:56:45,680 --> 00:56:49,670
And it has that for many,
many kinds of databases.
1064
00:56:49,670 --> 00:56:53,150
The BNF, for example, that's
the National Library of France.
1065
00:56:53,150 --> 00:56:56,270
And that will take me
to that index card.
1066
00:56:56,270 --> 00:56:57,320
IMDB.
1067
00:56:57,320 --> 00:56:58,620
We all know IMDB, right?
1068
00:56:58,620 --> 00:57:03,320
So here I have the key
to Harvey Milk in IMDB.
1069
00:57:03,320 --> 00:57:05,810
And this is what IMDB says
about Harvey Milk, right?
1070
00:57:05,810 --> 00:57:08,480
They have their own piece
of information about him,
1071
00:57:08,480 --> 00:57:11,590
of course, with filmography
and everything else.
1072
00:57:11,590 --> 00:57:15,140
And see, I did not have
to search IMDB for it.
1073
00:57:15,140 --> 00:57:19,070
I just had the key right
there waiting for me.
1074
00:57:19,070 --> 00:57:21,080
Now, again, this is
very convenient for me
1075
00:57:21,080 --> 00:57:24,590
as I just showed you the
human use case for this.
1076
00:57:24,590 --> 00:57:27,530
But it's even more
powerful in aggregate
1077
00:57:27,530 --> 00:57:35,450
when we allow computers to
traverse this network of links
1078
00:57:35,450 --> 00:57:36,110
between--
1079
00:57:36,110 --> 00:57:41,690
not just within Wikidata, but
between data storage facilities
1080
00:57:41,690 --> 00:57:43,850
and repositories.
1081
00:57:43,850 --> 00:57:49,790
This is sometimes referred to
as the linked open data cloud.
1082
00:57:49,790 --> 00:57:52,670
Cloud, because it's multiple
different repositories
1083
00:57:52,670 --> 00:57:54,740
that are interlinked.
1084
00:57:54,740 --> 00:58:02,210
And Wikidata is already, and
to a growing extent, the nexus,
1085
00:58:02,210 --> 00:58:04,460
the connection
point between a lot
1086
00:58:04,460 --> 00:58:06,780
of these different databases.
1087
00:58:06,780 --> 00:58:09,230
So IMDB, for example,
it's a good example
1088
00:58:09,230 --> 00:58:11,300
because it's a site
almost everyone knows,
1089
00:58:11,300 --> 00:58:14,000
IMDB has information
about Harvey Milk.
1090
00:58:14,000 --> 00:58:16,670
But that information
does not include a link
1091
00:58:16,670 --> 00:58:19,140
to the French National Library.
1092
00:58:19,140 --> 00:58:19,645
Right?
1093
00:58:19,645 --> 00:58:20,770
Do you see what I'm saying?
1094
00:58:20,770 --> 00:58:25,550
So IMDB is a data repository
with IDs and allows linking.
1095
00:58:25,550 --> 00:58:28,100
But it does not give you
what Wikidata gives you, which
1096
00:58:28,100 --> 00:58:32,850
is this kind of collection of--
1097
00:58:32,850 --> 00:58:36,330
it's like a junction of all
these different data sources.
1098
00:58:36,330 --> 00:58:37,910
So Wikidata is the
place where you
1099
00:58:37,910 --> 00:58:40,730
can document these
interrelationships
1100
00:58:40,730 --> 00:58:41,640
or equivalencies.
1101
00:58:41,640 --> 00:58:42,140
Right?
1102
00:58:42,140 --> 00:58:48,770
So ID, you know, 587548 on IMDB
is discussing the same topic
1103
00:58:48,770 --> 00:58:52,260
as French National
Library ID whatever.
1104
00:58:52,260 --> 00:58:55,210
Wikidata contains that
piece of information,
1105
00:58:55,210 --> 00:58:59,090
that this ID in this database
is about the same person
1106
00:58:59,090 --> 00:59:04,050
as that ID in that database.
1107
00:59:04,050 --> 00:59:05,290
OK.
1108
00:59:05,290 --> 00:59:07,420
So that's what
identifiers are about.
1109
00:59:07,420 --> 00:59:11,320
Still scrolling down the
Wikidata item about Harvey
1110
00:59:11,320 --> 00:59:15,500
Milk, we have the site links.
1111
00:59:15,500 --> 00:59:20,840
The site links are links
to Wikimedia projects
1112
00:59:20,840 --> 00:59:22,770
that are related to this item.
1113
00:59:22,770 --> 00:59:25,250
So of course there
are Wikipedia articles
1114
00:59:25,250 --> 00:59:28,880
about Harvey Milk in many,
many different Wikipedias.
1115
00:59:28,880 --> 00:59:31,700
Quite a few language versions.
1116
00:59:31,700 --> 00:59:34,960
And there are
pages on Wikiquote,
1117
00:59:34,960 --> 00:59:36,680
one of the sister projects.
1118
00:59:36,680 --> 00:59:38,630
There are pages on
Wikiquote with some quotes
1119
00:59:38,630 --> 00:59:40,130
from Harvey Milk.
1120
00:59:40,130 --> 00:59:45,060
And there is even a page for
Harvey Milk on Wikisource.
1121
00:59:45,060 --> 00:59:45,560
Right?
1122
00:59:45,560 --> 00:59:47,840
So this is a collection
of those links.
1123
00:59:47,840 --> 00:59:52,760
And those of you who have maybe
only dealt with Wikidata
1124
00:59:52,760 --> 00:59:57,290
for inter-wiki links, which
we used to do in the old days
1125
00:59:57,290 --> 00:59:59,600
manually within
the article text,
1126
00:59:59,600 --> 01:00:01,716
now we do it through
Wikidata, so maybe that's
1127
01:00:01,716 --> 01:00:03,590
the only thing you did
know about Wikidata
1128
01:00:03,590 --> 01:00:10,130
is how to update these
inter-wiki tables on Wikidata.
1129
01:00:10,130 --> 01:00:11,430
All right.
1130
01:00:11,430 --> 01:00:14,090
So that concludes
our little tour
1131
01:00:14,090 --> 01:00:18,560
of the anatomy of
a Wikidata page.
1132
01:00:18,560 --> 01:00:22,370
I will just remind you that
it's a wiki page, which
1133
01:00:22,370 --> 01:00:26,120
means it has a discussion
page, a talk page.
1134
01:00:26,120 --> 01:00:27,960
This one happens to be empty.
1135
01:00:27,960 --> 01:00:30,092
But, you know, if we have
concerns or arguments
1136
01:00:30,092 --> 01:00:31,550
about some of the
data here, that is
1137
01:00:31,550 --> 01:00:33,290
what we would use
to discuss this
1138
01:00:33,290 --> 01:00:36,830
and to arrive at consensus.
1139
01:00:36,830 --> 01:00:41,760
It also has a history view just
like every Wikipedia article.
1140
01:00:41,760 --> 01:00:47,402
So you can see here
a list of edits.
1141
01:00:47,402 --> 01:00:48,860
Maybe some of you
have never looked
1142
01:00:48,860 --> 01:00:51,710
at a history page on Wikipedia,
so this looks overwhelming.
1143
01:00:51,710 --> 01:00:55,040
But every line here,
every entry here,
1144
01:00:55,040 --> 01:00:58,240
is a single edit, a single
revision, a single change
1145
01:00:58,240 --> 01:01:00,440
to this Wikidata item.
1146
01:01:00,440 --> 01:01:01,670
Just Harvey Milk.
1147
01:01:01,670 --> 01:01:04,250
And you can see at the very
top this edit that I just
1148
01:01:04,250 --> 01:01:06,680
made-- this is my
volunteer account
1149
01:01:06,680 --> 01:01:09,650
and I just made this edit,
and in parentheses you
1150
01:01:09,650 --> 01:01:10,790
can see what I did.
1151
01:01:10,790 --> 01:01:14,640
I added an HE,
Hebrew, description.
1152
01:01:14,640 --> 01:01:16,930
And this is the text
that I added in Hebrew.
1153
01:01:16,930 --> 01:01:17,430
Right?
1154
01:01:17,430 --> 01:01:21,470
So we can see who added
what to the Wikidata item,
1155
01:01:21,470 --> 01:01:24,960
just like we can do
the same on Wikipedia.
1156
01:01:24,960 --> 01:01:26,390
So we have the revision history.
1157
01:01:26,390 --> 01:01:27,560
We can undo edits.
1158
01:01:27,560 --> 01:01:30,320
We can revert, just
like on Wikipedia.
1159
01:01:30,320 --> 01:01:34,420
1160
01:01:34,420 --> 01:01:36,940
And what else did I
want to show here?
1161
01:01:36,940 --> 01:01:40,930
We can add an item to our
watchlist using the star,
1162
01:01:40,930 --> 01:01:42,020
just like on Wikipedia.
1163
01:01:42,020 --> 01:01:46,670
So we have all these
standard wiki features
1164
01:01:46,670 --> 01:01:47,878
that we would come to expect.
1165
01:01:47,878 --> 01:01:50,440
1166
01:01:50,440 --> 01:01:54,270
Let's pause for questions.
1167
01:01:54,270 --> 01:01:58,412
Any questions about what
we've covered so far?
1168
01:01:58,412 --> 01:02:02,573
1169
01:02:02,573 --> 01:02:03,073
Yes.
1170
01:02:03,073 --> 01:02:06,950
1171
01:02:06,950 --> 01:02:11,345
Are attributes of statements
preset for the specific value?
1172
01:02:11,345 --> 01:02:16,640
1173
01:02:16,640 --> 01:02:19,830
No, they're not preset.
1174
01:02:19,830 --> 01:02:29,760
And generally, Wikidata does
not enforce logic by default.
1175
01:02:29,760 --> 01:02:32,130
So, I mean, there's
nothing to prevent you
1176
01:02:32,130 --> 01:02:38,700
from editing the
item about Brazil,
1177
01:02:38,700 --> 01:02:42,990
and adding the property height.
1178
01:02:42,990 --> 01:02:46,690
1179
01:02:46,690 --> 01:02:50,430
Now height is not a relevant
property for a country.
1180
01:02:50,430 --> 01:02:50,970
Right?
1181
01:02:50,970 --> 01:02:53,880
I mean, maybe average
elevation, maybe.
1182
01:02:53,880 --> 01:02:56,400
But not just height,
which is used for humans
1183
01:02:56,400 --> 01:02:59,040
or for physical things.
1184
01:02:59,040 --> 01:03:02,400
So you could add that
property to Brazil and save it
1185
01:03:02,400 --> 01:03:04,650
and the wiki would not complain.
1186
01:03:04,650 --> 01:03:07,590
Now in the background
there are kind
1187
01:03:07,590 --> 01:03:13,020
of extra wiki outside the
wiki prostheses for constraint
1188
01:03:13,020 --> 01:03:13,710
validation.
1189
01:03:13,710 --> 01:03:16,050
So there are bots and
other processes that
1190
01:03:16,050 --> 01:03:17,940
run, and occasionally,
for example,
1191
01:03:17,940 --> 01:03:26,570
identify non-living things
with a date of birth field.
1192
01:03:26,570 --> 01:03:27,720
That's nonsensical.
1193
01:03:27,720 --> 01:03:29,010
That should not exist.
1194
01:03:29,010 --> 01:03:31,710
If someone mistakenly added
that, there are processes
1195
01:03:31,710 --> 01:03:34,350
that would flag
that to be fixed.
1196
01:03:34,350 --> 01:03:36,690
But the wiki itself,
Wikidata, will not
1197
01:03:36,690 --> 01:03:38,550
prevent you from adding that.
1198
01:03:38,550 --> 01:03:41,940
And that is by design
to keep things flexible.
1199
01:03:41,940 --> 01:03:43,930
So that people don't
run into, oh wait,
1200
01:03:43,930 --> 01:03:46,560
but I can't add this
because nobody thought
1201
01:03:46,560 --> 01:03:49,830
that I would need this, maybe.
1202
01:03:49,830 --> 01:03:54,530
I hope that answers
your question.
1203
01:03:54,530 --> 01:03:57,290
You say helpful
answer, question mark.
1204
01:03:57,290 --> 01:03:59,510
So was it a helpful answer, or?
1205
01:03:59,510 --> 01:04:03,940
1206
01:04:03,940 --> 01:04:04,440
OK.
1207
01:04:04,440 --> 01:04:05,426
Yes, Eleanor.
1208
01:04:05,426 --> 01:04:10,707
AUDIENCE: [INAUDIBLE]
1209
01:04:10,707 --> 01:04:12,040
ASAF BARTOV: Excellent question.
1210
01:04:12,040 --> 01:04:13,030
I'll repeat it.
1211
01:04:13,030 --> 01:04:16,180
You ask how do I find
the Wikidata item
1212
01:04:16,180 --> 01:04:18,370
number from Wikipedia.
1213
01:04:18,370 --> 01:04:21,580
If I'm reading about Harvey Milk
and I want to look at the data
1214
01:04:21,580 --> 01:04:23,600
how do I do that?
1215
01:04:23,600 --> 01:04:27,400
That is an excellent question
and let's skip to Wikipedia.
1216
01:04:27,400 --> 01:04:32,030
Conveniently I have the
link right here on English.
1217
01:04:32,030 --> 01:04:35,600
So this is the Wikipedia
article about Harvey Milk
1218
01:04:35,600 --> 01:04:42,740
and every item on Wikipedia
should have a Wikidata
1219
01:04:42,740 --> 01:04:47,660
item associated with it, but it
doesn't happen automatically.
1220
01:04:47,660 --> 01:04:51,470
So if I just created
a page on Wikipedia
1221
01:04:51,470 --> 01:04:55,010
I also need to create a
Wikidata entity for it
1222
01:04:55,010 --> 01:04:57,170
if it doesn't already exist.
1223
01:04:57,170 --> 01:04:59,420
It could already exist
because it was already
1224
01:04:59,420 --> 01:05:01,970
covered in a different
language, for example.
1225
01:05:01,970 --> 01:05:05,390
So that was parenthetical.
1226
01:05:05,390 --> 01:05:09,020
But every article on Wikipedia
should have, here on the side,
1227
01:05:09,020 --> 01:05:14,270
on the side, under Tools,
a link called Wikidata item.
1228
01:05:14,270 --> 01:05:15,450
Right here.
1229
01:05:15,450 --> 01:05:16,160
OK.
1230
01:05:16,160 --> 01:05:18,110
That Wikidata
item is a link
1231
01:05:18,110 --> 01:05:21,710
that takes you to
Wikidata, to the entity,
1232
01:05:21,710 --> 01:05:23,510
and there you find the number.
1233
01:05:23,510 --> 01:05:25,370
You can-- you don't
even have to click it.
1234
01:05:25,370 --> 01:05:27,830
I mean, the URL itself
tells you the number.
1235
01:05:27,830 --> 01:05:34,620
The number, you see, it's
wikidata.org/wiki/q17141.
1236
01:05:34,620 --> 01:05:35,444
OK.
1237
01:05:35,444 --> 01:05:36,860
So that was an
excellent question.
1238
01:05:36,860 --> 01:05:37,686
Other questions?
1239
01:05:37,686 --> 01:05:38,185
Yes.
1240
01:05:38,185 --> 01:05:41,470
1241
01:05:41,470 --> 01:05:44,430
Yeah, about the additional
attributes, the qualifiers.
1242
01:05:44,430 --> 01:05:46,920
So, yes, I answered
more generically.
1243
01:05:46,920 --> 01:05:49,370
But just like the
properties themselves
1244
01:05:49,370 --> 01:05:53,390
are not limited per item,
the qualifiers per statement
1245
01:05:53,390 --> 01:05:57,750
are also not
entirely preordained.
1246
01:05:57,750 --> 01:05:59,570
But there is some
structure to it.
1247
01:05:59,570 --> 01:06:03,140
I don't want to go into it
at great length right now.
1248
01:06:03,140 --> 01:06:06,320
If we have time in the end
we can get back to that.
1249
01:06:06,320 --> 01:06:09,590
But some qualifiers are again
relevant for some things,
1250
01:06:09,590 --> 01:06:13,180
start time, end time,
and others won't be.
1251
01:06:13,180 --> 01:06:16,280
Wikidata does try to offer you--
1252
01:06:16,280 --> 01:06:18,710
you may remember when I
clicked add qualifier,
1253
01:06:18,710 --> 01:06:22,170
it gave me a kind of drop-down
of some relevant qualifiers.
1254
01:06:22,170 --> 01:06:24,475
So it does try to
help you in that way.
1255
01:06:24,475 --> 01:06:27,280
1256
01:06:27,280 --> 01:06:28,160
Other question?
1257
01:06:28,160 --> 01:06:31,180
Are the values for
instance of already
1258
01:06:31,180 --> 01:06:33,310
mappable to external ontologies?
1259
01:06:33,310 --> 01:06:36,500
1260
01:06:36,500 --> 01:06:41,310
That is a complicated question.
1261
01:06:41,310 --> 01:06:43,490
I'll help people understand
the question first.
1262
01:06:43,490 --> 01:06:48,570
So an ontology is a
structure, some kind
1263
01:06:48,570 --> 01:06:52,350
of hierarchy or
cloud, of entities
1264
01:06:52,350 --> 01:06:54,510
and their interrelationships.
1265
01:06:54,510 --> 01:06:56,920
An ontology would
say, for example,
1266
01:06:56,920 --> 01:06:58,710
a person is a living thing.
1267
01:06:58,710 --> 01:06:59,670
So is a dog.
1268
01:06:59,670 --> 01:07:02,340
They're both living things,
but they're different things.
1269
01:07:02,340 --> 01:07:09,910
And then, you know, say
things about those entities
1270
01:07:09,910 --> 01:07:11,350
and their interrelationships.
1271
01:07:11,350 --> 01:07:13,300
Now there are many,
many competing,
1272
01:07:13,300 --> 01:07:17,230
or coexisting models
of ontologies.
1273
01:07:17,230 --> 01:07:19,840
Many of them were created
for specific needs.
1274
01:07:19,840 --> 01:07:25,170
Many of them want to be
a universal ontology.
1275
01:07:25,170 --> 01:07:27,790
But of course it's
impossible to quite
1276
01:07:27,790 --> 01:07:32,150
agree on one complete
and simple ontology.
1277
01:07:32,150 --> 01:07:34,240
And so there are
many ontologies.
1278
01:07:34,240 --> 01:07:38,520
Which brings up your question,
can we map across ontologies?
1279
01:07:38,520 --> 01:07:43,840
Can we say that when Wikidata
says instance of book that
1280
01:07:43,840 --> 01:07:47,260
is equivalent to some other
ontology saying instance
1281
01:07:47,260 --> 01:07:49,940
of bibliographic record?
1282
01:07:49,940 --> 01:07:50,860
And the answer is yes.
1283
01:07:50,860 --> 01:07:52,360
There are some such mappings.
1284
01:07:52,360 --> 01:07:54,420
They are incomplete.
1285
01:07:54,420 --> 01:07:58,240
And there's no kind of
automagic thing happening
1286
01:07:58,240 --> 01:08:01,180
in the wiki vis-a-vis
those other ontologies.
1287
01:08:01,180 --> 01:08:03,250
That's kind of
left as an exercise
1288
01:08:03,250 --> 01:08:06,280
for those dealing with those
other ontologies, and for tool
1289
01:08:06,280 --> 01:08:09,880
builders and other
platform improvements
1290
01:08:09,880 --> 01:08:13,050
beyond Wikidata itself.
1291
01:08:13,050 --> 01:08:13,750
OK.
1292
01:08:13,750 --> 01:08:15,190
Other questions?
1293
01:08:15,190 --> 01:08:17,430
Yeah, we have one from
the YouTube stream.
1294
01:08:17,430 --> 01:08:21,160
Someone asked, why can't I
link Howard Carter's occupation
1295
01:08:21,160 --> 01:08:26,439
to archeologists when I use
an info box that fetches info
1296
01:08:26,439 --> 01:08:28,960
from Wikidata?
1297
01:08:28,960 --> 01:08:33,160
Why can't I link it
from the infobox?
1298
01:08:33,160 --> 01:08:35,500
So, someone on the
stream answered
1299
01:08:35,500 --> 01:08:37,659
saying, because it's
an improper connection,
1300
01:08:37,659 --> 01:08:39,700
because the target is not
about the subject only.
1301
01:08:39,700 --> 01:08:43,020
1302
01:08:43,020 --> 01:08:46,710
The target is not
about the subject?
1303
01:08:46,710 --> 01:08:48,479
If I understand the
question correctly,
1304
01:08:48,479 --> 01:08:53,130
what you would want to be able
to do is from within Wikipedia
1305
01:08:53,130 --> 01:08:59,130
be able to say occupation
and link to a Wikidata entry
1306
01:08:59,130 --> 01:09:01,050
about archeology.
1307
01:09:01,050 --> 01:09:03,569
That doesn't quite
work that way.
1308
01:09:03,569 --> 01:09:05,430
We will get to a
little discussion
1309
01:09:05,430 --> 01:09:08,460
of that in an upcoming
section of this talk.
1310
01:09:08,460 --> 01:09:13,260
So I will defer the rest
of my answer to then.
1311
01:09:13,260 --> 01:09:15,319
OK.
1312
01:09:15,319 --> 01:09:19,160
So we're done with
questions for this phase,
1313
01:09:19,160 --> 01:09:22,850
and my browser got
tired of waiting for me.
1314
01:09:22,850 --> 01:09:26,551
So, yes.
1315
01:09:26,551 --> 01:09:27,050
All right.
1316
01:09:27,050 --> 01:09:36,850
So we took a look at Wikidata,
and we took questions.
1317
01:09:36,850 --> 01:09:41,020
So now, let's teach
Wikidata some new things.
1318
01:09:41,020 --> 01:09:44,020
Some things it
doesn't already know.
1319
01:09:44,020 --> 01:09:47,109
Let's look at this item here.
1320
01:09:47,109 --> 01:09:50,950
So this item is about one
of my favorite writers,
1321
01:09:50,950 --> 01:09:53,840
an American writer
named Helen DeWitt.
1322
01:09:53,840 --> 01:10:01,570
Wikidata, of course, fondly
refers to her as q54674,
1323
01:10:01,570 --> 01:10:03,070
but we can call
her Helen DeWitt.
1324
01:10:03,070 --> 01:10:05,740
And what can we contribute here?
1325
01:10:05,740 --> 01:10:10,600
So Wikidata has far less
information about Helen DeWitt.
1326
01:10:10,600 --> 01:10:13,144
Most of you probably haven't
heard of her, that's OK.
1327
01:10:13,144 --> 01:10:14,560
What does Wikidata
know about her?
1328
01:10:14,560 --> 01:10:16,450
Well instance of human.
1329
01:10:16,450 --> 01:10:17,800
We have a photo of her.
1330
01:10:17,800 --> 01:10:18,780
She's female.
1331
01:10:18,780 --> 01:10:20,530
She's an American.
1332
01:10:20,530 --> 01:10:21,790
Her name is Helen.
1333
01:10:21,790 --> 01:10:22,630
Date of birth.
1334
01:10:22,630 --> 01:10:23,650
Place of birth.
1335
01:10:23,650 --> 01:10:25,970
She's an author, a
novelist, a writer.
1336
01:10:25,970 --> 01:10:28,840
She was educated at the
University of Oxford.
1337
01:10:28,840 --> 01:10:33,160
And Wikidata knows what
her official website is.
1338
01:10:33,160 --> 01:10:35,780
That's useful, but that's it.
1339
01:10:35,780 --> 01:10:37,780
Now we can contribute
information here.
1340
01:10:37,780 --> 01:10:43,120
For example, she's an American
author writing in English.
1341
01:10:43,120 --> 01:10:45,550
So we could add
that information.
1342
01:10:45,550 --> 01:10:48,430
We could click the
Add button here.
1343
01:10:48,430 --> 01:10:50,200
And this is a good
moment to acknowledge
1344
01:10:50,200 --> 01:10:54,830
that the user interface of
Wikidata is a work in progress.
1345
01:10:54,830 --> 01:10:56,740
It's not as intuitive
as it might be.
1346
01:10:56,740 --> 01:10:58,570
So you need to
understand that click--
1347
01:10:58,570 --> 01:11:01,630
to add a completely
new property,
1348
01:11:01,630 --> 01:11:04,060
you need to click
this Add button.
1349
01:11:04,060 --> 01:11:08,020
If you want to add an additional
value to the property official
1350
01:11:08,020 --> 01:11:11,530
website, you need to
click this Add button.
1351
01:11:11,530 --> 01:11:13,780
It makes a kind of
sense with a shaded box.
1352
01:11:13,780 --> 01:11:15,880
But, you know, you need
to kind of pay attention,
1353
01:11:15,880 --> 01:11:18,901
and it's not as
friendly as it might be.
1354
01:11:18,901 --> 01:11:20,650
[COUGHING] Excuse me.
1355
01:11:20,650 --> 01:11:23,380
So, let's add a property here.
1356
01:11:23,380 --> 01:11:25,690
Click the Add button.
1357
01:11:25,690 --> 01:11:29,740
Again, Wikidata tries to
be useful by suggesting
1358
01:11:29,740 --> 01:11:32,760
some relevant
properties for humans.
1359
01:11:32,760 --> 01:11:36,640
A bit more morbidly it suggests,
how about date of death?
1360
01:11:36,640 --> 01:11:38,700
That's not cool, Wikidata.
1361
01:11:38,700 --> 01:11:40,480
Helen Dewitt is still alive.
1362
01:11:40,480 --> 01:11:42,700
So I will not add
date of death, but I
1363
01:11:42,700 --> 01:11:46,140
can add languages spoken,
written, or signed.
1364
01:11:46,140 --> 01:11:48,370
OK, so I click that.
1365
01:11:48,370 --> 01:11:51,670
And she writes in English.
1366
01:11:51,670 --> 01:11:54,450
I just type English-- whoops.
1367
01:11:54,450 --> 01:11:56,750
Not in Hebrew.
1368
01:11:56,750 --> 01:11:58,380
Don't panic.
1369
01:11:58,380 --> 01:12:01,010
I type English here.
1370
01:12:01,010 --> 01:12:04,250
And, oh, and of course Wikidata
has auto-complete, right?
1371
01:12:04,250 --> 01:12:06,080
So it tries to help me along.
1372
01:12:06,080 --> 01:12:10,100
But you will notice that
it has all kinds of things
1373
01:12:10,100 --> 01:12:10,940
called English.
1374
01:12:10,940 --> 01:12:14,030
I mean, it turns out that
there is a place in Indiana
1375
01:12:14,030 --> 01:12:16,370
called English, Indiana.
1376
01:12:16,370 --> 01:12:17,150
Did I mean that?
1377
01:12:17,150 --> 01:12:20,210
No, of course I didn't mean
that she writes her books
1378
01:12:20,210 --> 01:12:21,961
in English, Indiana.
1379
01:12:21,961 --> 01:12:22,460
Right?
1380
01:12:22,460 --> 01:12:26,180
But, you know, Wikidata gives me
the option of linking to that.
1381
01:12:26,180 --> 01:12:30,530
I also don't mean the botanist
Carl Schurz English.
1382
01:12:30,530 --> 01:12:32,870
No, no, I mean the
West Germanic language
1383
01:12:32,870 --> 01:12:34,029
originating in England.
1384
01:12:34,029 --> 01:12:34,820
That's what I mean.
1385
01:12:34,820 --> 01:12:36,110
So I click that.
1386
01:12:36,110 --> 01:12:37,760
And I click Save.
1387
01:12:37,760 --> 01:12:38,450
And that's it.
1388
01:12:38,450 --> 01:12:41,780
Again I have just made
an edit to Wikidata.
1389
01:12:41,780 --> 01:12:47,750
I have just taught Wikidata
that this author speaks English.
1390
01:12:47,750 --> 01:12:50,370
Now, again, this
may be very obvious.
1391
01:12:50,370 --> 01:12:52,280
She's American.
1392
01:12:52,280 --> 01:12:54,560
Of course not all
Americans write in English.
1393
01:12:54,560 --> 01:12:56,930
It may be obvious if
you look at her books.
1394
01:12:56,930 --> 01:12:59,060
The important thing
is that now Wikidata
1395
01:12:59,060 --> 01:13:02,090
knows this as a piece of data.
1396
01:13:02,090 --> 01:13:04,610
And, again, think ahead
to queries, which we will
1397
01:13:04,610 --> 01:13:06,980
demonstrate in a little bit.
1398
01:13:06,980 --> 01:13:09,000
Without this piece
of information
1399
01:13:09,000 --> 01:13:14,060
that I just added, if I were to
ask Wikidata five minutes ago,
1400
01:13:14,060 --> 01:13:19,760
give me a list of novelists
writing in English, OK,
1401
01:13:19,760 --> 01:13:22,730
Wikidata would have returned
thousands of results.
1402
01:13:22,730 --> 01:13:27,600
But Helen Dewitt would
not have been among them.
1403
01:13:27,600 --> 01:13:32,000
Because up until two
minutes ago Wikidata
1404
01:13:32,000 --> 01:13:35,640
didn't know that Helen Dewitt
writes in English and not
1405
01:13:35,640 --> 01:13:37,520
in Spanish.
1406
01:13:37,520 --> 01:13:38,730
Do you see?
1407
01:13:38,730 --> 01:13:42,570
It is this explicit
statement that will now
1408
01:13:42,570 --> 01:13:46,560
make her be included in any
future queries that ask,
1409
01:13:46,560 --> 01:13:48,700
who are novelists
writing in English?
1410
01:13:48,700 --> 01:13:53,250
1411
01:13:53,250 --> 01:13:54,500
OK.
1412
01:13:54,500 --> 01:13:58,560
By the way, she's
a PhD in Classics.
1413
01:13:58,560 --> 01:14:05,590
She speaks-- or at least reads
and writes Latin and Greek,
1414
01:14:05,590 --> 01:14:07,270
ancient Greek, and I could--
1415
01:14:07,270 --> 01:14:09,610
I can-- I mean, I
happen to know that.
1416
01:14:09,610 --> 01:14:12,420
But wait, wait, wait,
wait, wait, you say.
1417
01:14:12,420 --> 01:14:14,130
What about original research?
1418
01:14:14,130 --> 01:14:18,890
I mean, you can't just add
stuff like that to Wikidata.
1419
01:14:18,890 --> 01:14:19,920
Don't you need sources?
1420
01:14:19,920 --> 01:14:22,860
Citations?
1421
01:14:22,860 --> 01:14:23,890
Of course I do.
1422
01:14:23,890 --> 01:14:25,020
Yes.
1423
01:14:25,020 --> 01:14:27,720
Let's add some sources to this.
1424
01:14:27,720 --> 01:14:31,410
So on Wikidata,
just like Wikipedia,
1425
01:14:31,410 --> 01:14:34,980
things should generally
be supported by citations,
1426
01:14:34,980 --> 01:14:36,990
by references.
1427
01:14:36,990 --> 01:14:43,290
And just like Wikipedia,
they aren't always supported
1428
01:14:43,290 --> 01:14:44,650
in that way.
1429
01:14:44,650 --> 01:14:48,870
OK so, I mean, I can
just add it to Wikidata.
1430
01:14:48,870 --> 01:14:49,442
Watch me.
1431
01:14:49,442 --> 01:14:50,400
I just did that, right?
1432
01:14:50,400 --> 01:14:54,450
I just added English and
Latin without any citation,
1433
01:14:54,450 --> 01:14:56,850
and I will not be
arrested for it.
1434
01:14:56,850 --> 01:14:59,520
Just like I could edit
a Wikipedia article
1435
01:14:59,520 --> 01:15:02,610
and add some information
without a citation.
1436
01:15:02,610 --> 01:15:03,600
It may stick.
1437
01:15:03,600 --> 01:15:06,810
It may stay in the article,
or it may be reverted.
1438
01:15:06,810 --> 01:15:11,010
It depends on the kind of
information I'm adding.
1439
01:15:11,010 --> 01:15:13,740
It depends how many people
are paying attention
1440
01:15:13,740 --> 01:15:15,060
to the article on Wikipedia.
1441
01:15:15,060 --> 01:15:18,420
And it works the
same way on Wikidata.
1442
01:15:18,420 --> 01:15:21,780
OK, so, you can add some
things without references.
1443
01:15:21,780 --> 01:15:23,970
Ideally, when you
add information, you
1444
01:15:23,970 --> 01:15:25,570
should include references.
1445
01:15:25,570 --> 01:15:30,990
So let's be good Wikidata
citizens and add a source.
1446
01:15:30,990 --> 01:15:34,395
Here is an article that
I prepared in advance.
1447
01:15:34,395 --> 01:15:38,100
1448
01:15:38,100 --> 01:15:39,370
This is Helen Dewitt.
1449
01:15:39,370 --> 01:15:44,450
And in this article,
somewhere, it actually
1450
01:15:44,450 --> 01:15:51,770
says right at the
bottom here, see,
1451
01:15:51,770 --> 01:15:54,990
Dewitt knows, in descending
order of proficiency, Latin,
1452
01:15:54,990 --> 01:15:57,010
ancient Greek, French,
German, Spanish,
1453
01:15:57,010 --> 01:15:59,460
and Portuguese, Dutch, Danish,
Norwegian, Swedish, Arabic,
1454
01:15:59,460 --> 01:16:01,680
Hebrew and Japanese.
1455
01:16:01,680 --> 01:16:04,770
This may sound
excessive, but it's true.
1456
01:16:04,770 --> 01:16:06,330
I met this woman.
1457
01:16:06,330 --> 01:16:09,670
So anyway, we don't have
to include all of that.
1458
01:16:09,670 --> 01:16:13,050
The point is this article from
a reasonably reliable source,
1459
01:16:13,050 --> 01:16:15,840
this magazine,
this interview, can
1460
01:16:15,840 --> 01:16:19,270
count as a source for
the languages she speaks.
1461
01:16:19,270 --> 01:16:20,700
So I copy the URL.
1462
01:16:20,700 --> 01:16:23,130
I just copy it off my browser.
1463
01:16:23,130 --> 01:16:27,530
And, whoops-- that's not--
1464
01:16:27,530 --> 01:16:28,580
here we go.
1465
01:16:28,580 --> 01:16:31,610
And I can just add
a reference here
1466
01:16:31,610 --> 01:16:34,670
to the information that I
just added to Wikidata, right?
1467
01:16:34,670 --> 01:16:38,300
I can click Add Reference.
1468
01:16:38,300 --> 01:16:45,800
And then just say the reference
URL is, and I just paste.
1469
01:16:45,800 --> 01:16:48,840
I paste this URL.
1470
01:16:48,840 --> 01:16:50,160
Hit Enter.
1471
01:16:50,160 --> 01:16:51,060
And that's it.
1472
01:16:51,060 --> 01:16:55,380
And now the fact that she
speaks Latin has a reference.
1473
01:16:55,380 --> 01:16:58,320
If you look at the other
things here on Wikidata,
1474
01:16:58,320 --> 01:17:02,660
you can see that these IDs, for
example, have references, too.
1475
01:17:02,660 --> 01:17:03,420
Right?
1476
01:17:03,420 --> 01:17:06,570
In this case, the reference
just says, excuse me--
1477
01:17:06,570 --> 01:17:14,760
1478
01:17:14,760 --> 01:17:18,600
In this case it just says
imported from English Wikipedia.
1479
01:17:18,600 --> 01:17:24,970
But wait, you say, can
Wikipedia be a source?
1480
01:17:24,970 --> 01:17:26,620
Not properly, no.
1481
01:17:26,620 --> 01:17:30,100
I mean, just like Wikipedia
itself doesn't cite itself.
1482
01:17:30,100 --> 01:17:33,790
We don't say, this person
was born in this city.
1483
01:17:33,790 --> 01:17:34,870
How do we know?
1484
01:17:34,870 --> 01:17:37,210
We read it on Wikipedia
in another language.
1485
01:17:37,210 --> 01:17:39,610
That's not a good citation.
1486
01:17:39,610 --> 01:17:41,400
It's not a good
citation for Wikidata
1487
01:17:41,400 --> 01:17:45,040
either. So why do we put it here?
1488
01:17:45,040 --> 01:17:49,240
Well you can see the qualifier
here is different, right?
1489
01:17:49,240 --> 01:17:53,535
It's not reference URL, which
is what I put in for Latin here.
1490
01:17:53,535 --> 01:18:17,020
1491
01:18:17,020 --> 01:18:20,320
It's not reference URL here,
it's a different qualifier.
1492
01:18:20,320 --> 01:18:23,020
It says-- saying, imported from.
1493
01:18:23,020 --> 01:18:25,960
So this is not an
actual reference that
1494
01:18:25,960 --> 01:18:27,610
supports this piece of data.
1495
01:18:27,610 --> 01:18:30,730
It just shows where did
this data come from.
1496
01:18:30,730 --> 01:18:33,670
It's a slightly different
thing, because this data was
1497
01:18:33,670 --> 01:18:37,210
mass imported into Wikidata.
1498
01:18:37,210 --> 01:18:40,960
So it wasn't input by
hand by some volunteer.
1499
01:18:40,960 --> 01:18:44,770
It was imported into Wikidata
en masse by a script,
1500
01:18:44,770 --> 01:18:46,180
by a program.
1501
01:18:46,180 --> 01:18:49,820
And we want to know, where
did this number come from?
1502
01:18:49,820 --> 01:18:51,440
Well it came from
English Wikipedia.
1503
01:18:51,440 --> 01:18:54,130
So again, that's not
a proper reference
1504
01:18:54,130 --> 01:18:56,200
for the validity
of the information,
1505
01:18:56,200 --> 01:18:59,200
but it does at least tell us
it came from English Wikipedia.
1506
01:18:59,200 --> 01:19:03,460
We can click and look on
English Wikipedia and find out.
1507
01:19:03,460 --> 01:19:05,230
Maybe there's a
footnote there that
1508
01:19:05,230 --> 01:19:08,970
says where it did come from.
1509
01:19:08,970 --> 01:19:11,000
OK.
1510
01:19:11,000 --> 01:19:15,320
So this was an example of
teaching Wikidata something
1511
01:19:15,320 --> 01:19:16,910
that it didn't know.
1512
01:19:16,910 --> 01:19:18,512
Something about the languages.
1513
01:19:18,512 --> 01:19:20,720
And of course I could add
this reference for English.
1514
01:19:20,720 --> 01:19:23,210
I could add all the other
languages that she speaks.
1515
01:19:23,210 --> 01:19:26,060
And I won't bore you with
that, but that is basically
1516
01:19:26,060 --> 01:19:27,050
how it's done.
1517
01:19:27,050 --> 01:19:29,720
So you click this Add to
add a completely new--
1518
01:19:29,720 --> 01:19:32,650
1519
01:19:32,650 --> 01:19:34,030
completely new statement.
1520
01:19:34,030 --> 01:19:36,250
Now, by the way, the fact
that these are the only two
1521
01:19:36,250 --> 01:19:39,220
suggestions that
Wikidata can think of,
1522
01:19:39,220 --> 01:19:42,100
doesn't mean these
are the only options.
1523
01:19:42,100 --> 01:19:46,750
OK, you can just type
anything that may be relevant.
1524
01:19:46,750 --> 01:19:50,950
We could add, for
example, award.
1525
01:19:50,950 --> 01:19:52,570
Just start typing award.
1526
01:19:52,570 --> 01:19:54,910
And here I have
a bunch of properties
1527
01:19:54,910 --> 01:19:56,510
that are relevant for awards.
1528
01:19:56,510 --> 01:20:00,100
Awards received, together
with, conferred by, right?
1529
01:20:00,100 --> 01:20:05,790
There's all kinds of properties
that I could rely on.
1530
01:20:05,790 --> 01:20:09,600
And of course there is a list of
all the properties of Wikidata.
1531
01:20:09,600 --> 01:20:11,580
And that list is
also sorted by type.
1532
01:20:11,580 --> 01:20:15,480
So yes, there is a list of
properties relevant to people
1533
01:20:15,480 --> 01:20:17,130
so that you don't have to guess.
1534
01:20:17,130 --> 01:20:18,660
But a surprising
amount of the time
1535
01:20:18,660 --> 01:20:22,760
you can just start typing
and get the right properties
1536
01:20:22,760 --> 01:20:25,340
suggested to you.
1537
01:20:25,340 --> 01:20:27,230
OK.
1538
01:20:27,230 --> 01:20:33,050
So we taught Wikidata
something new,
1539
01:20:33,050 --> 01:20:38,980
and now let's teach Wikidata
something completely new.
1540
01:20:38,980 --> 01:20:39,480
Right?
1541
01:20:39,480 --> 01:20:42,480
So how do we create
a new Wikidata item?
1542
01:20:42,480 --> 01:20:46,880
So, like I said, if I
created a Wikipedia article
1543
01:20:46,880 --> 01:20:49,520
about something that was
not previously covered
1544
01:20:49,520 --> 01:20:53,540
on any other
Wikipedia, chances are
1545
01:20:53,540 --> 01:20:57,170
there would not be an already
existing Wikidata item.
1546
01:20:57,170 --> 01:21:03,190
Sometimes there might
be, because Wikidata
1547
01:21:03,190 --> 01:21:06,857
does have 25 million entities.
1548
01:21:06,857 --> 01:21:08,190
But sometimes there wouldn't be.
1549
01:21:08,190 --> 01:21:10,148
So, first of all, I could
search for it, right?
1550
01:21:10,148 --> 01:21:14,210
So I could go to Wikidata
to the search box
1551
01:21:14,210 --> 01:21:17,390
here and just start typing, and
search for what I want, right?
1552
01:21:17,390 --> 01:21:20,690
So if I'm searching for Helen
Dewitt I just say Helen,
1553
01:21:20,690 --> 01:21:25,590
and I can see whether
or not it exists.
1554
01:21:25,590 --> 01:21:29,240
And there's a detailed search
results page, et cetera,
1555
01:21:29,240 --> 01:21:33,074
where I can find out
if the item does exist or not.
1556
01:21:33,074 --> 01:21:35,240
Excuse me, this reminds me
of a very important thing
1557
01:21:35,240 --> 01:21:36,620
I wanted to
demonstrate, and that
1558
01:21:36,620 --> 01:21:42,710
is the multilingualism
of Wikidata.
1559
01:21:42,710 --> 01:21:49,340
So remember all these
labels in other languages.
1560
01:21:49,340 --> 01:21:54,390
Wikidata knows what to call
Helen Dewitt in Hebrew.
1561
01:21:54,390 --> 01:22:00,800
And it will show it to Wikidata
users whose language is Hebrew.
1562
01:22:00,800 --> 01:22:04,220
Mine is set to
English, for your sake.
1563
01:22:04,220 --> 01:22:08,830
But if I change this-- I go to
Preferences here and change
1564
01:22:08,830 --> 01:22:09,740
my language.
1565
01:22:09,740 --> 01:22:15,475
[INAUDIBLE] All
right, and I hit Save.
1566
01:22:15,475 --> 01:22:20,350
Wikidata will start
talking to me in Hebrew.
1567
01:22:20,350 --> 01:22:23,090
Now brace yourselves.
1568
01:22:23,090 --> 01:22:24,620
Are you ready?
1569
01:22:24,620 --> 01:22:28,430
Don't panic, it's right to left.
1570
01:22:28,430 --> 01:22:32,630
Oh my god everything
is topsy-turvy.
1571
01:22:32,630 --> 01:22:36,590
So this is the same
article in Hebrew.
1572
01:22:36,590 --> 01:22:39,290
So the sidebar has
switched direction,
1573
01:22:39,290 --> 01:22:41,300
and I know most of
you cannot read it.
1574
01:22:41,300 --> 01:22:42,480
Bear with me.
1575
01:22:42,480 --> 01:22:44,750
This is the label
that we previously
1576
01:22:44,750 --> 01:22:46,840
saw in the label box.
1577
01:22:46,840 --> 01:22:49,580
This is how you spell
Helen Dewitt in Hebrew.
1578
01:22:49,580 --> 01:22:52,550
And here is the
description in Hebrew.
1579
01:22:52,550 --> 01:22:54,980
It's not the description in
English, this description,
1580
01:22:54,980 --> 01:22:57,380
American writer, which
I was shown previously.
1581
01:22:57,380 --> 01:23:00,740
Now I'm shown the Hebrew
description, appropriately.
1582
01:23:00,740 --> 01:23:03,500
But more interestingly,
oh my god!
1583
01:23:03,500 --> 01:23:07,640
All these statements
are suddenly in Hebrew.
1584
01:23:07,640 --> 01:23:08,940
How did that happen?
1585
01:23:08,940 --> 01:23:11,570
1586
01:23:11,570 --> 01:23:15,560
Well this tiny word here
is the very concise way
1587
01:23:15,560 --> 01:23:22,450
to say in Hebrew, instance of,
and this word here means human.
1588
01:23:22,450 --> 01:23:25,960
So these are links to
the same things, right?
1589
01:23:25,960 --> 01:23:28,100
It still links to Q5.
1590
01:23:28,100 --> 01:23:31,780
Q5 is the Wikidata
entity for human.
1591
01:23:31,780 --> 01:23:33,370
These are still the same things.
1592
01:23:33,370 --> 01:23:37,600
But because Wikidata has
multiple labels for everything,
1593
01:23:37,600 --> 01:23:39,580
it has multiple
labels for items.
1594
01:23:39,580 --> 01:23:42,760
And it also has multiple
labels for property names.
1595
01:23:42,760 --> 01:23:46,450
So Wikidata knows how
to say, instance of,
1596
01:23:46,450 --> 01:23:50,140
and award received,
in other languages.
1597
01:23:50,140 --> 01:23:54,490
That is why it is able to show
me all this data in Hebrew
1598
01:23:54,490 --> 01:23:59,890
even if none of that data was
actually input into Wikidata
1599
01:23:59,890 --> 01:24:01,870
by a Hebrew speaker.
1600
01:24:01,870 --> 01:24:04,900
That data could have been
input by English speakers,
1601
01:24:04,900 --> 01:24:08,230
but thanks to the
fact that someone once
1602
01:24:08,230 --> 01:24:12,760
translated the word
photo into Hebrew,
1603
01:24:12,760 --> 01:24:14,830
I can see this field in Hebrew.
1604
01:24:14,830 --> 01:24:17,750
1605
01:24:17,750 --> 01:24:21,230
So one of the things you
can do to help Wikidata,
1606
01:24:21,230 --> 01:24:23,600
right now, without
any special knowledge
1607
01:24:23,600 --> 01:24:26,210
is to help translate
those labels.
1608
01:24:26,210 --> 01:24:29,030
Every label only needs to
be translated just once.
1609
01:24:29,030 --> 01:24:31,310
So you can see that all
of these properties, date
1610
01:24:31,310 --> 01:24:34,720
of birth, name et cetera,
they all have Hebrew labels.
1611
01:24:34,720 --> 01:24:36,760
Maybe one of these would not.
1612
01:24:36,760 --> 01:24:38,361
No, they all have Hebrew labels.
1613
01:24:38,361 --> 01:24:39,110
Doing pretty good.
1614
01:24:39,110 --> 01:24:42,960
1615
01:24:42,960 --> 01:24:45,810
And I'm able to search
in my own language.
1616
01:24:45,810 --> 01:24:48,210
I'm able to click Add.
1617
01:24:48,210 --> 01:24:49,890
This word is Add,
so I click this,
1618
01:24:49,890 --> 01:24:51,780
and now I have the Add screen.
1619
01:24:51,780 --> 01:24:55,860
It all speaks my language,
and it's awesome.
1620
01:24:55,860 --> 01:25:00,330
And now for your sake I
will switch back to English,
1621
01:25:00,330 --> 01:25:03,090
but it is important
to know you can
1622
01:25:03,090 --> 01:25:05,740
edit Wikidata in any language.
1623
01:25:05,740 --> 01:25:09,050
And it is far more multilingual
and multilingual-friendly
1624
01:25:09,050 --> 01:25:13,260
than, for example, Commons, which
is also a project we all share.
1625
01:25:13,260 --> 01:25:17,730
But Commons has some limitations
on how multilingual it is.
1626
01:25:17,730 --> 01:25:21,410
For example, the category
names, et cetera.
1627
01:25:21,410 --> 01:25:23,270
OK.
1628
01:25:23,270 --> 01:25:25,670
So we were beginning
to discuss creating
1629
01:25:25,670 --> 01:25:27,140
something completely new.
1630
01:25:27,140 --> 01:25:29,360
AUDIENCE: Quick
questions, if that's OK?
1631
01:25:29,360 --> 01:25:30,980
So there's two questions on IRC.
1632
01:25:30,980 --> 01:25:33,890
The first one is, can you
show search for something
1633
01:25:33,890 --> 01:25:35,420
like getting the list of things?
1634
01:25:35,420 --> 01:25:38,360
I want to learn how to search
for something properly like,
1635
01:25:38,360 --> 01:25:43,705
show me all the items with
this value of this property.
1636
01:25:43,705 --> 01:25:45,080
ASAF BARTOV: Yes.
1637
01:25:45,080 --> 01:25:47,540
That is part of
this talk, but I'll
1638
01:25:47,540 --> 01:25:49,250
get to that in a
little bit later.
1639
01:25:49,250 --> 01:25:52,010
There's a whole section where I
will demonstrate the very, very
1640
01:25:52,010 --> 01:25:55,190
powerful query
system of Wikidata
1641
01:25:55,190 --> 01:25:57,170
where I will cash
that check that I gave
1642
01:25:57,170 --> 01:25:59,090
at the beginning of
all these painters
1643
01:25:59,090 --> 01:26:01,029
who are sons of painters
queries, et cetera.
1644
01:26:01,029 --> 01:26:02,570
So I will demonstrate
how to do that.
1645
01:26:02,570 --> 01:26:04,190
AUDIENCE: Other question.
1646
01:26:04,190 --> 01:26:07,250
How does Wikidata deal
with link rot, and other issues
1647
01:26:07,250 --> 01:26:09,680
stemming from their URL refs?
1648
01:26:09,680 --> 01:26:13,528
1649
01:26:13,528 --> 01:26:16,290
ASAF BARTOV: URLs break.
1650
01:26:16,290 --> 01:26:18,730
We call that link rot.
1651
01:26:18,730 --> 01:26:22,470
Wikidata doesn't have
any particular magic
1652
01:26:22,470 --> 01:26:24,730
around link rot,
just like Wikipedia.
1653
01:26:24,730 --> 01:26:29,100
So if you do use a bare
URL it may well rot.
1654
01:26:29,100 --> 01:26:34,230
But you can add qualifiers
with backup URLs, either
1655
01:26:34,230 --> 01:26:37,680
on the Internet Archive, or
another mirroring service.
1656
01:26:37,680 --> 01:26:42,780
And potentially that could be
a software feature for Wikidata
1657
01:26:42,780 --> 01:26:46,590
to automatically save
or ensure that something
1658
01:26:46,590 --> 01:26:48,660
is saved on Internet
Archive, but I don't
1659
01:26:48,660 --> 01:26:50,670
know that it is doing so now.
1660
01:26:50,670 --> 01:26:56,040
So, just like Wikipedia, if
it is a bare URL it may rot.
1661
01:26:56,040 --> 01:27:00,240
And may need to be
replaced, possibly by bot.
1662
01:27:00,240 --> 01:27:01,390
Other questions?
1663
01:27:01,390 --> 01:27:09,840
1664
01:27:09,840 --> 01:27:12,650
All right, so let's
talk about how you
1665
01:27:12,650 --> 01:27:15,090
create a completely new item.
1666
01:27:15,090 --> 01:27:16,300
It's very simple.
1667
01:27:16,300 --> 01:27:21,810
You go to Wikidata and you
click here on the side.
1668
01:27:21,810 --> 01:27:30,180
There's a link, create new item,
which gives you this screen.
1669
01:27:30,180 --> 01:27:35,030
And let's create an
item about a book
1670
01:27:35,030 --> 01:27:39,500
that I'm reading right now
by this Bulgarian writer.
1671
01:27:39,500 --> 01:27:43,950
So we have an article about this
writer guy named Deyan Enev.
1672
01:27:43,950 --> 01:27:48,530
But we don't have an
article or a Wikidata item
1673
01:27:48,530 --> 01:28:07,980
about one of his famous
books called Circus Bulgaria.
1674
01:28:07,980 --> 01:28:10,050
That's the book I'm reading,
his first collection
1675
01:28:10,050 --> 01:28:11,216
of short stories in English.
1676
01:28:11,216 --> 01:28:14,280
Circus Bulgaria came out
in 2010, Portobello Books,
1677
01:28:14,280 --> 01:28:17,099
translated by Kapka Kassabova.
1678
01:28:17,099 --> 01:28:18,390
So that's the book I'm reading.
1679
01:28:18,390 --> 01:28:20,520
As you can see it's not
a link on Wikipedia.
1680
01:28:20,520 --> 01:28:23,370
There's no article about
it, and there's not even
1681
01:28:23,370 --> 01:28:26,310
a Wikidata entity item about it.
1682
01:28:26,310 --> 01:28:32,220
But we can totally create
it, even without a Wikipedia
1683
01:28:32,220 --> 01:28:33,090
article.
1684
01:28:33,090 --> 01:28:34,980
So let's create this new item.
1685
01:28:34,980 --> 01:28:37,260
Let's create it in
English for the purposes
1686
01:28:37,260 --> 01:28:38,880
of our demonstration.
1687
01:28:38,880 --> 01:28:44,910
The name of the item
is Circus Bulgaria.
1688
01:28:44,910 --> 01:28:47,520
Circus Bulgaria,
that's the name.
1689
01:28:47,520 --> 01:28:50,670
Not Circus Bulgaria
parentheses book,
1690
01:28:50,670 --> 01:28:53,520
or anything you may be
used to from Wikipedia.
1691
01:28:53,520 --> 01:28:56,520
It's the actual
name of the book,
1692
01:28:56,520 --> 01:29:00,450
and the description,
again, remember,
1693
01:29:00,450 --> 01:29:03,270
the description field
is just to kind of help
1694
01:29:03,270 --> 01:29:08,681
tell apart this Circus Bulgaria
from any other potential Circus
1695
01:29:08,681 --> 01:29:09,180
Bulgaria.
1696
01:29:09,180 --> 01:29:11,280
Maybe there's a
film or something.
1697
01:29:11,280 --> 01:29:20,480
So it's enough to just say
something like short story
1698
01:29:20,480 --> 01:29:23,270
collection.
1699
01:29:23,270 --> 01:29:27,830
I might add by Deyan Enev
just in case, again,
1700
01:29:27,830 --> 01:29:31,910
some future other short story
collection by some other author
1701
01:29:31,910 --> 01:29:33,560
happens to have that same name.
1702
01:29:33,560 --> 01:29:36,391
That should be
disambiguating enough.
1703
01:29:36,391 --> 01:29:36,890
OK.
1704
01:29:36,890 --> 01:29:39,770
Short story collection
by Deyan Enev.
1705
01:29:39,770 --> 01:29:42,050
I could have aliases for this.
1706
01:29:42,050 --> 01:29:47,240
The aliases assist findability.
1707
01:29:47,240 --> 01:29:51,020
This particular book has just
this one name, so that's fine.
1708
01:29:51,020 --> 01:29:52,260
And I click Create.
1709
01:29:52,260 --> 01:29:52,760
That's it.
1710
01:29:52,760 --> 01:29:55,990
I just start with a
label, and a description.
1711
01:29:55,990 --> 01:29:58,740
I click Create.
1712
01:29:58,740 --> 01:30:03,890
I have a brand new Q number
for my new Wikidata item.
1713
01:30:03,890 --> 01:30:05,960
And Wikidata knows
what to call it.
1714
01:30:05,960 --> 01:30:09,320
And a description in
one language at least.
1715
01:30:09,320 --> 01:30:11,930
And that's it, and I
can start populating it.
1716
01:30:11,930 --> 01:30:15,050
As you can see, it
has no site links,
1717
01:30:15,050 --> 01:30:17,450
but it's ready to be taught.
1718
01:30:17,450 --> 01:30:20,450
So, for example, I
can start by teaching
1719
01:30:20,450 --> 01:30:24,610
it the name of the book
in another language
1720
01:30:24,610 --> 01:30:25,870
that I happen to speak.
1721
01:30:25,870 --> 01:30:29,050
1722
01:30:29,050 --> 01:30:31,720
Now it has two labels
in English and Hebrew.
1723
01:30:31,720 --> 01:30:36,880
I could also look
up
1724
01:30:36,880 --> 01:30:39,510
the original Bulgarian
label for this book.
1725
01:30:39,510 --> 01:30:41,550
Seems relevant.
1726
01:30:41,550 --> 01:30:43,320
Again, I do not speak Bulgarian.
1727
01:30:43,320 --> 01:30:49,860
But I can go to the Bulgarian
Wikipedia through inter-wiki.
1728
01:30:49,860 --> 01:30:51,510
This is this gentleman.
1729
01:30:51,510 --> 01:30:54,510
And I could find--
1730
01:30:54,510 --> 01:30:59,190
I can read Cyrillic so
I could easily find--
1731
01:30:59,190 --> 01:31:00,030
when I say easily--
1732
01:31:00,030 --> 01:31:02,940
1733
01:31:02,940 --> 01:31:05,710
when I say easily--
1734
01:31:05,710 --> 01:31:12,731
maybe not so easy, but
I can search for it.
1735
01:31:12,731 --> 01:31:21,070
1736
01:31:21,070 --> 01:31:22,180
Here we go.
1737
01:31:22,180 --> 01:31:25,190
Tsirk Bulgaria.
1738
01:31:25,190 --> 01:31:27,510
That is the name of the book.
1739
01:31:27,510 --> 01:31:28,910
Tsirk, as in circus.
1740
01:31:28,910 --> 01:31:30,440
No problem.
1741
01:31:30,440 --> 01:31:32,725
So I just copy this right here.
1742
01:31:32,725 --> 01:31:35,240
1743
01:31:35,240 --> 01:31:38,090
And I go back to my new item.
1744
01:31:38,090 --> 01:31:45,725
My new item, which is here,
and I edit the Bulgarian field.
1745
01:31:45,725 --> 01:31:48,260
1746
01:31:48,260 --> 01:31:49,950
And here it is.
1747
01:31:49,950 --> 01:31:50,720
Awesome.
1748
01:31:50,720 --> 01:31:51,220
All right.
1749
01:31:51,220 --> 01:31:55,420
But I still haven't told
Wikidata anything about this.
1750
01:31:55,420 --> 01:31:56,920
I know I'm talking about a book.
1751
01:31:56,920 --> 01:31:59,110
Wikidata doesn't
know that yet.
1752
01:31:59,110 --> 01:32:02,630
So let's start by
adding some statements.
1753
01:32:02,630 --> 01:32:05,390
First of all, I click Add.
1754
01:32:05,390 --> 01:32:07,190
Wikidata sensibly
says, how about we
1755
01:32:07,190 --> 01:32:08,630
start with instance of.
1756
01:32:08,630 --> 01:32:11,090
Tell me what kind of animal--
no, not kind of animal.
1757
01:32:11,090 --> 01:32:13,940
What kind of thing are you
trying to describe here?
1758
01:32:13,940 --> 01:32:18,130
Well it's an instance of a book.
1759
01:32:18,130 --> 01:32:20,930
Not in Hebrew, please.
1760
01:32:20,930 --> 01:32:22,180
So it's an instance of a book.
1761
01:32:22,180 --> 01:32:23,763
I could even be a
little more specific
1762
01:32:23,763 --> 01:32:31,920
and say it's an instance of
a short story collection.
1763
01:32:31,920 --> 01:32:34,620
There we go, short
story collection.
1764
01:32:34,620 --> 01:32:36,800
I hit Save.
1765
01:32:36,800 --> 01:32:37,430
Awesome.
1766
01:32:37,430 --> 01:32:39,680
So now we know what
kind of thing it is.
1767
01:32:39,680 --> 01:32:42,860
It's not a human, it's not a
mountain, it's not a concept.
1768
01:32:42,860 --> 01:32:44,760
It's a short story collection.
1769
01:32:44,760 --> 01:32:46,400
Now I can add some other things.
1770
01:32:46,400 --> 01:32:48,770
See, Wikidata is
already working for me.
1771
01:32:48,770 --> 01:32:51,020
Because it's a short
story collection
1772
01:32:51,020 --> 01:32:53,960
it's offering me to populate
these properties, and not
1773
01:32:53,960 --> 01:32:54,890
other ones.
1774
01:32:54,890 --> 01:32:56,990
Publication date,
original language,
1775
01:32:56,990 --> 01:33:00,350
genre, country of origin,
these are all relevant, right?
1776
01:33:00,350 --> 01:33:04,220
So let's start with original
language of the work
1777
01:33:04,220 --> 01:33:07,410
is Bulgarian.
1778
01:33:07,410 --> 01:33:09,810
Not Bulgaria, Bulgarian.
1779
01:33:09,810 --> 01:33:12,040
This is the item I want to link.
1780
01:33:12,040 --> 01:33:21,570
Hit Save, and whatever.
1781
01:33:21,570 --> 01:33:22,890
Author.
1782
01:33:22,890 --> 01:33:26,540
Let's identify the author.
1783
01:33:26,540 --> 01:33:29,350
So the author, the main
creator of the work,
1784
01:33:29,350 --> 01:33:32,470
is that gentleman Deyan Enev.
1785
01:33:32,470 --> 01:33:34,750
And remember, he has
a Wikipedia article.
1786
01:33:34,750 --> 01:33:37,210
He also has a Wikidata entity.
1787
01:33:37,210 --> 01:33:39,640
So Wikidata does know about him.
1788
01:33:39,640 --> 01:33:48,930
So I hit Save, and I can add
something about the translator.
1789
01:33:48,930 --> 01:33:52,530
1790
01:33:52,530 --> 01:33:54,390
And what was that lady's name?
1791
01:33:54,390 --> 01:33:57,990
1792
01:33:57,990 --> 01:34:00,120
Kapka Kassabova.
1793
01:34:00,120 --> 01:34:05,430
Now it so happens that Wikidata
already knows about this lady.
1794
01:34:05,430 --> 01:34:08,330
1795
01:34:08,330 --> 01:34:08,840
See?
1796
01:34:08,840 --> 01:34:12,290
So I can just start typing
and then just link to it.
1797
01:34:12,290 --> 01:34:12,840
Awesome.
1798
01:34:12,840 --> 01:34:13,824
But what if it didn't?
1799
01:34:13,824 --> 01:34:15,740
What if it was translated
by someone who isn't
1800
01:34:15,740 --> 01:34:17,690
already covered on Wikidata?
1801
01:34:17,690 --> 01:34:22,190
Well I could just type
the name as a string,
1802
01:34:22,190 --> 01:34:25,760
but ideally I could
create a Wikidata entity
1803
01:34:25,760 --> 01:34:28,940
about this translator so
that there is a possibility
1804
01:34:28,940 --> 01:34:30,350
to link to her.
1805
01:34:30,350 --> 01:34:33,560
1806
01:34:33,560 --> 01:34:36,920
Now I might actually
add a qualifier here
1807
01:34:36,920 --> 01:34:40,310
because she's not the
translator of the book, right?
1808
01:34:40,310 --> 01:34:43,620
She's the translator of
the book into English.
1809
01:34:43,620 --> 01:34:44,440
Right.
1810
01:34:44,440 --> 01:34:50,151
So the language that she
translated into is English.
1811
01:34:50,151 --> 01:34:50,650
Right?
1812
01:34:50,650 --> 01:34:53,620
This book-- remember
I'm describing the book.
1813
01:34:53,620 --> 01:34:55,376
The item is about the book.
1814
01:34:55,376 --> 01:34:57,250
So the book would have
a different translator
1815
01:34:57,250 --> 01:34:58,510
into Polish.
1816
01:34:58,510 --> 01:35:02,320
So this is an example of
a property or a statement
1817
01:35:02,320 --> 01:35:06,430
that doesn't make sense without
one of those qualifiers.
1818
01:35:06,430 --> 01:35:08,140
It's just not correct.
1819
01:35:08,140 --> 01:35:11,320
It doesn't make sense to
say that translator is.
1820
01:35:11,320 --> 01:35:14,950
The English translator, or
even this English translator.
1821
01:35:14,950 --> 01:35:17,770
In 50 years maybe there would
be an additional English
1822
01:35:17,770 --> 01:35:18,940
translation.
1823
01:35:18,940 --> 01:35:24,774
So that's an example of
needing that qualifier.
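
To make the qualifier idea concrete, here is a minimal sketch of the book item as plain Python data. The `Statement` class, the `translators_into` helper, and the qualifier label "language" are illustrative assumptions, not Wikidata's real data model or property names; the values are the ones entered in the demo.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim on an item: a property, a value, and optional qualifiers."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


# Hypothetical in-memory picture of the new item; labels stand in for real IDs.
book = [
    Statement("instance of", "short story collection"),
    Statement("original language of work", "Bulgarian"),
    Statement("author", "Deyan Enev"),
    # The qualifier says which translation this translator is responsible for.
    Statement("translator", "Kapka Kassabova", {"language": "English"}),
]


def translators_into(statements, language):
    """Return translators whose statement is qualified with the given language."""
    return [
        s.value
        for s in statements
        if s.prop == "translator" and s.qualifiers.get("language") == language
    ]


print(translators_into(book, "English"))
print(translators_into(book, "Polish"))
```

Without the qualifier, the question "who translated this into Polish?" could not be told apart from "who translated this into English?", which is the point made above.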
1824
01:35:24,774 --> 01:35:27,190
And of course I could go on
and populate the other fields.
1825
01:35:27,190 --> 01:35:29,710
We don't have to
do that right now.
1826
01:35:29,710 --> 01:35:32,960
Publication date, country
of origin, et cetera.
1827
01:35:32,960 --> 01:35:35,440
So this is already beginning
to look like all those items
1828
01:35:35,440 --> 01:35:38,440
that we already saw, but just
a moment ago it didn't exist.
1829
01:35:38,440 --> 01:35:43,920
Just a moment ago Wikidata
had no concept of this work.
1830
01:35:43,920 --> 01:35:46,500
This happens to be one
of his notable works.
1831
01:35:46,500 --> 01:35:52,080
So I could actually go to the
item about Deyan Enev which
1832
01:35:52,080 --> 01:35:56,190
has all this information
already, occupation, languages,
1833
01:35:56,190 --> 01:35:59,170
and add a property.
1834
01:35:59,170 --> 01:36:01,050
Remember, I'm not
limited to these.
1835
01:36:01,050 --> 01:36:06,180
I can add a property
called notable works,
1836
01:36:06,180 --> 01:36:08,670
and mention my new item.
1837
01:36:08,670 --> 01:36:12,120
Circus Bulgaria.
1838
01:36:12,120 --> 01:36:12,750
See?
1839
01:36:12,750 --> 01:36:15,180
My new item is
showing up, and thanks
1840
01:36:15,180 --> 01:36:18,660
to this description that I
wrote, short story collection,
1841
01:36:18,660 --> 01:36:22,650
it's already appearing here in
the dropdown very conveniently.
1842
01:36:22,650 --> 01:36:24,270
So I linked to this.
1843
01:36:24,270 --> 01:36:25,154
I hit Save.
1844
01:36:25,154 --> 01:36:28,680
1845
01:36:28,680 --> 01:36:32,310
Ideally again I should find
some references showing
1846
01:36:32,310 --> 01:36:34,620
that this is a
notable work by him,
1847
01:36:34,620 --> 01:36:37,000
but we won't spend
time on that right now.
1848
01:36:37,000 --> 01:36:39,010
But the point is we
created a new item.
1849
01:36:39,010 --> 01:36:40,410
We populated it a little bit.
1850
01:36:40,410 --> 01:36:44,400
We linked to it so that it's
more discoverable by mentioning
1851
01:36:44,400 --> 01:36:47,760
it in the author name, and
of course the book item
1852
01:36:47,760 --> 01:36:50,710
itself mentions the author
and links to the author.
1853
01:36:50,710 --> 01:36:52,770
So that's all good.
1854
01:36:52,770 --> 01:36:57,780
One last thing we shall do is
give it some useful identifier
1855
01:36:57,780 --> 01:37:02,880
so let's add, say, the
Library of Congress record
1856
01:37:02,880 --> 01:37:03,940
for this book.
1857
01:37:03,940 --> 01:37:04,440
OK.
1858
01:37:04,440 --> 01:37:07,710
So I have prepared
this in advance.
1859
01:37:07,710 --> 01:37:08,760
Ooh.
1860
01:37:08,760 --> 01:37:12,720
Just in time, with 80 seconds to
go before it's giving up on me.
1861
01:37:12,720 --> 01:37:14,310
Oh it has already
given up on me.
1862
01:37:14,310 --> 01:37:15,490
That is very unfortunate.
1863
01:37:15,490 --> 01:37:23,300
1864
01:37:23,300 --> 01:37:29,110
So I go to the Library of
Congress and I find this book.
1865
01:37:29,110 --> 01:37:33,050
I find this entry, right?
1866
01:37:33,050 --> 01:37:37,320
In the Library of Congress
database about this book.
1867
01:37:37,320 --> 01:37:39,120
And it has a permalink.
1868
01:37:39,120 --> 01:37:42,570
It has a kind of guaranteed
to be permanent link.
1869
01:37:42,570 --> 01:37:47,950
I can just copy that link,
go back to my little book,
1870
01:37:47,950 --> 01:37:55,770
and say the Library of Congress.
1871
01:37:55,770 --> 01:38:01,070
Yeah, LCCN, that's what they
call their IDs, the call
1872
01:38:01,070 --> 01:38:02,120
number.
1873
01:38:02,120 --> 01:38:06,502
And I paste it here.
1874
01:38:06,502 --> 01:38:08,210
I actually don't need the URL.
1875
01:38:08,210 --> 01:38:09,136
I need just a number.
1876
01:38:09,136 --> 01:38:12,440
1877
01:38:12,440 --> 01:38:13,520
And there we go.
1878
01:38:13,520 --> 01:38:16,550
I have added it,
and now Wikidata
1879
01:38:16,550 --> 01:38:20,630
knows how to find bibliographic
information about this book.
1880
01:38:20,630 --> 01:38:24,710
And any re-user of
Wikidata, some program,
1881
01:38:24,710 --> 01:38:28,950
some tool that connects
books to authors
1882
01:38:28,950 --> 01:38:32,870
or does statistical analysis or
whatever, some future yet to be
1883
01:38:32,870 --> 01:38:35,090
imagined tool
could automatically
1884
01:38:35,090 --> 01:38:39,170
find additional metadata on the
Library of Congress site thanks
1885
01:38:39,170 --> 01:38:41,840
to this connection
that I just made.
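
As a rough sketch of what such a re-using tool might do, the snippet below strips a Library of Congress permalink down to the bare number and rebuilds the link from a stored number. It assumes permalinks have the form `https://lccn.loc.gov/<number>`; the function names are hypothetical, and the number used is a made-up placeholder, not the record for this book.

```python
import re

# Assumed permalink prefix for Library of Congress records.
LCCN_PERMALINK = "https://lccn.loc.gov/"


def lccn_from_permalink(url: str) -> str:
    """Extract the bare identifier, which is what the Wikidata field wants."""
    match = re.fullmatch(r"https?://lccn\.loc\.gov/([A-Za-z0-9]+)/?", url.strip())
    if match is None:
        raise ValueError("not an LCCN permalink: " + url)
    return match.group(1)


def permalink_from_lccn(lccn: str) -> str:
    """Rebuild the link a tool would follow to fetch bibliographic metadata."""
    return LCCN_PERMALINK + lccn


placeholder = "https://lccn.loc.gov/2010000001"  # made-up example value
number = lccn_from_permalink(placeholder)
print(number)
print(permalink_from_lccn(number))
```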
1886
01:38:41,840 --> 01:38:44,150
And of course I could
add many other IDs
1887
01:38:44,150 --> 01:38:46,460
to other catalogs
around the world,
1888
01:38:46,460 --> 01:38:48,150
and we won't do that right now.
1889
01:38:48,150 --> 01:38:51,840
You can see that it's now
showing up under identifiers.
1890
01:38:51,840 --> 01:38:56,330
So this is how we created
a brand new piece of data.
1891
01:38:56,330 --> 01:38:59,632
Questions about this,
about creating new items?
1892
01:38:59,632 --> 01:39:18,100
1893
01:39:18,100 --> 01:39:19,180
Yeah, all right.
1894
01:39:19,180 --> 01:39:25,510
So we've seen how to contribute
to Wikidata on our own,
1895
01:39:25,510 --> 01:39:26,350
kind of through--
1896
01:39:26,350 --> 01:39:27,840
directly through Wikidata.
1897
01:39:27,840 --> 01:39:30,680
1898
01:39:30,680 --> 01:39:35,220
Now you may be
thinking, but Asaf, this
1899
01:39:35,220 --> 01:39:39,880
sounds like a ton
of work recording
1900
01:39:39,880 --> 01:39:44,500
all of these little tiny bits of
information about every person
1901
01:39:44,500 --> 01:39:47,410
and every book and every town.
1902
01:39:47,410 --> 01:39:50,520
And if you think that,
you would be correct.
1903
01:39:50,520 --> 01:39:52,730
That is a ton of work.
1904
01:39:52,730 --> 01:39:54,600
It's a lot of work.
1905
01:39:54,600 --> 01:39:59,930
However, it is centralized, so
it is reusable on other wikis
1906
01:39:59,930 --> 01:40:03,860
and we will show in just a
moment how we pull information
1907
01:40:03,860 --> 01:40:07,296
from Wikidata into
Wikipedia or other projects.
1908
01:40:07,296 --> 01:40:10,860
1909
01:40:10,860 --> 01:40:13,780
We will show that
in just a moment.
1910
01:40:13,780 --> 01:40:18,660
But here's an
awesome little game
1911
01:40:18,660 --> 01:40:23,205
that a Wikidata
volunteer, Magnus Manske,
1912
01:40:23,205 --> 01:40:30,900
has authored called the
Wikidata game, in which he
1913
01:40:30,900 --> 01:40:31,920
tricks people--
1914
01:40:31,920 --> 01:40:35,730
sorry, helps people
make contributions
1915
01:40:35,730 --> 01:40:41,500
to Wikidata in a very,
very easy and pleasant way.
1916
01:40:41,500 --> 01:40:44,410
Let's look at the Wikidata game.
1917
01:40:44,410 --> 01:40:47,840
So the first thing you need
to do in that Wikidata game
1918
01:40:47,840 --> 01:40:50,660
is to log in,
because the Wikidata
1919
01:40:50,660 --> 01:40:53,150
game makes edits in your name.
1920
01:40:53,150 --> 01:40:54,980
So we need to authorize it.
1921
01:40:54,980 --> 01:40:57,250
It's perfectly safe.
1922
01:40:57,250 --> 01:41:01,090
And after you do that you
can go to the Wikidata game.
1923
01:41:01,090 --> 01:41:02,020
So this is the game.
1924
01:41:02,020 --> 01:41:03,520
Now I'm logged in.
1925
01:41:03,520 --> 01:41:05,230
And the Wikidata game
actually includes
1926
01:41:05,230 --> 01:41:06,970
a number of different games.
1927
01:41:06,970 --> 01:41:09,310
Let's start with a person game.
1928
01:41:09,310 --> 01:41:14,170
So Wikidata shows you--
1929
01:41:14,170 --> 01:41:20,800
shows you an item, and asks
you a very simple question.
1930
01:41:20,800 --> 01:41:23,200
Person, or not a person?
1931
01:41:23,200 --> 01:41:26,410
1932
01:41:26,410 --> 01:41:30,550
So Wikidata goes through
Wikidata entities
1933
01:41:30,550 --> 01:41:35,540
that don't even have the
instance of property.
1934
01:41:35,540 --> 01:41:37,520
Which is why Wikidata
doesn't know,
1935
01:41:37,520 --> 01:41:41,120
literally doesn't know, if this
is a person, or a mountain,
1936
01:41:41,120 --> 01:41:44,390
or a city, or a country,
or anything else.
1937
01:41:44,390 --> 01:41:47,150
So it asks you, because this
is the kind of question that
1938
01:41:47,150 --> 01:41:50,300
Wikidata cannot
decide on its own,
1939
01:41:50,300 --> 01:41:54,800
but for us humans it's generally
trivial to be able to say
1940
01:41:54,800 --> 01:41:58,220
whether something that we're
looking at is a person or not.
1941
01:41:58,220 --> 01:42:03,590
It gets slightly trickier when
the information is in Javanese,
1942
01:42:03,590 --> 01:42:06,470
as it is here,
rather than English.
1943
01:42:06,470 --> 01:42:10,010
So this item happens to
be described in Javanese.
1944
01:42:10,010 --> 01:42:14,360
My Javanese, spoken in
Indonesia, is very weak.
1945
01:42:14,360 --> 01:42:19,620
However, I can tell that
this is not a person.
1946
01:42:19,620 --> 01:42:20,730
How can I tell?
1947
01:42:20,730 --> 01:42:23,220
Without understanding
a word of Javanese
1948
01:42:23,220 --> 01:42:25,950
I see that it mentions
1000 kilometers
1949
01:42:25,950 --> 01:42:28,860
and square kilometers, see?
1950
01:42:28,860 --> 01:42:32,520
So this is about a
place, or an area,
1951
01:42:32,520 --> 01:42:36,090
or a region, or whatever,
but not a person.
1952
01:42:36,090 --> 01:42:39,060
So this is an
example of how even
1953
01:42:39,060 --> 01:42:41,100
without understanding
language you can sometimes
1954
01:42:41,100 --> 01:42:42,400
make a determination.
1955
01:42:42,400 --> 01:42:45,030
However, of course,
you should be sure.
1956
01:42:45,030 --> 01:42:47,700
This is definitely not
what the Wikipedia article
1957
01:42:47,700 --> 01:42:49,150
about a person looks like.
1958
01:42:49,150 --> 01:42:50,430
So this is not a person.
1959
01:42:50,430 --> 01:42:52,780
I just click it and I'm
shown the next item.
1960
01:42:52,780 --> 01:42:56,600
1961
01:42:56,600 --> 01:42:59,660
This item is in another
language I do not speak,
1962
01:42:59,660 --> 01:43:00,950
and I just don't know.
1963
01:43:00,950 --> 01:43:03,740
I do not know if this is
about a person or not.
1964
01:43:03,740 --> 01:43:07,350
So I click Not Sure.
1965
01:43:07,350 --> 01:43:11,190
This is in Swedish, and
it's about Sulawesi, still
1966
01:43:11,190 --> 01:43:13,770
Indonesia.
1967
01:43:13,770 --> 01:43:16,530
And it is not about a person.
1968
01:43:16,530 --> 01:43:18,150
I have enough Swedish for that.
1969
01:43:18,150 --> 01:43:21,750
So I click not a person.
1970
01:43:21,750 --> 01:43:24,420
Now, you may say,
well, do I really
1971
01:43:24,420 --> 01:43:28,350
have to deal with all these
languages that I don't speak?
1972
01:43:28,350 --> 01:43:29,190
The answer is no.
1973
01:43:29,190 --> 01:43:30,630
You don't have to.
1974
01:43:30,630 --> 01:43:32,580
Here at the bottom
of the Wikidata game
1975
01:43:32,580 --> 01:43:33,840
there are settings.
1976
01:43:33,840 --> 01:43:38,270
You can click that
and tell Wikidata,
1977
01:43:38,270 --> 01:43:41,840
I cannot even read
Chinese or Japanese,
1978
01:43:41,840 --> 01:43:44,600
so please don't show me
items in those languages.
1979
01:43:44,600 --> 01:43:47,060
Because I wouldn't
even be able to guess.
1980
01:43:47,060 --> 01:43:50,000
I prefer these languages in
which I can relatively easily
1981
01:43:50,000 --> 01:43:51,380
make determinations.
1982
01:43:51,380 --> 01:43:54,601
And I can even tell Wikidata to
only show me these languages.
1983
01:43:54,601 --> 01:43:55,100
You see?
1984
01:43:55,100 --> 01:43:57,350
This was not selected,
which is why I
1985
01:43:57,350 --> 01:44:00,600
was shown some other languages.
1986
01:44:00,600 --> 01:44:04,240
I could say, only use
these languages, and save.
1987
01:44:04,240 --> 01:44:06,100
And now I can try
this game again.
1988
01:44:06,100 --> 01:44:07,980
However, that can
slow it down a little.
1989
01:44:07,980 --> 01:44:09,000
So here we go.
1990
01:44:09,000 --> 01:44:11,640
Here's a Spanish-- which
is one of the languages I
1991
01:44:11,640 --> 01:44:14,640
told Wikidata game it can use.
1992
01:44:14,640 --> 01:44:16,480
This is a Spanish item.
1993
01:44:16,480 --> 01:44:19,265
Now is it about a person or not?
1994
01:44:19,265 --> 01:44:22,120
1995
01:44:22,120 --> 01:44:23,230
It is not about a person.
1996
01:44:23,230 --> 01:44:25,906
1997
01:44:25,906 --> 01:44:26,780
Is it about a person?
1998
01:44:26,780 --> 01:44:29,155
1999
01:44:29,155 --> 01:44:29,655
No.
2000
01:44:29,655 --> 01:44:32,900
2001
01:44:32,900 --> 01:44:35,180
Yes, it is, right?
2002
01:44:35,180 --> 01:44:38,550
Monk Cistercian, Pedro
de Oviedo Falconi.
2003
01:44:38,550 --> 01:44:40,890
That sounds like a person.
2004
01:44:40,890 --> 01:44:42,680
Fray Pedro nació.
2005
01:44:42,680 --> 01:44:44,960
Yeah, he was born
in Madrid 1577.
2006
01:44:44,960 --> 01:44:46,280
This is a person.
2007
01:44:46,280 --> 01:44:47,060
OK.
2008
01:44:47,060 --> 01:44:49,730
So I click person.
2009
01:44:49,730 --> 01:44:52,100
Again, if you're not
sure, click not sure.
2010
01:44:52,100 --> 01:44:55,100
The point is, just by clicking
person and as you can see
2011
01:44:55,100 --> 01:44:57,780
this would work
very well on mobile,
2012
01:44:57,780 --> 01:45:01,430
which is why I said you can
contribute on your commute.
2013
01:45:01,430 --> 01:45:04,100
You can just hold your
phone or tablet or whatever,
2014
01:45:04,100 --> 01:45:05,840
and just tap.
2015
01:45:05,840 --> 01:45:07,040
Person, not a person.
2016
01:45:07,040 --> 01:45:08,900
Person, not a person.
2017
01:45:08,900 --> 01:45:12,500
The amazing thing is that just
tapping person has actually
2018
01:45:12,500 --> 01:45:15,830
made an edit to Wikidata
on my behalf, which
2019
01:45:15,830 --> 01:45:21,560
I can find out, like every
wiki, by clicking contributions.
2020
01:45:21,560 --> 01:45:24,200
And as you can see in addition
to the stuff about Circus
2021
01:45:24,200 --> 01:45:28,340
Bulgaria, my latest edit is in
fact about this Pedro de Oviedo
2022
01:45:28,340 --> 01:45:30,130
Falconi person.
2023
01:45:30,130 --> 01:45:32,000
And the edit was, you can--
2024
01:45:32,000 --> 01:45:38,030
I hope you can see this, created
the claim instance of human.
2025
01:45:38,030 --> 01:45:39,110
So I added--
2026
01:45:39,110 --> 01:45:43,100
I mean Wikidata game
added for me the statement
2027
01:45:43,100 --> 01:45:44,180
instance of human.
2028
01:45:44,180 --> 01:45:47,780
Now, the awesome thing is
that it was super easy to do.
2029
01:45:47,780 --> 01:45:51,890
I didn't have to go into that
entity, click the Add button,
2030
01:45:51,890 --> 01:45:57,080
choose the instance of property,
choose human, hit Save.
2031
01:45:57,080 --> 01:45:59,210
Instead of all these
operations I just
2032
01:45:59,210 --> 01:46:04,250
tapped on my screen,
person, not a person.
2033
01:46:04,250 --> 01:46:10,280
And I can do hundreds of
edits during my daily commute.
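
The shortcut can be pictured as a tiny mapping from the button you tap to the statement that gets saved. This is only a sketch, not the Wikidata game's actual code: `claim_for_answer` and the placeholder item ID are hypothetical, while P31 (instance of) and Q5 (human) are the real identifiers behind "instance of human".

```python
INSTANCE_OF = "P31"  # the "instance of" property
HUMAN = "Q5"         # the item for "human"


def claim_for_answer(item_id, answer):
    """Translate one tap in the person game into the claim to save, if any.

    In this sketch only "person" produces a statement; "not a person" and
    "not sure" add nothing to the item.
    """
    if answer == "person":
        return {"entity": item_id, "property": INSTANCE_OF, "value": HUMAN}
    return None


# "Q_EXAMPLE" is a placeholder, not a real item ID.
print(claim_for_answer("Q_EXAMPLE", "person"))
print(claim_for_answer("Q_EXAMPLE", "not sure"))
```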
2034
01:46:10,280 --> 01:46:12,410
There are other games,
like the gender game.
2035
01:46:12,410 --> 01:46:14,810
So this is about--
2036
01:46:14,810 --> 01:46:17,240
this is when Wikidata
already knows
2037
01:46:17,240 --> 01:46:19,760
that this item is a
person, but it doesn't
2038
01:46:19,760 --> 01:46:21,710
know the gender of this person.
2039
01:46:21,710 --> 01:46:25,340
Which is another one of
the more basic items.
2040
01:46:25,340 --> 01:46:27,770
And this is taking a long
time because of the language
2041
01:46:27,770 --> 01:46:29,870
limitations that I set on it.
2042
01:46:29,870 --> 01:46:32,660
I guess the less exotic
languages have already
2043
01:46:32,660 --> 01:46:35,130
been exhausted in the game.
2044
01:46:35,130 --> 01:46:36,880
We don't have to
wait all this time.
2045
01:46:36,880 --> 01:46:40,280
2046
01:46:40,280 --> 01:46:44,970
We can try something else.
2047
01:46:44,970 --> 01:46:45,950
How about occupation?
2048
01:46:45,950 --> 01:46:46,850
The occupation game.
2049
01:46:46,850 --> 01:46:49,400
Here we go, this is in Russian.
2050
01:46:49,400 --> 01:46:55,540
And what is the occupation
of this gentleman?
2051
01:46:55,540 --> 01:46:58,630
Well he is an [INAUDIBLE].
2052
01:46:58,630 --> 01:47:00,700
He's a church person.
2053
01:47:00,700 --> 01:47:04,300
However, so the
occupation game is
2054
01:47:04,300 --> 01:47:06,490
where Wikidata game
will automatically
2055
01:47:06,490 --> 01:47:10,990
pull likely occupations
from the article text
2056
01:47:10,990 --> 01:47:13,810
and ask for confirmation.
2057
01:47:13,810 --> 01:47:16,840
So if he-- if this person
really is a deacon,
2058
01:47:16,840 --> 01:47:17,770
I should click that.
2059
01:47:17,770 --> 01:47:19,990
But I'm not sure.
2060
01:47:19,990 --> 01:47:24,950
I'm not clear on the Russian
church's distinctions between--
2061
01:47:24,950 --> 01:47:26,620
I mean [INAUDIBLE]
is pretty senior,
2062
01:47:26,620 --> 01:47:28,690
but I don't know if that
automatically also means
2063
01:47:28,690 --> 01:47:30,100
he's a deacon or not.
2064
01:47:30,100 --> 01:47:32,720
And [INAUDIBLE] is
not listed here.
2065
01:47:32,720 --> 01:47:36,380
So I will click not listed.
2066
01:47:36,380 --> 01:47:39,540
Also, these guesses
are not always correct.
2067
01:47:39,540 --> 01:47:42,680
So, this guy for
example, is in Russian.
2068
01:47:42,680 --> 01:47:43,430
I can read this.
2069
01:47:43,430 --> 01:47:44,470
He's a philologist.
2070
01:47:44,470 --> 01:47:45,380
He's a linguist.
2071
01:47:45,380 --> 01:47:48,510
So I can confirm it
and click linguist.
2072
01:47:48,510 --> 01:47:49,010
All right?
2073
01:47:49,010 --> 01:47:51,950
And again, if we look
at my contributions
2074
01:47:51,950 --> 01:47:55,700
we can see the Wikidata
game on my behalf
2075
01:47:55,700 --> 01:47:59,930
created occupation linguist.
2076
01:47:59,930 --> 01:48:02,450
OK.
2077
01:48:02,450 --> 01:48:04,370
Just by tapping linguist there.
2078
01:48:04,370 --> 01:48:07,040
Now if it's taken
from the article,
2079
01:48:07,040 --> 01:48:09,860
why would it ever be wrong?
2080
01:48:09,860 --> 01:48:15,970
Well Jesus was the
son of a carpenter.
2081
01:48:15,970 --> 01:48:18,870
The word carpenter
appears in the text.
2082
01:48:18,870 --> 01:48:22,840
That doesn't mean it's correct
to say Jesus was a carpenter.
2083
01:48:22,840 --> 01:48:23,340
OK?
2084
01:48:23,340 --> 01:48:24,660
Just a trivial example, right?
2085
01:48:24,660 --> 01:48:30,250
So many, many articles will say,
you know, born to a physician.
2086
01:48:30,250 --> 01:48:32,850
And so the word physician
could be guessed,
2087
01:48:32,850 --> 01:48:36,030
but it wouldn't be correct
unless the son is also
2088
01:48:36,030 --> 01:48:38,090
a physician.
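
Here is a deliberately naive sketch of why guesses pulled from article text need a human to confirm them. It is not how the Wikidata game actually produces its suggestions; the keyword list and `guess_occupations` are assumptions made up for illustration.

```python
import re

# Hypothetical list of occupation words to look for.
KNOWN_OCCUPATIONS = ["carpenter", "physician", "linguist", "deacon"]


def guess_occupations(text):
    """Suggest every known occupation word that appears anywhere in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [occupation for occupation in KNOWN_OCCUPATIONS if occupation in words]


# The word appears, but it describes the parent, not the subject of the article.
print(guess_occupations("He was born to a physician in Madrid."))
# Here the same kind of match happens to be correct.
print(guess_occupations("He is a philologist and linguist."))
```

Both calls return a suggestion, and only a reader can tell that the first one is wrong, which is why the game asks for confirmation instead of saving the guess directly.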
2089
01:48:38,090 --> 01:48:43,540
So I hope it gives
you the gist of it.
2090
01:48:43,540 --> 01:48:47,500
There is also a
distributed Wikidata game,
2091
01:48:47,500 --> 01:48:48,774
which is pretty awesome.
2092
01:48:48,774 --> 01:48:51,450
2093
01:48:51,450 --> 01:48:54,320
Here we go, which
has additional games.
2094
01:48:54,320 --> 01:49:02,610
So, for example, the
Kian game gives you,
2095
01:49:02,610 --> 01:49:06,940
maybe it gives you,
some items to play with.
2096
01:49:06,940 --> 01:49:16,610
2097
01:49:16,610 --> 01:49:17,110
Yes?
2098
01:49:17,110 --> 01:49:17,610
No?
2099
01:49:17,610 --> 01:49:18,430
OK.
2100
01:49:18,430 --> 01:49:20,830
So it gives you
this little card,
2101
01:49:20,830 --> 01:49:27,940
and asks you to confirm is this
instance of human settlement?
2102
01:49:27,940 --> 01:49:30,480
That is, is it a village,
town, city, whatever.
2103
01:49:30,480 --> 01:49:33,310
Is it a kind of human
settlement or not?
2104
01:49:33,310 --> 01:49:34,340
Or maybe it's a book.
2105
01:49:34,340 --> 01:49:35,540
Maybe it's a poem.
2106
01:49:35,540 --> 01:49:38,980
Again, so, is it a
human settlement?
2107
01:49:38,980 --> 01:49:41,500
And you can click the languages
here to see the information.
2108
01:49:41,500 --> 01:49:43,270
So I can click English.
2109
01:49:43,270 --> 01:49:44,572
And indeed the article--
2110
01:49:44,572 --> 01:49:46,030
I mean the actual
Wikipedia article
2111
01:49:46,030 --> 01:49:49,360
says Kamiji is a
town and territory
2112
01:49:49,360 --> 01:49:51,370
in this district in the Congo.
2113
01:49:51,370 --> 01:49:54,640
So yes, this is an instance
of human settlement.
2114
01:49:54,640 --> 01:49:57,580
So I clicked yes.
2115
01:49:57,580 --> 01:50:00,460
And just clicking yes
again went to that item,
2116
01:50:00,460 --> 01:50:02,740
and added property
of human settlement.
2117
01:50:02,740 --> 01:50:05,560
Now the point of
all these games is
2118
01:50:05,560 --> 01:50:08,140
these are tools,
written by programmers,
2119
01:50:08,140 --> 01:50:12,490
making kind of semi educated
guesses about these fairly
2120
01:50:12,490 --> 01:50:14,120
basic properties.
2121
01:50:14,120 --> 01:50:17,770
And they are meant to
semi automate, to assist,
2122
01:50:17,770 --> 01:50:23,730
in the accumulation of all
these important pieces of data.
2123
01:50:23,730 --> 01:50:26,640
Now every single
click here helps
2124
01:50:26,640 --> 01:50:31,000
Wikidata give better
results, richer results
2125
01:50:31,000 --> 01:50:32,380
in future queries.
2126
01:50:32,380 --> 01:50:38,130
Again, as of right now
Wikidata can include Kamiji
2127
01:50:38,130 --> 01:50:42,690
if I ask it, you know, what
are some towns in Congo?
2128
01:50:42,690 --> 01:50:44,220
Until now it could not.
2129
01:50:44,220 --> 01:50:46,830
Because it literally
didn't know.
2130
01:50:46,830 --> 01:50:51,950
So every time we click male,
female, person, not a person,
2131
01:50:51,950 --> 01:50:56,640
make these decisions,
we help improve Wikidata
2132
01:50:56,640 --> 01:51:01,560
and enrich the results
that we could receive.
2133
01:51:01,560 --> 01:51:04,590
Any questions about this, about
kind of micro contributions
2134
01:51:04,590 --> 01:51:07,010
through the Wikidata game?
2135
01:51:07,010 --> 01:51:09,890
If that looks
appealing I encourage
2136
01:51:09,890 --> 01:51:12,860
you to go and visit
the Wikidata game
2137
01:51:12,860 --> 01:51:15,205
and start contributing
in that way.
2138
01:51:15,205 --> 01:51:19,580
2139
01:51:19,580 --> 01:51:21,650
There is a question here.
2140
01:51:21,650 --> 01:51:24,650
If I make an article about
Circus Bulgaria how should
2141
01:51:24,650 --> 01:51:26,630
I correctly connect them?
2142
01:51:26,630 --> 01:51:28,740
That is an excellent question.
2143
01:51:28,740 --> 01:51:33,090
So once-- so now there is a
Wikidata item about that book,
2144
01:51:33,090 --> 01:51:37,650
but there is no Wikipedia
article anywhere.
2145
01:51:37,650 --> 01:51:41,460
Now suppose I write one,
in Bulgarian maybe,
2146
01:51:41,460 --> 01:51:42,870
you go to Wikidata.
2147
01:51:42,870 --> 01:51:45,180
You find the item by searching.
2148
01:51:45,180 --> 01:51:49,170
You find the item, and then
the empty site links section
2149
01:51:49,170 --> 01:51:50,850
right at the bottom there--
2150
01:51:50,850 --> 01:51:52,020
where are we?
2151
01:51:52,020 --> 01:51:53,100
We have this?
2152
01:51:53,100 --> 01:51:55,050
Circus Bulgaria.
2153
01:51:55,050 --> 01:51:56,010
Let's demonstrate this.
2154
01:51:56,010 --> 01:51:58,000
So here is the item
about the book.
2155
01:51:58,000 --> 01:52:01,030
Let's say that now
there is an article
2156
01:52:01,030 --> 01:52:03,670
because I just created it.
2157
01:52:03,670 --> 01:52:07,450
I can go here to the empty
Wikipedia link section,
2158
01:52:07,450 --> 01:52:11,760
click Edit, type the
name of the wiki,
2159
01:52:11,760 --> 01:52:16,430
let's say English, and then
type the name of the page
2160
01:52:16,430 --> 01:52:18,230
that I just created.
2161
01:52:18,230 --> 01:52:20,790
Circus-- right?
2162
01:52:20,790 --> 01:52:23,400
And again, it offers
me auto-complete
2163
01:52:23,400 --> 01:52:25,080
for my convenience.
2164
01:52:25,080 --> 01:52:28,260
Now we don't actually
have the article created,
2165
01:52:28,260 --> 01:52:30,480
but let's just
say this was the article.
2166
01:52:30,480 --> 01:52:33,330
I can just click this,
hit Save, and that
2167
01:52:33,330 --> 01:52:36,450
would associate the
new Wikipedia article
2168
01:52:36,450 --> 01:52:38,130
with this Wikidata item.
2169
01:52:38,130 --> 01:52:41,940
That is the beginning of the
inter-wiki list for this item.
2170
01:52:41,940 --> 01:52:43,620
I will not click
Save now, because we
2171
01:52:43,620 --> 01:52:45,289
didn't have the article yet.
2172
01:52:45,289 --> 01:52:46,830
So I hope that
answers that question.
2173
01:52:46,830 --> 01:52:50,340
Was there another question
that I missed here?
2174
01:52:50,340 --> 01:52:51,450
No.
2175
01:52:51,450 --> 01:52:53,170
OK.
2176
01:52:53,170 --> 01:52:55,300
Any questions about
the Wikidata game?
2177
01:52:55,300 --> 01:53:00,740
About this idea of
micro contributions?
2178
01:53:00,740 --> 01:53:05,330
If not then we can move
on to embedding data,
2179
01:53:05,330 --> 01:53:07,490
and after that we
can discuss queries,
2180
01:53:07,490 --> 01:53:12,000
how to get at all this
data from Wikidata.
2181
01:53:12,000 --> 01:53:16,500
So the short version of how
to embed data from Wikidata
2182
01:53:16,500 --> 01:53:19,920
is that there is this
little magic incantation.
2183
01:53:19,920 --> 01:53:25,410
Curly brace, curly brace,
hash mark, property.
2184
01:53:25,410 --> 01:53:29,820
It looks like a template, but
it isn't because of that hash.
2185
01:53:29,820 --> 01:53:31,320
And that is magic.
2186
01:53:31,320 --> 01:53:34,170
Take a look at this little
demo that I prepared.
2187
01:53:34,170 --> 01:53:37,950
This page, which is off
my user page on meta,
2188
01:53:37,950 --> 01:53:40,110
but it could be on any wiki.
2189
01:53:40,110 --> 01:53:42,490
OK.
2190
01:53:42,490 --> 01:53:49,420
Says, since San Francisco
is item Q62 in Wikidata,
2191
01:53:49,420 --> 01:53:55,240
and since population is
property P1082, I can tell you
2192
01:53:55,240 --> 01:53:58,840
that according to Wikidata the
population of San Francisco
2193
01:53:58,840 --> 01:54:02,180
is this.
2194
01:54:02,180 --> 01:54:08,420
And this bolded number here was
produced with this incantation.
2195
01:54:08,420 --> 01:54:14,420
Curly brace, curly brace,
hash mark, property P1082,
2196
01:54:14,420 --> 01:54:18,751
that's population,
pipe, from what item?
2197
01:54:18,751 --> 01:54:19,250
Right?
2198
01:54:19,250 --> 01:54:21,650
Cause I'm pulling
an arbitrary number.
2199
01:54:21,650 --> 01:54:23,570
I could put any
property in any item
2200
01:54:23,570 --> 01:54:27,020
here, and kind of include
it, embedded, into my text.
2201
01:54:27,020 --> 01:54:29,630
This isn't even about-- you
notice this is my user page.
2202
01:54:29,630 --> 01:54:32,480
This isn't even the article
about San Francisco.
2203
01:54:32,480 --> 01:54:35,210
I just want to pull that
number into this thing
2204
01:54:35,210 --> 01:54:36,410
that I'm writing.
2205
01:54:36,410 --> 01:54:38,820
So it's fairly simple.
2206
01:54:38,820 --> 01:54:40,970
I identify the property.
2207
01:54:40,970 --> 01:54:43,440
I identify the item
to take it from.
2208
01:54:43,440 --> 01:54:47,120
And Wikidata will,
I mean Wikipedia,
2209
01:54:47,120 --> 01:54:50,480
or the wiki I'm on, in this
case meta, will go to Wikidata
2210
01:54:50,480 --> 01:54:52,820
and fetch it for me.
2211
01:54:52,820 --> 01:54:56,480
Likewise, since Denny Vrandecic,
the designer of Wikidata
2212
01:54:56,480 --> 01:55:01,370
is item 18618629, right?
2213
01:55:01,370 --> 01:55:04,790
I mean, he's a notable person,
so he has a Wikidata entity.
2214
01:55:04,790 --> 01:55:09,160
And since occupation is property
106, and date of birth is 569,
2215
01:55:09,160 --> 01:55:12,290
and place of birth
is 19, because
2216
01:55:12,290 --> 01:55:14,720
of all that I can tell you
that Vrandecic was born
2217
01:55:14,720 --> 01:55:19,130
in Stuttgart, on this date,
and is researcher, programmer,
2218
01:55:19,130 --> 01:55:20,850
and computer scientist.
2219
01:55:20,850 --> 01:55:25,010
If you look at the source for
this page, click Edit Source,
2220
01:55:25,010 --> 01:55:28,700
you can see that the word
Stuttgart does not appear here,
2221
01:55:28,700 --> 01:55:30,530
because it came from Wikidata.
2222
01:55:30,530 --> 01:55:34,171
I did not write this into
my little demo page here.
2223
01:55:34,171 --> 01:55:34,670
See?
2224
01:55:34,670 --> 01:55:37,380
Place of birth is--
2225
01:55:37,380 --> 01:55:37,880
where is it?
2226
01:55:37,880 --> 01:55:38,380
Here.
2227
01:55:38,380 --> 01:55:43,790
Born in property 19 from
Q number so-and-so.
2228
01:55:43,790 --> 01:55:46,970
That is how easy
it is to pull stuff
2229
01:55:46,970 --> 01:55:51,890
into a wiki from Wikidata.
2230
01:55:51,890 --> 01:55:55,280
OK, now there's
some nuance to it.
2231
01:55:55,280 --> 01:55:57,470
And there are
some additional parameters
2232
01:55:57,470 --> 01:55:58,130
you can give.
2233
01:55:58,130 --> 01:56:00,230
And you can ask
Wikidata to give you
2234
01:56:00,230 --> 01:56:03,635
not just the text of the values,
but actually make it links.
2235
01:56:03,635 --> 01:56:06,750
2236
01:56:06,750 --> 01:56:14,825
So, for example, if I change
this from property to values--
2237
01:56:14,825 --> 01:56:25,950
2238
01:56:25,950 --> 01:56:29,142
No, that did not work at all.
2239
01:56:29,142 --> 01:56:29,850
Wasn't it values?
2240
01:56:29,850 --> 01:56:30,350
What was it?
2241
01:56:30,350 --> 01:56:33,370
2242
01:56:33,370 --> 01:56:34,614
Values and then--
2243
01:56:34,614 --> 01:57:19,265
2244
01:57:19,265 --> 01:57:19,890
Oh, statements.
2245
01:57:19,890 --> 01:57:20,710
My bad, sorry.
2246
01:57:20,710 --> 01:57:22,980
The magic word is statements.
2247
01:57:22,980 --> 01:57:24,010
Statements.
2248
01:57:24,010 --> 01:57:28,680
So going back here.
2249
01:57:28,680 --> 01:57:35,385
If I change the word property
to the word statements
2250
01:57:35,385 --> 01:57:40,890
here then this same value--
2251
01:57:40,890 --> 01:57:43,300
that did not work at all.
2252
01:57:43,300 --> 01:57:46,690
Oh, because I'm on meta.
2253
01:57:46,690 --> 01:57:48,670
So because I'm on
meta, meta doesn't
2254
01:57:48,670 --> 01:57:52,230
have an article named
researcher, programmer,
2255
01:57:52,230 --> 01:57:53,500
or computer scientist.
2256
01:57:53,500 --> 01:57:55,120
But Wikipedia does.
2257
01:57:55,120 --> 01:58:00,210
If I included this same
syntax in Wikipedia,
2258
01:58:00,210 --> 01:58:02,950
like English Wikipedia,
for example--
2259
01:58:02,950 --> 01:58:04,855
So let's go there right now.
2260
01:58:04,855 --> 01:58:11,240
2261
01:58:11,240 --> 01:58:13,480
And go-- go to my--
2262
01:58:13,480 --> 01:58:18,550
2263
01:58:18,550 --> 01:58:19,345
Go to my sandbox.
2264
01:58:19,345 --> 01:58:23,090
2265
01:58:23,090 --> 01:58:27,982
If I just brutally paste
this on my sandbox here--
2266
01:58:27,982 --> 01:58:32,690
2267
01:58:32,690 --> 01:58:35,810
So, see, these became links.
2268
01:58:35,810 --> 01:58:39,740
Because Wikipedia has an article
called programmer and computer
2269
01:58:39,740 --> 01:58:40,910
scientist.
2270
01:58:40,910 --> 01:58:43,460
So, like I said, there's
some additional nuance
2271
01:58:43,460 --> 01:58:44,840
to the embedding.
2272
01:58:44,840 --> 01:58:47,030
The important thing
is that this is
2273
01:58:47,030 --> 01:58:51,470
the key to delivering on that
first problem that I mentioned.
2274
01:58:51,470 --> 01:58:55,970
How to get data from
a central location
2275
01:58:55,970 --> 01:58:58,850
onto your wiki in your language.
2276
01:58:58,850 --> 01:59:04,460
Basically using property and
statements magic incantations.
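The two incantations demonstrated above can be sketched roughly as follows. This is only a minimal Python helper that builds the wikitext strings; the function names are hypothetical, and it assumes the `{{#property:...|from=...}}` and `{{#statements:...|from=...}}` parser-function forms offered by the Wikibase client on the wiki you are editing.

```python
def property_call(prop: str, item: str) -> str:
    """Wikitext asking for the plain-text value(s) of `prop` on `item`."""
    return "{{#property:%s|from=%s}}" % (prop, item)


def statements_call(prop: str, item: str) -> str:
    """Same lookup, but asking for formatted output (links where the wiki has them)."""
    return "{{#statements:%s|from=%s}}" % (prop, item)


# IDs quoted in the talk: item 18618629, place of birth 19, occupation 106.
DENNY = "Q18618629"

# Hypothetical demo sentence, similar in spirit to the user-page example.
demo = (
    "Born in " + property_call("P19", DENNY)
    + ", and is " + statements_call("P106", DENNY) + "."
)
print(demo)
```

Pasting the printed line into a sandbox page is one way to try it; as in the demo, the links only appear on a wiki that actually has the target articles.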
2277
01:59:04,460 --> 01:59:07,100
And of course,
usually, this would be
2278
01:59:07,100 --> 01:59:10,010
in the context of an infobox.
2279
01:59:10,010 --> 01:59:14,180
Some wikis-- English Wikipedia
is not leading the way there.
2280
01:59:14,180 --> 01:59:16,490
Some smaller wikis
are more advanced
2281
01:59:16,490 --> 01:59:22,070
actually in integrating
Wikidata embeddings like this
2282
01:59:22,070 --> 01:59:24,620
into their infoboxes.
2283
01:59:24,620 --> 01:59:26,300
So that instead of
the infobox just
2284
01:59:26,300 --> 01:59:30,620
being a template on the wiki
with field equals value,
2285
01:59:30,620 --> 01:59:31,685
field equals value.
2286
01:59:31,685 --> 01:59:35,700
That template of the
infobox on the wiki
2287
01:59:35,700 --> 01:59:40,160
pulls the values, the birthdate,
the languages, et cetera,
2288
01:59:40,160 --> 01:59:44,210
pulls them from Wikidata.
2289
01:59:44,210 --> 01:59:49,820
So basically just-- I just
demonstrated single calls
2290
01:59:49,820 --> 01:59:52,550
to this, but of course
an infobox template
2291
01:59:52,550 --> 01:59:56,270
would include maybe
20 or 40 such embeds,
2292
01:59:56,270 --> 01:59:57,710
and that is not a problem.
2293
01:59:57,710 --> 02:00:01,460
Of course, before you go and
edit the English Wikipedia's
2294
02:00:01,460 --> 02:00:06,050
Infobox person and replace
it all with Wikidata embeds,
2295
02:00:06,050 --> 02:00:09,050
you should discuss it with the
English Wikipedia community.
2296
02:00:09,050 --> 02:00:12,000
These discussions have
already been taking place.
2297
02:00:12,000 --> 02:00:13,640
There are some
concerns about how
2298
02:00:13,640 --> 02:00:17,150
to patrol this, how to keep
it newbie friendly, et cetera.
2299
02:00:17,150 --> 02:00:20,690
So there are legitimate concerns
with just moving everything
2300
02:00:20,690 --> 02:00:22,910
to be embedded from Wikidata.
2301
02:00:22,910 --> 02:00:26,450
But the communities are
gradually handling this.
2302
02:00:26,450 --> 02:00:29,390
I mean this ability to embed
from Wikidata is not very old.
2303
02:00:29,390 --> 02:00:31,550
It's been around
for about a year.
2304
02:00:31,550 --> 02:00:35,150
So communities are
still working on kind
2305
02:00:35,150 --> 02:00:37,560
of integrating that technology.
2306
02:00:37,560 --> 02:00:40,190
But that is kind
of just the basics of how
2307
02:00:40,190 --> 02:00:44,210
to pull data, individual bits
of data, that's not querying,
2308
02:00:44,210 --> 02:00:47,330
that's not asking those sweeping
questions that I was talking
2309
02:00:47,330 --> 02:00:48,850
about yet.
2310
02:00:48,850 --> 02:00:50,720
We'll get to that.
Right now, this is
2311
02:00:50,720 --> 02:00:55,310
how to pull a specific datum,
a specific piece of data,
2312
02:00:55,310 --> 02:00:57,395
from Wikidata.
2313
02:00:57,395 --> 02:01:01,530
2314
02:01:01,530 --> 02:01:02,530
OK.
2315
02:01:02,530 --> 02:01:07,080
So here's another quick
thing to demonstrate
2316
02:01:07,080 --> 02:01:09,880
before we go to
queries, and that
2317
02:01:09,880 --> 02:01:12,010
is the article placeholder.
2318
02:01:12,010 --> 02:01:15,010
The article placeholder
is a feature
2319
02:01:15,010 --> 02:01:19,660
that is being tested on the
Esperanto Wikipedia, and maybe
2320
02:01:19,660 --> 02:01:22,180
another wiki, I don't remember.
2321
02:01:22,180 --> 02:01:28,490
And it is using the
potential of Wikidata
2322
02:01:28,490 --> 02:01:32,690
to offer a placeholder
for an article.
2323
02:01:32,690 --> 02:01:37,940
An automatically generated
Wikidata powered replacement
2324
02:01:37,940 --> 02:01:41,720
placeholder for an article
for articles that don't yet
2325
02:01:41,720 --> 02:01:45,950
exist on Esperanto.
2326
02:01:45,950 --> 02:01:50,440
So let's go to the
Esperanto Wikipedia.
2327
02:01:50,440 --> 02:01:52,440
I don't speak Esperanto.
2328
02:01:52,440 --> 02:01:56,760
But let's look for Helen
Dewitt, our friend,
2329
02:01:56,760 --> 02:01:58,170
in Esperanto Wikipedia.
2330
02:01:58,170 --> 02:02:00,270
Now Esperanto is not
one of the Wikipedias
2331
02:02:00,270 --> 02:02:03,060
that have an article
about Helen Dewitt.
2332
02:02:03,060 --> 02:02:04,890
And so it tells me that, right?
2333
02:02:04,890 --> 02:02:06,570
There is no Helen Dewitt.
2334
02:02:06,570 --> 02:02:08,670
Maybe you were looking
for Helena Dewitt.
2335
02:02:08,670 --> 02:02:10,200
No, I was not.
2336
02:02:10,200 --> 02:02:13,650
You can start an article
about Helen Dewitt.
2337
02:02:13,650 --> 02:02:15,390
You can search.
2338
02:02:15,390 --> 02:02:17,820
You know, there's
all this stuff.
2339
02:02:17,820 --> 02:02:24,180
But there is also this
little option here, hiding,
2340
02:02:24,180 --> 02:02:30,640
which tells me that the
Esperanto Wikipedia is--
2341
02:02:30,640 --> 02:02:31,580
what's happening here?
2342
02:02:31,580 --> 02:02:35,140
2343
02:02:35,140 --> 02:02:35,890
Yes.
2344
02:02:35,890 --> 02:02:40,520
The Esperanto Wikipedia is
ready to give me this page.
2345
02:02:40,520 --> 02:02:44,020
This page, as you can see, it's
on the Esperanto Wikipedia,
2346
02:02:44,020 --> 02:02:46,090
but it's not an article.
2347
02:02:46,090 --> 02:02:47,480
See, it's a special page.
2348
02:02:47,480 --> 02:02:49,700
It's machine generated.
2349
02:02:49,700 --> 02:02:52,150
You can see the URL as well.
2350
02:02:52,150 --> 02:02:54,410
It's not, you know,
slash Helen Dewitt.
2351
02:02:54,410 --> 02:02:58,450
It's slash Specialaĵo,
about topic,
2352
02:02:58,450 --> 02:03:01,570
and then the Wikidata
ID of Helen Dewitt.
2353
02:03:01,570 --> 02:03:03,760
And what I get here--
2354
02:03:03,760 --> 02:03:05,860
I get an English
description, by the way,
2355
02:03:05,860 --> 02:03:08,300
because there is no
Esperanto description.
2356
02:03:08,300 --> 02:03:10,420
Wikidata can't make it up.
2357
02:03:10,420 --> 02:03:13,600
But what it can do is
offer me these pieces
2358
02:03:13,600 --> 02:03:16,960
of data in my language,
in this case Esperanto.
2359
02:03:16,960 --> 02:03:18,921
I'm on the Esperanto Wikipedia.
2360
02:03:18,921 --> 02:03:19,420
OK.
2361
02:03:19,420 --> 02:03:23,380
So it tells me that she's
American, for example,
2362
02:03:23,380 --> 02:03:26,090
and it tells me
that in Esperanto.
2363
02:03:26,090 --> 02:03:29,350
OK and it tells me
that she speaks Latin.
2364
02:03:29,350 --> 02:03:32,410
Remember we taught
Wikidata that?
2365
02:03:32,410 --> 02:03:35,800
It tells me that she
was educated in Oxford,
2366
02:03:35,800 --> 02:03:38,050
you know, and gives me the
references to the extent
2367
02:03:38,050 --> 02:03:39,130
that they exist.
2368
02:03:39,130 --> 02:03:41,560
I mean this is not an article.
2369
02:03:41,560 --> 02:03:46,650
It's not, you know, paragraphs
of fluent Esperanto text.
2370
02:03:46,650 --> 02:03:50,190
But it is information
that I can understand
2371
02:03:50,190 --> 02:03:51,960
if I speak this language.
2372
02:03:51,960 --> 02:03:55,380
And it's better than nothing.
2373
02:03:55,380 --> 02:04:00,120
And remember Helen Dewitt was
not a very detailed article.
2374
02:04:00,120 --> 02:04:03,690
If I were to ask about, I
don't know, some politician,
2375
02:04:03,690 --> 02:04:08,340
or popular singer that
has more data in Wikidata,
2376
02:04:08,340 --> 02:04:12,690
then this machine-generated
thing would have been richer.
2377
02:04:12,690 --> 02:04:16,320
So this feature is available
and is under beta testing
2378
02:04:16,320 --> 02:04:19,530
right now, but generally if
this sounds interesting for you
2379
02:04:19,530 --> 02:04:21,600
especially if you come
from a smaller wiki that
2380
02:04:21,600 --> 02:04:25,230
is missing a lot of articles
that people may want to learn
2381
02:04:25,230 --> 02:04:28,320
about, you can contact
the Wikimedia Foundation
2382
02:04:28,320 --> 02:04:33,486
and ask for article placeholder
to be enabled on your wiki.
2383
02:04:33,486 --> 02:04:34,860
And again, this
is a placeholder.
2384
02:04:34,860 --> 02:04:37,890
Of course, it exists only
until someone actually
2385
02:04:37,890 --> 02:04:43,290
writes a proper Esperanto
article about Helen Dewitt.
2386
02:04:43,290 --> 02:04:45,060
So I hope this is clear.
2387
02:04:45,060 --> 02:04:50,810
This is all coming from
Wikidata on the fly.
2388
02:04:50,810 --> 02:04:51,470
In real time.
2389
02:04:51,470 --> 02:04:57,500
As you can see it includes my
latest edits to Helen Dewitt.
2390
02:04:57,500 --> 02:04:58,940
OK.
2391
02:04:58,940 --> 02:05:05,250
Questions about the-- questions
about the article placeholder?
2392
02:05:05,250 --> 02:05:09,580
If there are, try and
put them on the channel.
2393
02:05:09,580 --> 02:05:13,300
And this brings us to one of
the main courses of this talk,
2394
02:05:13,300 --> 02:05:15,270
which is querying Wikidata.
2395
02:05:15,270 --> 02:05:18,660
So I've explained
how Wikidata works.
2396
02:05:18,660 --> 02:05:19,680
We've walked through it.
2397
02:05:19,680 --> 02:05:20,850
We've added to it.
2398
02:05:20,850 --> 02:05:22,800
We've created a new item.
2399
02:05:22,800 --> 02:05:26,360
We learned how to contribute
during our commutes.
2400
02:05:26,360 --> 02:05:30,150
And all this was-- you
kept promising us,
2401
02:05:30,150 --> 02:05:32,050
Asaf, that this would be--
2402
02:05:32,050 --> 02:05:34,690
this would enable
these amazing queries.
2403
02:05:34,690 --> 02:05:37,960
So time to make good on that.
2404
02:05:37,960 --> 02:05:42,880
The URL you need to remember
is query.wikidata.org.
2405
02:05:42,880 --> 02:05:49,390
And that will take you
to a query system that
2406
02:05:49,390 --> 02:05:52,510
uses a language called SPARQL.
2407
02:05:52,510 --> 02:05:58,150
SPARQL, spelt with
a Q. This language
2408
02:05:58,150 --> 02:06:01,690
is not a Wikimedia creation.
2409
02:06:01,690 --> 02:06:06,010
It's a standardized language
used for querying linked data
2410
02:06:06,010 --> 02:06:07,540
sources.
2411
02:06:07,540 --> 02:06:10,720
And because of that
there
2412
02:06:10,720 --> 02:06:14,590
are certain usability prices
that we pay for using SPARQL,
2413
02:06:14,590 --> 02:06:16,010
for using a standard language.
2414
02:06:16,010 --> 02:06:19,570
It's not completely custom
made for querying Wikidata,
2415
02:06:19,570 --> 02:06:21,740
and we'll see that
in just a moment.
2416
02:06:21,740 --> 02:06:23,530
The principle to
remember about Wikidata
2417
02:06:23,530 --> 02:06:27,880
query is that Wikidata will
tell you everything it knows,
2418
02:06:27,880 --> 02:06:29,470
but no more.
2419
02:06:29,470 --> 02:06:32,440
I have anticipated this
several times already, right?
2420
02:06:32,440 --> 02:06:35,980
Until this moment when
we taught Wikidata
2421
02:06:35,980 --> 02:06:38,590
that Helen Dewitt
speaks Latin, she
2422
02:06:38,590 --> 02:06:41,500
would not have appeared
in query results
2423
02:06:41,500 --> 02:06:45,974
asking who are American
writers who speak Latin?
2424
02:06:45,974 --> 02:06:47,140
She would not have appeared.
2425
02:06:47,140 --> 02:06:49,090
But as of this
afternoon, she will
2426
02:06:49,090 --> 02:06:52,950
appear because I've added
that piece of information.
2427
02:06:52,950 --> 02:07:01,380
So a result of that principle
is that you can never say,
2428
02:07:01,380 --> 02:07:05,950
well I ran a Wikidata
query and this
2429
02:07:05,950 --> 02:07:11,510
is the list of Flemish painters
who are sons of painters.
2430
02:07:11,510 --> 02:07:12,310
The list.
2431
02:07:12,310 --> 02:07:14,110
That these are all
the Flemish painters
2432
02:07:14,110 --> 02:07:15,220
who are sons of painters.
2433
02:07:15,220 --> 02:07:19,390
That is never something you can
say based on a Wikidata query,
2434
02:07:19,390 --> 02:07:22,390
because of course, maybe
not all the Flemish painters
2435
02:07:22,390 --> 02:07:26,020
who are sons of painters have
been expressed in Wikidata
2436
02:07:26,020 --> 02:07:26,760
yet.
2437
02:07:26,760 --> 02:07:28,840
Wikidata doesn't know
about some of them,
2438
02:07:28,840 --> 02:07:30,340
or maybe it knows
about all of them
2439
02:07:30,340 --> 02:07:32,500
but doesn't know
the important fact
2440
02:07:32,500 --> 02:07:35,200
that this person is
the son of that person,
2441
02:07:35,200 --> 02:07:38,740
because those properties
have not been added.
2442
02:07:38,740 --> 02:07:40,940
And so they cannot be
included in the results.
2443
02:07:40,940 --> 02:07:42,550
So the results of
a Wikidata query
2444
02:07:42,550 --> 02:07:46,870
are never the definitive sets.
2445
02:07:46,870 --> 02:07:49,600
What you can say about
a Wikidata query is here
2446
02:07:49,600 --> 02:07:52,840
are some Flemish painters
who are sons of painters.
2447
02:07:52,840 --> 02:07:56,260
Here are some cities
with female mayors.
2448
02:07:56,260 --> 02:07:58,270
Whatever it is
you're querying about
2449
02:07:58,270 --> 02:08:01,030
is never guaranteed
to be complete
2450
02:08:01,030 --> 02:08:03,580
because Wikidata,
like Wikipedia, is
2451
02:08:03,580 --> 02:08:05,530
a work in progress.
2452
02:08:05,530 --> 02:08:13,240
And of course, the more
we teach Wikidata the
2453
02:08:13,240 --> 02:08:16,240
more useful it becomes.
2454
02:08:16,240 --> 02:08:22,520
OK, so let's go and
see those queries.
2455
02:08:22,520 --> 02:08:25,990
So this is query.wikidata.org.
2456
02:08:25,990 --> 02:08:29,000
It's not the wiki.
2457
02:08:29,000 --> 02:08:29,500
All right?
2458
02:08:29,500 --> 02:08:32,530
So this isn't like some
page on the wiki itself.
2459
02:08:32,530 --> 02:08:35,099
This is kind of an
external system.
2460
02:08:35,099 --> 02:08:35,890
So it's not a wiki.
2461
02:08:35,890 --> 02:08:37,960
You can see I don't
have a user page here.
2462
02:08:37,960 --> 02:08:39,520
I don't have a history tab.
2463
02:08:39,520 --> 02:08:40,960
This isn't a wiki page.
2464
02:08:40,960 --> 02:08:44,560
This is a special kind
of tool or system.
2465
02:08:44,560 --> 02:08:51,330
And it invites me to
input a SPARQL query.
2466
02:08:51,330 --> 02:08:55,060
Now most of us do
not speak SPARQL.
2467
02:08:55,060 --> 02:08:59,800
It's a technical language.
2468
02:08:59,800 --> 02:09:01,720
It's a query language.
2469
02:09:01,720 --> 02:09:06,760
Some of you may be thinking
about SQL, the database query
2470
02:09:06,760 --> 02:09:08,500
language.
2471
02:09:08,500 --> 02:09:13,330
SPARQL is named with kind
of a wink, or a nod, to SQL.
2472
02:09:13,330 --> 02:09:17,440
But, I warn you, if
you are comfortable in
2473
02:09:17,440 --> 02:09:22,750
SQL don't expect to carry
over your knowledge of SQL
2474
02:09:22,750 --> 02:09:23,550
into SPARQL.
2475
02:09:23,550 --> 02:09:26,140
They're not the same.
2476
02:09:26,140 --> 02:09:27,940
They are superficially similar.
2477
02:09:27,940 --> 02:09:28,440
Right?
2478
02:09:28,440 --> 02:09:31,530
So they both use
the keyword select,
2479
02:09:31,530 --> 02:09:35,010
and they use the word where,
and they use things like limit,
2480
02:09:35,010 --> 02:09:35,770
and order.
2481
02:09:35,770 --> 02:09:38,190
So again, if you know
this already from SQL
2482
02:09:38,190 --> 02:09:40,500
those mean roughly
the same things,
2483
02:09:40,500 --> 02:09:44,550
but don't expect it to
behave just like SQL.
2484
02:09:44,550 --> 02:09:49,800
You do need to spend some time
understanding how SPARQL works.
2485
02:09:49,800 --> 02:09:52,560
So, by all means, I
invite you to go and read
2486
02:09:52,560 --> 02:09:55,680
one of the many fine
SPARQL tutorials that
2487
02:09:55,680 --> 02:09:59,590
are out there on the web, or
to click the Help button here,
2488
02:09:59,590 --> 02:10:03,930
which also includes
help about SPARQL.
2489
02:10:03,930 --> 02:10:08,440
But I also know
that most of us when
2490
02:10:08,440 --> 02:10:12,580
we want to do some advanced
formatting on wiki,
2491
02:10:12,580 --> 02:10:16,090
for example, we don't go
and read the help page
2492
02:10:16,090 --> 02:10:18,220
on templates, right?
2493
02:10:18,220 --> 02:10:21,460
We go to a page that already
does what we want to do,
2494
02:10:21,460 --> 02:10:27,430
and adopt and adapt the code
from that other page, right?
2495
02:10:27,430 --> 02:10:30,610
So we just take something that
does roughly what we want,
2496
02:10:30,610 --> 02:10:33,280
and just copy it over and
change what we need to change.
2497
02:10:33,280 --> 02:10:35,620
That is a very pragmatic
and reasonable way
2498
02:10:35,620 --> 02:10:37,420
to do things which is why--
2499
02:10:37,420 --> 02:10:39,850
and the Wikidata
engineers know this,
2500
02:10:39,850 --> 02:10:43,300
which is why they prepared
this very handy button for us
2501
02:10:43,300 --> 02:10:45,580
called examples.
2502
02:10:45,580 --> 02:10:47,710
We click the examples button.
2503
02:10:47,710 --> 02:10:52,390
And, oh my god, there is a ton
of-- well there's 312 example
2504
02:10:52,390 --> 02:10:55,582
queries for us to choose from.
2505
02:10:55,582 --> 02:10:57,040
And we can just
pick something that
2506
02:10:57,040 --> 02:11:00,310
is roughly like what
we're trying to find out,
2507
02:11:00,310 --> 02:11:02,740
and then just change
what needs changing.
2508
02:11:02,740 --> 02:11:05,410
So let's take a very simple one.
2509
02:11:05,410 --> 02:11:07,020
The cats query.
2510
02:11:07,020 --> 02:11:10,270
Maybe one of the simplest
you could possibly have.
2511
02:11:10,270 --> 02:11:13,510
And let's run it first
and then I'll kind of
2512
02:11:13,510 --> 02:11:16,420
walk you through it.
2513
02:11:16,420 --> 02:11:18,460
The goal here is not
to teach you SPARQL,
2514
02:11:18,460 --> 02:11:20,860
but to get you to be kind
of literate in SPARQL.
2515
02:11:20,860 --> 02:11:23,980
To kind of understand why
this does what it does.
2516
02:11:23,980 --> 02:11:25,730
So let's run this query first.
2517
02:11:25,730 --> 02:11:31,390
We click Run and here I
have results at the bottom.
2518
02:11:31,390 --> 02:11:34,060
The item, which is
just a Wikidata item,
2519
02:11:34,060 --> 02:11:35,290
which of course is a number.
2520
02:11:35,290 --> 02:11:38,860
Remember, Wikidata thinks
of items as Q numbers.
2521
02:11:38,860 --> 02:11:40,900
And the label,
because we're humans
2522
02:11:40,900 --> 02:11:43,190
and we prefer words to numbers.
2523
02:11:43,190 --> 02:11:49,870
So these 114 results
are all the cats
2524
02:11:49,870 --> 02:11:53,310
that Wikidata knows about.
2525
02:11:53,310 --> 02:11:55,380
Is this all the
cats in the world?
2526
02:11:55,380 --> 02:11:57,320
No, of course not, remember?
2527
02:11:57,320 --> 02:11:59,730
It's all the cats Wikidata
knows about, which
2528
02:11:59,730 --> 02:12:01,410
means they're somehow notable.
2529
02:12:01,410 --> 02:12:05,130
I mean someone bothered to
describe them on Wikidata.
2530
02:12:05,130 --> 02:12:12,570
And Wikidata was told this
item is an instance of cat.
2531
02:12:12,570 --> 02:12:13,620
Right?
2532
02:12:13,620 --> 02:12:17,040
So these are those cats.
2533
02:12:17,040 --> 02:12:18,540
And we can click any of them.
2534
02:12:18,540 --> 02:12:20,190
I don't know,
Pixel, for example.
2535
02:12:20,190 --> 02:12:21,780
Click the Wikidata item.
2536
02:12:21,780 --> 02:12:24,090
And here is the Wikidata
item about Pixel
2537
02:12:24,090 --> 02:12:25,860
with the queue number.
2538
02:12:25,860 --> 02:12:28,980
And he is a tortoiseshell cat.
2539
02:12:28,980 --> 02:12:32,640
And as you can see
instance of cat.
2540
02:12:32,640 --> 02:12:33,610
OK.
2541
02:12:33,610 --> 02:12:37,220
And he is five inches high.
2542
02:12:37,220 --> 02:12:41,780
And he is apparently documented
in Indonesian, in Bahasa.
2543
02:12:41,780 --> 02:12:45,080
Right here this is Pixel.
2544
02:12:45,080 --> 02:12:50,060
And he is apparently somehow
related to the Guinness World
2545
02:12:50,060 --> 02:12:52,160
Records book.
2546
02:12:52,160 --> 02:12:54,650
I don't speak Bahasa, so
I don't know exactly why
2547
02:12:54,650 --> 02:12:56,120
this cat is so notable.
2548
02:12:56,120 --> 02:12:58,889
But, of course, cats
can become notable
2549
02:12:58,889 --> 02:12:59,930
for all kinds of reasons.
2550
02:12:59,930 --> 02:13:02,204
Maybe they're a
YouTube sensation,
2551
02:13:02,204 --> 02:13:03,620
you know, maybe
they were involved
2552
02:13:03,620 --> 02:13:05,330
in some historical event.
2553
02:13:05,330 --> 02:13:09,410
I like this cat named Gladstone.
2554
02:13:09,410 --> 02:13:16,590
This cat named Gladstone is--
2555
02:13:16,590 --> 02:13:19,950
he has position
held Chief Mouser
2556
02:13:19,950 --> 02:13:22,320
to Her Majesty's Treasury.
2557
02:13:22,320 --> 02:13:25,230
This is an official
cat with a job.
2558
02:13:25,230 --> 02:13:29,190
And he has been holding this
job, mind you, since the 28th
2559
02:13:29,190 --> 02:13:31,570
of June this past year.
2560
02:13:31,570 --> 02:13:32,970
That's the start time.
2561
02:13:32,970 --> 02:13:35,760
And there is no end time
which means he currently
2562
02:13:35,760 --> 02:13:38,850
holds the position
of Chief Mouser
2563
02:13:38,850 --> 02:13:40,470
to Her Majesty's Treasury.
2564
02:13:40,470 --> 02:13:42,750
His employer is Her
Majesty's Treasury.
2565
02:13:42,750 --> 02:13:44,290
He's a male creature.
2566
02:13:44,290 --> 02:13:46,650
And Wikidata knows
that this cat is
2567
02:13:46,650 --> 02:13:53,127
named after William Gladstone,
the Victorian prime minister.
2568
02:13:53,127 --> 02:13:54,960
Of course if I don't
know who this person is
2569
02:13:54,960 --> 02:13:57,540
I can click through
and learn that he
2570
02:13:57,540 --> 02:14:01,860
was a liberal politician
and prime minister, right?
2571
02:14:01,860 --> 02:14:03,390
He even has a Twitter account.
2572
02:14:03,390 --> 02:14:05,910
And Wikidata sends
me right to it.
2573
02:14:05,910 --> 02:14:08,040
The treasury cat
Twitter account.
2574
02:14:08,040 --> 02:14:11,010
And he has articles in
German, and English,
2575
02:14:11,010 --> 02:14:15,520
and of course Japanese,
because he's a cat.
2576
02:14:15,520 --> 02:14:16,020
All right.
2577
02:14:16,020 --> 02:14:19,500
So this was a very simple query.
2578
02:14:19,500 --> 02:14:21,400
Let's find out why it works.
2579
02:14:21,400 --> 02:14:21,900
OK.
2580
02:14:21,900 --> 02:14:25,800
So what did we actually
tell Wikidata to do for us?
2581
02:14:25,800 --> 02:14:31,650
We said, please select
some items for us
2582
02:14:31,650 --> 02:14:33,580
along with their labels.
2583
02:14:33,580 --> 02:14:34,080
OK?
2584
02:14:34,080 --> 02:14:36,180
Along with their
human readable labels
2585
02:14:36,180 --> 02:14:42,010
because if I remove this
label what I get is, see,
2586
02:14:42,010 --> 02:14:44,200
just a list of item numbers.
2587
02:14:44,200 --> 02:14:45,280
That's not as fun.
2588
02:14:45,280 --> 02:14:46,930
So that's what this
little bit did.
2589
02:14:46,930 --> 02:14:49,630
I just said, give me the
items, but also their
2590
02:14:49,630 --> 02:14:52,330
human readable label.
2591
02:14:52,330 --> 02:14:54,620
And I want you to
select a bunch of items,
2592
02:14:54,620 --> 02:14:56,770
but not just any
random bunch of items,
2593
02:14:56,770 --> 02:15:01,210
I want to select items where
a certain condition holds.
2594
02:15:01,210 --> 02:15:02,790
What is the condition?
2595
02:15:02,790 --> 02:15:06,430
The condition is that the
item that I want you to select
2596
02:15:06,430 --> 02:15:14,360
needs to have property
31 with a value of Q146.
2597
02:15:14,360 --> 02:15:15,670
Well, that's helpful.
2598
02:15:15,670 --> 02:15:18,070
If I hover over these numbers--
2599
02:15:18,070 --> 02:15:19,750
Again, I get the human
readable version.
2600
02:15:19,750 --> 02:15:23,530
So I'm looking for
items that have property
2601
02:15:23,530 --> 02:15:28,841
instance of with the value cat.
2602
02:15:28,841 --> 02:15:29,340
Right?
2603
02:15:29,340 --> 02:15:31,173
Because that's literally
what I want, right?
2604
02:15:31,173 --> 02:15:33,960
I want all the items that have
a property, a statement, that
2605
02:15:33,960 --> 02:15:36,840
says instance of cat.
2606
02:15:36,840 --> 02:15:37,950
That's the condition.
2607
02:15:37,950 --> 02:15:41,640
I'm not interested in items
that are instance of book,
2608
02:15:41,640 --> 02:15:43,200
or instance of human.
2609
02:15:43,200 --> 02:15:46,290
I'm interested in
instance of cat.
2610
02:15:46,290 --> 02:15:51,090
That is the only condition
here in this query.
2611
02:15:51,090 --> 02:15:55,800
This complicated line I ask
you to basically ignore.
2612
02:15:55,800 --> 02:15:57,510
This is one of those
sacrifices that we
2613
02:15:57,510 --> 02:16:00,720
make for using a standard
language like SPARQL.
2614
02:16:00,720 --> 02:16:02,820
But the role of this
complicated line
2615
02:16:02,820 --> 02:16:04,920
is to basically
ensure that we get
2616
02:16:04,920 --> 02:16:07,860
the English label for that cat.
2617
02:16:07,860 --> 02:16:08,817
OK?
2618
02:16:08,817 --> 02:16:09,900
So don't worry about that.
2619
02:16:09,900 --> 02:16:11,550
Just leave it there.
2620
02:16:11,550 --> 02:16:13,320
And we run the query
and we get the list
2621
02:16:13,320 --> 02:16:17,330
of cats with their English
labels, and that is awesome.
2622
02:16:17,330 --> 02:16:21,510
By the way, if I change EN,
without really understanding
2623
02:16:21,510 --> 02:16:27,260
this line, if I change
EN to HE, for Hebrew,
2624
02:16:27,260 --> 02:16:30,160
I get the same results
with a Hebrew label.
2625
02:16:30,160 --> 02:16:33,670
Of course, these cats,
nobody bothered to give them
2626
02:16:33,670 --> 02:16:35,709
Hebrew labels unfortunately.
2627
02:16:35,709 --> 02:16:37,570
So I get the queue number.
2628
02:16:37,570 --> 02:16:42,874
But if I changed
it to Japanese, JA,
2629
02:16:42,874 --> 02:16:45,290
I would get still a bunch of
Q numbers for where there
2630
02:16:45,290 --> 02:16:47,389
isn't a Japanese label,
but I would get the labels
2631
02:16:47,389 --> 02:16:48,781
in Japanese.
2632
02:16:48,781 --> 02:16:49,280
OK?
2633
02:16:49,280 --> 02:16:51,260
So this is an example
of how you don't even
2634
02:16:51,260 --> 02:16:54,620
need to understand all
the syntax of this query
2635
02:16:54,620 --> 02:16:56,100
to adapt it to your needs.
2636
02:16:56,100 --> 02:16:58,070
If you want this
query as is, but you
2637
02:16:58,070 --> 02:17:00,320
want the labels in
Japanese, you can just
2638
02:17:00,320 --> 02:17:03,190
change the language code here.
2639
02:17:03,190 --> 02:17:06,559
OK, so that is all
this query does.
2640
02:17:06,559 --> 02:17:08,870
Again, just give
me the items that
2641
02:17:08,870 --> 02:17:17,590
have property 31, instance of,
with a value 146, which is cat.
2642
02:17:17,590 --> 02:17:20,379
Let's take a question just
about this very simple query
2643
02:17:20,379 --> 02:17:25,809
before we advance to
more complicated queries.
2644
02:17:25,809 --> 02:17:29,200
Any questions just about this?
2645
02:17:29,200 --> 02:17:32,850
Like, did anyone kind of
really lose me talking
2646
02:17:32,850 --> 02:17:35,010
about this simple query?
2647
02:17:35,010 --> 02:17:39,389
Again, this query just tells
Wikidata, get me all the items
2648
02:17:39,389 --> 02:17:41,280
that somewhere among
their statements
2649
02:17:41,280 --> 02:17:44,219
have instance of cat.
2650
02:17:44,219 --> 02:17:46,670
That's the only condition.
2651
02:17:46,670 --> 02:17:47,740
No questions.
2652
02:17:47,740 --> 02:17:49,959
OK, feel free to ask if
you come up with one.
2653
02:17:49,959 --> 02:17:54,709
So let's complicate
things a little.
2654
02:17:54,709 --> 02:17:59,365
Let's ask only for male cats.
2655
02:17:59,365 --> 02:18:02,080
2656
02:18:02,080 --> 02:18:03,070
OK.
2657
02:18:03,070 --> 02:18:07,330
Remember this cat
Gladstone is male,
2658
02:18:07,330 --> 02:18:09,850
and we know this because
he has a property called
2659
02:18:09,850 --> 02:18:14,320
sex or gender, and the value
is male creature, right?
2660
02:18:14,320 --> 02:18:17,950
So let's add another
condition right here
2661
02:18:17,950 --> 02:18:19,860
under the first condition.
2662
02:18:19,860 --> 02:18:20,870
OK?
2663
02:18:20,870 --> 02:18:22,750
This is a new line.
2664
02:18:22,750 --> 02:18:24,940
And I'm adding a new
condition to the query.
2665
02:18:24,940 --> 02:18:30,520
I'm saying, not only do I
want this item that you return
2666
02:18:30,520 --> 02:18:35,469
to be instance of cat, I
also want this same item
2667
02:18:35,469 --> 02:18:39,280
to have another property,
the property sex or gender.
2668
02:18:39,280 --> 02:18:40,299
Right?
2669
02:18:40,299 --> 02:18:43,480
And I need to refer to
the property by number.
2670
02:18:43,480 --> 02:18:45,760
But don't worry,
Wikidata will help you.
2671
02:18:45,760 --> 02:18:49,500
So you start with this
prefix, WDT.
2672
02:18:49,500 --> 02:18:52,520
2673
02:18:52,520 --> 02:18:54,980
Again, just ignore
that prefix; it's
2674
02:18:54,980 --> 02:18:58,940
one of the features of SPARQL
that we need to respect.
2675
02:18:58,940 --> 02:19:02,715
WDT colon, and then I can
just type control space
2676
02:19:02,715 --> 02:19:04,340
to do a search, to
do an auto-complete.
2677
02:19:04,340 --> 02:19:08,090
So I can just type sex
and Wikidata helpfully
2678
02:19:08,090 --> 02:19:11,760
offers me a drop-down
with relevant properties.
2679
02:19:11,760 --> 02:19:15,200
So I click property 21, which
is the sex or gender property.
2680
02:19:15,200 --> 02:19:17,629
And then I say, so I want
the sex or gender property
2681
02:19:17,629 --> 02:19:19,670
to have the Wikidata value.
2682
02:19:19,670 --> 02:19:21,799
Again, control space.
2683
02:19:21,799 --> 02:19:25,340
And I can just
say male creature.
2684
02:19:25,340 --> 02:19:25,850
See?
2685
02:19:25,850 --> 02:19:30,950
There's a different item
for male, as in human,
2686
02:19:30,950 --> 02:19:33,799
and a different one for
male creature, for reasons
2687
02:19:33,799 --> 02:19:34,910
that we won't go into.
2688
02:19:34,910 --> 02:19:36,535
Let's pick male
creature, because we're
2689
02:19:36,535 --> 02:19:38,040
talking about cats here.
2690
02:19:38,040 --> 02:19:38,540
All right.
2691
02:19:38,540 --> 02:19:42,080
And add a period here at
the end and click Run.
2692
02:19:42,080 --> 02:19:48,330
And instead of 114 cats, we get,
this time, we got 43 results.
2693
02:19:48,330 --> 02:19:53,360
Including our friend Gladstone
who is a male creature cat.
2694
02:19:53,360 --> 02:19:58,530
So that means all the
rest are female, right?
2695
02:19:58,530 --> 02:20:00,410
Wrong.
2696
02:20:00,410 --> 02:20:00,980
Wrong.
2697
02:20:00,980 --> 02:20:02,840
That does not mean that at all.
2698
02:20:02,840 --> 02:20:06,530
What it means is of
the 114 items that
2699
02:20:06,530 --> 02:20:11,960
have instance of cat,
only 43 have explicitly
2700
02:20:11,960 --> 02:20:14,690
sex male creature.
2701
02:20:14,690 --> 02:20:17,570
The rest of them do not.
2702
02:20:17,570 --> 02:20:21,800
Maybe because they have
sex female creature,
2703
02:20:21,800 --> 02:20:25,930
but maybe because they don't
have that property at all.
2704
02:20:25,930 --> 02:20:28,290
I'm emphasizing
this to kind of help
2705
02:20:28,290 --> 02:20:31,770
you train yourself to
correctly interpret
2706
02:20:31,770 --> 02:20:34,140
the results of
queries from Wikidata.
2707
02:20:34,140 --> 02:20:36,870
Don't jump into this kind
of simplistic conclusion,
2708
02:20:36,870 --> 02:20:41,820
OK there's 114 total, 43 male,
therefore the rest are female.
2709
02:20:41,820 --> 02:20:43,520
That is not correct.
2710
02:20:43,520 --> 02:20:45,030
OK?
2711
02:20:45,030 --> 02:20:49,740
But 43 of those explicitly
had another statement, sex
2712
02:20:49,740 --> 02:20:52,530
or gender, male creature.
2713
02:20:52,530 --> 02:20:55,020
So I just added
another condition,
2714
02:20:55,020 --> 02:20:58,290
and now my query is
asking two separate things
2715
02:20:58,290 --> 02:21:00,150
about the results.
2716
02:21:00,150 --> 02:21:04,472
They need to be a cat
and a male creature.
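[Editor's sketch of the two-condition query. Property 21 (sex or gender) is named in the talk; the item numbers Q146 (house cat) and Q44148 (the "male creature" item, now labeled male organism) are assumptions not given in the transcript.]

```sparql
# Prefixes wd:, wdt:, wikibase: and bd: are predefined by the Wikidata Query Service.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .     # condition 1: instance of house cat (Q146, assumed)
  ?item wdt:P21 wd:Q44148 .   # condition 2: sex or gender = male organism (Q44148, assumed)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

Both lines use the same variable, ?item, so a result must satisfy both conditions. Cats with no P21 statement at all are simply not returned.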
2717
02:21:04,472 --> 02:21:06,270
AUDIENCE: Maybe we
should see how many
2718
02:21:06,270 --> 02:21:08,100
cats have Twitter accounts.
2719
02:21:08,100 --> 02:21:11,440
But there is a
question from YouTube,
2720
02:21:11,440 --> 02:21:14,220
which is will you talk about
the export possibilities
2721
02:21:14,220 --> 02:21:17,280
of the result of the query?
2722
02:21:17,280 --> 02:21:18,420
ASAF BARTOV: Absolutely.
2723
02:21:18,420 --> 02:21:21,000
Absolutely I will in
just a little bit.
2724
02:21:21,000 --> 02:21:23,010
I mean there is, in
addition to just getting
2725
02:21:23,010 --> 02:21:28,350
this kind of table, I can get
these results in other formats.
2726
02:21:28,350 --> 02:21:30,360
And I can also
download these results.
2727
02:21:30,360 --> 02:21:32,820
I can click the Download
button and get them
2728
02:21:32,820 --> 02:21:35,070
as a comma separated
file, tab separated
2729
02:21:35,070 --> 02:21:38,910
file, a JSON file, which is
useful for programmatic uses.
2730
02:21:38,910 --> 02:21:40,590
I can also get a link.
2731
02:21:40,590 --> 02:21:42,330
So I can get a
link to this query.
2732
02:21:42,330 --> 02:21:45,990
I mean, I spent all this time
designing this beautiful query.
2733
02:21:45,990 --> 02:21:50,280
I can get a short URL that was
generated especially for me
2734
02:21:50,280 --> 02:21:52,170
right now with a tiny URL.
2735
02:21:52,170 --> 02:21:54,690
I can just paste this
into Twitter and go,
2736
02:21:54,690 --> 02:21:59,280
hey, people, look at all the male
cats that Wikidata knows about.
2737
02:21:59,280 --> 02:22:01,170
OK, this is not a
very exciting query.
2738
02:22:01,170 --> 02:22:03,900
But once I get to a really
complicated exciting query
2739
02:22:03,900 --> 02:22:07,650
I can totally share that
very easily through this.
2740
02:22:07,650 --> 02:22:09,750
And we will get to more
interesting queries
2741
02:22:09,750 --> 02:22:11,740
in just a second.
2742
02:22:11,740 --> 02:22:16,400
Any questions on this kind
of basic querying so far?
2743
02:22:16,400 --> 02:22:17,940
OK.
2744
02:22:17,940 --> 02:22:25,340
So that was a very
simple example.
2745
02:22:25,340 --> 02:22:30,250
Let's spend a moment exploring.
2746
02:22:30,250 --> 02:22:38,920
So this cat Gladstone was
named after this dude, William
2747
02:22:38,920 --> 02:22:43,550
Gladstone, who was an
important British politician.
2748
02:22:43,550 --> 02:22:45,760
I'm sure he's not the
only thing out there
2749
02:22:45,760 --> 02:22:48,970
in the universe that's named
after Gladstone, right?
2750
02:22:48,970 --> 02:22:52,120
I mean there has got
to be, I don't know,
2751
02:22:52,120 --> 02:22:54,790
park benches,
planets, asteroids,
2752
02:22:54,790 --> 02:22:59,590
something other than the
cat, named after this guy.
2753
02:22:59,590 --> 02:23:04,030
So we can ask Wikidata
to tell us all the things
2754
02:23:04,030 --> 02:23:06,850
that, you know, without
saying instance of something.
2755
02:23:06,850 --> 02:23:10,960
Like, I don't know, anything
named after William Gladstone.
2756
02:23:10,960 --> 02:23:12,760
So how do I do that?
2757
02:23:12,760 --> 02:23:15,310
Same principle.
2758
02:23:15,310 --> 02:23:19,850
Instead of asking about the
property instance of, property
2759
02:23:19,850 --> 02:23:25,360
31, instead of that, I
will ask about the property
2760
02:23:25,360 --> 02:23:26,860
named after--
2761
02:23:26,860 --> 02:23:29,120
sorry, named after--
2762
02:23:29,120 --> 02:23:30,830
I don't need to
remember the number.
2763
02:23:30,830 --> 02:23:32,240
I have auto-complete.
2764
02:23:32,240 --> 02:23:35,360
Named after is property 138.
2765
02:23:35,360 --> 02:23:37,430
And I want anything
at all that is
2766
02:23:37,430 --> 02:23:42,080
named after this person,
William Gladstone.
2767
02:23:42,080 --> 02:23:43,850
Here we go.
2768
02:23:43,850 --> 02:23:45,860
Which is 160852.
2769
02:23:45,860 --> 02:23:46,820
Whatever.
2770
02:23:46,820 --> 02:23:48,230
OK.
2771
02:23:48,230 --> 02:23:50,510
You notice I removed
instance of cat.
2772
02:23:50,510 --> 02:23:52,040
I removed the male creature.
2773
02:23:52,040 --> 02:23:55,130
I'm only asking,
get me all the items
2774
02:23:55,130 --> 02:23:58,940
that are somehow named after
that particular politician.
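[Editor's sketch of the "named after" query, using the two numbers read out in the talk: property 138 and item 160852. The label service line is an added convenience.]

```sparql
# Prefixes wd:, wdt:, wikibase: and bd: are predefined by the Wikidata Query Service.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P138 wd:Q160852 .   # named after (P138) = William Gladstone (Q160852)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```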
2775
02:23:58,940 --> 02:24:00,920
And I run the query,
and it turns out
2776
02:24:00,920 --> 02:24:05,007
that Wikidata knows
about three such things.
2777
02:24:05,007 --> 02:24:06,590
Does that mean that's
the only-- these
2778
02:24:06,590 --> 02:24:08,881
are the only three things
named after him in the world?
2779
02:24:08,881 --> 02:24:09,939
Of course not.
2780
02:24:09,939 --> 02:24:12,230
But these are the only three
items that are in Wikidata
2781
02:24:12,230 --> 02:24:17,720
and explicitly have the
property named after Gladstone.
2782
02:24:17,720 --> 02:24:20,150
For all I know, there
may be a village
2783
02:24:20,150 --> 02:24:23,600
in England called Gladstone
named after this person.
2784
02:24:23,600 --> 02:24:27,410
But if nobody added the
property, named after, linking
2785
02:24:27,410 --> 02:24:30,950
to the person, he wouldn't show
up in the results to my query.
2786
02:24:30,950 --> 02:24:33,750
So Wikidata knows about
three such things.
2787
02:24:33,750 --> 02:24:36,110
One of them is something
called the Gladstone Professor
2788
02:24:36,110 --> 02:24:37,360
of Government.
2789
02:24:37,360 --> 02:24:40,370
I can click through and see
that it's a chair at Oxford
2790
02:24:40,370 --> 02:24:41,180
University, right?
2791
02:24:41,180 --> 02:24:43,470
So it's a position.
2792
02:24:43,470 --> 02:24:49,520
And another is the William
Gladstone school number 18.
2793
02:24:49,520 --> 02:24:51,470
William Gladstone
school number 18.
2794
02:24:51,470 --> 02:24:52,900
Where is that?
2795
02:24:52,900 --> 02:24:55,380
That is in Sofia, Bulgaria.
2796
02:24:55,380 --> 02:24:56,470
Again.
2797
02:24:56,470 --> 02:24:59,000
All right, so that's a
particular school in Bulgaria
2798
02:24:59,000 --> 02:25:02,720
named after William Gladstone.
2799
02:25:02,720 --> 02:25:07,220
And finally, the third
result is, of course, our pal
2800
02:25:07,220 --> 02:25:09,800
Gladstone the Chief Mouser.
2801
02:25:09,800 --> 02:25:12,674
If I click through,
that's the cat.
2802
02:25:12,674 --> 02:25:14,090
All right, so that
was an example.
2803
02:25:14,090 --> 02:25:15,700
I mean, you saw how easy it was.
2804
02:25:15,700 --> 02:25:18,980
I just named the property and
the value that I care about,
2805
02:25:18,980 --> 02:25:21,420
and I get the results.
2806
02:25:21,420 --> 02:25:23,289
Again, I mean, it's
kind of a silly example,
2807
02:25:23,289 --> 02:25:24,080
but think about it.
2808
02:25:24,080 --> 02:25:27,570
This is-- how else can
you answer that question?
2809
02:25:27,570 --> 02:25:30,470
There's no reference desk,
even at the great University
2810
02:25:30,470 --> 02:25:34,250
of Oxford, where you can
walk in and say, give me
2811
02:25:34,250 --> 02:25:37,470
a list of things
named after Gladstone.
2812
02:25:37,470 --> 02:25:40,590
There's no easy way to
answer that unless you happen
2813
02:25:40,590 --> 02:25:44,520
to have a very large
structured and linked
2814
02:25:44,520 --> 02:25:48,130
data store, like Wikidata.
2815
02:25:48,130 --> 02:25:50,560
All right, so that
was a silly example.
2816
02:25:50,560 --> 02:25:51,280
Let's take some--
2817
02:25:51,280 --> 02:25:53,113
AUDIENCE: There's a
bunch of stuff on there.
2818
02:25:53,113 --> 02:25:54,446
ASAF: Oh, OK.
2819
02:25:54,446 --> 02:25:57,430
AUDIENCE: Can you show
easy query on the video?
2820
02:25:57,430 --> 02:26:02,260
And somebody needs to know
how to just do property
2821
02:26:02,260 --> 02:26:05,750
exists without giving
a specific value.
2822
02:26:05,750 --> 02:26:11,030
And then once you show easy
query you reload the page and--
2823
02:26:11,030 --> 02:26:13,240
ASAF: I don't know easy query.
2824
02:26:13,240 --> 02:26:15,670
So is that a gadget?
2825
02:26:15,670 --> 02:26:17,110
I don't know what easy query is.
2826
02:26:17,110 --> 02:26:19,870
I don't use it.
2827
02:26:19,870 --> 02:26:24,760
So someone can maybe
send a link or something?
2828
02:26:24,760 --> 02:26:26,100
Oh it is a gadget.
2829
02:26:26,100 --> 02:26:27,100
I don't have it enabled.
2830
02:26:27,100 --> 02:26:31,610
2831
02:26:31,610 --> 02:26:32,480
That is nice.
2832
02:26:32,480 --> 02:26:42,080
So now, what I just did by hand,
by formulating the query named
2833
02:26:42,080 --> 02:26:45,200
after Gladstone--
2834
02:26:45,200 --> 02:26:48,390
I guess this is the--
2835
02:26:48,390 --> 02:26:48,960
Is it?
2836
02:26:48,960 --> 02:26:53,000
2837
02:26:53,000 --> 02:26:53,720
Yeah.
2838
02:26:53,720 --> 02:26:56,050
So this-- I just
clicked the three--
2839
02:26:56,050 --> 02:26:57,470
the ellipsis here.
2840
02:26:57,470 --> 02:26:58,460
Right after the name.
2841
02:26:58,460 --> 02:26:59,630
You see this?
2842
02:26:59,630 --> 02:27:03,050
This was just added by
enabling easy query,
2843
02:27:03,050 --> 02:27:04,640
which I just learned about.
2844
02:27:04,640 --> 02:27:07,640
So you just click this
and it auto-magically
2845
02:27:07,640 --> 02:27:09,620
made this kind of trivial query.
2846
02:27:09,620 --> 02:27:12,380
Of course, if I want a more
complicated query like,
2847
02:27:12,380 --> 02:27:14,510
I don't know, give me
all the things that
2848
02:27:14,510 --> 02:27:18,110
are named after Lincoln
but are a school,
2849
02:27:18,110 --> 02:27:21,650
I will still need to kind
of edit a custom query.
2850
02:27:21,650 --> 02:27:23,450
But this is a super
easy and very nice
2851
02:27:23,450 --> 02:27:28,620
way of just doing a very super
quick query for exactly this.
2852
02:27:28,620 --> 02:27:29,120
Right?
2853
02:27:29,120 --> 02:27:33,410
Like, what other items have
exactly this property and value
2854
02:27:33,410 --> 02:27:35,720
named after William Gladstone?
2855
02:27:35,720 --> 02:27:38,750
So, thank you to whoever
made this suggestion
2856
02:27:38,750 --> 02:27:42,140
to demonstrate that, and
I'm glad I learned something
2857
02:27:42,140 --> 02:27:45,230
too today.
2858
02:27:45,230 --> 02:27:48,590
Let's move to
another sample query.
2859
02:27:48,590 --> 02:27:50,360
Here's a fun example.
2860
02:27:50,360 --> 02:27:56,910
Popular surnames among
fictional characters.
2861
02:27:56,910 --> 02:27:58,650
Think about that for a second.
2862
02:27:58,650 --> 02:28:03,030
Popular surnames among
fictional characters.
2863
02:28:03,030 --> 02:28:06,510
So we're asking Wikidata
to go through all
2864
02:28:06,510 --> 02:28:10,120
the fictional
characters you know,
2865
02:28:10,120 --> 02:28:13,510
and of those look through
their surnames, group
2866
02:28:13,510 --> 02:28:15,910
them so that you can count
them, the repetitions
2867
02:28:15,910 --> 02:28:18,460
of the surnames,
and give me the most
2868
02:28:18,460 --> 02:28:21,550
popular surnames among them.
2869
02:28:21,550 --> 02:28:26,280
Additionally, I want you to
awesomely present the results
2870
02:28:26,280 --> 02:28:28,020
as a bubble chart.
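[Editor's sketch of a query of this shape; the exact query shown on screen is not in the transcript. Assumptions: fictional characters are matched as instance of fictional human (Q15632617), surnames come from family name (P734), and the top 100 are kept.]

```sparql
#defaultView:BubbleChart
# Prefixes wd:, wdt:, wikibase: and bd: are predefined by the Wikidata Query Service.
SELECT ?surname ?surnameLabel ?count WHERE {
  {
    # Inner query: group characters by surname and count each group.
    SELECT ?surname (COUNT(?person) AS ?count) WHERE {
      ?person wdt:P31 wd:Q15632617 .   # instance of fictional human (assumed class)
      ?person wdt:P734 ?surname .      # family name (P734), any value
    }
    GROUP BY ?surname
    ORDER BY DESC(?count)
    LIMIT 100
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?count)
```

The first line is a Query Service display hint that asks for a bubble chart instead of the default table.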
2871
02:28:28,020 --> 02:28:29,220
Oh, yeah.
2872
02:28:29,220 --> 02:28:31,050
Wikidata can do that.
2873
02:28:31,050 --> 02:28:34,420
And I run the query.
2874
02:28:34,420 --> 02:28:36,750
And check it out.
2875
02:28:36,750 --> 02:28:41,130
The most popular names
among fictional characters
2876
02:28:41,130 --> 02:28:45,780
that Wikidata knows about are
Jones, Smith, Taylor, et cetera.
2877
02:28:45,780 --> 02:28:48,450
I mean for all we know,
the most popular name
2878
02:28:48,450 --> 02:28:50,770
among fictional characters
actually in the world
2879
02:28:50,770 --> 02:28:52,350
may be Wu.
2880
02:28:52,350 --> 02:28:54,790
Or something in Chinese
for all we know.
2881
02:28:54,790 --> 02:28:57,930
But if that has not been
modeled in Wikidata,
2882
02:28:57,930 --> 02:29:01,020
we're not going to get that.
2883
02:29:01,020 --> 02:29:03,540
So Taylor, Smith,
Jones, Williams,
2884
02:29:03,540 --> 02:29:06,870
seem to be the
most popular names.
2885
02:29:06,870 --> 02:29:08,400
And again, I could limit this.
2886
02:29:08,400 --> 02:29:11,520
I could make the
same query but add,
2887
02:29:11,520 --> 02:29:14,250
only among works whose
original language
2888
02:29:14,250 --> 02:29:19,020
was Italian, for example, to get
more interesting results if I
2889
02:29:19,020 --> 02:29:21,480
only care about
Italian literature.
2890
02:29:21,480 --> 02:29:24,720
But this is an example of
how I got awesome bubble
2891
02:29:24,720 --> 02:29:28,170
charts for free, and
I can just plug this
2892
02:29:28,170 --> 02:29:30,900
into an awesome
presentation that I make.
2893
02:29:30,900 --> 02:29:34,500
Of course I can still
look at the raw table.
2894
02:29:34,500 --> 02:29:37,940
So the query still resulted
in a bunch of data, right?
2895
02:29:37,940 --> 02:29:42,480
So Smith repeats 41 times,
Jones 38 times, Taylor 34 times,
2896
02:29:42,480 --> 02:29:43,750
et cetera, et cetera.
2897
02:29:43,750 --> 02:29:48,960
And down that list.
2898
02:29:48,960 --> 02:29:52,320
And I could, again, I could
export this into a file
2899
02:29:52,320 --> 02:29:56,100
and load it up in a spreadsheet,
and do additional processing
2900
02:29:56,100 --> 02:29:56,670
on it.
2901
02:29:56,670 --> 02:29:58,560
I can link to it.
2902
02:29:58,560 --> 02:30:02,530
I can do all kinds of
awesome things with it.
2903
02:30:02,530 --> 02:30:05,250
So that's another awesome query.
2904
02:30:05,250 --> 02:30:08,460
We don't have to go into
every line by line analysis
2905
02:30:08,460 --> 02:30:11,670
here of why this
works the way it does.
2906
02:30:11,670 --> 02:30:15,840
I want to show you some
other queries first.
2907
02:30:15,840 --> 02:30:22,470
Let's look at-- this is just
fun, overall causes of death.
2908
02:30:22,470 --> 02:30:24,870
Again a bubble
chart just looking
2909
02:30:24,870 --> 02:30:28,260
at people who died
of things, and have
2910
02:30:28,260 --> 02:30:30,760
a cause of death listed.
2911
02:30:30,760 --> 02:30:34,380
And we learn that the most
commonly listed cause of death
2912
02:30:34,380 --> 02:30:40,350
is myocardial infarction,
pneumonitis, cerebral vascular,
2913
02:30:40,350 --> 02:30:42,620
lung cancer, et
cetera, et cetera.
2914
02:30:42,620 --> 02:30:44,850
And again, in a bubble chart.
2915
02:30:44,850 --> 02:30:49,670
And so how does that work?
2916
02:30:49,670 --> 02:30:53,050
So just very briefly, the
important parts of this query
2917
02:30:53,050 --> 02:30:59,150
are I'm looking for something,
for some person, who
2918
02:30:59,150 --> 02:31:04,240
is instance of 31, instance
of Q5, which is human.
2919
02:31:04,240 --> 02:31:05,390
So a human.
2920
02:31:05,390 --> 02:31:07,130
Again, just to kind
of limit the query.
2921
02:31:07,130 --> 02:31:11,330
I'm not interested in
books or mountains.
2922
02:31:11,330 --> 02:31:14,420
I'm looking for humans
who have that same person,
2923
02:31:14,420 --> 02:31:21,150
that same variable PID,
should have a 509, meaning--
2924
02:31:21,150 --> 02:31:22,412
Hello.
2925
02:31:22,412 --> 02:31:24,620
Why don't I have the--
2926
02:31:24,620 --> 02:31:25,120
Yeah.
2927
02:31:25,120 --> 02:31:28,480
A 509, which is cause of death.
2928
02:31:28,480 --> 02:31:31,540
And that cause of death
is another variable,
2929
02:31:31,540 --> 02:31:32,930
that I'm calling CID.
2930
02:31:32,930 --> 02:31:35,410
Now, previously
we were saying you
2931
02:31:35,410 --> 02:31:36,850
know I want things
that are named
2932
02:31:36,850 --> 02:31:39,550
after Gladstone specifically.
2933
02:31:39,550 --> 02:31:42,000
Only things that have
that particular value.
2934
02:31:42,000 --> 02:31:44,320
Here I'm saying I'm
looking for things
2935
02:31:44,320 --> 02:31:47,110
that have some cause of death.
2936
02:31:47,110 --> 02:31:48,760
Not a specific one.
2937
02:31:48,760 --> 02:31:50,260
I just wanted to
get everything that
2938
02:31:50,260 --> 02:31:54,880
has a statement with some
value about property 509
2939
02:31:54,880 --> 02:31:56,530
cause of death.
2940
02:31:56,530 --> 02:31:57,940
OK?
2941
02:31:57,940 --> 02:32:04,410
And then this other bit of
magic here, the group by,
2942
02:32:04,410 --> 02:32:07,870
tells Wikidata I'm not
actually interested
2943
02:32:07,870 --> 02:32:09,100
in every individual thing.
2944
02:32:09,100 --> 02:32:12,310
I want you to group those
causes, and then count them
2945
02:32:12,310 --> 02:32:14,230
and give me the top ones.
2946
02:32:14,230 --> 02:32:15,523
So that's how this query works.
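[Editor's sketch of the cause-of-death query as described: humans (Q5) with any value for property 509, grouped and counted. The variable names ?pid and ?cid follow the talk. The sub-query layout and the LIMIT of 50 are assumptions, and on today's much larger Wikidata this may run slowly or time out.]

```sparql
#defaultView:BubbleChart
# Prefixes wd:, wdt:, wikibase: and bd: are predefined by the Wikidata Query Service.
SELECT ?cid ?cidLabel ?count WHERE {
  {
    SELECT ?cid (COUNT(?pid) AS ?count) WHERE {
      ?pid wdt:P31 wd:Q5 .    # the person is an instance of human
      ?pid wdt:P509 ?cid .    # and has some cause of death, whatever it is
    }
    GROUP BY ?cid             # one row per cause instead of one row per person
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?count)
LIMIT 50
```

Putting a variable (?cid) where the Gladstone query had a fixed item is what turns "has this specific value" into "has any value".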
2947
02:32:15,523 --> 02:32:20,550
2948
02:32:20,550 --> 02:32:22,320
Here's that query I promised.
2949
02:32:22,320 --> 02:32:26,460
Painters whose fathers
were also painters.
2950
02:32:26,460 --> 02:32:28,630
I can only think of a couple.
2951
02:32:28,630 --> 02:32:31,890
I mean, Monet and Vogel.
2952
02:32:31,890 --> 02:32:34,800
But I'm sure Wikidata
knows many more.
2953
02:32:34,800 --> 02:32:38,620
So let's run this query.
2954
02:32:38,620 --> 02:32:40,270
And I have 100 results.
2955
02:32:40,270 --> 02:32:43,120
By the way, I have limited
it to 100 results just
2956
02:32:43,120 --> 02:32:44,650
to keep it kind of snappy.
2957
02:32:44,650 --> 02:32:47,530
But actually, we could
maybe try removing the limit
2958
02:32:47,530 --> 02:32:50,170
and see if Wikidata
could tell us
2959
02:32:50,170 --> 02:32:53,890
the total number in Wikidata.
2960
02:32:53,890 --> 02:32:55,120
Yeah, that wasn't too bad.
2961
02:32:55,120 --> 02:32:58,400
So 1,270 results.
2962
02:32:58,400 --> 02:32:59,140
OK.
2963
02:32:59,140 --> 02:33:04,150
Wikidata, already at this
early date in its progress,
2964
02:33:04,150 --> 02:33:07,540
already knows about
more than 1,200 painters
2965
02:33:07,540 --> 02:33:10,980
who are sons of painters.
2966
02:33:10,980 --> 02:33:16,140
Sons of male painters, like
their father is a painter.
2967
02:33:16,140 --> 02:33:18,120
There may be
additional painters who
2968
02:33:18,120 --> 02:33:21,390
are sons of female painters
not included in this query.
2969
02:33:21,390 --> 02:33:24,990
Again, always remember what
exactly you are asking.
2970
02:33:24,990 --> 02:33:27,840
In this query I was
asking about the father.
2971
02:33:27,840 --> 02:33:30,330
I'm leaving out any
possible painters who
2972
02:33:30,330 --> 02:33:32,720
are sons of mother painters.
2973
02:33:32,720 --> 02:33:33,390
OK?
2974
02:33:33,390 --> 02:33:35,250
So how does this work?
2975
02:33:35,250 --> 02:33:39,630
I'm asking for the painter
along with the human-readable label,
2976
02:33:39,630 --> 02:33:42,630
and the father along
with the human-readable label.
2977
02:33:42,630 --> 02:33:47,610
So Michel Monet is the
son of Claude Monet.
2978
02:33:47,610 --> 02:33:54,180
And Domenico Tintoretto is the
son of the famous Tintoretto
2979
02:33:54,180 --> 02:33:57,210
whose label, you know, is just
Tintoretto like Michelangelo.
2980
02:33:57,210 --> 02:33:59,960
You know, you don't always
have to have the full name
2981
02:33:59,960 --> 02:34:02,420
in the common label.
2982
02:34:02,420 --> 02:34:07,010
Paloma Picasso is the
daughter of Pablo Picasso.
2983
02:34:07,010 --> 02:34:07,510
OK.
2984
02:34:07,510 --> 02:34:11,040
So Wikidata knows about
all these results.
2985
02:34:11,040 --> 02:34:14,610
Of course Holbein the Younger
son of Holbein the Elder.
2986
02:34:14,610 --> 02:34:15,760
And how did we get there?
2987
02:34:15,760 --> 02:34:20,860
Well we asked Wikidata
to look for something,
2988
02:34:20,860 --> 02:34:26,820
let's call it painter, which
has 106, which is occupation,
2989
02:34:26,820 --> 02:34:31,100
with a value painter.
2990
02:34:31,100 --> 02:34:31,600
Right?
2991
02:34:31,600 --> 02:34:35,310
This unwieldy number
1028181, that's painter.
2992
02:34:35,310 --> 02:34:40,250
So I'm asking for any item
that has occupation painter.
2993
02:34:40,250 --> 02:34:43,300
And let's call
that item painter.
2994
02:34:43,300 --> 02:34:49,770
I also want that painter to have
a property 22, which is father.
2995
02:34:49,770 --> 02:34:50,850
OK.
2996
02:34:50,850 --> 02:34:52,350
Father.
2997
02:34:52,350 --> 02:34:55,140
And I want it to
have some value.
2998
02:34:55,140 --> 02:34:58,770
OK, I'm putting it into
another variable called father.
2999
02:34:58,770 --> 02:35:01,320
I could have called
it, you know, frog.
3000
02:35:01,320 --> 02:35:04,230
That doesn't change
anything, just to be clear.
3001
02:35:04,230 --> 02:35:06,630
What matters is that this
is the property father.
3002
02:35:06,630 --> 02:35:10,320
I could have called
it anything I want.
3003
02:35:10,320 --> 02:35:13,590
So, and then, I have
a third condition.
3004
02:35:13,590 --> 02:35:18,010
That the father, like whatever
it says here in property 22,
3005
02:35:18,010 --> 02:35:22,590
I want that father to have
himself a property 106
3006
02:35:22,590 --> 02:35:27,750
occupation with a value painter.
3007
02:35:27,750 --> 02:35:28,730
OK?
3008
02:35:28,730 --> 02:35:30,800
These conditions
combined to give me
3009
02:35:30,800 --> 02:35:36,080
a list of people who have
a father and that father
3010
02:35:36,080 --> 02:35:37,850
has occupation painter as well.
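[Editor's sketch of the three conditions just described, using the numbers read out in the talk: occupation (P106), painter (Q1028181), father (P22). The label service line and the LIMIT of 100 mirror what the speaker mentions but are reconstructed, not copied from the screen.]

```sparql
# Prefixes wd:, wdt:, wikibase: and bd: are predefined by the Wikidata Query Service.
SELECT ?painter ?painterLabel ?father ?fatherLabel WHERE {
  ?painter wdt:P106 wd:Q1028181 .   # 1: the item has occupation painter
  ?painter wdt:P22 ?father .        # 2: the item has a father, stored in ?father
  ?father wdt:P106 wd:Q1028181 .    # 3: that father also has occupation painter
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```

The variable names are arbitrary; renaming ?father to ?frog everywhere would return the same rows.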
3011
02:35:37,850 --> 02:35:40,550
Of course, if I suddenly,
or if you suddenly,
3012
02:35:40,550 --> 02:35:44,480
are consumed by
curiosity to know
3013
02:35:44,480 --> 02:35:51,344
who are some politicians
who are sons of carpenters?
3014
02:35:51,344 --> 02:35:52,760
You could just
change that, right?
3015
02:35:52,760 --> 02:35:56,700
Change the first value
from painter to politician.
3016
02:35:56,700 --> 02:36:02,624
Change the third line's value
from painter to carpenter.
3017
02:36:02,624 --> 02:36:04,040
Maybe that list
will be very short
3018
02:36:04,040 --> 02:36:06,680
because carpenters don't
tend to be notable,
3019
02:36:06,680 --> 02:36:08,910
so they wouldn't be
represented on Wikidata.
3020
02:36:08,910 --> 02:36:11,990
That's why this works relatively
well with painters, right?
3021
02:36:11,990 --> 02:36:14,420
Because most of
them are notable.
3022
02:36:14,420 --> 02:36:16,370
But generally you
could do that, right?
3023
02:36:16,370 --> 02:36:18,500
That's an example of
how you can take a query
3024
02:36:18,500 --> 02:36:22,340
and just replace one of those
values, or even the language.
3025
02:36:22,340 --> 02:36:26,840
So again, I could ask
for these same painters.
3026
02:36:26,840 --> 02:36:27,650
It's limited again.
3027
02:36:27,650 --> 02:36:31,190
These same painters,
but with Arabic labels.
3028
02:36:31,190 --> 02:36:34,880
Same query, but I have Arabic
labels for these painters.
3029
02:36:34,880 --> 02:36:37,250
And of course where
there is no Arabic label
3030
02:36:37,250 --> 02:36:40,360
I get the Q number.
3031
02:36:40,360 --> 02:36:40,860
OK?
3032
02:36:40,860 --> 02:36:43,650
So that's that query
that I promised you,
3033
02:36:43,650 --> 02:36:47,670
painters who are sons of painters,
can be done by Wikidata
3034
02:36:47,670 --> 02:36:49,830
in under one second.
3035
02:36:49,830 --> 02:36:51,480
How awesome is that?
3036
02:36:51,480 --> 02:36:52,950
We can also get some statistics.
3037
02:36:52,950 --> 02:36:55,920
So how about counting
total articles
3038
02:36:55,920 --> 02:36:59,740
in a given wiki by gender.
3039
02:36:59,740 --> 02:37:02,070
This is what we call
the content gender
3040
02:37:02,070 --> 02:37:06,900
gap, as distinct from the
participation gender gap.
3041
02:37:06,900 --> 02:37:10,276
This is the gender gap in
what we cover on Wikipedia.
3042
02:37:10,276 --> 02:37:11,400
So let's take one of these.
3043
02:37:11,400 --> 02:37:16,380
3044
02:37:16,380 --> 02:37:17,630
So this is a query.
3045
02:37:17,630 --> 02:37:23,130
Articles about women in
some given Wikipedia.
3046
02:37:23,130 --> 02:37:23,660
All right.
3047
02:37:23,660 --> 02:37:25,799
So let's take--
3048
02:37:25,799 --> 02:37:26,340
I don't know.
3049
02:37:26,340 --> 02:37:30,240
Let's take the Tamil Wikipedia.
3050
02:37:30,240 --> 02:37:32,460
That's language code TA.
3051
02:37:32,460 --> 02:37:34,950
So I just put TA here.
3052
02:37:34,950 --> 02:37:38,850
And I click Run, and
I get this count.
3053
02:37:38,850 --> 02:37:39,960
That's all I wanted.
3054
02:37:39,960 --> 02:37:41,720
I'm not actually
interested in the items,
3055
02:37:41,720 --> 02:37:44,962
like in the list of women
on the Tamil Wikipedia.
3056
02:37:44,962 --> 02:37:45,920
I just want the number.
3057
02:37:45,920 --> 02:37:48,510
So I selected the count here.
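[Editor's sketch of a count query of this kind; the on-screen query is not in the transcript. Assumptions: female is Q6581072, the subject is restricted to humans (Q5), and Tamil Wikipedia articles are matched through the schema:about and schema:isPartOf sitelink pattern.]

```sparql
# Prefixes wd:, wdt: and schema: are predefined by the Wikidata Query Service.
SELECT (COUNT(DISTINCT ?article) AS ?count) WHERE {
  ?item wdt:P31 wd:Q5 .                                   # a human
  ?item wdt:P21 wd:Q6581072 .                             # sex or gender = female (assumed Q number)
  ?article schema:about ?item .                           # an article about that item
  ?article schema:isPartOf <https://ta.wikipedia.org/> .  # on Tamil Wikipedia (language code ta)
}
```

Swapping the language code in the URL, or the Q number for the gender, gives the same count for another wiki or for men.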
3058
02:37:48,510 --> 02:37:52,610
And this number
turns out to be 2159.
3059
02:37:52,610 --> 02:37:57,300
So there are about 2,000
articles about women
3060
02:37:57,300 --> 02:38:02,350
on the Tamil Wikipedia that
Wikidata knows to be female.
3061
02:38:02,350 --> 02:38:02,850
Right?
3062
02:38:02,850 --> 02:38:05,730
I'm asking about the gender
field, property 21 again.
3063
02:38:05,730 --> 02:38:08,900
Remember, if there's some
article about a woman in Tamil
3064
02:38:08,900 --> 02:38:12,090
Wikipedia, but Wikidata
doesn't have
3065
02:38:12,090 --> 02:38:14,460
a statement about the
gender, that person
3066
02:38:14,460 --> 02:38:15,640
will not be counted here.
3067
02:38:15,640 --> 02:38:18,240
So again, be careful
about kind of stating
3068
02:38:18,240 --> 02:38:22,800
that is exactly the number
of women articles on Tamil
3069
02:38:22,800 --> 02:38:23,340
Wikipedia.
3070
02:38:23,340 --> 02:38:24,600
That's probably not true.
3071
02:38:24,600 --> 02:38:27,560
I'm sure some of those
articles are missing
3072
02:38:27,560 --> 02:38:30,740
a sex or gender property.
3073
02:38:30,740 --> 02:38:33,150
But for raw statistics,
that's probably good,
3074
02:38:33,150 --> 02:38:35,700
because some men are also
missing the sex or gender
3075
02:38:35,700 --> 02:38:37,620
statistic property.
3076
02:38:37,620 --> 02:38:41,820
So we could take the
same query for men.
3077
02:38:41,820 --> 02:38:43,170
It's essentially the exact same.
3078
02:38:43,170 --> 02:38:48,840
It just has this unwieldy
number for males, 6581097.
3079
02:38:48,840 --> 02:38:52,710
I can change this language
code again to TA for Tamil.
3080
02:38:52,710 --> 02:38:58,880
And how many men are covered
on Tamil Wikipedia? 14,649.
3081
02:38:58,880 --> 02:38:59,610
OK.
3082
02:38:59,610 --> 02:39:06,880
So women, 2,100, men,
about seven times as many.
3083
02:39:06,880 --> 02:39:07,380
Right?
3084
02:39:07,380 --> 02:39:12,300
So that's the approximate
size of the content gender
3085
02:39:12,300 --> 02:39:14,610
gap on Tamil Wikipedia.
3086
02:39:14,610 --> 02:39:18,850
And again, I can complicate
this query as much as I want.
3087
02:39:18,850 --> 02:39:21,390
For example, I can
try and find out
3088
02:39:21,390 --> 02:39:30,390
if this gender gap is wider
or narrower among musicians,
3089
02:39:30,390 --> 02:39:31,350
just as an example.
3090
02:39:31,350 --> 02:39:35,850
I could just add a line here
that says occupation musician,
3091
02:39:35,850 --> 02:39:37,890
and then I'm only
counting articles
3092
02:39:37,890 --> 02:39:41,190
on Tamil Wikipedia about
musicians who are female
3093
02:39:41,190 --> 02:39:43,190
versus articles
on Tamil Wikipedia
3094
02:39:43,190 --> 02:39:45,030
about musicians who are male.
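[Editor's sketch of the extra condition, assuming musician is Q639669 and male is Q6581097 (the number read out earlier). The sitelink pattern for Tamil Wikipedia is the editor's reconstruction.]

```sparql
# Prefixes wd:, wdt: and schema: are predefined by the Wikidata Query Service.
SELECT (COUNT(DISTINCT ?article) AS ?count) WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581097 .      # male; use Q6581072 (assumed) for the female count
  ?item wdt:P106 wd:Q639669 .      # the one added line: occupation musician (assumed Q number)
  ?article schema:about ?item .
  ?article schema:isPartOf <https://ta.wikipedia.org/> .
}
```

Running it once per gender and comparing the two counts gives the gap for that occupation.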
3095
02:39:45,030 --> 02:39:47,890
And I can kind of
compare the gender--
3096
02:39:47,890 --> 02:39:53,820
the content gender gap across
occupations on Tamil Wikipedia.
3097
02:39:53,820 --> 02:39:56,030
Do you see the
important point here?
3098
02:39:56,030 --> 02:39:58,490
Is that this is not just
kind of a one-purpose query.
3099
02:39:58,490 --> 02:40:01,250
I can just with a single
additional condition suddenly
3100
02:40:01,250 --> 02:40:04,370
make it a much more interesting
query, because I break it down
3101
02:40:04,370 --> 02:40:05,540
by occupation.
3102
02:40:05,540 --> 02:40:07,810
Or I break it down by century.
3103
02:40:07,810 --> 02:40:12,530
Do we have more of the coverage
gap in 19th century people
3104
02:40:12,530 --> 02:40:13,940
than in 21st century people?
3105
02:40:13,940 --> 02:40:15,560
I mean, I sure hope so, right?
3106
02:40:15,560 --> 02:40:18,480
The patriarchy is
weakening somewhat.
3107
02:40:18,480 --> 02:40:21,830
So I wouldn't be surprised if
there are many more notable men
3108
02:40:21,830 --> 02:40:23,430
covered about the 19th century.
3109
02:40:23,430 --> 02:40:25,784
But if we are also covering--
3110
02:40:25,784 --> 02:40:27,200
I mean, if the
gender gap is just
3111
02:40:27,200 --> 02:40:29,540
as wide for 21st century
people, that would
3112
02:40:29,540 --> 02:40:30,800
be a little disappointing.
3113
02:40:30,800 --> 02:40:35,870
Again that's something I
can fairly easily find out
3114
02:40:35,870 --> 02:40:38,980
on Wikidata query.
3115
02:40:38,980 --> 02:40:41,500
Any questions so far, or
are you just sharing links?
3116
02:40:41,500 --> 02:40:43,160
AUDIENCE: Yep there is one.
3117
02:40:43,160 --> 02:40:47,480
So somebody is wondering if you
can demonstrate, or at least
3118
02:40:47,480 --> 02:40:50,420
give a short answer of the
latter of this question.
3119
02:40:50,420 --> 02:40:52,530
Is it possible, using
Wikidata SPARQL,
3120
02:40:52,530 --> 02:40:55,520
to find specific
Wikipedia articles, e.g.
3121
02:40:55,520 --> 02:40:59,060
featured articles, of a
certain language which do not
3122
02:40:59,060 --> 02:41:01,160
exist in another language.
3123
02:41:01,160 --> 02:41:03,770
I know it is possible
to find category based
3124
02:41:03,770 --> 02:41:05,820
results using the PetScan tool.
3125
02:41:05,820 --> 02:41:09,110
But can we specify
that by selecting e.g.
3126
02:41:09,110 --> 02:41:10,055
featured articles?
3127
02:41:10,055 --> 02:41:11,390
ASAF BARTOV: Yes.
3128
02:41:11,390 --> 02:41:12,600
Excellent question.
3129
02:41:12,600 --> 02:41:14,120
It is possible, indeed.
3130
02:41:14,120 --> 02:41:17,570
And I will demonstrate
one such query.
3131
02:41:17,570 --> 02:41:19,190
Another query that
I already mentioned:
3132
02:41:19,190 --> 02:41:24,840
largest cities in the
world with a female mayor.
3133
02:41:24,840 --> 02:41:29,190
This query-- let's
close some of these tabs
3134
02:41:29,190 --> 02:41:30,315
before my browser chokes.
3135
02:41:30,315 --> 02:41:33,600
3136
02:41:33,600 --> 02:41:36,840
So this query lists
the major world cities
3137
02:41:36,840 --> 02:41:39,120
run by women currently.
3138
02:41:39,120 --> 02:41:45,650
And the answer is Mumbai, Mexico
City, Tokyo, a bunch of others.
3139
02:41:45,650 --> 02:41:49,470
3140
02:41:49,470 --> 02:41:52,371
And wait-- that's not it at all.
3141
02:41:52,371 --> 02:41:53,370
I clicked the wrong one.
3142
02:41:53,370 --> 02:41:55,050
That's the map of paintings.
3143
02:41:55,050 --> 02:41:55,800
OK.
3144
02:41:55,800 --> 02:41:57,370
Let's demonstrate
that for a second.
3145
02:41:57,370 --> 02:41:59,520
So this is the map
of all paintings
3146
02:41:59,520 --> 02:42:03,870
for which we know a location
with the count per location.
3147
02:42:03,870 --> 02:42:07,770
And the results are
awesomely presented on a map.
3148
02:42:07,770 --> 02:42:08,830
OK.
3149
02:42:08,830 --> 02:42:12,420
Again, under the hood this is
a table, of course, of results.
3150
02:42:12,420 --> 02:42:15,660
But, awesomely, I can
browse it as a map.
3151
02:42:15,660 --> 02:42:20,320
So here is a map of the
world with all the paintings
3152
02:42:20,320 --> 02:42:22,060
that Wikidata knows about.
3153
02:42:22,060 --> 02:42:23,920
Not just knows
about the paintings,
3154
02:42:23,920 --> 02:42:28,180
but knows about their
location in a museum.
3155
02:42:28,180 --> 02:42:30,670
Not surprisingly
Europe is much better
3156
02:42:30,670 --> 02:42:35,540
covered than Russia or Africa.
3157
02:42:35,540 --> 02:42:40,150
There is a huge gap in
contribution to Wikidata
3158
02:42:40,150 --> 02:42:41,740
from these countries.
3159
02:42:41,740 --> 02:42:43,780
And some of it can be fixed.
3160
02:42:43,780 --> 02:42:47,740
And of course there is much more
documentation, and much more
3161
02:42:47,740 --> 02:42:50,260
art in Europe.
3162
02:42:50,260 --> 02:42:54,280
But if we zoom in, I
don't know, Rome probably
3163
02:42:54,280 --> 02:42:55,900
has a few paintings.
3164
02:42:55,900 --> 02:42:56,400
Right?
3165
02:42:56,400 --> 02:43:00,080
3166
02:43:00,080 --> 02:43:02,288
Hello.
3167
02:43:02,288 --> 02:43:04,200
Sorry.
3168
02:43:04,200 --> 02:43:09,780
It's-- Yes.
3169
02:43:09,780 --> 02:43:13,290
Vatican City sounds
like a good bet, right?
3170
02:43:13,290 --> 02:43:14,290
I can zoom in here.
3171
02:43:14,290 --> 02:43:16,290
And I can just click
one of these dots
3172
02:43:16,290 --> 02:43:21,400
and see in this point
there are two paintings.
3173
02:43:21,400 --> 02:43:25,270
And in this one there is one
and it's the Archbasilica
3174
02:43:25,270 --> 02:43:27,460
of St. John Lateran.
3175
02:43:27,460 --> 02:43:31,060
Let's see, this is the
actual St. Peter, right?
3176
02:43:31,060 --> 02:43:33,650
Sistine Chapel has 23 paintings.
3177
02:43:33,650 --> 02:43:34,330
What?
3178
02:43:34,330 --> 02:43:36,670
The Sistine Chapel has way
more than 23 paintings.
3179
02:43:36,670 --> 02:43:40,330
Correct, but 23 of them
are documented on Wikidata.
3180
02:43:40,330 --> 02:43:43,330
Have their own item
for the painting, not
3181
02:43:43,330 --> 02:43:45,280
the Sistine Chapel,
the painting has
3182
02:43:45,280 --> 02:43:49,540
an item that lists its
being in the Sistine Chapel.
3183
02:43:49,540 --> 02:43:50,950
There are 23 of those.
3184
02:43:50,950 --> 02:43:52,270
OK.
3185
02:43:52,270 --> 02:43:54,310
There is definitely
room to document
3186
02:43:54,310 --> 02:43:57,040
the rest of the artworks
in the Sistine Chapel.
3187
02:43:57,040 --> 02:43:59,740
So, again, this is just
not the kind of query
3188
02:43:59,740 --> 02:44:03,330
you were able to
make before Wikidata,
3189
02:44:03,330 --> 02:44:07,750
and it's a fairly simple
query, as you can see.
3190
02:44:07,750 --> 02:44:13,020
There are examples using
maps like airports within 100
3191
02:44:13,020 --> 02:44:15,040
kilometers of Berlin.
3192
02:44:15,040 --> 02:44:18,310
Again using the coordinates
as a useful data point.
3193
02:44:18,310 --> 02:44:21,880
And here is a map showing me
only airports within a 100
3194
02:44:21,880 --> 02:44:25,990
kilometer radius from Berlin.
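The radius filter itself is done by the query service. As a rough local illustration of what "within 100 kilometers" means for coordinate data, here is a sketch using the haversine great-circle formula; the coordinates are approximate and the function names are hypothetical:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def within_radius(places, center, radius_km):
    """Return names of places no farther than radius_km from center."""
    lat0, lon0 = center
    return [name for name, (lat, lon) in places.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Approximate coordinates, for illustration only.
BERLIN = (52.52, 13.405)
AIRPORTS = {
    "Berlin Tegel": (52.56, 13.29),
    "Frankfurt": (50.03, 8.57),
}
print(within_radius(AIRPORTS, BERLIN, 100))
```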
3195
02:44:25,990 --> 02:44:29,140
But I wanted to show
you the mayors query.
3196
02:44:29,140 --> 02:44:34,510
Let's click the-- oh I just
have the wrong link here.
3197
02:44:34,510 --> 02:44:41,040
But I can still find it
here by typing mayor.
3198
02:44:41,040 --> 02:44:44,590
Here we go, largest
cities with female mayor.
3199
02:44:44,590 --> 02:44:47,230
So this is a slightly
more complicated query.
3200
02:44:47,230 --> 02:44:53,010
But if I run it, I get the top
10, because I set limit to 10.
3201
02:44:53,010 --> 02:44:54,820
I get the top 10
cities in the world,
3202
02:44:54,820 --> 02:44:59,710
by population size, that
are currently run by women.
3203
02:44:59,710 --> 02:45:03,490
Tokyo, Mumbai, Yokohama,
Caracas, et cetera.
3204
02:45:03,490 --> 02:45:08,080
And one interesting thing that
you may want to notice here
3205
02:45:08,080 --> 02:45:10,690
is that I'm asking for cities.
3206
02:45:10,690 --> 02:45:13,660
I mean items that
are an instance of city.
3207
02:45:13,660 --> 02:45:16,420
And that have a
head of government,
3208
02:45:16,420 --> 02:45:18,640
that have some
statement about who
3209
02:45:18,640 --> 02:45:28,440
is in charge, and the person in that
statement has a sex that's listed up here
3210
02:45:28,440 --> 02:45:29,886
as female.
3211
02:45:29,886 --> 02:45:31,510
Don't worry about
the syntax right now.
3212
02:45:31,510 --> 02:45:34,590
I just want to show you
some specific angle here.
3213
02:45:34,590 --> 02:45:37,920
And I'm further
filtering these results.
3214
02:45:37,920 --> 02:45:45,400
I only want those items where
the statement does not have
3215
02:45:45,400 --> 02:45:48,630
the qualifier end time.
3216
02:45:48,630 --> 02:45:50,390
Why is that important?
3217
02:45:50,390 --> 02:45:56,530
Because if a city once
had a female mayor,
3218
02:45:56,530 --> 02:45:59,890
but that mayor is not the mayor
anymore, because mayors change,
3219
02:45:59,890 --> 02:46:01,600
I don't want them in this query.
3220
02:46:01,600 --> 02:46:04,990
I want a query of
cities currently having
3221
02:46:04,990 --> 02:46:05,680
a female mayor.
3222
02:46:05,680 --> 02:46:07,990
And of course Wikidata
may have historical data
3223
02:46:07,990 --> 02:46:09,880
with start and
end time, as we've
3224
02:46:09,880 --> 02:46:14,530
seen, that documents this
person was the mayor of Tokyo
3225
02:46:14,530 --> 02:46:17,170
or San Francisco
between these years.
3226
02:46:17,170 --> 02:46:18,820
But if there is no
end time, that means
3227
02:46:18,820 --> 02:46:21,520
they are currently the mayor.
3228
02:46:21,520 --> 02:46:24,490
So that's an example of
asking about a qualifier
3229
02:46:24,490 --> 02:46:28,180
of a statement, again, to get
the results we actually want.
3230
02:46:28,180 --> 02:46:31,630
If we want current mayors it's
important to include this filter.
3231
02:46:31,630 --> 02:46:35,365
If we don't, we will get
historical female mayors
3232
02:46:35,365 --> 02:46:35,865
as well.
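To make the end-time reasoning concrete, here is a small local model of it in Python. It is a sketch with invented sample data, not the query from the slide; in SPARQL the same condition is typically written as a FILTER NOT EXISTS on the statement's end time (P582) qualifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeadOfGovernmentStatement:
    """One 'head of government' statement, simplified for illustration."""
    city: str
    population: int
    mayor: str
    mayor_gender: str
    end_time: Optional[int] = None  # year; None means no end-time qualifier


def largest_cities_with_current_female_mayor(
    statements: List[HeadOfGovernmentStatement], limit: int = 10
) -> List[str]:
    # Keep only statements about a female mayor that carry no end time,
    # i.e. the person is treated as currently in office.
    current = [
        s for s in statements
        if s.mayor_gender == "female" and s.end_time is None
    ]
    # Largest cities first, then cut off at the limit.
    current.sort(key=lambda s: s.population, reverse=True)
    return [s.city for s in current[:limit]]


# Invented sample data, for illustration only.
SAMPLE = [
    HeadOfGovernmentStatement("City A", 9_000_000, "Mayor One", "female"),
    HeadOfGovernmentStatement("City B", 12_000_000, "Mayor Two", "female", end_time=2010),
    HeadOfGovernmentStatement("City B", 12_000_000, "Mayor Three", "male"),
    HeadOfGovernmentStatement("City C", 3_000_000, "Mayor Four", "female"),
]
print(largest_cities_with_current_female_mayor(SAMPLE))
```

City B is the largest city in the sample and once had a female mayor, but that statement has an end time, so it is left out. Dropping the end-time check would bring it back in as a historical result.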
3233
02:46:35,865 --> 02:46:39,920
3234
02:46:39,920 --> 02:46:40,490
All right.
3235
02:46:40,490 --> 02:46:45,380
So these are some
example queries.
3236
02:46:45,380 --> 02:46:49,085
Questions about that?
3237
02:46:49,085 --> 02:46:51,620
3238
02:46:51,620 --> 02:46:53,030
Oh, the featured
article example.
3239
02:46:53,030 --> 02:46:58,280
3240
02:46:58,280 --> 02:47:01,700
So let's look at that.
3241
02:47:01,700 --> 02:47:07,050
3242
02:47:07,050 --> 02:47:12,660
So I have prepared
such a query recently.
3243
02:47:12,660 --> 02:47:15,300
Here we go.
3244
02:47:15,300 --> 02:47:18,570
So this is a query.
3245
02:47:18,570 --> 02:47:20,472
I just saved it here
on my user page.
3246
02:47:20,472 --> 02:47:21,930
I mean, this is
not the Wikidata query service.
3247
02:47:21,930 --> 02:47:25,390
This is just a Meta page
that usefully contains the query.
3248
02:47:25,390 --> 02:47:28,260
3249
02:47:28,260 --> 02:47:33,800
And let's run this.
3250
02:47:33,800 --> 02:47:38,030
So this query, it's actually
not very complicated.
3251
02:47:38,030 --> 02:47:40,030
It just has a long
list of countries,
3252
02:47:40,030 --> 02:47:42,170
because I'm asking
about African countries.
3253
02:47:42,170 --> 02:47:42,670
OK.
3254
02:47:42,670 --> 02:47:45,010
I'm looking for human
females from one
3255
02:47:45,010 --> 02:47:51,060
of these countries that
have an article in English.
3256
02:47:51,060 --> 02:47:53,010
That's what this line means.
3257
02:47:53,010 --> 02:47:55,620
But not in French.
3258
02:47:55,620 --> 02:47:57,570
That's what this part means.
3259
02:47:57,570 --> 02:47:59,170
OK.
3260
02:47:59,170 --> 02:48:01,720
This part, these
two lines together.
3261
02:48:01,720 --> 02:48:03,190
But not in French.
3262
02:48:03,190 --> 02:48:05,920
And this is what's
called a badge.
3263
02:48:05,920 --> 02:48:09,430
That's Wikidata's concept of
good and featured articles.
3264
02:48:09,430 --> 02:48:10,600
It's called a badge.
3265
02:48:10,600 --> 02:48:16,500
So I want them to have some
badge on English Wikipedia.
3266
02:48:16,500 --> 02:48:17,000
OK?
3267
02:48:17,000 --> 02:48:22,250
So again, this query is
asking for the top 100 women
3268
02:48:22,250 --> 02:48:26,150
from Africa who are documented
on English Wikipedia,
3269
02:48:26,150 --> 02:48:28,730
in a featured or
good article status.
3270
02:48:28,730 --> 02:48:30,660
But not on French Wikipedia.
3271
02:48:30,660 --> 02:48:33,270
So this is a query that's
a to-do query, right?
3272
02:48:33,270 --> 02:48:35,630
That's a query
for French editors
3273
02:48:35,630 --> 02:48:40,100
to consider what they might
usefully translate or create
3274
02:48:40,100 --> 02:48:41,180
in French.
3275
02:48:41,180 --> 02:48:48,860
And if we run this, you see
we have three results.
3276
02:48:48,860 --> 02:48:50,720
I mean, we have many
women from Africa
3277
02:48:50,720 --> 02:48:52,460
covered on English Wikipedia.
3278
02:48:52,460 --> 02:48:57,500
But only three articles
have featured or good status
3279
02:48:57,500 --> 02:49:03,460
among those that do not have
French Wikipedia coverage.
3280
02:49:03,460 --> 02:49:04,900
Let me rephrase that.
3281
02:49:04,900 --> 02:49:07,990
Among the English Wikipedia
articles about African women
3282
02:49:07,990 --> 02:49:11,170
that don't have a
French counterpart,
3283
02:49:11,170 --> 02:49:14,520
only three are featured or good.
3284
02:49:14,520 --> 02:49:16,960
OK?
3285
02:49:16,960 --> 02:49:17,640
Do you see this?
3286
02:49:17,640 --> 02:49:19,720
The badge is good article.
3287
02:49:19,720 --> 02:49:23,550
This little incantation
here is what allows
3288
02:49:23,550 --> 02:49:25,950
you to ask about the badge.
3289
02:49:25,950 --> 02:49:28,730
This here.
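The saved query itself is not reproduced in this transcript. A hedged sketch of its general shape, assuming "from one of these countries" is expressed with country of citizenship (P27), that sitelinks are matched with schema:about and schema:isPartOf, and that the badge is read with wikibase:badge; the helper name and the two example countries (Q1033 Nigeria, Q117 Ghana) are illustrative choices:

```python
def missing_translation_query(source_lang, target_lang, country_qids, limit=100):
    """Build SPARQL text: badged articles in one Wikipedia missing in another.

    Hypothetical helper for illustration; not the query shown in the talk.
    """
    countries = " ".join(f"wd:{qid}" for qid in country_qids)
    return "\n".join([
        "SELECT ?person ?personLabel ?badge WHERE {",
        f"  VALUES ?country {{ {countries} }}",
        "  ?person wdt:P31 wd:Q5 ;        # human",
        "          wdt:P21 wd:Q6581072 ;  # female",
        "          wdt:P27 ?country .     # country of citizenship (assumption)",
        "  # Has an article in the source language, with some badge.",
        "  ?sourceArticle schema:about ?person ;",
        f"                 schema:isPartOf <https://{source_lang}.wikipedia.org/> ;",
        "                 wikibase:badge ?badge .",
        "  # But has no article in the target language.",
        "  FILTER NOT EXISTS {",
        "    ?targetArticle schema:about ?person ;",
        f"                   schema:isPartOf <https://{target_lang}.wikipedia.org/> .",
        "  }",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
        f"LIMIT {limit}",
    ])


print(missing_translation_query("en", "fr", ["Q1033", "Q117"]))
```

Swapping the two language codes turns the same text into a to-do list for English editors instead of French ones.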
3290
02:49:28,730 --> 02:49:33,420
And, by the way, the slides
will be uploaded to Commons.
3291
02:49:33,420 --> 02:49:38,708
And we will-- how shall we make
it available on the YouTube
3292
02:49:38,708 --> 02:49:39,710
thing as well?
3293
02:49:39,710 --> 02:49:42,730
3294
02:49:42,730 --> 02:49:43,230
No, no.
3295
02:49:43,230 --> 02:49:45,870
But, I mean, for people who
will later watch this video.
3296
02:49:45,870 --> 02:49:52,119
3297
02:49:52,119 --> 02:49:54,160
Oh yeah, we can add it to
the YouTube description
3298
02:49:54,160 --> 02:49:55,368
and the Commons description.
3299
02:49:55,368 --> 02:49:58,090
So in the-- if you're
watching this video later,
3300
02:49:58,090 --> 02:50:00,820
in the description, we will
add a link to this query
3301
02:50:00,820 --> 02:50:01,480
specifically.
3302
02:50:01,480 --> 02:50:03,340
Because it's not in
the slides right now.
3303
02:50:03,340 --> 02:50:03,910
It will be.
3304
02:50:03,910 --> 02:50:06,622
3305
02:50:06,622 --> 02:50:07,980
OK.
3306
02:50:07,980 --> 02:50:10,260
So.
3307
02:50:10,260 --> 02:50:13,590
Questions so far?
3308
02:50:13,590 --> 02:50:14,700
We're almost done.
3309
02:50:14,700 --> 02:50:16,260
We have a few minutes left.
3310
02:50:16,260 --> 02:50:18,090
So questions about queries?
3311
02:50:18,090 --> 02:50:20,130
I mean, I'm sure
there's tons of things
3312
02:50:20,130 --> 02:50:21,510
you don't know how to do yet.
3313
02:50:21,510 --> 02:50:24,720
And maybe you didn't really
get a sense for SPARQL.
3314
02:50:24,720 --> 02:50:27,120
It's something you need
to really do on your own
3315
02:50:27,120 --> 02:50:28,290
on your computer.
3316
02:50:28,290 --> 02:50:29,465
See how it works.
3317
02:50:29,465 --> 02:50:30,090
Fiddle with it.
3318
02:50:30,090 --> 02:50:30,900
Change something.
3319
02:50:30,900 --> 02:50:33,270
See that it breaks
and complains.
3320
02:50:33,270 --> 02:50:37,470
But, very importantly-- oh I
had this in the other questions
3321
02:50:37,470 --> 02:50:38,340
slide.
3322
02:50:38,340 --> 02:50:42,480
Remember the Wikidata Project chat.
3323
02:50:42,480 --> 02:50:45,810
That's kind of the Wikidata
equivalent of the village pump.
3324
02:50:45,810 --> 02:50:47,790
It's the page on Wikidata
where you can just
3325
02:50:47,790 --> 02:50:49,830
show up and ask a question.
3326
02:50:49,830 --> 02:50:52,290
In my experience, the
Wikidata community
3327
02:50:52,290 --> 02:50:55,410
is very nice, very
welcoming, and very eager
3328
02:50:55,410 --> 02:51:00,100
to help newer people integrate
and learn how to do things.
3329
02:51:00,100 --> 02:51:01,800
There's also an IRC channel.
3330
02:51:01,800 --> 02:51:04,260
If you know what IRC is and
how to use it, by all means,
3331
02:51:04,260 --> 02:51:07,890
go to the IRC channel #wikidata.
3332
02:51:07,890 --> 02:51:09,330
There's people
there all the time,
3333
02:51:09,330 --> 02:51:11,040
and you can just ask a question.
3334
02:51:11,040 --> 02:51:13,245
If you're trying to do a
query, and you don't quite
3335
02:51:13,245 --> 02:51:15,870
understand the syntax, or you're
not sure how to get the result
3336
02:51:15,870 --> 02:51:16,680
you want,
3337
02:51:16,680 --> 02:51:20,050
there are people there who
will gladly help you do that.
3338
02:51:20,050 --> 02:51:22,560
There is also a
Wikidata newsletter
3339
02:51:22,560 --> 02:51:25,680
published by the Wikidata team,
which is centered in Germany
3340
02:51:25,680 --> 02:51:27,330
at Wikimedia Germany.
3341
02:51:27,330 --> 02:51:31,890
And they send out a newsletter
in English with Wikidata news.
3342
02:51:31,890 --> 02:51:33,570
You know, new
properties, new items,
3343
02:51:33,570 --> 02:51:34,920
new things in the project.
3344
02:51:34,920 --> 02:51:36,840
But also sample queries.
3345
02:51:36,840 --> 02:51:39,300
So once a week there is
kind of an awesome query
3346
02:51:39,300 --> 02:51:43,440
to learn from, if you want
to learn that way instead
3347
02:51:43,440 --> 02:51:46,230
of reading like a
whole manual on SPARQL.
3348
02:51:46,230 --> 02:51:48,300
So I'm just encouraging
you to get help
3349
02:51:48,300 --> 02:51:49,470
in one of those channels.
3350
02:51:49,470 --> 02:51:51,000
Of course you can write to me.
3351
02:51:51,000 --> 02:51:55,920
Just reach out to me and
ask me questions as well.
3352
02:51:55,920 --> 02:51:58,860
I hope by now you agree
that Wikidata is love,
3353
02:51:58,860 --> 02:52:03,150
and Wikidata is awesome.
3354
02:52:03,150 --> 02:52:06,480
If there are no questions,
we do have a tiny bit of time
3355
02:52:06,480 --> 02:52:11,510
to demonstrate one
more tool but that's--
3356
02:52:11,510 --> 02:52:12,010
no?
3357
02:52:12,010 --> 02:52:13,170
No questions.
3358
02:52:13,170 --> 02:52:17,600
OK so let's talk about--
3359
02:52:17,600 --> 02:52:19,100
well, the Reasonator
is kind of nice,
3360
02:52:19,100 --> 02:52:22,890
but it's a little like
the article placeholder.
3361
02:52:22,890 --> 02:52:25,530
So this is not Wikidata;
this is a tool, again,
3362
02:52:25,530 --> 02:52:26,805
built by Magnus Manske--
3363
02:52:26,805 --> 02:52:29,310
AUDIENCE: There's also one
final question to you in case--
3364
02:52:29,310 --> 02:52:29,820
ASAF BARTOV: Oh,
there is a question.
3365
02:52:29,820 --> 02:52:30,390
AUDIENCE: Yeah.
3366
02:52:30,390 --> 02:52:32,348
ASAF BARTOV: What are the
advantages and disadvantages
3367
02:52:32,348 --> 02:52:35,370
of creating an item
before an article is
3368
02:52:35,370 --> 02:52:37,920
done on English Wikipedia?
3369
02:52:37,920 --> 02:52:42,340
Well, I mean, this example
that I just made, right?
3370
02:52:42,340 --> 02:52:46,960
I'm reading this book
by a notable author.
3371
02:52:46,960 --> 02:52:47,810
OK.
3372
02:52:47,810 --> 02:52:51,400
I want this to
exist on Wikidata,
3373
02:52:51,400 --> 02:52:53,320
and to be mentioned
on Wikidata, so
3374
02:52:53,320 --> 02:52:56,950
that when people look up
that author in Wikidata
3375
02:52:56,950 --> 02:52:59,170
they will know about one
of his notable works.
3376
02:52:59,170 --> 02:53:02,470
But I'm not prepared to
put in the time investment
3377
02:53:02,470 --> 02:53:05,670
to build a whole article
on English Wikipedia.
3378
02:53:05,670 --> 02:53:07,420
Either because I don't
have the time, or I
3379
02:53:07,420 --> 02:53:09,040
don't have good sources.
3380
02:53:09,040 --> 02:53:11,560
Or maybe my English
is not good enough,
3381
02:53:11,560 --> 02:53:14,980
but it is good enough to just
record these very basic facts
3382
02:53:14,980 --> 02:53:17,850
and point to the Library of
Congress records et cetera.
3383
02:53:17,850 --> 02:53:20,170
So that it's better
than nothing.
3384
02:53:20,170 --> 02:53:23,170
So that's one reason
to maybe do it.
3385
02:53:23,170 --> 02:53:26,690
Another reason is to
be able to link to it.
3386
02:53:26,690 --> 02:53:30,190
So remember that
translator lady already
3387
02:53:30,190 --> 02:53:33,280
had an item on Wikidata, but if
she hadn't we could have just
3388
02:53:33,280 --> 02:53:38,560
created a very, very basic
rudimentary item about her just
3389
02:53:38,560 --> 02:53:41,740
saying, you know,
this name is human.
3390
02:53:41,740 --> 02:53:43,060
Country, Bulgaria.
3391
02:53:43,060 --> 02:53:45,220
Occupation, translator.
3392
02:53:45,220 --> 02:53:48,580
Even just that would
have been something,
3393
02:53:48,580 --> 02:53:51,610
and would have enabled me
to link to this person.
3394
02:53:51,610 --> 02:53:56,860
So these are legitimate reasons
to create Wikidata entities
3395
02:53:56,860 --> 02:54:01,510
without, or at least before,
creating a Wikipedia article.
3396
02:54:01,510 --> 02:54:02,709
If you are going to create--
3397
02:54:02,709 --> 02:54:04,750
I mean, if you're at an
edit-a-thon or something,
3398
02:54:04,750 --> 02:54:07,690
and you have come to
create Wikipedia articles,
3399
02:54:07,690 --> 02:54:10,660
by all means, first create
the Wikipedia article,
3400
02:54:10,660 --> 02:54:13,982
then create the Wikidata
item and link to it.
3401
02:54:13,982 --> 02:54:17,580
3402
02:54:17,580 --> 02:54:20,500
I hope that answers
the question.
3403
02:54:20,500 --> 02:54:24,940
So the Reasonator
is simply a kind
3404
02:54:24,940 --> 02:54:31,330
of prettier view of
items in Wikidata.
3405
02:54:31,330 --> 02:54:35,980
So you can just type the name
of an item or the number.
3406
02:54:35,980 --> 02:54:39,010
Let's pick just a
random number, 42.
3407
02:54:39,010 --> 02:54:39,595
Say 42.
3408
02:54:39,595 --> 02:54:42,770
3409
02:54:42,770 --> 02:54:45,950
Which happens to
be, maybe you've
3410
02:54:45,950 --> 02:54:51,310
heard of this guy,
Douglas Adams.
3411
02:54:51,310 --> 02:54:55,490
He happened to have received
the Q number 42.
3412
02:54:55,490 --> 02:54:58,760
I'm sure it's a
cosmic coincidence
3413
02:54:58,760 --> 02:55:01,460
of infinite improbability.
3414
02:55:01,460 --> 02:55:03,470
And this is a view--
3415
02:55:03,470 --> 02:55:05,690
this is a tool that
is not Wikidata.
3416
02:55:05,690 --> 02:55:09,690
It's a tool built on top of
Wikidata called Reasonator.
3417
02:55:09,690 --> 02:55:14,750
And it gives us the information
from Q42, that is from the--
3418
02:55:14,750 --> 02:55:18,800
this item in Wikidata, which
looks like an item in Wikidata.
3419
02:55:18,800 --> 02:55:21,320
But it gives it to us in a
slightly more rational kind
3420
02:55:21,320 --> 02:55:22,430
of layout.
3421
02:55:22,430 --> 02:55:24,200
It even kind of
generates a little bit
3422
02:55:24,200 --> 02:55:27,620
of pseudo article text for us.
3423
02:55:27,620 --> 02:55:30,429
You know, Douglas Adams was
a British writer, playwright,
3424
02:55:30,429 --> 02:55:31,970
screenwriter,
bla-bla-bla, an author.
3425
02:55:31,970 --> 02:55:35,630
He was born on this date, in
this place, to these people.
3426
02:55:35,630 --> 02:55:39,080
He studied at this place
between these years.
3427
02:55:39,080 --> 02:55:40,670
That's all machine generated.
3428
02:55:40,670 --> 02:55:42,230
Nobody wrote this text.
3429
02:55:42,230 --> 02:55:46,330
That's all taken from those
statements in Wikidata,
3430
02:55:46,330 --> 02:55:51,080
and the tool generates this
reasonable-reading summary paragraph.
3431
02:55:51,080 --> 02:55:54,140
And then it gives us this
little table of relatives.
3432
02:55:54,140 --> 02:55:55,610
It's all taken from Wikidata.
3433
02:55:55,610 --> 02:55:57,740
But as you can see,
this is already
3434
02:55:57,740 --> 02:56:02,120
a little more accessible than
the essentially arbitrary
3435
02:56:02,120 --> 02:56:05,120
ordering of statements
on Wikidata.
3436
02:56:05,120 --> 02:56:06,200
And that's OK.
3437
02:56:06,200 --> 02:56:08,060
I mean, that's
kind of by design.
3438
02:56:08,060 --> 02:56:10,100
Wikidata is the platform.
3439
02:56:10,100 --> 02:56:11,960
There is going to
be-- there are going
3440
02:56:11,960 --> 02:56:15,680
to be many new applications,
and platforms, and tools,
3441
02:56:15,680 --> 02:56:19,010
and visual interfaces
on top of Wikidata
3442
02:56:19,010 --> 02:56:23,000
to browse Wikidata in more
friendly or more customized
3443
02:56:23,000 --> 02:56:24,480
ways.
3444
02:56:24,480 --> 02:56:27,080
For example, one of the
things that Reasonator
3445
02:56:27,080 --> 02:56:31,610
does for us is give us pictures
and maps and a timeline.
3446
02:56:31,610 --> 02:56:32,960
Check this out.
3447
02:56:32,960 --> 02:56:38,990
A timeline, machine generated,
just from dates and points
3448
02:56:38,990 --> 02:56:44,090
in time, mentioned in the
relatively rich Wikidata
3449
02:56:44,090 --> 02:56:47,200
item about Douglas Adams.
3450
02:56:47,200 --> 02:56:47,700
Right?
3451
02:56:47,700 --> 02:56:50,030
So this timeline, for example
again, completely machine
3452
02:56:50,030 --> 02:56:51,140
generated.
3453
02:56:51,140 --> 02:56:53,270
But he was educated
between these years,
3454
02:56:53,270 --> 02:56:54,920
so I can put it on the timeline.
3455
02:56:54,920 --> 02:56:57,260
And this is the year he was
nominated for a Hugo Award,
3456
02:56:57,260 --> 02:56:59,570
so I can put that in a timeline.
3457
02:56:59,570 --> 02:57:00,600
Et cetera.
3458
02:57:00,600 --> 02:57:03,050
So that's just a super
quick demonstration
3459
02:57:03,050 --> 02:57:06,620
of that tool, the Reasonator.
3460
02:57:06,620 --> 02:57:10,310
Links are all here
in the slides.
3461
02:57:10,310 --> 02:57:13,390
And the final tool I wanted
to mention very quickly
3462
02:57:13,390 --> 02:57:16,220
is the Mix'n'match tool.
3463
02:57:16,220 --> 02:57:21,980
You remember my explanation
about Wikidata as a nexus,
3464
02:57:21,980 --> 02:57:27,320
as a connection point between many
databases, many data sources.
3465
02:57:27,320 --> 02:57:31,080
Those depend on
these equivalencies.
3466
02:57:31,080 --> 02:57:35,300
On Wikidata being taught
that this item is like that
3467
02:57:35,300 --> 02:57:37,940
ID in this other database.
3468
02:57:37,940 --> 02:57:41,810
And Mix'n'match is a tool,
again, by Magnus Manske.
3469
02:57:41,810 --> 02:57:44,690
Maybe you're detecting
a pattern here.
3470
02:57:44,690 --> 02:57:47,390
It's a tool by Magnus
that is designed
3471
02:57:47,390 --> 02:57:50,270
to enable us to kind
of take a foreign,
3472
02:57:50,270 --> 02:57:54,950
an external data set, put
it alongside Wikidata,
3473
02:57:54,950 --> 02:57:56,690
and kind of try and align them.
3474
02:57:56,690 --> 02:57:59,410
So this item in this
external dataset,
3475
02:57:59,410 --> 02:58:01,230
is that already
covered in Wikidata?
3476
02:58:01,230 --> 02:58:02,890
If so, by what Q number?
3477
02:58:02,890 --> 02:58:03,890
By what item?
3478
02:58:03,890 --> 02:58:06,170
If not, maybe we need
to create a Wikidata
3479
02:58:06,170 --> 02:58:07,610
item to represent it.
3480
02:58:07,610 --> 02:58:10,010
Or maybe it's a
duplicate, or something.
3481
02:58:10,010 --> 02:58:15,980
So the Mix'n'match tool has
a list of external data sets,
3482
02:58:15,980 --> 02:58:18,140
as you can see.
3483
02:58:18,140 --> 02:58:21,260
The Art and Architecture
Thesaurus by the Getty Research
3484
02:58:21,260 --> 02:58:22,220
Institute.
3485
02:58:22,220 --> 02:58:26,690
Or the Australian
Dictionary of Biography.
3486
02:58:26,690 --> 02:58:28,880
All kinds of external
data sets here.
3487
02:58:28,880 --> 02:58:32,470
3488
02:58:32,470 --> 02:58:40,060
Somewhere here I had a specific
link to the Royal Society.
3489
02:58:40,060 --> 02:58:41,710
It can also give
me some statistics.
3490
02:58:41,710 --> 02:58:47,410
So there is an external data set
of all the Fellows of the Royal
3491
02:58:47,410 --> 02:58:48,001
Society.
3492
02:58:48,001 --> 02:58:48,500
Right?
3493
02:58:48,500 --> 02:58:54,970
The oldest academic
learned society in England.
3494
02:58:54,970 --> 02:58:57,415
And the internet is tired.
3495
02:58:57,415 --> 02:59:03,240
3496
02:59:03,240 --> 02:59:04,640
Here we go.
3497
02:59:04,640 --> 02:59:07,115
Nope.
3498
02:59:07,115 --> 02:59:08,105
Did that work?
3499
02:59:08,105 --> 02:59:12,560
3500
02:59:12,560 --> 02:59:15,390
Fellows of the Royal
Society, here we go.
3501
02:59:15,390 --> 02:59:17,970
So this one is complete.
3502
02:59:17,970 --> 02:59:21,330
I mean, people have manually
gone over every single item
3503
02:59:21,330 --> 02:59:24,330
there and either
matched it to Wikidata
3504
02:59:24,330 --> 02:59:27,390
or declared that it was not
in scope, or a duplicate
3505
02:59:27,390 --> 02:59:28,520
or whatever.
3506
02:59:28,520 --> 02:59:31,230
But let's look at site stats.
3507
02:59:31,230 --> 02:59:35,210
This is a fun kind of
aspect of this tool.
3508
02:59:35,210 --> 02:59:38,530
But that is not working.
3509
02:59:38,530 --> 02:59:40,820
Or it's taking too long.
3510
02:59:40,820 --> 02:59:43,940
So let's just demonstrate
how this works.
3511
02:59:43,940 --> 02:59:45,590
Maybe Britannica?
3512
02:59:45,590 --> 02:59:46,780
Is that done already?
3513
02:59:46,780 --> 02:59:52,570
3514
02:59:52,570 --> 02:59:53,990
Here we go.
3515
02:59:53,990 --> 02:59:55,330
Encyclopedia Britannica.
3516
02:59:55,330 --> 02:59:55,960
Yeah.
3517
02:59:55,960 --> 03:00:02,040
So for the Encyclopedia
Britannica,
3518
03:00:02,040 --> 03:00:05,940
40% of the items there
are not yet processed.
3519
03:00:05,940 --> 03:00:07,830
So let's process one of them.
3520
03:00:07,830 --> 03:00:16,180
For example there is an item
in the Encyclopedia Britannica
3521
03:00:16,180 --> 03:00:19,960
called Boston, England.
3522
03:00:19,960 --> 03:00:23,050
As you know,
all American place names
3523
03:00:23,050 --> 03:00:26,050
are totally stolen
from elsewhere.
3524
03:00:26,050 --> 03:00:29,440
So there is a Boston
in England, though it's
3525
03:00:29,440 --> 03:00:30,700
no longer the famous one.
3526
03:00:30,700 --> 03:00:36,340
And the Mix'n'match
tool has automatically
3527
03:00:36,340 --> 03:00:39,610
matched it, based on
the label, to Q100,
3528
03:00:39,610 --> 03:00:43,900
which is Boston, the big
city in the United States.
3529
03:00:43,900 --> 03:00:45,500
And that is incorrect, right?
3530
03:00:45,500 --> 03:00:48,910
That's a kind of naive computer
going, well this is Boston,
3531
03:00:48,910 --> 03:00:50,820
and this other thing
is also Boston.
3532
03:00:50,820 --> 03:00:56,260
And it is asking me to
confirm this match or not.
3533
03:00:56,260 --> 03:00:57,400
You see?
3534
03:00:57,400 --> 03:01:01,120
So this is the Boston,
England from Britannica.
3535
03:01:01,120 --> 03:01:04,720
And the tool is asking
me, is this the same as
3536
03:01:04,720 --> 03:01:06,910
Boston, Q100, in America?
3537
03:01:06,910 --> 03:01:07,990
The answer is no.
3538
03:01:07,990 --> 03:01:10,110
I removed this.
3539
03:01:10,110 --> 03:01:11,860
I remove this match.
3540
03:01:11,860 --> 03:01:15,430
And now this Boston,
England is unmatched.
3541
03:01:15,430 --> 03:01:23,230
And I can match it to the
correct one in England.
3542
03:01:23,230 --> 03:01:27,370
I can do this by searching
English Wikipedia,
3543
03:01:27,370 --> 03:01:28,780
or searching Wikidata.
3544
03:01:28,780 --> 03:01:32,000
I mean, it has
these handy links.
3545
03:01:32,000 --> 03:01:36,910
So the English town
is in Lincolnshire.
3546
03:01:36,910 --> 03:01:38,230
Boston, Lincolnshire.
3547
03:01:38,230 --> 03:01:46,030
So I can go there and then
get the Wikidata item number.
3548
03:01:46,030 --> 03:01:49,810
See, this is not Q100,
Boston in the States;
3549
03:01:49,810 --> 03:01:53,440
this is Q311975,
a town in Lincolnshire.
3550
03:01:53,440 --> 03:01:57,310
I can get this Q
number, go back to the
3551
03:01:57,310 --> 03:01:58,160
Mix'n'match tool--
3552
03:01:58,160 --> 03:01:59,110
Where was that?
3553
03:01:59,110 --> 03:02:00,180
Here we are.
3554
03:02:00,180 --> 03:02:01,510
And set Q.
3555
03:02:01,510 --> 03:02:08,650
I can tell the tool that this is
the right Boston, and click OK.
3556
03:02:08,650 --> 03:02:14,550
And now this town
in Lincolnshire,
3557
03:02:14,550 --> 03:02:17,100
you can see this here,
this item, Q311975,
3558
03:02:17,100 --> 03:02:21,190
is linked to Britannica.
3559
03:02:21,190 --> 03:02:22,660
What does this mean?
3560
03:02:22,660 --> 03:02:23,820
Well, if we go there.
3561
03:02:23,820 --> 03:02:25,380
If we actually go
to the Wikidata
3562
03:02:25,380 --> 03:02:28,890
entity you will see
that in addition
3563
03:02:28,890 --> 03:02:34,140
to the few statements that
it already had, it now has,
3564
03:02:34,140 --> 03:02:38,610
thanks to my clicking, it now
has another identifier here.
3565
03:02:38,610 --> 03:02:39,270
See?
3566
03:02:39,270 --> 03:02:43,950
Encyclopedia Britannica
Online ID, with this link.
3567
03:02:43,950 --> 03:02:49,440
And if we click it, we
will indeed reach this page
3568
03:02:49,440 --> 03:02:51,510
in the Britannica
online, which is indeed
3569
03:02:51,510 --> 03:02:53,700
about this town in Lincolnshire.
3570
03:02:53,700 --> 03:02:54,510
You see?
3571
03:02:54,510 --> 03:02:58,650
So I've contributed one
of those mappings, one
3572
03:02:58,650 --> 03:03:01,950
of those identifiers,
into Wikidata.
3573
03:03:01,950 --> 03:03:04,860
And I didn't have
to do it manually.
3574
03:03:04,860 --> 03:03:07,980
This tool kind of prompted
me to either confirm--
3575
03:03:07,980 --> 03:03:09,480
if it was correct,
I could have just
3576
03:03:09,480 --> 03:03:12,150
clicked confirm. Since
it wasn't correct,
3577
03:03:12,150 --> 03:03:16,920
I corrected it manually, but
it made this edit on my behalf.
3578
03:03:16,920 --> 03:03:21,180
So that's another tool that
encourages us to systematically
3579
03:03:21,180 --> 03:03:24,360
teach Wikidata more things.
3580
03:03:24,360 --> 03:03:25,860
And we're out of time.
3581
03:03:25,860 --> 03:03:29,430
Go edit Wikidata. Now
that you have the power,
3582
03:03:29,430 --> 03:03:30,510
you know the deal.
3583
03:03:30,510 --> 03:03:32,430
Use it for good,
and not for evil.
3584
03:03:32,430 --> 03:03:35,640
If you have questions,
this is my email address.
3585
03:03:35,640 --> 03:03:38,640
If you're watching this video
not live, the description
3586
03:03:38,640 --> 03:03:41,610
will have links to the
slides, and to a bunch
3587
03:03:41,610 --> 03:03:44,610
of other useful
pieces of information.
3588
03:03:44,610 --> 03:03:49,510
Any last questions on IRC?
3589
03:03:49,510 --> 03:03:53,210
If not, thank you
for your attention.
3590
03:03:53,210 --> 03:03:56,470
And if you like this, and if you
feel that you now get Wikidata,
3591
03:03:56,470 --> 03:03:58,330
and you get what it's
good for, and you're
3592
03:03:58,330 --> 03:04:01,660
inspired to contribute, I have
only one request from you.
3593
03:04:01,660 --> 03:04:04,960
I mean, in addition to using
it for good, not for evil,
3594
03:04:04,960 --> 03:04:07,630
I ask that you spread the word.
3595
03:04:07,630 --> 03:04:09,550
Show this video--
share this video
3596
03:04:09,550 --> 03:04:13,180
with other people in your
community, or around you.
3597
03:04:13,180 --> 03:04:16,000
Teach this yourself
once you're comfortable
3598
03:04:16,000 --> 03:04:17,650
with these concepts.
3599
03:04:17,650 --> 03:04:21,330
Feel free to use my slides.
3600
03:04:21,330 --> 03:04:23,580
Yeah, and edit Wikidata.
3601
03:04:23,580 --> 03:04:27,010
Thank you very
much, and goodbye.
3602
03:04:27,010 --> 03:04:32,456