Interview with Amir Sarabadani
If there’s a consensus, it can be changed
During the Algoliterary Encounters, the Algolit group invited Amir Sarabadani to present ORES, a project he is working on as a software engineer for Wikimedia Germany. ORES is short for “The Objective Revision Evaluation Service”, a web service and application programming interface (API) that provides machine learning as a service for Wikimedia projects. The system is designed to help automate critical wiki-work such as vandalism detection and removal. Amir has been active in Wikipedia since 2006 as a sysop, a bureaucrat and check user for the Persian chapter of Wikipedia, and as a developer for several Wikimedia projects. He is operator of Dexbot and one of the developers of the pywikibot framework. Amir was born in 1992 in Tehran, Iran, where he studied physics, and he currently lives in Berlin. During lunch, Cristina Cochior and Femke Snelting took the opportunity to ask him a few more questions. Read the edited transcript from this conversation below:
Amir: OK, so having someone from inside Mediawiki to explain things for you is maybe some sort of an informative perspective, but it’s also a very pessimistic one, because we know how the whole thing works and we are always saying that there are a lot of things needing to be fixed. This is my first disclaimer, if you want to ask anything about Mediawiki, my opinion would be: “Ah this is horrible. We need to rewrite the whole thing...” So, keep that in mind.
Cristina: From attending the Wikimedia Hackathon, I understood that some people are in the process of rewriting the Mediawiki API.
Amir: Well, that’s a good point. As a matter of resources, there is a dedicated team that has been recently appointed and they are trying to rewrite some parts of Mediawiki. There are also other things being rewritten, for example, logins have been rewritten recently. The software is so big that you cannot redo the whole thing from scratch; you have to pick up something, take it out, rewrite it and put it back again. And our team is working on revisions. It’s about maintaining an infrastructure, so whenever you’re done, there’s somewhere else that needs attention.
Cristina: Is this a possible reason why there hasn’t been so much intervention within the Mediawiki interface design until now?
Amir: To some degree, yes. For some extensions of Mediawiki, yes, that’s correct. But most of the reasons why the interface of Mediawiki hasn’t been changed, is because of the community’s inertia towards change. If we have to try something very small and maybe we get resistance from the community, we try to push back and fix it in a proper way. But also there are a lot of changes in the interface, especially for the editors, that you probably haven’t noticed. So for example, if you check the Edit system, it’s more modern now. It happened several months ago.
Cristina: Yes, indeed; I was wondering about changes on a structural level. Beside the additions to the interface that make [Wikipedia] more accessible, the basic structure of the article page, that only shows one version to the reader, seems to have remained unchanged since the early design of the Mediawiki. For example, the edits that build up an article are tucked behind the “View History” button and they’re not always easy to see at first, if you’re not already familiar with the interface. Are there any intentions to make these wiki operations more visible?
Amir: This is the conceptual side. I don’t think there are any plans in thinking or talking about it that way, because maybe that’s the whole point of being a wiki. This is the wiki by design. But there is another thing, Mediawiki itself does not have an interface. Instead, it’s called a skin. For example, the skin that Mediawiki has right now by default is Vector. You can change the skin to something else and you see a brand new interface. For example, if you visit it on a mobile, it’s a completely new layout, that’s another thing. It’s just another skin.
Femke: In the way Wikipedia works, there is so much attention to conversation and difference of opinion, but in the way it presents itself, the neutral point of view is very heavily asserted; only the consented version is presented as the definitive view. So we try to understand the place of discussion, debate, disagreement, ambiguity and in some way vandalism in the life of Wikipedia, not just as something that has to stay out of sight, but actually something that is vital to how knowledge is being produced. So after hearing you speak today, we wanted to ask you if this is something you think about. How do you deal with those issues?
Amir: I think I didn’t understand the question completely, but let me give you my answer and if I didn’t understand, you can let me know. The discussions that happen around an article are usually stored on the talk page. There, people talk about the article and its structure, but not the subject. Because there is a big problem with people talking about the subject of the article, if they say for example whether Islam is correct or not, but this is not related. They should be talking about the article on Islam. So they [the Wikipedia editor community] say that the article needs to be changed in a way that works better for users, or in a way that complies with the guidelines. So this is one part of the answer. The second part about reverting vandalism, I think this is all reflected in the history, but the history is not [visible] in the structure. The biggest problem is that the history [of an article] is not properly structured for new users, it’s just for super expert Wikipedians, who know how to deal with it already. This is something that never came up with the designers because the UX designers of Wikipedia are usually dividing people into two groups: readers and editors. They say that readers don’t need to interact with the history, so they don’t rewrite the interface for that. And they take it for granted editors already know how to work with history.
Cristina: I thought it was interesting that yesterday, in your talk, you referred to the concepts of “subjectivity” and “objectivity” in relation to Wikipedia principles. You said that the assessment of whether an edit is vandalism or not is subjective, because it comes down to the personal interpretation of what vandalism is, and later on, you also referred to the objectivity principle on which Wikipedia is based. How do you see the relation between the two?
Amir: Well, the thing about Wikipedia, especially the policies, is that it’s not very objective. It’s very open to interpretations and it’s very complicated. I don’t know if I told you, but one of the laws of Wikipedia says to ignore all rules. It means: do everything you think is correct; if there’s a problem and you’re violating anything, it could be that you come to the conclusion that maybe we should change that law. It happens all the time. If there’s a consensus, it can be changed. There is a page on Wikipedia that’s called Five Pillars and it says that except these five pillars, you can change everything. Although I don’t think that’s very objective—everything is subjective on Wikipedia—but when there is an interaction of lots of people, it becomes more natural and objective in a way because there is a lot of discussion and sometimes there are people who try to change others’ opinions about some issues. When this happens, it makes everything more neutral.
The result is aiming to be neutral. And it is, because of the integration of lots of people that are cooperating with each other and who are trying to get things done in a way that doesn’t violate policies. So they tolerate things that they don’t like in the article or sometimes they even add them [themselves] to make it more neutral.
Femke: Could you give an example?
Amir: The biggest problem is usually writing about religion. I have seen people who are against a religion and try to make a critique of it, but when they are writing an article [on the subject], they [also] try to add something in there, like a defence of Muslims, in order to make it more neutral. The people who contribute usually value the pillars, including the pillar of neutrality.
Femke: There is vandalism that is not targeted, that is about simply asserting that ‘I can break something’, but there is also vandalism that wants to signal disagreement or irritation with a certain topic. Do you ever look at the relation between where the vandalism goes and what topics are being attacked?
Amir: Well, I didn’t, but there are lots of topics about that. Things that people have strong feelings about are always good targets for vandals. There is always vandalism around things that have strong politics. It can be sports, it can be religion, it can be any sensitive subject like homosexuality, abortion; in these matters it happens all the time. One thing that I think about is that sometimes when people are reading articles on Wikipedia, it’s outside of their comfort zone, so they try to change the article and bring it back within that, instead of expanding it.
Cristina: On another note, the “edit wars” phenomenon and debates that are happening on the discussion page of a specific article have been widely written about. There are of course many elements driving these tensions, that include but also go beyond the edit culture of Wikipedia. But I wonder, if there would be a possibility to have multiple readings of one page, would that result in less competing views, since more perspectives on one subject would be accessible? Or rather, how could Wikipedia display knowledge without having to turn polyvocality into monovocality?
Amir: There have been lots of debates about this. I don’t know if you’re familiar with Vox, the media company from the United States? One thing that they tried to implement was to make a version of Wikipedia that is customizable. For example, if you are pro Trump, you are given a different article than someone who is a democrat. But you can see the problem immediately, it diverges people. Just like what Facebook is doing right now, making people live inside their bubbles. I think this is the reason why people on Wikipedia are fighting against anything that has this divisive effect.
Femke: Yes. If you would make multiple wikis, you would support an algorithmically induced separation of world views. I understand why that raises concern. But on the other hand, why is there such a need to always come to a consensus? I’m wondering if this is always helpful for keeping the debate alive.
Amir: On Wikipedia, they knew that consensus is not something you can always reach. They invented a process called Conflict Resolution. When people talk and they see that they cannot reach any consensus, they ask for a third-party opinion. If they couldn’t find any agreement with the third-party opinion, they call for a mediator. But mediators do not have any enforcement authority. If mediators can resolve the conflict, then it’s done, otherwise the next step is arbitration. For example, the case of Chelsea Manning. What was her name before the transition? I think it’s Brandon Manning, right? So, there was a discussion over what the name on Wikipedia should be: Chelsea Manning or Brandon Manning. So there was lots of transphobia in the discussion and when nothing worked, it went to an ArbiCom (Arbitration Committee). An arbitration committee is like a very scary place, it has a court and they have clerks that read the discussion. The outcome was obviously that it should stay Chelsea Manning. It’s not like you need to reach consensus all the time, sometimes consensus will be forced on you. Wikipedia has a policy saying “Wikipedia is not”. One of the things Wikipedia is not is a place for democracy.
Femke: This Chelsea Manning case is interesting! Is this decision archived in the article somewhere?
Amir: The cases usually happen on the ArbiCom page, in the /Cases section. But finding this in the discussion is hard.
Femke: To go back to your work. During these Algoliterary Encounters we tried to understand what it means to find bias in machine learning. The proposal of Nicolas Malevé, who gave a workshop yesterday, was to neither try to fix it, nor to refuse dealing with systems that produce bias, but to work with it. He says bias is inherent to human knowledge, so we need to find ways to somehow work with it; he used the image of a “bias cut” in textile, which is a way to get the most flexibility out of woven materials. But we’re struggling a bit with what would that mean, how would that work... So I was wondering if you had any thoughts on the question of bias?
Amir: Bias inside Wikipedia is a tricky question because it happens on several levels. One level that has been discussed a lot is the bias in references. Not all references are accessible. [the waiter brings us the soups] So one thing that the Wikimedia foundation has been trying to do is to give free access to libraries that are behind a pay wall. [the following sentence is hard to distinguish because of clinking cutlery sounds] They reduce bias by only using open access references. Another type of bias is the access to the internet. There are lots of people who don’t have it. One thing about China is that [some pages on the Internet] are blocked. The content against the government of China inside Chinese Wikipedia is higher because the editors [who can access the website] are not people who are pro government, and try to make it more neutral. This happens in lots of places. But in the case of AI and the model that we use at Wikipedia, it’s more a matter of transparency. There is a book about how bias in AI models can break people’s lives, it’s called “Weapons of Math Destruction”. It talks about AI models that exist in the United States, that rank teachers. It’s quite horrible, because eventually there will be bias. The way to deal with it, based on the book and their research was first that the model should be open source—people should be able to see what features are used and the data should be open as well, so that people can investigate, find bias, give feedback and report back. There should be a way to fix the system. I don’t think all companies are moving in that direction, but Wikipedia, because of the values that they hold, are at least more transparent and they push other people to do the same.
[it’s becoming hard to speak and eat soup at the same time—we take a small break to eat]
Femke: It was very interesting to think of your work in the context of Wikipedia, where labour is legible, and is always part of the story. It’s never erased. In most machine learning environments the work of classifying, deciding and defining often does not get a lot of attention.
Amir: Actually, one of the things that I’ve been recently working on is to get more people involved in the ‘pre’ aspect of the bot, so for example: if someone is labelling edits for our model to be trained on, we show their name and how many labels they contributed. There was someone asking us for a list of contributors, because they wanted to give them a barn star to show appreciation for their work. Other reasons are the instant gratification, the credibility that it gives users and that it helps others.
Femke: But also it keeps those who are part of the labour of producing knowledge somehow part of the system. It doesn’t separate them from the actual outcome, which I think is important.
Femke: What I’m trying to understand is the role of “binary separation” in machine learning. To refer to the example you used on Friday, when you showed a mass of Wikipedia edits, and then a sort of incision on the lines that could be considered vandalism or that are probably OK. But the separation is an either/or, it’s a left or right.
Amir: That example was very simplified. Overall, we just give a number between one and zero, so that’s a probability. If we order them, there is a spectrum of how likely it is for edits to be vandalism. We can do our incision wherever we want and I assure you that there are recent changes that [something] have, we can highlight them in different colours and we can look at them with a different precision.
Femke: I understand that some things are more likely to be vandalism than others, but still, because they’re expressed over a spectrum of two: either extremely vandalistic or not at all, we are dealing with a binary, no?
Amir: Yeah, you are completely right. The thing we are trying to tackle in regards to Wikipedia editing, is that we are trying to make a model not just as a binary separation. We have a good faith model, which predicts with the same system that moves between one and zero whether an edit has been made in good faith or not. For example, you may see if an edit was damaging, but it was made with a good intention. You see many people that want to help, but because they are new, they make mistakes. We try to tackle this by having a different model. So if an edit has both a high vandalism score and a high bad faith score, we can remove that with bots and we can interact with people who make mistakes but have a good intention.
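[Editorial sketch, not part of the conversation: the two scores Amir describes, each a number between zero and one, could be combined roughly as below. The names EditScores, rank_by_damage and triage, the cut-off values and the action labels are assumptions made for illustration only; they are not taken from ORES. The first function orders edits along the spectrum, the second places the “incision” and separates bad-faith damage from well-meant mistakes.]

```python
from dataclasses import dataclass


@dataclass
class EditScores:
    """Hypothetical pair of model outputs for one edit, both in [0, 1]."""
    damaging: float   # probability that the edit harms the article
    goodfaith: float  # probability that the edit was made with good intent


def rank_by_damage(edits):
    """Order edits from most to least likely damaging (the 'spectrum')."""
    return sorted(edits, key=lambda e: e.damaging, reverse=True)


def triage(scores, damaging_cut=0.8, goodfaith_cut=0.5):
    """Suggest an action; both cut-offs are arbitrary illustrative values."""
    if scores.damaging < damaging_cut:
        return "leave"       # probably fine, no action
    if scores.goodfaith < goodfaith_cut:
        return "revert"      # likely damaging and likely bad faith
    return "reach out"       # likely damaging, but a well-meant mistake


if __name__ == "__main__":
    edits = [EditScores(0.10, 0.90), EditScores(0.95, 0.10), EditScores(0.90, 0.85)]
    for edit in rank_by_damage(edits):
        print(edit, "->", triage(edit))
```

[Moving damaging_cut up or down shifts where the incision falls; the second score is what keeps a damaging but well-intentioned edit from being treated as vandalism.]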
Cristina: How do you see the good faith principle in relation to neutrality?
Amir: I think it’s completely related and I think it comes down to ‘is the user trying to help Wikipedia or not’: this is our brainstorm.
Femke: If you talk about the distinction between good faith and bad faith, it is still about faith-in-something. If you plot this “faith” according to the vector of a neutral point of view, you’re dealing with a different type of good faith and goodness than if you plot the faith along the vector of wanting more points of view instead of less.
Amir: I see. I think good faith means good intent. By defining what is “good” in this way, we are following the principles of the whole Wikipedia [community]. Good means helping people. Although it is a very subjective term. What we are trying to do right now is to make some sort of survey—to take out things that are computational and can’t be measured easily, like quality, and ask people whether they think an edit looks good or bad. To make things more objective, to make things come together from the integration of observations of lots of people. Obviously, there are a lot of gray areas.
Femke: I do not have problems with mistakes that could be made in this way or the subjectivities of those algorithms. I’ll try to ask a different question that maybe arrives at the same point: if you think about Wikipedia as a living community, the project changes with every edit; every edit is somehow a contribution to this living-knowledge-organism. So then, if you try to distinguish what serves the community and what doesn’t, and you try to generalise that, because I think that’s what the good faith-bad faith algorithm is trying to do, and start developing helper tools to support such a project, you do that on the basis of an abstract idea of what Wikipedia is and not based on the living organism of what happens every day. What I’m interested in is the relationship between vandalism and debate, and how we can understand the conventional drive in machine-learning processes. And how can we better understand these tensions and deal with them? If you place your separation of good faith-bad faith on pre-existing labelling and then reproduce that in your understanding of what edits are being made, how do you then take into account movements that are happening, the life of the actual project?
Amir: Ok, I hope that I understood you correctly. It’s an interesting discussion. Firstly, what we are calling good faith and bad faith comes from the community itself, we are not doing the labelling for them, they are doing it themselves. So, in many different language communities of Wikipedia, the definition of what is “good faith” and what is “bad faith” will differ. Wikimedia is trying to reflect what is inside the organism and not to change the organism itself. If the organism changes and we see that the definition of good faith and helping Wikipedia has changed, we will implement this as a feedback loop that lets people from inside of their community pass judgment on their edits. If they disagree with the labelling [results altogether], we can go back to the model and retrain the algorithm to reflect this change. It’s some sort of closed loop: you change things and if someone sees there is a problem, then they tell us and we can change the algorithm back. It’s an ongoing project.
Cristina: This feedback assessment can be done through false positives: the situations when test results are wrongly attributed a property that is not present. In that case, contributors can revert an edit, or contact you. Does that mean that the original dataset on which the algorithm is trained would also be reiterated over the years?
Amir: Yeah, it will be reiterated.
Femke: Already the fact that the labelling is made transparent and debatable is very helpful. In this way you can start to understand the relation between the labelling and the consequences in the model–I didn’t think about it, but that’s one of the things we were trying to do with The Annotator.
Amir: What we are trying to build, but we are not there yet, is that when someone labels an edit, it should be immediately visible for others, and these others can make a judgment for themselves whether they find it correct.
Femke: I have an issue with the fact that it is assumed that agreement is always possible and necessary. Even if the disagreements leading up to agreement are archived and legible, agreement is still a prerequisite for the whole machinery to start operating. If agreement is the main drive – I cannot help to think that there’s going to be trouble with everything that is non-agreeable. Everything that causes trouble will be difficult to take into account in a system that so much depends on agreement or non-ambiguity. Your example of Chelsea Manning is very interesting – I would like to look more carefully at how the Wikipedia community has dealt with that. In machine learning ... every time we come across moments where ambiguity and disagreements create problems, the problems seem to be put aside. I think you clearly explain how the process towards that agreement is dealt with within Wikipedia on the level of writing the articles, but also on the level of labelling the data. But still, we are dealing with a technology that values ...
Amir: ... consensus and agreement more than multiple points of view.
Amir: I think there’s some sort of trade-off. We are not trying to reach agreement for all parties and all people here, there are lots of times where if someone pushes too much, then everyone votes against it and we block that person. We do not try to come to an agreement, because there is a difference between coming to an agreement and pushing your idea. This is a difficult situation, there are lots of people that are being paid to push their agenda. I see sometimes how trying to come to an agreement can be time consuming, resource consuming. But I think Wikipedia has shown that it works.
Femke: Right. Thank you. And we have even eaten our soups in the meantime!
Amir: In your language, do you drink soup or do you eat it?
Cristina: Eat. How is it in Persian?
Amir: In Persian, drinking and eating are the same, so it’s not a problem at all! But in some languages I see that they drink soup, whereas in most languages they eat soup.
Femke: It would not be wrong to say I drink the soup, but–
Amir: –it feels weird.
- Algoliterary Encounters, Algolit. November 2017 http://constantvzw.org/site/Algoliterary-Lectures,2852.html
- https://foundation.wikimedia.org/wiki/User:Ladsgroup
- https://www.mediawiki.org/wiki/ORES
- Mediawiki is the software on which all the Wikimedia projects run.
- API stands for “Application Programming Interface” and is a set of defined methods of communication between programming applications. The Mediawiki API allows other applications to pull information from Wikimedia projects and use it for their own aims.
- https://en.wikipedia.org/wiki/Wikipedia:Five_pillars
- https://en.wikipedia.org/wiki/Wikipedia:Barnstars
- The Annotator, developed during the Constant worksession Cqrrelations. May 2015 http://snelting.domainepublic.net/affiliation-cat/constant/the-annotator