Interview with Amir Sarabadani


During the Algoliterary Encounters[1], the Algolit group invited Amir Sarabadani[2] to present ORES[3], a project he is working on as a software engineer for Wikimedia Germany. ORES is short for “Objective Revision Evaluation Service”, a web service and application programming interface (API) that provides machine learning as a service for Wikimedia projects. The system is designed to help automate critical wiki-work such as vandalism detection and removal. Amir has been active in Wikipedia since 2006 as a sysop, bureaucrat and checkuser for the Persian Wikipedia, and as a developer for several Wikimedia projects. He is the operator of Dexbot and one of the developers of the pywikibot framework. Amir was born in 1992 in Tehran, Iran, where he studied physics; he currently lives in Berlin. During lunch, Cristina Cochior and Femke Snelting took the opportunity to ask him a few more questions. Read the transcript of this conversation below:
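
[Editorial note: ORES exposes its predictions through a public HTTP API. The short sketch below shows how a client might request scores for a revision; the endpoint layout, the model names ("damaging", "goodfaith") and the response shape follow the public ORES v3 API as we understand it and should be read as an illustration rather than a definitive reference. The revision id is invented.]

```python
# Sketch of a client querying the ORES scoring API.
# Endpoint layout and response shape are assumptions based on the public
# v3 API; the revision id used below is purely illustrative.
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/{context}/"

def score_revisions(context, rev_ids, models=("damaging", "goodfaith")):
    """Request model scores for one or more revisions of a given wiki."""
    response = requests.get(
        ORES_URL.format(context=context),
        params={
            "models": "|".join(models),
            "revids": "|".join(str(r) for r in rev_ids),
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()[context]["scores"]

# Print the probability that a (hypothetical) English Wikipedia revision
# is damaging and that it was made in good faith.
for rev_id, model_scores in score_revisions("enwiki", [123456789]).items():
    for model, result in model_scores.items():
        print(rev_id, model, result["score"]["probability"])
```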


Amir: OK, so having someone from inside MediaWiki to explain things to you is maybe some sort of informative perspective, but also a very pessimistic one, because we know how the whole thing works and we are always saying that there is a huge technical development [missing], it needs to be cleaned up and there are things to be fixed. This is my first disclaimer: if you want to ask anything about MediaWiki, my opinion would be “ah, this is horrible. We need to rewrite the whole thing...”, so keep that in mind.


Cristina: I think some people are maybe doing that, rewriting the API at least.


Amir: Well, that's a good point. As a matter of resources, there is a dedicated team that has recently been appointed and they are trying to rewrite some parts of MediaWiki, and there are other teams as well; for example, logins have been rewritten recently. Because it's so big, you cannot rewrite the whole thing from scratch: you pick something up, take it out, rewrite it and put it back again. And our team is working on revisions. It's about maintaining an infrastructure, so when you're done there's somewhere else that needs attention.


Cristina: Would you say that this is the reason why there hasn't been so much interference with the MediaWiki interface somehow?


Amir: To some degree, yes. For some special extensions of MediaWiki, yes, that's correct. But the main reason why the interface of MediaWiki hasn't changed is community inertia towards change. We have to try something very small, get the resistance from the community pushing back, and then fix it in a proper way. But also there are a lot of changes in the interface, especially for the editors, that you probably haven't noticed. For example, if you check the Edit system, it's more modern now. That happened several months ago.


Cristina: But I mean it on a more fundamental level. There are changes that are made to the interface that make [Wikipedia] more accessible, but the basic structure of the article that only represents one version that will be presented to the reader [remains unchanged], whereas the edits that have produced the article are tucked behind and they're not very easy to see at first if you're not familiar with the interface, for example.


Amir: This is the conceptual [side]. I don't think there are any plans for thinking or talking about it that way, because maybe that's the whole point of being a wiki; this is the wiki by design. But there is another thing: MediaWiki itself does not have any [one] interface. It has a skin. For example, the skin that MediaWiki has right now is called Vector. You can change the skin to something else and you see a brand new interface. For example, if you visit it on mobile, it's a completely new layout; it's just another skin.


Femke: In the way Wikipedia works, there is so much attention to conversation and difference of opinion, but in the way it presents itself, the neutral point of view is very heavily asserted; only the consented version is presented as the definitive view. So we try to understand the place of discussion, debate, disagreement, ambiguity and in some way vandalism in the life of Wikipedia, not just as something that has to stay out of sight, but actually as something that is vital to how knowledge is being produced. So after hearing you speak today, we wanted to ask you whether this is something you think about, and how you deal with those issues.


Amir: I think I didn't understand the question completely, but let me give my answer and if I didn't understand [you can let me know]. The discussions that happen around an article are usually stored on the talk page, and there people talk about the article and its structure, but not the subject. Because there is a big problem with people talking about the subject of the article: they say, for example, whether Islam is correct or not, but this is not related. They should talk about the article about Islam, so they say that the article needs to be changed in a way that works better for users or in a way that complies with the guidelines. So this is one part. As for the part about reverting vandalism, I think this is all reflected in the history, but the history is not in the structure. The biggest problem is that the history [of an article] is not properly structured for new users; it's just for super expert Wikipedians who know how to deal with it already. This is something that never came up with the designers, because the UX designers of Wikipedia usually divide people into two groups: readers and editors. So for readers, [UX designers] say, they don't need to interact with history, so they don't rewrite the interface of history; but for editors they take it for granted that editors know how to work with history.


Cristina: I thought it was interesting that yesterday in your talk you referred to the concepts of 'subjective' and 'objective'. You said that the assessment of vandalism is subjective, because it comes down to the personal interpretation of what vandalism is, but then you referred also to the objectivity principle on which Wikipedia is based. You seemed to view these two concepts as coexisting on the same platform. Did I read that correctly?


Amir: Well, the thing about Wikipedia, especially the policies, is that it's not very objective. It's very open to interpretation and it's very complicated. I don't know if I told you, but there is a law of Wikipedia that says “ignore all rules”. It means: do everything you think is correct, and if there's a problem and you're violating anything, it could be that you come to the conclusion that maybe we should change that law. It happens all the time. If there's a consensus, it can be changed. There is a page on Wikipedia that's called Five Pillars, and the five pillars say that except for these five pillars, you can change everything. Although I don't think that's very objective; everything is subjective on Wikipedia. But when there is an interaction of lots of people, it becomes more natural and objective in a way, because there is a lot of discussion and sometimes there are people who try to change others' opinions about some issues. When this happens, it makes everything more neutral. In another way, [the sounds are hard to distinguish at this point] by making battles, they both fight and the result is something neutral, which to some degree is not great, but...


Cristina: Do you think that the result is aiming to be neutral?


Amir: The result is aiming to be neutral. And it is, because of the integration of lots of people that are cooperating with each other and who are trying to get things done in a way that doesn't violate policies. So they tolerate things that they don't like in the article or sometimes they even add them [themselves] to make it more neutral.


Femke: Could you give an example?


Amir: The biggest problem is usually writing about religion. I have seen people who are against a religion and try to make a criticism and when they are writing the article they try to add something in there, like a defence of Muslims, in order to make it more neutral. The people who contribute usually value these pillars, including the pillar of neutrality.


Femke: There is vandalism that is not targeted, that is about simply asserting that 'I can break something', but there is also vandalism that wants to signal disagreement or irritation with a certain topic. Do you ever look at the relation between where the vandalism goes and what topics are being attacked?


Amir: Well, I didn't, but there are lots of topics about that, and things that people have strong feelings about are always good targets for vandals. There is always vandalism around things that have strong politics. It can be sports, it can be religion, it can be any sensitive subject like homosexuality or abortion; in these matters it happens all the time. One thing that I think about is that when people are reading articles on Wikipedia, sometimes it's outside of their comfort zone, so they try to change the article and bring it back in, instead of expanding it.


Cristina: I was wondering actually about this sense of ownership that some editors have over their articles. A lot of the reverts and debates that happen behind the scenes of a specific article are due to the fact that one person started creating the page and put a lot of effort into it, and someone else wants to implement some changes with which the first person does not agree. Do you think that the fact that there is only one face of Wikipedia is related to that? If you had the possibility of multiple readings of one page, then there would also be more views on one subject.


Amir: There have been lots of debates about this. I don't know if you know Vox, the media company from the United States? One thing that they tried to implement was to make a version of Wikipedia that is customisable. For example, if you are pro-Trump, you are given a different article than someone who is a Democrat. But immediately you can see the problem: it divides people. Like what Facebook is doing right now, making people live inside their bubbles. I think this is the reason why the people on Wikipedia are fighting against anything that has this divisive effect.


Femke: Yes. If you made multiple wikis, you would support an algorithmically induced separation of world views. I understand why that raises concern. But on the other hand, why is there such a need to always come to a consensus? I'm wondering if this is always helpful for keeping the debate alive.


Amir: For Wikipedia, they knew that consensus is not something you can always reach, so they invented a process called Conflict Resolution. When people talk and they see that they cannot reach any consensus, they ask for a third-party opinion. If they cannot find any agreement with the third-party opinion, they call for a mediator. But mediators do not have any enforcement authority. If mediators can resolve the conflict, then it's done; otherwise the next step is arbitration. For example, the case of Chelsea Manning. What was her name before transitioning? I think it's Bradley Manning, right? So there was a discussion over what the name on Wikipedia should be: Chelsea Manning or Bradley Manning. There was lots of transphobia in the discussion, and when nothing worked, it went to ArbCom (the Arbitration Committee). An arbitration committee is like a very scary place: it has a court and they have clerks that read the discussion, and the outcome was obviously that it should stay Chelsea Manning. It's not like you need to reach consensus all the time; sometimes consensus will be forced on you. Wikipedia has a policy called “What Wikipedia is not”. One of the things Wikipedia is not is a place for democracy.


Femke: This Chelsea Manning case is interesting! Is this decision archived in the article somewhere?


Amir: The cases usually happen on the ArbCom page and on the page of the case itself. But finding this in the discussion is hard.


Femke: To go back to your work. During these Algoliterary Encounters we tried to understand what it means to find bias in machine learning. The proposal of Nicolas Maleve, who gave a workshop yesterday, was to neither try to fix it, nor to refuse dealing with systems that produce bias, but to work with it. He says bias is inherent to human knowledge, so we need to find ways to somehow work with it; he used the image of a ‘bias cut’ in textile, which is a way to get the most flexibility out of woven materials. But we're struggling a bit with what that would mean, how that would work... So I was wondering if you had any thoughts on the question of bias?


Amir: Bias inside Wikipedia is a tricky question because it happens on several levels. One level that has been discussed a lot is the bias in references. Not all references are accessible. [Waiter brings us the soups] So one thing that the Wikimedia Foundation has been trying to do is to give free access to libraries that are behind a paywall. [the following sentence is hard to distinguish because of clinking cutlery sounds] They reduce the bias by only using open access references. Another type of bias is the internet connection, access to the internet. There are lots of people who don't have it. One thing about China is that [Wikipedia there] is blocked. The content against the government of China inside Chinese Wikipedia is higher because the editors [who can access the website] are not people who are pro-government and try to make it more neutral. So this happens in lots of places. But in the matter of AI and the model that we use at Wikipedia, it's more a matter of transparency. There is a book about how bias in AI models can break people's lives; it's called “Weapons of Math Destruction”. It talks about [AI] models that exist in the United States that rank teachers, and it's quite horrible because eventually there will be bias. The way to deal with it, based on the book and their research, was first that the model should be open source and people should be able to see what features are used, and the data should be open as well, so that people can investigate, find bias, give feedback and report back. There should be a way to fix the system. I think not all companies are moving in that direction, but Wikipedia, because of the values it holds, is at least more transparent and it pushes other people to do the same thing.


[It's hard to speak and eat soup at the same time. We take a small break to eat.]


Femke: It was very interesting to think of your work in the context of Wikipedia, where labour is legible, and is always part of the story. It's never erased. In most machine learning environments the work of classifying, deciding and defining often does not get a lot of attention.


Amir: Actually, one of the things that I've been working on is to get people involved in the pre[-training] aspect of the bot, so for example: if someone is [labelling] edits for our model to be trained on, we show their name and how many labels they did. So they can rank each other. There was someone asking us [here the sounds are hard to distinguish because of noise in the background]. Some of the reasons are the instant gratification, [that] it helps credibility and [that] it helps people.


Femke: But also it keeps those who are part of the labour of producing knowledge somehow part of the system. It doesn't separate them from the actual outcome, which I think is important.


Amir: Exactly.


Femke: What I'm trying to understand is the role of ‘binary separation' in machine learning. To refer to the example you used on Friday, when you showed a mass of Wikipedia edits, and then a sort of incision on the lines that could be considered vandalism or that are probably OK. But the separation is an either/or, it's a left or right.


Amir: That example was very simplified. Overall, we just give a number between zero and one, so it's a probability. If we order them, there is a spectrum of how likely it is for edits to be vandalism. And we can make our incision wherever we want, and I assure you that there are recent changes where [something] is off; we can highlight them in different colours and we can look at them with a different precision.
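
[Editorial note: a minimal sketch of the “incision” described above: every edit carries a probability between zero and one, the list is ordered, and the cut-off can be moved depending on how strict the review should be. The scores and thresholds below are invented for illustration.]

```python
# Order edits by their (hypothetical) vandalism probability and cut the
# list at an adjustable threshold; moving the threshold changes how many
# edits are flagged and with what precision.
recent_edits = [
    {"rev_id": 1001, "damaging_prob": 0.97},
    {"rev_id": 1002, "damaging_prob": 0.62},
    {"rev_id": 1003, "damaging_prob": 0.31},
    {"rev_id": 1004, "damaging_prob": 0.08},
]

def flag_edits(edits, threshold):
    """Return edits at or above the threshold, most suspicious first."""
    flagged = [e for e in edits if e["damaging_prob"] >= threshold]
    return sorted(flagged, key=lambda e: e["damaging_prob"], reverse=True)

print(flag_edits(recent_edits, threshold=0.9))  # strict cut: 1 edit flagged
print(flag_edits(recent_edits, threshold=0.5))  # looser cut: 2 edits flagged
```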


Femke: I understand that some things are more likely to be vandalism than others, but still, because they're expressed over a spectrum of two: either extremely vandalist or not at all, we are dealing with a binary, no?


Amir: Yeah, you are completely right. The thing we are trying to tackle in terms of Wikipedia editing is to make a model that is not just a binary separation. We have a good faith model, which predicts with the same system, between zero and one, whether an edit has been made in good faith or not. For example, you can see that an edit is damaging, but it was made with a good intention. You see many people that want to help, but because they are new, they make mistakes. We try to tackle this by having another model. So if an edit has both a high vandalism score and a high bad intent score, we can remove it with bots, and we can interact with people who make mistakes but have a good intention.
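
[Editorial note: the two-model triage Amir outlines could look roughly like the sketch below, combining a “damaging” probability with a “good faith” probability instead of a single binary label. The threshold values are invented for illustration.]

```python
# Combine two model outputs: clearly damaging edits made in bad faith go to
# automatic reverting, damaging but good-faith edits get a human reply,
# everything else is left alone. Thresholds are illustrative only.
DAMAGING_THRESHOLD = 0.9
GOODFAITH_THRESHOLD = 0.5

def triage(damaging_prob, goodfaith_prob):
    """Decide how to handle an edit from its two model scores."""
    if damaging_prob >= DAMAGING_THRESHOLD and goodfaith_prob < GOODFAITH_THRESHOLD:
        return "revert with bot"
    if damaging_prob >= DAMAGING_THRESHOLD:
        return "message the editor"
    return "leave alone"

print(triage(0.95, 0.10))  # -> revert with bot
print(triage(0.95, 0.80))  # -> message the editor
print(triage(0.20, 0.90))  # -> leave alone
```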


Cristina: And how do you see the good faith principle in relation to neutrality?


Amir: I think it's completely related, and I think it comes down to whether this user is trying to help Wikipedia or not: this is our brainstorm.


Femke: If you talk about the distinction between good faith and bad faith, it is still about faith-in-something. If you plot this ‘faith’ according to the vector of a neutral point of view, you're dealing with a different type of good faith and goodness than if you plot the faith along the vector of wanting more points of view instead of less.


Amir: I see. I think good faith means good intent. By defining what is 'good' in this way, we are following the principles of the whole of Wikipedia: good is helping people. It is a very subjective term, though, and what we are trying to do right now is to make some sort of survey: to take out things that are very computative and can't be measured easily, like quality, and ask people whether they think an edit looks good or bad. To make things more objective, to make things come together from the integration of the observations of lots of people. Obviously, there are a lot of gray areas.


Femke: I do not have a problem with the mistakes that could be made in this way, or with the subjectivities of those algorithms. I'll try to ask a different question that maybe arrives at the same point: if you think about Wikipedia as a living community, the project changes with every edit; every edit is somehow a contribution to this living-knowledge-organism. So then, if you try to distinguish what serves the community and what doesn't, and you try to generalise that, because I think that's what the good faith-bad faith algorithm is trying to do, and start developing helper tools to support such a project, you do that on the basis of an abstract idea of what Wikipedia is and not based on the living organism of what happens every day. What I'm interested in is the relationship between vandalism and debate, and how we can understand the conventional drive in machine-learning processes. And how can we better understand these tensions and deal with them? If you base your separation of good faith-bad faith on preexisting labelling, and then reproduce that in your understanding of what edits are being made, how do you then take into account the movements that are happening, the life of the actual project?


Amir: OK, I hope that I understood you correctly; it's an interesting discussion. Firstly, what we are calling good faith and bad faith comes from the community itself. We are not doing the labelling for them, they are doing the labelling for themselves. So, in many different language Wikipedias, the definition of what is good faith and what is bad faith will differ. Wikimedia is trying to reflect what is inside the organism and not to change the organism itself. If the organism changes, and we see that the definition of good faith and of helping Wikipedia has changed, we are implementing this feedback loop that lets people from inside the community pass judgment on their edits, and if they disagree with the labelling, we can go back to the model and retrain the algorithm to reflect this change. It's some sort of closed loop: you change things, and if someone sees there is a problem, then they tell us and we can change the algorithm back. It's an ongoing project.


Cristina: And this would be done through the idea of false positives, where if something doesn't get the result that it should, then contributors can revert it. And that means that the dataset on which the algorithm is trained would also be reiterated in many phases over the years?


Amir: Yeah, it will be reiterated.
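
[Editorial note: the feedback loop described in this exchange, where community labels train the model and reported false positives lead to relabelling and retraining, could be summarised in the rough sketch below. The label store and the train() placeholder are hypothetical stand-ins, not the actual ORES training pipeline.]

```python
# Community labels feed the model; a reported false positive corrects the
# label and triggers retraining on the updated dataset.
labels = {1001: "damaging", 1002: "good faith"}  # rev_id -> community label

def train(labelled_edits):
    """Placeholder for fitting a classifier on the labelled edits."""
    print(f"retraining on {len(labelled_edits)} labels")

def report_false_positive(rev_id, corrected_label):
    """A contributor disagrees with a prediction: correct the label, retrain."""
    labels[rev_id] = corrected_label
    train(labels)

train(labels)
report_false_positive(1002, "damaging")  # label corrected by the community
```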


Femke: Already the fact that the labelling is made transparent and debatable is very helpful. In this way you can start to understand the relation between the labelling and the consequences in the model – I didn't think about it, but that's one of the things we were trying to do with The Annotator[4].


Amir: What we are trying to build, but we are not there yet, is that when someone labels an edit, it will be immediately visible to others, and these others can make a judgment for themselves whether they find it correct.


Femke: I have an issue with the fact that it is assumed that agreement is always possible and necessary. Even if the disagreements leading up to agreement are archived and legible, agreement is still a prerequisite for the whole machinery to start operating. If agreement is the main drive, I cannot help but think that there's going to be trouble with everything that is non-agreeable. Everything that causes trouble will be difficult to take into account in a system that depends so much on agreement or non-ambiguity. Your example of Chelsea Manning is very interesting; I would like to look more carefully at how the Wikipedia community has dealt with that. In machine learning ... every time we come across moments where ambiguity and disagreements create problems, the problems seem to be put aside. I think you clearly explained how the process towards that agreement is dealt with within Wikipedia on the level of writing the articles, but also on the level of labelling the data. But still, we are dealing with a technology that values ...


Amir: ... consensus and agreement more than multiple points of view.


Femke: Yes!


Amir: I think there's some sort of trade-off. We are not trying to reach agreement for all parties and all people here; there are lots of times where, if someone pushes too much, everyone votes against it and we block that person. We do not try to come to an agreement in such cases, because there is a difference between coming to an agreement and pushing your idea. This is a difficult situation; there are lots of people that are being paid to push their agenda. I see sometimes how trying to come to an agreement can be time-consuming and resource-consuming. But I think on Wikipedia, it shows that it works.


Femke: Right. Thank you. And we have even eaten our soups in the meantime!


Amir: In your language, do you drink soup or do you eat it?


Femke: Eat.


Cristina: Eat. How is it in Persian?


Amir: In Persian, drinking and eating are the same, so it's not a problem at all! But in some languages I see that they drink soup, where in most languages they eat soup.


Femke: It would not be wrong to say I drink the soup, but –


Amir: – it feels weird.