How do we quantify the importance of the nodes in a network? To answer this question, mathematicians came up with the so-called Erdős number, which shows how far someone is from “the master” in a network of publications. Movie enthusiasts created the Bacon number as its analogue, based on co-appearances in movies. But what does this have to do with Star Wars? Which character or actor is the key person in this universe? Is it really true that every blockbuster has a happy ending? We try to answer these questions with a revised version of the study we carried out last year, with the help of interactive visualisations.

## Erdős and Bacon

What does it take to create a new concept in network science? Apparently, a windy winter night is enough, as long as Footloose and The Air Up There are on TV one after the other. And, of course, three American university students who, having watched the movies, begin to speculate: Kevin Bacon has played in so many movies that maybe there is no actor in Hollywood who hasn’t played with him yet. That is probably not true, but once the idea was backed up with a bit of mathematics and research, a new term, the Bacon number, was born.

The Erdős number was defined in 1969 by Casper Goffman in his famous article ‘And what is your Erdős number?’ It is based on a similar observation about the legendarily productive Hungarian mathematician Paul Erdős, who published so many papers in his life (approx. 1525 articles) in so many different fields that it became both possible and worthwhile to classify mathematicians and scientists by their distance from Erdős in a network of publications. Paul Erdős’s own Erdős number is 0, since he is the origin of the network. Any scientist who has ever published together with Erdős has an Erdős number of 1. Anyone who has published together with someone with an Erdős number of 1 gets an Erdős number of 2, and so on. Generally speaking, a person’s Erdős number is the lowest Erdős number among their co-authors, plus one.
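The recursive rule above is what a breadth-first search computes on a co-authorship graph. The following is a minimal sketch, not the method of any particular Erdős number database; the network, the placeholder author names `A` to `E` and the function name `collaboration_numbers` are all hypothetical.

```python
from collections import deque

def collaboration_numbers(coauthors, origin):
    """Breadth-first search: distance of every reachable author from `origin`."""
    numbers = {origin: 0}
    queue = deque([origin])
    while queue:
        person = queue.popleft()
        for partner in coauthors.get(person, ()):
            if partner not in numbers:
                # First visit is along a shortest path, so this is
                # "lowest number among co-authors, plus one".
                numbers[partner] = numbers[person] + 1
                queue.append(partner)
    return numbers

# Hypothetical toy network (not real publication data).
coauthors = {
    "Erdős": {"A", "B"},
    "A": {"Erdős", "C"},
    "B": {"Erdős"},
    "C": {"A", "D"},
    "D": {"C"},
    "E": set(),  # never published with anyone in the network
}

erdos_numbers = collaboration_numbers(coauthors, "Erdős")
print(erdos_numbers)
```

Authors who cannot be reached from the origin, like `E` here, simply do not appear in the result, which corresponds to having no (or an infinite) Erdős number.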

In the case of Kevin Bacon and Hollywood the principle is the same, but it is based on movies instead of publications, and the connection is not co-authoring an article but appearing in the same movie. It is only a coincidence and a historical legacy that it is called the Bacon number: while Erdős is the most productive mathematician in history, with almost twice as many publications as Euler, who comes second on the list, Bacon is not really a central figure in Hollywood. In the network of Hollywood actors, Bacon’s average distance from everyone else is 2.79, which is only enough for 876th place in the ranking. For comparison, Rod Steiger, who is first on this list, has a value of 2.53.
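The ranking mentioned above orders actors by their average distance to everyone else. A rough sketch of that calculation on a tiny, made-up co-starring graph is shown below; the actor names and the helper `average_distance` are hypothetical, and the exact averaging convention used for the Hollywood figures is an assumption (here the actor is excluded from their own average and unreachable actors are ignored).

```python
from collections import deque

def average_distance(graph, source):
    """Mean shortest-path distance from `source` to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")

# Hypothetical chain of co-stars: Ann - Bob - Cid - Dee - Eve.
graph = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cid"],
    "Cid": ["Bob", "Dee"],
    "Dee": ["Cid", "Eve"],
    "Eve": ["Dee"],
}

# Smaller average distance means a more central actor; ties are broken by name.
ranking = sorted(graph, key=lambda actor: (average_distance(graph, actor), actor))
print(ranking)
```

In this toy chain the middle actor comes first, just as Rod Steiger tops the real list while Bacon sits further down.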

## One Saga, Seven Episodes

But what does all this have to do with Kenny Baker? We were wondering who the Kevin Bacon of the Star Wars universe was, so we collected the cast members of both the original and the prequel trilogy, and added the actors of Episode VII, which was released last December. We visualised our findings on an interactive graph. The title, ‘The center of the Star Wars universe’, is honorary, because the concept of distance behind the Bacon number can hardly be interpreted on this graph. Nevertheless, the prestige of the central node and the position it occupies within the network are a valid basis for comparison, as is defining the relations by the co-starring of the actors.

On the visualisation – to make the network more transparent – we only show the actors who played in at least two different Star Wars movies. There is a relationship between two actors if they have starred in the same movie. The more movies the actors have co-starred in, the stronger their relationship is.
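The construction described above (keep actors with at least two movies, connect co-stars, weight the connection by the number of shared movies) could be sketched roughly as follows. The cast lists and names are invented placeholders, not the real Star Wars credits, and the variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cast lists (placeholder names, not real credits).
casts = {
    "Movie 1": {"Ann", "Bob", "Cid"},
    "Movie 2": {"Ann", "Bob", "Dee"},
    "Movie 3": {"Ann", "Dee", "Eve"},
}

# Count how many movies each actor appears in.
appearances = Counter(actor for cast in casts.values() for actor in cast)

# Keep only actors who played in at least two different movies.
recurring = {actor for actor, n in appearances.items() if n >= 2}

# One edge per pair of recurring co-stars; the weight is the number of shared movies.
edge_weights = Counter()
for cast in casts.values():
    for pair in combinations(sorted(cast & recurring), 2):
        edge_weights[pair] += 1

print(dict(edge_weights))
```

Sorting each cast before pairing keeps every edge in a single canonical order, so `("Ann", "Bob")` and `("Bob", "Ann")` are never counted separately.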

Network of actors having played in at least two different Star Wars movies. The interactive version of the graph can be found here.

By clicking on the nodes of the interactive visualisation you can see the number of movies the actors played in, which characters they played, and the number of their relations. The colors of the nodes correspond to the set of trilogies the actors appeared in. There is a clear separation between actors who starred only in the original trilogy (light blue) and those who played in the prequel trilogy (dark blue). This may not be so surprising, considering that 16 years passed between the releases of Episode VI and Episode I, and 28 years between Episode IV and Episode III.

Naturally, there are actors who connect the casts of the two trilogies, although their number is limited. They form the nodes in the center of the network, and they are also the largest ones. This indicates that these actors have the largest number of relations and that the highest number of shortest paths between pairs of nodes pass through them. Some actors of this group played in both the original and the prequel trilogy (light green nodes); others additionally got roles in Episode VII (dark green nodes).
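The two quantities mentioned above correspond to what network science calls degree (number of relations) and betweenness centrality (how many shortest paths run through a node). The sketch below computes both on a small invented graph in which one actor bridges two otherwise separate casts. It uses Brandes’ algorithm for unweighted graphs, which is an assumption about the measure rather than a description of how the visualisation was actually built; all node names are hypothetical.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical graph: two casts that are linked only through one shared actor.
graph = {
    "orig_a": ["orig_b", "bridge"],
    "orig_b": ["orig_a", "bridge"],
    "bridge": ["orig_a", "orig_b", "preq_a", "preq_b"],
    "preq_a": ["preq_b", "bridge"],
    "preq_b": ["preq_a", "bridge"],
}

degree = {v: len(neighbours) for v, neighbours in graph.items()}
centrality = betweenness(graph)
print(degree["bridge"], centrality["bridge"])
```

The bridging actor has both the highest degree and the only non-zero betweenness, which is the pattern the large central nodes of the Star Wars graph display.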

We can also find two additional subgroups on the graph. The light blue one shows the actors who play a key role in both the original trilogy and Episode VII. Carrie Fisher, playing Leia, and Harrison Ford, playing Han Solo, are the most typical representatives of this category. Alec Guinness, who played Obi-Wan Kenobi in the original trilogy, may well be the most interesting member of this group: although he passed away in 2000, he still appears in the credits of Episode VII thanks to an archive voice recording. Finally, the only actor appearing in both the prequel trilogy and Episode VII is Ewan McGregor, also with a voice recording – it seems that the latest episode couldn’t decide which Jedi master to favor: the young or the old one.

## The Big Four

Let’s take a look from a different angle and see where the actors, according to their characters, sit in the network of the Star Wars universe, and who is lucky enough to call himself its center.

There are four characters altogether who have appeared in all seven Star Wars movies so far: Anakin Skywalker, Obi-Wan Kenobi, C-3PO and R2-D2. Of course, the young and the old Anakin and Obi-Wan are played by different actors, so those actors can’t make it to the very top alongside the two droids. The competition between Kenny Baker (R2-D2) and Anthony Daniels (C-3PO) was very close, but in Episode VII Anthony Daniels took the lead, since Kenny Baker was only a consultant for R2-D2. This, however, doesn’t affect their positions in the network, since both of them appear in the credits of all seven movies. What is more, they are both versatile actors who played more than one character: Kenny Baker was also Paploo, the Ewok, in Episode VI, and Anthony Daniels was also Dannl Faytonni in Episode II. Considering the recent death of Kenny Baker, however, we decided to declare him the winner of the title ‘Kevin Bacon of the Star Wars universe’ as a posthumous award. (In reality, Anthony Daniels is just as worthy of the title as he is.)

The runner-up is of course Frank Oz, who played Yoda in six of the seven Star Wars movies (in Episode VII with his voice only). Ian McDiarmid, playing Senator Palpatine (in Episode V only on the DVD edition), and Peter Mayhew, playing Chewbacca, also have a distinctive place on the list with five movies each. Last but not least, actors of the original trilogy who also appear in Episode VII, like Carrie Fisher or Mark Hamill, may claim third place.

The most universal node of the network is no doubt Natalie Portman, who plays Padmé Amidala in the prequel trilogy. Her Baker number is of course 1, her Bacon number is 2 and her Erdős number is 5. She earned a bachelor’s degree in Psychology at Harvard and co-authored scientific papers, which gave her a decent Erdős number (among the 134 thousand scientists with an Erdős number, the median is 5).

## Sentimental Scenes

We automatically split the Star Wars movie scripts stored in the IMSDb database into scenes, then analysed them with the help of Hu and Liu’s sentiment dictionary. The sentiment score of each scene from all the episodes can be seen in the interactive visualisations below. Bars in brighter colors represent scenes with positive sentiment, darker bars denote negative ones. The deeper a dark bar reaches, the more negative the sentiment of the scene; the higher a bright bar reaches, the more positive its sentiment. Scenes with a neutral sentiment score have no visible bar. Pointing the cursor at a bar shows, beside the exact sentiment score, the scene’s location and its top 3 characters, i.e. the characters who either appear or are mentioned in the scene.
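A dictionary-based scene score of the kind described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two tiny word sets stand in for the much larger positive and negative lists of Hu and Liu’s dictionary, scenes are assumed to start at lines beginning with `INT.` or `EXT.`, the score is assumed to be positive hits minus negative hits, and the sample script is invented rather than taken from IMSDb.

```python
import re

# Hypothetical stand-ins for the positive and negative word lists.
POSITIVE = {"good", "hope", "friend", "love"}
NEGATIVE = {"dark", "fear", "destroy", "hate"}

def split_scenes(script):
    """Start a new scene at every line that begins with INT. or EXT. (assumption)."""
    scenes, current = [], []
    for line in script.splitlines():
        if line.startswith(("INT.", "EXT.")) and current:
            scenes.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        scenes.append("\n".join(current))
    return scenes

def scene_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Invented sample script, not an excerpt from any real screenplay.
script = """INT. HANGAR - DAY
PILOT: We have hope, my friend.
EXT. DESERT - NIGHT
SCOUT: I fear the dark.
INT. CORRIDOR
GUARD: Move along."""

scores = [scene_score(scene) for scene in split_scenes(script)]
print(scores)
```

A score of zero corresponds to the neutral scenes that have no visible bar in the charts.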

Generally speaking, the episodes of Star Wars are characterized mainly by negative sentiment, which is especially true for the episodes of the original trilogy (Episodes IV, V and VI). The most negative ones are Episodes V and VI, and the most positive one is Episode II. In Episode VII the distribution of positive and negative sentiment is more similar to that of the prequel trilogy. If we look for indicators of a happy ending, we find them in Episodes I, III and V: these movies end with either positive or neutral scenes. Although positive scenes can be found towards the end of each movie, based on the script analysis only about half of the movies have a ‘happy ending’.
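Given a list of per-scene scores, the summary statistics discussed above reduce to simple counting. The sketch below assumes that a ‘happy ending’ means the last scene is positive or neutral; the function name `summarise`, the `tail` parameter and the two score lists are hypothetical and do not reproduce the values measured for any episode.

```python
def summarise(scores, tail=1):
    """Share of positive and negative scenes, plus a simple happy-ending flag."""
    return {
        "negative_share": sum(s < 0 for s in scores) / len(scores),
        "positive_share": sum(s > 0 for s in scores) / len(scores),
        # Assumption: the ending is "happy" if the last `tail` scenes are not negative.
        "happy_ending": all(s >= 0 for s in scores[-tail:]),
    }

# Invented example scores for two imaginary movies.
movie_a = summarise([1, -2, -3, 0, 2])
movie_b = summarise([2, 1, -1])
print(movie_a, movie_b)
```

Raising `tail` makes the criterion stricter, since more closing scenes then have to be non-negative.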

The sentiment scores of the original trilogy’s movies. The interactive version of the graph can be found here.

The sentiment scores of the prequel trilogy. The interactive version of the graph can be found here.

The sentiment scores of Episode VII. The interactive version of the graph can be found here.

Movies are worth analysing from the characters’ point of view as well. Another interactive data visualisation helps with this by presenting the dialogs between characters as a network. It also shows which characters appear most frequently in the movies and what kind of sentiment is typical of the scenes in which they appear.

The conversation graph of the prequel trilogy. The interactive version can be found here.

The conversation graphs reveal that the dialogs in the original trilogy were more focused and involved mainly the main characters; several supporting characters didn’t even get an opportunity to speak. In contrast, the conversations are more evenly distributed between the main and the supporting characters in the episodes of the prequel trilogy. This trend can also be seen on the graph of Episode VII. The characters of Anakin Skywalker and Darth Vader are good examples of sentiment change: in the first two episodes Anakin appears equally in negative and positive scenes, then a shift occurs. In the third episode he takes part in more and more scenes filled with negative sentiment, and after his transformation into Darth Vader he appears almost exclusively in negative scenes.
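A conversation graph with per-character sentiment, as discussed above, could be assembled along these lines. The sketch assumes that two characters are in dialog whenever they speak in the same scene and that a character’s typical sentiment is the mean score of their scenes; both are simplifying assumptions, and the scene data and character names are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical scenes: (set of speaking characters, sentiment score of the scene).
scenes = [
    ({"Hero", "Mentor"}, 2),
    ({"Hero", "Villain"}, -3),
    ({"Villain", "Officer"}, -1),
    ({"Hero", "Mentor"}, 1),
]

dialog_edges = Counter()          # how often two characters share a scene
char_scores = defaultdict(list)   # scene scores collected per character

for speakers, score in scenes:
    for pair in combinations(sorted(speakers), 2):
        dialog_edges[pair] += 1
    for name in speakers:
        char_scores[name].append(score)

# Mean sentiment of the scenes each character takes part in.
mean_sentiment = {name: sum(vals) / len(vals) for name, vals in char_scores.items()}
print(dict(dialog_edges), mean_sentiment)
```

Computing `mean_sentiment` separately for each episode would make a shift like Anakin’s visible as a falling value over time.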

Written by Kitti Balogh, Virág Ilyés, and Gergely Morvay