Visualizing Sirius, Across Fan Fiction:
Box Plots
Information for reading this visualization:
-
The graph above is a box and whisker plot. It summarizes a group of data: in this case, the similarity indexes of the 100 words most similar to Sirius in use and context, for each of the nine years of Harry Potter fan fiction.
-
For each year, you will see:
-
Red dots that mark each of the top 100 words most similar to Sirius. If you hover over a dot, it shows the word that dot represents and its similarity index. The higher the index, the more similar that word is to Sirius in use and context in that year. Dots that fall above the maximum line or below the minimum line are outliers: words whose similarity is unusually high or low compared with the rest of that year's data.
-
A dark line at the top of the year column that marks the maximum number of the data set, excluding outliers.
-
A dark line at the bottom of the year column that marks the minimum number of the data set, excluding outliers.
-
A line between the light gray box and the dark gray box that marks the median of the data set, or the number at the exact middle of the data. This is not an average but rather the number that falls in the middle if you line up the data from least to greatest.
-
A line at the top of the lighter gray box that marks the upper quartile: the median of the upper half of the data, so about three quarters of the words fall at or below it.
-
A line at the bottom of the darker gray box that marks the lower quartile: the median of the lower half of the data, so about one quarter of the words fall at or below it.
-
Box and whisker plots are used to compare how groups of data shift from one data set to another. In particular, you can use this graph to see how similarity changes from one year to the next. If a year's box and whiskers sit lower on the graph, its most similar words are less similar to Sirius than in other years. If the box and whiskers are compact, the top 100 words all have very close similarity indexes. A long box tells you that the data points are more spread out. This chart is useful for seeing how tightly knit word similarities are from year to year.
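For readers who want to see where the lines on the chart come from, here is a minimal Python sketch of the numbers behind one box and of a comparison between two years. Everything in it is an assumption made for illustration: the scores are made up rather than taken from the chart, the names `box_plot_summary` and `scores_by_year` are hypothetical, and the rule for outliers (anything more than 1.5 box heights beyond the box) is a common convention that this chart may or may not use.

```python
import statistics

def box_plot_summary(values):
    """Return the lines a box plot draws for one group, plus its outlier dots."""
    data = sorted(values)
    # Lower quartile, median, and upper quartile (the three lines of the box).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_height = q3 - q1
    # Assumed convention: points beyond 1.5 box heights from the box are outliers.
    low_fence = q1 - 1.5 * box_height
    high_fence = q3 + 1.5 * box_height
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "minimum": inside[0],       # bottom whisker, excluding outliers
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "maximum": inside[-1],      # top whisker, excluding outliers
        "box_height": box_height,
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Made-up similarity scores, for illustration only.
scores_by_year = {
    "Year A": [0.41, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.50, 0.52, 0.81],
    "Year B": [0.48, 0.49, 0.50, 0.51, 0.52, 0.53],
}

summaries = {year: box_plot_summary(v) for year, v in scores_by_year.items()}

for year, s in summaries.items():
    print(f"{year}: median={s['median']:.3f}, "
          f"box height={s['box_height']:.3f}, outliers={s['outliers']}")

# The year with the shortest box has the most tightly grouped scores.
tightest = min(summaries, key=lambda year: summaries[year]["box_height"])
print("Most compact box:", tightest)
```

A higher median moves the whole box up the graph, a smaller box height makes it more compact, and any value listed under outliers would appear as a dot beyond the whiskers.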
Questions to prompt analysis:
-
Do you see any trends across the years?
-
Are there any years that act differently than the rest of the years? Why might word similarity shift in that year?
-
Why might some years have really tight boxes while other years have more spread out boxes?
-
Why might the outlying words (the dots outside the box and whiskers) fall either above or below the norm for that year?
So, what did you find? Leave a comment to tell us what you thought and check out what others have been seeing!
If you have any questions on how to read this chart, check out the How To on this particular page here.