Visualizing Sirius, Across Books:
Box Plots
Information for reading this visualization:
-
The graph above is called a box-and-whisker plot. It summarizes a group of data: in this case, the similarity indexes of the 100 words most similar to Sirius in use and context, for each of the seven Harry Potter novels.
-
For each book, you will see:
-
Red dots that mark each of the top 100 most-similar words to Sirius. If you hover over any given dot, it will show you what word that dot represents and its similarity index. The higher the index, the more similar that word is to Sirius in terms of use and context in that book. Dots that fall either above the maximum or below the minimum are outliers that do not fit the rest of the data set.
-
A dark line at the top of the book column that marks the maximum number of the data set, excluding outliers.
-
A dark line at the bottom of the book column that marks the minimum number of the data set, excluding outliers.
-
A line between the light gray box and the dark gray box that marks the median of the data set, or the number at the exact middle of the data. This is not an average but rather the number that falls in the middle if you line the data up from least to greatest.
-
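A line at the top of the lighter gray box that marks the upper quartile: the number halfway between the median and the maximum when the data are lined up in order, so about a quarter of the words score above it.
-
A line at the bottom of the darker gray box that marks the lower quartile: the number halfway between the median and the minimum when the data are lined up in order, so about a quarter of the words score below it.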
-
Box-and-whisker plots are used to track how groups of data shift from one data set to another. In particular, you can use this graph to see how similarity changes from one book to the next. If the box and whiskers sit lower on the graph, the most similar words in that book are less similar to Sirius than the most similar words in other books. If the box and whiskers are more compact, the top 100 words all have very close similarity indexes. A long box tells you that the data points are more spread out. This chart is useful for seeing how tightly knit word similarities are from book to book.
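If you are curious how the lines and outliers described above can be worked out from a list of similarity indexes, here is a minimal Python sketch. It is only an illustration under stated assumptions: the scores are made up, the function name `box_plot_summary` is hypothetical, and the outlier rule (anything more than 1.5 times the box height beyond the box) is a common convention that the chart on this page may or may not use.

```python
from statistics import quantiles

def box_plot_summary(scores):
    """Return the values a box plot would draw for one book's scores."""
    ordered = sorted(scores)
    # Lower quartile, median, and upper quartile of the ordered data.
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    box_height = q3 - q1  # also called the interquartile range
    # Assumed convention: points beyond 1.5 box heights are outliers.
    low_fence = q1 - 1.5 * box_height
    high_fence = q3 + 1.5 * box_height
    inside = [s for s in ordered if low_fence <= s <= high_fence]
    outliers = [s for s in ordered if s < low_fence or s > high_fence]
    return {
        "minimum": inside[0],    # bottom whisker, excluding outliers
        "lower_quartile": q1,    # bottom of the box
        "median": med,           # line in the middle of the box
        "upper_quartile": q3,    # top of the box
        "maximum": inside[-1],   # top whisker, excluding outliers
        "outliers": outliers,    # dots beyond the whiskers
    }

# Made-up similarity indexes for illustration only (not real data).
example_scores = [0.50, 0.52, 0.54, 0.55, 0.56, 0.58, 0.60, 0.61, 0.90]
summary = box_plot_summary(example_scores)
for name, value in summary.items():
    print(name, value)
```

In this made-up example, the score of 0.90 sits far above the rest, so it is reported as an outlier and the top whisker stops at the next-highest score instead.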
Questions to prompt analysis:
-
Do you see any trends across the books?
-
Are there any books that act differently than the rest of the books? Why would word similarity shift in that book?
-
Why might some books have really tight boxes while other books have more spread out boxes?
-
Why might the outlying words (the dots outside the box and whiskers) fall either above or below the norm for that book?
So, what did you find? Leave a comment to tell us what you thought and check out what others have been seeing!
If you have any questions about how to read this chart, check out the How To for this page.