Data Dive: 6,887 IB Prescribed Texts

Whether you’re a DP student already or looking ahead to the Diploma Programme down the road, if you’re a student here at City you’ll eventually run into the IB Prescribed Texts. IB English teachers, particularly DP teachers, are required to select texts to teach from IB’s official list of prescribed authors, available at

IB presents these prescribed texts, which provide the entire foundation for the 11th and 12th grade English curricula at City, to the public in chunks of 25 authors at a time. Twenty-five data points might seem like a nice chunk of data to play with, especially for psychology students, but when you consider that there are 6,887 prescribed authors in total, rendering them 25 at a time makes the list as a whole impossible to analyze.

Luckily, with the help of some data scraping and a lot of copying and pasting, we’ve managed to get all of the data on IB prescribed texts into a single Google Sheet with 12 columns of information and nearly 7,000 rows. Now that’s a data set I can analyze. For reference, I’ll also include a link to all 6,887 rows of raw data at the bottom of this article so that you can run your own tests if you wish.

What’s the gender breakdown?

This is the question that drove me to start this investigation, and I’ve found it’s the most common one people ask, so let’s start there. As you can see, with a pure analysis of all 6,887 entries, the gender ratio of IB prescribed texts favors male authors, who make up 75.2% of the list. It’s important to note at this point that while IB students can study any of the 6,887 authors, only 316 of those authors are “recommended” by IB. The same test run on only the set of recommended texts, the ones most likely to actually be taught, returns results that aren’t much better: women are represented at 25.9% of that set, which is only a small improvement.

This is, on its face, pretty alarming, and I won’t make any attempt to soften that. For exact reference, in 2017 Our World in Data classified the human “sex ratio” as 49.6% female, so we can say with relative certainty that the total set of IB prescribed literature is not representative of humanity as a whole, with women underrepresented by approximately 25.7 percentage points, although the “sex ratio” statistic is admittedly four years out of date.
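For anyone who wants to check that kind of comparison, a percentage-point gap is just the benchmark share minus the observed share. Here is a minimal sketch of the arithmetic in Python, using only figures quoted above. The function and variable names are made up for illustration, and this is not necessarily how the numbers in this article were produced.

```python
def point_gap(benchmark_pct, observed_pct):
    """Return how many percentage points `observed_pct` falls short of `benchmark_pct`."""
    return round(benchmark_pct - observed_pct, 1)


WORLD_FEMALE_PCT = 49.6        # Our World in Data figure for 2017, quoted above
RECOMMENDED_FEMALE_PCT = 25.9  # women's share of the 316 recommended authors, quoted above
QUOTED_FULL_LIST_GAP = 25.7    # gap quoted above for all 6,887 entries

# Gap for the recommended set.
gap_recommended = point_gap(WORLD_FEMALE_PCT, RECOMMENDED_FEMALE_PCT)

# Working backwards: the share of women that the quoted full-list gap implies.
implied_full_list_share = round(WORLD_FEMALE_PCT - QUOTED_FULL_LIST_GAP, 1)

print(gap_recommended, implied_full_list_share)
```

Run as written, this prints 23.7 and 23.9: the recommended set falls 23.7 points short of the benchmark, and a 25.7-point gap for the full list corresponds to women making up about 23.9% of it.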

However, it’s worth noting that City High Middle English teachers obviously don’t teach all 6,887 authors on the IB reading list, so tests like the one above can’t claim to be representative of literature studies here at City. For a more accurate view, let’s narrow the data set down to only English-language IB texts. While non-English texts are taught in translation in the first semester of 11th grade English, on the whole this should be a more representative sample.

When limited to only the English language texts, the ratio is much more representative. In fact, women are slightly underrepresented in this set, although only by a minuscule 0.7 percentage points. Clearly there are patterns to be found in language variation within the IB texts, so let’s look there next.
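If you want to try this kind of narrowing yourself, the idea is to keep only the rows that match a condition and then recompute each group’s share. Below is a rough sketch using only Python’s standard library. The column names ("gender", "language") and the sample rows are assumptions made up for illustration; they are not taken from the real sheet, so the percentages it prints say nothing about the IB list.

```python
from collections import Counter


def share_by(rows, column, keep=lambda row: True):
    """Return {value: percent of kept rows} for one column.

    `keep` is an optional filter, for example "English-language rows only".
    """
    counts = Counter(row[column] for row in rows if keep(row))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Toy rows for illustration only; these are NOT entries from the real sheet.
rows = [
    {"author": "Author 1", "gender": "Female", "language": "English"},
    {"author": "Author 2", "gender": "Male", "language": "English"},
    {"author": "Author 3", "gender": "Male", "language": "Spanish"},
    {"author": "Author 4", "gender": "Male", "language": "Chinese"},
]

everyone = share_by(rows, "gender")
english_only = share_by(rows, "gender", keep=lambda r: r["language"] == "English")

print(everyone)
print(english_only)
```

Swapping in a different `keep` function (recommended authors only, a single country, and so on) reruns the same test on a different slice without touching the counting code.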

What languages do they write in?

It seems that IB texts are actually pretty evenly distributed by language, with languages like English, Spanish, and French taking larger shares than some of the other languages but still leaving plenty of texts to go around. The clear outlier, though, is Chinese, which, at 526 individual authors, claims nearly twice as many as English, the runner-up at 288 authors.

Things are a little more balanced when we limit the set to only recommended texts, with roughly six authors each, although Chinese still takes the lead. However, this might be explained quite simply by the fact that China is a really big country and a lot of people live there. Statista, which provides open access data on a variety of topics, suggests that more people speak Chinese as a first language than any other language on Earth, so the IB prescribed reading list is relatively representative of humanity in that way at least.
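The language counts above come down to counting individual authors per language and comparing the leaders. A minimal sketch of that in Python follows. It assumes hypothetical "author" and "language" columns and assumes an author could appear on more than one row; the sample rows are invented, and only the 526 and 288 figures come from this article.

```python
from collections import Counter


def authors_per_language(rows):
    """Count distinct authors for each language, largest count first."""
    # A set of (language, author) pairs drops repeat rows for the same author.
    seen = {(row["language"], row["author"]) for row in rows}
    return Counter(language for language, _ in seen).most_common()


def ratio(leader, runner_up):
    """How many times larger the leader is than the runner-up."""
    return round(leader / runner_up, 2)


# Toy rows for illustration only; these are NOT entries from the real sheet.
rows = [
    {"author": "Author 1", "language": "Chinese"},
    {"author": "Author 1", "language": "Chinese"},  # same author listed twice
    {"author": "Author 2", "language": "Chinese"},
    {"author": "Author 3", "language": "English"},
]

ranked = authors_per_language(rows)

# Figures quoted above: 526 Chinese-language authors, 288 English-language authors.
chinese_to_english = ratio(526, 288)

print(ranked)
print(chinese_to_english)
```

With the article’s figures the ratio comes out to 1.83, which is where “nearly twice as many” comes from.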

Where are they from?

The country of origin metric overlaps significantly with language of origin, but nevertheless it’s interesting to see the IB prescribed literature laid out on a map. IB truly is a global program, and I’ve never been able to see that as clearly as I can here.

As fascinated as I’ve been by exploring the data, I know I’ve only scratched the surface. Put those rows and columns together and there are 82,644 unique data points to play with, and one person can’t discover everything about all that data. I’ve attached the Google Sheet with all of the data I worked with below. If you’re interested in learning more, you can analyze it, make copies of it, and explore it however you want. Please feel free to share your findings with others in the comments section of this article. After all, freedom of information is the journalistic spirit, and we can do even more with big data out there just waiting to be read.
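As a starting point for your own tests, here is a small sketch that checks the size of a data set exported as CSV text and confirms the 82,644 figure (6,887 rows times 12 columns). It uses only Python’s standard library. The sample text and its column names are invented for illustration, and it assumes the first line of the export is a header row.

```python
import csv
import io


def count_cells(csv_text):
    """Return (data rows, columns, total cells) for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)             # first line holds the column names
    n_rows = sum(1 for _ in reader)   # every remaining line is one data row
    return n_rows, len(header), n_rows * len(header)


# Rows times columns, using the counts given in this article.
EXPECTED_CELLS = 6887 * 12

# Tiny invented sample, NOT taken from the real sheet.
sample = "author,gender,language\nAuthor 1,Female,English\nAuthor 2,Male,Spanish\n"
sample_shape = count_cells(sample)

print(EXPECTED_CELLS)
print(sample_shape)
```

To run it on a real export, you would read the downloaded file’s contents into a string and pass that to `count_cells` instead of `sample`.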


Former Editor in Chief of The City Voice, finally graduated City High Middle School as part of the Class of 2022.
