Data Project: Vital Stats on Caldecott Medal Winners, 1938-1975

I have always loved picture books. Between the ages of four and twelve, my reply to “What do you want to be when you grow up?” was always “An author and illustrator.”

But I wanted to look at picture books in a different way: as a set of quantifiable data points.

For someone happy to write a 25-page paper analyzing concepts like the intersection of art theory and social realism in Middlemarch, a data-driven analysis of children’s literature was definitely a departure from the norm.

But it bothered me how small the data sets were behind articles on topics like trends in racial representation in children’s literature. I wished there were some sort of standardized collection of data that writers and researchers could lean on to gauge change over time.

Enter the Caldecott Medal, which has been awarded every year since 1938.

Though the committee that chooses the winners states that the Caldecott is not awarded for “didactic intent or for popularity,” the title “most distinguished American picture book for children” certainly indicates a book that exemplifies national values. And even if winning a Caldecott isn’t intended to reflect nationwide popularity, the award catapults these books into the national spotlight as well as the homes of many families.

So I attempted to analyze the individual components of each Caldecott-winning book from 1938 to 1975 as objectively as possible in my project Caldecott Stats.

I wanted to see if any trends in genre, morals, or representation came to light. I found some very clear trends, particularly in the type of stories that were being told. For example, there was a sharp increase in the number of fables or fairytales winning the Caldecott throughout the sixties and early seventies.

However, analyzing the particulars of race, culture, and class status proved much more difficult than I initially hoped.


The category (apart from genre) that proved the easiest to parse was gender.

→ The percentage of books centered on male protagonists steadily increased between 1938 and 1975. (The decade with the largest percentage of female protagonists was actually the 1950s.) The representation of mixed-group protagonists or stories without a protagonist peaked in the 1940s and dropped off the map entirely after the 1950s.

What was particularly surprising about the rise in the percentage of male protagonists per decade was that it coincided with an increase in the percentage of female Caldecott winners per decade.

(When noting these trends, keep in mind that it’s the illustrator, not author, who wins the Caldecott medal each year. This certainly complicates the numbers, and is one of many difficulties I ran into when working on this project.)
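
For anyone curious how a per-decade percentage like the ones above can be tallied, here is a minimal Python sketch. Everything in it is an assumption for illustration: the records are made-up placeholders rather than real Caldecott data, and the category labels and the share_by_decade helper are hypothetical names, not part of the Caldecott Stats project.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder rows (NOT real Caldecott data):
# (award year, protagonist category)
records = [
    (1940, "male"),
    (1941, "mixed/none"),
    (1945, "female"),
    (1948, "male"),
    (1952, "female"),
    (1955, "female"),
    (1957, "male"),
    (1959, "male"),
]


def share_by_decade(rows):
    """Return {decade: {category: percent}} for (year, category) rows."""
    counts = defaultdict(Counter)
    for year, category in rows:
        decade = (year // 10) * 10  # e.g. 1945 -> 1940
        counts[decade][category] += 1

    shares = {}
    for decade, counter in counts.items():
        total = sum(counter.values())
        # Each category's share of that decade's winners, as a percentage
        shares[decade] = {cat: 100 * n / total for cat, n in counter.items()}
    return shares


shares = share_by_decade(records)
for decade in sorted(shares):
    print(f"{decade}s", shares[decade])
```

Working in percentages rather than raw counts matters here because the first and last decades in the 1938-1975 range are only partly covered, so they contain fewer winners than the full decades in between.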


Unfortunately, many areas of analysis are still missing from the current project website.

I’m not sure when I will be able to devote time to updating the project, but in the meantime you can check out the Caldecott Stats project in all its incomplete glory.