And the most prolific Marvel writer is...
Exploring ChatGPT's Code Interpreter
So I just ran across two exciting tools for data nerds like me. The first is a site called Kaggle.com. It is a repository of hundreds of interesting datasets, most of which have Creative Commons “free to use” licenses. There are literally dozens of comic datasets, many created by fans.
I discovered the site while looking for data that I could use to test OpenAI’s “Code Interpreter.” In short, this plug-in for ChatGPT takes plain English prompts and automatically writes computer programs for you that can, among other things, analyze datasets. (For a deeper dive, consider reading OpenAI Code Interpreter: All models are wrong. Some are useful. by Bill Moore.)
Summary:
Wow! It is awesome!
It is only as good as the data you give it. Every dataset stops at some cutoff date, for example, so anything published after that is invisible to it.
Sometimes it will be wrong. Be skeptical.
Sometimes it needs help.
It says it is data secure, but I will only upload anonymized data until I can run a version offline.
Also, “indie comics for me” as a career path, based on what I learned.
What follows is an example session.
Uploaded → Marvel_Comics.csv
GT: Can you give me 10 surprising insights in this data?
[GT: See the preamble at the end of this post. Let’s skip ahead.]
…
GT: Dive deeper
ChatGPT: Certainly! Let's dive deeper into some of the insights mentioned:
Prolific Contributors: Let's identify the top contributors and see if there are any notable names or patterns among them.
Price Variation: We'll take a closer look at the distribution of prices (beyond the free comics) to understand the pricing strategy.
Publication Strategy: Analyzing the publication patterns more deeply can provide insights into Marvel's strategy. Do they focus on certain months for big releases? Is there a lull during certain periods?
Long-running Series: We'll identify the top series in terms of the number of issues and discuss their significance.
Let's start with the Prolific Contributors. We'll list out the top 5 writers, pencilers, and cover artists along with the number of issues they've contributed to.
Here are the detailed findings for the most prolific contributors in the Marvel Comics dataset:
1. Top Writers:
Brian Michael Bendis: With 859 issues, Bendis stands out as one of the most significant writers. He has been a pivotal figure in the Marvel Universe, having worked on several key storylines and characters.
Chris Claremont: Contributing to 685 issues, Claremont is renowned for his long run on the X-Men series, defining many of its iconic story arcs.
Stan Lee: The legendary Stan Lee, a co-creator of many of Marvel's most famous characters, has written 638 issues.
Peter David: With 609 issues to his name, David is known for his long tenure on "The Incredible Hulk" among other works.
2. Top Pencilers:
Sal Buscema: Contributed to 394 issues. Sal Buscema is known for his work on titles like "The Avengers" and "The Incredible Hulk."
Jack Kirby: The iconic Jack Kirby, often termed the "King of Comics," penciled 377 issues. He co-created several Marvel characters alongside Stan Lee.
John Romita: With 339 issues, Romita is famous for his influential run on "The Amazing Spider-Man."
John Byrne: Contributed to 335 issues and is known for his significant runs on "X-Men" and "Fantastic Four."
3. Top Cover Artists:
Gil Kane: Crafted the covers for 217 issues. Kane is known for his distinctive style and has contributed to both Marvel and DC comics.
Mark Bagley: With covers for 207 issues, Bagley is especially known for his work on "Ultimate Spider-Man."
Salvador Larroca: Contributed the art for 161 covers, having worked on titles like "Iron Man."
Mike Deodato: Created covers for 145 issues. Deodato's art has graced several Marvel titles, including "Thor" and "The Avengers."
It's worth noting that there's a significant number of issues for which the writer, penciler, or cover artist is listed as "None." This could be due to missing data or entries where multiple contributors make identification challenging.
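The transcript leaves out the code behind this ranking. A minimal sketch of how such a tally could be produced follows, using only Python's standard library rather than whatever Code Interpreter actually ran. The helper name `top_contributors` is hypothetical, the sample rows are the writer and cover-artist credits from the five-row preview at the end of this post, and the sketch assumes that co-credits are comma-separated and that missing credits appear as the string "None".

```python
from collections import Counter

def top_contributors(rows, column, n=5):
    """Count how many issues each person in `column` is credited on."""
    counts = Counter()
    for row in rows:
        cell = (row.get(column) or "").strip()
        if not cell or cell == "None":
            continue  # skip missing credits instead of ranking "None"
        # Assumption: co-credits are comma-separated, e.g. "Chris Sims, Chad Bowers"
        for name in cell.split(","):
            name = name.strip()
            if name:
                counts[name] += 1
    return counts.most_common(n)

# Credits copied from the five preview rows of Marvel_Comics.csv
rows = [
    {"writer": "Yves Bigerel", "cover_artist": "Jamal Campbell"},
    {"writer": "Chris Sims, Chad Bowers", "cover_artist": "None"},
    {"writer": "Ryan North", "cover_artist": "None"},
    {"writer": "Chuck Wendig", "cover_artist": "Jamal Campbell"},
    {"writer": "Paul Allor", "cover_artist": "Jamal Campbell"},
]

print(top_contributors(rows, "cover_artist"))
```

With the full file, `rows` could come from `csv.DictReader`. Filtering out "None" before ranking matters here, since the transcript notes that many issues list no writer, penciler, or cover artist.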
Next, let's dive into the Price Variation to understand the pricing strategy better. We'll examine the distribution of prices beyond the free comics.
Here's a deeper look into the Price Variation of Marvel Comics:
Most Common Price Points:
$3.99: 8,228 issues are priced at this point, making it the most common price for a comic.
$2.99: 5,205 issues fall under this price, suggesting it's another popular price point.
$4.99: 1,026 issues are priced at this higher point, possibly indicating special editions, longer issues, or premium content.
Older Price Points:
There are comics priced at older price points like $0.10, $0.12, $0.25, etc., which indicate historical pricing from earlier decades. For instance, 152 issues are priced at $0.12 and 143 issues at $0.25.
Higher Price Points:
Some comics are priced considerably higher, such as $24.99 (30 issues), $29.99 (9 issues), and even as high as $99.99 (4 issues). These could be special collector's editions, omnibus editions, or other premium content.
Free Comics:
A significant portion of the comics, 15,136 issues, are marked as "Free." As observed earlier, these could be promotional releases, digital releases, or other special cases.
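A breakdown like this depends on how the Price column is cleaned first. The sketch below shows one way to separate "Free" from numeric prices. It assumes that paid prices are stored as strings such as "$3.99" (the preview in this post only shows "Free", so that format is a guess), the function name `summarize_prices` is hypothetical, and the sample values are invented for illustration.

```python
from collections import Counter

def summarize_prices(price_cells):
    """Split raw Price cells into a free-issue count and a tally of paid prices."""
    free = 0
    paid = Counter()
    for cell in price_cells:
        label = (cell or "").strip()
        if label == "Free":
            free += 1
        elif label.startswith("$"):
            try:
                paid[float(label[1:])] += 1
            except ValueError:
                continue  # malformed price text; leave it out of the tally
        # anything else (e.g. "None") is treated as missing
    return free, paid

# Invented sample values, for illustration only
sample = ["Free", "$3.99", "$3.99", "$2.99", "Free", "None", "$0.12"]
free_count, paid_counts = summarize_prices(sample)
print(free_count)
print(paid_counts.most_common(1))
```

Keeping "Free" out of the numeric tally is the design choice worth noticing: if it were converted to 0.00, it would swamp every average and most-common-price calculation.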
Next, we can dive into Publication Strategy. By analyzing the months when most comics are released, we can identify any seasonal patterns in Marvel's publishing strategy. This can give insights into when Marvel typically releases its biggest titles or any specific months they target for special releases.
The Publication Strategy based on the monthly distribution of comic releases reveals the following insights:
Peak Months:
November sees the highest number of comic releases, with a considerable lead over other months. This might indicate a strategy to release major titles or special editions before the holiday season.
October is the second highest month, further supporting the hypothesis of a push for releases leading up to the holidays.
Consistent Releases: The number of releases remains relatively consistent from January to September, with slight peaks in May and August. These could correspond to summer months, which might be another strategic period for Marvel.
Lowest Releases: February sees the fewest releases. This could be due to it being the shortest month or possibly a strategic lull after the holiday season.
Finally, let's look deeper into the Long-running Series. Identifying the comic series with the most issues can provide insights into their popularity and longevity.
Here are the insights from the Long-running Series:
The chart showcases the top 10 comic series with the highest number of issues.
"Marvel Previews" tops the list, indicating its consistent presence over the years. Marvel Previews often provides glimpses into upcoming releases, so its frequent publication makes sense.
"X-Men (1991 - 2001)" and "Spider-Man (1990 - 1998)" are among the top, indicating their popularity and consistent releases during their respective active years.
Other iconic titles like "Avengers (1963 - 1996)" and "Amazing Spider-Man (1963 - 1998)" also make the list, reflecting their long-standing significance in the Marvel Universe.
The presence of these titles in the top 10 underscores their importance to Marvel and their fans, representing a mix of promotional content (like Previews) and cornerstone superhero narratives.
Would you like further analysis or have any other questions related to this data?
Upload → Complete_DC_Comic_Books.csv
GT: Can you do a similar analysis of this data?
ChatGPT: Absolutely! Let's follow a similar approach to analyze the newly uploaded data, which appears to be about DC Comics.
We'll start by examining the first few rows of the file to understand its structure and columns.
The uploaded data seems to be about DC Comics, and it contains the following columns:
Catergory_Title: The title or category of the comic series.
Issue_Name: The name of a specific issue within the comic series.
Issue_Link: A link to more details about the specific issue.
Pencilers: The artist(s) responsible for the pencil drawings in the issue.
Cover_Artists: The artist(s) responsible for the cover art of the issue.
Inkers: The artist(s) responsible for inking in the comic.
Writers: The writer(s) of the issue.
Editors: The editors of the comic issue.
Executive_Editor: The executive editor of the comic issue.
Letterers: The individuals responsible for the lettering in the comic.
Colourists: The individuals responsible for coloring the comic.
Rating: The age rating of the comic issue.
Release_Date: The date on which the specific issue was released.
Comic_Series: The overarching series or category to which the comic belongs.
Comic_Type: Type of comic (e.g., Category).
From the initial look, the data provides details about different DC comic issues, including their names, publication details, contributing artists and writers, and other related information.
Using a similar approach as before, I'll explore the data to uncover potential insights:
General Overview.
Most Frequent Contributors: Identify which writers, pencilers, and cover artists have contributed the most.
Publication Strategy: Check if there are specific months or seasons that see more comic releases.
Popular Series: Identify which comic series have the most issues.
Ratings Distribution: Analyze the distribution of age ratings.
Let's begin the analysis.
Certainly! Here's a deeper exploration of the DC Comics dataset:
1. Top Writers:
Robert Kanigher: With 2,437 issues, Kanigher is a notable figure in the DC Universe. He's contributed to various characters and storylines, including many war comics and romance titles.
Gardner Fox: Contributing to 1,971 issues, Fox is known for creating or co-creating a variety of iconic characters and was pivotal in the Golden and Silver Ages of comic books.
John Broome: With 1,123 issues, Broome contributed to several iconic characters and series in DC.
Paul Levitz: Contributed to 891 issues, Levitz is not just a writer but has held various positions within DC, including president.
2. Top Pencilers:
Carmine Infantino: Led the art for 1,391 issues. Infantino played a significant role during the Silver Age, co-creating characters like Barry Allen (The Flash).
Joe Kubert: Penciled 768 issues. Kubert is renowned for his work on war comics and characters like Sgt. Rock.
Gil Kane: Contributed to 696 issues. Kane has had significant runs with characters like Green Lantern.
3. Top Cover Artists:
Joe Kubert: Crafted the covers for a whopping 1,942 issues, making him a dominant figure in DC cover art.
Gil Kane: Produced 702 covers, solidifying his role as not just a penciler but also a prominent cover artist.
Jack Cole: With 667 covers, Cole is another significant name, best known for creating Plastic Man.
4. Publication Strategy:
The monthly distribution shows a relatively consistent release pattern, with December witnessing the highest number of releases. This could indicate a trend in DC's publication strategy to capitalize on the holiday season.
5. Popular Series:
Batman Vol 1: With 1,510 issues, this series stands out as one of the cornerstones of the DC Universe. Batman's enduring popularity is evident.
Strange Adventures Vol 1: This series has 1,225 issues, suggesting its significance in DC's lineup.
Star-Spangled War Stories Vol 1: With 1,040 issues, it emphasizes DC's diverse range, covering not just superheroes but also war stories.
6. Ratings Distribution:
The majority of the comics are "Rated T for Teen (12+)," indicating content suitable for teens. There's also a significant number rated for older teens and mature audiences, showing a wide variety of content catering to different age groups.
Would you like to dive deeper into any specific areas or explore other facets of this dataset?
GT: What is the median number of comics per writer for writers who have written at least 5? Use the data from Marvel and DC.
ChatGPT: The median number of comics per writer, considering writers who have written at least 5 comics across both Marvel and DC, is 15 comics. This means that when looking at the distribution of the number of comics written by these writers, half of them have written 15 or fewer comics, while the other half have written more than 15.
GT: Similarly for pencillers?
ChatGPT: The median number of comics per penciler, considering pencilers who have contributed to at least 5 comics across both Marvel and DC, is also 15 comics. This indicates a similar distribution for pencilers as observed for writers: half of the pencilers have contributed to 15 or fewer comics, while the other half have contributed to more than 15.
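The code behind these two answers is not shown in the transcript. A rough sketch of the calculation might look like the following, assuming the credit cells from both files (the `writer` column in the Marvel data and `Writers` in the DC data) have been gathered into one list, that co-credits are comma-separated, and that "None" marks a missing credit. The function name and the sample data are invented for illustration.

```python
from collections import Counter
from statistics import median

def median_issues_per_creator(credit_cells, min_issues=5):
    """Median issue count among creators credited on at least `min_issues` issues."""
    counts = Counter()
    for cell in credit_cells:
        for name in (cell or "").split(","):
            name = name.strip()
            if name and name != "None":
                counts[name] += 1
    qualifying = [n for n in counts.values() if n >= min_issues]
    return median(qualifying) if qualifying else None

# Made-up credits: A has 12 issues, B has 7, C has 5, D has 2, plus 4 missing
cells = ["Writer A"] * 12 + ["Writer B"] * 7 + ["Writer C"] * 5 + ["Writer D"] * 2 + ["None"] * 4
print(median_issues_per_creator(cells))
```

In this toy data, Writer D falls below the cutoff of 5, so the median is taken over 12, 7, and 5. The same function would serve for pencilers by passing in the penciler cells instead.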
GT: So do you hold any data from the files I upload? I am concerned about data privacy.
ChatGPT: Absolutely not. Your data privacy is of utmost importance. Any files you upload are only used for the duration of our conversation and are not stored, retained, or accessed after our session. Once our conversation ends, the data is deleted and not used for any other purposes. You can be assured that your data remains confidential and secure. If you have any other concerns or questions about data privacy, please let me know. I'm here to help!
For a data junky like me, playing with these tools was a lot of fun. It also has given me food for thought about trying to work for the big two. And wow - I will go into my next class with Brian Michael Bendis with a new level of respect! Ha!
You can skip from here down if tech isn’t your thing. What follows is the preamble and some example code. In the original session, this came at the start, before my “Dive deeper” command above.
GT: What is this data about?
ChatGPT: Let's start by examining the contents of the uploaded file. I will read the first few rows of the file to get an overview of the data.
GT: I’ll delete the code in this transcript after this example. Each query has code.
python
import pandas as pd

# Load the data from the CSV file and display the first few rows
marvel_data = pd.read_csv('/mnt/data/Marvel_Comics.csv')
marvel_data.head()
RESULT
comic_name active_years \
0 A Year of Marvels: April Infinite Comic (2016) (2016)
1 A Year of Marvels: August Infinite Comic (2016) (2016)
2 A Year of Marvels: February Infinite Comic (2016) (2016)
3 A Year of Marvels: July Infinite Comic (2016) (2016)
4 A Year of Marvels: June Infinite Comic (2016) (2016)
issue_title publish_date \
0 A Year of Marvels: April Infinite Comic (2016) #1 April 01, 2016
1 A Year of Marvels: August Infinite Comic (2016... August 10, 2016
2 A Year of Marvels: February Infinite Comic (20... February 10, 2016
3 A Year of Marvels: July Infinite Comic (2016) #1 June 29, 2016
4 A Year of Marvels: June Infinite Comic (2016) #1 June 15, 2016
issue_description \
0 The Infinite Comic that will have everyone tal...
1 It’s August, and Nick Fury is just in time to ...
2 Join us in a brand new Marvel comics adventure...
3 Celebrating the Fourth of July is complicated ...
4 Sam Alexander’s finding it hard to cope with t...
penciler writer cover_artist \
0 Yves Bigerel Yves Bigerel Jamal Campbell
1 Jamal Campbell Chris Sims, Chad Bowers None
2 Danilo S. Beyruth, M Mast Ryan North None
3 Juanan Ramirez Chuck Wendig Jamal Campbell
4 Diego Olortegui Paul Allor Jamal Campbell
Imprint Format Rating Price
0 Marvel Universe Infinite Comic Rated T+ Free
1 Marvel Universe Infinite Comic None Free
2 Marvel Universe Infinite Comic Rated T+ Free
3 Marvel Universe Infinite Comic None Free
4 Marvel Universe Infinite Comic None Free
The data seems to be about Marvel Comics, and it contains the following columns:
comic_name: The name of the comic series.
active_years: The years during which the comic series was active.
issue_title: The title of a specific issue within the comic series.
publish_date: The date on which the specific issue was published.
issue_description: A brief description of the specific issue.
penciler: The artist(s) responsible for the pencil drawings in the issue.
writer: The writer(s) of the issue.
cover_artist: The artist(s) responsible for the cover art of the issue.
Imprint: The branding associated with the comic issue.
Format: The format of the comic (e.g., Infinite Comic).
Rating: The age rating of the comic issue.
Price: The price of the comic issue.
From the initial look, the data appears to provide details about different Marvel comic issues, including their names, publication details, contributing artists and writers, and other related information. Would you like a more detailed analysis or have any specific questions about this data?
GT: Can you give me 10 surprising insights in this data?
ChatGPT: Of course! I'll start by exploring the data to uncover potential insights. Here's what I'll do:
General Overview: Basic statistics about the dataset.
Most Frequent Contributors: Identify which writers, pencilers, and cover artists have contributed the most.
Price Distribution: Explore the range and distribution of comic prices.
Most Active Years: Check which years had the most comic releases.
Popular Imprints: Identify the most common imprints.
Ratings Distribution: Analyze the distribution of age ratings.
Formats Distribution: See which comic formats are the most popular.
Unique Descriptions: Identify any patterns or recurring themes in the comic descriptions.
Publication Patterns: Check if there are specific months or seasons that see more comic releases.
Longest Running Series: Identify which comic series have the most issues.
Let's start with these analyses and derive surprising insights.
It seems there was an error when trying to extract the publication year from the publish_date column. This error may have been caused by some missing or inconsistent values in the publish_date column.
To address this, I will first check for missing or inconsistent values in the publish_date column before proceeding with the extraction.
All the unique dates appear to be in the format "Month Day, Year." However, there might be some entries with "None" or other missing values that we need to address.
I'll handle these inconsistent values and continue with the analysis.
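The fix itself is not shown in the transcript. One way to handle it, sketched here with the standard library, is to parse each date defensively and set aside anything that does not match the "Month Day, Year" format. The helper name `parse_publish_date` is hypothetical, and the sample values mix real dates from the preview with the kinds of missing entries ChatGPT describes. In pandas, `pd.to_datetime(..., errors="coerce")` plays a similar role by turning unparseable values into missing ones.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%B %d, %Y"  # e.g. "April 01, 2016"

def parse_publish_date(text):
    """Return a datetime, or None when the value is missing or malformed."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT)
    except (AttributeError, ValueError):
        # AttributeError: the value was not a string (e.g. None)
        # ValueError: the text did not match the expected format (e.g. "None")
        return None

dates = ["April 01, 2016", "August 10, 2016", "None", "June 29, 2016", None]
parsed = [parse_publish_date(d) for d in dates]

# Count issues per year using only the dates that parsed cleanly
issues_per_year = Counter(d.year for d in parsed if d is not None)
unparsed = sum(d is None for d in parsed)
print(issues_per_year, unparsed)
```

Counting the unparsed values, rather than silently dropping them, gives a quick sense of how much of the dataset is missing from any year-by-year or month-by-month result.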
Here are some surprising insights based on the analysis:
General Overview: The dataset contains a rich variety of information, spanning numerous comic series, artists, and years.
Prolific Contributors:
The top contributors in terms of writing, penciling, and cover art have worked on a significant number of comics. For instance, the most frequent writer has contributed to numerous issues.
Price Variation: A majority of the comics are priced as "Free", suggesting they might be promotional or digital releases.
Burst of Activity in Recent Years: The years 2008, 2007, and 2010 saw the highest number of comic releases, with 2008 leading at 1379 comics.
Dominant Imprints: The "Marvel Universe" is the most common imprint, indicating a major portion of the comics fall under this branding.
Ratings Spread: The "Rated T+" rating is quite common among the comics, suggesting content suitable for older teens and above.
Popular Formats: The "Infinite Comic" format seems to be a recurring format in the dataset.
Seasonal Patterns: November has the highest number of comic releases, while February has one of the lowest. This could indicate a trend in Marvel's publication strategy.
Long-running Series: Some comic series have a significant number of issues, indicating their popularity and longevity.
Varied Descriptions: While a detailed textual analysis wasn't performed here, the issue descriptions suggest a wide variety of storylines and character developments.
Would you like a deeper dive into any of these insights or any further analysis?
… and on from here.
Surely Bendis knows this, yes? Regardless, that's fascinating. (And a bit headache-inducing for a non deep diver in tech.) :)