absentfan is a comprehensive web-scraping tool for retrieving information from the site fanfiction.net. “Fanfiction” is a story form in which fans of a particular “fandom” (a narrative from one of several media types, e.g., books, movies, or comics) create written pieces using details from those original narratives. These details can be as specific or general as the fan chooses; the stories could share the original fandom's setting, its world conditions (e.g., the existence of magic), the same or similar characters, or any combination of such characteristics. Writers range from novice to highly skilled.
The authors of this package see fanfiction.net as a potentially rich and underused text data source: operating since 1998 with a community of over 10 million registered users, the site hosts a substantial volume of curious and slightly unorthodox information (link). Access to individual fandom story listings may offer insight into how people currently use the storytelling platform, as well as into the shifting popularity of different narratives (through reviews, post counts, follows, favorites, and so on). Working with the raw text of story chapters could provide practice in text mining or reveal patterns in narrative construction. Tracking such patterns could also inform analysis in a broader cultural context: Fifty Shades of Grey, for example, began as fan-written work (in this case, Twilight fanfiction).
Overall, absentfan offers access to a wealth of user-queried data that is well suited to exploring base R functions and text-mining techniques.
For more on fanfiction, we recommend you look here (via).
Featured Functionality
1. Scrape all of the entries in a particular fandom
For instance, a user might really love the movie Zoolander and want to see whether anyone has written stories about its protagonist, Derek. For this, absentfan offers the function scrapeAllEntries, which is the equivalent of going to https://www.fanfiction.net/movie/Zoolander/ and copying all of the listed entries with their associated information. scrapeAllEntries takes as parameters a “story,” the title of a fandom (here, “Zoolander”), and a “type,” the medium in which the story exists (here, “movie”).
Note that, in accordance with the fanfiction.net terms of use, the scraper works at “human speed”: each page is scraped at a rate of one page per second.
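The one-page-per-second pacing can be pictured as a loop that pauses between requests. This is a minimal base-R sketch, not the package's implementation: politeMap and stubFetch are hypothetical names, and the stub simply returns the length of each page label so the example runs offline. A real scraper would pass a function that downloads and parses each listing page.

```r
# Hypothetical helper: apply `fetch` to each item, pausing `delay` seconds
# between consecutive requests ("human speed" scraping).
politeMap <- function(items, fetch, delay = 1) {
  results <- vector("list", length(items))
  for (i in seq_along(items)) {
    if (i > 1) Sys.sleep(delay)  # wait before every request after the first
    results[[i]] <- fetch(items[[i]])
  }
  results
}

# Stub fetcher so the sketch runs without network access.
stubFetch <- function(page) nchar(page)

# Three pages means two pauses, so this takes roughly two seconds.
elapsed <- system.time(
  out <- politeMap(c("page-1", "page-2", "page-3"), stubFetch)
)[["elapsed"]]
```

Pausing only between requests (not before the first one) keeps the total wait at one second per additional page.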
zool <- scrapeAllEntries("Zoolander", "movie")
head(zool)
#> name author
#> 1 Toasting Away the Bad Times KatLeePT
#> 2 Passionate Vacation KatLeePT
#> 3 Hotter KatLeePT
#> 4 The Lion King 3 Scar's return Xephos yogcast
#> 5 The Boys of Summer KatLeePT
#> 6 Hansel Does the Do rhyejess
#> description
#> 1 Hunks shouldn't drink alone. Slash. Drabble.
#> 2 Summer vacation had never been so wild. Slash. Drabble.
#> 3 Home is always where it's hotter! Slash. Drabble.
#> 4 Scar returned from the dead with his new army! How will Simba and the others defeat him?
#> 5 Derek always gets jealous whenever any one else catches Hansel's eye, but there's no really no reason for him to be so jealous. Slash. Drabble.
#> 6 Hansel does Derek Jr.'s hair.
#> rating language theme chapters words reviews favs follows
#> 1 K+ English Drama/Romance 1 463 1 1 <NA>
#> 2 T English Romance/Drama 1 463 2 1 <NA>
#> 3 K+ English Humor/Romance 1 461 2 <NA> <NA>
#> 4 K+ English Drama/Romance 1 158 5 <NA> <NA>
#> 5 K English Humor/Romance 1 218 3 3 <NA>
#> 6 K English <NA> 1 107 2 4 <NA>
#> updated published
#> 1 NA 1/24/2013
#> 2 NA 1/24/2013
#> 3 NA 1/24/2013
#> 4 NA NA
#> 5 NA 8/3/2012
#> 6 NA 10/23/2011
Now the user can see how many Zoolander stories are available on fanfiction.net, along with their summaries, ratings, follows, and update information. For personal use, this is a convenient snapshot for browsing available fanfics. For larger analyses, the data frame offers a comprehensive look at what kinds of narratives are being written in a particular fandom (for example, by treating the summaries as documents in a text analysis), when people were interacting with these stories (through publication, follow, and review data), and how large the community is (by counting the unique authors who appear in the data).
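Counting unique authors and publication years takes only a few base R calls. The sketch below rebuilds a small data frame from the head(zool) rows printed above so it runs without scraping; it assumes the published column holds month/day/year strings as printed, which may not match the column type the package actually returns.

```r
# Rows copied from the head(zool) output above; with the package installed,
# use zool <- scrapeAllEntries("Zoolander", "movie") instead.
zool <- data.frame(
  name = c("Toasting Away the Bad Times", "Passionate Vacation", "Hotter",
           "The Lion King 3 Scar's return", "The Boys of Summer",
           "Hansel Does the Do"),
  author = c("KatLeePT", "KatLeePT", "KatLeePT", "Xephos yogcast",
             "KatLeePT", "rhyejess"),
  published = c("1/24/2013", "1/24/2013", "1/24/2013", NA,
                "8/3/2012", "10/23/2011"),
  stringsAsFactors = FALSE
)

# Community size: number of distinct authors in the listing (3 here).
nAuthors <- length(unique(zool$author))

# Stories per author, most prolific first.
perAuthor <- sort(table(zool$author), decreasing = TRUE)

# ASSUMPTION: dates are month/day/year strings; missing dates stay NA.
zool$published <- as.Date(zool$published, format = "%m/%d/%Y")

# Stories published per year (table() drops the NA by default).
perYear <- table(format(zool$published, "%Y"))
```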
Additional functionality
Certain fandoms have a very large amount of associated fiction; at the time of package creation, Harry Potter led with 799K entries, and Star Wars had 50.1K. Because the terms of use limit scraping to one page per second, scraping these larger fandoms can take a long time.
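A back-of-the-envelope estimate shows why. The sketch below assumes 25 entries per listing page, which is not stated in this document and should be checked against the site; it also ignores network latency, so real runs will be slower.

```r
# Rough scraping-time estimate at a fixed number of seconds per page.
# ASSUMPTION: 25 entries per listing page (not confirmed by the package docs).
estimateScrapeHours <- function(entries, perPage = 25, secondsPerPage = 1) {
  pages <- ceiling(entries / perPage)
  pages * secondsPerPage / 3600
}

estimateScrapeHours(799000)  # Harry Potter: 31,960 pages, about 8.9 hours
estimateScrapeHours(50100)   # Star Wars: 2,004 pages, about 33 minutes
```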
The max.entries parameter of this function limits scraping to a specific number of entries, producing a data frame with a limited number of rows. Users can then access a subset of the information posted on fanfiction.net without waiting needlessly. Simply pass the total number of entries to be scraped to max.entries:
zool <- scrapeAllEntries("Zoolander", "movie", max.entries=3)
zool
#> name author
#> 1 Toasting Away the Bad Times KatLeePT
#> 2 Passionate Vacation KatLeePT
#> 3 Hotter KatLeePT
#> description rating language
#> 1 Hunks shouldn't drink alone. Slash. Drabble. K+ English
#> 2 Summer vacation had never been so wild. Slash. Drabble. T English
#> 3 Home is always where it's hotter! Slash. Drabble. K+ English
#> theme chapters words reviews favs follows updated published
#> 1 Drama/Romance 1 463 1 1 <NA> NA 1/24/2013
#> 2 Romance/Drama 1 463 2 1 <NA> NA 1/24/2013
#> 3 Humor/Romance 1 461 2 <NA> <NA> NA 1/24/2013
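One plausible way an entry cap like max.entries can save time is to stop requesting listing pages once enough rows have been collected, then trim any overshoot. This is a hypothetical sketch, not the package's code: collectEntries and stubPage are illustrative names, and the stub returns two placeholder rows per page for three pages.

```r
# Hypothetical sketch: fetch listing pages until `maxEntries` rows are
# collected (or pages run out), then trim to exactly `maxEntries` rows.
collectEntries <- function(fetchPage, maxEntries = Inf, delay = 1) {
  rows <- list()
  total <- 0
  page <- 1
  while (total < maxEntries) {
    if (page > 1) Sys.sleep(delay)                  # keep the per-page pacing
    batch <- fetchPage(page)
    if (is.null(batch) || nrow(batch) == 0) break   # no more pages
    rows[[page]] <- batch
    total <- total + nrow(batch)
    page <- page + 1
  }
  if (length(rows) == 0) return(NULL)
  result <- do.call(rbind, rows)
  if (is.finite(maxEntries)) result <- head(result, maxEntries)  # trim overshoot
  result
}

# Stub: three pages of two placeholder rows each, then nothing.
stubPage <- function(page) {
  if (page > 3) return(NULL)
  data.frame(page = page, entry = 1:2)
}

capped <- collectEntries(stubPage, maxEntries = 3, delay = 0)  # fetches 2 pages
everything <- collectEntries(stubPage, delay = 0)              # fetches all 3
```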
2. Scrape all of the chapter information to a data frame
As another example, suppose a user is interested in a particular fanfiction written about the play Almost, Maine. They have read the story called Kiss and want its raw text so that they can track the most frequent terms, run LDA (latent Dirichlet allocation) topic modeling, and see whether any insight can be drawn from the structure of this particular narrative.
The absentfan package offers the getFullStory function, which again takes as parameters a “story,” the fandom in question; a “type,” the medium the story is published in; and a “title,” the name of the particular fanfiction entry the user wants.
The function returns a data frame with two columns containing all of the chapter text of a particular fanfiction: one column holds the raw text, and the other records the chapter that text belongs to, if applicable.
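Because each row holds one piece of text tagged with its chapter, whole chapters can be reassembled with split() and paste(). The toy data frame below only mimics the text and chapter columns; its strings are placeholders, and it assumes the chapter column is numeric.

```r
# Toy data frame with the same two columns as getFullStory output;
# the strings are placeholders, not real story text.
story <- data.frame(
  text = c("First paragraph.", "Second paragraph.", "Opening of chapter two."),
  chapter = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Join all rows of each chapter into one string, separated by blank lines.
# split() orders numeric chapters numerically (1, 2, ..., 10).
chapters <- vapply(split(story$text, story$chapter),
                   paste, character(1), collapse = "\n\n")
```

Each element of chapters is named by chapter number, which makes it a convenient one-document-per-chapter input for topic models.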
almost <- getFullStory("Almost, Maine", "play", "Kiss")
head(almost)
#> text
#> 1 And there he stands in between the performances. The one that played East. That one boy you used to be so in love with... but you no longer are. It almost hurts to see him there and have all those memories. It does hurt, but you can ignore it.
#> 2 The sad thing, however? At auditions, you almost got a part. Almost. The very word that is the basis of the whole play, and it applies so perfectly to your situation. You almost got placed in a scene with him. Almost. You felt the energy between the two of you, and it suddenly made you understand why, exactly, people act.There was such a raw, emotional connection. For a split second, there was no you and him. There was just one being, one ball of energy, auditioning for a play underneath the hot stage lights that made you sweat and your eyes water. You could feel it, and by looking in his eyes, you knew he did too. And when his lips touched your cheek, about an inch away from your left ear, it felt like getting zapped with electricity. You aren't one for sentimentality or sappiness - really, you aren't! - but that was a moment where things felt right.You were Rhonda and he was Dave, and the energy between you was electric. All of the things in the past you shared? You let it out right there on stage. That was when you finally let go of it. All of it. After he kissed you, you yelled at him with everything you had in you - all of the hate, anger, regret, love, and pain - and hoped you could get the part. You only did children's theatre, so you needed this. You needed the chance to be mature and let yourself go! You had to feel all of that as strongly as you possibly could.And you did. You left everything you had on that stage, with your teacher whispering to her auditioning assistant and the eyes of your peers still glued on you.
#> 3 He was cast as East, and you automatically became jealous of the girl cast as Glory. You wanted to connect with him. You were longing for some sense of closure, some way to tie up and throw away all of the love you felt for him.You weren't cast - at all. You volunteered for concessions during one showing, but went home because you felt sick.
#> 4 Why did you still care?You saw him, and you loved him. It was simple. But you weren't good enough. You made him hate you, albeit accidentally. You were the villain for making things be this way. Perhaps he couldn't feel things like love. Maybe somebody broke his heart and it turned to slate. He could've been on his way to go tell somebody yes, or to get back the love that he had given away.With him, the most you could hope for was that he wanted to be close by going the long way around. You knew this wasn't true, but maybe, just maybe, it was.
#> 5 You went to see the very first showing, and you went to see the very last.After the very last one, you tried to get close to him. You wanted to kiss him and see if that brought some sense of closure. You waited for him.Eventually you saw him leave through the back door. You followed him, hoping that you could get that kiss. By the time you had gotten out there, it was almost too late. He was unlocking his car door.You paused under the glow of the streetlight, wondering what to say. You eventually just ran to him, but he had already slammed his car door behind him.
#> 6 You were alone in the parking lot under the stars and street lamp. He was gone. You were here.You were so close to it. You could've gotten that last bit of closure if you were just a bit faster.You almost had it.
#> chapter
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
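As a first step toward the term-frequency analysis mentioned above, the text column can be tokenized and counted with base R alone. The sketch below copies row 6 of the head(almost) output so it runs without scraping; the short stop-word list is illustrative only, not a standard list.

```r
# Row 6 of the head(almost) output above; with the package installed,
# use almost <- getFullStory("Almost, Maine", "play", "Kiss") instead.
almost <- data.frame(
  text = paste("You were alone in the parking lot under the stars and street",
               "lamp. He was gone. You were here.You were so close to it. You",
               "could've gotten that last bit of closure if you were just a",
               "bit faster.You almost had it."),
  chapter = 1,
  stringsAsFactors = FALSE
)

# Lowercase, then split on anything that is not a letter or apostrophe;
# this also separates run-together sentences such as "here.You".
tokens <- unlist(strsplit(tolower(almost$text), "[^a-z']+"))
tokens <- tokens[nzchar(tokens)]

# Drop a small, hand-picked stop-word list (illustrative only).
stopWords <- c("the", "a", "and", "to", "of", "in", "it", "if", "that")
content <- tokens[!tokens %in% stopWords]

# Term frequencies, most frequent first: "you" (6), "were" (4), "bit" (2).
freq <- sort(table(content), decreasing = TRUE)
```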
3. Create a readable HTML output of the scraped narrative
For users who simply want to read the stories, there is an option to generate an HTML version of the scraped narrative. Pass the same parameters used by the chapter scraper (getFullStory) to the storyPrint function instead:
storyPrint("Steven Universe", "cartoon", "Gem Funeral")
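For readers curious about what an HTML rendering step involves, here is a rough base-R sketch that turns a text/chapter data frame into a simple page. It is not how storyPrint is implemented; writeStoryHtml and escapeHtml are hypothetical helpers, and the input rows are placeholders.

```r
# Escape characters that are special in HTML text content.
escapeHtml <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)  # must run first
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Hypothetical helper: write one <h2> per chapter and one <p> per text row.
writeStoryHtml <- function(story, title, file = tempfile(fileext = ".html")) {
  body <- character(0)
  for (ch in unique(story$chapter)) {
    body <- c(body, sprintf("<h2>Chapter %s</h2>", ch),
              sprintf("<p>%s</p>", escapeHtml(story$text[story$chapter == ch])))
  }
  html <- c("<!DOCTYPE html>", "<html>", "<head>", '<meta charset="utf-8">',
            sprintf("<title>%s</title>", escapeHtml(title)), "</head>",
            "<body>", sprintf("<h1>%s</h1>", escapeHtml(title)), body,
            "</body>", "</html>")
  writeLines(html, file)
  file
}

story <- data.frame(text = c("First paragraph.", "Tom & Jerry <3"),
                    chapter = c(1, 1), stringsAsFactors = FALSE)
out <- writeStoryHtml(story, "Kiss")
# utils::browseURL(out) opens the page in the default browser.
```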
Maintaining the Scraper
Update Repository of Story Tags
The absentfan package uses a pre-scraped list of href tags from the fanfiction.net media pages to locate stories and their information. If a fandom seems to be missing because it is very recent (added after 2018), the user can call the updateTypeMedia function to refresh the list of fandoms available for scraping with this package.
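The idea of a pre-scraped href list can be sketched as a lookup table keyed by type and story. The table and function below are hypothetical; only the Zoolander path comes from this document, and the package's real cache format may differ. A failed lookup is the situation where refreshing the list with updateTypeMedia would help.

```r
# Hypothetical cache of fandom hrefs; only this row is taken from the
# document (https://www.fanfiction.net/movie/Zoolander/).
fandomHrefs <- data.frame(
  type = "movie",
  story = "Zoolander",
  href = "/movie/Zoolander/",
  stringsAsFactors = FALSE
)

# Return the full URL for a fandom, or NA (with a message) if it is missing.
lookupFandomUrl <- function(story, type, hrefs = fandomHrefs) {
  hit <- hrefs$href[hrefs$type == type & hrefs$story == story]
  if (length(hit) == 0) {
    message("Fandom not found; the cached list may need refreshing.")
    return(NA_character_)
  }
  paste0("https://www.fanfiction.net", hit[1])
}

lookupFandomUrl("Zoolander", "movie")
# "https://www.fanfiction.net/movie/Zoolander/"
```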