The popularity of Memes on Reddit

Prateek Nigam
15 min read · Apr 15, 2021

Predicting whether a meme will go viral or not

What are memes?

A meme is a virally transmitted image, GIF, or video, usually decorated with text, that shares pointed commentary on cultural symbols, social ideas, or current events. The word “meme” was coined by Richard Dawkins to describe how cultural information spreads. The term “Internet meme” refers to the subset of memes that spread primarily online. When someone mentions the word “meme” today, it usually refers to the funny images or videos, often captioned with text, that circulate on social media.

Memes are made using the cultural knowledge we acquire over years of watching television and browsing social media. Most of the time this knowledge serves as the secret ingredient that turns ordinary jokes into viral material. A meme can also be seen as a piece of cultural information that spreads across the web, often derived by copying or remixing material we encounter through media.

What makes a meme popular?

The primary purpose of memes is to make people laugh and entertain them, and that happens when the shared meme is straightforward and its content is funny, simple, and relatable. One of the main reasons memes spread is that they spread joy in a world full of stress. Put simply, memes become popular when they are built on a current situation or trend that people follow on TV or the web. For a meme to become popular or go viral, you first have to make sure it gets seen. One of the fastest-growing sites where memes are popular is Pinterest. To create a successful meme, make sure its content connects directly with the target audience (that it can be easily understood by everyone). A meme should also be easy for anyone to share.

source: https://www.thrillist.com/entertainment/nation/first-meme-ever

Nowadays memes are becoming one of the fastest ways to communicate, which is itself a factor that helps a meme become a viral sensation.

Can the popularity of memes be predicted?

Fundamentally yes: to some extent the popularity of a meme can be predicted moderately well from its content, and that prediction can be made with machine learning models. An effective model needs an accurate dataset and a set of features that help predict the meme's popularity. Memes can also go viral by touching on global issues that would otherwise be ignored, and memes distributed to more diverse and well-connected audiences are more likely to go viral. People are also more likely to share memes related to content they have enjoyed before. By examining the hashtags, tweets, keywords, and hence the text inside the memes, a model can be built that predicts meme popularity.

Memes have become an increasingly pervasive form of contemporary social communication and have attracted a lot of research interest recently. In this case study, we explore a dataset of memes from Reddit. This article not only examines the factors that make a meme viral but also performs a content-based predictive analysis of what makes a meme go viral.

Using machine learning techniques, we also study what influence image-related attributes have over text-based attributes on meme popularity. We find that the success of a meme can be predicted moderately well from its content alone, and that image-related and text-based attributes combined give a significant boost in prediction.

Exploratory Data Analysis

We fetched the dataset from Kaggle. The dataset contains the post ID, the image URL, the up/downvotes, and other metadata for each meme. This is a good starting point for common computer vision tasks.

First, we divided the data into four subcategories:

  • Categorical features
  • Numerical Features
  • Image-based Features
  • Text-based features

We then analyzed each category individually.

Categorical features

Analyzing the distribution of distinct authors over the published memes shows that a handful of authors account for a major share of the upvotes and downvotes.

We can observe that although there are 2197 distinct authors, only about 10–15 of them produce memes at a high rate.

We then took the top 10 most published authors and took a high-level look at their share of the work on a pie chart. The largest slice, around 21.9%, belongs to Holofan5life. This helps us infer that if a meme belongs to this small group of names, there is a high chance it will go viral, whether it ends up downvoted or upvoted.

Designing the Target column

Using the upvotes we can define the target variable: any meme with more ups than the median is dank, while the rest are not.

Mean   23802.590204587726
Median 14757.0
Min    494
Max    293544

Observing the distribution of upvotes, we split on the median rather than the mean to avoid the effect of outliers; as the red line in the plot shows, the distribution is shifted a little to the right.

Thus we create the target feature: Dank_or_not.
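A minimal sketch of this labeling step in pandas (the column names ups and dank_or_not are assumptions based on the feature list later in this article):

import pandas as pd

# toy stand-in for the real Reddit dataframe
df = pd.DataFrame({"ups": [494, 14757, 23000, 293544]})

# memes with more ups than the median are "dank" (1), the rest are not (0)
df["dank_or_not"] = (df["ups"] > df["ups"].median()).astype(int)
print(df)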

Extra fields such as the URLs for reaching the post on Reddit, downvotes, meme awards, downs, and posting authors were discarded early on because they were incomplete and populated mostly with zeros, as were the thumbnail properties.

Many of the features scraped from the Reddit metadata were already numerical, such as created_utc and ups.

We can process the meme images, the titles, and the text inside the images to enrich our feature set with more content-based features.

Exploratory analysis can be done on the textual and image-related attributes, with a focus on the impact they have on meme popularity, and feature engineering can then be applied to these features.

Image-based Features

Text-Based Features

Most of the humor and meaning of a meme is contained in the text that appears inside the meme image, so we can extract those words, observe them, and analyze how well the context predicts whether a meme is dank or not.

Thus for extracting text from images, we used easy-ocr.

source: https://medium.com/@nandacoumar/optical-character-recognition-ocr-image-opencv-pytesseract-and-easyocr-62603ca4357
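As a rough sketch, the extraction step with easy-ocr might look like this (the image path is a hypothetical example):

import easyocr

reader = easyocr.Reader(["en"])  # load the English recognition model once

def extract_text(image_path: str) -> str:
    """Return all text found in a meme image as a single string."""
    # detail=0 returns only the recognized strings, without bounding boxes
    return " ".join(reader.readtext(image_path, detail=0))

print(extract_text("memes/sample_meme.jpg"))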

Once we extracted the sentences from each image, we had a new column containing the concatenation of all the sentences and words in the meme image. We then used this text to calculate sentiment, creating a new feature to help explain predictability; before that, we cleaned the sentences and removed stop words.

Using the processed text data we extracted some potentially predictive attributes such as sentiment and word count. First, we calculated the sentiment scores that quantify the feeling or tone of the text. If the text is positive or happy, it scores closer to 1, and negative or sad texts score closer to 0.
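One way to compute these two features (here assuming NLTK's VADER, which is not confirmed as the library actually used, with its compound score rescaled from [-1, 1] into the [0, 1] range described above):

import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

df = pd.DataFrame({"clean_sent": ["when you finally fix the bug",
                                  "sad cat is sad"]})
# shift/scale VADER's compound score from [-1, 1] into [0, 1]
df["sentiment"] = df["clean_sent"].apply(
    lambda t: (sia.polarity_scores(t)["compound"] + 1) / 2)
df["word_count"] = df["clean_sent"].str.split().str.len()
print(df)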

Date Time Feature

We then used the date-time field to design new features that explain the impact of posting time on the prediction of a meme's virality.

We plotted different graphs analyzing different slices of time, including the month, day, and hour of publication, and their impact on the upvotes.
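A minimal sketch of this date-time feature engineering, assuming the created_utc epoch timestamps from the Reddit metadata:

import pandas as pd

df = pd.DataFrame({"created_utc": [1554508800, 1575158400]})  # sample epochs
df["date_created"] = pd.to_datetime(df["created_utc"], unit="s")
df["year"] = df["date_created"].dt.year
df["month"] = df["date_created"].dt.month
df["day"] = df["date_created"].dt.day
df["hour"] = df["date_created"].dt.hour
print(df)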

We plotted a violin chart of upvotes for memes across the months of a particular year. From this graph we can say that the range of upvotes is highest in the last month, December. Looking more closely, the violin for the third month, March, appears wider than those of the other months.

So from this graph, we can conclude that

1) More memes were published in the month of March, but their range of upvotes is low.

2) Moderately fewer memes were published in the month of December, but their range of upvotes is high.

The given graph shows the meme distribution by hour of publication. According to it, the largest number of memes is published at 11 AM.

The graph below shows meme upvotes by hour. Memes posted between 10 and 11 PM receive more upvotes, and memes published at 2 AM also have a high range of upvotes.

Numerical Features

Plotting a pair plot gives a bivariate analysis of the features against one another, with the target as the hue; from it we can observe whether any pair of features linearly separates the two classes.

A pair plot is a grid of axes in which each numeric feature shares the y-axis across a single row and the x-axis across a single column. The diagonal plots are treated differently: a univariate distribution plot is drawn to show the marginal distribution of the data in each column.
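A sketch of such a pair plot with seaborn, using a toy stand-in dataframe (the real plot uses the dataset's numeric columns with the target as hue):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# toy stand-in for the real feature dataframe
df = pd.DataFrame({
    "ups": [494, 14757, 293544, 23000, 5000, 90000],
    "sentiment": [0.2, 0.8, 0.5, 0.9, 0.4, 0.7],
    "word_count": [5, 12, 7, 3, 9, 6],
    "dank_or_not": [0, 1, 1, 0, 0, 1],
})
sns.pairplot(df, hue="dank_or_not")  # target as hue
plt.show()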

Data Observation:


Based on the analysis and design above, the following useful features are extracted:

  • title
  • author
  • id
  • downs
  • thumbnail.height
  • thumbnail.width
  • date_created
  • meme_path
  • clean_sent
  • sentiment
  • word_count
  • year
  • month
  • day
  • hour

These can now be modeled to predict popularity.

Model Implementation

Since our data includes text-based, image-based, categorical, and numeric information, we start with models based on each individual data class.

Image-Based Models

An image is just a matrix of pixel values, but we cannot simply flatten it (e.g. a 4x3 image matrix into a 12x1 vector) and feed it to a simple multi-layer perceptron feed-forward network: that gives poor accuracy on complex images with pixel dependencies throughout, so a CNN will perform better.

A ConvNet is able to capture the dependencies in an image because of relevant filters. The architecture performs a better fitting to the image dataset due to the reduction in the number of parameters involved and the reusability of weights.

source: https://www.kdnuggets.com/2020/12/generating-beautiful-neural-network-visualizations.html

A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and is able to differentiate one from the other. The pre-processing required by a ConvNet is much lower than for other classification algorithms: while in primitive methods the filters are hand-engineered, ConvNets can learn these filters and characteristics with enough training.

The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field; a collection of such fields overlaps to cover the entire visual area.

In this way, the network can be trained to understand the sophistication of the image better.

For our data, we use the same concept of CNN :

Sample dataset Images

We defined a simple CNN-based predictive architecture:

Simple CNN Architecture
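The exact layer stack appears only in the figure above; a representative minimal Keras CNN for the binary Dank_or_not target could look like this (the input size and layer widths are assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),          # assumed input size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # probability of "dank"
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
model.summary()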

This model returned the following results:

accuracy: 0.46273291925465837
auc_score: 0.5

Then we used transfer learning:

  • Transfer learning with VGG16
  • Transfer learning with Inception V3
  • Transfer learning with ResNet50
  • Transfer learning with EfficientNet

and plotted a comparison of their performance:

We observed that all models other than the VGG transfer-learning model showed accuracy below 50%, performing no better than random prediction, while the VGG16 transfer model predicted the target column with an accuracy of 60%.
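As a hedged sketch, the VGG16 transfer setup could look like the following (the frozen ImageNet base is standard practice; the head layers and input size here are assumptions, not the exact configuration used):

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(128, 128, 3))
base.trainable = False  # keep the pretrained convolutional filters fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])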

Text-Based Models

Text classification is the task of assigning a sentence or a set of words to an appropriate category.

There are two broad options here. We can use a simple bag of words and classify a new test case based on the combination of individual words, or we can use more complex deep learning algorithms and concepts, RNN-based models including LSTMs and bidirectional RNNs, which preserve the essence of the whole sentence before predicting the output class.

Starting with a baseline model, we designed the models below:

  • Base Simple MultiLayered Perceptron Model
  • LSTM
  • Bidirectional LSTM
  • Transfer learning with BERT (DistilBERT)

Before modeling, the words need to be passed through an embedding layer, i.e. they need to be tokenized and padded first.

Text data can be viewed as a sequence of characters, a sequence of words, or a sequence of sentences. Most commonly, text data is treated as a sequence of words.

Since deep learning models do not understand text, we have to convert the text into a numerical representation. For this we use tokenization, which splits sentences into words and encodes them as numbers.

{‘a’: 3, ‘and’: 5, ‘in’: 6, ‘is’: 9, ‘of’: 8, ‘s’: 10, ‘the’: 1, ‘to’: 4, ‘when’: 7, ‘you’: 2}

Sample sequence generated

[1491, 2478, 355, 824, 2479, 40, 11, 501, 14, 119, 283, 2480]

Sample padded sequence with max_length = 25

[1491, 2478, 355, 824, 2479, 40, 11, 501, 14, 119, 283, 2480, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

We then represent each sentence by a sequence of numbers. Naturally, the sentences have different lengths, but a neural network needs all inputs to be the same size; this is the problem solved by padding.

Pre-processing the raw text data for the deep learning models is now done, and the padded sequences are ready to be used by the neural network.
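A minimal sketch of this tokenize-and-pad step with Keras, consistent with the samples above (max_length = 25, zeros padded at the end; the example sentences are made up):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ["when you finally fix the bug",
             "the cat is out of the bag"]

tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)                    # build the word index
sequences = tokenizer.texts_to_sequences(sentences)  # words -> integers
padded = pad_sequences(sequences, maxlen=25, padding="post")

print(tokenizer.word_index)
print(padded[0])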

Simple NN

Simple Neural Network
accuracy: 0.5032863849765258
auc_score: 0.5028230755124208

LSTM

Long Short-Term Memory (LSTM) networks are a kind of recurrent neural network capable of learning order dependence in sequence prediction problems.

LSTMs are a special kind of RNN that preserve long-term dependencies more successfully than basic RNNs. This is especially useful for overcoming the vanishing gradient problem. Although an LSTM has a chain-like structure similar to an RNN, it uses multiple gates to carefully regulate the amount of information allowed into each node state.
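A representative Keras LSTM classifier over the padded sequences (vocab_size, embedding_dim, and layer widths are illustrative values, not the ones used in the study):

from tensorflow.keras import layers, models

vocab_size, embedding_dim = 10000, 64

model = models.Sequential([
    layers.Embedding(vocab_size, embedding_dim),  # word index -> dense vector
    layers.LSTM(64),                              # recurrent layer over embeddings
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

The bidirectional variant simply wraps the recurrent layer: layers.Bidirectional(layers.LSTM(64)).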

LSTM Model
accuracy: 0.5004694835680751
auc_score: 0.5
Bidirectional LSTM
accuracy: 0.5126760563380282
auc_score: 0.5127082481061942

BERT

source: http://jalammar.github.io/illustrated-bert/

Now that the model is already pre-trained and its layers tuned to handle language reasonably, we can use it for downstream tasks, performing transfer learning on it.

We ran all these models and created a comparison chart to see which one performed best for our scenario.

BERT - Distilbert
accuracy: 0.5126760563380282
auc_score: 0.5127082481061942

Comparison of Text-based Models:

Text-based models comparison

Fusion Models

We already know that a meme gets its meaning only from its text together with its image: the more comical the combination, the better the meme climbs across social media. So we cannot take just the text or just the image to classify the input. What we can do better is fuse the feature-map outputs of the two models and use the combination as a new model, on the assumption that this will bring a significant improvement in performance and results.

This can be accomplished with deep hybrid learning, a combination achieved by joining deep learning algorithms.

A hybrid deep neural network model is designed by fusing homogeneous convolutional neural network (CNN) classifiers. The ensemble of classifiers is built by contrasting the input features and varying the initialization of the neural network's weights.

So we used three different fusions:

  • Conv1D + Conv2D: Conv1D for text and Conv2D for image
  • Conv2D + LSTM: Conv2D for image and LSTM for text
  • Conv2D + Bidirectional LSTM: Conv2D for image and Bidirectional LSTM for text (sketched below)
Conv2D + Bidirectional LSTM architecture
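A minimal sketch of this Conv2D + Bidirectional LSTM fusion with the Keras functional API (all shapes and layer sizes are illustrative assumptions):

from tensorflow.keras import Input, layers, models

# image branch
img_in = Input(shape=(128, 128, 3))
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)

# text branch
txt_in = Input(shape=(25,))
y = layers.Embedding(10000, 64)(txt_in)
y = layers.Bidirectional(layers.LSTM(64))(y)

# fuse both feature vectors and classify
z = layers.concatenate([x, y])
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid")(z)

model = models.Model(inputs=[img_in, txt_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])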

We found that the Conv2D + Bidirectional LSTM fusion performed best.

comparison chart for Fusion model

Post Quantization

Once we had decided on and trained the model, we used post-training quantization to optimize the size of the model.

Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware-accelerator latency, with little degradation in model accuracy. We can quantize an already-trained float TensorFlow model when converting it to the TensorFlow Lite format.
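A sketch of that conversion with the TensorFlow Lite converter (here "model" stands for the trained Keras fusion model from the previous section, and the output file name is an assumption):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
tflite_model = converter.convert()

with open("meme_model.tflite", "wb") as f:
    f.write(tflite_model)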

This changed the file size from 203.751 MB to 16.997 MB.

Deployment of model

Now that the model is fully developed and trained, the question is how we will showcase it. This is where model deployment helps us.

Model deployment lets us show our work to the world and make better decisions with it. But deploying a model can get a little tricky at times. Before deploying, plenty of things need to be looked into, such as data storage, pre-processing, model structure, and monitoring. This can be somewhat confusing, as there are only a few tools that perform these deployment tasks easily.

Streamlit is a popular open-source framework used for model deployment by machine learning and data science teams. And the best part is that it is free of cost and written purely in Python.

Pipeline

A pipeline is a series of steps tied together in the ML cycle that typically involves obtaining the data, preparing the data, training/testing with various ML algorithms, and finally obtaining some output (as a prediction, etc.).

An ML pipeline is essentially an automated ML workflow for prediction: it starts from the preprocessing of input provided in real time, passes it through the model, and returns the prediction as output.

We deployed our app to Streamlit. The application asks for a meme upload; once a meme is uploaded, the backend predicts the output and responds to the user with the displayed image along with the confidence percentage of the prediction. Please find the video attached in reference to the concept.
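A hedged sketch of such a Streamlit front end (the predict() helper below is a hypothetical stand-in for the real preprocessing and TFLite inference step, and the app title is made up):

import streamlit as st
from PIL import Image

def predict(img: Image.Image) -> float:
    """Hypothetical placeholder for preprocessing + model inference."""
    return 0.60

st.title("Dank or Not: Meme Virality Predictor")

uploaded = st.file_uploader("Upload a meme", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Uploaded meme")
    confidence = predict(image)
    st.write(f"Predicted dank with {confidence:.1%} confidence")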

Conclusion and Future Work

Although the model is tuned and built from a combination of two models, the score is around 60%, which is low. Predicting memes is not a purely feature-based exercise: whether a meme becomes a viral sensation depends on the current trend, and images and text alone do not fully justify the predictability of meme popularity. Instead, more factors can be taken into consideration and a better-performing model can be designed to improve prediction accuracy to some further degree.

Advanced NLP-based techniques can be brought in, and with better and larger dataset availability, the model can be tuned further.
