For the Love of Data

Recently, on a sunny Easter Sunday, I took the plunge and “put a ring on it”, as all the cool kids say these days. It was a day I thought would never come, but after a whirlwind twelve months my partner and I are now engaged!

So, of course, you would think my next step would be to choose a date, pick a venue and plan a party. Oh no... first, what I want to know is: what does the data look like?

Well, I need to make sure I didn’t make a mistake, don’t I?

Now I know what you’re going to say: “Tom, you can’t measure love!” And of course I would agree (not without an EKG or a biopsy, anyway), but what I can measure is the number of times we said “I love you”... in the digital world, of course.

Harvesting this data did have its difficulties, but that’s what love is all about. As Vincent van Gogh famously said, “Love always brings difficulties, that is true, but the good side of it is that it gives energy.” That energy was needed for the mammoth task of sifting through hundreds of thousands of rows of text messages from the past twelve months.

Luckily, 99% of our digital correspondence happens on WhatsApp, which has a very useful export chat history tool. The remaining 1%, through Facebook Messenger and regular SMS texts, was small enough that I could pull it manually.

The data exported from WhatsApp gave me the date and timestamp of each message along with its sender, which was almost everything I needed. The only thing I added was a count column (a 1 on every message), so that messages could be summed by hour, day, week, month or year.
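If you fancy trying this on your own chat, here is a minimal Python sketch of turning one exported line into those fields, plus the count column. It is not the method used for this post, and it assumes a line layout of `DD/MM/YYYY, HH:MM - Sender: message`; the real export format varies by phone, operating system and locale, so the pattern will likely need adjusting. `parse_line` and `LINE_RE` are hypothetical names.

```python
import re
from datetime import datetime

# ASSUMPTION: export lines look like "DD/MM/YYYY, HH:MM - Sender: message".
# The real layout varies by phone, OS and locale, so adjust the pattern to match.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - ([^:]+): (.*)$")


def parse_line(line):
    """Split one exported line into (timestamp, sender, text, count).

    Returns None for lines that don't fit the assumed layout, such as
    system notices or the continuation lines of multi-line messages.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    date_part, time_part, sender, text = match.groups()
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M")
    # The added count column: a 1 on every message, so rows can later be
    # summed by hour, day, week, month or year.
    return timestamp, sender, text, 1


sample = "14/02/2019, 08:05 - Tom: Love you, have a good day x"
print(parse_line(sample))
```

Lines that don't match are skipped rather than guessed at, which keeps the counts honest at the cost of ignoring the tail end of multi-line messages.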

Once I had all the data I needed, all I had to do was identify every instance where the phrase “love you” was mentioned. To do this, I used the grep command in Linux, which searches text for your chosen character string (in our case “love you”) and outputs only the lines that contain it, leaving us with just the rows we are interested in.
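For anyone without a Linux terminal to hand, the same filtering step can be approximated in Python. This is a sketch, not the exact command used here: the post doesn't say which flags were passed, so it assumes a case-insensitive match, roughly what `grep -i "love you" chat.txt` does, and `chat.txt`, `grep_lines` and `grep_file` are hypothetical names.

```python
from pathlib import Path

PHRASE = "love you"


def grep_lines(lines, phrase=PHRASE):
    """Return only the lines containing `phrase`, ignoring case (like `grep -i`).

    The input is left untouched; non-matching lines are simply not returned.
    """
    needle = phrase.casefold()
    return [line for line in lines if needle in line.casefold()]


def grep_file(path, phrase=PHRASE):
    """Apply grep_lines to a text file, e.g. a hypothetical exported chat.txt."""
    text = Path(path).read_text(encoding="utf-8")
    return grep_lines(text.splitlines(), phrase)


chat = [
    "14/02/2019, 08:05 - Tom: Love you, have a good day x",
    "14/02/2019, 08:07 - Partner: What time are you home?",
    "14/02/2019, 08:09 - Partner: LOVE YOU too!",
]
for line in grep_lines(chat):
    print(line)
```

A plain substring match will also catch lines like “I love your cooking”; a stricter pattern such as `re.search(r"\blove you\b", line, re.IGNORECASE)` would leave those out.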

comparative love analysis table

Here is a look at a section of our simple dataset.

With the data clean and ready, I then used Tableau (arguably one of the world’s most popular business intelligence tools) to visualise the data, and this is what I found:

romantic love measurement graph

The line graph above shows my messages in green and my partner’s messages in pink over twelve months; each point on the X axis represents one week. Interestingly (and perhaps obviously), the trend, or seasonality, in our data reflects when we are physically together and when we are apart. For example, in July 2018 there is a large dip in “I love you” messages. That was down to being on holiday, where we could say it to each other’s faces whilst sipping margaritas, slathered in factor 50. The same dip can be seen at other highlighted events, such as Christmas, when we are physically together for a substantial period of time.

On the flip side, there are peaks too. In September 2018 I was away on business, and you can see the age-old saying “absence makes the heart grow fonder” prove itself true. We also got a bit soppy around Valentine’s Day.
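Tableau did the weekly grouping for the chart, but the idea behind it is simply a count per sender per week. Here is a minimal Python sketch, assuming the filtered rows have already been parsed into `(timestamp, sender)` pairs and using ISO weeks (Monday to Sunday); Tableau's own week definition may differ. `weekly_counts` is a hypothetical name and the rows are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def weekly_counts(messages):
    """Count messages per (sender, ISO year, ISO week).

    `messages` is an iterable of (timestamp, sender) pairs taken from the
    filtered rows; each ISO week becomes one point on the chart's X axis.
    """
    counts = Counter()
    for timestamp, sender in messages:
        iso_year, iso_week, _ = timestamp.isocalendar()
        counts[(sender, iso_year, iso_week)] += 1
    return counts


# Illustrative rows only, not the real data.
messages = [
    (datetime(2018, 9, 10, 8, 2), "Tom"),
    (datetime(2018, 9, 12, 12, 15), "Tom"),
    (datetime(2018, 9, 12, 12, 16), "Partner"),
    (datetime(2018, 9, 18, 21, 40), "Partner"),
]
for (sender, year, week), n in sorted(weekly_counts(messages).items()):
    print(f"{year}-W{week:02d} {sender}: {n}")
```

Keying on the ISO year as well as the week stops the last days of December and the first days of January from being lumped into the wrong week.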

The pattern of when we are most likely to share those magic words can also be viewed collectively by the hour:

amorousness over-24-hours bar chart

You can see that our messages peak at 8 a.m. (just after I have left for work) and 12 p.m. (my usual lunch hour). True to form, the counts are very close between us: over a whole year, we each said those three words exactly 59 times between 8 a.m. and 9 a.m.
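The hourly view works the same way, just bucketing by hour of day instead of by week. A minimal sketch, again assuming parsed `(timestamp, sender)` pairs; `hourly_counts`, `peak_hours` and the sample rows are hypothetical and only for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(messages):
    """Bucket (timestamp, sender) pairs by sender and hour of day (0-23).

    Hour 8 covers 08:00-08:59, i.e. the "8 a.m." bar.
    """
    counts = Counter()
    for timestamp, sender in messages:
        counts[(sender, timestamp.hour)] += 1
    return counts


def peak_hours(counts, sender, top=2):
    """Return the `top` busiest hours for one sender, busiest first."""
    per_hour = {hour: n for (who, hour), n in counts.items() if who == sender}
    return sorted(per_hour, key=per_hour.get, reverse=True)[:top]


# Illustrative rows only, not the real data.
rows = [
    (datetime(2019, 1, 7, 8, 5), "Tom"),
    (datetime(2019, 1, 7, 8, 40), "Tom"),
    (datetime(2019, 1, 7, 12, 10), "Tom"),
    (datetime(2019, 1, 8, 12, 30), "Tom"),
    (datetime(2019, 1, 9, 12, 50), "Tom"),
    (datetime(2019, 1, 9, 21, 0), "Tom"),
    (datetime(2019, 1, 9, 8, 15), "Partner"),
]
counts = hourly_counts(rows)
print(peak_hours(counts, "Tom"))
```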

intimate expression quantitative difference = 4

To conclude, the line chart shows a mutually reciprocated pattern: there is little deviation between her and me, which is itself a reassuring statistic. You can see just how close it is by simply counting the messages. There were only 4 in it!
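That final tally is just a count per sender and a subtraction. A minimal sketch, assuming a two-person chat and a list holding the sender of every filtered “love you” row; `count_gap` is a hypothetical name and the sender list below is invented, so its gap is not the real one.

```python
from collections import Counter


def count_gap(senders):
    """Total the filtered messages per sender and return (totals, gap).

    `senders` holds the sender name of every "love you" row. The gap is the
    larger total minus the smaller one, assuming exactly two people.
    """
    totals = Counter(senders)
    if len(totals) != 2:
        raise ValueError(f"expected exactly two senders, got {len(totals)}")
    high, low = sorted(totals.values(), reverse=True)
    return totals, high - low


# Made-up sender column for illustration only.
senders = ["Tom"] * 5 + ["Partner"] * 3
totals, gap = count_gap(senders)
print(dict(totals), "gap:", gap)
```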

So be sure to tell your partner how much you love them, because you never know who might be grepping before a wedding!