Tensor Discovery: The Ultimate Social Media Discovery Tool for Influencer Marketing

Read our gray paper on how Tensor Discovery works.

March 23, 2022

At Tensor Social we have a lot of data: petabytes of raw data. But it is not enough just to capture the data; we have to turn it into an actionable set of insights to get its full value. That’s why we created Tensor Discovery. We consider Tensor Discovery to be one of those revolutionary technologies that has the potential to shape what the whole influencer marketing industry will look like in the near future. We believe that one company, no matter how big and powerful, can’t change the whole industry on its own. That’s why we decided to fully disclose our know-how and make Tensor Discovery available for everyone to try for free.

So, what is Tensor Discovery? It helps identify influencers based on the content they post. For example, try searching for “highheels.” Tensor Discovery currently works for 1,300,000+ topics/keywords. Search results are sorted by each influencer’s relevance score for the topic, unlike the technology available before Tensor Discovery, which only searched for a mention of a keyword in a profile bio or in a post. Can you imagine Google search results without a relevance score? For example, for the keyword “influencer marketing” Google finds “About 1,420,000 results (0.66 seconds)”. How useful would it be to look through all 1,420,000 websites? Would you easily find anything useful? That was the state of influencer identification and discovery before Tensor Social offered Tensor Discovery technology.

The influencer identification problem

Advertisers who use influencers for promotion have a big problem: identification or, in other words, targeting. Advertisers who use AdWords can use keywords to focus on a narrow segment of the audience most relevant to the goods or services that they wish to promote. On social media, no such possibility exists: there are tens of millions of potential influencers on Instagram to choose from. How can an advertiser choose the ones that are most suitable thematically, so that the sponsored post will look authentic and not out of place? The advertiser could be an active Instagram user with a priori knowledge of which influencers are a good fit (or at least know some of them). Or the advertising can simply be placed at random in a scattershot manner, in the hope that it will reach at least some small percentage of the target audience. Neither of these approaches is very good, so it makes sense that influencer marketing on Instagram is not highly effective for niche, small or mid-sized advertisers. Only the big brands currently make much use of influencer marketing: brands for whom wide reach is more important than exact targeting, because they use influencer marketing as a tool for branding.

At Tensor Social we decided to try and solve this problem, and thereby to make influencer marketing on Instagram easy and accessible for everyone — not just the big players. We think we’ve done it, and now we’re going to explain how.

The simplest and most obvious way to separate bloggers into topic segments is by using a topic index. We create a tree of topics, we decide where each blogger fits in, and we’re done. This is what the majority of advertising systems do, including such systems as Facebook Ads. The fact is that this is not an effective solution. The number of different topics that are potentially of interest to advertisers is extremely large. We would not be mistaken in saying it is potentially infinite: the narrower the advertiser’s niche, the finer the subdivision of topics that is required. For instance, “Food” is a topic. It is self-evident that a restaurant needs not just “Food”, but “Food > Restaurants”. If it is a Thai restaurant, it needs an even narrower topic: “Food > Restaurants > Thai cuisine”. And if the restaurant is in London, it wouldn’t be a bad idea to add it to “Food > Restaurants > Thai cuisine > London”, and so on.

The topic tree cannot grow infinitely:

  • Firstly, it would just become too difficult for the advertiser to work with it. The optimal size of an index that an ordinary person can examine and keep track of is no more than ~100 entries.
  • Secondly, the more topics there are, the harder it is to associate an influencer with a specific topic — which means the volume of work involved increases. Doing this work manually (remember, there are 43 million bloggers) is not possible. We can make use of machine learning — but even then, we still need to start by building a training dataset. This dataset is assembled by people; people make mistakes; people are subjective in their opinions and assessments (different people will place the same blogger under different topics). The problem becomes even more complex if we bear in mind that many bloggers publish posts related simultaneously to several topics.
  • Thirdly, we would have to spend a lot of effort on keeping the topic index up to date, because new trends are constantly appearing, new topics are being born and old ones are dying. We would also have to constantly update the training dataset and retrain the model.

Given all of these factors, we realized that a fixed set of topics could not provide the necessary level of ease and quality; so we started to look into other ways of solving the problem.

What if the topic could be expressed using a set of keywords, as in AdWords? Then advertisers would be able to choose any topic they wanted, no matter how narrow it is, and there would be no need for them to crawl through an enormous topic index. But in AdWords, keywords correspond to what potential customers are explicitly looking for: so what should the keywords for Instagram correspond to? Hashtags? It isn’t that simple.

  • First of all, bloggers who write about cars certainly don’t always have to use the hashtag #car: they could use #auto, #fastcars, #wheels, #drive or brand names like #bmw or #audi, etc. In principle, the same problem exists in AdWords, and advertisers have to work hard to capture all the various possible keywords relevant to their topic area.
  • Secondly, a blogger can just incidentally use a hashtag, for example #car, if they bought a new car, or just saw and photographed an interesting car on the street. This does not necessarily mean that they write on automotive topics.
  • Thirdly, bloggers often use popular tags that have nothing to do with the topic of their post, in order to be included in the search results for the given tag (hashtag spam). For example, the hashtag #car might be tagged to a photo of a bearded hipster, a landscape at sunset or a selfie in a new dress surrounded by friends.

We have been able to solve these problems by creating our own machine learning models, which have processed a huge amount of data about Instagram users’ posts and have learnt how to accurately determine the topic of a post and how closely it corresponds to the specified keywords.

How Tensor Discovery is built

Topic models and topic spaces

In the field of machine learning, there is the term “topic models” (see Wikipedia for the details). Let’s try to explain what it’s about in a simplified manner. Imagine a very primitive social network where people have only two basic interests: Food and Japan. If we express the “strength” of an interest by a real number between 0 and 1, then any hashtags that bloggers use can be placed on a two-dimensional diagram.

Any hashtag can evidently be described by a pair of numbers, corresponding to X and Y coordinates within the topic space. Tags referring to the same topic are grouped into clusters, i.e. they have similar coordinates. Using this chart, we can see how relevant a post is to our topics by calculating its approximate position in the topic space (using averaged coordinates of all tags included in the post), and then estimating the distance from the post to particular thematic clusters.

The closer the post is to a thematic cluster, the higher its relevance score on that topic.
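The two-interest example can be sketched in a few lines of Python. This is only an illustration: the coordinates, cluster names and the `relevance` formula (1 / (1 + distance)) are assumptions made for the sketch, not values or formulas taken from Tensor Discovery.

```python
import math

# Hypothetical (food, japan) coordinates, each between 0 and 1.
TAG_COORDS = {
    "#pizza": (0.9, 0.1),
    "#burger": (0.8, 0.0),
    "#sushi": (0.9, 0.9),
    "#ramen": (0.8, 0.8),
    "#tokyo": (0.1, 0.9),
    "#kyoto": (0.0, 0.8),
}

# Hypothetical thematic clusters, each listed by its member tags.
CLUSTERS = {
    "food": ["#pizza", "#burger"],
    "japanese food": ["#sushi", "#ramen"],
    "japan": ["#tokyo", "#kyoto"],
}


def centroid(tags):
    """Average the coordinates of the given tags."""
    points = [TAG_COORDS[tag] for tag in tags]
    count = len(points)
    return (sum(p[0] for p in points) / count, sum(p[1] for p in points) / count)


def relevance(post_tags, cluster_tags):
    """Assumed scoring rule: 1 at the cluster centre, lower as distance grows."""
    distance = math.dist(centroid(post_tags), centroid(cluster_tags))
    return 1.0 / (1.0 + distance)


post = ["#sushi", "#tokyo"]
scores = {name: relevance(post, tags) for name, tags in CLUSTERS.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

With these made-up coordinates, a post tagged #sushi and #tokyo lands closest to the "japanese food" cluster, even though it shares only one tag with it.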

High-dimensional spaces

Of course, in the real world there are many more topics, and it is impossible to fit them all into the two-dimensional space of our diagram. So, for real modeling, we use not just two axes but a hundred or more, and the mathematics we need is more complicated too, relying on vectors and tensors in high-dimensional spaces. High-dimensional spaces have a surprisingly high capacity. Imagine we’re working with a very crude model that distinguishes only two values on each axis: “near” and “far”. It thus divides each axis into two halves. On a two-dimensional graph, this model lets us define four different topic areas. If we use three-dimensional space instead, i.e. we cut a cube in half along each of its three axes, we obtain eight smaller cubes: eight possible topics.

In a four-dimensional space we obtain 2⁴ = 16 topics, and in a 100-dimensional space the number is 2¹⁰⁰, or about 10³⁰ topics. That is considerably more than the quantity of information the human brain can store (about 2.5 × 10¹⁵ bytes), and more than the whole internet (about 10²⁴ bytes in 2014).

As we see, even a crude topic model working in a 100-dimensional space can include more topics than could possibly exist in reality.
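The arithmetic behind this claim is easy to check. The helper below (the name `topic_cells` is ours) simply counts the cells you get when every axis is split into "near" and "far":

```python
def topic_cells(dimensions, bins_per_axis=2):
    """Number of distinct regions when each axis is split into equal bins."""
    return bins_per_axis ** dimensions


for dims in (2, 3, 4, 100):
    print(dims, topic_cells(dims))
```

For 100 dimensions this prints 1267650600228229401496703205376, i.e. roughly 1.27 × 10³⁰.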

Model learning

How can hashtags’ coordinates be determined, in a high-dimensional space that we cannot even see? This is where complicated machine learning algorithms come into their own. At first, tags are distributed randomly in space, then algorithms begin to analyze information regarding which tags are found in posts together and which are not. Coordinates of hashtags that occur together are gradually drawn towards each other, creating clusters. Tags that do not occur together, on the other hand, are pushed apart and gradually drift into opposite corners of the “universe of hashtags”.

Allowing the algorithms to witness a sufficient quantity of “tags occur together/do not occur together” situations and to learn from these situations involved processing an enormous quantity of information. The model that is currently used at Deep Social has been trained on more than a billion posts made by hundreds of millions of Instagram users. One can say that we have managed to take on the collective wisdom of all these people, expressed in their deliberate choice of hashtags for their posts, and transferred that knowledge into our model.
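To make the "draw together / push apart" idea concrete, here is a deliberately tiny sketch. It is not the production model and does not claim to be the algorithm Tensor Discovery uses: the posts, the two-dimensional space, the `rate` and `margin` parameters and the update rule are all assumptions chosen to keep the example readable.

```python
import itertools
import math
import random

# Hypothetical posts, each given as its list of hashtags.
POSTS = [
    ["#bmw", "#audi", "#car"],
    ["#car", "#audi"],
    ["#bmw", "#car"],
    ["#sushi", "#ramen", "#food"],
    ["#food", "#sushi"],
    ["#ramen", "#food"],
]


def train(posts, dims=2, epochs=200, rate=0.05, margin=1.0, seed=0):
    """Toy attract/repel training: co-occurring tags move together, others apart."""
    rng = random.Random(seed)
    tags = sorted({tag for post in posts for tag in post})
    # Start from random positions.
    coords = {tag: [rng.uniform(-1.0, 1.0) for _ in range(dims)] for tag in tags}
    # Every pair of tags that appears together in at least one post.
    together = {
        frozenset(pair)
        for post in posts
        for pair in itertools.combinations(sorted(set(post)), 2)
    }
    for _ in range(epochs):
        for a, b in itertools.combinations(tags, 2):
            diff = [x - y for x, y in zip(coords[a], coords[b])]
            dist = math.hypot(*diff) or 1e-9
            if frozenset((a, b)) in together:
                step = rate  # attraction: shrink the gap by a fixed fraction
            elif dist < margin:
                step = -rate * (margin - dist) / dist  # repulsion up to the margin
            else:
                continue  # already far enough apart
            for i in range(dims):
                coords[a][i] -= step * diff[i]
                coords[b][i] += step * diff[i]
    return coords


coords = train(POSTS)
within = math.dist(coords["#bmw"], coords["#audi"])
across = math.dist(coords["#bmw"], coords["#sushi"])
print(f"within cluster: {within:.4f}, across clusters: {across:.4f}")
```

After training, the car tags sit almost on top of each other while the car and food tags end up roughly a margin apart, which is the clustering effect described above in miniature.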

An approximate visualization of how the topics are located in space after the learning process can be seen below.

This can help you to take your own journey through our “universe of topics” — finding topic clusters, studying their content and the content of neighbouring clusters, etc.

How does Tensor Discovery work?

The use of Tensor Discovery technology to select bloggers can be summarized as follows:

  • You choose one hashtag, or several, defining the topic you need.
  • Coordinates in topic space are calculated based on these hashtags. A cloud of relevant tags (i.e. tags that are close to the coordinates we have obtained) helps you understand what these coordinates mean.
  • The system searches for bloggers whose posts are on topics as close as possible to the topic we want. Everything happens in a similar way to building a tag cloud, but instead of tags, a “cloud” of topic-relevant bloggers is calculated. In this case, the blogger does not necessarily have to use one of the tags specified (although if the tags match, the relevance of the blogger will certainly be higher). It will be enough if a blogger uses tags that are just close to the given topic. This feature distinguishes Tensor Discovery from a traditional keyword search. Advertisers do not have to guess which exact hashtags bloggers are using (and the number of “synonyms” for tags can be very high, as shown in the examples below). They only need to provide a general topic, and Tensor Discovery takes care of the rest.
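The three steps above can be sketched as follows. Everything here is hypothetical: the three-dimensional vectors, the blogger names and the use of averaged vectors with cosine similarity are assumptions for illustration, since the text does not specify the exact similarity measure.

```python
import math

# Hypothetical tag coordinates; a real model would use 100 or more axes.
TAG_VECTORS = {
    "#bmw": (0.9, 0.1, 0.0),
    "#audi": (0.8, 0.2, 0.0),
    "#porsche": (0.85, 0.15, 0.05),
    "#sushi": (0.0, 0.9, 0.1),
    "#ramen": (0.1, 0.8, 0.1),
    "#sunset": (0.0, 0.1, 0.9),
}

# Hypothetical bloggers, each summarized by the tags they use.
BLOGGER_TAGS = {
    "car_fan": ["#porsche", "#audi"],
    "foodie": ["#sushi", "#ramen"],
    "traveller": ["#sunset", "#ramen"],
}


def mean_vector(tags):
    """Step 2: turn a set of tags into one point in topic space."""
    vectors = [TAG_VECTORS[tag] for tag in tags]
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def tag_cloud(query_tags, size=3):
    """Tags closest to the query point."""
    query = mean_vector(query_tags)
    ordered = sorted(TAG_VECTORS, key=lambda t: cosine(query, TAG_VECTORS[t]), reverse=True)
    return ordered[:size]


def rank_bloggers(query_tags):
    """Step 3: bloggers ordered by how close their tags are to the query point."""
    query = mean_vector(query_tags)
    scored = {name: cosine(query, mean_vector(tags)) for name, tags in BLOGGER_TAGS.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


cloud = tag_cloud(["#bmw"])
ranking = rank_bloggers(["#bmw"])
print(cloud)
print(ranking)
```

Note that "car_fan" comes out on top for the query #bmw without ever using that tag, which is the behaviour that separates this approach from a plain keyword search.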

Examples of Topic Formation

Let’s start with something very simple. We’ll define the topic using one hashtag, let’s say the auto brand “BMW”.

The most relevant tags are in yellow, while the least relevant are in purple. The intermediate colors (green, blue) correspond to average relevance. The size of each hashtag is proportional to its popularity. As you can see, Tensor Discovery has coped beautifully with forming the “BMW” topic and has found a vast number of relevant hashtags that most of us would never have thought existed. Let’s make the task more complicated by forming a topic around several German auto brands: bmw, audi, mercedes, vw.

This example shows Tensor Discovery’s capacity for generalization.

  • Tensor Discovery has understood that we mean cars in general (hashtags #car, #cars)
  • It has also realized that the topic needs to give a preference to German cars (tags circled in red)
  • It has added “missing” hashtags to our topic on its own initiative: #porsche (another German auto brand) and alternative variants #mercedesbenz, #benz and #volkswagen

Now we’ll make the task even more complicated: let’s create a topic based on an ambiguous tag. We’ll use #apple, which could mean the brand but could also just mean the fruit.

The brand topic predominates, as you see, but the fruit topic is also present, represented by the hashtags #fruit, #apples, and #pear. Let’s try to refine the results towards a purely “fruit” topic. We can do that by adding a few hashtags that refer to Apple the brand, with negative weights. The set of hashtags will look like this: #apple, #iphone:-1, #macbook:-0.05, #macintosh:-0.005
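One way to picture how negative weights steer a topic is to treat the query as a weighted sum of tag vectors. The sketch below parses the `#tag:weight` notation used above; the two-dimensional vectors and the weighted-sum rule are assumptions made for illustration, not a description of the actual implementation.

```python
import math
import re

# Hypothetical (tech brand, fruit) coordinates.
TAG_VECTORS = {
    "#apple": (0.7, 0.5),
    "#iphone": (1.0, 0.0),
    "#macbook": (0.9, 0.0),
    "#macintosh": (0.8, 0.0),
    "#fruit": (0.0, 1.0),
    "#apples": (0.1, 0.9),
    "#pear": (0.0, 0.8),
}


def parse_query(text):
    """Parse '#tag' or '#tag:weight' items separated by commas or spaces."""
    query = {}
    for item in re.split(r"[,\s]+", text.strip()):
        if not item:
            continue
        tag, _, weight = item.partition(":")
        query[tag] = float(weight) if weight else 1.0  # default weight is 1
    return query


def query_vector(query):
    """Assumed combination rule: weighted sum of the tag vectors."""
    dims = len(next(iter(TAG_VECTORS.values())))
    return tuple(
        sum(weight * TAG_VECTORS[tag][i] for tag, weight in query.items())
        for i in range(dims)
    )


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def closest_tags(text, size):
    point = query_vector(parse_query(text))
    ordered = sorted(TAG_VECTORS, key=lambda t: cosine(point, TAG_VECTORS[t]), reverse=True)
    return ordered[:size]


plain = closest_tags("#apple", size=4)
refined = closest_tags("#apple, #iphone:-1, #macbook:-0.05, #macintosh:-0.005", size=3)
print(plain)
print(refined)
```

With these made-up vectors, the plain #apple query is surrounded by brand tags, while the weighted query moves the point away from the brand axis and the nearest tags become #fruit, #pear and #apples.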

As you see, we now have a pure fruit-based topic, which could be used to identify food bloggers who are fans of apples and apple dishes. You can see some live examples of how to form a topic and select relevant bloggers on our website.

Tensor Discovery is now available with the keyword cloud and a search priority adjustment filter. You can adjust the search priority towards either relevance or account size, and the keyword cloud shows which keywords are used for your search. Play with the search priority filter and see how the keyword cloud works at https://app.tensorsocial.com/
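How the search priority filter blends relevance and account size is not described here, so the following is purely a guess at one plausible scheme: a linear mix of the relevance score and a log-scaled follower count. The blogger records, the `priority` parameter and the `size_score` formula are all hypothetical.

```python
import math

# Hypothetical search results with made-up relevance scores and follower counts.
BLOGGERS = [
    {"name": "niche_baker", "relevance": 0.95, "followers": 8_000},
    {"name": "food_star", "relevance": 0.70, "followers": 2_000_000},
    {"name": "celebrity", "relevance": 0.20, "followers": 50_000_000},
]


def size_score(followers, max_followers):
    """Log-scaled account size, equal to 1 for the largest account."""
    return math.log10(followers) / math.log10(max_followers)


def rank(bloggers, priority):
    """priority=0 sorts purely by relevance, priority=1 purely by account size."""
    biggest = max(b["followers"] for b in bloggers)

    def blended(blogger):
        size = size_score(blogger["followers"], biggest)
        return (1 - priority) * blogger["relevance"] + priority * size

    return [b["name"] for b in sorted(bloggers, key=blended, reverse=True)]


print(rank(BLOGGERS, 0.0))
print(rank(BLOGGERS, 0.5))
print(rank(BLOGGERS, 1.0))
```

Under these assumptions, moving the priority from 0 to 1 shifts the top result from the small, highly relevant account to the largest one, with the mid-sized account winning in between.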

Disclaimer: All of our data consists of proprietary metrics based on estimates and is not necessarily reflective of actual data. This article is based on data obtained before March 2020. We provide estimated audience data only for public accounts registered on the social network before March 2020.