How To Build an oEmbed Integration for Your Startup, and Why It’s Necessary

by Alex Krupp, 6/13/2017

Your startup isn’t going to have the same user-growth trajectory as Facebook.

No one’s is. It doesn’t matter how good your idea or execution are, it’s just math.

When Facebook launched, there were almost a billion people with access to a computer connected to the Internet.

But there wasn’t anything connecting the people behind those computers. Nor was there much good content. The enormous network coupled with very low saturation made this one of the greatest arbitrage opportunities of all time.

To quote Jonah Peretti:

“There was no competition. There were things accidentally happening that sometimes would go viral, and then we were like one of maybe a few dozen people trying to actually make viral web culture, when nobody was doing it. So the networks were completely open in the sense that no one was even trying to make content that intentionally would go viral.

There’s sometimes moments where networks are so amenable to spread. Duncan [Watts] uses this forest-fire analogy, which is if there’s a forest where the underbrush is wet, the trees are far apart, there’s not many dead trees, you could take a flamethrower to it and it won’t burn. If the forest is dry and it’s been hot and the trees are close together, you can just drop a match and the whole thing will burn. I think there was a period between 2001 and 2003 when the dry forest was ready to burn. If you made something that was pretty funny and you made something that had certain qualities that caused people to want to share and talk and discuss, then things would spread pretty far. Now you see people do a really cool project or a cool Tumblr and they don’t end up on the Today Show.”

In this article we’ll first take a look at why having an embed has become increasingly important given the current state of the web. Then we’ll dive into how to actually implement this for your own startup.

Specifically, we’ll take a deeper look at an open protocol called oEmbed, and how to build an oEmbed integration that’s compatible with Embed.ly. The idea is that this will make it super easy for anyone to directly embed your content within Reddit threads, Medium posts, Confluence pages, etc.

As an added bonus, these embeds don’t have to be simple static pages, they can be fully interactive. To give an example, here is a quick video showing the embed we built for our own startup:

Even though this looks complex, it’s actually quite simple.

Why build an embed?

  1. Network saturation — As explained above, the best new products are no longer likely to go viral on their own. So how do you get users? In short, people won’t sign up for your website unless they see your content at least 100 times within the sites and apps they already visit. And of course just seeing your content isn’t enough; they need to get differentiated value each time, and connect the experience with your brand.

    For our startup the differentiated part is easy, because we’re (currently) the only platform for publishing email conversations. But if, for example, we had built a podcasting platform, it would require a ton of work to make sure that everyone clicking the play button was A) aware that the content was from this new platform rather than from Soundcloud, Spotify, etc., and B) had some basic understanding of the product and its benefits.

  2. Single-player value — If your site is only useful once you have 100M users, you don’t have a viable business. Even if you’re building a social network, your product needs to provide value even if there is only one person using it.

    The canonical example here is Wikipedia, which got its start by importing the entire 1910 edition of the Encyclopaedia Britannica as well as the U.S. census data for every town and city. This caused Wikipedia articles to start ranking in Google results, where some percentage of people who landed on these pages began contributing their own content.

  3. Unit economics — Although these days the phrase is mostly used when discussing on-demand startups, the fact is that unit economics are equally important for social sites and content platforms. At a fundamental level, for every minute users spend creating content on your site, there needs to be some sort of ROI in terms of pageviews, engagement, subscribes, or conversions. It doesn’t matter if we’re talking about publishing a blog post, a photo, a video, an email conversation, etc., at the end of the day the thing that matters is the ratio of time invested vs the outcomes you care about. Having an embed is a way of dramatically improving this ratio for your users.

Those who have been following the tech industry for a while probably know that building an embed has been pretty standard advice ever since YouTube attributed their viral growth to this feature after launching it in July 2005. So what exactly has changed? Back then embeds were primarily about driving site growth as a whole. Even though YouTube had hundreds of millions of pageviews, there were still only a few thousand videos uploaded per day, so pretty much every great video eventually got seen by everyone on the site.

Today, by contrast, users are primarily responsible for promoting their own content. So embedding isn’t just a feature you’ll need during the growth phase of your startup; it’s part of the core value proposition you’ll need to get your first 1,000 users.

The oEmbed protocol

The best way to make your content embeddable is to implement oEmbed, an open protocol for telling web platforms how to create an embeddable version of any piece of content.

How does this work?

The best way to explain is by example. Let’s say we have this email thread on our site about good hikes in southern Connecticut:

https://www.fwdeveryone.com/t/e8RFukWTS5Wo54fBNbZ2yQ/good-hikes-southern-ct

What we want is to take this thread and embed it within a Reddit post, like this:

https://www.reddit.com/r/Connecticut/comments/65kbok/good_hikes_for_southern_ct/

So how do we do that? To start with, we need to build a special version of this article that’s designed to live within an iFrame:

https://oembed.fwdeveryone.com/?thread-id=e8RFukWTS5Wo54fBNbZ2yQ

This iFrame is hosted on a separate subdomain of our site. All it contains is just enough HTML, CSS, and JavaScript to render an embedded thread on either desktop or mobile. As you can see from the link above, we’re passing in the thread ID as a URL parameter. Some JavaScript on the page takes this URL parameter and uses it to make a request to our API to get the text of the thread. Once that endpoint returns its data, the thread is rendered.

There are two important considerations here:

  1. Speed — The process of rendering the content needs to be as fast as possible. Large media organizations aren’t going to use your embed if it slows down their page load time. In practice, this means the time it takes your iFrame to fetch data and render content should be less than 300 milliseconds, preferably faster.

  2. Responsiveness — Your content needs to look good across a wide range of page sizes. Even if you don’t care about your normal site supporting older iOS devices, the people running the sites you want your content embedded within might not want their pages looking broken to folks still using the iPhone 5.

So now we’re done, right?

Well, not quite. We need a way to tell sites like Reddit how to actually render our iFrame.

How does this work?

Let’s start with two basic vocabulary terms:

Provider — The party providing the content that they want embedded within sites like Reddit, Medium, etc.
Consumer — Sites like Reddit, Medium, Confluence, etc., which allow their users to render the content of third-parties within their platforms.

In our case, we’re the provider.

What we need to do next is build a GET endpoint on our website that accepts one or more URL query parameters, and uses these query parameters to return a JSON response containing the information needed to render that article. The query parameters this endpoint needs to accept are as follows:

  • url — The URL of the resource that a user on a consumer platform (e.g. a Redditor) wants to embed. If the URL isn’t a valid resource, then you’re required to return a 404 NOT FOUND error. If the URL is a valid resource, but it’s not publicly accessible or the person who wants to embed it doesn’t have permission, then you’re required to return a 401 UNAUTHORIZED error.
  • maxwidth — The consumer adds this query parameter to specify the maximum width (in pixels) they are willing to accept for your iFrame. If, for example, your endpoint gets called with maxwidth=280, but the minimum width you implement for your iFrame is 400px, then you are required to return a 501 NOT IMPLEMENTED error.
  • maxheight — Same deal as the above, but for height. E.g. if your endpoint gets called with maxheight=400, but your iFrame requires a height of at least 600px, then you need to return the 501 error.
  • format — This is optional, and can be either JSON or XML. If the consumer calls your endpoint with format=xml, and you don’t implement XML, then again just return a 501 NOT IMPLEMENTED error. (In our case we only implement JSON.)

To simplify things, let’s look at the case where this endpoint is called with only the query parameter ‘url’, where the value is the URL of one of our email threads.

In our case, this means hitting the following endpoint like so:

https://api.fwdeveryone.com/oembed?url=https://www.fwdeveryone.com/t/e8RFukWTS5Wo54fBNbZ2yQ
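The example URL above passes the thread URL unencoded for readability. A consumer would normally percent-encode it before making the request. Here is a minimal sketch of building that request URL with Python’s standard library. The function name build_oembed_request_url is made up for this example and isn’t part of any library; the endpoint is the one from the example above.

```python
from urllib.parse import urlencode

OEMBED_ENDPOINT = "https://api.fwdeveryone.com/oembed"  # endpoint from the example above


def build_oembed_request_url(resource_url, maxwidth=None, maxheight=None, fmt=None):
    """Build the GET URL a consumer would call for a given resource (illustrative helper)."""
    params = {"url": resource_url}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    if maxheight is not None:
        params["maxheight"] = maxheight
    if fmt is not None:
        params["format"] = fmt
    # urlencode percent-encodes the nested URL, so its own ':' and '/' characters
    # can't be confused with the outer query string.
    return OEMBED_ENDPOINT + "?" + urlencode(params)


request_url = build_oembed_request_url(
    "https://www.fwdeveryone.com/t/e8RFukWTS5Wo54fBNbZ2yQ", maxwidth=700
)
print(request_url)
```

Only the url parameter is always sent; the others are added when the consumer has a size limit or a format preference.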

This returns the following JSON response:

{
    "version": "1.0",
    "type": "rich",

    "provider_name": "FWD:Everyone",
    "provider_url": "https://www.fwdeveryone.com",

    "author_name": "Alex Krupp",
    "author_url": "https://www.fwdeveryone.com/u/alex3917",

    "html": "<iframe src=\"https://oembed.fwdeveryone.com?thread-id=e8RFukWTS5Wo54fBNbZ2yQ\" width=\"700\" height=\"825\" scrolling=\"yes\" frameborder=\"0\" allowfullscreen></iframe>",
    "width": 700,
    "height": 825,

    "thumbnail_url": "https://ddc2txxlo9fx3.cloudfront.net/static/fwd_media_preview.png",
    "thumbnail_width": 280,
    "thumbnail_height": 175,

    "referrer": "",
    "cache_age": 3600
}

Let’s quickly walk through each response parameter and explain what it means.

  • version is the version of the oEmbed protocol we’re implementing. Basically you’re just required to add "version": "1.0" to the JSON response.
  • type refers to the ‘type’ of embed we’re implementing. Each embed type has a few JSON response parameters you’re required to implement. For example, if the thing you want to make embeddable is a photo, then your JSON response is required to contain the URL of the photo, as well as its width and height. The options for embed type are:
      ○ photo — For if the thing we want to make embeddable is a photo.
      ○ video — For if the thing we want to make embeddable is a video.
      ○ link — For if we just want this endpoint to return information about the URL, but we don’t actually have an embeddable iFrame. (This could be useful for things like providing extra information about a link when a user hovers over it, and/or showing a preview image for the URL.)
      ○ rich — For when we have an embeddable iFrame, but the content within that iFrame isn’t a single photo or a single video. (Most of the time this is the option you want.)
  • provider_name — The name of your website
  • provider_url — The URL of your website
  • author_name — The name of the author of the specific piece of content we want to make embeddable. In our case, the name of the person who uploaded the email thread.
  • author_url — A link to the profile page of the person (or organization) above.
  • html — This response parameter is required for the ‘rich’ oEmbed type. Basically, this should be an iFrame HTML element with a src attribute containing the URL that renders the resource you want to embed, as well as width and height attributes. In our case, we also add the attributes scrolling, frameborder, and allowfullscreen to specify how we want the embed to look visually. Some sites that consume oEmbed will respect these attributes; others will strip them out or change them to match the rest of the page stylistically. If your embed requires scrolling (like ours) but one of your consumers is stripping out that attribute, fixing this usually just requires sending them an email explaining the situation.
  • width — The width the iFrame should be. If the consumer called this endpoint with the maxwidth query parameter, then the width here needs to be no greater than the maxwidth.
  • height — The height the iFrame should be. Same deal as above, but for maxheight.
  • thumbnail_url — The URL of an image you want to use as a thumbnail for this embed. In our case, we just use the logo for our website. But for threads with image attachments, we could instead use one of these images as the thumbnail. Having a thumbnail image is optional, but Reddit won’t render your embed unless you include one.
  • thumbnail_width — The width of the thumbnail image. (Consumers won’t necessarily render the thumbnail using the dimensions you specify.)
  • thumbnail_height — The height of the thumbnail image. (Same deal as above.)
  • referrer — The consumer of your content has the option to pass in a referrer string as a query parameter. If they do so, you should return this string in your JSON response.
  • cache_age — How long the consumer should cache the response for this endpoint.
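To make the per-type requirements concrete, here is a small sketch of a check you could run against your own endpoint’s output before submitting it anywhere. The required fields follow the oEmbed spec (photo needs url, width, and height; video and rich need html, width, and height; link needs nothing extra). The function name missing_oembed_fields is made up for this example.

```python
# Fields the oEmbed spec requires for each embed type, beyond "type" and "version".
REQUIRED_BY_TYPE = {
    "photo": {"url", "width", "height"},
    "video": {"html", "width", "height"},
    "rich": {"html", "width", "height"},
    "link": set(),
}


def missing_oembed_fields(response):
    """Return a sorted list of required fields missing from an oEmbed response dict."""
    present = set(response)
    missing = {"type", "version"} - present
    embed_type = response.get("type")
    if embed_type in REQUIRED_BY_TYPE:
        missing |= REQUIRED_BY_TYPE[embed_type] - present
    return sorted(missing)


incomplete = {"version": "1.0", "type": "rich"}
print(missing_oembed_fields(incomplete))  # ['height', 'html', 'width']
```

This only checks that the fields are present. It doesn’t validate their values, and it doesn’t cover consumer-specific rules such as Reddit’s thumbnail requirement.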

In our case we implemented this with Django Rest Framework, so here’s more-or-less what this looks like:

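from django.core.exceptions import ObjectDoesNotExist
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

# utils, thread_service, and InvalidPermissionError are helpers from our own
# codebase; their imports are omitted here.


def parse_pixels(raw_value):
    # Query parameters arrive as strings. Return None when the parameter is
    # absent or isn't a whole number, so it gets treated as "no limit given".
    try:
        return int(raw_value)
    except (TypeError, ValueError):
        return None


class OEmbed(APIView):
    def get(self, request):
        # The second param passed to the .get() method is the default value, which
        # is returned if the specified key (first param) isn't found in the dictionary.
        url = request.query_params.get('url', '')
        max_width = parse_pixels(request.query_params.get('maxwidth'))
        max_height = parse_pixels(request.query_params.get('maxheight'))
        resp_format = request.query_params.get('format', '')
        referrer = request.query_params.get('referrer', '')

        # We only implement JSON.
        if resp_format and resp_format != 'json':
            return Response(data={}, status=status.HTTP_501_NOT_IMPLEMENTED)

        # Only enforce our minimum sizes when the consumer actually sent a limit.
        if max_width is not None and max_width < 280:
            return Response(data={}, status=status.HTTP_501_NOT_IMPLEMENTED)

        if max_height is not None and max_height < 825:
            return Response(data={}, status=status.HTTP_501_NOT_IMPLEMENTED)

        try:
            thread_id = utils.get_thread_id_from_url(url)
        except ObjectDoesNotExist:
            return Response(data={}, status=status.HTTP_404_NOT_FOUND)

        try:
            thread = thread_service.get_thread_from_thread_id(request, thread_id)
        except InvalidPermissionError:
            return Response(data={}, status=status.HTTP_401_UNAUTHORIZED)

        # Shrink to the consumer's maxwidth if it's below our default of 700px.
        width = max_width if (max_width is not None and max_width <= 700) else 700
        height = 825

        resp = thread_service.build_oembed_response(thread, width, height, referrer)
        return Response(data=resp, status=status.HTTP_200_OK)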

Note that for readability I’m omitting the query parameter sanitization, e.g. stripping any XSS from the referrer string.
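The view above delegates to thread_service.build_oembed_response, which isn’t shown. As a rough idea of what such a helper might look like, here is a standalone sketch that assembles the same JSON structure as the example response earlier. The Thread class and its field names are stand-ins invented for this example, not our actual model.

```python
import json
from dataclasses import dataclass
from html import escape
from urllib.parse import quote, urlencode


@dataclass
class Thread:
    # Hypothetical stand-in for the real thread model.
    thread_id: str
    author_name: str
    author_username: str


def build_oembed_response(thread, width, height, referrer=""):
    """Assemble the oEmbed payload for a thread as a plain dict (illustrative sketch)."""
    src = "https://oembed.fwdeveryone.com/?" + urlencode({"thread-id": thread.thread_id})
    # Escape the URL for use inside an HTML attribute, and force the sizes to integers.
    iframe = (
        f'<iframe src="{escape(src, quote=True)}" width="{int(width)}" '
        f'height="{int(height)}" scrolling="yes" frameborder="0" '
        "allowfullscreen></iframe>"
    )
    return {
        "version": "1.0",
        "type": "rich",
        "provider_name": "FWD:Everyone",
        "provider_url": "https://www.fwdeveryone.com",
        "author_name": thread.author_name,
        "author_url": "https://www.fwdeveryone.com/u/" + quote(thread.author_username),
        "html": iframe,
        "width": int(width),
        "height": int(height),
        "thumbnail_url": "https://ddc2txxlo9fx3.cloudfront.net/static/fwd_media_preview.png",
        "thumbnail_width": 280,
        "thumbnail_height": 175,
        "referrer": referrer,  # assumed to be sanitized by the caller
        "cache_age": 3600,
    }


thread = Thread("e8RFukWTS5Wo54fBNbZ2yQ", "Alex Krupp", "alex3917")
resp = build_oembed_response(thread, width=700, height=825)
print(json.dumps(resp, indent=4))
```

Building the response as a plain dict and letting the framework serialize it avoids hand-escaping the quotes inside the html string.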

Embed.ly

So are we done yet?

In theory, yes, but in practice, not quite. The deal is that most mainstream oEmbed consumers use something called Embed.ly (YC W10), which makes it easier for consumers to implement the spec.

The best way to explain the value that Embed.ly provides is to start by looking at what the embedding process looks like with the Embed.ly integration enabled:

  1. A user on an oEmbed consumer platform performs an action that would trigger the creation of an embed within that platform. For example, a Redditor submits the URL of an article on your site to Reddit, or a Medium author pastes the URL into a new blog post they’re writing.
  2. The oEmbed consumer platform checks to see if this URL is in their cached list of Embed.ly providers. If so, the site makes a request to Embed.ly with the URL.
  3. Embed.ly makes the request to the oEmbed endpoint on your website with the URL being requested, and gets the JSON response with all the information needed to render the embed.
  4. Embed.ly returns this JSON response to the oEmbed consumer, and then the consumer site uses this response to create the iFrame.

The main benefits that Embed.ly provides are:

  • A universal API to interact with each site that implements oEmbed, instead of the consumer platform needing to create an integration for each provider.
  • Testing each provider’s endpoint before whitelisting their site to ensure they are implementing the spec correctly.
  • Caching the JSON responses from the providers.
  • Providing a standardized way for iFrames to resize their own height after being embedded via the window.postMessage API. E.g. if your embed contains text, then you may want the height to increase as the width decreases.
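To illustrate what the caching point means for the cache_age value you return, here is a toy in-memory cache that re-fetches a response only after its cache_age has elapsed. This is not how Embed.ly is implemented (we have no visibility into that); the class name OEmbedCache and the injected fetch function are invented for this sketch.

```python
import time


class OEmbedCache:
    """Tiny in-memory cache that expires each entry after its cache_age (in seconds)."""

    def __init__(self, fetch, clock=time.monotonic, default_age=3600):
        self._fetch = fetch              # callable: url -> oEmbed response dict
        self._clock = clock              # injectable so expiry can be tested without sleeping
        self._default_age = default_age  # used when a response has no cache_age
        self._entries = {}               # url -> (expires_at, response)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]
        response = self._fetch(url)
        max_age = int(response.get("cache_age", self._default_age))
        self._entries[url] = (now + max_age, response)
        return response


# Usage with a fake provider and a fake clock.
calls = []
now = [0.0]


def fake_fetch(url):
    calls.append(url)
    return {"version": "1.0", "type": "link", "cache_age": 60}


cache = OEmbedCache(fake_fetch, clock=lambda: now[0])
cache.get("https://example.com/a")  # fetched from the provider
cache.get("https://example.com/a")  # served from the cache
now[0] = 61.0
cache.get("https://example.com/a")  # expired, so fetched again
print(len(calls))  # 2
```

The practical takeaway is that a long cache_age means edits to your content may take a while to show up in embeds, while a short one means more traffic to your endpoint.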

The full requirements for becoming an Embed.ly provider are listed here: http://embed.ly/providers/new

Once you’ve verified that your embed is working correctly and meets all of Embed.ly’s requirements, just submit it via the above link. If all goes well it should be approved within a few days.

A couple miscellaneous tips:

  • If your iFrame requires scrolling, you need to ask Embed.ly to enable this.
  • Make sure your content is production-ready: gzipped, minified, static assets served from a CDN, etc.
  • Ensure your iFrame has the correct canonical URL.

So now are we done?

Maybe. But there are a few quirks specific to each oEmbed consumer platform that are important to know about.

Getting whitelisted for Medium

In order to integrate with Medium, getting your Embed.ly integration enabled is the first step, but then you need to get whitelisted by Medium. What they’re looking for is as follows:

  • Has a Do Not Track policy that meets Medium’s standards.
  • Has branding that makes it clear the embed is a third-party service and not affiliated with Medium.
  • Performance (doesn’t slow down the page and renders well on all platforms.)
  • Security (Has a CSP on the API and the iFrame, and is served only via HTTPS.)

If integrating with Medium is your main reason for building the embed, you can try getting pre-approved by sending over some PDFs with design mockups. (But I don’t represent Medium and make no guarantees.)

Requirements for Reddit

Reddit doesn’t require any sort of whitelisting, so your content will be embeddable as soon as your Embed.ly integration goes live. There are a couple things to note though:

  • Your oEmbed JSON response must return a thumbnail image in order for the embed to render, even though this is optional in the spec.
  • Reddit does not implement native height resizing. So if your site is such that different pieces of content naturally have different heights, you either need to prerender your content to figure out the height, or else just choose a fixed height and make your content scrollable.
  • Embedding will only work on subreddits that have media previews enabled, unless the user selects the “auto-expand media previews” option on their settings page. (For moderators, each subreddit has a settings page with an “expand media previews on comments pages” option.)

Slack unfurling

If your VCs told you that you need to integrate with Slack, then the thing you want to check out is their blog post “Everything you ever wanted to know about unfurling.”

Slack doesn’t use Embed.ly. Instead you need to implement the Discovery section of the oEmbed spec.
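For reference, the Discovery section of the spec has providers advertise their endpoint with a link element in the head of each embeddable page, using rel="alternate" and a type of application/json+oembed (or text/xml+oembed for XML). Here is a minimal sketch of generating that tag, assuming the API endpoint from earlier in this article. The helper name oembed_discovery_tag is made up for this example.

```python
from html import escape
from urllib.parse import urlencode


def oembed_discovery_tag(endpoint, page_url, title=""):
    """Return the <link> element that advertises a page's JSON oEmbed endpoint."""
    href = endpoint + "?" + urlencode({"url": page_url, "format": "json"})
    attrs = {
        "rel": "alternate",
        "type": "application/json+oembed",
        "href": href,
    }
    if title:
        attrs["title"] = title
    # Escape every value for use inside a double-quoted HTML attribute.
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    return f"<link {rendered}>"


tag = oembed_discovery_tag(
    "https://api.fwdeveryone.com/oembed",
    "https://www.fwdeveryone.com/t/e8RFukWTS5Wo54fBNbZ2yQ",
    title="Good hikes in southern CT",
)
print(tag)
```

The tag goes on the normal content page (the one users paste into Slack), not on the iFrame page, so that a consumer fetching the page can find the endpoint without knowing about your site in advance.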

Author

  • Alex Krupp

    Alex is cofounder of FWD:Everyone. Before that he cofounded LaunchHear (YC W10).