Instagram engineer delves into emoji madness

Company engineers did painstaking work to allow users to search by emoji

A portion of Instagram's regular expression search query for finding emojis

Emojis: Kids may love their simplicity, but programmers will loathe their complexities.

Last month, the Instagram photo-sharing service started recognizing emojis in its hashtag searches, making the company the first major social networking service to offer this capability. A user could affix a sprightly emoji to a photo hashtag so the snap could be found by other users searching for that emoji. The Internet rejoiced.

Now, one of the Instagram engineers responsible for this technical feat has shared the company's approach in a blog item posted Wednesday that should be perused by any developer looking to outfit a social Internet service or consumer app with similar emoji goodness. Turns out that supporting the little digital icons is no easy task.

"Identifying characters can be difficult across programming languages. Only by parsing the standard, finding character variations and understanding language differences do they become possible to support," Instagram engineer Piyush Mangalick wrote in the new post.

While elders may bemoan emojis' putative deleterious effect on language, one thing is for sure: The youth love them. Today, almost 60 percent of user text generated on Instagram contains emojis. Among Instagram's 300 million users, emojis are now more widely used than acronyms. LOL.

First popularized in Japan during the last decade, emojis convey a wide range of subjects and emotions through the use of simple symbols and pictographs, usually fitted on a 12-by-12-pixel grid. They are often used as shorthand to eliminate the laborious typing of words on small devices. The Unicode standard for encoding the world's languages on computers adopted a set of 1,282 emojis in 2010, which paved the way for their widespread use on Apple and Android devices.

Including emojis in Instagram's hashtag index at first seemed like a simple task. With Unicode, each character -- be it a letter, symbol or emoji -- is represented by a string of hexadecimal numbers, which a programming language or operating system can translate into the appropriate character by using the Unicode guide.
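To make that idea concrete, here is a minimal Python 3 sketch, not taken from Instagram's post, that prints the hexadecimal Unicode value behind each character in a piece of text. The helper name `code_points` is made up for illustration.

```python
def code_points(text):
    """Return the Unicode code point of each character as a 'U+XXXX' label."""
    return ["U+{:04X}".format(ord(ch)) for ch in text]


# A plain letter and an emoji are both just numbers underneath.
print(code_points("A"))           # the letter A
print(code_points("\U0001F600"))  # a grinning-face emoji
```

Going the other way, Python's built-in `chr(0x1F600)` turns the number back into the character.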

Unfortunately, creating a single way to search these raw Unicode strings across different platforms was not possible, Mangalick said. Emojis were typically handled in UTF-16, an encoding of Unicode that allows the numeric strings to be of differing lengths. That made them tricky to parse, given that different programming languages used different escape sequences, or markers, to denote the numeric strings. Additionally, some emojis required two strings of numbers.
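The differing lengths are easy to see in a rough Python 3 sketch. It assumes big-endian UTF-16 and uses a hypothetical helper, `utf16_units`, to list the 16-bit units that the encoding produces for a character; it is an illustration, not code from Instagram.

```python
def utf16_units(text):
    """Return the 16-bit code units that UTF-16 uses to encode the text."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def show(text):
    """Format the code units as hexadecimal for easier reading."""
    return [hex(unit) for unit in utf16_units(text)]


print(show("A"))           # one unit for a basic letter
print(show("\U0001F600"))  # two units (a surrogate pair) for this emoji
```

A language that stores text as UTF-16 sees the emoji as two units, while one that works with whole code points sees a single character, which is one reason a single search pattern does not carry over unchanged.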

Apple muddied the waters further by offering users the ability to encode some emojis in various colors, which resulted in non-standard strings. Android also had a set of non-standard emoji encodings. For Instagram to use emojis correctly, an Android device had to recognize an iPhone emoji, and vice versa.
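As a hedged illustration of why color variants complicate matching, the Python 3 sketch below assumes the variants are built by appending one of the five Unicode skin-tone modifier characters (U+1F3FB through U+1F3FF) to a base emoji. That assumption, and the helper name `strip_skin_tone`, are not from the Instagram post.

```python
# Assumption: colour variants are a base emoji followed by one of the
# five Unicode skin-tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)}


def strip_skin_tone(text):
    """Drop skin-tone modifiers so every variant maps to the same base emoji."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)


thumbs_up = "\U0001F44D"
tinted_thumbs_up = thumbs_up + "\U0001F3FD"  # same emoji, medium skin tone

print(len(thumbs_up), len(tinted_thumbs_up))           # one code point vs. two
print(strip_skin_tone(tinted_thumbs_up) == thumbs_up)  # variants now compare equal
```

Under this assumption, a search index could normalize variants to the base emoji before comparing them, though whether Instagram did so is not stated.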

For the solution, Instagram turned to regular expressions, a dense but extremely versatile language for searching for patterns in text. Regular expressions, called regex for short, were designed for tasks such as recognizing complex sets of data strings within larger, more complex strings of data.

In the IT world, regular expression searches have justifiably gained a reputation for being fiendishly complicated. Instagram's regular expressions for finding emojis may be the most complicated yet.

The company painstakingly crafted a regex search pattern for Python 2.7, the company's preferred language for its back-end search service, that would identify all the possible emojis a user could use. The list was more than 3,600 characters long. Imagine entering that into Google without a single mistake.
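For a sense of the approach at toy scale, here is a deliberately tiny Python 3 sketch. It is not Instagram's pattern: the character class covers only a handful of well-known emoji blocks, and the names `EMOJI`, `HASHTAG` and `find_hashtags` are invented for this example.

```python
import re

# A small, incomplete sample of emoji ranges (an assumption for illustration;
# a production pattern would need far more ranges and multi-character sequences).
EMOJI = (
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\u2600-\u26FF"          # miscellaneous symbols
    "]"
)

# A hashtag is '#' followed by one or more word characters or emoji.
HASHTAG = re.compile(r"#(?:\w|" + EMOJI + r")+")


def find_hashtags(text):
    """Return every hashtag in the text, including emoji-only hashtags."""
    return HASHTAG.findall(text)


print(find_hashtags("Sunset #beach\U0001F305 #\U0001F600"))
```

Python 3 treats each emoji here as a single character. Environments that expose UTF-16 units instead would need the ranges written in a different form, so a pattern like this does not port directly.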

And that was just the regex for Python. Instagram had to identify emojis across all the platforms it supported. So company engineers had to craft separate, though equally voluminous, regex patterns for the languages of choice on Google's and Apple's platforms, Java and Objective-C.

The work paid off, however, not only in terms of the positive publicity that the emoji support generated for Instagram, but also by helping the company stay in touch with its digitally expressive user base. If emojis ever do surpass the use of text itself, as pundits fear and Instagram predicts, then Instagram is well poised for this colorful future.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
