Instagram engineer delves into emoji madness

Company engineers did painstaking work to allow users to search by emoji

A portion of Instagram's regular expression search query for finding emojis

Emojis: Kids may love their simplicity, but programmers will loathe their complexities.

Last month, the Instagram photo-sharing service started recognizing emojis in its hashtag searches, making the company the first major social networking service to offer this capability. A user could affix a sprightly emoji to a photo hashtag so the snap could be found by other users searching for that emoji. The Internet rejoiced.

Now, one of the Instagram engineers responsible for this technical feat has shared the company's approach in a blog item posted Wednesday that should be perused by any developer looking to outfit a social Internet service or consumer app with similar emoji goodness. Turns out that supporting the little digital icons is no easy task.

"Identifying characters can be difficult across programming languages. Only by parsing the standard, finding character variations and understanding language differences do they become possible to support," Instagram engineer Piyush Mangalick wrote in the new post.

While elders may bemoan emojis' putative deleterious effect on language, one thing is for sure: The youth love them. Today, almost 60 percent of user text generated on Instagram contains emojis. Among Instagram's 300 million users, emojis are now more widely used than acronyms. LOL.

First popularized in Japan during the last decade, emojis convey a wide range of subjects and emotions through the use of simple symbols and pictographs, usually fitted on a 12-by-12-pixel grid. They are often used as shorthand to eliminate the laborious typing of words on small devices. The Unicode standard for encoding the world's languages on computers adopted a set of 1,282 emojis in 2010, which paved the way for their widespread use on Apple and Android devices.

Including emojis in Instagram's hashtag index at first seemed like a simple task. With Unicode, each character -- be it a letter, symbol or emoji -- is assigned a numeric code point, usually written in hexadecimal, which a programming language or operating system can translate into the appropriate character by consulting the Unicode standard.
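To make the idea of a code point concrete, here is a minimal Python 3 sketch. The helper name `code_points` is hypothetical and chosen only for illustration; it is not taken from Instagram's post.

```python
def code_points(text):
    """Return the Unicode code points of `text` in U+XXXX notation."""
    return ["U+{:04X}".format(ord(ch)) for ch in text]


print(code_points("A"))            # ['U+0041']
print(code_points("\u2764"))       # ['U+2764'] (heavy black heart)
print(code_points("\U0001F600"))   # ['U+1F600'] (grinning face)

# chr() goes the other way, from the number back to the character.
print(chr(0x1F600) == "\U0001F600")  # True
```

Note that the grinning face needs five hexadecimal digits, which puts it beyond the range a single 16-bit value can hold.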

Unfortunately, creating a single way to search these raw Unicode strings across different platforms was not possible, Mangalick said. Emojis were commonly handled in UTF-16, an encoding of Unicode in which a character takes up either one or two 16-bit units. That made them tricky to parse, given that different programming languages used different escape sequences to write such characters. Additionally, some emojis required two numeric values rather than one.
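The variable length is easy to see by encoding characters to UTF-16 and listing the 16-bit units. The following is a sketch in Python 3, with `utf16_units` as a hypothetical helper name. A character above U+FFFF comes out as two units, known as a surrogate pair.

```python
def utf16_units(ch):
    """Return the UTF-16 code units of `ch` as four-digit hex strings."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return ["{:04X}".format(int.from_bytes(data[i:i + 2], "big"))
            for i in range(0, len(data), 2)]


print(utf16_units("\u2764"))       # ['2764']
print(utf16_units("\U0001F600"))   # ['D83D', 'DE00']
```

This is also where the languages differ in notation: Python 3 can write the grinning face as the single escape `"\U0001F600"`, while a Java string literal spells out the pair as `"\uD83D\uDE00"`.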

Apple muddied the waters further by offering users the ability to encode some emojis in various colors, which resulted in non-standard strings. Android also had a set of non-standard emoji encodings. For Instagram to use emojis correctly, an Android device had to recognize an iPhone emoji, and vice versa.
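One plausible way to let color variants match the same search term is to strip the modifier before indexing. The sketch below assumes the color variants are encoded with the Unicode skin-tone modifiers U+1F3FB through U+1F3FF; the article does not say which encoding Apple used or how Instagram handled it, so both the assumption and the `strip_skin_tone` helper are illustrative only.

```python
import re

# Assumption: color variants are a base emoji followed by one of the five
# Unicode skin-tone modifier characters, U+1F3FB through U+1F3FF.
SKIN_TONE = re.compile("[\U0001F3FB-\U0001F3FF]")


def strip_skin_tone(text):
    """Remove skin-tone modifiers so that variants share one search key."""
    return SKIN_TONE.sub("", text)


thumbs_up = "\U0001F44D"
toned_thumbs_up = thumbs_up + "\U0001F3FD"

print(len(thumbs_up), len(toned_thumbs_up))           # 1 2
print(strip_skin_tone(toned_thumbs_up) == thumbs_up)  # True
```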

For the solution, Instagram turned to regular expressions, a dense but extremely versatile language for searching for patterns in text. Regular expressions, called regex for short, were designed for tasks such as recognizing complex sets of data strings within larger, more complex strings of data.

In the IT world, regular-expression searches have justifiably gained a reputation for being fiendishly complicated. Instagram's regular expressions for finding emojis may be the most complicated yet.

The company painstakingly crafted a regex search pattern for Python 2.7, the company's preferred language for its back-end search service, that would identify all the possible emojis a user could use. The pattern was more than 3,600 characters long. Imagine entering that into Google without a single mistake.
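For a sense of what such a pattern looks like at toy scale, here is a deliberately simplified sketch in Python 3. It is not Instagram's pattern: it covers only a handful of Unicode blocks, and the names `EMOJI`, `HASHTAG` and `find_hashtags` are hypothetical.

```python
import re

# A tiny, incomplete subset of the ranges where emojis live. A production
# pattern would need many more ranges plus multi-character sequences.
EMOJI = (
    "[\U0001F300-\U0001F5FF"   # Miscellaneous Symbols and Pictographs
    "\U0001F600-\U0001F64F"    # Emoticons
    "\U0001F680-\U0001F6FF"    # Transport and Map Symbols
    "\u2600-\u26FF"            # Miscellaneous Symbols
    "\u2700-\u27BF]"           # Dingbats
)

# A hashtag is '#' followed by one or more word characters or emojis.
HASHTAG = re.compile(r"#((?:\w|" + EMOJI + ")+)")


def find_hashtags(caption):
    """Return the hashtag bodies found in `caption`, without the '#'."""
    return HASHTAG.findall(caption)


tags = find_hashtags("Beach day #sun\u2600 #\U0001F600\U0001F600 #fun")
print(tags == ["sun\u2600", "\U0001F600\U0001F600", "fun"])  # True
```

Even this toy version hints at why the full pattern grows so long: every additional block, variant and multi-character sequence adds another alternative to the expression.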

And that was just the regex for Python. Instagram had to identify emojis across all the platforms it supported. So company engineers had to craft separate, though equally voluminous, regex patterns for Java and Objective-C, the languages of choice for Google's Android and Apple's iOS, respectively.

The work paid off, however, not only in terms of the positive publicity that the emoji support generated for Instagram, but also by helping the company stay in touch with its digitally expressive user base. If emojis ever do surpass the use of text itself, as pundits fear and Instagram predicts, then Instagram is well poised for this colorful future.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
