IGF workshop report on IDNs and local languages
This is a meeting summary of a workshop held at the 2011 IGF in Nairobi, run by .Nxt
Workshop Report, Internet Governance Forum 2011, Nairobi
Workshop #69: IDNs and New gTLDs: Why Local Languages are the Answer to a Truly Global Internet
Chris Disspain, CEO, auDA
Kieren McCarthy, CEO, .Nxt
- Chuck Gomes, VP of Policy and Compliance, VeriSign
- Indrajit Banerjee, Director, Knowledge Societies Division, UNESCO
- Emily Taylor, Strategic Consultant, .Nxt
- Ram Mohan, CTO, Afilias
Organized by: .Nxt
The workshop looked at Internationalized Domain Names (IDNs) and new generic Top-Level Domains (gTLDs) as entry points into the global Internet infrastructure for the non-English speaking world.
The panel and audience discussed:
- UNESCO's role in promoting multilingualism in cyberspace
- The successes and limitations of IDNs in serving underserved language communities
- The role of politics and the market in achieving or thwarting multilingualism
A number of panelists agreed that IDNs will function as a gateway or catalyst. In the words of Afilias EVP and CTO Ram Mohan, though, what multilingualism in cyberspace requires is "uniformity of user experience. IDNs are kind of the gateway. The internationalized domain name merely gets you off the ground, but the uniformity of user experience is really where it's lacking - that kind of acceptance has to happen at the application level." Here, Mohan is a pessimist and makes a provocative prediction: "If you look at the current state of affairs and you look at what I think is general apathy from the application market to implement the multilingual pieces into their applications, I predict that in five or ten years' time, we may only have 40 or 50 languages that are actually universally accessible wherever you go in the world."
Panelists grappled with the market and political realities that make reaching out to underserved language communities so challenging. UNESCO's Director of the Knowledge Societies Division Indrajit Banerjee looked to policymakers for leadership. VeriSign's VP of Policy and Compliance Chuck Gomes talked about the shared responsibility for encouraging application makers to implement the new standards.
Emily Taylor, strategic consultant for .Nxt, described it as "one of these things in life, the closer you get to where you want to be, the more frustrating the gap is... to have IDNs at the second level was almost worse for many people than not, because it highlights in a particularly right-to-left script, just the torture of finding a resource on the Internet, having to change your keyboard language halfway through typing a domain, be muddled up about what's the top level and second level and where all the hierarchy is."
There was some discussion of whether or not language communities could create their own applications and what resources could be provided to support them in doing so. Panelists also talked about the likelihood and repercussions of underserved communities turning to mobile phones for applications that support the local language.
The workshop was not without optimistic moments. Ram Mohan enthused that we live in exciting times. "We don't recognize it because we're right in the middle of it, but perhaps ten years out we can look back and say, wow, that was amazing to be at this start. In five years' time, not only [do] I think there will be a thriving set of domain names that are in local languages, but I am very hopeful that there will be an integrated user experience."
Moderator and CEO of auDA Chris Disspain described the level of passion communities around the world share for this issue. That passion may be our best hope for the development of a full multilingual internet. It has gotten us this far.
Top-level domains in different languages have only just begun to tap into the global demand for access to and communication over the Internet. IDNs and new gTLDs may be the single most effective entry points into the global Internet infrastructure for the non-English speaking world.
This workshop engaged those at the forefront of this extraordinary expansion and reviewed where we are, what we have learned and where we are going.
There is a wealth of new information about non-English domain extensions, including the initial results and impact of IDNs in countries as diverse as Russia, Israel and Saudi Arabia; plans to provide hundreds of new non-English top-level domains in the next year; and studies of the nascent IDN markets and what they can tell us about use and demand.
This workshop grappled with issues raised by this new information, including:
- metrics of multilingualism online (and what is wrong with them)
- what can be learned from the first-year experiences of IDN operators
- impact of IDN Internet addresses globally
- challenges that remain for the non-Latin world, and
- future opportunities and changes
What follows is a summary of the workshop, broken down by broad issue.
UNESCO's work on multilingualism in cyberspace
Moderator Chris Disspain, CEO of auDA, launched the workshop by asking Indrajit Banerjee, Director of the Knowledge Societies Division at UNESCO, for his perspective on UNESCO's work on multilingualism in cyberspace.
Banerjee described UNESCO's main mandate as building peace in the minds of men, pointing out that a lot of work in peace and reconciliation involves understanding diversity and promoting and celebrating both linguistic and cultural diversity, which are essential to any knowledge society. But, he noted, "if nothing is done, half of the 6,000-plus languages which are spoken today will disappear by the end of the century. In 2008, it was estimated that 12 languages - accounted for 98 percent of Internet web pages. English was 72 percent of web pages, still the dominant language online."
UNESCO's work on multilingualism is focused in the following areas:
- preservation of endangered languages including indigenous languages
- promotion of the use of mother tongue in education
- promotion of languages in media, mainly public service broadcasting on the Internet
- measuring linguistic diversity in cyberspace
- promoting development of local content in multiple languages, and
- providing assistance to Member States in the formulation of comprehensive language policies, including multilingualism in cyberspace, and building capacities
Banerjee characterized the report on IDNs and languages presented to the panel as useful and timely, but noted that it is still a work in progress. He strongly recommended going to the website to read the UNESCO Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace. "This recommendation guides, to a great extent, our 193 member countries in their policies in the area of language." It requires Member States to report to UNESCO on actions they have taken. "Some Member States reported on their progress [in the implementation] of IDNs and new gTLDs. Egypt and Jordan reported successful experience in launching the domain names using Arabic letters. Measures were taken to provide the use of letters of the domain names."
Chris Disspain wondered what influence panelists might expect the introduction of IDNs to have on UNESCO's work.
Banerjee thinks it is going to have "a huge impact, as this report will show to some extent. As the reports from Member States come in, we are noticing that this new arrangement that ICANN is promoting is an extremely positive step forward, but I think this alone is not enough - I think we have to pursue our strategy, we should talk to our Member States, we should ensure that there are incentives for people to use these domain names and the most important thing - UNESCO is grappling with is content in local languages because the relevance of content in local languages is the key. You might have domain names and so on, but how relevant is this content to the citizens of any country?"
CTO of Afilias Ram Mohan mentioned Workshop 96, "Economic Aspects of Local Content Creation and Local Internet Infrastructure", where a report was presented demonstrating the connection between the availability of local language content and the cost of access: the more local language content there is on the Internet, the cheaper the access. Mohan argued that this fact alone should persuade Member States and other governments to promote multilingualism in cyberspace.
IDNs should actually make it easier to measure the presence of local language content online. Emily Taylor, strategic consultant for .Nxt, explained that search engines capture perhaps 50 percent of the content, and email, chat and instant messaging are inaccessible. Since the data reveals, predictably enough, that local registries with country codes tend to implement only the characters that support their local languages, IDN registrations may provide UNESCO with a much more effective method of tracking a particular language online. Looking forward to the next study, Taylor said, "we're hoping that next year we can expand on this, expand on the number of domains that we're looking at, and also, I hope, maybe - start to touch on that linkage with content, because I think then we would start to get to a very interesting story." Participant Mohamed El Bashir, a country-code Top-Level Domain (ccTLD) manager from Qatar, hopes future reports will provide more data on how IDNs are changing both content development and how users access content.
Limits to the impact of IDNs
IDNs provide what Ram Mohan describes as "a unique hard-to-misspell gateway to local resources." Emily Taylor provided the example of dot-rf, the Russian Federation domain, which launched with more than 500,000 registrations in its first month. "[U]sers in Russia love it because they're not making the misspellings, they're not getting lost, they find it easy to operate with a domain that speaks their language."
Chris Disspain asked VeriSign's VP of Policy and Compliance Chuck Gomes for his feedback on the new IDN gTLDs, and on the fact that, while websites can use all sorts of languages and scripts and domain names in Arabic or Cyrillic, IDNs cannot be used for email.
"The registrants of domain names in the gTLD world have been waiting for 11 years to get - a full IDN experience," said Gomes, "so I think it will have significant impact, but - it's not just domain names in IDN script. [It's the] second level, top level, even third levels, etcetera, that are really needed." Though IDN TLDs are being introduced in the ccTLD world, and will be introduced in the new gTLD world, browser manufacturers have yet to adopt the IDNA 2008 revised standard. "There will be some holes there until the browser manufacturers implement that - Are we going to see that impact [of IDNs] immediately? Probably not. We'll see degrees - of the impact, and then as the browsers are updated to the latest standard, as email comes on board -- and that may happen sooner than some of the browsers, we don't know, but -- there will be part of the problem solved," said Gomes.
Ram Mohan defined the fundamental issue as uniformity of user experience. "The internationalized domain name merely gets you off the ground, but the uniformity of user experience is really where it's lacking. And you look at email, browsers, search engines. Yesterday there was a panel where we heard that if you type in an IDN URL in Facebook or Twitter, it does not automatically convert it into a web link because - they don't recognize an IDN domain name as a URL, right? So that kind of acceptance has to happen at the application level."
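The link-recognition gap Mohan describes can be illustrated with a minimal Python sketch. It is not how Facebook or Twitter actually detect links; the two patterns and the `find_links` helper are hypothetical, chosen only to show how a rule written with ASCII hostnames in mind silently skips an IDN URL, while a Unicode-aware rule picks it up.

```python
import re

# Hypothetical pattern of the kind an ASCII-only application might use:
# hostname labels may contain only ASCII letters, digits and hyphens.
ASCII_URL = re.compile(r"https?://[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Hypothetical Unicode-aware variant: in Python 3, \w in a str pattern
# also matches letters from other scripts (Cyrillic, Arabic, Chinese ...).
# Simplification: \w also admits "_" and misses some combining marks.
UNICODE_URL = re.compile(r"https?://[\w-]+(?:\.[\w-]+)+")


def find_links(text: str, pattern: re.Pattern) -> list[str]:
    """Return every substring of text that the pattern treats as a URL."""
    return pattern.findall(text)


text = "Visit http://example.com or http://пример.рф today"
print(find_links(text, ASCII_URL))    # only the ASCII URL is recognized
print(find_links(text, UNICODE_URL))  # both URLs are recognized
```

A production linkifier would need fuller rules (scripts that rely on combining marks, right-to-left text, valid TLD lists), so this is only a sketch of where the acceptance problem sits.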
Mohan does not dismiss the importance of email, but he believes that where the rubber really hits the road is in the ability to perform simple tasks online, like signing up for a free email account, an instant messaging account or a Facebook account. These are likely to require an email address, to which they will send an authentication password. "Try typing in just a normal email address," said Mohan, "but make it an IDN top-level domain and an IDN second-level domain. So, what I'm saying is, even if your email address was in ASCII, if your name was in ASCII, so it's ASCII, and you type something in a local language, dot-local language, you will find -- at least in our labs we found in 2010 -- we found over 75 percent of the most commonly used applications online, the user registration forms... do not support IDNs."
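The registration-form failure Mohan describes can be sketched as follows. This is an assumed, simplified illustration, not the code of any real sign-up form: `naive_accepts` stands in for a typical ASCII-only check, and `idn_aware_accepts` is a hypothetical alternative that first converts the domain to its ASCII-compatible form using Python's built-in `idna` codec (which implements the older IDNA 2003 rules). Both keep the local part in ASCII, matching the case Mohan tested.

```python
import re

# Hypothetical ASCII-only rule of the kind many sign-up forms apply.
ASCII_ONLY = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LOCAL_PART = re.compile(r"[A-Za-z0-9._%+-]+")
ACE_DOMAIN = re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z0-9-]{2,}")


def naive_accepts(address: str) -> bool:
    """ASCII-only check: rejects any address whose domain is an IDN."""
    return ASCII_ONLY.fullmatch(address) is not None


def idn_aware_accepts(address: str) -> bool:
    """Accept an ASCII local part with an ASCII or IDN domain."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    try:
        # Convert each label to its "xn--" form before validating.
        ace = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return (LOCAL_PART.fullmatch(local) is not None
            and ACE_DOMAIN.fullmatch(ace) is not None)


for addr in ["user@example.com", "user@пример.рф"]:
    print(addr, naive_accepts(addr), idn_aware_accepts(addr))
```

The design point is that the form does not need to understand every script; converting the domain to its ASCII-compatible form first lets the existing ASCII rules keep working.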
The IETF's and standards bodies' work on making email function completely in local script has led to draft implementations by several countries. Mohan's company, Afilias, has a draft implementation, too. "We've been participating in those trials since 2007. In 2009 for the first time, we managed to get emails going, we wrote an email that was completely in Arabic [including the address], written right to left, sent across to folks in China and it went over the regular Internet, where the entire email when it hit the DNS - was in - the local language, so that process is working." The email did not go to the spam folder and the Chinese replied, copying the Koreans. The reply reached its destinations. Mohan believes there are implementations for 14 languages that allow for email to go back and forth. "The standard is in last call and final call, if you will, inside of the IETF, we expect that it will - actually become a full-scale standard in, if not this meeting coming up, then [the] meeting right after that, so perhaps early 2012."
Emily Taylor believes that IDNs will greatly contribute to the development of a full multilingual Internet, but "I think of them in terms of being a catalyst, that without the whole chemical reaction that is necessary to enhance multilingualism online, it can't be done - it's one of these things in life, the closer you get to where you want to be, the more frustrating the gap is, and as Chuck said, to have IDNs at the second level was almost worse for many people than not, because it highlights in a particularly right-to-left script, just the torture of finding a resource on the Internet, having to change your keyboard language halfway through typing a domain, [being] muddled up about what's the top level and second level and where all the hierarchy is."
Chuck Gomes agreed with Taylor about IDNs serving as a catalyst. "This can be a catalyst, the more - top level IDN names are registered, that will probably help, [but the application market is] still going to weigh it in their own business metrics - are we going to get enough value out of this to do it?" Having introduced an IDN and run an IDN as a registry, Gomes counts himself as among those who "have the responsibility to try and encourage the application makers to implement the new standards - it will be all of us, really, it will be a community effort to make that happen."
Ram Mohan made what he called "a provocative prediction. You were saying if you do nothing, you know, 50 percent of the 6,000 languages may go away. I think it's actually perhaps even more dire than that, because, if you look at the current state of affairs and you look at what I think is general apathy from the application market to implement the multilingual pieces into their applications, I predict that in five or ten years' time, we may only have 40 or 50 languages that are actually universally accessible wherever you go in the world, and I think a lot of that will be because there is not an initiative to actually help move this forward. We can depend upon market demand to drive some of it and that market demand I think is what will get us to 50 or 60 languages or scripts. But beyond that, what is the real reason for someone in India to implement a special set of code that affects just one language that has, relatively speaking, only six million people speaking it - when the opportunity cost is far greater [and] I could instead take the same amount of effort and I could address 40 million people in another language?"
The problem, said Mohan, is that there is "no single body or single organization that can actually turn the key and get this engine rolling, and because of that, each one of us in our little piece do the best we can." That will not suffice. Mohan illustrated that using the example of IDN email implementation. "We have this program out, we've been inviting individuals and universities and companies to go partner with us, we're giving away the service for free, and there is very little traction on a global basis because you are one small organization and getting that level of interest is a pretty daunting task."
Promoting multilingualism in politics and the marketplace
Indrajit Banerjee agreed with Mohan, citing the UNESCO Atlas of the World's Languages in Danger. "It's worse than studies actually reveal." Banerjee also agreed, "It will not be any single player who is going to solve this problem." Language and politics are interconnected because of limited resources, political realities and the privileged status of some languages relative to others. Most of the time, Banerjee lamented, the issues associated with multilingualism in cyberspace are discussed only by technical people. The discussion "doesn't reach the policymakers who say, 'get the job done, I don't care how you get it done, I want these languages to be in cyberspace.'" But that is a part of what is needed.
Banerjee thinks promoting multilingualism will require:
- sharing the true scope of the problem, "the direness of the situation"
- explaining how critically important it is to provide multilingualism in cyberspace, and
- creating a coalition of players, of the most important players, "including you, Mr. Chairman, ICANN and other players, to see what should be done as a prospective approach."
A participant inspired a lot of panelist response with her question about why people and countries have not taken IDNs seriously. There is the fact that IDNs address only one part of the problem, but Emily Taylor argued that IDNs have been taken seriously. "Within the IGF environment alone, this topic has been something where there has been quite a lot of push and pressure -- quite rightly -- for the technical community to bring this to everybody, but the simple answer to your question is that it's only this year, or the past 12 to 18 months, that internationalized domain names have been available all the way through the domain name and that's for a combination of tech reasons and also organizational priorities within the ICANN environment." Taylor went on to credit Chris Disspain as "one of the key people who instigated the fast track for countries."
Ram Mohan pointed out that the Indian and Arabic scripts are used in many languages with individual requirements for each specific language, and "that's why overnight success has taken so long."
Chris Disspain reflected: "It's also why, perhaps not for the last time but certainly for the first time, when the resolution was passed introducing IDNs, this is the first time I've ever seen a roomful of people crying. It was really quite extraordinary."
It strikes Andrew Mack, a participant from Washington, D.C., that "you're fighting a losing balance on the language preservation side unless we can find some link-up with funding, some link-up with some sort of a market mechanism or market funding, because otherwise it's like an archive." Chuck Gomes agreed. It is too expensive for many underserved language communities to apply for a new gTLD. The proposal, then -- and "so far we haven't seen any action" -- is to bundle multiple IDN versions of a new gTLD, so for the same ASCII string or another IDN string, at minimal additional cost, underserved language communities can be included.
Some languages will be lost, however. A participant from Niue in the Pacific Islands, Immanuel, asked what could be done for his dying language, used by some of the 1,200 people in his community living on the island of Niue, with 20,000 more community members living off the island. Immanuel is looking for ways to encourage more children to learn the language, perhaps with apps for games. Indrajit Banerjee advocated that Immanuel and his community "lobby UNESCO very hard to promote and preserve your language and do everything possible in the eventuality that it disappears with the next generations, [so] it is well preserved and all its richness is preserved." Emily Taylor underscored the importance of the preservation of dead and dying languages. There is "such a lot to learn from what is left behind."
Indrajit Banerjee provided a historical perspective with his reference to how underserved language communities had previously fallen victim to the requirements of television markets. Banerjee pointed out that one of the advantages of the Internet for underserved communities is that it does not share with the mainstream media the extraordinary expenses associated with crossing borders. Cultures that may not make up a significant market in one location, could gain numbers and market strength by crossing borders. As Chuck Gomes said, we do not have to think of languages in a national context.
Emily Taylor pointed out that while endangered languages are a real problem, "we're not talking about endangered languages, we're not even at the endangered languages. We're not properly supporting Arabic, Chinese script, hugely popular languages at the moment." A participant from Senegal volunteered that IDNs in Arabic script are needed inside West African countries, "where there are many people who are considered to be illiterate because they don't read and write in the official language - but many people know how to read and write in Arabic."
What could UNESCO do, if sufficient solutions do not arise from the marketplace? According to Indrajit Banerjee, while it's not their area of expertise or competence, UNESCO would certainly take action to ensure that its 193 Member States take the necessary steps to see that multilingualism is accommodated in cyberspace. If required, UNESCO would "use all our community power to ensure that Member States put pressure on the organizations concerned, especially the technical people, to get this done, and the point is that UNESCO will not stand for anything - unless it's in some multitechnical problem, which I understand it isn't, that would hamper us from promoting languages and linguistic and cultural diversity."
Participant Mohamed El Bashir is running an IDN ccTLD in Arabic. It went live on 18 October with lots of local demand and fewer than 100 domains in one month, most of them for government entities. Asked if he, as the ccTLD manager, or the government will need to involve themselves in persuading application providers to provide the ability to interface with those in Arabic, El Bashir answered strongly in the affirmative. He has already had to intervene with a browser vendor to ensure that IDN strings are enabled to work on their browser. El Bashir did, however, find another browser vendor to be more proactive, "once they [saw] ICANN [had] delegated the IDN string in the root, they included that in the update of the browser. We have to reach out and talk to those guys. Also, we're trying to reach out to Twitter, and the community is doing that, by the way. Currently there's a community of good bloggers and Twitter who are trying to ensure that [the] Arabic language [has] better support within Twitter."
Emily Taylor expressed optimism about the prospects for language-community based efforts. She reported that at Facebook, they "use users as the translators of not only the content but the framework itself and to set up a process where the bulk of the work is done locally by amateurs, if you like, but people who are expert in the language because they speak it. And also, of course, the whole thing of creating an interface in your own local language is not as simple as just a straight like-for-like translation, it is also what is also culturally appropriate for you or meaningful in those labels. So, that seemed to me to be a very hopeful and effective way of mobilizing the speakers of smaller languages."
Chris Disspain asked about underserved language communities' ability to create their own applications and become self-sufficient. Ram Mohan surmised that some language communities could self-organize and build something for themselves -- and some will, "but I fear that in more cases they just won't know where to go, the resources are not apparent, the tool kits don't exist." What Mohan proposes is the creation of an IDN technology commons. "We don't have that. And that's something that brings in organizations from industry - that bring those resources and tools, et cetera, and make them available for common, shared use. We desperately need that if we want to preserve and continue to let languages come online and thrive online, because with a common set of shared technologies - then the scenario - of a local language community self-organizing becomes very viable." Asked about who would be in charge of such a commons, Mohan provided the hoped-for response: "We should ask UNESCO."
Without an IDN technology commons, underserved language communities, according to Mohan, "may end up going directly to using their mobile phones which do support the local language and - doing SMSs or the next generation of SMS because that does support local languages far better than what the wired Internet, if you will, does."
Chris Disspain agreed that a significant proportion of IDN users will be using them on their mobile phones and asked Chuck Gomes what issues will emerge as a result. "There obviously are security issues that arise," said Disspain, and "we may be in the situation where not only do we have to actually cajole, persuade, force, whatever, people to do the applications, et cetera. There's also going to have to be a massive training exercise on how to use the technology now that people have access to it linguistically."
In the least developed parts of the world, people are accessing the Internet by mobile, agreed Chuck Gomes. "Now - let's just talk about IDN domain names. In the DNS an IDN domain name is the nonsensical string that starts with 'xn--' and a whole bunch of characters after it, they're going to be longer names. Now, when they're translated into the local script, it shouldn't be that long, but those - are going to be issues that we're going to confront." Gomes also expressed concern about Whois, an "archaic protocol" that "doesn't accommodate IDNs at all. Is work going on in that? Yes, but it's work coming very slow. So a lot of those issues, being able to go to Whois, and what are you going to see? Are you going to see 'xn--' and some meaningless string of characters? The average user, most of us wouldn't know what it meant without looking it up somewhere. So there are some real issues there to be worked."
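The 'xn--' strings Gomes refers to are the ASCII-compatible form in which an IDN is stored in the DNS. A minimal Python sketch of the conversion follows, using the standard library's built-in `idna` codec; note that this codec implements the older IDNA 2003 rules rather than the IDNA 2008 revision mentioned earlier, and that the helper names `to_ace` and `to_unicode` are illustrative only.

```python
def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to the 'xn--' form used in the DNS."""
    # Each non-ASCII label is Punycode-encoded and prefixed with "xn--";
    # labels that are already ASCII pass through unchanged.
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Convert an 'xn--' domain name back to its local-script form."""
    return ace_domain.encode("ascii").decode("idna")


# Compare the local-script name with what a DNS or Whois lookup would show.
for name in ["bücher.example", "рф"]:
    ace = to_ace(name)
    print(f"{name} -> {ace} ({len(name)} vs {len(ace)} characters)")
```

Running the loop shows both effects Gomes mentions: the stored name is longer than the name the user typed, and it is unreadable without converting it back.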
Expectations of progress in the next five years
Chris Disspain asked each panelist to look five years into the future of multilingualism in cyberspace.
"Already the world is a much more multilingual place, more people speak more languages than [at] any other point before," enthused Indrajit Banerjee, but "I'm hoping that this trend of learning - those languages which offer economic opportunities, or mainstream dominant languages, changes at some point. Some incentives can be provided so that people also learn languages that are not thriving."
"I think we'll really start to see the advantages of IDNs in about five years," said Chuck Gomes. "I think that's more than enough time for the application developers to do some really good work, for the communities to get behind the new scripts in domain names and in the content."
Looking out five years, Emily Taylor hoped to see many ccTLDs and gTLDs in different scripts that "prove to be a catalyst not just for application developers, but for users to create content." She also hopes to find "that we start to understand better the multilingual landscape that we have because it will enable organizations such as UNESCO that care about language preservation to target their resources more effectively and also to use the Internet - to the benefit of multilingualism, to connect people, the cultures speaking those languages, and also to preserve and - keep available examples of languages or cultures that have died out."
Ram Mohan declared that we live in exciting times. "We don't recognize it because we're right in the middle of it, but perhaps ten years out we can look back and say, wow, that was amazing to be at this start. In five years' time, not only [do] I think there will be a thriving set of domain names that are in local languages, but I am very hopeful that there will be an integrated user experience. That's something that we really need to make progress on, but I expect that for the major languages that are spoken, for the major populations that speak [the] languages, I certainly hope that we will see integrated user experiences in local languages, and the very start, the budding start of thinking about how to share these various technologies in perhaps a commons-like environment, that allows for languages that are endangered, to begin a new cycle of life online. That's what I'm really hoping for."
Chris Disspain concluded the workshop by explaining his own optimism, which he credits to "the level of passion that exists in the communities now only just being able to be fully served by accessing the worldwide web or the Internet. It is a completely different world where communication has gone to a whole new level, and if we think Twitter is really smart and Facebook is really smart, wait and see what happens next. The scary thing, of course, to somebody like me is, if it's happening in Chinese or Arabic, I wouldn't have the faintest idea what's going on."