Method Name Syntax

tl;dr: Skip the motivation and jump straight to the recommendation

This document discusses some of the issues around the naming of methods, and proposes possible solutions to some of the problems arising.

It concerns itself only with the syntax of method names. In the past the CCCBR has gotten itself embroiled in various controversies involving the semantics of names — for example, prohibiting use of words CCCBR members considered offensive. This discussion ignores such matters entirely.

Much of what is discussed here involves the minutiae of the orthography of various languages, and how this is handled in the Unicode standard for electronic representation of such things. I am no expert on any of this, and so may well have got much wrong, and omitted many important things. This is simply my best shot given my current state of knowledge, augmented with the comments in the New Framework for Ringing discussion on Slack.

It should be noted that the CCCBR today distinguishes between method names and method titles, the name being a part of the title. It appears this distinction will continue in the Central Council Framework for Method Ringing (CCFMR). As currently used, no serious syntactic issues arise for parts of a title other than the name. Any complexity arising in the parts of a method’s title beyond its name involves classification of methods. Such matters are also not part of this discussion.

This discussion begins with some thoughts on why we even have method names, that being possibly germane to how we structure those names. It then moves on to consider what sets of characters we might choose to use to form names. That question spills over into two later sections: the one on current practice, since we probably want to preserve existing method names and current practice is sadly ill-defined and confusing, and the one on comparing names, since how we compare names also bears on which characters to include. And finally there is a concrete recommendation.

Why have method names?

The primary reason we worry about names, or more precisely, titles, of methods is so that different ringers can talk about the same method. So that when ringers from Edinburgh get together with ringers from Sydney, and one of them says “Let’s ring Cambridge Surprise Major” they all mean the same thing.

This, of course, glosses over the fact that deciding whether or not two methods are the same or different is itself a not so obvious problem, and is addressed, directly or indirectly, by much of what is in the CCFMR.

There are other, lesser, reasons we worry about names:

There are two, complementary issues with binding method names to methods. These issues are really about method titles, but the only tricky part is the name portion of a method’s title. These two issues are:

The rest of this discussion will ignore the name/title distinction, and informally talk about things like “Must a name uniquely identify a method”, as a short hand for discussing names in the context of titles.

While for the most part we like to have a reasonably unique name, for any given method, I don’t think there is a great deal of harm in some methods being known by two or more names, so long as it doesn’t get out of hand. Though the current CCCBR decisions deprecate this, it is a prohibition ignored by, say, Bastow Doubles, St Helen’s Doubles and Cloister Doubles, which seems to have caused no harm.

Even I, however, draw the line at the same name referring to unrelated methods. If we are going to keep collections of named methods, it seems inescapable that a name must always mean the same method if we are to satisfy our primary reason for having names, described above.

For either of these issues to be addressed it becomes important to be able to tell whether or not two names are the same. While at first this may sound trivial, it is not. Is the name “London No.3 Surprise Royal” the same as “London No 3 Surprise Royal”? What about “Aluminum Surprise Major” and “Aluminium Surprise Major”? “Décembre Delight Major” and “Decembre Delight Major”?

A further consideration is what practical influence does the CCCBR have over method names. It is primarily what names are enshrined in the method collections that the Council maintains and publishes. To a lesser extent how the Council publishes performances which cite methods. It has far less control over what ringers say, or even write, themselves. The CCFMR is really more about how the Council chooses to record identifiers for methods than it is about how ringers refer to methods, or identify them in their own minds.

Character set

An obvious issue to be considered is what characters can be used to form method names?

Historically change ringing has been practiced almost exclusively by English speakers. Based on this a not indefensible position might be to limit method names to the twenty-six Latin letters used to write English words. There are some impediments to this, however:

Perhaps the most persuasive argument that we need a richer repertoire of characters in names than a minimal English language set is that ringers have, by using them already, demonstrated an appetite for such richness. If we are attempting to support what ringers choose to do rather than limit what they may do it behooves us to support a rich character set.

Assuming we wish to support all existing names, one obvious thing to do is augment the letters with all other characters, including diacritics, that have been used to date. Even this is not as obvious as it might seem, as there is ambiguity about what characters have been used to date, as discussed in the next section on current practice.

And freezing the characters allowed to just those used to date also raises some possible problems:

In addition to Latin letters, possibly augmented with diacritics, some languages using the Latin script also add a small repertoire of other characters or ligatures. For example, the eth, ð used in Icelandic and Faroese, or the ash, æ, in Danish, Norwegian, Icelandic and Faroese, where it is viewed and used as a distinct letter, as opposed to the diphthong ae it is in English. It may well be that we don’t need to worry about such characters as they are only used in languages of countries to which change ringing has not yet spread. But it is still an issue to bear in mind.

When deciding what characters to include it is important to consider one of the reasons we want names: for use in method collections supplied and maintained by the CCCBR. Such collections are, of course, now maintained electronically. And use of such collections by software is an important consideration, as well. In the past much software could support only a severely limited character repertoire, and such a limitation can still afflict legacy software which is still in use. Recently written software, on the other hand, can be easily crafted to store and manipulate any of the over 130,000 characters of Unicode. However many, perhaps most, computer fonts do not cover the whole of Unicode, so display of names using obscure characters can be a problem. And similar problems arise when preparing printed versions of collections, or printing method names in performance reports.

A further consideration, most clearly for electronic use, but also for even non-electronic communication, is that punctuation, besides possibly being used within names, may be needed to demarcate method names from surrounding matter. If we swallow all common punctuation marks into the repertoire of characters from which names can be built, those having such needs will have to craft more complex escape mechanisms when embedding method names in other text. Not an insurmountable problem, but again one worth bearing in mind.

If we decide to allow a wider variety of diacritics than have been used to date it might be appropriate to adopt all those of some electronic standard that covers a variety of languages. For example, we might cite the Unicode standard (http://www.unicode.org/versions/Unicode10.0.0/), and include all characters with the major category “letter” selected from the Basic Latin, Latin-1 Supplement and Latin Extended-A blocks. It is worth noting that in this context a basic letter with a diacritic, such as é (e-acute), is considered a letter.

This particular choice covers all European languages using the Latin script, and, with one exception, Uluṟu Delight Minor, includes all diacritics used in method names to date. The latter case would also hold if we excluded the Latin Extended-A block, but in that case there would be some characters required by, for example, French and Dutch, which would be omitted.
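As a rough illustration of the block-plus-category selection described above, the following Python sketch reports which letters of a name fall outside the three cited blocks. The function names are hypothetical, and the sketch assumes names are first converted to Normalization Form C so that a letter with a diacritic is a single precomposed character. Spaces, digits and punctuation are deliberately out of scope here.

```python
import unicodedata

# Basic Latin (U+0000-U+007F), Latin-1 Supplement (U+0080-U+00FF) and
# Latin Extended-A (U+0100-U+017F) are contiguous, so one upper bound suffices.
LAST_CODE_POINT = 0x017F


def is_proposed_letter(ch: str) -> bool:
    """True if ch is a letter (general category L*) within the three blocks."""
    return ord(ch) <= LAST_CODE_POINT and unicodedata.category(ch).startswith("L")


def letters_outside_proposal(name: str) -> list[str]:
    """Return the letters of a name that fall outside the proposed set."""
    composed = unicodedata.normalize("NFC", name)  # assumption: compare in NFC
    return [
        ch
        for ch in composed
        if unicodedata.category(ch).startswith("L") and not is_proposed_letter(ch)
    ]


print(letters_outside_proposal("Jan\u00e1\u010dek"))  # a-acute and c-caron are covered
print(letters_outside_proposal("Ulu\u1e5fu"))         # r with line below (U+1E5F) is not
```

A check like this only says whether a letter is in the cited blocks; it says nothing about whether fonts or legacy software can cope with it.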

An alternative scheme might be to more finely tune exactly which characters we do and don’t want, and enumerate them. However trying to craft our own repertoire of letter characters is going to be both fussy and error-prone, and require a long enumeration of letters in the CCFMR, which might be awkward. If we do want to support some repertoire of diacritics it will probably be best to succinctly cite some outside standard, even if that brings in more than we might otherwise want.

One important character we’ve glossed over here is the space character. There are many extant method names that are made up of two or more words. And there is at least one case, White Hall Surprise Major and Whitehall Surprise Major, where two method names differ only in how they are broken into words. Thus it would seem essential to include the space character in the repertoire of characters allowed.

However it is more complicated than just including it. While we would undoubtedly consider “Need” and “Ned” different names, we surely don’t want to consider “New Cambridge” and “New  Cambridge” (the latter with two spaces) distinct. And we almost certainly don’t want names like “ Cambridge” or “Cambridge ” (with leading or trailing spaces). So, while we will have to include the space character, its use will have to be modified by extra considerations in some way.

Similar considerations may also apply to the use of the hyphen.

Even leaving aside space and hyphen, simply having a repertoire of characters from which we can assemble names may not be sufficient. Even if we limit the available punctuation characters to those in existing method names, will we be comfortable with names such as “.”, “,”, “)” or “&”?

Or even “'&&&"&(&&).,,,.,&=&”?

It seems likely that sensible name construction may require thought about how non-letter characters can be used.

Were such names allowed it would complicate both indexing method collections and calling spliced. In neither case insurmountably so, but enough to be an annoyance we should bear in mind.

Something further not yet mentioned is superscript and subscript numerals. These have been used, sort of, in existing method names. This seems best discussed further in the imminent, next section, on current practice.

Current practice

Arguably the de facto standard at the time this document was first written was the Methods Committee’s collection of methods, maintained by Tony Smith. That collection continues to be maintained by Tony and will continue to be referred to here, though it has now been superseded by the Council’s new online collection.

Unfortunately the now superseded Methods Committee’s collection is itself inconsistent. This collection is presented in multiple formats, and method names are not all presented in the same form. Sometimes a richer representation is used, and at others a more primitive one. Here are some example pairings:

Richer form                   More primitive form
Janáček Surprise Major        Janacek Surprise Major
E=mc² Surprise Major          E=mc2 Surprise Major
UB₃₁₃ Surprise Major          UB313 Surprise Major
Nu.Q™ Alliance Maximus        Nu.QTM Alliance Maximus

The more primitive form generally omits diacritics, using just the base letter; converts subscript and superscript numerals to lining numerals; and makes a similar transformation to ™.

In this collection names are normally presented in title case with the first word and all other important words capitalized. For example, “Champion of the Thames”. But not always: “Sugar beet Surprise Major”.

As far as I know little thought or definition has gone into the names used here; things have just sort of happened with no pre-planning, and practice has changed over time. This is not necessarily a bad thing since it allows easily adhering to what ringers have chosen to do, but it may (or may not) eventually lead to confusion. And it certainly complicates things for people writing software, who have to try to intuit what is needed, and it has generally led to inconsistency in the results.

Other resources include:

Comparing names

One of the most fundamental things we need to do with method names is compare them for some sort of equality.

An initial, naïve comparison is simply to compare the two names as sequences of characters, and if the sequences are of the same length and each character in the same position is the same, declare the names the same, and otherwise different.

When this comparison says they are the same, all is well. But we may disagree with it when it says they are different.

The first issue is case. We don’t want “Cambridge” and “cambridge” to be different, so we must ignore case in this comparison. Even if we insist that for the well known method it always be spelled “Cambridge”, we will want to view “cambridge” as equal to it so a different method doesn’t get named “cambridge”.
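A minimal sketch of a case-insensitive comparison, assuming Python and its built-in `str.casefold`, which applies Unicode case folding rather than simple lowercasing. The function name is hypothetical.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two names after applying Unicode case folding to both."""
    return a.casefold() == b.casefold()


print(same_ignoring_case("Cambridge", "cambridge"))
print(same_ignoring_case("Cambridge", "Oxford"))
```

Case folding is more aggressive than lowercasing. For example it treats the German ß as equal to "ss", which may or may not be what is wanted for method names.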

This gets more complicated with diacritics, however. While an English speaker may prefer either “résumé” or “resume” the accents are rarely viewed as mandatory, and we would probably view “Résumé Surprise Major” as equivalent to “Resume Surprise Major”. In many other languages, however, accents are a mandatory part of spelling. For example, the German words “schon” and “schön” have completely different meanings; in fact, if you’re ever in Mainz you can visit the Kulturclub Schon Schön. If you were unable to use umlauts for some reason you’d spell this club’s name “Kulturclub Schon Schoen”, not “Kulturclub Schon Schon”.

I believe there is no way to make a one size fits all comparator of words that works for any language. Typically, modern software deals with these issues by using “locales”, and only compares words in a context for one language. Method names, though, are a cross-locale problem. It will probably be best to simply treat method names as essentially English, possibly augmented with loan words. When we want to compare two names, simply ignore any accents. This isn’t right for many languages, but it is probably the best we can do.
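One way the "simply ignore any accents" rule might be sketched in Python is to decompose each name and drop the combining marks. This is an illustration under assumptions, not a definition: the function name is hypothetical, and treating every character of general category Mn (nonspacing mark) as an accent is my own choice.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Decompose to NFD, then drop nonspacing marks (general category Mn)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("R\u00e9sum\u00e9 Surprise Major"))
print(strip_accents("Jan\u00e1\u010dek Surprise Major"))
print(strip_accents("sch\u00f6n") == strip_accents("schon"))
```

The last line shows the cost noted above: under this rule “schön” and “schon” compare as the same.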

A further complication for those implementing software to compare method names, though not directly relevant for defining that comparison, is the issue of precomposed characters versus combining diacritical marks. Typically a letter with a diacritic can be described in Unicode in at least two ways: as a single character, or as a two character sequence, and software needs to be aware that it may be holding the same name in two different formats. These need to be canonicalized into one or the other form for comparison. This may need to be noted in some ancillary material in the CCFMR as advice to software authors. It will also be useful to us below in the Recommendation section.
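The following Python sketch illustrates the point for software authors: the same visible name can arrive as two different code point sequences, and normalizing both to one form (here NFC, an arbitrary choice between the two canonical forms) makes them compare equal. The variable and function names are hypothetical.

```python
import unicodedata

precomposed = "D\u00e9cembre"  # e-acute as the single code point U+00E9
combining = "De\u0301cembre"   # e followed by U+0301 COMBINING ACUTE ACCENT


def canonical(name: str) -> str:
    """Put a name into Normalization Form C before comparing or storing it."""
    return unicodedata.normalize("NFC", name)


print(precomposed == combining)                        # raw sequences differ
print(canonical(precomposed) == canonical(combining))  # normalized forms agree
```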

Diacritics are not the only issue in this vicinity. Consider the letter æ. In English this is simply a ligature of a and e, but in Danish it is a distinct letter. It is tempting to compare æ as equivalent to ae. This may be an appropriate way forward, for example “Cæsium Surprise Major” is the same as “Caesium Surprise Major”. However this is a little more complex than dropping accents, as English words spelled using æ are occasionally spelled differently when the ligature is not available. Consider, for example “æternal” and “eternal”. On balance, though, it will likely be best to treat such ligatures as equivalent to the two letter sequences.
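Unicode normalization does not split æ into a and e, so software that wants this equivalence needs an explicit mapping. A minimal Python sketch follows; the table and function name are hypothetical, and which ligatures belong in the table (here only æ and œ, in both cases) is an assumption on my part.

```python
# Hypothetical table: each ligature maps to its two-letter sequence.
LIGATURES = str.maketrans(
    {
        "\u00e6": "ae",  # ae ligature, lowercase
        "\u00c6": "AE",  # ae ligature, uppercase
        "\u0153": "oe",  # oe ligature, lowercase
        "\u0152": "OE",  # oe ligature, uppercase
    }
)


def expand_ligatures(name: str) -> str:
    """Replace each listed ligature with its two-letter equivalent."""
    return name.translate(LIGATURES)


print(expand_ligatures("C\u00e6sium Surprise Major"))
```

Note that this makes “æternal” compare equal to “aeternal”, not to “eternal”, which is the wrinkle described above.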

Since various punctuation characters are (or, at least, have been) allowed in method names the issue of space adjacent to punctuation characters becomes important. There is currently a method known as “London No.3 Surprise Royal”. We probably want to consider that as equivalent to “London No. 3 Surprise Royal” and “London No 3 Surprise Royal”, lest someone give a new method one of those latter names. Similarly “Calf of Man (Low) Lighthouse Surprise Minor”, “Calf of Man(Low) Lighthouse Surprise Minor”, “Calf of Man(Low)Lighthouse Surprise Minor”, and “Calf of Man ( Low ) Lighthouse Surprise Minor”.
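One possible way to get these equivalences, sketched in Python, is to treat every punctuation character as if it were a space and then collapse runs of spaces. This is only one candidate rule, chosen here for illustration; the function name is hypothetical, and the rule has side effects, such as making a hyphenated name equal to the same name written with spaces.

```python
import unicodedata


def punctuation_key(name: str) -> str:
    """Replace punctuation (general category P*) with spaces, then collapse spaces."""
    spaced = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in name
    )
    return " ".join(spaced.split())


variants = [
    "London No.3 Surprise Royal",
    "London No. 3 Surprise Royal",
    "London No 3 Surprise Royal",
]
print({punctuation_key(v) for v in variants})
```

Under this rule “White Hall” and “Whitehall” remain distinct, since no punctuation is involved and spaces between letters are preserved.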

A similar issue arises with superscript and subscript numerals. Do ⁴, ₄ and 4 compare as the same? It seems prudent that they do.
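Unicode's compatibility normalization already maps superscript and subscript numerals to ordinary digits, so a comparison could lean on it. The Python sketch below uses NFKC for this; the function name is hypothetical, and whether compatibility normalization is the right tool for method names, given that it also rewrites other characters such as the trade mark sign, is a judgment call rather than something established here.

```python
import unicodedata


def fold_compatibility(name: str) -> str:
    """Apply NFKC, which replaces compatibility characters with plain equivalents."""
    return unicodedata.normalize("NFKC", name)


print(fold_compatibility("E=mc\u00b2 Surprise Major"))            # superscript two
print(fold_compatibility("UB\u2083\u2081\u2083 Surprise Major"))  # subscript 3, 1, 3
print(fold_compatibility("\u2074") == fold_compatibility("\u2084") == "4")
```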

Recommendation

Based on all the above, together with folks’ comments in recent months, here’s my third pass recommendation of how to form method names. Here “name” is being used strictly as defined in the CCFMR, not as shorthand for “title”. I don’t claim this recommendation is the best we can do, and it does gloss over some of the pitfalls described above; it’s just the best practical approach I’ve been able to think of so far.

In the following “the Unicode standard” refers to version 10.0.0 (http://www.unicode.org/versions/Unicode10.0.0/). Various attributes of individual characters are given in the files comprising the Unicode Character Database (UCD, http://unicode.org/ucd/), and particularly the file https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt, which contains both general category information and case folding information. Unicode blocks are defined in https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt. Normalization is described in https://www.unicode.org/reports/tr15/tr15-45.html#Norm_Forms.

Method names are a sequence of from 1 to 120 characters selected from

subject to the further constraints that a name (a) must contain at least one character of Unicode general category Lu, Ll or Nd, and (b) may neither begin nor end with a Space character, nor contain within it two consecutive Space characters.
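The length limit and constraints (a) and (b) can be checked mechanically. The Python sketch below is one way to do so, with several assumptions: the function name is hypothetical, length is counted in code points as Python sees them, the letter categories checked are Lu (uppercase letter), Ll (lowercase letter) and Nd (decimal digit), and the check that every character belongs to the permitted repertoire is left out entirely.

```python
import unicodedata

MAX_LENGTH = 120
REQUIRED_CATEGORIES = {"Lu", "Ll", "Nd"}  # uppercase, lowercase, decimal digit


def name_problems(name: str) -> list[str]:
    """Return a list of constraint violations; an empty list means none found."""
    problems = []
    if not 1 <= len(name) <= MAX_LENGTH:
        problems.append("length must be from 1 to 120 characters")
    if not any(unicodedata.category(ch) in REQUIRED_CATEGORIES for ch in name):
        problems.append("must contain at least one Lu, Ll or Nd character")
    if name.startswith(" ") or name.endswith(" "):
        problems.append("must not begin or end with a space")
    if "  " in name:
        problems.append("must not contain two consecutive spaces")
    return problems


for candidate in ["Cambridge", " Cambridge", "New  Cambridge", "&"]:
    print(repr(candidate), name_problems(candidate))
```

Returning a list of problems rather than a single true or false makes it easier to tell a ringer why a proposed name was rejected.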

Two names are considered the same if they would be reduced to the same sequence of characters by the following process:

Notes: