[ANSTHRLD] The SCA Armorial And Regular Expressions
kobrien at texas.net
Thu Dec 20 22:07:30 PST 2007
At 02:44 PM 12/20/2007 -0600, you wrote:
>Thank you for your insight. If I understand you correctly the Welsh name:
>Morgan William David
>Would conflict with:
>Morgan ap William ap David
>Morgan William ap David
>Morgan David William
>Morgan David ap William
And a bunch of others, such as:
Morgan ap Dafydd ap Guillem
>If so, then I could easily explain how one can search for all of these
>possibilities (and a few more) in a single query:
>Morgan(( ap)? (William|David))+
>One could broaden one's search with a slightly different query:
>Morgan .*? (William|David)
>This would find all of the above names and any other names that contained
>Morgan followed by William or David.
That would be clear by the change in the number of elements.
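For anyone who wants to try the two queries quoted above, here is a minimal sketch using Python's re module. The list of names comes from this message. The extra name "Morgan Rhys David" is hypothetical and only there to show where the two patterns differ. Using search() rather than a whole-string match is an assumption about how an O&A query would behave.

```python
import re

# Names taken from this message; the last one should NOT be found by either
# pattern, because its elements are spelled differently.
names = [
    "Morgan William David",
    "Morgan ap William ap David",
    "Morgan William ap David",
    "Morgan David William",
    "Morgan David ap William",
    "Morgan ap Dafydd ap Guillem",
]

# The two queries exactly as quoted in the message.
narrow = re.compile(r"Morgan(( ap)? (William|David))+")
broad = re.compile(r"Morgan .*? (William|David)")

narrow_hits = [n for n in names if narrow.search(n)]
broad_hits = [n for n in names if broad.search(n)]

# Hypothetical name (not from the message) with an unrelated middle element.
extra = "Morgan Rhys David"
extra_narrow = narrow.search(extra) is not None
extra_broad = broad.search(extra) is not None

for n in names:
    print(n, "| narrow:", n in narrow_hits, "| broad:", n in broad_hits)
```

Both patterns find the first five names and miss the Dafydd/Guillem form. One thing to watch: as written, the broader pattern needs something between "Morgan" and the matched element, so a two-element name such as "Morgan David" would not be found by it.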
>I am not worried about conveying the technique right now; I just want to
>make sure that an article describing these things would be of use.
Actually, what would be really cool would be a web front end where someone
could type in a name to search on and a backend function would generate the
regular expression and search using it. Then folks who are not technically
inclined could use it easily.
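A rough sketch of what such a backend function might look like, in Python. The function name build_name_regex, the particles parameter, and the rule that the first element is the given name are all assumptions made for illustration. It only reproduces the "Morgan" style of pattern quoted above and knows nothing about spelling variants.

```python
import re

def build_name_regex(name, particles=("ap",)):
    """Build a pattern matching the given name followed by any of the
    byname elements, in any order, each optionally preceded by a particle.

    Hypothetical helper: assumes the first element is the given name.
    """
    elements = [w for w in name.split() if w not in particles]
    if not elements:
        raise ValueError("name has no elements")
    given, bynames = elements[0], elements[1:]
    if not bynames:
        # Nothing to permute; just look for the given name as a whole word.
        return re.compile(re.escape(given) + r"\b")
    byname_alt = "|".join(re.escape(b) for b in bynames)
    particle_alt = "|".join(re.escape(p) for p in particles)
    pattern = rf"{re.escape(given)}(( ({particle_alt}))? ({byname_alt}))+"
    return re.compile(pattern)

query = build_name_regex("Morgan William David")
print(query.pattern)
print(bool(query.search("Morgan ap William ap David")))
print(bool(query.search("Morgan ap Dafydd ap Guillem")))
```

A web front end would then only need to pass the typed name to a function like this and run the resulting pattern against the database.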
>But, it sounds like one of the major issues in conflict checking is knowing
>that these kinds of permutations (either in structure or spelling (e.g.
>Gaelic and Anglicized names)) exist. This would not be within the scope of
>the article (or within my extremely limited education on the subject).
The real problem in online conflict checking for names is figuring out how
to get the conflict to pop up when the elements don't match. Searching on
matching elements isn't hard if you follow the "hints" page at oanda.sca.org.
The trick is finding the conflict for a submission where the elements don't
match that of the submission. For example, here are a few:
[the conflicts are at the end of this email:]
1) Katherine Nyk Donald
2) Megge de Richemond
3) Eliza Davis
4) John de la Mare (actually has 2 conflicts)
>However, with a good understanding of pattern matching, one could maintain a
>list of common transformations. For instance, consider Latin names:
>Male first names ending in -us are synonymous with names ending in -o
>(Davidus = Davido = David). So, in the list one would document:
>-us, -o: (us|o)?
That's nominative versus one of the other declensions (ablative maybe?
It's not striking me as genitive but then I'm tired). In a properly formed
name, that change is based on the position of the element in the name.
>One could add some other common Latin issues:
>J, I: [ij] # J's and I's are sometimes interchanged
>-es, -e: es?
>-as, -a: as?
>So, when researching the name Iohanne, one would look on the Latin list
>above and see that the I, and the -e should be transformed thusly for the
>That would match all common spellings:
Now, some of these aren't Latin forms. For example <Johanne> you see in
Middle English records (non-Latinized). So, the suggestion here is a good
one to jump this language change.
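To make the list idea concrete, here is a small Python sketch that applies the transformations quoted above (I/J, -us/-o, -es/-e, -as/-a) to a single name. The function name latin_variant_regex and the decision to rewrite only one ending per name are assumptions of mine, not something the quoted proposal specifies.

```python
import re

# (endings that trigger the rule, regex fragment that replaces the ending),
# copied from the transformations listed in the quoted message.
ENDINGS = [
    (("us", "o"), "(us|o)?"),
    (("es", "e"), "es?"),
    (("as", "a"), "as?"),
]

def latin_variant_regex(name):
    """Return a case-insensitive pattern covering the listed variants."""
    stem = name.lower()
    tail = ""
    for suffixes, fragment in ENDINGS:
        hit = next((s for s in suffixes if stem.endswith(s)), None)
        if hit:
            # Swap the ending for its regex fragment; only the first
            # matching rule is applied (an assumption of this sketch).
            stem, tail = stem[:-len(hit)], fragment
            break
    head = ""
    if stem[:1] in ("i", "j"):
        # J's and I's are sometimes interchanged.
        head, stem = "[ij]", stem[1:]
    return re.compile(head + re.escape(stem) + tail, re.IGNORECASE)

iohanne = latin_variant_regex("Iohanne")
print(iohanne.pattern)
print([n for n in ("Iohanne", "Johanne", "Iohannes", "Johannes")
       if iohanne.fullmatch(n)])
```

Because <Johanne> also turns up in non-Latinized records, a pattern built this way would catch those forms too.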
>Of course, as you mention Gaelic makes this really difficult. I won't touch
>Kellie = Ceallach with a ten foot pole! However, it seems as though
>Genitive endings are somewhat regular. Ach => Aigh: a(ig|c)h, so we could
>add that to the Irish list.
Actually, there's something like 10 different genitive formations.
And you have to account for decades of misspellings in the O&A.
It's really ugly. Let me think on this one some. My brain is mush at the
moment due to being in the middle of a corporate move that has me working
>So, I hope no one thinks I am trying to move in and make a mess of things.
>I am just curious about this process and talking out loud. I definitely
>don't want to make more work for anyone; I am just quite interested in the
>subject matter! :-)
As someone who deals with name recognition and matching professionally
at the moment (talk about turning your hobby into your work!), I totally
If you're interested in figuring this type of thing out, it could be really
useful as part of a search engine on the O&A. I don't know whether the
interface part is your area or not, but you could probably find someone to
do an interface where a user enters the submitted name and the engine
generates one or more regexes and uses them to search.
An engine that did this would need the programming logic to mimic some of
the sections of the RfS to be really useful and avoid false negative matches.
Given names would need to be handled by a different logic flow than bynames
since diminutives conflict in given names but not necessarily in bynames.
For examples of what conflicts with what, you can get lists to test your
You can ignore the majority of the Gaelic rulings before 04/2002.
For "sound" conflicts, you may be able to run them via a modified version
of soundex (where you use a soundex number for the first letter in addition
to the rest).
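Here is a minimal Python sketch of that modified soundex idea, where the first letter is turned into a digit like every other letter instead of being kept as a letter. The function name modified_soundex, the use of "0" for a first letter that has no code (vowels, H, W, Y), and the four-digit length are assumptions. The letter groups are the standard Soundex ones.

```python
# Standard Soundex letter groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def modified_soundex(word, length=4):
    """Soundex variant that also encodes the first letter as a digit."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    # First letter becomes a digit too ("0" if it has no code; assumption).
    digits = [CODES.get(letters[0], "0")]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # A vowel resets prev, so the same code may repeat after it.
        prev = code
    return "".join(digits)[:length].ljust(length, "0")

for a, b in (("John", "Sean"), ("Davis", "Davies"), ("Richemond", "Richemont")):
    print(a, modified_soundex(a), b, modified_soundex(b))
```

With plain Soundex, John and Sean differ in their leading letter (J500 versus S500). Encoding the first letter puts both at 2500, which is the kind of match wanted for conflict 4. It will not catch everything, though: Katherine and Caitlin still come out different under this sketch.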
And here are the conflicts:
1) Caitlin MacDonnell (05/82 via Meridies)
2) Margaret Richemont (11/94 via Atenveldt)
3) Elizabeth Davies (08/92 via Atenveldt)
4) Jeanne de la Mer (02/85 via Meridies) and Sean Dalamara (10/95 via Atlantia)