I wrote a Python script for a similar purpose several months ago and I can share it here. The CSL styles are parsed with Python’s xml.etree.ElementTree to make further analysis easier.
It’s not trivial to determine directly from their code whether two macros produce the same output, and I haven’t finished this feature. First, many CSL attributes have default values (e.g., <text variable="title" form="long"> and <text variable="title"> produce the same output). However, some of them are context-dependent (e.g., <group delimiter=""> inside a <group delimiter=", "> is not the same as <group>). Second, the condition attributes of <if> and <else-if> accept multiple variables, and their order doesn’t affect the output.
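To illustrate the second point, here is a minimal sketch of making the condition attributes order-insensitive before comparing. It is not the script mentioned above; the function name `normalize_conditions` and the tuple `MULTI_VALUE_CONDITIONS` are made up for this example, and the list of attributes is my reading of the CSL 1.0 condition attributes.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"

# Condition attributes that take a whitespace-separated list of values
# (assumed list; check it against the CSL specification before relying on it).
MULTI_VALUE_CONDITIONS = (
    "type", "variable", "is-numeric", "is-uncertain-date", "locator", "position",
)


def normalize_conditions(root):
    """Sort the values of condition attributes on <if> and <else-if> in place."""
    for tag in ("if", "else-if"):
        for elem in root.iter(f"{{{CSL_NS}}}{tag}"):
            for name in MULTI_VALUE_CONDITIONS:
                if name in elem.attrib:
                    values = sorted(elem.get(name).split())
                    elem.set(name, " ".join(values))
    return root


# Two conditions that differ only in the order of the tested variables.
first = ET.fromstring(
    f'<choose xmlns="{CSL_NS}"><if variable="volume issue"/></choose>'
)
second = ET.fromstring(
    f'<choose xmlns="{CSL_NS}"><if variable="issue volume"/></choose>'
)
same = ET.tostring(normalize_conditions(first)) == ET.tostring(
    normalize_conditions(second)
)
```

This only covers the ordering problem. The context-dependent defaults (such as the nested delimiter case) would need knowledge of the parent elements and are not handled here.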
Yeah, I’d start by just literal matching: are they the exact same macro – since a lot of styles are derived from each other, that’s going to take you pretty far.
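A rough sketch of what literal matching could look like, assuming the styles are available as XML strings keyed by some style ID. The helper names `macro_fingerprints` and `group_identical_macros` are hypothetical. I use `ET.canonicalize` (Python 3.8+) so that indentation and attribute order don't count as differences; the macro name is kept as part of the fingerprint, which is a choice rather than a requirement.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

CSL_NS = "http://purl.org/net/xbiblio/csl"


def macro_fingerprints(style_xml):
    """Yield (macro name, digest) for every top-level macro in a style."""
    root = ET.fromstring(style_xml)
    for macro in root.findall(f"{{{CSL_NS}}}macro"):
        raw = ET.tostring(macro, encoding="unicode")
        # C14N sorts attributes; strip_text drops indentation-only whitespace.
        canonical = ET.canonicalize(raw, strip_text=True)
        yield macro.get("name"), hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_identical_macros(styles):
    """Map each digest to the (style id, macro name) pairs that share it."""
    groups = defaultdict(list)
    for style_id, style_xml in styles.items():
        for name, digest in macro_fingerprints(style_xml):
            groups[digest].append((style_id, name))
    return groups


# Two toy styles whose "title" macro differs only in layout and attribute order.
style_a = (
    f'<style xmlns="{CSL_NS}"><macro name="title">'
    '<text variable="title" font-style="italic"/></macro></style>'
)
style_b = (
    f'<style xmlns="{CSL_NS}">\n  <macro name="title">\n'
    '    <text font-style="italic" variable="title"/>\n  </macro>\n</style>'
)
groups = group_identical_macros({"a": style_a, "b": style_b})
```

Any digest that maps to more than one entry is a macro that appears verbatim in several styles.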
The next low-hanging fruit would be to strip stuff that’s always irrelevant, i.e. form="long", vertical-align="baseline", and various font-...="normal" (I think that’s about it).
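Something like the following would do that stripping step, taking the list above at face value. The name `strip_default_attributes` and the `DEFAULT_ATTRIBUTES` table are made up for the sketch, and I've expanded font-... to font-style, font-variant and font-weight, which is my assumption about what was meant.

```python
import xml.etree.ElementTree as ET

# Attribute values treated as always irrelevant (assumed list, taken from
# the post above; extend or trim it as needed).
DEFAULT_ATTRIBUTES = {
    "form": "long",
    "vertical-align": "baseline",
    "font-style": "normal",
    "font-variant": "normal",
    "font-weight": "normal",
}


def strip_default_attributes(root):
    """Remove attributes whose value equals the assumed default, in place."""
    for elem in root.iter():
        for name, default in DEFAULT_ATTRIBUTES.items():
            if elem.get(name) == default:
                del elem.attrib[name]
    return root


explicit = ET.fromstring('<text variable="title" form="long" font-style="normal"/>')
implicit = ET.fromstring('<text variable="title"/>')
stripped = strip_default_attributes(explicit)
```

After stripping, `explicit` and `implicit` carry the same attributes, so a literal comparison would treat them as identical.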
Everything else is going to be quite complicated.
I agree that is elegant, but I’m worried that we’re developing past actual user needs. What’s the exact user story here? I’m just not convinced that we have a ton of use cases where this would help: most changes people need to make are – thanks to the style matching by the visual editor – quite small, and for folks who struggle with making those style changes, swapping around macros is also going to be a challenge.
For sure there are details to sort out with the idea, but I think forcing duplication across styles probably is too high a cost to pay for any benefit, particularly if we do adopt a new model?
Consider that the best, most widely-used, CSL 1.0 styles are mostly macro definitions.
This, in any case, is the simplest change I’m making, and it is easy to reverse if it is a bad idea (it’s just a few lines of code in the model definition, and independent of the rest of it), or to add to CSL 1.0 if it’s a good idea.
In the new model, the difference between in-style and external templates.
@zepinglee forgive this probably dumb question, but what does the “most-common” property indicate?
PS - I decided to look at some of the very common “author” macros. Seems they’re there pretty much only to configure author substitution. The default substitutions are also extremely common across all the styles.
Hence, I decided to make this the default value in the new model.