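It seems a lot of this project uses `String.toLowerCase()` instead of `String.toLowerCase(Locale.ROOT)`, which causes issues on Turkish locales.
In tr-TR, the no-argument conversions follow the Turkish dotted/dotless I rules, so names containing `i`/`I` no longer survive case conversion: `TITLE` lower-cases to `tıtle`, and (I think more importantly) `Xdiv` upper-cases to `XDİV`, so it no longer matches `XDIV`/`xdiv`. I think this is breaking layout or parsing of elements.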
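A minimal, standalone sketch of the underlying JDK behavior (independent of CSSBox), comparing an explicit Turkish locale with `Locale.ROOT`. The class name `TurkishCaseDemo` is just for illustration:

```java
import java.util.Locale;

public class TurkishCaseDemo {
    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Turkish maps uppercase 'I' to dotless 'ı' (U+0131)...
        System.out.println("TITLE".toLowerCase(turkish));      // tıtle
        System.out.println("TITLE".toLowerCase(Locale.ROOT));  // title

        // ...and lowercase 'i' to dotted 'İ' (U+0130).
        System.out.println("Xdiv".toUpperCase(turkish));       // XDİV
        System.out.println("Xdiv".toUpperCase(Locale.ROOT));   // XDIV

        // So a name comparison that depends on the locale can silently fail:
        System.out.println("XDIV".toLowerCase(turkish).equals("xdiv")); // false
    }
}
```

The no-argument `toLowerCase()`/`toUpperCase()` behave like the `turkish` lines above whenever the JVM default locale is tr-TR; passing `Locale.ROOT` makes the result independent of the user's locale.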
For the same document, my `BoxRenderer` receives the following elements through `startElementContents` on an English locale:
- `Xdiv`, `html`, `body`, `label`, `Xspan`, `select`, `option`, ...
However, if I set `-Duser.country=TR -Duser.language=tr`, I instead get the following:
- `Xdiv`, `html`, `body`, `Xdiv`, `label`, ...
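To reproduce this inside a single test run instead of restarting the JVM with `-D` flags, one option is to swap the default locale temporarily with `Locale.setDefault`, which is what the no-argument `toLowerCase()`/`toUpperCase()` consult. This is a sketch under that assumption; `WithLocale` is a hypothetical helper, and the CSSBox rendering call is left as a comment rather than guessed:

```java
import java.util.Locale;

public final class WithLocale {
    private WithLocale() {}

    // Runs the action with the JVM default locale temporarily replaced,
    // restoring the previous default afterwards even if the action throws.
    public static void run(Locale locale, Runnable action) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            action.run();
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        run(Locale.forLanguageTag("tr-TR"), () -> {
            // Hypothetical: render the same document with your BoxRenderer here
            // and record the element names passed to startElementContents.
            System.out.println("title".toUpperCase()); // TİTLE under tr-TR
        });
    }
}
```

Note that `Locale.setDefault` is JVM-wide, so tests using it should not run in parallel with other locale-sensitive tests.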
Clearly there's a different code path in the different locale. I suspect it comes from `toLowerCase()` and/or `toUpperCase()` calls throughout either CSSBox or jStyleParser, for example in `cz.vutbr.web.csskit.ElementMatcherSafeCI#matchesClass` or `org.fit.cssbox.css.HTMLNorm#attributesToStyles`.
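As an illustration of the kind of fix I mean (not the actual code of `ElementMatcherSafeCI` or `HTMLNorm`), a locale-safe case-insensitive class match can normalize both sides with `Locale.ROOT`. `LocaleSafeMatch`, `normalizeName` and `hasClass` are hypothetical names:

```java
import java.util.Locale;

public final class LocaleSafeMatch {
    private LocaleSafeMatch() {}

    // Lower-cases an element, attribute or class name independently of the JVM default locale.
    static String normalizeName(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: does a whitespace-separated class attribute contain className,
    // compared case-insensitively using Locale.ROOT lower-casing on both sides?
    static boolean hasClass(String classAttr, String className) {
        String wanted = normalizeName(className);
        for (String token : classAttr.trim().split("\\s+")) {
            if (normalizeName(token).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(Locale.forLanguageTag("tr-TR"));
        try {
            System.out.println(normalizeName("XDIV"));          // xdiv, even under tr-TR
            System.out.println(hasClass("nav TITLE", "title")); // true
        } finally {
            Locale.setDefault(saved);
        }
    }
}
```

The same one-argument change (`toLowerCase(Locale.ROOT)` / `toUpperCase(Locale.ROOT)`) would apply to every call site that normalizes identifiers rather than user-visible text.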
Related: radkovo/jStyleParser#29