Entities are the biggest hurdle!
All over the place, people are trying to sort this out, but they seem to be looking at it the wrong way.
I have successfully created a char length hook using another popular HTML editor, which offers ideal hook points not only for the parsing but also for 'when' it runs (keyup, onchange, etc.); the latter can really bog down some entry points if chosen badly... and yes, it MUST be tested on large bodies of HTML code, not just a few lines!!
I discovered that reliable entity identification, and the subsequent conversion, was the final hurdle to updating a text count value in a separate div on the page. Cleaning up the other stuff is, I believe, trivial (just attack it with a regex, or two!!).
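Since the hook itself isn't shown, here is a minimal TypeScript sketch (not the code from that other editor) of the counting step, assuming you already have the editor's content as a plain HTML string. `decodeEntities`, `visibleLength` and the entity table are hypothetical names; the table deliberately covers only a handful of named entities, whereas a full decoder would need the complete HTML5 list. Tags are stripped with a naive regex before decoding, so a decoded `&lt;` can never be mistaken for a tag.

```typescript
// Deliberately small, hypothetical table of named entities; extend as needed.
// A Map is used so a lookup like "&constructor;" can't hit Object.prototype.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["nbsp", "\u00A0"],
  ["copy", "\u00A9"],
]);

// Named (&amp;), decimal (&#169;) and hex (&#xA9;) references, each ending in ';'.
const ENTITY_RE = /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z][a-zA-Z0-9]*));/g;

// Convert a numeric code point, keeping the original text if it is out of range.
function fromCodePointOr(cp: number, fallback: string): string {
  return cp >= 0 && cp <= 0x10ffff ? String.fromCodePoint(cp) : fallback;
}

// Replace recognised entity references with the characters they stand for.
// Unknown named entities are left untouched rather than guessed at.
function decodeEntities(text: string): string {
  return text.replace(
    ENTITY_RE,
    (match: string, dec?: string, hex?: string, name?: string): string => {
      if (dec !== undefined) return fromCodePointOr(parseInt(dec, 10), match);
      if (hex !== undefined) return fromCodePointOr(parseInt(hex, 16), match);
      const ch = name !== undefined ? NAMED_ENTITIES.get(name) : undefined;
      return ch !== undefined ? ch : match;
    },
  );
}

// Visible character count of an HTML fragment: strip comments and tags FIRST,
// then decode entities. Naive: assumes no '>' inside attribute values and
// does not collapse whitespace.
function visibleLength(html: string): number {
  const textOnly = html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<[^>]*>/g, "");
  // Array.from iterates by code point, so an emoji counts as 1, not 2.
  return Array.from(decodeEntities(textOnly)).length;
}
```

For example, `visibleLength("<p>Fish &amp; chips&nbsp;&#169;</p>")` gives 14, with each entity counted as the single character it represents. Whether a non-breaking space or a line break should count at all is a policy decision for your own limit.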
Sorry I can't be more specific, but I can't use this HTML editor (iRite) because of the serial number issue I have mentioned elsewhere in the forums (frustratingly, it requires going online every time despite being paid for). Having gone back to the other, manual editor implementation, I haven't sorted out this issue for this JS software.
Maybe there's a clue there somewhere.
At the end of the day, naturally, the most important stage is the server-side verification of the POST data; that is what really counts (no pun intended) for data validation, no matter what client-side scrubbing can do (and the user can bypass that anyway).
PS PS Edit:
I forgot to add my bit about blocking chars beyond a predetermined length. As I wrote, one can parse the char length and show, e.g., a larger red font when the limit is reached or exceeded.
But as for blocking further entry, I suggest, as many have discovered even with plain-vanilla textareas, that one shouldn't bother; every method has been tried. What happens is that some visitors are guaranteed to be frustrated by...
1) the weird results the various char-limiting methods cause, with the cursor ending up all over the joint...
2) a cursor placed in the middle of the existing text (on purpose or accidentally), which causes weird and not-so-wonderful results when a key is pressed.
3) the fact that it is very difficult (there are JS thingies that try to do it) to CUT the body text at a suitable place BETWEEN HTML tags and repair any tag "pairs" that would be affected too!!
So if you're hoping for sales conversions, important messages, etc., DON'T play with char length blocking. Just let visitors know visually that the entry, as it stands, will be truncated.
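The "just let them know visually" approach could look something like this minimal TypeScript sketch (one possible way, not the author's implementation): a pure function turns a char count and a limit into what the separate count div should show, going red and larger once the limit is reached and adding a truncation notice once it is exceeded. `counterState`, `renderCounter`, the colour and the font size are all hypothetical choices. It only reports; it never cancels keystrokes, which sidesteps the cursor problems listed above.

```typescript
// What the counter should display; a hypothetical shape, not any editor's API.
interface CounterState {
  text: string;
  reached: boolean;  // count >= limit: highlight the counter
  exceeded: boolean; // count > limit: the extra text will be truncated
  color: string;
  fontSize: string;
}

// Pure function: decides how the counter should look. It never blocks typing.
function counterState(count: number, limit: number): CounterState {
  const reached = count >= limit;
  const exceeded = count > limit;
  const base = `${count} / ${limit}`;
  return {
    text: exceeded ? `${base}: text after char ${limit} will be truncated` : base,
    reached,
    exceeded,
    color: reached ? "red" : "inherit",
    fontSize: reached ? "1.25em" : "1em",
  };
}

// Anything with textContent and style.color/fontSize, such as the count div.
// Structural typing keeps this testable without a real DOM.
interface CounterTarget {
  textContent: string | null;
  style: { color: string; fontSize: string };
}

function renderCounter(target: CounterTarget, count: number, limit: number): void {
  const s = counterState(count, limit);
  target.textContent = s.text;
  target.style.color = s.color;
  target.style.fontSize = s.fontSize;
}
```

In a page you might call `renderCounter(document.getElementById("char-count")!, count, 500)` from whichever editor hook you settle on; the id `char-count` and the limit of 500 are made up for illustration. The actual cut, if any, still belongs in the server-side check of the POST data.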
Hope that helps.
(my programs never contain bugs, but do occasionally include FREE undocumented extras)