Safely truncate Sitecore Rich Text / Html

We needed to truncate text (HTML) entered in a Rich Text field so that it could be displayed on compact modules, in article summaries and so on.
There are many ways to truncate HTML safely, but here’s the code that passed our testing and is currently live! It works by parsing the HTML as XML.

Please note:
This code truncates the text to the nearest word. If the character limit is reached within the first word, the truncation is done mid-word instead of returning no text at all. You can also optionally append an ellipsis to the truncated text.
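
To make the rule above concrete, here is a simplified sketch of the word-boundary behaviour on its own. TruncateAtWord is a hypothetical helper written only for this illustration, and it breaks on spaces alone, whereas the actual code in this post also breaks on punctuation.

```csharp
using System;

public static class WordTruncation
{
    // Hypothetical helper (not part of the original code): cuts at the last
    // space at or before maxChars, and falls back to a hard cut when the
    // first word alone is longer than the limit.
    public static string TruncateAtWord(string s, int maxChars)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;
        if (maxChars <= 0 || s.Length <= maxChars) return s;

        // Look one character past the limit so that a space sitting
        // exactly at the boundary still counts as a word break.
        string window = s.Substring(0, maxChars + 1);
        int lastSpace = window.LastIndexOf(' ');

        return lastSpace > 0
            ? window.Substring(0, lastSpace)   // cut at the word boundary
            : s.Substring(0, maxChars);        // no boundary found: cut mid-word
    }

    public static void Main()
    {
        Console.WriteLine(TruncateAtWord("The quick brown fox", 12)); // The quick
        Console.WriteLine(TruncateAtWord("Supercalifragilistic", 5)); // Super
    }
}
```

The first call stops at the last complete word that fits in 12 characters, while the second has no space to fall back on and so cuts inside the word.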

        public static string TruncateHtml(string text, int charCount, bool appendEllipses = false)
        {
            text = HttpUtility.HtmlDecode(text);

            if (charCount <= 0) return text;

            try
            {
                // wrap the input in a single root element so that plain text and
                // sibling fragments parse as XML (the element name is arbitrary)
                XmlDocument xml = new XmlDocument();
                xml.LoadXml("<root>" + text + "</root>");
                // create a navigator, this is our primary tool
                XPathNavigator navigator = xml.CreateNavigator();
                XPathNavigator breakPoint = null;

                // find the text node we need:
                while (navigator.MoveToFollowing(XPathNodeType.Text))
                {
                    int remainingCharacters = charCount;
                    charCount -= navigator.Value.Length;
                    if (charCount <= 0)
                    {
                        string lastText = TruncateText(navigator.Value, remainingCharacters) + (StripHtml(text).Length > charCount && appendEllipses ? "..." : "");
                        navigator.SetValue(lastText);
                        breakPoint = navigator.Clone();
                        break;
                    }
                }

                // first remove text nodes, because Microsoft unfortunately merges them without asking
                while (navigator.MoveToFollowing(XPathNodeType.Text))
                {
                    if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After)
                    {
                        navigator.DeleteSelf();
                    }
                }

                // return to the break point, then delete the elements that follow it
                navigator.MoveTo(breakPoint);
                while (navigator.MoveToFollowing(XPathNodeType.Element))
                {
                    if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After)
                    {
                        navigator.DeleteSelf();
                    }
                }

                // move to the outer wrapper element and return its contents
                navigator.MoveToRoot();
                navigator.MoveToFollowing(XPathNodeType.Element);
                return navigator.InnerXml;
            }
            catch (Exception)
            {
                return text;
            }
        }

        private static string TruncateText(string str, int maxCharCount, bool appendEllipses = false)
        {
            if (string.IsNullOrWhiteSpace(str)) return string.Empty;

            str = HttpUtility.HtmlDecode(str);
            if (str.Length <= maxCharCount || maxCharCount == 0) return str;

            int originalStrLen = str.Length;
            if (str.Length > maxCharCount)
                str = str.Substring(0, maxCharCount + 1);

            int ellipsePos = str.LastIndexOfAny(new[] { ' ', '.', ',', ';', '!', '-', ']', '}', ')', '*' });
            if (ellipsePos != -1 && str.Length > ellipsePos && ellipsePos > 0)
                str = str.Substring(0, ellipsePos);

            if (ellipsePos == -1)
            {
                str = str.Substring(0, str.Length - 1);
            }

            if ((str.Length < originalStrLen) && appendEllipses)
            {
                str = str.TrimEnd('.').TrimEnd(',').TrimEnd(';').TrimEnd('-');
            }

            return str + ((str.Length < originalStrLen) && appendEllipses ? "..." : string.Empty);
        }

The TruncateText() method only truncates a plain string to the nearest word. The TruncateHtml() method parses the passed-in text as XML and truncates it one text node at a time, so the result stays valid XML and hence valid HTML.
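
To show what TruncateHtml() sees while it counts characters, here is a small sketch that lists the text nodes of a fragment in document order, using the same MoveToFollowing(XPathNodeType.Text) call. The TextNodeWalk class and the "root" wrapper element name are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class TextNodeWalk
{
    // Collects the value of every text node, in document order.
    public static List<string> TextNodes(string html)
    {
        var xml = new XmlDocument();
        // "root" is an arbitrary wrapper name chosen for this sketch.
        xml.LoadXml("<root>" + html + "</root>");

        XPathNavigator navigator = xml.CreateNavigator();
        var values = new List<string>();

        // MoveToFollowing walks forward through the document,
        // stopping at each text node regardless of nesting depth.
        while (navigator.MoveToFollowing(XPathNodeType.Text))
        {
            values.Add(navigator.Value);
        }
        return values;
    }

    public static void Main()
    {
        foreach (string value in TextNodes("<p>Hello <b>world</b>!</p>"))
        {
            Console.WriteLine("[" + value + "]");
        }
    }
}
```

Because only the text nodes are counted, the tags themselves never use up any of the character limit, and the cut always lands inside a text node rather than inside a tag.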

We also wrap the passed-in text in an outer XML element, so that free text without any markup can be parsed in the same way.
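
A quick way to see why the wrapper matters is to try loading unwrapped input. The sketch below is only an illustration: the WrapperDemo class, the ParsesAsXml helper and the "root" element name are all assumptions, not part of the original code.

```csharp
using System;
using System.Xml;

public static class WrapperDemo
{
    // Returns true when the input can be loaded as a standalone XML document.
    public static bool ParsesAsXml(string input)
    {
        try
        {
            new XmlDocument().LoadXml(input);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string plain = "Just some free text";
        string fragments = "<p>One</p><p>Two</p>";

        Console.WriteLine(ParsesAsXml(plain));                             // False: no root element
        Console.WriteLine(ParsesAsXml(fragments));                         // False: two top-level elements
        Console.WriteLine(ParsesAsXml("<root>" + plain + "</root>"));      // True
        Console.WriteLine(ParsesAsXml("<root>" + fragments + "</root>"));  // True
    }
}
```

Rich Text fields commonly hold several sibling paragraphs, so without a single outer element most real content would fail to load and fall through to the catch block.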

When it came to using this in Sitecore, here’s an example of how we have put this to use!

@Html.Raw(Editable(Model, m => m.Title, x => TextUtilityStatic.TruncateHtml(x.Title, 50, true)))

So this way, the field would still be editable in page editor mode, but output the truncated html on the page!
