How to Parse HTML Tag from String in SQL Server

How to parse html tags and return only text from string in SQL Server without using regular expressions or remove text between < and > and get only text from string in SQL Server

To implement this functionality we need to create one user defined function to parse html text and return plain text.

Function to replace html tags in string

CREATE FUNCTION [dbo].[RemoveHTMLTag]

(@TextHTML VARCHAR(MAX)

)

RETURNS VARCHAR(MAX)

AS

     BEGIN

 

         –DECLARE @TextHTML VARCHAR(MAX) = ‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’;

 

         DECLARE @StartText INT, @EndText INT, @Length INT;

       

        SET @StartText = CHARINDEX(‘<‘, @TextHTML);

         SET @EndText = CHARINDEX(‘>’, @TextHTML, CHARINDEX(‘<‘, @TextHTML));

         SET @Length = (@EndText @StartText) + 1;

         WHILE @StartText > 0

               AND @EndText > 0

               AND @Length > 0

             BEGIN

                 SET @TextHTML = STUFF(@TextHTML, @StartText, @Length, );

                 SET @StartText = CHARINDEX(‘<‘, @TextHTML);

                 SET @EndText = CHARINDEX(‘>’, @TextHTML, CHARINDEX(‘<‘, @TextHTML));

                 SET @Length = (@EndText @StartText) + 1;

             END;

         RETURN LTRIM(RTRIM(@TextHTML));

     END;

GO

 

SELECT dbo.RemoveHTMLTag(‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’);

Parse HTML Tag from String in SQL Server

Also can try following option,

CREATE FUNCTION [dbo].[RemoveHTMLTag]

(–@TextHTML VARCHAR(MAX)

)

RETURNS VARCHAR(MAX)

AS

     BEGIN

         –DECLARE @TextHTML VARCHAR(MAX)= ‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’;

 

         DECLARE @ReplaceTxt XML, @result VARCHAR(MAX);

        

        SET @ReplaceTxt = REPLACE(@TextHTML, ‘&’, );

         WITH doc(contents)

              AS (SELECT chunks.chunk.query(‘.’)

                  FROM @ReplaceTxt.nodes(‘/’) AS chunks(chunk))

              SELECT @result = contents.value(‘.’, ‘varchar(max)’)

              FROM doc;

         SELECT @result;

     END;

GO

SELECT dbo.RemoveHTMLTag(‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’);

By using above UDF, We can clean the HTML tags from the string.

Happy SQL Querying 🙂

Advertisements