How to Parse HTML Tag from String in SQL Server

How to parse html tags and return only text from string in SQL Server without using regular expressions or remove text between < and > and get only text from string in SQL Server

To implement this functionality we need to create one user defined function to parse html text and return plain text.

Function to replace html tags in string

CREATE FUNCTION [dbo].[RemoveHTMLTag]

(@TextHTML VARCHAR(MAX)

)

RETURNS VARCHAR(MAX)

AS

     BEGIN

 

         –DECLARE @TextHTML VARCHAR(MAX) = ‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’;

 

         DECLARE @StartText INT, @EndText INT, @Length INT;

       

        SET @StartText = CHARINDEX(‘<‘, @TextHTML);

         SET @EndText = CHARINDEX(‘>’, @TextHTML, CHARINDEX(‘<‘, @TextHTML));

         SET @Length = (@EndText @StartText) + 1;

         WHILE @StartText > 0

               AND @EndText > 0

               AND @Length > 0

             BEGIN

                 SET @TextHTML = STUFF(@TextHTML, @StartText, @Length, );

                 SET @StartText = CHARINDEX(‘<‘, @TextHTML);

                 SET @EndText = CHARINDEX(‘>’, @TextHTML, CHARINDEX(‘<‘, @TextHTML));

                 SET @Length = (@EndText @StartText) + 1;

             END;

         RETURN LTRIM(RTRIM(@TextHTML));

     END;

GO

 

SELECT dbo.RemoveHTMLTag(‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’);

Parse HTML Tag from String in SQL Server

Also can try following option,

CREATE FUNCTION [dbo].[RemoveHTMLTag]

(–@TextHTML VARCHAR(MAX)

)

RETURNS VARCHAR(MAX)

AS

     BEGIN

         –DECLARE @TextHTML VARCHAR(MAX)= ‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’;

 

         DECLARE @ReplaceTxt XML, @result VARCHAR(MAX);

        

        SET @ReplaceTxt = REPLACE(@TextHTML, ‘&’, );

         WITH doc(contents)

              AS (SELECT chunks.chunk.query(‘.’)

                  FROM @ReplaceTxt.nodes(‘/’) AS chunks(chunk))

              SELECT @result = contents.value(‘.’, ‘varchar(max)’)

              FROM doc;

         SELECT @result;

     END;

GO

SELECT dbo.RemoveHTMLTag(‘This <i>is</i> Microsoft <b>SQL</b> Query <i>Test</i>’);

By using above UDF, We can clean the HTML tags from the string.

Happy SQL Querying 🙂

Advertisements

SQL Tips and Tricks

— TO GET STORED PROCEDURE TEXT/INFORMATION

SP_HELPTEXT  ‘<STORE PROCEDURE NAME>’;

— TO RENAME THE STORED PROCEDURE

SP_RENAME ‘<CURRENT STORED PROCEDURE NAME>’, ‘<NEW STORED PROCEDURE NAME>’;

— TO CHECK TABLE DEPENDENCIES

SELECT DISTINCT SYSOBJ.NAME, SYSOBJ.XTYPE

FROM SYSCOMMENTS SYSCOM

INNER JOIN SYSOBJECTS SYSOBJ ON SYSCOM.ID = SYSOBJ.ID

WHERE SYSCOM.TEXT LIKE ‘%<TABLE NAME>%’;

— TO FIND TABLE NAME CONTAINS COLUMN NAME

SELECT COLUMN_NAME, TABLE_NAME

FROM INFORMATION_SCHEMA.COLUMNS

WHERE COLUMN_NAME LIKE ‘%<COLUMN NAME>%’;

— TO SEARCH STORED PROCEDURE CONTAINING A CPARTICULAR TEXT

SELECT ROUTINE_NAME, ROUTINE_DEFINITION

FROM INFORMATION_SCHEMA.ROUTINES

WHERE ROUTINE_DEFINITION LIKE ‘%<COLUMN NAME>%’

AND ROUTINE_TYPE = ‘procedure’;

— TO GET SEQUENCE NO

SELECT ROW_NUMBER() OVER(PARTITION BY OfficeLocation ORDER BY UserID ASC) AS RecordNo , UserID , UserEmail , OfficeLocation , UserName FROM dbo.UserMaster;

— TO SEARCH STORED PROCEDURE CONTAINING A CPARTICULAR TEXT

Getting the next value of the primary key from MSSQL table

SELECT IDENT_CURRENT(‘<Table Name>‘);

SELECT IDENT_CURRENT(‘<Table Name>‘)+IDENT_INCR(‘<Table Name>‘);

SELECT MAX(<primary key column name>) FROM <Table Name>;

This won’t work if the database is empty, because it returns NULL.

⇒ Please note that this method can be used to predict the next id, but does not guarantee this return value. Because the time you execute request and time it take to display result, another transaction occurred, making return value null and void.

IDENT_CURRENT (Transact-SQL)

Returns the last identity value generated for a specified table or view. The last identity value generated can be for any session and any scope.

Replace multiple space with single space – SQL Server

Replace multiple space with single space – SQL Server

DECLARE @Temp TABLE (OriginalString VARCHAR(8000))

INSERT INTO @Temp (OriginalString)
SELECT ‘ Hello. This is Hello World. ‘

SELECT REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(OriginalString)), ‘ ‘, ‘ ‘ + CHAR(7)), CHAR(7) + ‘ ‘, ”), CHAR(7), ”) AS [result]
FROM @Temp
WHERE CHARINDEX(‘ ‘, OriginalString) > 0

Happy Coding 🙂