Optimizing a WebClient that fetches page source [Solved]

Anonymous user
Hello,
I'm developing a feature for my program that analyzes the text of a web page by downloading the page's source code and comparing it against a list of words. I use the FiddlerCore API; Fiddler is a proxy that captures every URL passing through the PC. The problem is that the API captures all URLs, so my RichTextBox also lists the URLs of png, jpg, js ... files. For the Developpez.com home page, for example, I get about fifty URLs. My program can't handle that many requests, and most of them are useless, since a .js, png, or jpg file can't be analyzed as text.
So I need either my RichTextBox to show only the real page URLs, for example http://www.developpez.com/, or my program to create a new WebClient for each URL in the RichTextBox.
Do you think that's possible?
I hope someone can help, because I have been spending several hours a day for weeks trying to solve this.
Below are the link to the FiddlerCore API, my source code, and the project.
Thanks for your help!
http://fiddler2.com/fiddlercore
https://dl.dropboxusercontent.com/u/79764502/FiddlerCoreVB.zip
Option Explicit On
Imports Fiddler
Imports System.Net
Imports System.IO
 
Public Class Form1
 
    Dim url As String = Nothing
    Public Sub New()
 
        ' This call is required by the designer.
        InitializeComponent()
 
        ' Add any initialization after the InitializeComponent() call.
 
        AddHandler FiddlerApplication.BeforeResponse, AddressOf FiddlerBeforeResponseHandler
        AddHandler FiddlerApplication.BeforeRequest, AddressOf FiddlerBeforeRequestHandler
 
        AddHandler Application.ApplicationExit, AddressOf ShutdownFiddlerApp
 
        Dim oFlags As FiddlerCoreStartupFlags = FiddlerCoreStartupFlags.Default
        FiddlerApplication.Startup(0, oFlags)
        MsgBox("Started proxy on port " & FiddlerApplication.oProxy.ListenPort)
    End Sub
 
    Private Sub ShutdownFiddlerApp()
        FiddlerApplication.Shutdown()
        MsgBox("Unloaded proxy")
        Threading.Thread.Sleep(1000)
 
    End Sub
 
    Private Sub FiddlerBeforeRequestHandler(ByVal tSession As Session)
        ' Fiddler raises this event off the UI thread, so marshal the URL to the UI thread.
        RichTextBox1.BeginInvoke(New AsyncMethodCaller(AddressOf AddText), tSession.fullUrl)
    End Sub

    Private Sub FiddlerBeforeResponseHandler(ByVal tSession As Session)
    End Sub

    ' Downloads the page at the given URL and reports every listed word found in it.
    ' The URL is passed as the thread argument so concurrent checks do not share one field.
    Public Sub Verification(ByVal state As Object)
        Dim pageUrl As String = CStr(state)
        Try
            Dim Page As String
            Using client As New WebClient()
                ' Download the page once (the original fetched it twice).
                Page = client.DownloadString(pageUrl).ToLower()
            End Using
            ' Read the list into a local array so it does not grow on every call.
            Dim ListeMot As String() = IO.File.ReadAllLines("C:\Users\Clément\Documents\ListeDeMots.txt")
            For Each s As String In ListeMot
                If Page.Contains(" " & s.ToLower() & " ") Then
                    MsgBox("A forbidden word was detected: " & s)
                End If
            Next
        Catch ex As Exception
            ' Log the failure instead of silently swallowing it.
            System.Diagnostics.Debug.WriteLine(ex.Message)
        End Try
    End Sub

    Private Sub AddText(sText As String)
        RichTextBox1.AppendText(sText & vbCrLf)
        ' One worker thread per URL; the URL travels with the thread.
        Dim Thread As New Threading.Thread(AddressOf Verification)
        Thread.Start(sText)
    End Sub
 
    Protected Overrides Sub Finalize()
        MyBase.Finalize()
    End Sub
 
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
 
    End Sub
End Class
 
Public Delegate Sub AsyncMethodCaller(sText As String)

2 replies


Hello,
I found the solution after a very long week of searching. I'm posting my code below in case it helps someone, because I really struggled to find it!
Option Explicit On
Imports Fiddler
Imports System.Net
Imports System.IO

Public Class Form1

    Dim url As String = Nothing

    Public Sub New()
        ' This call is required by the designer.
        InitializeComponent()
        ' Add any initialization after the InitializeComponent() call.
        AddHandler FiddlerApplication.BeforeResponse, AddressOf FiddlerBeforeResponseHandler
        AddHandler FiddlerApplication.BeforeRequest, AddressOf FiddlerBeforeRequestHandler
        AddHandler Application.ApplicationExit, AddressOf ShutdownFiddlerApp
        Dim oFlags As FiddlerCoreStartupFlags = FiddlerCoreStartupFlags.Default
        FiddlerApplication.Startup(0, oFlags)
        MsgBox("Started proxy on port " & FiddlerApplication.oProxy.ListenPort)
    End Sub

    Private Sub ShutdownFiddlerApp()
        FiddlerApplication.Shutdown()
        MsgBox("Unloaded proxy")
        Threading.Thread.Sleep(1000)
    End Sub

    Private Sub FiddlerBeforeRequestHandler(ByVal tSession As Session)
        RichTextBox1.BeginInvoke(New AsyncMethodCaller(AddressOf AddText), tSession.fullUrl)
    End Sub

    Private Sub FiddlerBeforeResponseHandler(ByVal tSession As Session)
    End Sub

    ' Checks one URL, then the page behind it, against the word list.
    ' The URL arrives as the thread argument, so concurrent checks do not share a field.
    Public Sub Verification(ByVal state As Object)
        Dim pageUrl As String = CStr(state)
        Try
            ' Read the list into a local array so it does not grow on every call.
            Dim ListeMot As String() = IO.File.ReadAllLines("C:\Users\Clément\Documents\ListeDeMots.txt")
            Dim lowerUrl As String = pageUrl.ToLower()
            For Each s As String In ListeMot
                Dim w As String = s.ToLower()
                ' In a URL, a word counts only when delimited by "-" or "." on both sides.
                If lowerUrl.Contains("-" & w & "-") OrElse lowerUrl.Contains("." & w & ".") OrElse lowerUrl.Contains("." & w & "-") OrElse lowerUrl.Contains("-" & w & ".") Then
                    MsgBox("A forbidden word was detected in the URL: " & s)
                End If
            Next
            Dim Page As String
            Using client As New WebClient()
                ' Download the page once (the original fetched it twice).
                Page = client.DownloadString(pageUrl).ToLower()
            End Using
            For Each s As String In ListeMot
                If Page.Contains(" " & s.ToLower() & " ") Then
                    MsgBox("A forbidden word was detected: " & s)
                End If
            Next
        Catch ex As Exception
            ' Log the failure instead of silently swallowing it.
            System.Diagnostics.Debug.WriteLine(ex.Message)
        End Try
    End Sub

    Private Sub AddText(sText As String)
        Dim url2 As Uri = Nothing
        ' Ignore anything that is not a valid absolute URL instead of throwing on the UI thread.
        If Not Uri.TryCreate(sText, UriKind.Absolute, url2) Then Return
        ' Skip static resources (scripts, images, stylesheets): they are not page text.
        ' ":443" is kept from the original check.
        Dim absPath As String = url2.AbsolutePath.ToLower()
        For Each suffix As String In {".js", ".jpg", ".gif", ".png", ".css", ".ico", ":443"}
            If absPath.EndsWith(suffix) Then Return
        Next
        RichTextBox1.AppendText(sText & vbCrLf)
        Dim Thread As New Threading.Thread(AddressOf Verification)
        Thread.Start(sText)
    End Sub

    Protected Overrides Sub Finalize()
        MyBase.Finalize()
    End Sub
End Class

Public Delegate Sub AsyncMethodCaller(sText As String)
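
The extension check in AddText above can also be written as a small standalone function, which makes it easier to test and to extend. What follows is only a sketch under assumptions: the names IsPageUrl and Ignorees and the sample image URL are hypothetical and do not come from the original project. The comparison ignores case, so ".PNG" is treated like ".png", and the query string is ignored because only Uri.AbsolutePath is examined.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module UrlFilterSketch
    ' Hypothetical list: extensions treated as non-text resources.
    Private ReadOnly Ignorees As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase) From {
        ".js", ".jpg", ".jpeg", ".gif", ".png", ".css", ".ico"}

    ' Returns True when the text is an absolute URL whose path
    ' does not end with one of the ignored extensions.
    Function IsPageUrl(ByVal text As String) As Boolean
        Dim parsed As Uri = Nothing
        If Not Uri.TryCreate(text, UriKind.Absolute, parsed) Then Return False
        ' AbsolutePath excludes the query string, so "logo.PNG?v=2" yields ".PNG".
        Return Not Ignorees.Contains(Path.GetExtension(parsed.AbsolutePath))
    End Function

    Sub Main()
        Console.WriteLine(IsPageUrl("http://www.developpez.com/"))                      ' True
        Console.WriteLine(IsPageUrl("http://www.developpez.com/template/logo.PNG?v=2")) ' False
    End Sub
End Module
```

An extension is only a hint: a page can be served from a URL ending in anything, so checking the Content-Type header of the response is generally more reliable than checking the URL when that header is available.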

Hello,

I optimized the code once more: the program crashed during video playback because of the very large number of URLs.
Option Explicit On
Imports Fiddler
Imports System.Net
Imports System.IO
Imports Microsoft.Win32

Public Class Form1

    Dim ListeMots As New List(Of String)
    Dim url As String = Nothing
    Public Delegate Sub AsyncMethodCaller(sText As String)

   

    Private Sub ShutdownFiddlerApp()
        FiddlerApplication.Shutdown()
        MsgBox("Unloaded proxy")
        Threading.Thread.Sleep(1000)
    End Sub

    Private Sub FiddlerBeforeRequestHandler(ByVal tSession As Session)
        RichTextBox1.BeginInvoke(New AsyncMethodCaller(AddressOf AddText), tSession.fullUrl)
    End Sub

    Private Sub FiddlerBeforeResponseHandler(ByVal tSession As Session)
    End Sub

    ' Clears the list and shows the alert. Clear() is marshaled to the UI thread
    ' because worker threads must not touch controls directly.
    Private Sub ShowAlert(ByVal message As String)
        RichTextBox1.Invoke(New MethodInvoker(AddressOf RichTextBox1.Clear))
        MsgBox(message)
    End Sub

    ' Downloads the page once and looks for listed words in its text.
    Public Sub VerificationPage(ByVal state As Object)
        Try
            Dim Page As String
            Using client As New WebClient()
                Page = client.DownloadString(CStr(state)).ToLower()
            End Using
            For Each s As String In ListeMots
                Dim w As String = s.ToLower()
                If Page.Contains(" " & w & " ") OrElse Page.Contains("<h1>" & w & "</h1>") Then
                    ShowAlert("A forbidden word was detected: " & s)
                End If
            Next
        Catch ex As Exception
            System.Diagnostics.Debug.WriteLine(ex.Message)
        End Try
    End Sub

    ' Looks for listed words in the URL itself, delimited by "-" or ".".
    Public Sub VerificationURL(ByVal state As Object)
        Try
            Dim lowerUrl As String = CStr(state).ToLower()
            For Each s As String In ListeMots
                Dim w As String = s.ToLower()
                If lowerUrl.Contains("-" & w & "-") OrElse lowerUrl.Contains("." & w & ".") OrElse lowerUrl.Contains("." & w & "-") OrElse lowerUrl.Contains("-" & w & ".") Then
                    ShowAlert("A forbidden word was detected in the URL: " & s)
                End If
            Next
        Catch ex As Exception
            System.Diagnostics.Debug.WriteLine(ex.Message)
        End Try
    End Sub

    Private Sub AddText(sText As String)
        Try
            ' Load the word list once, on the UI thread, before any worker reads it.
            If ListeMots.Count = 0 Then
                ListeMots.AddRange(IO.File.ReadAllLines("C:\Users\" & System.Environment.UserName & "\Documents\ListeDeMots.txt"))
            End If
        Catch ex As Exception
            System.Diagnostics.Debug.WriteLine(ex.Message)
        End Try
        ' Keep the display short: start over once it holds more than 40 URLs.
        ' The last entry of Lines is the empty line after the final line break.
        Dim nbrligne As Integer = RichTextBox1.Lines.Length - 1
        If nbrligne > 40 Then RichTextBox1.Clear()
        Dim url2 As Uri = Nothing
        If Not Uri.TryCreate(sText, UriKind.Absolute, url2) Then Return
        ' Skip very long URLs and URLs that are already listed.
        If url2.ToString().Length >= 200 OrElse RichTextBox1.Text.Contains(sText) Then Return
        ' Skip static resources (scripts, images, stylesheets): they are not page text.
        Dim absPath As String = url2.AbsolutePath.ToLower()
        For Each ext As String In {".js", ".jpg", ".jpeg", ".gif", ".png", ".css", ".ico"}
            If absPath.EndsWith(ext) Then Return
        Next
        RichTextBox1.AppendText(sText & vbCrLf)
        ' Each worker receives its own copy of the URL as the thread argument.
        Dim ThreadVerificationPage As New Threading.Thread(AddressOf VerificationPage)
        ThreadVerificationPage.Start(sText)
        Dim ThreadVerificationURL As New Threading.Thread(AddressOf VerificationURL)
        ThreadVerificationURL.Start(sText)
    End Sub

    Protected Overrides Sub Finalize()
        MyBase.Finalize()
    End Sub

    Public Sub New()
        ' This call is required by the designer.
        InitializeComponent()
        ' Add any initialization after the InitializeComponent() call.
        AddHandler FiddlerApplication.BeforeResponse, AddressOf FiddlerBeforeResponseHandler
        AddHandler FiddlerApplication.BeforeRequest, AddressOf FiddlerBeforeRequestHandler
        AddHandler Application.ApplicationExit, AddressOf ShutdownFiddlerApp
        Dim oFlags As FiddlerCoreStartupFlags = FiddlerCoreStartupFlags.Default
        FiddlerApplication.Startup(0, oFlags)
        MsgBox("Started proxy on port " & FiddlerApplication.oProxy.ListenPort)

    End Sub
End Class
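
The page check above looks for each word surrounded by spaces, plus a special case for `<h1>` tags, so a word followed by a comma or wrapped in any other tag goes unnoticed. One possible alternative, sketched here under assumptions, is a regular expression with word boundaries. The name FindWords and the sample text and words are hypothetical, and the `\b` anchor assumes each listed word begins and ends with a letter or digit.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module WordMatchSketch
    ' Returns the listed words that occur as whole words in the text, ignoring case.
    Function FindWords(ByVal text As String, ByVal words As IEnumerable(Of String)) As List(Of String)
        Dim hits As New List(Of String)
        For Each raw As String In words
            Dim w As String = raw.Trim()
            If w.Length = 0 Then Continue For ' skip blank lines of the word file
            ' Regex.Escape keeps characters such as "." or "+" literal.
            Dim pattern As String = "\b" & Regex.Escape(w) & "\b"
            If Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase) Then hits.Add(w)
        Next
        Return hits
    End Function

    Sub Main()
        ' Hypothetical sample data.
        Dim words As String() = New String() {"casino", "poker", "ok"}
        Dim hits As List(Of String) = FindWords("<h1>Casino</h1> Play poker, now!", words)
        Console.WriteLine(String.Join(", ", hits)) ' casino, poker
    End Sub
End Module
```

In this sample "ok" is not reported even though it appears inside "poker", because a word boundary is required on both sides of the match.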