Split Text

Hope your day is going well. As you might know, I enjoy spending Saturday and Sunday mornings reading and experimenting; to me that is the best way to learn. I put in about eight hours per weekend.

Earlier this year I purchased the book Getting Started with Natural Language Processing by Ekaterina Kochmar, published by Manning. A few years back I took an online course on machine learning that touched on some of the topics covered in this book.

On a separate note, I will use the VSCode IDE and the GitHub Copilot plugin to generate code for this post. At this point I need to disclose that I am a Microsoft employee. That said, I have been using VSCode for several years; what is new is that I recently installed GitHub Copilot. To learn more, take a few minutes to read the article Getting started with GitHub Copilot.

There might be better ways to work with Python and IDEs. One popular approach is a Jupyter Notebook; its main advantage is that you can share the code and the data in a single package.

In this post I will develop the Python code using the VSCode IDE and will run it in the Anaconda environment. I have been using Anaconda for a few years.

When I start a new Python project, I like to update Anaconda, as shown here:

# **** If you want a stable set of packages that have been tested for interoperability ****
(base) C:\Users\johnc>conda update conda
Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: /
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/win-64::anaconda==custom=py38_1
  - defaults/win-64::anaconda-navigator==2.0.4=py38_0
  - defaults/win-64::astropy==4.3.1=py38hc7d831d_0
  - defaults/win-64::bokeh==2.3.3=py38haa95532_0
  - defaults/noarch::dask==2021.8.1=pyhd3eb1b0_0
  - defaults/win-64::imagecodecs==2021.3.31=py38h5da4933_0
  - defaults/noarch::imageio==2.9.0=pyhd3eb1b0_0
  - conda-forge/noarch::ipympl==0.7.0=pyhd8ed1ab_0
  - defaults/win-64::lcms2==2.12=h83e58a3_0
  - defaults/win-64::libtiff==4.2.0=hd0e1b90_0
  - defaults/win-64::matplotlib==3.4.2=py38haa95532_0
  - defaults/win-64::matplotlib-base==3.4.2=py38h49ac443_0
  - defaults/win-64::openjpeg==2.4.0=h4fc8c34_0
  - defaults/win-64::pillow==8.3.1=py38h4fa10fc_0
  - defaults/win-64::scikit-image==0.18.1=py38hf11a4ad_0
  - defaults/noarch::seaborn==0.11.2=pyhd3eb1b0_0
  - defaults/noarch::tifffile==2021.4.8=pyhd3eb1b0_2
  - defaults/win-64::_anaconda_depends==2020.07=py38_0
done

## Package Plan ##

  environment location: C:\Users\johnc\anaconda3

  added / updated specs:
    - conda

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    anaconda-client-1.11.1     |   py38haa95532_0         154 KB
    astroid-2.14.2             |   py38haa95532_0         395 KB
    boltons-23.0.0             |   py38haa95532_0         421 KB
    comtypes-1.1.14            |   py38haa95532_0         271 KB
    conda-23.3.1               |   py38haa95532_0         972 KB
    conda-repo-cli-1.0.41      |   py38haa95532_0         142 KB
    cryptography-39.0.1        |   py38h21b164f_0         1.0 MB
    curl-7.88.1                |       h2bbff1b_0         147 KB
    cython-0.29.33             |   py38hd77b12b_0         1.9 MB
    flit-core-3.8.0            |   py38haa95532_0          85 KB
    fsspec-2023.3.0            |   py38haa95532_0         234 KB
    future-0.18.3              |   py38haa95532_0         704 KB
    giflib-5.2.1               |       h8cc25b3_3          88 KB
    importlib-metadata-6.0.0   |   py38haa95532_0          39 KB
    importlib_metadata-6.0.0   |       hd3eb1b0_0           8 KB
    ipython-8.10.0             |   py38haa95532_0         1.1 MB
    jaraco.classes-3.2.1       |     pyhd3eb1b0_0           9 KB
    jpeg-9e                    |       h2bbff1b_1         320 KB
    jsonpatch-1.32             |     pyhd3eb1b0_0          15 KB
    jsonpointer-2.1            |     pyhd3eb1b0_0           9 KB
    jsonschema-4.17.3          |   py38haa95532_0         155 KB
    jupyter_client-8.1.0       |   py38haa95532_0         196 KB
    jupyter_console-6.6.3      |   py38haa95532_0          63 KB
    jupyter_core-5.3.0         |   py38haa95532_0         107 KB
    jupyterlab_server-2.21.0   |   py38haa95532_0          82 KB
    jupyterlab_widgets-3.0.5   |   py38haa95532_0         179 KB
    jxrlib-1.1                 |       he774522_2         337 KB
    keyring-23.13.1            |   py38haa95532_0          83 KB
    libarchive-3.6.2           |       h2033e3e_1         1.8 MB
    libcurl-7.88.1             |       h86230a5_0         328 KB
    libdeflate-1.17            |       h2bbff1b_0         151 KB
    libpng-1.6.39              |       h8cc25b3_0         369 KB
    libwebp-base-1.2.4         |       h2bbff1b_1         304 KB
    libxml2-2.10.3             |       h0ad7f3c_0         2.9 MB
    libxslt-1.1.37             |       h2bbff1b_0         448 KB
    lxml-4.9.2                 |   py38h2bbff1b_0         1.1 MB
    nbclassic-0.5.3            |   py38haa95532_0         6.0 MB
    networkx-2.8.4             |   py38haa95532_1         2.6 MB
    notebook-6.5.3             |   py38haa95532_0         555 KB
    openssl-1.1.1t             |       h2bbff1b_0         5.5 MB
    packaging-23.0             |   py38haa95532_0          69 KB
    pandas-1.5.3               |   py38hf11a4ad_0        10.5 MB
    pandoc-2.12                |       haa95532_3        14.6 MB
    pip-23.0.1                 |   py38haa95532_0         2.7 MB
    pkginfo-1.9.6              |   py38haa95532_0          69 KB
    pycurl-7.45.2              |   py38hcd4344a_0         132 KB
    pylint-2.16.2              |   py38haa95532_0         756 KB
    pyopenssl-23.0.0           |   py38haa95532_0          97 KB
    pytoolconfig-1.2.5         |   py38haa95532_1          32 KB
    pywinpty-2.0.10            |   py38h5da7b33_0         229 KB
    requests-2.28.1            |   py38haa95532_1          98 KB
    rope-1.7.0                 |   py38haa95532_0         438 KB
    scikit-learn-1.2.2         |   py38hd77b12b_0         6.5 MB
    scipy-1.10.0               |   py38h321e85e_1        18.7 MB
    sqlite-3.41.1              |       h2bbff1b_0         897 KB
    statsmodels-0.13.5         |   py38h080aedc_1         9.7 MB
    tbb-2021.8.0               |       h59b6b97_0         149 KB
    tqdm-4.65.0                |   py38hd4e2768_0         149 KB
    urllib3-1.26.15            |   py38haa95532_0         194 KB
    werkzeug-2.2.3             |   py38haa95532_0         341 KB
    wheel-0.38.4               |   py38haa95532_0          83 KB
    xlwings-0.29.1             |   py38haa95532_0         1.2 MB
    zstandard-0.19.0           |   py38h2bbff1b_0         340 KB
    zstd-1.5.4                 |       hd43e919_0         683 KB
    ------------------------------------------------------------
                                           Total:        99.8 MB

The following NEW packages will be INSTALLED:

  boltons            pkgs/main/win-64::boltons-23.0.0-py38haa95532_0
  jaraco.classes     pkgs/main/noarch::jaraco.classes-3.2.1-pyhd3eb1b0_0
  jsonpatch          pkgs/main/noarch::jsonpatch-1.32-pyhd3eb1b0_0
  jsonpointer        pkgs/main/noarch::jsonpointer-2.1-pyhd3eb1b0_0
  jxrlib             pkgs/main/win-64::jxrlib-1.1-he774522_2
  libwebp-base       pkgs/main/win-64::libwebp-base-1.2.4-h2bbff1b_1
  pytoolconfig       pkgs/main/win-64::pytoolconfig-1.2.5-py38haa95532_1

The following packages will be UPDATED:

  anaconda-client                     1.11.0-py38haa95532_0 --> 1.11.1-py38haa95532_0
  astroid                             2.11.7-py38haa95532_0 --> 2.14.2-py38haa95532_0
  comtypes                         1.1.10-py38haa95532_1002 --> 1.1.14-py38haa95532_0
  conda                               23.1.0-py38haa95532_0 --> 23.3.1-py38haa95532_0
  conda-repo-cli                      1.0.27-py38haa95532_0 --> 1.0.41-py38haa95532_0
  cryptography                        38.0.4-py38h21b164f_0 --> 39.0.1-py38h21b164f_0
  curl                                    7.87.0-h2bbff1b_0 --> 7.88.1-h2bbff1b_0
  cython                             0.29.32-py38hd77b12b_0 --> 0.29.33-py38hd77b12b_0
  flit-core          pkgs/main/noarch::flit-core-3.6.0-pyh~ --> pkgs/main/win-64::flit-core-3.8.0-py38haa95532_0
  fsspec                           2022.11.0-py38haa95532_0 --> 2023.3.0-py38haa95532_0
  future                                      0.18.2-py38_1 --> 0.18.3-py38haa95532_0
  giflib                                   5.2.1-h8cc25b3_1 --> 5.2.1-h8cc25b3_3
  importlib-metadata                  4.11.3-py38haa95532_0 --> 6.0.0-py38haa95532_0
  importlib_metadata                      4.11.3-hd3eb1b0_0 --> 6.0.0-hd3eb1b0_0
  ipython                              8.8.0-py38haa95532_0 --> 8.10.0-py38haa95532_0
  jpeg                                        9e-h2bbff1b_0 --> 9e-h2bbff1b_1
  jsonschema                          4.16.0-py38haa95532_0 --> 4.17.3-py38haa95532_0
  jupyter_client                       7.4.8-py38haa95532_0 --> 8.1.0-py38haa95532_0
  jupyter_console                      6.4.4-py38haa95532_0 --> 6.6.3-py38haa95532_0
  jupyter_core                         5.1.1-py38haa95532_0 --> 5.3.0-py38haa95532_0
  jupyterlab_server                   2.16.5-py38haa95532_0 --> 2.21.0-py38haa95532_0
  jupyterlab_widgets pkgs/main/noarch::jupyterlab_widgets-~ --> pkgs/main/win-64::jupyterlab_widgets-3.0.5-py38haa95532_0
  keyring                             23.4.0-py38haa95532_0 --> 23.13.1-py38haa95532_0
  libarchive                               3.6.2-hebabd0d_0 --> 3.6.2-h2033e3e_1
  libcurl                                 7.87.0-h86230a5_0 --> 7.88.1-h86230a5_0
  libdeflate                                 1.8-h2bbff1b_5 --> 1.17-h2bbff1b_0
  libpng                                  1.6.37-h2a8f88b_0 --> 1.6.39-h8cc25b3_0
  libxml2                                 2.9.14-h0ad7f3c_0 --> 2.10.3-h0ad7f3c_0
  libxslt                                 1.1.35-h2bbff1b_0 --> 1.1.37-h2bbff1b_0
  lxml                                 4.9.1-py38h1985fb9_0 --> 4.9.2-py38h2bbff1b_0
  nbclassic                            0.4.8-py38haa95532_0 --> 0.5.3-py38haa95532_0
  networkx                             2.8.4-py38haa95532_0 --> 2.8.4-py38haa95532_1
  notebook                             6.5.2-py38haa95532_0 --> 6.5.3-py38haa95532_0
  openssl                                 1.1.1s-h2bbff1b_0 --> 1.1.1t-h2bbff1b_0
  packaging                             22.0-py38haa95532_0 --> 23.0-py38haa95532_0
  pandas                               1.5.2-py38hf11a4ad_0 --> 1.5.3-py38hf11a4ad_0
  pandoc                                    2.12-haa95532_1 --> 2.12-haa95532_3
  pip                                 22.3.1-py38haa95532_0 --> 23.0.1-py38haa95532_0
  pkginfo                              1.8.3-py38haa95532_0 --> 1.9.6-py38haa95532_0
  pycurl                              7.45.1-py38hcd4344a_0 --> 7.45.2-py38hcd4344a_0
  pylint                              2.14.5-py38haa95532_0 --> 2.16.2-py38haa95532_0
  pyopenssl          pkgs/main/noarch::pyopenssl-22.0.0-py~ --> pkgs/main/win-64::pyopenssl-23.0.0-py38haa95532_0
  pywinpty                             2.0.2-py38h5da7b33_0 --> 2.0.10-py38h5da7b33_0
  requests                            2.28.1-py38haa95532_0 --> 2.28.1-py38haa95532_1
  rope               pkgs/main/noarch::rope-0.22.0-pyhd3eb~ --> pkgs/main/win-64::rope-1.7.0-py38haa95532_0
  scikit-learn                         1.2.0-py38hd77b12b_0 --> 1.2.2-py38hd77b12b_0
  scipy                               1.10.0-py38h321e85e_0 --> 1.10.0-py38h321e85e_1
  sqlite                                  3.40.1-h2bbff1b_0 --> 3.41.1-h2bbff1b_0
  statsmodels                         0.13.5-py38h080aedc_0 --> 0.13.5-py38h080aedc_1
  tbb                                   2021.6.0-h59b6b97_1 --> 2021.8.0-h59b6b97_0
  tqdm                                4.64.1-py38haa95532_0 --> 4.65.0-py38hd4e2768_0
  urllib3                            1.26.14-py38haa95532_0 --> 1.26.15-py38haa95532_0
  werkzeug                             2.2.2-py38haa95532_0 --> 2.2.3-py38haa95532_0
  wheel              pkgs/main/noarch::wheel-0.37.1-pyhd3e~ --> pkgs/main/win-64::wheel-0.38.4-py38haa95532_0
  xlwings                            0.27.15-py38haa95532_0 --> 0.29.1-py38haa95532_0
  zstandard                           0.18.0-py38h2bbff1b_0 --> 0.19.0-py38h2bbff1b_0
  zstd                                     1.5.2-h19a0ad4_0 --> 1.5.4-hd43e919_0

Proceed ([y]/n)? y		<=== proceed

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: -

    Windows 64-bit packages of scikit-learn can be accelerated using scikit-learn-intelex.
    More details are available here: https://intel.github.io/scikit-learn-intelex

    For example:

        $ conda install scikit-learn-intelex
        $ python -m sklearnex my_application.py

done

(base) C:\Users\johnc>

Once all has been updated, let’s warm up with some simple code.

# **** using the Anaconda command prompt ****
(base) C:\Users\johnc>python
Python 3.8.16 (default, Jan 17 2023, 22:25:28) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.

# **** ****
>>> print("Hello world");
Hello world

# **** ****
>>> print("My name is Bond, James Bond");
My name is Bond, James Bond

# **** ****
>>> print("Hello world"); print("My name is Bond, James Bond");
Hello world
My name is Bond, James Bond

# **** ****
>>> print("Hello World");\
... print("My name is Bond, James Bond");
Hello World
My name is Bond, James Bond

# **** run the Python code in the ex1.py file ****
(base) C:\Users\johnc>python c:\temp\ex1.py
first line
second line

The first few commands are just to help me switch mental gears from compiled languages back to a scripting one. It always takes me a few hours to readjust after I have not used Python for a couple of months.

In the last command we run the Python interpreter on the specified file from the Anaconda prompt. The file prints two lines.
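
For reference, the initial ex1.py probably looked something like this (an assumption on my part, since the post only shows the file contents after the later edits):

# **** c:\temp\ex1.py (assumed initial contents) ****
print("first line")
print("second line")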

Now that we tested that we can execute a Python file, we can use the VSCode IDE to write some code and periodically execute it.

# **** run the ex1.py file ****
(base) C:\Users\johnc>python c:\temp\ex1.py
first line
second line

# **** edit ex1.py with VSCode ****
(base) C:\Users\johnc>code c:\temp\ex1.py

# **** run the updated file ****
(base) C:\Users\johnc>python c:\temp\ex1.py
first line
second line
third and last line

# **** after a second update ... (note the line terminators) ****
(base) C:\Users\johnc>python c:\temp\ex1.py
first line.
second line.
third and last line!

# **** close VSCode ****

# **** type the contents of the ex1.py file ****
(base) C:\Users\johnc>type c:\temp\ex1.py
print("first line.");
print("second line.");
print("third and last line!");

We start by verifying that we can still run the ex1.py script from the Anaconda prompt.

We then invoke the VSCode IDE and open the c:\temp\ex1.py file. Once the IDE opens we add a third line to the script. We save the update and execute the code. We can now see three lines.

We go back to the IDE and update the three lines by adding line terminators. We save our changes and run the script one more time. The updated lines are displayed.

We then close the VSCode IDE and we are done with this file.

# **** start the Python interpreter from the Anaconda prompt ****
(base) C:\Users\johnc>python
Python 3.8.16 (default, Jan 17 2023, 22:25:28) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.

# **** define the text string ****
>>> text = "Define which data represents each class for the machine learning algorithm"

# **** print the contents of text ****
>>> print(text)
Define which data represents each class for the machine learning algorithm

# **** split the words in text ****
>>> text.split(" ")
['Define', 'which', 'data', 'represents', 'each', 'class', 'for', 'the', 'machine', 'learning', 'algorithm']
>>>

From an Anaconda console, we start the Python interpreter so we can execute commands interactively. This is a great way to experiment and get individual commands working exactly as we want.

We define a variable named text and assign some text to it. To verify that all is well we print the content of text. All seems to be well so far.

In the last line we split the text using a space as a delimiter, and the words are displayed. In our next post in this blog we will develop a spam filter, and one of its tasks is to split text into words. In this case we just split simple words and did not consider that the first word ‘Define’ is capitalized. To avoid differentiating ‘Define’ from ‘define’ we should lowercase all words before processing the text further. We will find out more about this shortly.
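
As a quick illustration (a minimal sketch, not part of the original session), lowercasing the text before splitting normalizes the capitalized first word:

# **** lowercase the text before splitting so 'Define' becomes 'define' ****
>>> text.lower().split()
['define', 'which', 'data', 'represents', 'each', 'class', 'for', 'the', 'machine', 'learning', 'algorithm']

The next script, split2.py, takes a more deliberate approach: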

# **** input text ****
text = 'Define which data represents "ham" class and which data represents "spam" class for the machine learning algorithm.'

# **** split includes words: '"ham"', '"spam"', and 'algorithm.' ****
print('text:', text.split())

# **** define list of delimiters ****
delimiters = [' ', '.', '"']

# **** print delimiters ****
print('delimiters: ', delimiters)

# **** define variable words to keep list of processed words ****
words = []

# **** define variable word to keep current word ****
word = ''

# **** loop through each character in text ****
for c in text:
    
    # **** if character is in delimiters list ****
    if c in delimiters:

        # **** add current word in lowercase to list of words (if not blank) ****
        if word != '':
            words.append(word.lower())

            # **** print last word in words list ****
            print('word:', words[-1])

        # **** reset current word ****
        word = ''

    # **** if character is not in delimiters list ****
    else:

        # **** add character to current word ****
        word += c

# **** print list of words ****
print('words:', words)

This script starts by defining some text. Note that the text includes tokens like Define, “ham”, “spam”, and algorithm followed by a period.

We specify a set of delimiters in an attempt to eliminate punctuation marks from the text.

A list named words is defined to collect the results, and a variable named word holds the current word as the algorithm traverses the input text.

The loop traverses the text one character at a time. If the current character is a delimiter and the current word is not blank, we append the word to words; otherwise if the character is not a delimiter we add the character to the word.

After all is said and done, we display the list of words.
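
As a side note (a sketch of an alternative, not part of the original script), the same splitting can be expressed in a couple of lines with a regular expression:

# **** equivalent approach: split on space, period, or double quote, ****
# **** lowercase each piece, and drop the empty strings ****
import re
words = [w.lower() for w in re.split(r'[ ."]', text) if w]
print('words:', words)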

# **** run python script ****
(base) C:\Users\johnc>python c:\temp\split2.py
text: ['Define', 'which', 'data', 'represents', '"ham"', 'class', 'and', 'which', 'data', 'represents', '"spam"', 'class', 'for', 'the', 'machine', 'learning', 'algorithm.']
delimiters:  [' ', '.', '"']
word: define
word: which
word: data
word: represents
word: ham
word: class
word: and
word: which
word: data
word: represents
word: spam
word: class
word: for
word: the
word: machine
word: learning
word: algorithm
words: ['define', 'which', 'data', 'represents', 'ham', 'class', 'and', 'which', 'data', 'represents', 'spam', 'class', 'for', 'the', 'machine', 'learning', 'algorithm']

(base) C:\Users\johnc>

The run of the split2.py script is shown. Given that the input text is quite simple, the list of words looks acceptable at this time.
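
Before moving on, it is worth noting a limitation (the example below is hypothetical, not from the post): the delimiter list only covers spaces, periods, and double quotes, so other punctuation passes through, and the final word is only captured because the sample text happens to end with a period. A reusable version of the splitter with a final flush makes this easier to see:

# **** a reusable sketch of the character-by-character splitter ****
def split_on(text, delimiters):
    words, word = [], ''
    for c in text:
        if c in delimiters:
            if word:
                words.append(word.lower())
            word = ''
        else:
            word += c
    if word:                        # flush the last word if the text
        words.append(word.lower())  # does not end with a delimiter
    return words

# **** '!' is not a delimiter, so it stays attached to 'spam' ****
print(split_on("Don't mark well-known senders as spam!", [' ', '.', '"']))
# ["don't", 'mark', 'well-known', 'senders', 'as', 'spam!']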

Let’s now take a look at the output of script c:\temp\split3.py which follows:

The first three lines indicate that the NLTK punkt package is being checked and is already up to date.

Once that is done, the script displays the raw result of split(), followed by the set of delimiters and then each word extracted by the character-by-character loop. The full list of words is then displayed.

The list of words is then cleared, the text is tokenized, and the results are displayed. Note that the number of words produced in the first pass differs from the one in the second. This indicates that there are many ways to split text, and some are better than others.

# **** run python script ****
(base) C:\Users\johnc>python c:\temp\split3.py
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\johnc\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
text: ['Define', 'which', 'data', 'represents', '"ham"', 'class', 'and', 'which', 'data', 'represents', '"spam"', 'class', 'for', 'the', 'machine', 'learning', 'algorithm.']
delimiters:  [' ', '.', '"']
word: define
word: which
word: data
word: represents
word: ham
word: class
word: and
word: which
word: data
word: represents
word: spam
word: class
word: for
word: the
word: machine
word: learning
word: algorithm
words: ['define', 'which', 'data', 'represents', 'ham', 'class', 'and', 'which', 'data', 'represents', 'spam', 'class', 'for', 'the', 'machine', 'learning', 'algorithm']
words: []
words: ['Define', 'which', 'data', 'represents', '``', 'ham', "''", 'class', 'and', 'which', 'data', 'represents', '``', 'spam', "''", 'class', 'for', 'the', 'machine', 'learning', 'algorithm', '.']

(base) C:\Users\johnc>

In the next post in this blog we will get into the steps needed to create a working spam filter. For now, here is the source of the split3.py script that produced the previous output:

import nltk                         # Import the Natural Language Toolkit
from nltk import word_tokenize      # Import the word tokenizer
nltk.download('punkt')              # Download the Punkt tokenizer


# **** define get_words function ****
def get_words(text):

    # **** split text into words ****
    words = word_tokenize(text)

    # **** return list of words ****
    return words


# **** input text ****
text = 'Define which data represents "ham" class and which data represents "spam" class for the machine learning algorithm.'

# **** split includes words: '"ham"', '"spam"', and 'algorithm.' ****
print('text:', text.split())

# **** define list of delimiters ****
delimiters = [' ', '.', '"']

# **** print delimiters ****
print('delimiters: ', delimiters)

# **** define variable words to keep list of processed words ****
words = []

# **** define variable word to keep current word ****
word = ''

# **** loop through each character in text ****
for c in text:
    
    # **** if character is in delimiters list ****
    if c in delimiters:

        # **** add current word in lowercase to list of words (if not blank) ****
        if word != '':
            words.append(word.lower())

            # **** print last word in words list ****
            print('word:', words[-1])

        # **** reset current word ****
        word = ''

    # **** if character is not in delimiters list ****
    else:

        # **** add character to current word ****
        word += c

# **** print list of words ****
print('words:', words)


# **** clear list of words ****
words.clear()

# **** verify list of words has been cleared ****
print('words:', words)

# **** tokenize text ****
words = get_words(text)

# **** display list of tokenized words ****
print('words:', words)

The code that generated the previous output starts by importing the Natural Language Toolkit (NLTK), which we will use to experiment in this piece of code.

Similar to a previous listing in this post, we declare a function in which we use the word_tokenize() function to split the words in the text.

We then assign some input text.

In the first split attempt we just use the split() method and display the results. Note that the results contain the words: ‘Define’, ‘"ham"’, ‘"spam"’, and ‘algorithm.’.

We then define a list of delimiters to split the text. The delimiters are then printed.

We define a list to keep the words and a variable to keep individual words which will be generated as we parse each letter from our text.

The code that follows is the same code we previously used to extract the words from our text variable. The list of words is then printed. Note that our code now produces the words: ‘define’, ‘ham’, ‘spam’, and ‘algorithm’.

The list of words is cleared and displayed.

We now use the get_words() function, which uses word_tokenize() from the NLTK toolkit. The results are then displayed.

The list of words is similar to, yet different from, the previous list. This comparison helps us understand the need for tokenization as a step in splitting the input text for our spam-filtering task.
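
If we wanted to combine both ideas (a hedged sketch, assuming the goal is lowercase alphabetic tokens), we could lowercase the NLTK tokens and filter out the punctuation tokens:

# **** lowercase NLTK tokens and keep only alphabetic ones ****
tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
print('tokens:', tokens)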

I have to say that it was quite interesting to write comments in our code and have GitHub Copilot process them to generate code. By editing the comments, one can guide the underlying software toward the desired output. I believe this will be an important skill when using this tool.
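
As an illustration of that workflow (a hypothetical prompt, not taken verbatim from this post), writing a descriptive comment first is often enough to get a useful suggestion:

# **** split the text into sentences using the NLTK sentence tokenizer ****
from nltk import sent_tokenize
sentences = sent_tokenize(text)
print('sentences:', sentences)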

Hope you enjoyed the post. I will put the associated code from this post in my SplitText GitHub repository.

Enjoy,

John
