![]() The error I receive is this: Traceback (most recent call last):įile "cluster_example.py", line 20, in get_wordsįile "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 183, in memoizeįile "cluster_example.py", line 14, in normalize_wordįile "/usr/local/lib/python2.7/dist-packages/nltk/stem/snowball. Subject: Re: Bug859294: hunspell-dict-ko: FTBFS: UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 38: ordinal not in range(128) Date: Mon, 04:13:29 +0900 OK, another encoding parameter is needed. # called when you are classifying previously unseen examples!Ĭluster.classify(vectorspaced(title)) for title in job_titlesįor cluster_id, title in sorted(zip(classified_examples, job_titles)): You have to know which encoding a string is using. Example: s s.decode ('someencoding').encode ('ascii', 'replace') Use the correct encoding your string was encoded in first place, instead of 'someencoding'. # NOTE: This is inefficient, cluster.classify should really just be You can solve the problem by explicity decoding your bytestring (using the appropriate encoding) before trying to reencode it to ascii. I have a file called example.txt with the following contents. ![]() Here is an example of how the error occurs. To solve the error, specify the correct encoding, e.g. # cluster = KMeansClusterer(5, euclidean_distance)Ĭluster.cluster() 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range (128) Environment Red Hat Enterprise Linux 6 Red Hat Enterprise Linux 7 yum Subscriber exclusive content A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more. The Python 'UnicodeDecodeError: 'ascii' codec can't decode byte in position' occurs when we use the ascii codec to decode bytes that were encoded using a different codec. Word in title_components and not word in stopwords Now the demo code I'm trying to run is this: import sysįrom nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance Rules bands follow performing jägermeister stage So for example, my text file is something like this: belong finger death punch I'm using NLTK to perform kmeans clustering on my text file in which each line is considered as a document.
0 Comments
Leave a Reply. |