Support/Intern/2011/Brinda/Carrot2: Difference between revisions

Line 21: Line 21:
Carrot2 can handle up to roughly 8,500 documents. However, as the number of documents increases, it becomes slower and uses more memory. Tuning a large set of documents also becomes slow, taking around 5-10 minutes to tune a single attribute. It's important to note that Carrot2 clusters documents and gives you a better idea of what the issues are, so that you can look out for questions of those kinds in the forum. However, it does not provide a reliable count of documents with a specific problem, because the same document can appear under several cluster names. You also need to manually sort the documents in Other Topics into clusters, especially with a large set of documents.
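To illustrate why per-cluster counts are unreliable when clusters overlap, here is a minimal sketch. It assumes the clustering result has been exported into a plain mapping of cluster name to document IDs; the `clusters` data and the `overlap_report` helper are made up for illustration and are not part of Carrot2.

```python
from collections import Counter

# Hypothetical exported result: cluster name -> set of document IDs (made-up data).
clusters = {
    "crash on startup": {1, 2, 3, 4},
    "restore tabs": {3, 4, 5},
    "Other Topics": {6, 7},
}

def overlap_report(clusters):
    """Return (sum of cluster sizes, distinct documents, IDs found in more than one cluster)."""
    # Count how many clusters each document appears in.
    membership = Counter(doc for docs in clusters.values() for doc in docs)
    total = sum(len(docs) for docs in clusters.values())
    shared = sorted(doc for doc, count in membership.items() if count > 1)
    return total, len(membership), shared

total, distinct, shared = overlap_report(clusters)
print(f"sum of cluster sizes: {total}, distinct documents: {distinct}, shared IDs: {shared}")
```

When the sum of cluster sizes exceeds the number of distinct documents, adding up the per-cluster numbers overstates how many reports there are for each problem.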


One of the issues of working with unclustered data is that people don't use the same words for the same issue, for example restore/save tabs or blurry/fuzzy. Carrot2 doesn't provide a way to group these words together, so users have to read the clusters manually and then merge them. Similarly, Carrot2 cannot recognize spelling mistakes. There is no way of telling the tool that words like crash/cras/crsh are all the same, which results in a few documents being clustered incorrectly.<br>
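One possible workaround, outside Carrot2 itself, is to normalize the text before clustering it. The sketch below is only an illustration under stated assumptions: the `CANONICAL` word list, the `SYNONYMS` table, and the 0.8 similarity cutoff are hypothetical choices, and the fuzzy matching uses Python's standard `difflib` module rather than anything Carrot2 provides.

```python
import difflib
import re

# Hypothetical vocabulary and synonym table, chosen by hand for this example.
CANONICAL = ["crash", "restore", "blurry"]
SYNONYMS = {"fuzzy": "blurry", "save": "restore"}

def normalize_word(word, cutoff=0.8):
    """Map a word to its canonical form via the synonym table or fuzzy matching."""
    word = word.lower()
    if word in SYNONYMS:
        return SYNONYMS[word]
    # get_close_matches returns the best candidates whose similarity is >= cutoff.
    match = difflib.get_close_matches(word, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else word

def normalize_text(text):
    """Normalize every word in a document before it is handed to the clusterer."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    return " ".join(normalize_word(w) for w in words)

print(normalize_text("Firefox crsh after fuzzy text"))
```

A blanket synonym mapping such as save to restore is crude and can merge unrelated reports, so the table would need to be reviewed against the actual forum data.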


Carrot2 doesn't allow you to name clusters and then collect documents based on those names. So if you are looking for something specific, especially in a large dataset, it will be difficult to find unless there are enough documents on that issue for Carrot2 to group together. Also, Carrot2 does not always produce the cluster names that you might be looking for. For example, all issues related to fvd were grouped together under the cluster name “invalid security certificate”. You cannot change the cluster names to suit your preference.<br>
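If you need documents collected under names you choose yourself, a simple keyword filter run alongside the clustering can serve as a rough substitute. This is a minimal sketch, not a Carrot2 feature: the `LABEL_KEYWORDS` table, the `collect_by_label` helper, and the sample documents are all hypothetical.

```python
# Hypothetical labels and keywords picked by the analyst (not produced by Carrot2).
LABEL_KEYWORDS = {
    "fvd": ["fvd"],
    "security certificate": ["certificate", "ssl"],
}

def collect_by_label(documents, label_keywords):
    """Map each label to the indices of documents containing any of its keywords."""
    result = {label: [] for label in label_keywords}
    for index, text in enumerate(documents):
        lowered = text.lower()
        for label, keywords in label_keywords.items():
            # Plain substring match; a document may land under several labels.
            if any(keyword in lowered for keyword in keywords):
                result[label].append(index)
    return result

# Made-up sample documents for illustration.
docs = [
    "FVD toolbar stopped working after update",
    "Invalid security certificate warning on every site",
    "Tabs are not restored after restart",
]
print(collect_by_label(docs, LABEL_KEYWORDS))
```

Because the match is a plain substring test, short keywords can produce false hits, so the results are best treated as a starting point for manual review.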